v2.0.0

@ftosoni ftosoni released this 07 Jun 23:45
· 12 commits to main since this release

Version 2.0.0 - SoftwareX Submission Release

This release contains the complete and frozen source code of the MediaWiki Code2Code Search engine submitted to the SoftwareX journal.

Key Features included in this release:

  • Pre-processing pipeline for discovery, mirroring, and parsing of 2,500+ MediaWiki repositories.
  • High-precision structural parsing utilising Tree-sitter AST queries for 10+ programming languages.
  • Neural retrieval backend utilising FastAPI, PyTorch, and Sentence-Transformers (Qwen3-Embedding-0.6B).
  • Memory-optimised search index backend built with FAISS IndexIVFPQ and SQLite metadata store.
  • Codex-based responsive frontend fully localised in 17 languages.
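
The index and metadata bullets above imply a two-step lookup: the vector index returns integer ids, and the metadata store turns those ids into displayable snippets. The following is a minimal sketch of that second step using only Python's standard `sqlite3` module. The table name `snippets`, its columns, the sample rows, and the helper `lookup_snippets` are all hypothetical and are not taken from this release; the real schema of `snippets.db` may differ.

```python
import sqlite3


def create_store(conn):
    # Hypothetical schema: one row per code snippet, keyed by the
    # integer id that the vector index is assumed to return.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets ("
        "id INTEGER PRIMARY KEY, "
        "repo TEXT NOT NULL, "
        "path TEXT NOT NULL, "
        "code TEXT NOT NULL)"
    )


def lookup_snippets(conn, ids):
    """Return snippet rows for `ids`, preserving the order of `ids`.

    Ids with no matching row (for example a -1 placeholder for an
    empty result slot) are skipped.
    """
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, repo, path, code FROM snippets WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL does not guarantee result order, so re-apply the ranking here.
    return [by_id[i] for i in ids if i in by_id]


# Illustrative data only; repository names and paths are made up.
conn = sqlite3.connect(":memory:")
create_store(conn)
conn.executemany(
    "INSERT INTO snippets VALUES (?, ?, ?, ?)",
    [
        (0, "example/repo-a", "src/a.py", "def a(): pass"),
        (1, "example/repo-b", "src/b.js", "function b() {}"),
        (2, "example/repo-c", "src/c.php", "function c() {}"),
    ],
)
ranked = lookup_snippets(conn, [2, 0])
```

Keeping the ranking order in Python rather than in SQL keeps the query simple; the trade-off is one extra dictionary per request, which is small for typical top-k result sizes.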

How to run:

Please refer to the README.md file for detailed installation, model downloading, and local deployment instructions.


📦 Pre-computed Data (Zenodo)

Due to GitHub's size constraints, the pre-built FAISS index (mediawiki.index) and SQLite metadata database (snippets.db) are hosted separately.

You can download the ready-to-use database and index here:
(Direct link: https://doi.org/10.5281/zenodo.20586256)
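
After downloading, it can help to confirm that both files are in place before starting the backend. The sketch below is one way to do that with the standard library; the helper name `missing_artifacts` is hypothetical, and the directory in which the application expects these files is an assumption here, so check the README.md for the actual location.

```python
from pathlib import Path

# File names as given in this release; the target directory is up to the caller.
REQUIRED_FILES = ("mediawiki.index", "snippets.db")


def missing_artifacts(data_dir):
    """Return the required files that are absent or empty in `data_dir`."""
    base = Path(data_dir)
    missing = []
    for name in REQUIRED_FILES:
        candidate = base / name
        # Treat a zero-byte file as missing, since it usually means
        # an interrupted download.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_artifacts("data")` with a hypothetical `data` directory returns an empty list when both files are present and non-empty, and otherwise names the files that still need to be downloaded.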