Version 2.0.0 - SoftwareX Submission Release
This release contains the complete and frozen source code of the MediaWiki Code2Code Search engine submitted to the SoftwareX journal.
Key Features included in this release:
- Pre-processing pipeline for discovery, mirroring, and parsing of 2,500+ MediaWiki repositories.
- High-precision structural parsing utilising Tree-sitter AST queries for 10+ programming languages.
- Neural retrieval backend utilising FastAPI, PyTorch, and Sentence-Transformers (Qwen3-Embedding-0.6B).
- Memory-optimised search index backend built with FAISS IndexIVFPQ and SQLite metadata store.
- Codex-based responsive frontend fully localised in 17 languages.
How to run:
Please refer to the README.md file for detailed installation, model downloading, and local deployment instructions.
📦 Pre-computed Data (Zenodo)
Due to GitHub's size constraints, the pre-built FAISS index (mediawiki.index) and SQLite metadata database (snippets.db) are hosted separately.
You can download the ready-to-use database and index from Zenodo: https://doi.org/10.5281/zenodo.20586256
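
To illustrate how the two downloaded files relate, here is a minimal sketch of the metadata lookup step: a vector search yields integer ids, and those ids are resolved to snippet records in SQLite. The table name `snippets`, its columns, and the assumption that index ids equal SQLite row ids are hypothetical and chosen for illustration only; the actual schema of snippets.db may differ. The demo uses an in-memory database so it runs without the Zenodo files.

```python
import sqlite3


def fetch_snippets(conn, ids):
    """Return metadata rows for vector-search result ids, keeping rank order."""
    # FAISS pads a result list with -1 when fewer than k neighbours are found,
    # so negative ids are dropped before querying.
    wanted = [int(i) for i in ids if int(i) >= 0]
    if not wanted:
        return []
    placeholders = ",".join("?" * len(wanted))
    rows = conn.execute(
        # Hypothetical schema: the real snippets.db may use other names.
        f"SELECT id, repo, path, code FROM snippets WHERE id IN ({placeholders})",
        wanted,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # SQL does not guarantee result order, so restore the search ranking here.
    return [by_id[i] for i in wanted if i in by_id]


def build_demo_db():
    """Create a tiny in-memory stand-in for snippets.db (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snippets ("
        "id INTEGER PRIMARY KEY, repo TEXT, path TEXT, code TEXT)"
    )
    conn.executemany(
        "INSERT INTO snippets VALUES (?, ?, ?, ?)",
        [
            (0, "demo/RepoA", "includes/Foo.php", "function foo() {}"),
            (1, "demo/RepoB", "src/bar.js", "function bar() {}"),
            (2, "demo/RepoC", "lib/baz.py", "def baz(): pass"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # Pretend a nearest-neighbour search returned these ids, best match first.
    for row in fetch_snippets(conn, [2, 0, -1]):
        print(row)
```

In a real deployment the ids would presumably come from a FAISS search, for example by loading the index with `faiss.read_index("mediawiki.index")` and calling its `search` method on an embedded query; the README remains the authoritative guide for that setup.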