bm-25

A lightweight BM25 search server for querying a folder of Markdown files, exposed as an MCP server.

Repository layout

```
bm-25/
├── mcp_server.py      # MCP server exposing the `search` tool over stdio
├── search.py          # BM25 indexing + search logic (also usable as a CLI)
├── requirements.txt   # Python dependencies
└── data/              # Place your .md files here
    ├── information_retrieval.md
    ├── machine_learning.md
    └── python_intro.md
```

Setup

Install Python 3.9+ (if not already installed).
Install dependencies:
```
pip install -r requirements.txt
```

Usage

MCP server

Run the server over stdio:

```
python mcp_server.py
```

It registers a single tool, `search(query: str, top_n: int = 3)`, which returns the top-N BM25 matches (score, path, line range, and chunk text) from the Markdown corpus in `data/`. OpenTelemetry spans are emitted to stderr so that stdout stays clean for JSON-RPC.
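
For clients that speak JSON-RPC directly, a call to this tool could look roughly like the sketch below. The `tools/call` method and the `name`/`arguments` fields follow the MCP specification; the helper name `build_search_request` and the request `id` are illustrative only, and a real client would first need to complete the MCP `initialize` handshake.

```python
import json


def build_search_request(query: str, top_n: int = 3, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request for the `search` tool (hypothetical helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search",
            "arguments": {"query": query, "top_n": top_n},
        },
    }
    # The stdio transport is newline-delimited, so each message must fit on one line.
    return json.dumps(message)


if __name__ == "__main__":
    print(build_search_request("BM25 ranking"))
```

Most users will not need this, since an MCP client such as the ones configured below builds these messages itself.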

Example client config (e.g. ~/.config/claude/mcp.json or equivalent):

```
{
  "mcpServers": {
    "bm25-search": {
      "command": "python",
      "args": ["/absolute/path/to/bm-25/mcp_server.py"]
    }
  }
}
```

Docker

Build the image:

```
docker build -t bm25-mcp-server .
```

Configure Claude Desktop to use the containerized server (in claude_desktop_config.json):

```
{
  "mcpServers": {
    "bm25-search": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "bm25-mcp-server"]
    }
  }
}
```

CLI: interactive mode

Run the script without arguments to enter an interactive query loop:

```
python search.py

Loading documents from '.../data' …
Indexed 3 document(s): ['information_retrieval.md', 'machine_learning.md', 'python_intro.md']

Enter a search query (or 'quit' to exit):
> machine learning algorithms
  1. machine_learning.md  (score: 0.4210)
  2. python_intro.md  (score: 0.1898)
  3. information_retrieval.md  (score: 0.0751)

> quit
```

CLI: single-query mode

Pass a query directly on the command line:

```
python search.py "BM25 ranking"

Loading documents from '.../data' …
Indexed 3 document(s): ['information_retrieval.md', 'machine_learning.md', 'python_intro.md']

Query: 'BM25 ranking'
  1. information_retrieval.md  (score: 1.6854)
```

Adding your own documents

Drop any .md files into the data/ directory.
They are automatically discovered and indexed the next time search.py is run.
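
As a rough illustration of what "automatically discovered" can mean, the sketch below lists the `.md` files directly inside a directory. The function name `find_markdown_files` is hypothetical, and whether search.py also descends into subdirectories is not stated here; this version assumes it does not.

```python
from pathlib import Path


def find_markdown_files(data_dir: str = "data") -> list[Path]:
    """Return the .md files directly inside data_dir, sorted by path (hypothetical helper)."""
    # glob("*.md") is non-recursive; use rglob("*.md") to include subdirectories.
    return sorted(p for p in Path(data_dir).glob("*.md") if p.is_file())


if __name__ == "__main__":
    for path in find_markdown_files():
        print(path.name)
```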

How it works

All .md files in data/ are read and tokenized (lowercased, punctuation stripped).
A BM25Okapi index is built over the token lists.
Each query is tokenized the same way, and the top-scoring documents are returned.
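
The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code in search.py: the names `tokenize` and `bm25_scores` are hypothetical, the sketch does not use the `BM25Okapi` class, and it assumes a common non-negative IDF variant that may differ from what the library computes, so absolute scores will not match the CLI output shown earlier.

```python
import math
import string
from collections import Counter

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Lowercase, strip punctuation, and split on whitespace."""
    return text.lower().translate(_STRIP_PUNCT).split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi-style BM25."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Assumed IDF variant: always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length-normalized term-frequency saturation.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        "BM25 is a ranking function used in information retrieval.",
        "Machine learning algorithms learn from data.",
        "Python is a programming language.",
    ]
    for text, score in zip(corpus, bm25_scores("machine learning algorithms", corpus)):
        print(f"{score:.4f}  {text}")
```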

BM25 key parameters (can be tuned inside search.py):

| Parameter | Default | Effect |
|-----------|---------|--------|
| `k1` | 1.5 | Term-frequency saturation |
| `b` | 0.75 | Document-length normalization |
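
For reference, the Okapi BM25 score of a document $D$ for a query $Q$ is commonly written as below, where $f(q, D)$ is the number of times term $q$ occurs in $D$, $|D|$ is the document length in tokens, and $\text{avgdl}$ is the average document length in the corpus. The IDF term is left abstract because its exact form varies between implementations.

```latex
\text{score}(D, Q) = \sum_{q \in Q} \text{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}
```

A larger `k1` lets repeated occurrences of a term keep raising the score for longer before it saturates. Setting `b` to 0 disables length normalization entirely, while `b` = 1 applies it in full.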
