# Semantic Search Over Markdown Docs with ZeroEntropy

In this guide, we’ll build a simple semantic search engine over Markdown documentation files using ZeroEntropy. This is helpful if you want to query large internal wikis, blog posts, or dev docs.

---

### Prerequisites
- Python 3.8+
- `zeroentropy` client (`pip install zeroentropy`)
- A ZeroEntropy API key ([Get yours here](https://dashboard.zeroentropy.dev))
- A `.env` file with the following:

```bash
ZEROENTROPY_API_KEY=your_api_key_here
```

---

### What You’ll Learn
- How to use ZeroEntropy to semantically index Markdown files
- How to query your docs using semantic search (top documents + top snippets)

---

### Directory Structure

This guide expects a directory like this:

```bash
zcookbook/
├── guides/
│   ├── search_over_many_pdfs.ipynb
│   └── semantic_search_over_markdown/
│       ├── semantic_search_over_markdown.ipynb
│       └── sample_docs/
│           ├── intro.md
│           ├── tutorial.md
│           └── api_reference.md
├── LICENSE
└── README.md
```

### Setting up your ZeroEntropy Client

First, install the dependencies. In a notebook cell, prefix the command with `!`:

```bash
pip install zeroentropy python-dotenv
```

Now load your API key and initialize the client:

```python
import os

from dotenv import load_dotenv
from zeroentropy import ZeroEntropy

# Load variables from the .env file two directories above the working directory.
load_dotenv(dotenv_path="../../.env")

api_key = os.getenv("ZEROENTROPY_API_KEY")
if not api_key:
    raise ValueError("API Key not found. Make sure your .env file has ZEROENTROPY_API_KEY.")

zclient = ZeroEntropy(api_key=api_key)
```
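The relative path `../../.env` only resolves when the notebook runs from the guide's own folder. As a minimal sketch, you could instead walk up from the current directory until a `.env` file turns up and pass the result to `load_dotenv`. The helper name `find_env_file` is hypothetical and is not part of `zeroentropy` or `python-dotenv`.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the nearest `filename` in `start` or one of its parents, else None.

    Hypothetical helper for illustration; it only inspects the filesystem.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

A call such as `find_env_file(Path.cwd())` returns a path you can hand to `load_dotenv(dotenv_path=...)`. `python-dotenv` also provides `find_dotenv()`, which performs a comparable upward search.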

### Creating and Uploading the Markdown Docs

First, create a collection to hold the documents:

```python
collection_name = "md_docs_demo_vn"
zclient.collections.add(collection_name=collection_name)
```

```text
CollectionAddResponse(message='Success!')
```

Now define a function that uploads `.md` files as plain text:

```python
def upload_md_file(filepath, collection_name):
    """Upload one Markdown file to the collection as plain-text content."""
    # Reject anything that is not a Markdown file before reading it.
    file_ext = os.path.splitext(filepath)[1]
    if file_ext != ".md":
        raise ValueError(f"Unsupported file type: {file_ext!r}")

    with open(filepath, "r", encoding="utf-8") as f:
        text = f.read()

    # The file path is stored as the document's `path` in the collection.
    response = zclient.documents.add(
        collection_name=collection_name,
        path=filepath,
        content={"type": "text", "text": text},
    )
    return response
```

Let’s upload all `.md` files in our sample folder:

```python
folder_path = "./sample_docs"

for filename in os.listdir(folder_path):
    if filename.endswith(".md"):
        filepath = os.path.join(folder_path, filename)
        print(upload_md_file(filepath, collection_name))
```

```text
DocumentAddResponse(message='Success!')
DocumentAddResponse(message='Success!')
DocumentAddResponse(message='Success!')
```
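Re-running the upload loop attempts to add documents that may already be in the collection, and a single failing file stops the whole loop. One way to keep going and report failures afterwards is sketched below. `upload_all` is a hypothetical helper, and it catches a broad `Exception` on purpose because the specific error types the client raises are not shown in this guide.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def upload_all(
    paths: Iterable[str],
    upload: Callable[[str], object],
) -> Tuple[List[str], Dict[str, str]]:
    """Call `upload(path)` for each path and return (uploaded, failures).

    `failures` maps each path that raised to a short error description.
    """
    uploaded: List[str] = []
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            upload(path)
        except Exception as exc:  # broad on purpose: exact error types are an assumption
            failures[path] = f"{type(exc).__name__}: {exc}"
        else:
            uploaded.append(path)
    return uploaded, failures
```

With the function defined earlier, the `upload` argument could be `lambda p: upload_md_file(p, collection_name)`.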


## Confirming Your Documents

Once uploaded, you can list all documents in your collection like this:

```python
response = zclient.documents.get_info_list(collection_name=collection_name)
print([doc.path for doc in response.documents])
```

```text
['./sample_docs\\api_reference.md', './sample_docs\\intro.md', './sample_docs\\tutorial.md']
```
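The listed paths mix `/` and `\` because `os.path.join` was run on Windows, so the same files would be stored under different paths on Linux or macOS. A minimal sketch, assuming you want platform-independent document paths: collect the files with `pathlib` and build forward-slash paths relative to the docs folder. The helper name `collect_markdown_paths` is hypothetical, and unlike the `os.listdir` loop it also descends into subfolders.

```python
from pathlib import Path
from typing import List, Tuple


def collect_markdown_paths(folder: str) -> List[Tuple[Path, str]]:
    """Return (file_on_disk, document_path) pairs for every .md file under `folder`.

    `document_path` is relative to `folder` and always uses forward slashes.
    """
    root = Path(folder)
    pairs = []
    # rglob searches recursively; sorting keeps the order stable across runs.
    for file_path in sorted(root.rglob("*.md")):
        if file_path.is_file():
            pairs.append((file_path, file_path.relative_to(root).as_posix()))
    return pairs
```

The first element of each pair is the file to read, and the second is a candidate value for the `path` argument of `zclient.documents.add`.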


## Querying with ZeroEntropy
We’ll now use semantic search to retrieve the most relevant markdown documents and snippets for a natural language query.

### Top Document Matches

```python
query = "How to integrate with our API?"
response = zclient.queries.top_documents(
    collection_name=collection_name,
    query=query,
    k=3,
)

for r in response.results:
    print(f"\nScore: {r.score}\nPath: {r.path}")
```

```text
Score: 1.6567331566640742
Path: ./sample_docs\api_reference.md

Score: 1.444334998181506
Path: ./sample_docs\intro.md

Score: 1.2319368396989376
Path: ./sample_docs\tutorial.md
```
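The document scores in this example are not confined to a 0 to 1 range, so a fixed cutoff is hard to choose. One option, shown here as a sketch rather than a recommendation from ZeroEntropy, is to keep only results whose score is within some fraction of the best match. The helper `keep_near_top` and its `ratio` default are hypothetical, and the sketch assumes scores are positive and that each result exposes `.score`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_near_top(results: Sequence[T], ratio: float = 0.8) -> List[T]:
    """Keep results scoring at least `ratio` times the highest score.

    Assumes positive scores; `ratio` is an illustrative default, not a tuned value.
    """
    if not results:
        return []
    best = max(r.score for r in results)
    return [r for r in results if r.score >= ratio * best]
```

With the three scores from this query and `ratio=0.8`, the cutoff is about 1.33, so `api_reference.md` and `intro.md` are kept and `tutorial.md` is dropped.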


### Top Snippet Matches

```python
response = zclient.queries.top_snippets(
    collection_name=collection_name,
    query=query,
    k=3,
)

for r in response.results:
    print(f"\n📎 Snippet:\n{r.content}\n📁 Path: {r.path}\n🔢 Score: {r.score:.2f}")
```



````text
📎 Snippet:
# Introduction

Welcome to the Markdown Docs Demo! This project showcases how you can use ZeroEntropy to index and search over plain-text documentation.

These markdown files simulate typical developer docs — from getting started instructions to API references. You can use this setup as a starting point to build internal search for wikis, blogs, or changelogs.

Let's get started!

📁 Path: ./sample_docs\intro.md
🔢 Score: 0.24

📎 Snippet:
# API Reference

This API allows you to interact with the Markdown Docs Demo system — upload files, search them using natural language, and manage your document collections.

---

## 📤 POST /upload

Upload a new markdown document for semantic indexing.

**Request Body:**

```json
{
  "filename": "example.md",
  "content": "base64_encoded_content"
}

📁 Path: ./sample_docs\api_reference.md
🔢 Score: 0.22

📎 Snippet:
# Tutorial

This tutorial walks you through setting up the Markdown Docs Demo environment.

## Step 1: Clone the repository

```ba
````
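When several snippets come from the same file, it can be easier to read them grouped by document. A minimal sketch, assuming each result exposes `.path` and `.score` as in the snippet query; `group_snippets_by_path` is a hypothetical helper, not part of the client.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_snippets_by_path(results: Iterable) -> Dict[str, List]:
    """Group results by `.path`, with each group sorted by descending score."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.path].append(result)
    # Put the strongest snippet first within every document.
    for snippets in grouped.values():
        snippets.sort(key=lambda r: r.score, reverse=True)
    return dict(grouped)
```

Passing `response.results` from the snippet query returns a dictionary keyed by document path, which you can iterate to print one heading per file.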

### ✅ That's It!

You’ve now built a working semantic search engine over Markdown files using ZeroEntropy. The same approach works well for indexing changelogs, guides, and internal dev docs.