A local, searchable catalog of ~5,500 APIs exposed over a REST endpoint and an MCP server. Used personally by the author to let Claude Code and other LLM agents find APIs by name, tag, or natural-language description.
api-catalog ingests API descriptions from a handful of public sources (the MCP server registry, the public-apis community list, APIs.guru, plus a few hand-written YAML cards) and loads them into PostgreSQL. Each API record has a name, URL, description, auth scheme, a base_url, optional endpoints (method, path, example request/response body), and one or more hierarchical tags stored as PostgreSQL ltree paths (e.g. AI.Language_Models.Chat). Three retrieval paths are exposed: (1) full-text search via a generated tsvector column, (2) tag browsing via ltree prefix/descendant queries, and (3) semantic search via pgvector. Embeddings are computed by calling a local Ollama instance running nomic-embed-text (768-dim output), one embedding per API card, stored in an embeddings table; semantic queries embed the query string the same way and order by cosine distance. A FastAPI process (api_server.py) exposes HTTP endpoints, and a stdio MCP server (mcp_server.py) exposes six tools that wrap the same queries so an MCP client (Claude Code, etc.) can call them directly. A scheduled_refresh.py script re-scrapes sources into a staging table, validates a minimum row count, and swaps tables in a transaction so the live catalog is never half-updated.
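To make the semantic path concrete, here is a minimal sketch of "order by cosine distance". It assumes pgvector's `<=>` operator (cosine distance) is what the query uses. The `embedding` and `api_id` column names, the join, and the helper functions are hypothetical and not taken from the repo. The in-memory ranking uses toy 3-dimensional vectors in place of the 768-dimensional output of nomic-embed-text.

```python
import math

def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity (what pgvector's <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def to_vector_literal(vec):
    """Format a list of numbers as a pgvector text literal such as '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# Hypothetical SQL: column names and the join are assumptions, not the repo's schema.
SEMANTIC_SQL = """
SELECT a.name, e.embedding <=> %s::vector AS distance
FROM embeddings e
JOIN apis a ON a.id = e.api_id
ORDER BY distance
LIMIT %s
"""

def rank_in_memory(query_vec, cards, limit=5):
    """Pure-Python equivalent of the ORDER BY above, for illustration only."""
    ranked = sorted(cards.items(), key=lambda kv: cosine_distance(query_vec, kv[1]))
    return [name for name, _ in ranked[:limit]]

if __name__ == "__main__":
    toy_cards = {
        "mail-api": [0.9, 0.1, 0.0],
        "maps-api": [0.0, 1.0, 0.0],
        "chat-api": [0.5, 0.5, 0.5],
    }
    print(rank_in_memory([1.0, 0.0, 0.0], toy_cards, limit=2))
```

With a real database, the query embedding would be passed as `to_vector_literal(vec)` for the first placeholder and the row limit for the second.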
Working. Used by one person (the author) in a home setup. Not multi-tenant, no auth on the REST API, no rate limiting. The code runs; the data pipeline runs. It has not been tested by anyone else.
The ~5,500 API records are aggregated from third-party sources. The code in this repo is Apache-2.0; the scraped catalog data is not, and for that reason it is not checked into the repo. Sources and their licenses at time of writing:
| Source | Approx. count | License / terms |
|---|---|---|
| MCP server registry | ~4,100 | Per-entry; check the upstream registry before republishing |
| public-apis (github.com/public-apis/public-apis) | ~1,000 | MIT on the list itself; individual API terms vary |
| APIs.guru | ~170 | CC0 on the directory metadata |
| Hand-written cards | 4 | Written by the author; Apache-2.0 |
If you plan to publish a built database dump alongside this code, verify each source's current terms first. The safe path is to ship only the code and the scraper, and let each user build their own catalog locally.
- Python 3.10+
- PostgreSQL 14+ with the vector (pgvector) and ltree extensions
- Ollama running locally with nomic-embed-text pulled (only required for semantic search; full-text and tag browse work without it)
git clone <this-repo>
cd api-catalog
pip install psycopg2-binary fastapi uvicorn pyyaml requests tabulate
cp .env.example .env
# Edit .env: DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME, OLLAMA_HOST
psql -d "$DB_NAME" -c "CREATE EXTENSION IF NOT EXISTS vector;"
psql -d "$DB_NAME" -c "CREATE EXTENSION IF NOT EXISTS ltree;"
# Build the catalog from the three upstream sources
python scheduled_refresh.py
# Or, if you already have card YAML files locally:
python migrate_to_postgres.py
# Compute embeddings (needs Ollama + nomic-embed-text)
python compute_embeddings.py
python api_server.py # REST on 127.0.0.1:8002
python mcp_server.py # MCP over stdio
MCP client config:
{
"api-catalog": {
"type": "stdio",
"command": "python",
"args": ["/path/to/api-catalog/mcp_server.py"]
}
}
python query_pg.py stats
python query_pg.py search "email sending"
python query_pg.py browse "AI.Language_Models"
python query_pg.py card "openai-chat"
python query_pg.py tags "Communication"
| Tool | Description |
|---|---|
| search_apis | Full-text search by name/description |
| semantic_search | pgvector cosine similarity over embeddings |
| browse_by_tag | List APIs under an ltree tag path |
| get_api_card | Full API details including endpoints |
| list_tags | Walk the tag hierarchy |
| catalog_stats | Row counts and source breakdown |
| Method | Path | Description |
|---|---|---|
| GET | /stats | Catalog statistics |
| GET | /tags?prefix=AI | List tags with optional prefix |
| GET | /search?q=email&limit=50 | Full-text search |
| GET | /browse/{tag_path} | APIs under a tag |
| GET | /card/{api_name} | API details + endpoints |
| GET | /embeddings/search?q=email | Semantic search |
| GET | /health | Health check |
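As a rough illustration of calling these endpoints from Python, the sketch below builds request URLs with the standard library only. The base URL uses the documented default bind address and port. The helper names are hypothetical, and the shape of the JSON responses is not specified here, so `get_json` simply returns whatever the server sends.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8002"  # default API_HOST / API_PORT

def build_url(path, **params):
    """Join the base URL, a path, and optional query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return BASE_URL + path + ("?" + query if query else "")

def search_url(q, limit=50):
    """URL for full-text search."""
    return build_url("/search", q=q, limit=limit)

def semantic_url(q):
    """URL for semantic search."""
    return build_url("/embeddings/search", q=q)

def card_url(api_name):
    """URL for one API card; the name is percent-encoded as a path segment."""
    return build_url("/card/" + quote(api_name, safe=""))

def get_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON (requires the server to be running)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(search_url("email sending", limit=10))
    print(card_url("openai-chat"))
```

Calling `get_json(search_url("email"))` only works while `api_server.py` is running locally.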
name: openai-chat
slug: openai-chat
description: OpenAI Chat Completions API
url: https://platform.openai.com/docs/api-reference/chat
auth:
  type: bearer
  header: Authorization
tags:
  - AI.Language_Models.Chat
base_url: https://api.openai.com/v1
endpoints:
  - method: POST
    path: /chat/completions
    description: Create a chat completion
See schema/api_card.yaml and schema/tags.yaml.
apis — Core records (name, url, description, auth, source)
tags — ltree paths
api_tags — Many-to-many junction
endpoints — HTTP call examples
embeddings — pgvector, 768-dim
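Tag browsing relies on ltree's descendant test, which compares whole labels rather than string prefixes. The sketch below mimics the `<@` operator in plain Python and shows one way the browse query could be written. The join columns (`id`, `api_id`, `tag_id`, `path`) are assumptions for illustration and may not match the actual schema.

```python
def is_descendant(path, ancestor):
    """Mimic ltree `path <@ ancestor`: true if path equals ancestor or sits beneath it."""
    path_labels = path.split(".")
    ancestor_labels = ancestor.split(".")
    return path_labels[:len(ancestor_labels)] == ancestor_labels

# Hypothetical SQL: column names are assumptions, not the repo's schema.
BROWSE_SQL = """
SELECT a.name
FROM apis a
JOIN api_tags j ON j.api_id = a.id
JOIN tags t ON t.id = j.tag_id
WHERE t.path <@ %s::ltree
ORDER BY a.name
"""

if __name__ == "__main__":
    print(is_descendant("AI.Language_Models.Chat", "AI.Language_Models"))
    print(is_descendant("AI.Language_Models", "AI.Language"))
```

The second call illustrates the label-wise comparison: `AI.Language` is a string prefix of `AI.Language_Models` but not an ancestor of it.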
| Variable | Default | Description |
|---|---|---|
| DB_HOST | 127.0.0.1 | PostgreSQL host |
| DB_PORT | 5432 | PostgreSQL port |
| DB_USER | (none) | Database user |
| DB_PASSWORD | (none) | Database password |
| DB_NAME | (none) | Database name |
| OLLAMA_HOST | http://127.0.0.1:11434 | Ollama endpoint |
| OLLAMA_MODEL | nomic-embed-text | Embedding model |
| API_HOST | 127.0.0.1 | REST bind address |
| API_PORT | 8002 | REST port |
| LOG_LEVEL | INFO | Log verbosity |
| LOG_DIR | ./logs | Log directory |
| REFRESH_MIN_API_COUNT | 1000 | Minimum rows for a refresh to be considered valid |
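One way these variables could be read at startup is sketched below. It is not the repo's loader. The `Settings` class and `load_settings` function are hypothetical, and treating the three variables without defaults as required is an assumption.

```python
import os
from dataclasses import dataclass

# Assumption: variables with no documented default must be set explicitly.
REQUIRED = ("DB_USER", "DB_PASSWORD", "DB_NAME")

@dataclass(frozen=True)
class Settings:
    db_host: str
    db_port: int
    db_user: str
    db_password: str
    db_name: str
    ollama_host: str
    ollama_model: str
    api_host: str
    api_port: int
    log_level: str
    log_dir: str
    refresh_min_api_count: int

def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default), applying documented defaults."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return Settings(
        db_host=env.get("DB_HOST", "127.0.0.1"),
        db_port=int(env.get("DB_PORT", "5432")),
        db_user=env["DB_USER"],
        db_password=env["DB_PASSWORD"],
        db_name=env["DB_NAME"],
        ollama_host=env.get("OLLAMA_HOST", "http://127.0.0.1:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "nomic-embed-text"),
        api_host=env.get("API_HOST", "127.0.0.1"),
        api_port=int(env.get("API_PORT", "8002")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        log_dir=env.get("LOG_DIR", "./logs"),
        refresh_min_api_count=int(env.get("REFRESH_MIN_API_COUNT", "1000")),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.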
- No auth on the REST API. Bind to localhost or put it behind a reverse proxy if you expose it.
- No rate limiting on embedding calls; a full re-embed of ~5,500 records hits Ollama hard.
- Card quality varies. The MCP registry entries are machine-generated and often lack endpoint examples; the hand-written cards are richer.
- Tag ontology (schema/tags.yaml) was authored by hand and is opinionated. Re-tagging a scraped record is best-effort keyword matching.
- Semantic search quality is bounded by nomic-embed-text. Good enough for "find me an email API", not for fine-grained disambiguation.
- Scraper assumes source formats stay stable. When public-apis or the MCP registry reshape their data, the scraper needs manual fixes.
- No tests included.
- Catalog data is not checked in (see Data provenance). You must run the refresh yourself.
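For readers who want to see what the refresh does before running it, the stage, validate, and swap pattern can be sketched as follows. This is not the code in scheduled_refresh.py. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs without a server, and the table names `apis_staging` and `apis_old` are hypothetical.

```python
import sqlite3

MIN_API_COUNT = 1000  # mirrors the REFRESH_MIN_API_COUNT default

def swap_in_staging(conn, min_rows=MIN_API_COUNT):
    """Promote apis_staging to apis if it holds enough rows; return True when swapped."""
    (count,) = conn.execute("SELECT COUNT(*) FROM apis_staging").fetchone()
    if count < min_rows:
        return False  # validation failed: the live table is left untouched
    conn.execute("BEGIN")
    try:
        # All three statements commit together or not at all.
        conn.execute("ALTER TABLE apis RENAME TO apis_old")
        conn.execute("ALTER TABLE apis_staging RENAME TO apis")
        conn.execute("DROP TABLE apis_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return True

def make_demo_db(live_rows, staging_rows):
    """Create an in-memory database with a live table and a freshly scraped staging table."""
    # isolation_level=None disables implicit transactions so BEGIN/COMMIT are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE apis (name TEXT)")
    conn.execute("CREATE TABLE apis_staging (name TEXT)")
    conn.executemany("INSERT INTO apis VALUES (?)",
                     [(f"old-{i}",) for i in range(live_rows)])
    conn.executemany("INSERT INTO apis_staging VALUES (?)",
                     [(f"new-{i}",) for i in range(staging_rows)])
    return conn

if __name__ == "__main__":
    db = make_demo_db(live_rows=3, staging_rows=5)
    print(swap_in_staging(db, min_rows=4))
    print(db.execute("SELECT COUNT(*) FROM apis").fetchone()[0])
```

Because the renames run inside one transaction, a failure partway through rolls back to the previous live table, and a scrape that returns too few rows never replaces it.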
Apache-2.0 on the code. Data license depends on upstream source — see Data provenance above.