This project provides a lightweight, deployable web interface for performing internet searches using the DDGS library. It exposes multiple structured endpoints through a FastAPI backend, enabling flexible and programmatic access to general, PDF, repository, and Wikipedia searches.
It also ships as an MCP (Model Context Protocol) server, allowing AI assistants (Claude, Cursor, GitHub Copilot, etc.) to call the search tools directly.
```bash
podman-compose up -d    # start the service
podman-compose logs -f  # view logs
podman-compose down     # stop the service
```
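Once the container is up, you can sanity-check the API from Python. A minimal sketch, assuming the service is published on localhost port 8000 (adjust to whatever port your compose file maps):

```python
import requests

# Hypothetical base URL; substitute the host/port from your compose file.
BASE_URL = "http://localhost:8000"

resp = requests.get(BASE_URL, timeout=10)
resp.raise_for_status()
print(resp.json())  # should print the "API is operational" style message
```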
Building datasets for training or fine-tuning LLMs requires a large amount of domain data; you can use this API to fetch relevant URLs for your domain.
- Built with FastAPI, offering a clean and performant API layer.
- Also available as an MCP server via the `mcp` Python SDK.
Run the MCP server over stdio (default) for use with Claude Desktop, Cursor, VS Code Copilot and other MCP-compatible clients:
```bash
python mcp_server.py
```

| Tool | Description |
|---|---|
| `web_search` | General web search, returns a list of URLs |
| `search_by_engine` | Search via a specific engine (bing, brave, duckduckgo, google, mojeek, yandex, yahoo, wikipedia) |
| `search_papers` | Search academic papers, returns DOIs |
| `search_books` | Search for books, returns URLs |
| `search_news` | Search news articles, returns URLs |
| `search_pdfs` | Search for PDF documents, returns URLs |
| `search_filetype` | Search for files of a given type (pdf, docx, pptx, …) |
| `search_repositories` | Search GitHub & GitLab repositories, returns URLs |
| `search_wiki` | Search Wikipedia and Wikimedia sites, returns URLs |
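To register the server with an MCP client, add it to the client's MCP configuration (for example, Claude Desktop's `claude_desktop_config.json` or Cursor's `mcp.json`; the exact file and location vary by client):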
```json
{
  "mcpServers": {
    "searchapi": {
      "command": "python",
      "args": ["/path/to/searchapi/mcp_server.py"]
    }
  }
}
```

So, I was trying to fine-tune an LLM on a particular domain and needed a lot of relevant information to build a dataset, but I found no free search engine API options (except DuckDuckGo; kudos to them). I first tried Instant Data Scraper, but it was a manual process. Why should I be copying and pasting URLs from a web extension? No way.
So I looked for other options and found DDGS, a CLI-based metasearch engine. It's good, but having an API is always sweet, so I built this one.
I have tried to follow the KISS (Keep It Simple, Stupid) principle while building this project; the simple deployment option exists for the same reason.
I built this just to satisfy my own need. If you find it useful, please star ⭐ this repo.
- Includes `CORSMiddleware` with a permissive configuration:
- All origins allowed
- All methods allowed
- Credentials supported
- Designed for easy cross-origin integration
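For reference, a permissive setup like the one described above typically looks like this in FastAPI (a minimal sketch; the actual module and app names in this repo may differ):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Permissive CORS: every origin, method, and header is allowed,
# and credentials are supported.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```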
Returns a simple message confirming that the API is operational.
Returns a random user agent.
Searches for research papers related to the query.
```
{
  query: "topics .."  # string
}
```
Performs a general web search for the given query on a particular search engine and returns up to `limit` results as a list of URLs.
```
{
  engine: "Enter Your Engine",  # string, one of ["bing", "brave", "duckduckgo", "google", "mojeek", "yandex", "yahoo", "wikipedia"]
  query: "Enter Your Query",    # string
  limit: "Enter Your Limit"     # int
}
```
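As a rough usage sketch (the endpoint path, HTTP method, and port here are assumptions; check the route definitions in this repo or the interactive `/docs` page for the real ones):

```python
import requests

# Hypothetical route and port; verify against the actual API routes.
resp = requests.post(
    "http://localhost:8000/search_by_engine",
    json={"engine": "duckduckgo", "query": "fastapi tutorial", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expected: a list of result URLs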
**Note: repeatedly hitting a single engine can raise flags with some search engines, so prefer the general search endpoint instead (it randomly selects a search engine for each request).**
Searches specifically for files of a particular type related to the query.
```
{
  query: "Enter Your Query",         # string
  filetype: "Enter Your File Type",  # pdf, doc, docx, ppt, pptx, xls, xlsx, etc.
  limit: "Enter Your Limit"          # int
}
```
Performs a general web search for the specified query and returns up to `limit` results as a list of URLs.
Searches specifically for PDF resources related to the query.
Searches specifically for books related to the query (queries Anna's Archive).
Queries GitHub and GitLab for repositories matching the given search term.
Aggregates results from Wikipedia and related Wikimedia platforms for more comprehensive content retrieval.
Returns a list of news articles related to the specified query.
- All endpoints return structured JSON responses on failure.
- Errors use standard HTTP status codes such as:
  - `400 Bad Request`
  - `500 Internal Server Error`
- Responses include clear diagnostic messages.
To avoid getting rate limited:
- Use proxies, VPNs, or Tor as a routing layer (not the browser); for proxying with Tor you can use …
- When invoking the API repeatedly, apply a politeness delay to avoid overloading upstream engines (see the sketch below).
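A minimal sketch of such a politeness delay, reusing the hypothetical base URL and route from the earlier example:

```python
import time
import requests

BASE_URL = "http://localhost:8000"  # hypothetical; match your deployment
queries = ["quantum computing", "rust async runtime", "protein folding"]

for q in queries:
    # Hypothetical route; check the repo's route definitions for the real path.
    resp = requests.post(f"{BASE_URL}/search", json={"query": q, "limit": 10}, timeout=30)
    if resp.ok:
        print(q, "->", resp.json())
    else:
        print(q, "failed with status", resp.status_code)
    time.sleep(2)  # politeness delay between requests to avoid rate limiting
```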
This project is intended for educational use only. You are fully responsible for complying with all applicable laws in your country (be a law-abiding citizen).
You can find my donation link on my GitHub profile and personal website.
