Search API

This project provides a lightweight, deployable web interface for performing internet searches using the DDGS library. It exposes multiple structured endpoints through a FastAPI backend, enabling flexible and programmatic access to general, PDF, repository, and Wikipedia searches.

It also ships as an MCP (Model Context Protocol) server, allowing AI assistants (Claude, Cursor, GitHub Copilot, etc.) to call the search tools directly.

Deployment

Installation guide

podman-compose up -d             # to start the service
podman-compose logs -f           # to view logs
podman-compose down              # to stop the service

Features

Generating a dataset for training or fine-tuning LLMs requires a huge amount of data. You can use this API to fetch relevant URLs for your domain.

Framework

Built with FastAPI, offering a clean and performant API layer.
Also available as an MCP server via the mcp Python SDK.

MCP Server

Run the MCP server over stdio (default) for use with Claude Desktop, Cursor, VS Code Copilot and other MCP-compatible clients:

python mcp_server.py

Available MCP tools

Tool	Description
`web_search`	General web search, returns a list of URLs
`search_by_engine`	Search via a specific engine (`bing`, `brave`, `duckduckgo`, `google`, `mojeek`, `yandex`, `yahoo`, `wikipedia`)
`search_papers`	Search academic papers, returns DOIs
`search_books`	Search for books, returns URLs
`search_news`	Search news articles, returns URLs
`search_pdfs`	Search for PDF documents, returns URLs
`search_filetype`	Search for files of a given type (pdf, docx, pptx …)
`search_repositories`	Search GitHub & GitLab repositories, returns URLs
`search_wiki`	Search Wikipedia and Wikimedia sites, returns URLs
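An MCP client calls the tools above with JSON-RPC 2.0 `tools/call` requests. Over the stdio transport, each message is written as a single line of JSON. The sketch below only builds such a request for `web_search`. The argument names `query` and `limit` are assumptions (the real names come from each tool's input schema, returned by `tools/list`), and a real client must complete the `initialize` handshake before calling any tool.

```python
import json
from typing import Any


def make_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps without `indent` emits no raw newlines (newlines inside strings
    # are escaped), so the message stays on one line as stdio framing requires.
    return json.dumps(message) + "\n"


# Hypothetical arguments: check the tool's schema for the actual parameter names.
line = make_tool_call(1, "web_search", {"query": "fastapi tutorial", "limit": 5})
print(line, end="")
```

In practice an MCP client library handles this framing for you; the sketch just shows what travels over stdin/stdout.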

Claude Desktop configuration example

{
  "mcpServers": {
    "searchapi": {
      "command": "python",
      "args": ["/path/to/searchapi/mcp_server.py"]
    }
  }
}
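If your Claude Desktop config already lists other MCP servers, add the `searchapi` entry instead of overwriting the file. A minimal sketch of that merge on the parsed JSON (read the file with `json.load`, merge, write it back); the config file's location differs per OS and is left to you, and `/path/to/searchapi/mcp_server.py` is the same placeholder as above.

```python
import copy
import json
from typing import Any


def add_searchapi_server(config: dict[str, Any], script_path: str) -> dict[str, Any]:
    """Return a copy of `config` with a `searchapi` entry under `mcpServers`."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["searchapi"] = {"command": "python", "args": [script_path]}
    return merged


existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_searchapi_server(existing, "/path/to/searchapi/mcp_server.py")
print(json.dumps(updated, indent=2))
```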

Why?

So, I was trying to fine-tune an LLM on a particular domain and needed a lot of relevant information to build a dataset, but I found no free search engine API options (except DuckDuckGo; kudos to them). I first tried Instant Data Scraper, but it was a manual process. Why should I be copying and pasting URLs from a web extension? No way.

So I looked for other options and found DDGS, a CLI-based metasearch engine. It's good, but having an API is always sweet, so I built this API.

I have tried to follow the KISS (Keep it simple, stupid) principle while building this project; that is also why deployment is kept simple.

I built this just to satisfy my own needs. If you find it useful, please star ⭐ this repo.

Middleware

Includes CORSMiddleware with permissive configuration:
- All origins allowed
- All methods allowed
- Credentials supported
- Designed for easy cross-origin integration

Endpoints

`GET /`

Returns a simple message confirming that the API is operational.

`GET /health`

Returns a simple message confirming that the API is operational.
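After `podman-compose up -d`, a deployment script can poll `GET /health` until the service answers. A minimal sketch, assuming the API is reachable at `http://localhost:8000` (the port is an assumption; use whatever your compose setup exposes) and treating any HTTP 200 as healthy, since the exact response body is not documented here.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def wait_until_healthy(
    base_url: str = "http://localhost:8000",  # assumed host and port
    attempts: int = 10,
    delay: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll GET /health until it answers 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            pass  # URLError/HTTPError subclass OSError: service not ready yet, retry
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Usage: `if not wait_until_healthy(): raise SystemExit("API did not become healthy")`. The `fetch` parameter exists only so the polling logic can be exercised without a running server.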

`GET /useragent`

Returns a random user agent.

`GET /paper/`

Searches for research papers related to the query.

{
  query: "topics ..", # string
}
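The MCP `search_papers` tool is described as returning DOIs. Assuming `/paper/` likewise returns a JSON list of DOI strings (not confirmed by this README), a client can build the request URL and turn each DOI into a resolvable `https://doi.org/` link. The base URL is an assumed deployment address.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed deployment address


def paper_search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build the GET /paper/ URL; `query` is sent as a URL query parameter."""
    return f"{base_url}/paper/?{urllib.parse.urlencode({'query': query})}"


def doi_to_url(doi: str) -> str:
    """Turn a bare or `doi:`-prefixed DOI into an https://doi.org/ link."""
    doi = doi.strip()
    if doi.lower().startswith(("http://", "https://")):
        return doi  # already a link
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return "https://doi.org/" + urllib.parse.quote(doi, safe="/")


def search_papers(query: str) -> list[str]:
    # Assumption: the response body is a JSON list of DOI strings.
    with urllib.request.urlopen(paper_search_url(query), timeout=30) as resp:
        dois = json.load(resp)
    return [doi_to_url(d) for d in dois]
```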

`GET /search/engine`

Performs a general web search for `query` on a particular search engine (`engine`) and returns up to `limit` results as a list of URLs.

{
  engine: "Enter Your Engine", # string, one of ["bing", "brave", "duckduckgo", "google", "mojeek", "yandex", "yahoo", "wikipedia"]
  query: "Enter Your Query",   # string
  limit: "Enter Your Limit"    # int
}

**Note: targeting one engine directly can get your requests flagged by some search engines, so prefer the `/search/` endpoint instead (it randomly selects a search engine for each query).**
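If you do call `/search/engine`, checking the engine name on the client side fails fast instead of spending a request on a typo. A minimal sketch that builds the request URL; the engine list is the one documented above, while the base URL, the default `limit` of 10, and the positive-limit check are assumptions, not necessarily what the server enforces.

```python
import urllib.parse

BASE_URL = "http://localhost:8000"  # assumed deployment address

# Engines documented for this endpoint.
ENGINES = {"bing", "brave", "duckduckgo", "google", "mojeek", "yandex", "yahoo", "wikipedia"}


def engine_search_url(engine: str, query: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build a GET /search/engine URL, rejecting engines the README doesn't list."""
    engine = engine.lower()
    if engine not in ENGINES:
        raise ValueError(f"unsupported engine {engine!r}; expected one of {sorted(ENGINES)}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")  # client-side assumption
    params = urllib.parse.urlencode({"engine": engine, "query": query, "limit": limit})
    return f"{base_url}/search/engine?{params}"
```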

`GET /search/specific/{filetype}`

Searches specifically for a particular type of resource related to the query.

{
  query: "Enter Your Query",        # string
  filetype: "Enter Your File Type", # path parameter: pdf, doc, docx, ppt, pptx, xls, xlsx, etc.
  limit: "Enter Your Limit"         # int
}
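Because `{filetype}` is part of the route, it belongs in the URL path while `query` and `limit` go in the query string. A minimal sketch of building that URL; the base URL and default `limit` are assumptions, and normalising `.PDF` to `pdf` is a client-side convenience the server may or may not require.

```python
import urllib.parse

BASE_URL = "http://localhost:8000"  # assumed deployment address


def filetype_search_url(filetype: str, query: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build GET /search/specific/{filetype}: filetype in the path, the rest in the query string."""
    # Normalise ".PDF" -> "pdf" (assumption) and percent-encode so it stays one path segment.
    segment = urllib.parse.quote(filetype.strip().lstrip(".").lower(), safe="")
    params = urllib.parse.urlencode({"query": query, "limit": limit})
    return f"{base_url}/search/specific/{segment}?{params}"
```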

All the routes below take `query` and `limit` as parameters.

`GET /search/`

Performs a general web search for the specified query and returns up to limit results as a list of URLs.

`GET /searchpdf/`

Searches specifically for PDF resources related to the query.

`GET /books/`

Searches specifically for books related to the query (searches Anna's Archive).

`GET /repositories/`

Queries GitHub and GitLab for repositories matching the given search term.

`GET /wiki/`

Aggregates results from Wikipedia and related Wikimedia platforms for more comprehensive content retrieval.

`GET /news/`

Returns a list of news articles related to the specified query.
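Since the six routes above share the same `query`/`limit` parameters, one small client covers all of them. A minimal sketch, assuming the API is at `http://localhost:8000` and that each route returns a JSON list of URLs (as described for `/search/`); the default `limit` of 10 is also an assumption.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed deployment address

# Routes that, per this README, take `query` and `limit`.
QUERY_LIMIT_ROUTES = ("/search/", "/searchpdf/", "/books/", "/repositories/", "/wiki/", "/news/")


def build_url(route: str, query: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build the GET URL for one of the query/limit routes."""
    if route not in QUERY_LIMIT_ROUTES:
        raise ValueError(f"unknown route {route!r}")
    return f"{base_url}{route}?{urllib.parse.urlencode({'query': query, 'limit': limit})}"


def fetch_urls(route: str, query: str, limit: int = 10) -> list[str]:
    """GET the route and return the decoded JSON body (assumed to be a list of URLs)."""
    with urllib.request.urlopen(build_url(route, query, limit), timeout=30) as resp:
        return json.load(resp)
```

Usage: `for url in fetch_urls("/repositories/", "vector database", limit=20): print(url)`.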

Error Handling

All endpoints return structured JSON responses on failure.
Errors use standard HTTP status codes such as:
- 400 Bad Request
- 500 Internal Server Error
Responses include clear diagnostic messages.
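On the client side, a failed request surfaces as `urllib.error.HTTPError`. The exact shape of the JSON error body is not documented here; the sketch below assumes a FastAPI-style `{"detail": ...}` object (FastAPI's default for `HTTPException`) and falls back to the raw text for anything else.

```python
import json
import urllib.error
import urllib.request


def describe_error(status: int, body: bytes) -> str:
    """Summarise an error response, preferring a JSON `detail` field (assumed shape)."""
    text = body.decode("utf-8", errors="replace")
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return f"HTTP {status}: {text}"
    if isinstance(payload, dict) and "detail" in payload:
        return f"HTTP {status}: {payload['detail']}"
    return f"HTTP {status}: {json.dumps(payload)}"


def get_json(url: str):
    """GET `url` and decode JSON, turning HTTP errors into readable exceptions."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # 4xx: fix the request; 5xx: the server or an upstream engine failed, maybe retry later.
        raise RuntimeError(describe_error(err.code, err.read())) from err
```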

Rate Limiting & Usage Notes

To avoid getting rate limited:

- Use proxies, VPNs, or Tor as a routing layer (not the browser). For proxying with Tor you can use
- When invoking the API repeatedly, apply a politeness delay to avoid overloading upstream engines.
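A politeness delay can be as simple as enforcing a minimum gap between calls, plus a little random jitter so requests don't arrive on a fixed rhythm. A minimal sketch; the 2-second interval and 1-second jitter are assumed, conservative defaults rather than values from this project, and the injectable `clock`/`sleep` exist only to make the logic testable.

```python
import random
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum gap (plus random jitter) between successive calls."""

    def __init__(
        self,
        min_interval: float = 2.0,  # assumed default, in seconds
        jitter: float = 1.0,        # extra random delay, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval (+ jitter) apart."""
        now = self._clock()
        if self._last is not None:
            gap = self.min_interval + random.uniform(0, self.jitter)
            remaining = self._last + gap - now
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Usage: create one `PoliteThrottle()` and call `throttle.wait()` before every request in your loop.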

Disclaimer

This project is intended for educational use only. You are fully responsible for complying with all applicable laws of your country (be a law-abiding citizen).

Donate if you liked this project

You can find my donation link on my GitHub profile and personal website.
