opensourcehustle/Google-Search-Content-Extractor-API

Google Search Content Extractor API

A FastAPI service that takes a search term and an optional website parameter, uses the Google Custom Search API to fetch up to the top 10 results, extracts the content of each result page, and returns the results to the API caller.

Features

  • Search Google for specific terms
  • Limit searches to specific websites
  • Extract content from search result pages
  • Return structured data with titles, links, snippets, and full content
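To make the first two features concrete, here is a minimal sketch of how a search request to the Google Custom Search JSON API could be assembled. The function name `build_google_params` is hypothetical and not taken from this repository. Using `siteSearch` for the site restriction is one option; the service may instead prepend a `site:` operator to the query.

```python
from typing import Dict, Optional

# Endpoint of the Google Custom Search JSON API.
GOOGLE_CSE_URL = "https://www.googleapis.com/customsearch/v1"


def build_google_params(
    api_key: str,
    engine_id: str,
    query: str,
    site: Optional[str] = None,
    num_results: int = 10,
) -> Dict[str, str]:
    """Build query parameters for one Custom Search request (hypothetical helper)."""
    params = {
        "key": api_key,    # Google Custom Search API key
        "cx": engine_id,   # Programmable Search Engine ID
        "q": query,
        # The API returns at most 10 results per request, so clamp to 1..10.
        "num": str(max(1, min(num_results, 10))),
    }
    if site:
        # Assumption: restrict results to one site via siteSearch ("i" = include).
        params["siteSearch"] = site
        params["siteSearchFilter"] = "i"
    return params
```

The resulting dictionary can be passed as `params=` to `requests.get(GOOGLE_CSE_URL, ...)`.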

Requirements

  • Python 3.7+
  • Google Custom Search API key
  • Google Programmable Search Engine ID

Setup

  1. Clone this repository
  2. Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
    pip install -r requirements.txt
  4. Set up the Google Custom Search API: obtain an API key and create a Programmable Search Engine to get its ID (see Requirements).
  5. Update the .env file with your credentials:
    GOOGLE_API_KEY=your_actual_api_key_here
    GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id_here
    API_KEY=your_secret_api_key_here
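As a rough illustration of how these three settings might be read and checked at startup, the sketch below validates a mapping of environment variables. The helper name `load_settings` is hypothetical. In the real service, python-dotenv's `load_dotenv()` would typically be called first so that the values from .env appear in `os.environ`.

```python
import os
from typing import Dict, Mapping

# The three variables listed in the .env example above.
REQUIRED_VARS = ("GOOGLE_API_KEY", "GOOGLE_SEARCH_ENGINE_ID", "API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or fail early if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup rather than on the first request makes a missing credential visible immediately.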
    

Usage

Start the server:

uvicorn main:app --reload

The API will be available at http://localhost:8000.

Endpoints

  • GET / - Health check endpoint
  • GET /search - Search endpoint with parameters:
    • query (required): Search term
    • site (optional): Limit search to specific website
    • num_results (optional, default: 10): Number of results to return (max 10)
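The parameter rules above can be sketched as a small validation function. This is an illustration only: the name `normalize_search_params` is hypothetical, and whether the service rejects an out-of-range `num_results` or silently clamps it is not stated here, so rejecting is an assumption.

```python
from typing import Optional, Tuple

MAX_RESULTS = 10  # documented upper bound for num_results


def normalize_search_params(
    query: str,
    site: Optional[str] = None,
    num_results: int = 10,
) -> Tuple[str, Optional[str], int]:
    """Validate /search parameters and return them in cleaned form."""
    query = query.strip()
    if not query:
        raise ValueError("query is required")
    # Assumption: out-of-range values are rejected rather than clamped.
    if not 1 <= num_results <= MAX_RESULTS:
        raise ValueError(f"num_results must be between 1 and {MAX_RESULTS}")
    site = site.strip() if site else None
    return query, site or None, num_results
```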

API Authentication

This API is protected with API key authentication. Include your API key (the API_KEY value from your .env file) in the X-API-Key header of every request to the /search endpoint.

Example with curl:

# Set your API key
API_KEY="your_secret_api_key_here"

# Search for "python tutorial" with API key authentication
curl -H "X-API-Key: $API_KEY" "http://localhost:8000/search?query=python+tutorial"

# Search for "machine learning" on wikipedia.org with API key authentication
curl -H "X-API-Key: $API_KEY" "http://localhost:8000/search?query=machine+learning&site=wikipedia.org"
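On the server side, a header check of this kind is often implemented with a constant-time comparison. The sketch below shows one way to do that with the standard library. The function name `is_valid_api_key` is hypothetical, and this is not necessarily how the service verifies the key.

```python
import hmac
from typing import Optional


def is_valid_api_key(provided: Optional[str], expected: str) -> bool:
    """Compare the X-API-Key header value against the configured API_KEY."""
    if not provided:
        # Header missing or empty.
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```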

Example Requests

# Search for "python tutorial"
curl -H "X-API-Key: your_secret_api_key_here" "http://localhost:8000/search?query=python+tutorial"

# Search for "machine learning" on wikipedia.org
curl -H "X-API-Key: your_secret_api_key_here" "http://localhost:8000/search?query=machine+learning&site=wikipedia.org"

# Search for "fastapi" and return only 5 results
curl -H "X-API-Key: your_secret_api_key_here" "http://localhost:8000/search?query=fastapi&num_results=5"
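The same requests can be built from Python. The sketch below uses only the standard library and mirrors the curl examples above. The helper name `build_search_request` is hypothetical, and the default `base_url` assumes the local server started with uvicorn.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(
    api_key: str,
    query: str,
    site: Optional[str] = None,
    num_results: Optional[int] = None,
    base_url: str = "http://localhost:8000",
) -> Request:
    """Build an authenticated GET request for the /search endpoint."""
    params = {"query": query}
    if site:
        params["site"] = site
    if num_results is not None:
        params["num_results"] = str(num_results)
    # urlencode escapes spaces as "+", matching the curl examples.
    url = f"{base_url}/search?{urlencode(params)}"
    return Request(url, headers={"X-API-Key": api_key})
```

Passing the returned object to `urllib.request.urlopen` sends the request; the response body is the JSON document described under Response Format.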

Response Format

{
  "results": [
    {
      "title": "Page Title",
      "link": "https://example.com/page",
      "snippet": "Short description from search results",
      "content": "Full text content extracted from the page..."
    }
  ]
}
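A client might map this response onto typed objects as in the sketch below. The `SearchResult` class and `parse_search_response` function are hypothetical names. Defaulting missing fields to an empty string is an assumption, made so that a partially filled result does not raise.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    title: str
    link: str
    snippet: str
    content: str


def parse_search_response(body: str) -> List[SearchResult]:
    """Parse the JSON body of a /search response into SearchResult objects."""
    payload = json.loads(body)
    return [
        SearchResult(
            title=item.get("title", ""),
            link=item.get("link", ""),
            snippet=item.get("snippet", ""),
            content=item.get("content", ""),
        )
        for item in payload.get("results", [])
    ]
```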

Error Handling

The API handles the following error conditions:

  • Missing API credentials
  • Network issues
  • Invalid requests
  • Content extraction failures

Errors are returned as JSON with an appropriate HTTP status code.
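One common way to handle content extraction failures is to isolate them per result, so that a single unreachable page does not fail the whole search. The sketch below illustrates that pattern. The names `attach_content` and `fetch_text` are hypothetical, and returning an empty `content` string on failure is an assumption; the service may report such failures differently.

```python
from typing import Callable, Dict, List


def attach_content(
    results: List[Dict[str, str]],
    fetch_text: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Add a "content" field to each result, tolerating per-page failures."""
    enriched = []
    for item in results:
        try:
            content = fetch_text(item["link"])
        except Exception:
            # Assumption: a failed page yields empty content, not a failed request.
            content = ""
        enriched.append({**item, "content": content})
    return enriched
```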

Dependencies

  • FastAPI - Web framework
  • Uvicorn - ASGI server
  • Requests - HTTP library
  • BeautifulSoup4 - HTML parsing for content extraction
  • Python-dotenv - Environment variable management
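To show what the content extraction step involves, here is a minimal sketch that strips markup and drops script and style content. It deliberately uses the standard library's `html.parser` as a stand-in so that it runs without extra packages; the service itself lists BeautifulSoup4 for this job. The names `_TextExtractor` and `extract_text` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and noscript content."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With BeautifulSoup4, the rough equivalent is to parse the page and call `get_text()` after removing the unwanted tags.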

License

MIT
