Google Search Scraper

A Google search scraper built with Crawl4AI. It extracts search results from Google and saves them in JSON format.

Features

Extract search results including titles, URLs, and snippets
Configurable number of results
Language and country-specific searches
Stealth mode to reduce the chance of detection
Caching for faster repeated searches
Command-line interface for easy use

Installation

Create a virtual environment:

python -m venv google_scraper_env
source google_scraper_env/bin/activate  # On Windows: google_scraper_env\Scripts\activate

Install the required packages:

pip install crawl4ai

Install browser dependencies:

python -m playwright install --with-deps chromium

Usage

Command-line Interface

The easiest way to use the scraper is through the command-line interface:

python google_search_cli.py "your search query" --results 20 --language en --country us

Options:

query: The search query (required)
--results, -r: Number of results to retrieve (default: 10)
--language, -l: Language code for search (default: en)
--country, -c: Country code for search (default: us)
--output, -o: Output file path (default: auto-generated)
--no-cache: Disable caching of requests
--no-headless: Disable headless mode (show browser)
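
For reference, the options above could be declared with argparse roughly as follows. This is a minimal sketch reconstructed from the option list, not the actual contents of google_search_cli.py; the function name build_parser and the use of None to mean "auto-generate the output path" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Google search scraper CLI")
    parser.add_argument("query", help="The search query")
    parser.add_argument("--results", "-r", type=int, default=10,
                        help="Number of results to retrieve")
    parser.add_argument("--language", "-l", default="en",
                        help="Language code for search")
    parser.add_argument("--country", "-c", default="us",
                        help="Country code for search")
    # Assumption: None signals that the tool should pick an output path itself
    parser.add_argument("--output", "-o", default=None,
                        help="Output file path")
    # store_true flags default to False, so caching and headless mode
    # stay enabled unless the user explicitly turns them off
    parser.add_argument("--no-cache", action="store_true",
                        help="Disable caching of requests")
    parser.add_argument("--no-headless", action="store_true",
                        help="Disable headless mode (show browser)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Declaring --no-cache and --no-headless as negative flags keeps the defaults (caching on, headless on) in line with the option list without requiring any extra arguments.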

Python API

You can also use the scraper in your Python code:

import asyncio
from google_search_scraper import GoogleSearchScraper

async def main():
    # Create an instance of GoogleSearchScraper
    scraper = GoogleSearchScraper(headless=True, cache=True)
    
    # Search and save results
    output_file = await scraper.search_and_save(
        query="artificial intelligence news",
        num_results=10,
        language="en",
        country="us"
    )
    
    print(f"Results saved to {output_file}")

if __name__ == "__main__":
    asyncio.run(main())

Output Format

The scraper saves results in JSON format with the following structure:

{
  "query": "your search query",
  "timestamp": "2024-11-06T12:34:56.789012",
  "num_results": 10,
  "results": [
    {
      "title": "Result Title",
      "url": "https://example.com/page",
      "snippet": "This is a snippet of the search result..."
    },
    ...
  ]
}
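
A saved file with this structure can be read back with the standard library. The sketch below assumes only the keys shown above; the helper names load_results and unique_urls and the path results.json are illustrative and not part of the project.

```python
import json
from pathlib import Path


def load_results(path):
    """Read a saved results file and return (query, list of result dicts)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # .get with defaults avoids a KeyError if a key is missing from the file
    return data.get("query", ""), data.get("results", [])


def unique_urls(results):
    """Return result URLs in their original order, skipping blanks and duplicates."""
    seen = set()
    urls = []
    for item in results:
        url = item.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    # "results.json" is a placeholder; use the path returned by search_and_save
    query, results = load_results("results.json")
    print(f"{len(results)} results for {query!r}")
    for url in unique_urls(results):
        print(url)
```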

Notes

Google may block or rate-limit automated requests if too many are made in a short period, so space out your searches and use the tool responsibly.
Stealth mode reduces the chance of detection, but it is not guaranteed to work in all cases.
The CSS selectors used to extract results may need to be updated if Google changes its HTML structure.
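
One way to keep the request rate low when running several queries is to run them one at a time with a randomized pause in between. The sketch below is an illustration, not part of the project: search_many is a hypothetical helper, and the delay bounds are guesses rather than values known to avoid blocking. It relies only on the search_and_save method shown in the Python API section.

```python
import asyncio
import random


async def search_many(scraper, queries, min_delay=5.0, max_delay=15.0, **search_kwargs):
    """Run queries sequentially, sleeping a random interval between them.

    `scraper` is any object with an async `search_and_save(query=..., ...)`
    method, such as a GoogleSearchScraper instance. Extra keyword arguments
    (num_results, language, country) are passed through unchanged.
    """
    output_files = []
    for index, query in enumerate(queries):
        if index > 0:
            # Randomized pause so requests are not evenly spaced
            await asyncio.sleep(random.uniform(min_delay, max_delay))
        output_files.append(await scraper.search_and_save(query=query, **search_kwargs))
    return output_files
```

With a real scraper this could be called as asyncio.run(search_many(scraper, ["first query", "second query"], num_results=10)).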

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

Crawl4AI - The powerful web crawler used in this project
