A simple script that scrapes Google search results using advanced search operators (footprints) to find backlink and posting opportunities, based on the Useful Google Advanced Search Operators For SEO Guide.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Python3+
https://www.python.org/downloads/
# clone repo
git clone https://github.com/googio/scrape-google-footprint.git
# install requirements
pip install -r requirements.txt
# modify query in scrape.py and run script
python scrape.py
- Python3 - Python is a programming language that lets you work quickly and integrate systems more effectively.
- Pip - The Python Package Installer
- Requests - HTTP for Humans
- Beautiful Soup - a Python library for pulling data out of HTML and XML files
- Google Search API - an API for scraping unlimited Google search results
The footprints used are in the footprints folder:
- articles
- blogs
- bookmarks
- directory
- drupal
- edu
- forums
- guestbooks
- image
- indexer
- microblogs
- pingback
- social networks
- trackbacks
- videosite
- wiki
The script takes two arguments, --footprint and --keywords. --footprint is the name of the footprint file you want to use and defaults to edu.txt. --keywords is the phrase you want to search for.
python3 scrape.py --footprint edu.txt --keywords "best crossfit workout"
python3 scrape.py --footprint guestbook.txt --keywords "iPhone reviews"
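For readers curious how these two options might be wired up, here is a minimal sketch using Python's standard argparse module. It is not the actual contents of scrape.py: the function name parse_args and the help strings are assumptions, as is treating --keywords as mandatory. Only the flag names and the edu.txt default come from the description above.

```python
import argparse


def parse_args(argv=None):
    """Parse the two command-line options (hypothetical helper, not from scrape.py)."""
    parser = argparse.ArgumentParser(
        description="Scrape Google results for a footprint file and a keyword."
    )
    # Name of the footprint file; edu.txt is the documented default.
    parser.add_argument("--footprint", default="edu.txt",
                        help="footprint file to use (default: edu.txt)")
    # Search phrase; assumed to be mandatory because no default is documented.
    parser.add_argument("--keywords", required=True,
                        help="keywords to search for")
    return parser.parse_args(argv)


# Passing an explicit list mimics the command line without touching sys.argv.
args = parse_args(["--keywords", "best crossfit workout"])
print(args.footprint, args.keywords)
```

Accepting an optional argv list keeps the parser easy to exercise from a test or an interactive session.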
The results are saved into results.txt
Scraping footprint: "Powered by Movable Type" "You may use HTML tags for style" , keyword: "best crossfit workout"
Found 9 results.
Scraping footprint: "powered by Mephisto" "a response" -"are closed for" Email Address Website , keyword: "best crossfit workout"
Found 67 results.
Scraping footprint: "Lisa kommentaar" "Kommentaare veel pole." , keyword: "best crossfit workout"
Found 31 results.
Scraping footprint: "Zostaw komentarz" , keyword: "best crossfit workout"
Found 75 results.
Scraping footprint: "Your email address will not be published. Required fields are marked" , keyword: "best crossfit workout"
Found 149 results.
Scraping footprint: "powered by Serendipity" "Remember Information?" , keyword: "best crossfit workout"
Scraping footprint: "Email addresses are never displayed, but they are required to confirm your comments" , keyword: "best crossfit workout"
Found 100 results.
Scraping footprint: "Add a comment" Website , keyword: "best crossfit workout"
Found 137 results.
Scraping footprint: site:.blogspot.es , keyword: "best crossfit workout"
Found 124 results.
Scraping footprint: "add new comment" "what is the first word in the phrase" , keyword: "best crossfit workout"
Found 19 results.
Scraping footprint: "Powered by s9y" "add comment" , keyword: "best crossfit workout"
Found 4 results.
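The log above suggests that each line of the footprint file is paired with the quoted keyword to form one search. Below is a minimal sketch of that pairing, assuming one footprint per line and assuming the query is simply the footprint followed by the quoted keyword. build_queries is a hypothetical helper, not a function from scrape.py, and the real script may combine the terms differently.

```python
def build_queries(footprint_lines, keyword):
    """Pair every non-empty footprint line with the quoted keyword."""
    queries = []
    for line in footprint_lines:
        footprint = line.strip()
        if not footprint:
            continue  # skip blank lines in the footprint file
        queries.append(f'{footprint} "{keyword}"')
    return queries


# Two footprints taken from the log above, with a blank line in between.
sample = ['"Zostaw komentarz"\n', "\n", "site:.blogspot.es\n"]
for query in build_queries(sample, "best crossfit workout"):
    print(query)
```

In practice the lines would come from an opened footprint file rather than a hard-coded list.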
Eventually you will run into CAPTCHAs. Consider using a proxy or a service like Serply.
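As a rough illustration of the proxy option, Requests accepts a proxies mapping on each call. The sketch below rotates through a list of proxy URLs. The addresses are placeholders, and both proxy_cycle and search_url are hypothetical names, not part of this project.

```python
from itertools import cycle


def proxy_cycle(proxy_urls):
    """Yield Requests-style proxies mappings, rotating through proxy_urls forever."""
    for url in cycle(proxy_urls):
        # Requests expects a dict keyed by URL scheme.
        yield {"http": url, "https": url}


# Placeholder addresses; substitute proxies you actually control.
proxies = proxy_cycle(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
first = next(proxies)
second = next(proxies)
third = next(proxies)  # wraps around to the first proxy again

# With Requests installed, each search request could then pass one mapping:
#   requests.get(search_url, proxies=next(proxies), timeout=10)
```

Rotating proxies only spreads requests across addresses; it does not guarantee that CAPTCHAs will stop appearing.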