serply-inc/scrape-google-footprint

Scraping Google Footprint Using Python

A simple script that scrapes Google with advanced search operators (footprints) to find backlink and posting opportunities, based on the "Useful Google Advanced Search Operators For SEO" guide.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Python 3+

https://www.python.org/downloads/

Installing

# clone the repo
git clone https://github.com/googio/scrape-google-footprint.git
cd scrape-google-footprint
# install requirements
pip install -r requirements.txt
# run the script (arguments are described under Usage)
python3 scrape.py

Features

  • Python3 - Python is a programming language that lets you work quickly and integrate systems more effectively.
  • Pip - The Python Package Installer
  • Requests - HTTP for Humans
  • Beautiful Soup - a Python library for pulling data out of HTML and XML files
  • Google Search API - an API for scraping unlimited Google search results
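Beautiful Soup handles the HTML parsing in the script. As a rough, dependency-free illustration of that step, the sketch below swaps in the standard library's html.parser instead. It assumes the results page uses Google's basic-HTML link format, where each organic result is an anchor shaped like /url?q=<target>&...; Google changes its markup often, so this is an assumption and not necessarily what scrape.py does. ResultLinkParser and extract_links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class ResultLinkParser(HTMLParser):
    """Collect target URLs from anchors shaped like /url?q=<target>&... (assumed format)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        if parsed.path != "/url":
            return  # skip navigation links such as /search?q=...
        # parse_qs decodes percent-escapes and returns a list per key
        target = parse_qs(parsed.query).get("q", [""])[0]
        if target.startswith("http") and target not in self.links:
            self.links.append(target)  # keep first-seen order, drop duplicates


def extract_links(html):
    """Return the unique result URLs found in a page of (assumed) basic-HTML results."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, extract_links('<a href="/url?q=https://example.edu/blog/post&amp;sa=U">Post</a>') returns ['https://example.edu/blog/post'].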

Footprints

The footprints used are stored in the footprints folder, one file per footprint list (for example edu.txt and guestbook.txt).
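Judging by the console output, each footprint file appears to hold one footprint per line. Under that assumption, loading a file could look like the sketch below; parse_footprints and load_footprints are hypothetical helpers, not necessarily how scrape.py reads the folder.

```python
from pathlib import Path


def parse_footprints(text):
    """Split file contents into footprints, assuming one per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_footprints(name, folder="footprints"):
    """Read footprints/<name>, e.g. load_footprints("edu.txt")."""
    return parse_footprints((Path(folder) / name).read_text(encoding="utf-8"))
```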

Usage

The script takes two arguments, --footprint and --keywords. --footprint is the name of the footprint file to use from the footprints folder and defaults to edu.txt; --keywords is the keyword phrase to search for.

Example 1: search for the keywords "best crossfit workout" using the footprints in edu.txt

python3 scrape.py --footprint edu.txt --keywords "best crossfit workout"

Example 2: search for the keywords "iPhone reviews" using the footprints in guestbook.txt

python3 scrape.py --footprint guestbook.txt --keywords "iPhone reviews"
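A command-line interface matching the two examples above could be declared with argparse roughly as follows. This is a sketch: parse_args is a hypothetical name, and making --keywords required is an assumption, since only the edu.txt default is documented.

```python
import argparse


def parse_args(argv=None):
    """Parse --footprint and --keywords as described under Usage."""
    parser = argparse.ArgumentParser(
        description="Scrape Google with advanced search operator footprints."
    )
    parser.add_argument(
        "--footprint",
        default="edu.txt",
        help="footprint file in the footprints folder (default: edu.txt)",
    )
    parser.add_argument(
        "--keywords",
        required=True,  # assumption: the README does not document a default
        help='keyword phrase to search for, e.g. "best crossfit workout"',
    )
    return parser.parse_args(argv)
```

Calling parse_args(["--footprint", "guestbook.txt", "--keywords", "iPhone reviews"]) yields a namespace with footprint="guestbook.txt" and keywords="iPhone reviews".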

Results

The results are saved to results.txt.
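The README does not say whether results.txt is overwritten or appended to on each run. The sketch below assumes appending, with one URL per line and no duplicates across runs; save_results is a hypothetical helper.

```python
from pathlib import Path


def save_results(urls, path="results.txt"):
    """Append URLs not already in the results file, one per line; return the count written."""
    out = Path(path)
    # URLs contain no whitespace, so split() recovers the previously saved ones
    seen = set(out.read_text(encoding="utf-8").split()) if out.exists() else set()
    new = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            new.append(url)
    with out.open("a", encoding="utf-8") as fh:
        for url in new:
            fh.write(url + "\n")
    return len(new)
```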

Console Results

Scraping footprint: "Powered by Movable Type" "You may use HTML tags for style" , keyword: "best crossfit workout"
Found 9 results.
Scraping footprint: "powered by Mephisto" "a response" -"are closed for" Email Address Website , keyword: "best crossfit workout"
Found 67 results.
Scraping footprint: "Lisa kommentaar" "Kommentaare veel pole." , keyword: "best crossfit workout"
Found 31 results.
Scraping footprint: "Zostaw komentarz" , keyword: "best crossfit workout"
Found 75 results.
Scraping footprint: "Your email address will not be published. Required fields are marked" , keyword: "best crossfit workout"
Found 149 results.
Scraping footprint: "powered by Serendipity" "Remember Information?" , keyword: "best crossfit workout"
Scraping footprint: "Email addresses are never displayed, but they are required to confirm your comments" , keyword: "best crossfit workout"
Found 100 results.
Scraping footprint: "Add a comment" Website , keyword: "best crossfit workout"
Found 137 results.
Scraping footprint: site:.blogspot.es , keyword: "best crossfit workout"
Found 124 results.
Scraping footprint: "add new comment" "what is the first word in the phrase" , keyword: "best crossfit workout"
Found 19 results.
Scraping footprint: "Powered by s9y" "add comment" , keyword: "best crossfit workout"
Found 4 results.
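Each console line pairs one footprint with the quoted keyword phrase. Assuming the query is simply the footprint followed by the keywords in double quotes, and that it is sent to the standard https://www.google.com/search endpoint, query construction could be sketched like this. build_query, search_url and log_line are hypothetical names; only the log format is taken from the output above.

```python
from urllib.parse import urlencode


def build_query(footprint, keywords):
    """Combine a footprint with an exact-match keyword phrase (assumed format)."""
    return f'{footprint} "{keywords}"'


def search_url(query, start=0):
    """Google results URL for a query; `start` offsets into later result pages."""
    return "https://www.google.com/search?" + urlencode({"q": query, "start": start})


def log_line(footprint, keywords):
    """Reproduce the 'Scraping footprint' line shown in the console output."""
    return f'Scraping footprint: {footprint} , keyword: "{keywords}"'
```

For example, search_url(build_query("site:.blogspot.es", "best crossfit workout")) gives https://www.google.com/search?q=site%3A.blogspot.es+%22best+crossfit+workout%22&start=0.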

CAPTCHA

Eventually Google will start serving CAPTCHAs. To keep scraping, consider using a proxy or a service like Serply.
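One way to apply the proxy advice is to rotate through a list of proxies and stop when a response looks like a CAPTCHA. The sketch below assumes Google signals blocking with HTTP 429 and/or a redirect to a /sorry/ page; that heuristic, the helper names and any proxy addresses are assumptions, not part of the script.

```python
from itertools import cycle


def proxy_rotation(proxy_urls):
    """Yield requests-style proxies dicts, cycling through the given proxy URLs."""
    for proxy in cycle(proxy_urls):
        yield {"http": proxy, "https": proxy}


def looks_like_captcha(status_code, final_url):
    """Heuristic (assumed): blocked clients get HTTP 429 and/or a /sorry/ page."""
    return status_code == 429 or "/sorry/" in final_url
```

With Requests, each call could then be made as resp = requests.get(url, proxies=next(rotation), timeout=10), checking looks_like_captcha(resp.status_code, resp.url), since resp.url holds the final URL after redirects.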

Contributing

Versioning

Authors

  • googio - Initial work - Googio
  • serply - Maintainer - Serply

License

Apache License 2.0

Acknowledgments

About

Script To Scrape Google Advanced Search Operators For SEO
