GoogleSearch


Important Note

As of August 2016, the free Google API that this library used is no longer available. As a result, googlesearch is no longer in PyPI. See here for more information.


#####Search the web with Python

GoogleSearch is a Python 2 library for searching the web through Google's Custom Search JSON/Atom API. It wraps that API in a small, simple Python interface.

>>> from googlesearch import GoogleSearch
>>> gs = GoogleSearch("An intriguing query")
>>> for url in gs.top_urls():
...    print url
...
http://www.torontosun.com/2015/02/08/cbcs-ascension-an-intriguing-sci-fi-drama
http://www.agentquery.com/writer_hq.aspx
http://www.girlfridayproductions.com/2015/02/how-to-write-a-great-query-letter/
http://nelsonagency.com/2015/04/special-treat-rhiannon-thomass-original-query-letter-for-a-wcked-thing/

##Installation

pip install -U googlesearch

##Examples

Print the top hits for a query, like a miniature first page of Google results.

from googlesearch import GoogleSearch
from pprint import pprint

gs = GoogleSearch("Bacon")
for hit in gs.top_results():
    pprint(hit)
    print

Output:

{u'GsearchResultClass': u'GwebSearch',
 u'cacheUrl': u'http://www.google.com/search?q=cache:JkI9aWzUvbgJ:en.wikipedia.org',
 u'content': u'<b>Bacon</b> is a meat product prepared from a pig and usually cured. It is first cured \nusing large quantities of salt, either in a brine or in a dry packing; the result is \nfresh\xa0...',
 u'title': u'<b>Bacon</b> - Wikipedia, the free encyclopedia',
 u'titleNoFormatting': u'Bacon - Wikipedia, the free encyclopedia',
 u'unescapedUrl': u'http://en.wikipedia.org/wiki/Bacon',
 u'url': u'http://en.wikipedia.org/wiki/Bacon',
 u'visibleUrl': u'en.wikipedia.org'}

{u'GsearchResultClass': u'GwebSearch',
 u'cacheUrl': u'http://www.google.com/search?q=cache:_cHIoqEzleAJ:en.wikipedia.org',
 u'content': u'Francis <b>Bacon</b>, 1st Viscount St. Alban, QC (/\u02c8be\u026ak\u0259n/; 22 January 1561 \u2013 9 April \n1626), was an English philosopher, statesman, scientist, jurist, orator, essayist\xa0...',
 u'title': u'Francis <b>Bacon</b> - Wikipedia, the free encyclopedia',
 u'titleNoFormatting': u'Francis Bacon - Wikipedia, the free encyclopedia',
 u'unescapedUrl': u'http://en.wikipedia.org/wiki/Francis_Bacon',
 u'url': u'http://en.wikipedia.org/wiki/Francis_Bacon',
 u'visibleUrl': u'en.wikipedia.org'}

{u'GsearchResultClass': u'GwebSearch',
 u'cacheUrl': u'http://www.google.com/search?q=cache:uKyfbazYgokJ:baconaustin.com',
 u'content': u'<b>Bacon</b>. <b>Bacon</b>; 900 W 10th St; Austin, Texas 78703. Hours: Monday - Friday: \n11am - 9pm; Saturday: 9am - 9pm; Sunday: 9am - 3pm. View Larger Map\xa0...',
 u'title': u'<b>Bacon</b>',
 u'titleNoFormatting': u'Bacon',
 u'unescapedUrl': u'http://baconaustin.com/',
 u'url': u'http://baconaustin.com/',
 u'visibleUrl': u'baconaustin.com'}

{u'GsearchResultClass': u'GwebSearch',
 u'cacheUrl': u'http://www.google.com/search?q=cache:oxQ3rEMOdAwJ:www.foodnetwork.com',
 u'content': u'Make <b>bacon</b> the star ingredient in pastas, salads, snacks and more from Food \nNetwork Magazine.',
 u'title': u'50 Things to Make With <b>Bacon</b> : Recipes and Cooking : Food Network',
 u'titleNoFormatting': u'50 Things to Make With Bacon : Recipes and Cooking : Food Network',
 u'unescapedUrl': u'http://www.foodnetwork.com/recipes/articles/50-things-to-make-with-bacon.html',
 u'url': u'http://www.foodnetwork.com/recipes/articles/50-things-to-make-with-bacon.html',
 u'visibleUrl': u'www.foodnetwork.com'}
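The `title` and `content` fields in the output above carry `<b>` highlight tags and embedded line breaks. A minimal sketch of cleaning them up for display follows; `plain_text` is a hypothetical helper, not part of googlesearch, and the sample dictionary is trimmed from the last hit shown above.

```python
import re

# Matches any HTML tag, such as the <b>...</b> highlights in the results.
TAG_RE = re.compile(r'<[^>]+>')

def plain_text(fragment):
    """Strip tags and collapse runs of whitespace (hypothetical helper)."""
    without_tags = TAG_RE.sub('', fragment)
    return ' '.join(without_tags.split())

# Fields copied from the Food Network hit in the output above.
hit = {
    'title': '50 Things to Make With <b>Bacon</b> : Recipes and Cooking : Food Network',
    'titleNoFormatting': '50 Things to Make With Bacon : Recipes and Cooking : Food Network',
    'content': 'Make <b>bacon</b> the star ingredient in pastas, salads, '
               'snacks and more from Food \nNetwork Magazine.',
}

print(plain_text(hit['content']))
# For this hit, stripping the tags from 'title' gives 'titleNoFormatting'.
print(plain_text(hit['title']) == hit['titleNoFormatting'])
```

This is a regular-expression shortcut that suits the simple highlight markup seen here; it is not a general HTML parser.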

Query Wikipedia and show the top hit.

from googlesearch import GoogleSearch

def search_wikipedia(query):
    # Restrict the search to Wikipedia's domain
    gs = GoogleSearch("site:wikipedia.org %s" % query)
    print gs.top_result()['titleNoFormatting']
    print gs.top_url()
    return gs.top_url()

wiki_url = search_wikipedia("Porcupine")

Output:

Porcupine - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Porcupine

Find out which of two words is used more on the Internet.

from googlesearch import GoogleSearch

def x_vs_y_count_match(x, y):
    nx = GoogleSearch(x).count()
    ny = GoogleSearch(y).count()
    print '%s vs %s:' % (x, y)
    report = '%s wins with %i vs %i'
    if nx > ny:
        print report % (x, nx, ny)
    elif nx < ny:
        print report % (y, ny, nx)
    else:
        print "it's a tie with %s each!" % nx
    return nx, ny

counts = x_vs_y_count_match("color", "colour")

Output:

color vs colour:
color wins with 259000000 vs 55500000

Retrieve the IMDb ID for a movie using only its name (and its year, if there are remakes).

from googlesearch import GoogleSearch
import re
    
def imdb_id_for_movie(movie_name):
    query = 'site:imdb.com %s' % movie_name
    url = GoogleSearch(query).top_url()
    # The ID is the /tt<digits>/ segment of the top hit's URL
    imdb_id = re.search(r'/tt[0-9]+/', url).group(0).strip('/')
    print 'The imdb id for %s is %s' % (movie_name, imdb_id)
    return imdb_id

TotRecall_id = imdb_id_for_movie("Total Recall 1990")

Output:

The imdb id for Total Recall 1990 is tt0100802
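The regular-expression step in this example can be tried without any search call. The sketch below isolates it and returns `None` instead of raising when the URL holds no ID; `extract_imdb_id` is a hypothetical helper, and the sample URLs are illustrative assumptions about what the top hit might look like.

```python
import re

# An ID is the "tt" prefix followed by digits, delimited by slashes.
IMDB_ID_RE = re.compile(r'/(tt[0-9]+)/')

def extract_imdb_id(url):
    """Return the tt-prefixed ID found in url, or None (hypothetical helper)."""
    match = IMDB_ID_RE.search(url)
    return match.group(1) if match else None

# Illustrative URLs, not captured search output.
print(extract_imdb_id('http://www.imdb.com/title/tt0100802/'))
print(extract_imdb_id('http://www.imdb.com/chart/top'))
```

Returning `None` lets the caller decide what to do when the top hit is not a title page, rather than failing with an `AttributeError` on a missing match.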

Documentation

class googlesearch.GoogleSearch(query, use_proxy=True, verbose=True)

  • A Google search object for a specific query.

  • Parameters

    • query: str
      The search query for this search.

    • use_proxy: bool, default: True
      If True, GoogleSearch sends its searches through the proxies listed in the PROXIES_LIST variable of the googlesearch.settings module. When a proxy starts getting HTTP 403 FORBIDDEN responses, GoogleSearch switches to the next proxy in the list. It raises a GoogleAPIError only if all proxies get 403 responses.

    • verbose: bool, default: True
      If True, GoogleSearch reports to sys.stderr whenever it switches to another proxy. If False, nothing is logged.

    • hl: str, default: None
      If set, the hl parameter is added to the query so that search results are returned for the specified language. For example, set hl='es' to get results in Spanish.
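The proxy behaviour described for use_proxy and verbose can be pictured with the following sketch. It is not the library's implementation: `fetch_with_proxy_rotation`, the `fetch` callable, and the proxy names are hypothetical, and `GoogleAPIError` is redefined locally as a stand-in so the snippet runs on its own.

```python
class GoogleAPIError(Exception):
    """Local stand-in for the library's exception of the same name."""

def fetch_with_proxy_rotation(fetch, proxies, log=None):
    """Try each proxy in order, moving on when one answers 403.

    fetch is a hypothetical callable: fetch(proxy) -> (status_code, body).
    log, if given, is called with a message on every switch (like verbose=True).
    """
    for index, proxy in enumerate(proxies):
        status, body = fetch(proxy)
        if status != 403:
            return body
        if log is not None and index + 1 < len(proxies):
            log('proxy %s returned 403, switching to %s'
                % (proxy, proxies[index + 1]))
    # Only reached when every proxy was refused.
    raise GoogleAPIError('all %d proxies returned 403 FORBIDDEN' % len(proxies))

def fake_fetch(proxy):
    # Pretend the first proxy is blocked and the second one works.
    if proxy == 'proxy-a:8080':
        return 403, None
    return 200, 'results via ' + proxy

messages = []
body = fetch_with_proxy_rotation(fake_fetch,
                                 ['proxy-a:8080', 'proxy-b:8080'],
                                 log=messages.append)
print(body)
```

Passing the fetch function in as an argument keeps the sketch free of network access; a real client would make the HTTP request there instead.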

Methods

  • GoogleSearch.top_results()

    • Returns a list of results for a Google search. The Google API determines how many results are returned; the current default is 4.
      A result is a dictionary with the following fields:
      cacheUrl
      content
      title
      titleNoFormatting
      unescapedUrl
      url
      visibleUrl
  • GoogleSearch.top_result()

    • Returns only the top result, the best match. This is the equivalent of Google's "I'm Feeling Lucky" button. See GoogleSearch.top_results() for the keys in the result dictionary.
  • GoogleSearch.top_urls()

    • Returns a list of URLs for a Google search. The Google API determines how many URLs are returned; the current default is 4.
  • GoogleSearch.top_url()

    • Returns the url of the top hit.
  • GoogleSearch.count()

    • Returns the total number of matches to the query.
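Because the underlying API is no longer available, code written against these methods can only be exercised offline. One option is a small stand-in with the same method names, fed with canned data. The sketch below is such a stand-in: `FakeGoogleSearch` is hypothetical, and the way it derives each method from the result list is an assumption based on the descriptions above, not the library's actual code.

```python
class FakeGoogleSearch(object):
    """Offline stand-in mirroring the documented method names (hypothetical)."""

    def __init__(self, query, results=(), total=0):
        self.query = query
        self._results = list(results)  # canned result dictionaries
        self._total = total            # canned match count

    def top_results(self):
        return list(self._results)

    def top_result(self):
        # Assumption: the top result is the first entry of top_results().
        return self._results[0]

    def top_urls(self):
        return [result['url'] for result in self._results]

    def top_url(self):
        return self.top_result()['url']

    def count(self):
        return self._total

# Canned data taken from the Wikipedia example above; the count is made up.
canned = [{'titleNoFormatting': 'Porcupine - Wikipedia, the free encyclopedia',
           'url': 'http://en.wikipedia.org/wiki/Porcupine'}]
gs = FakeGoogleSearch('site:wikipedia.org Porcupine', results=canned, total=1)
print(gs.top_url())
```

With no canned results, `top_result()` and `top_url()` raise `IndexError` in this stand-in; how the real library behaved for an empty result set is not stated in this README.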

Requirements

  • Python >= 2.6
  • requests

License

MIT licensed. See the bundled LICENSE file for more details.