Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
A web-scraping interface to various Google Services.
Ruby
Branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
lib
spec
.gitignore
.rspec
.yardopts
COPYING.txt
ChangeLog.md
README.md
Rakefile
gemspec.yml
gscraper.gemspec

README.md

GScraper

Description

GScraper is a web-scraping interface to various Google Services.

Features

  • Supports the Google Search service.
    • Provides access to search results and ranks.
    • Provides access to the Sponsored Links.
  • Provides HTTP access with custom User-Agent strings.
  • Provides proxy settings for HTTP access.

Requirements

Install

$ gem install gscraper

Examples

Basic query:

q = GScraper::Search.query(:query => 'ruby')

Advanced query:

q = GScraper::Search.query(:query => 'ruby') do |q|
  q.without_words = 'is'
  q.within_past_day = true
  q.numeric_range = 2..10
end

Queries from URLs:

q = GScraper::Search.query_from_url('http://www.google.com/search?as_q=ruby&as_epq=&as_oq=rails&as_ft=i&as_qdr=all&as_occt=body&as_rights=%28cc_publicdomain%7Ccc_attribute%7Ccc_sharealike%7Ccc_noncommercial%29.-%28cc_nonderived%29')

q.query # => "ruby"
q.with_words # => "rails"
q.occurs_within # => :title
q.rights # => :cc_by_nc

Getting the search results:

q.first_page.select do |result|
  result.title =~ /Blog/
end

q.page(2).map do |result|
  result.title.reverse
end

q.result_at(25) # => Result

q.top_result # => Result

A Result object contains the rank, title, summary, cahced URL, similiar query URL and link URL of the search result.

page = q.page(2)

page.urls # => [...]
page.summaries # => [...]
page.ranks_of { |result| result.url =~ /^https/ } # => [...]
page.titles_of { |result| result.summary =~ /password/ } # => [...]
page.cached_pages # => [...]
page.similar_queries # => [...]

Iterating over the search results:

q.each_on_page(2) do |result|
  puts result.title
end

page.each do |result|
  puts result.url
end

Iterating over the data within the search results:

page.each_title do |title|
  puts title
end

page.each_summary do |text|
  puts text
end

Selecting search results:

page.results_with do |result|
  ((result.rank > 2) && (result.rank < 10))
end

page.results_with_title(/Ruby/i) # => [...]

Selecting data within the search results:

page.titles # => [...]

page.summaries # => [...]

Selecting the data of search results based on the search result:

page.urls_of do |result|
  result.description.length > 10
end

Selecting the Sponsored Links of a Query:

q.sponsored_links # => [...]

q.top_sponsored_link # => SponsoredAd

Setting the User-Agent globally:

GScraper.user_agent # => nil
GScraper.user_agent = 'Awesome Browser v1.2'

License

GScraper - A web-scraping interface to various Google Services.

Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Something went wrong with that request. Please try again.