Ruby gem for web scraping. It scrapes a given URL and returns its title, meta description, meta keywords, an array with all the links, all the images in it, etc.



MetaInspector is a gem for web scraping purposes. You give it a URL, and it lets you easily get its title, links, and meta tags.


Install the gem from RubyGems:

gem install metainspector


Initialize a scraper instance for a URL, like this:

page = MetaInspector::Scraper.new('...')

or, for short, a convenience alias is also available:

page = MetaInspector.new('...')

Then you can see the scraped data like this:

page.address            # URL of the page
page.title              # title of the page, as string
page.links              # array of strings, with every link found on the page
page.meta_description   # meta description, as string
page.meta_keywords      # meta keywords, as string
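Because page.links is a plain array of strings, ordinary Ruby enumerable methods apply to it. The following is a minimal sketch, not part of the gem: external_links is a hypothetical helper, the sample address and links are made up, and it assumes the array may mix absolute URLs with relative paths.

```ruby
require 'uri'

# Hypothetical helper (not part of MetaInspector): returns the links whose
# host differs from the host of `address`. Relative links, links without a
# host (such as mailto:), and unparsable strings are left out.
def external_links(address, links)
  base_host = URI.parse(address).host
  links.select do |link|
    begin
      host = URI.parse(link).host
      host && host != base_host
    rescue URI::InvalidURIError
      false
    end
  end
end

# Made-up sample data standing in for page.address and page.links.
address = 'http://example.com'
links   = ['http://example.com/about', '/contact',
           'http://example.org/', 'mailto:someone@example.com']

external_links(address, links)  # => ["http://example.org/"]
```

With a real page, the same helper would presumably be called as external_links(page.address, page.links).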

MetaInspector uses dynamic methods for meta tag discovery, so all of the following will work. Each call is converted into a search for a meta tag with the corresponding name and returns its content attribute:

page.meta_description       # <meta name="description" content="..." />
page.meta_keywords          # <meta name="keywords" content="..." />
page.meta_robots            # <meta name="robots" content="..." />
page.meta_generator         # <meta name="generator" content="..." />

It will also work for meta tags of the form <meta http-equiv="name" ... />, like the following:

page.meta_content_language  # <meta http-equiv="content-language" content="..." />
page.meta_Content_Type      # <meta http-equiv="Content-Type" content="..." />

Please note that MetaInspector is case sensitive, so page.meta_Content_Type is not the same as page.meta_content_type.
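To illustrate how dynamic meta_* methods of this kind can work in Ruby, here is a minimal sketch built on method_missing. It is not MetaInspector's actual source: MetaTagSketch is a hypothetical class, and it uses a simplified regular expression where the gem works on a Nokogiri document. It assumes double-quoted attributes with name or http-equiv placed before content.

```ruby
# Hypothetical sketch; not MetaInspector's actual implementation.
class MetaTagSketch
  # Simplified pattern: fixed attribute order, double-quoted values.
  META_TAG = /<meta\s+(?:name|http-equiv)="([^"]+)"\s+content="([^"]*)"/

  def initialize(html)
    # Keys keep their original case, so lookups are case sensitive.
    @meta = html.scan(META_TAG).to_h
  end

  # Turns page.meta_foo into a lookup of the meta tag named "foo".
  def method_missing(method_name, *args)
    name = method_name.to_s
    return super unless name.start_with?('meta_')

    key = name.sub('meta_', '')
    # Try the name as given, then with underscores turned into hyphens
    # (meta_Content_Type -> "Content-Type").
    @meta[key] || @meta[key.tr('_', '-')]
  end

  def respond_to_missing?(method_name, include_private = false)
    method_name.to_s.start_with?('meta_') || super
  end
end

html = <<~HTML
  <meta name="description" content="A sample page" />
  <meta http-equiv="Content-Type" content="text/html" />
HTML

page = MetaTagSketch.new(html)
page.meta_description   # => "A sample page"
page.meta_Content_Type  # => "text/html"
page.meta_content_type  # => nil, because the lookup is case sensitive
```

Keeping the hash keys in their original case is what makes the sketch case sensitive, which mirrors the behavior described above.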

The full scraped document is accessible from:

page.document          # raw document, as a String
page.parsed_document   # Nokogiri doc that you can use to get any element from the page


You can find some sample scripts in the samples folder, including a basic scraping script and a spider that follows external links using a queue. What follows is an example of use from irb:

$ irb
>> require 'metainspector'
=> true

>> page = MetaInspector.new('...')
=> #<MetaInspector:0x11330c0 @document=nil, @links=nil, @address="", @description=nil, @keywords=nil, @title=nil>

>> page.title
=> " :: Track your PageRank changes"

>> page.meta_description
=> "Track your PageRank(TM) changes and receive alerts by email"

>> page.meta_keywords
=> "pagerank, seo, optimization, google"

>> page.links.size
=> 8

>> page.links[5]
=> ""

>> page.document.class
=> String

>> page.parsed_document.class
=> Nokogiri::HTML::Document
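The queue-based spider mentioned among the sample scripts can be approximated as follows. This is a rough sketch of the general technique (a breadth-first crawl with a queue and a visited set), not the sample script itself. The crawl method, its max_pages limit, and the fetch_links callable are hypothetical names; the in-memory graph stands in for real pages so the example runs without network access.

```ruby
require 'set'

# Breadth-first crawl starting at `start_url`.
# `fetch_links` is any callable that takes a URL and returns an array of
# link strings. `max_pages` caps how many pages are visited.
def crawl(start_url, fetch_links, max_pages: 10)
  queue   = [start_url]
  visited = Set.new

  until queue.empty? || visited.size >= max_pages
    url = queue.shift
    next if visited.include?(url)

    visited << url
    fetch_links.call(url).each do |link|
      queue << link unless visited.include?(link)
    end
  end

  visited.to_a
end

# Made-up link graph used in place of real pages.
graph = {
  'a' => %w[b c],
  'b' => %w[a d],
  'c' => [],
  'd' => []
}
fetch = ->(url) { graph.fetch(url, []) }

crawl('a', fetch)                # => ["a", "b", "c", "d"]
crawl('a', fetch, max_pages: 2)  # => ["a", "b"]
```

To crawl real pages, the callable could presumably be something like ->(url) { MetaInspector.new(url).links }, with error handling added for pages that fail to load.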

To Do

Copyright © 2009-2011 Jaime Iniesta, released under the MIT license
