scrape the best content from a page
mechanize-content

Returns the most important pieces of content on a web page. Finds the best block of text, image and title by analysing the page content.

Usage

Pass in a URL on initialisation and then call the helpers to pull the best content out.

mc = MechanizeContent::Parser.new("www.joystiq.com/2010/03/19/mag-gets-free-trooper-gear-pack-dlc-next-week/")

mc.best_title

"MAG gets free 'Trooper Gear Pack' DLC next week -- Joystiq"

mc.best_text

"Ten-hut, soldiers! HQ has just sent word that some new gear will be shipping to the front lines of MAG next week, free of charge: the Trooper Gear Pack. In this parcel, we'll finally get access to the Flashbang grenade..."

mc.best_image

""

The gem also supports multiple URLs and will find the best content among them. The order in which the URLs are given determines their priority.
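
As a rough illustration of what order-based priority could mean, the sketch below walks a list of pages in the order given and returns the first usable value for each field. It does not use the gem: FakePage and best_of are hypothetical names invented for this example, the URLs and page contents are placeholders, and the fallback rule is only one plausible reading of the behaviour described above.

```ruby
# Hypothetical stand-in for a parsed page. The real gem fetches and
# analyses each page; none of that is reproduced here.
FakePage = Struct.new(:url, :title, :text, :image)

# Return the first non-empty value for +field+, checking pages in the
# order they were given, so earlier pages take priority.
def best_of(pages, field)
  pages.each do |page|
    value = page.public_send(field)
    return value unless value.nil? || value.empty?
  end
  nil
end

# Placeholder data: the first page has a title but no text or image.
pages = [
  FakePage.new("example.com/a", "First title", "", nil),
  FakePage.new("example.com/b", "Second title", "Text from the second page", "example.com/b.png")
]

puts best_of(pages, :title)
puts best_of(pages, :text)
puts best_of(pages, :image)
```

Here the title comes from the first page, while the text and image fall through to the second page because the first has none.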

Dependencies

  • Mechanize

  • imagesize

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don't break it in a future version unintentionally.

  • Commit, but do not change the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a separate commit that I can ignore when I pull.)

  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright © 2010 John Griffin. See LICENSE for details.
