Tsumamigui

Tsumamigui (つまみぐい) is a simple, hassle-free Ruby web scraping library.

Requirement

Ruby 2.1+

Installation

Add this line to your application's Gemfile:

gem 'tsumamigui'

Or install it yourself as:

$ gem install tsumamigui

Usage

Pass a URL (or an array of URLs) together with a hash that maps each label to the XPath of the data you want. The scraped and parsed data is returned as an array of hashes, one per URL.

Tsumamigui.scrape('http://example.com', {h1: 'html/body/div/h1/text()'})

# Returns:
# [
#   {h1: 'Example Domain', scraped_from: 'http://example.com'}
# ]
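
As a minimal sketch of how the returned array might be consumed: the `results` literal below is a stand-in copied from the output shown above (so the snippet runs without the gem or network access), and the `lines` variable is only illustrative.

```ruby
# Stand-in for the value returned by Tsumamigui.scrape in the example above.
# In real use you would assign the return value of the scrape call instead.
results = [
  { h1: 'Example Domain', scraped_from: 'http://example.com' }
]

# Each element is a hash keyed by the labels you passed in, plus :scraped_from.
lines = results.map do |row|
  "#{row[:scraped_from]}: #{row[:h1]}"
end

puts lines
```

Because every row carries its own `:scraped_from` key, the source page stays attached to the data even after further filtering or sorting.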

You can pass multiple URLs to scrape different pages that share the same HTML structure.

urls = ['http://example.com/page/1', 'http://example.com/page/2']
Tsumamigui.scrape(urls, {h1: 'html/body/div/h1/text()'})

# Returns:
# [
#   {h1: 'Example Domain 1', scraped_from: 'http://example.com/page/1'},
#   {h1: 'Example Domain 2', scraped_from: 'http://example.com/page/2'}
# ]
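
When the pages follow a numbered pattern, the URL list can be generated rather than typed out, and the results can be looked up by source URL afterwards. The sketch below assumes the same result shape as shown above; the `results` literal stands in for the real return value, and `by_url` and `titles` are illustrative names.

```ruby
# Build the list of page URLs from a numeric range.
urls = (1..2).map { |page| "http://example.com/page/#{page}" }

# Stand-in for Tsumamigui.scrape(urls, {h1: 'html/body/div/h1/text()'}),
# copied from the example output so the snippet runs offline.
results = [
  { h1: 'Example Domain 1', scraped_from: 'http://example.com/page/1' },
  { h1: 'Example Domain 2', scraped_from: 'http://example.com/page/2' }
]

# Index the rows by the page they were scraped from.
by_url = results.each_with_object({}) do |row, index|
  index[row[:scraped_from]] = row
end

# Look up each title in the original URL order.
titles = urls.map { |url| by_url.fetch(url)[:h1] }

puts titles
```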

Important: Tsumamigui automatically requests each URL at intervals of 1.0 to 3.0 seconds.
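
To make the effect of that interval concrete, here is a rough illustration of pausing between requests. It is not Tsumamigui's implementation: `each_with_pause` is a hypothetical helper, and it assumes the pause is picked at random from the range and applied only between consecutive requests, which this README does not state.

```ruby
# Hypothetical helper, NOT part of Tsumamigui's API.
# Yields each URL in turn, pausing for a random duration within `range`
# before every request except the first. The `sleeper` parameter exists
# only so the pause can be replaced in tests or dry runs.
def each_with_pause(urls, range: 1.0..3.0, sleeper: ->(seconds) { sleep(seconds) })
  urls.each_with_index.map do |url, index|
    sleeper.call(rand(range)) if index > 0
    yield url
  end
end

# Dry run: record the pauses instead of actually sleeping.
pauses = []
visited = each_with_pause(
  ['http://example.com/page/1', 'http://example.com/page/2', 'http://example.com/page/3'],
  sleeper: ->(seconds) { pauses << seconds }
) { |url| url }

puts "#{visited.size} requests, #{pauses.size} pauses"
```

Under these assumptions, scraping N URLs adds between (N - 1) × 1.0 and (N - 1) × 3.0 seconds of waiting, which is worth budgeting for when passing long URL lists.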

TODO

  • Custom request headers.

etc...

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/obiyuta/tsumamigui. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

Guideline

  1. Fork it ( http://github.com/obiyuta/tsumamigui )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Write code and specs.
    • Run the test suite with bundle exec rspec and confirm that it passes
    • Run the linter with bundle exec rubocop and confirm that it passes
  4. Commit your changes (git commit -am 'Add some feature')
  5. Push to the branch (git push origin my-new-feature)
  6. Create a new Pull Request

License

The gem is available as open source under the terms of the MIT License.

Copyright (c) 2017 Obi Yuta. See LICENSE.txt for details.