Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

ScrApify is a library to build APIs by scraping static sites and use data as models or JSON APIs. It powers APIfy which is used to create JSON APIs from any html or wikipedia page

branch: master
README.md

ScrApify

ScrApify is a library to build APIs by scraping static sites and use data as models or JSON APIs

Installation

$ gem install scrapify

If you're using Bundler, add this to your Gemfile:

gem 'scrapify'

Usage

Define html url and declare attributes using xpath or css selectors. Scrapify classes must have a key attribute defined.

class Pizza
  include Scrapify::Base
  html "http://www.dominos.co.in/menuDetails_ajx.php?catgId=1"

  attribute :name, css: ".menu_lft li a"
  attribute :image_url, xpath: "//li//input//@value"

  key :name
end

Now you can use finder methods to extract data from a static site

> Pizza.all

> pizza = Pizza.find('mushroom')
> pizza.name
> pizza.image_url

> Pizza.first
> Pizza.last

> Pizza.count

JSON API (Rack application example)

Scrapify comes with a Rack application called Jsonify which can expose scraped models as JSON.

Check out this Rack application example for more details:

https://github.com/sathish316/jsonify_rack_example

In your Rack application map the routes you want to expose as JSON using Rack::Builder#map

  map '/pizzas' do
    run Jsonify.new('/pizzas', ::Pizza)
  end

This will respond to two urls index and show with JSON:

  • /pizzas
  • /pizzas/:id

Jsonify currently has a limitation where the URLs /pizzas.json and /pizzas/1.json cannot be matched by the same map entry in Rack routes

JSON API (Rails example)

Scrapify comes with a Rack application called Jsonify which can be used in rails routes to expose scraped models as JSON.

Check out this Rails example for more details:

https://github.com/sathish316/jsonify_rails_example

1 Add scrapify to Gemfile

gem 'scrapify'

2 Define model to scrap data in app/models

class Pizza
  include Scrapify::Base
end

3 Add index and show API to routes

  pizza_api = Jsonify.new('/pizzas', Pizza)
  get 'pizzas' => pizza_api
  get 'pizzas/:id' => pizza_api

Jsonify scraps url and exposes index and show urls as JSON APIs

License

ScrApify is released under the MIT license:

http://www.opensource.org/licenses/MIT

Something went wrong with that request. Please try again.