Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.

data_miner

Mine remote data into your ActiveRecord models.

Quick start

Put this in config/environment.rb:

config.gem 'data_miner'

You need to define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:

class Country < ActiveRecord::Base
  set_primary_key :iso_3166

  data_miner do
    import 'The official ISO country list', :url => 'http://www.iso.org/iso/list-en1-semic-3.txt', :skip => 2, :headers => false, :delimiter => ';' do
      key 'iso_3166'
      store 'iso_3166', :field_number => 1
      store 'name', :field_number => 0
    end

    import 'A Princeton dataset with better capitalization for some countries', :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do
      key 'iso_3166'
      store 'iso_3166', :field_name => 'country code'
      store 'name', :field_name => 'country'
    end
  end
end
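
To make the options above concrete, here is a minimal plain-Ruby sketch of what the two imports appear to describe: skip two lines, split on semicolons, pick fields by number, then let a second source (with a header row) overwrite names for records matched on the key. This is not data_miner's or RemoteTable's actual implementation. The sample strings, the helper names (rows_by_field_number, rows_by_field_name, upsert) and the "later import overwrites earlier values" behaviour are all assumptions made for illustration.

```ruby
# Made-up sample in the shape the first import's options imply:
# two leading lines to skip, no header row, semicolon-delimited.
ISO_SAMPLE = "Country Name;ISO 3166-1-alpha-2 code\n\nAFGHANISTAN;AF\nALBANIA;AL\n"

# Made-up sample with a header row, as the second import's :field_name options imply.
PRINCETON_SAMPLE = "country,country code\nAfghanistan,AF\n"

# Roughly :skip => n, :headers => false, :delimiter => d; rows come back as arrays.
def rows_by_field_number(text, skip, delimiter)
  text.lines.drop(skip).map { |line| line.chomp.split(delimiter) }
end

# Roughly the default case: the first line names the fields; rows come back as hashes.
def rows_by_field_name(text, delimiter)
  header, *data = text.lines.map { |line| line.chomp.split(delimiter) }
  data.map { |fields| Hash[header.zip(fields)] }
end

# Hypothetical stand-in for "key": find or create the record by its key, then update it.
def upsert(table, key, attrs)
  (table[attrs[key]] ||= {}).merge!(attrs)
end

# In-memory stand-in for the countries table, keyed by iso_3166.
COUNTRIES = {}

rows_by_field_number(ISO_SAMPLE, 2, ';').each do |row|
  upsert(COUNTRIES, 'iso_3166', { 'iso_3166' => row[1], 'name' => row[0] })
end

rows_by_field_name(PRINCETON_SAMPLE, ',').each do |row|
  upsert(COUNTRIES, 'iso_3166', { 'iso_3166' => row['country code'], 'name' => row['country'] })
end
```

Under these assumptions, a country present in both sources ends up with the second source's capitalization, while one present only in the first keeps its original name.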

…and in app/models/airport.rb:

class Airport < ActiveRecord::Base
  set_primary_key :iata_code

  data_miner do
    import :url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false, :select => lambda { |row| row[4].present? } do
      key 'iata_code'
      store 'name', :field_number => 1
      store 'city', :field_number => 2
      store 'country_name', :field_number => 3
      store 'iata_code', :field_number => 4
      store 'latitude', :field_number => 6
      store 'longitude', :field_number => 7
    end
  end
end
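
The :select lambda above drops rows that have no IATA code before anything is stored. Here is a rough plain-Ruby sketch of that filter and the field-number mapping. It is not the gem's code: the rows are made up (with rounded coordinates) and field_present? is a hypothetical stand-in for ActiveSupport's present?.

```ruby
# Plain-Ruby approximation of ActiveSupport's present? for strings and nil.
def field_present?(value)
  !value.to_s.strip.empty?
end

# Made-up rows, already split into fields, in the column order the import assumes.
AIRPORT_ROWS = [
  ['1', 'Goroka', 'Goroka', 'Papua New Guinea', 'GKA', 'AYGA', '-6.08', '145.39'],
  ['2', 'Example Field', 'Nowhere', 'Exampleland', '', 'XXXX', '0.0', '0.0'] # no IATA code
]

# Same shape as :select => lambda { |row| row[4].present? }
HAS_IATA_CODE = lambda { |row| field_present?(row[4]) }

# Keep only the selected rows, then map field numbers to attribute names.
AIRPORTS = AIRPORT_ROWS.select(&HAS_IATA_CODE).map do |row|
  {
    'iata_code'    => row[4],
    'name'         => row[1],
    'city'         => row[2],
    'country_name' => row[3],
    'latitude'     => row[6].to_f,
    'longitude'    => row[7].to_f
  }
end
```

Filtering on the key column matters here because iata_code is the primary key, so a row with no code has nothing to be keyed on.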

Put this in lib/tasks/data_miner_tasks.rake (unfortunately I don't know a way to include gem tasks automatically, so for now you have to add it manually):

namespace :data_miner do
  task :run => :environment do
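    # Collect names from any of the accepted variables; join with commas so values from two variables don't run together
    resource_names = %w{R RESOURCES RESOURCE RESOURCE_NAMES}.map { |possible_key| ENV[possible_key] }.compact.join(',').split(',').map { |name| name.strip }.reject { |name| name.empty? }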
    DataMiner.run :resource_names => resource_names
  end
end
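
The task accepts the resource list under any of four environment variable names. Here is a small sketch of that parsing as a standalone method, so it can be tried outside rake. The method name resource_names_from is hypothetical, and it takes a plain Hash in place of ENV. This variant joins values with a comma so that names given in two different variables stay separate.

```ruby
# Environment variable names the rake task looks at.
RESOURCE_KEYS = %w{R RESOURCES RESOURCE RESOURCE_NAMES}

# Hypothetical helper: turn e.g. { 'RESOURCES' => 'Airport, Country' }
# into a list of resource names.
def resource_names_from(env)
  RESOURCE_KEYS.map { |key| env[key] }   # look up each accepted variable
               .compact                  # ignore variables that are not set
               .join(',')                # combine values from several variables
               .split(',')               # one entry per name
               .map { |name| name.strip }
               .reject { |name| name.empty? }
end
```

For example, resource_names_from({ 'RESOURCES' => 'Airport, Country' }) gives the two names Airport and Country, and an empty Hash gives an empty list.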

Once you have (1) set up the order of data mining and (2) defined data_miner blocks in your classes, you can:

$ rake data_miner:run RESOURCES=Airport,Country

Complete example

~ $ rails testapp
~ $ cd testapp/
~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_name:string latitude:float longitude:float
[...edit migration to make iata_code the primary key...]
~/testapp $ ./script/generate model Country iso_3166:string name:string
[...edit migration to make iso_3166 the primary key...]
~/testapp $ rake db:migrate
~/testapp $ touch lib/tasks/data_miner_tasks.rake
[...edit per quick start...]
~/testapp $ rake data_miner:run RESOURCES=Airport,Country

Now you should have

~/testapp $ ./script/console 
Loading development environment (Rails 2.3.3)
>> Airport.first.iata_code
=> "GKA"
>> Airport.first.country_name
=> "Papua New Guinea"

Authors

  • Seamus Abshere <seamus@abshere.net>

  • Andy Rossmeissl <andy@rossmeissl.net>

Copyright

Copyright © 2010 Brighter Planet. See LICENSE for details.
