Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.

data_miner

Mine remote data into your ActiveRecord models.

Quick start

Put this in config/environment.rb:

config.gem 'data_miner'

You need to define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:

class Country < ActiveRecord::Base
  data_miner do |step|
    # import country names and country codes
    step.import :url => 'http://www.cs.princeton.edu/introcs/data/iso3166.csv' do |attr|
      attr.key :iso_3166, :field_name => 'country code'
      attr.store :iso_3166, :field_name => 'country code'
      attr.store :name, :field_name => 'country'
    end
  end
end

…and in app/models/airport.rb:

class Airport < ActiveRecord::Base
  belongs_to :country

  data_miner do |step|
    # import airport iata_code, name, etc.
    step.import(:url => 'http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat', :headers => false) do |attr|
      attr.key :iata_code, :field_number => 3
      attr.store :name, :field_number => 0
      attr.store :city, :field_number => 1
      attr.store :country, :field_number => 2, :foreign_key => :name       # will use Country.find_by_name(X)
      attr.store :iata_code, :field_number => 3
      attr.store :latitude, :field_number => 5
      attr.store :longitude, :field_number => 6
    end
  end
end
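To make the :field_number options above concrete, here is a minimal plain-Ruby sketch of the idea: with :headers => false, each column is addressed by its zero-based position in the row. It does not use data_miner itself. The row values and the FIELD_NUMBERS constant are made up for illustration and are not taken from airports.dat or from the gem's internals.

```ruby
# Hypothetical row, laid out to match the field numbers used in the
# Airport example (field 4 is not imported there).
row = ['Example Airport', 'Example City', 'Exampleland', 'XXX', 'unused', '12.5', '-45.25']

# Attribute name => zero-based field number, copied from the Airport block.
FIELD_NUMBERS = {
  :name      => 0,
  :city      => 1,
  :country   => 2,
  :iata_code => 3,
  :latitude  => 5,
  :longitude => 6
}

# Pick each attribute's value out of the row by position.
attributes = {}
FIELD_NUMBERS.each { |name, index| attributes[name] = row[index] }

puts attributes[:iata_code]
puts attributes[:country]
```

In the real import, the :country value is not stored as a string; per the comment in the Airport block, it is used to look up the associated record via Country.find_by_name.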

Put this in lib/tasks/data_miner_tasks.rake (unfortunately, I don't know of a way to include gem tasks automatically, so you have to do this by hand for now):

namespace :data_miner do
  task :run => :environment do
    # CLASSES is an optional comma-separated list of class names, e.g. CLASSES=Country,Airport
    DataMiner.run :class_names => ENV['CLASSES'].to_s.split(/\s*,\s*/).flatten.compact
  end
end
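The task reads a CLASSES environment variable, so it can presumably be invoked as `rake data_miner:run CLASSES=Country,Airport`. The sketch below shows only how that string is turned into a list of names, using the same to_s.split expression as the task. The helper name parse_class_names is hypothetical, and what DataMiner.run does with an empty list is not covered here.

```ruby
# Hypothetical helper that mirrors the expression used in the rake task.
def parse_class_names(raw)
  # nil.to_s is "", and "".split returns [], so an unset CLASSES yields [].
  raw.to_s.split(/\s*,\s*/)
end

p parse_class_names('Country,Airport')
p parse_class_names('Country, Airport')
p parse_class_names(nil)
```

Whitespace around the commas is swallowed by the regular expression, so both spellings of the list produce the same two names.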

You need to specify the order in which data is mined. For example, in config/initializers/data_miner_config.rb:

DataMiner.enqueue do |queue|
  queue << Country  # class whose data should be mined 1st
  queue << Airport  # class whose data should be mined 2nd
  # etc
end

Once you have (1) set up the order of data mining and (2) defined data_miner blocks in your classes, you can:

$ rake data_miner:run

Complete example

~ $ rails testapp
~ $ cd testapp/
~/testapp $ ./script/generate model Airport iata_code:string name:string city:string country_id:integer latitude:float longitude:float
~/testapp $ ./script/generate model Country iso_3166:string name:string
~/testapp $ rake db:migrate
~/testapp $ touch lib/tasks/data_miner_tasks.rake
[...edit per quick start...]
~/testapp $ touch config/initializers/data_miner_config.rb
[...edit per quick start...]
~/testapp $ rake data_miner:run

Now you should have:

~/testapp $ ./script/console 
Loading development environment (Rails 2.3.3)
>> Airport.first.iata_code
=> "GKA"
>> Airport.first.country.name
=> "Papua New Guinea"

Authors

  • Seamus Abshere <seamus@abshere.net>

  • Andy Rossmeissl <andy@rossmeissl.net>

Copyright

Copyright © 2009 Brighter Planet. See LICENSE for details.
