Skip to content
Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



Download, pull out of a ZIP/TAR/GZ/BZ2 archive, parse, correct, and import XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models.

Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.

Real-world usage

Brighter Planet logo

We use data_miner for data science at Brighter Planet and in production at

The killer combination for us is:

  1. active_record_inline_schema - define table structure
  2. remote_table - download data and parse it
  3. errata - apply corrections in a transparent way
  4. data_miner (this library!) - import data idempotently


Check out the extensive documentation.

Quick start

You define data_miner blocks in your ActiveRecord models. For example, in app/models/country.rb:

class Country < ActiveRecord::Base
  self.primary_key = 'iso_3166_code'

  data_miner do
    import("'s Country Codes to Country Names list",
           :url => '',
           :format => :delimited,
           :delimiter => '; ',
           :headers => false,
           :skip => 22) do
      key   :iso_3166_code, :field_number => 0
      store :iso_3166_alpha_3_code, :field_number => 1
      store :iso_3166_numeric_code, :field_number => 2
      store :name, :field_number => 5

Now you can run:

>> Country.run_data_miner!
=> nil

More advanced usage

The earth library has dozens of real-life examples showing how to download, pull out of a ZIP/TAR/BZ2 archive, parse, correct, and import CSVs, fixed-width files, ODS, XLS, XLSX, even HTML and XML:

Model Highlights Reference
Aircraft parsing Microsoft Frontpage HTML (!) data_miner.rb
Airports forcing column names and use of :select block (Proc) data_miner.rb
Automobile model variants super advanced usage of "custom parser" and errata data_miner.rb
Country parsing CSV and a few other tricks data_miner.rb
EGRID regions parsing XLS data_miner.rb
Flight segment (stage) super advanced usage of POSTing form data data_miner.rb
Zip codes downloading a ZIP file and pulling an XLSX out of it data_miner.rb

And many more - look for the data_miner.rb file that corresponds to each model. Note that you would normally put the data_miner declaration right inside the ActiveRecord model file... it's kept separate in earth so that loading it is optional.



Copyright (c) 2012 Brighter Planet. See LICENSE for details.

Something went wrong with that request. Please try again.