Given a file (or a string) along with options, it will return a hash of those values. Great for importing poorly formatted CSV files.
Ruby
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib
spec
tasks
.gitignore
.travis.yml
Gemfile
Guardfile
LICENSE
LICENSE.txt
README.md
Rakefile
table_importer.gemspec

README.md

Gem Version Build Status Coverage Status Code Climate

============== Table Importer

Given a file (or a string) containing a container, along with options, it will return a hash of those values. Great for importing poorly formatted CSV files. It can handle CSV, Excel (xls and xlsx), Google Drive Spreadsheet, and a copy and pasted string.

Only works for ruby versions >= 1.9.3.

Contributing to Table Importer

We love your contributions to Table Importer. Before submitting a pull request, please make sure that your changes are well tested.

Then, you'll need to install bundler and the gem dependencies:

gem install bundler && bundle install

You should now be able to run the local tests:

bundle exec rake

Interact with table_importer by creating a TableImporter instance, and then calling methods on that instance.

importer = TableImporter::Source.new({options})

The options you pass in are:

  # The type of the spreadsheet/input you want to import
  :type => "google" # Google Drive spreadsheet
        => "csv" # CSV file
        => "xls" # Excel spreadsheet
        => "copy_and_paste" # Copy and pasted input
  
  # The content to input. Either a file, a string, or google oauth keys.
  :content => File.open("path/to/file") # for types csv, xls
           => "Name, Email, Phone Number
              Nick, nick@example.com, 6412345678" # For type copy_and_paste
           =>  "google_access_token, spreadsheet_id" # For type google
  
  # Whether the first row of input contains column headers
  :headers_present => true # First row of input is headers
                   => false # First row of input is not headers

  # Optionally you can provide mapping for the columns. (This can be incomplete).
  :user_headers => {
                      "email"=>"0", 
                      "organization"=>"4", 
                      "url"=>"9"
                   }
  # Used to separate columns. Pass in 'nil' if using Google Spreadsheet, Excel or you don't know.
  :column_separator => :comma # ','
                    => :space # ' '
                    => :tab # '\t'
                    => :semicolon # ';'
                       
  # Used to separate rows. Pass in 'nil' if using Google Spreadsheet, Excel or you don't know.
  :record_separator => :newline_mac # '\n'
                    => :newline_windows # '\r\n'
                    => :old_newline_mac # '\r' (from OSX 9 days)
  
  # A hash of compulsory headers. At the moment only "email" is supported.
  :compulsory_headers => {
                            :email => true, false # Does each record require an email address to be valid?
                          }

  # Whether nil values that are a string (ie strings that equal "NULL", "null", "nil", or "undefined") should be replaced with actual nil values.
  :remove_nil_values => true
  :remove_nil_values => false
  

There are a few ways to interact with the table importer:

  options = { type: "csv",  }
  importer = TableImporter::Source.new(options)

  # get the type
  puts importer.get_type
    => "csv"
    
  # get the column separator
  puts importer.get_column_separator
    => "semicolon"
    
  # get the row separator
  puts importer.get_record_separator
    => "newline_mac"
    
  # Get the headers (either the first row if headers are provided, or else default headers
  puts importer.get_headers
   => "column_1, column_2, column_3"
  
  # Get the first 8 lines (useful for providing a matching option for the user to map their own headers, like Mailchimp's contact import.
  puts importer.get_preview_lines
    => [{:column_1 => "r1c1", :column_2 => "r1c2", :column_3 => "r1c3"}, {:column_1 => "r2c1", :column_2 => "r2c2", :column_3 => "r2c3"} etc]
  
  # Get input chunked in an input size (size defaults to 50)
  puts importer.get_chunks
    => All input chunked into 50 line blocks.
  
  puts importer.get_chunks(25)
    => All input chunked into 25 line blocks.
  
  # The format for the returned chunks is not a simple array of hashes, like get_preview_lines, as it also includes per-row errors
  puts importer.get_chunks(2)
    => [{:lines => [{:column_1 => "r1c1", :column_2 => "r1c2", :column_3 => "r1c3"}, {:column_1 => "r2c1", :column_2 => "r2c2", :column_3 => "r2c3"}], :errors => []}, {:lines => [{:column_1 => "r3c1", :column_2 => "r3c2", :column_3 => "r3c3"}, {:column_1 => "r4c1", :column_2 => "r4c2", :column_3 => "r4c3"}], :errors => []}]

  # The errors hash is for lines that don't contain the compulsory headers, are blank/empty, or the entire line contains no alphanumeric characters.

  # Gets lines of input returned in an array of hashes (doesn't work for CSV yet)
  # Pass in start and end points
  puts importer.get_lines(0, 1)
    => [{:column_1 => "r1c1", :column_2 => "r1c2", :column_3 => "r1c3"}]
  
  # Or let it default to getting all lines
  puts importer.get_lines
    => All of the lines
    
  puts importer.get_lines(5, 25) 
    => Line 5 up to line 25
    
  puts importer.get_lines(5, -1)
    => Line 5 to the end of the input.