Ruby ETL Utilities
Ruby
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
introducing-abtab
lib
test/fixtures/files
.gitignore
README.textile
Rakefile
TODO
abstract-tables.gemspec

README.textile

Abstract Table Library

In the spirit of the standard Unix utilites (like cat, grep, cut, etc.), this library creates an abstraction over tabular data. The library implements a series of drivers for sources with different encodings that act like sequences.

Command Line Utilities

atcat

# dump a postgres table to the console (default is tab delimited)
atcat dbi://user:pass@Pg/localhost/db_name/table_name | atview | less
# if you create an $HOME/.abtab.dbrc  (see below), you can omit the user/pass:
atcat dbi://pg/localhost/database_name/table_name

# export a table to a tab file:
atcat dbi://pg/localhost/database_name/table_name tab://table_name.tab

# you can omit the schema for files, the schema will be guessed based on the file extension
atcat dbi://pg/localhost/database_name/table_name table_name.tab

# dump to csv
atcat dbi://pg/localhost/database_name/table_name csv://table_name.csv
# you can omit the scehma
atcat dbi://pg/localhost/database_name/table_name table_name.csv

# convert from csv to tab
atcat csv://some-file.csv tab://some-file.tab

atcat some-file.csv some-file.tab

# convert form csv to pipe
atcat csv://some-file.csv 'tab://some-file.tab?col_sep=|'

atview

atview tab://some-file.tab | less
atview csv://some-file.csv | less
atview dbi://pg/localhost/database_name/table_name | less
# view a list of tables:
atview dbi://pg/localhost/database_name | less

atgrep

Filter a table source by passing an expression, which has access to the record via the var r:

atgrep -e 'r[0] == "this"' test/fixtures/files/file1.csv | atview

# ruby is mutable, so if you feel like being naughty, you can modify during the grep:
atgrep -e 'r[0] = r[0].upcase; r[0] == "THIS"' test/fixtures/files/file1.csv

# but that's what atmod is for...

atcut

Filter a source, selecting specific columns.

# pull only the first and 3rd column:
bin/atcut -f 1,3 test/fixtures/files/file1.tab
<pre>

h1. Suported Drivers

h2. tab

Native Ruby for now.

h3. col_sep

Defaults to a tab character.

h2. csv

Via the "FasterCSV":http://fastercsv.rubyforge.org/ ruby gem.

h3. col_sep

Override the default ',' column seperator.

h3. quote_char

Override the default quote_char (").

h2. dbi

(only tested with PostgreSQL)

h3. @$HOME/.abtab.dbrc@

With the dbi driver, you can create the file @.abtab.dbrc@ in your home directory and include the connection credentails for the databases you commonly use.  It must be a YAML file, you can create it from a ruby script or irb with:

<pre>
require 'yaml'

cfg = { 
  "localhost" => {
    "database_name" => {
      "user" => "username",
      "pass" => "password" 
    }
  }
}
File.open("#{ENV['HOME']}/.abtab.dbrc","w") do |f|
  f.puts cfg.to_yaml
end

License

Authors

Kyle Burton <kyle.burton@gmail.com>