Google BigQuery ActiveRecord Adapter & API client
Latest commit 7eb36f4 Apr 20, 2015 @michelson Merge pull request #21 from michelson/naming-conventions
Naming conventions


GoogleBigQuery ActiveRecord Adapter & standalone API client

Use Cases:

BigQuery is fantastic for running ad hoc aggregate queries across very large datasets: large web logs, ad analysis, sensor data, sales data, etc. Basically, it suits many kinds of "full table scan" queries. Queries are written in a SQL-style language (you don't have to write custom MapReduce functions).

But BigQuery has one constraint to consider before diving in: BQ is append-only, which means you can't update or delete records.

So, use BigQuery as an OLAP (Online Analytical Processing) service, not as an OLTP (Online Transaction Processing) one. In other words, use BigQuery as a data warehouse.


Add 'bigbroda' to your application's Gemfile, or install it yourself:

$ gem install bigbroda

Rails / ActiveRecord:

This gem supports ActiveRecord 4.0 / 4.1.

Support for 4.2 is on the way!

Configure GoogleBigQuery:

rails g bigbroda:install

Or generate a file in config/initializers/bigquery.rb with the following contents:

BigBroda::Config.setup do |config|
  config.pass_phrase = ["pass_phrase"]
  config.key_file    = ["key_file"]
  config.scope       = ["scope"]
  config.email       = ["email"]
  config.retries     = [retries]
end

retries indicates the number of times to retry on recoverable errors (no retries if it is set to 1 or not present).
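One reading of the retries rule is that the value counts total attempts, so 1 (or no value) means a single try. The following is a minimal sketch of that interpretation; with_retries and RecoverableError are hypothetical names, not part of BigBroda, and the gem's real retry logic may differ.

```ruby
# Placeholder for whatever errors count as "recoverable" (assumption).
class RecoverableError < StandardError; end

# Hypothetical helper: runs the block, retrying on RecoverableError.
# retries is treated as the total number of attempts; nil or 1 means one try.
def with_retries(retries)
  attempts = [retries.to_i, 1].max
  tries = 0
  begin
    tries += 1
    yield tries
  rescue RecoverableError
    retry if tries < attempts # re-run the begin block
    raise                     # out of attempts: re-raise the last error
  end
end

# Usage: the first two attempts fail, the third succeeds.
result = with_retries(3) do |attempt|
  raise RecoverableError, "transient failure" if attempt < 3
  :ok
end
puts result
```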

Active Record Adapter


ActiveRecord connection in plain ruby:

ActiveRecord::Base.establish_connection(
  :adapter => 'bigquery',
  :project => "MyBigQueryProject",
  :database => "MyBigTable"
)

In a Rails app you can set the :adapter, :project and :database options in your database.yml, or call establish_bq_connection(bq_connection) in specific models.

development:
  adapter: sqlite3
  database: db/development.sqlite3
  pool: 5

bigquery:
  database: "dummy_dev"
  adapter: 'bigquery'
  project: 123456
  #database: "dummy_test"

By default, if you configure the development/production/test DB as a BigQuery connection, all models are BigQuery models, and migrations and rake db: tasks use the BigQuery migration system.

If you don't want all your models to be BigQuery models, you can set up specific BQ ActiveRecord models this way:

class UserLog < ActiveRecord::Base
  establish_bq_connection "bigquery"
end

Then you will have to run the migrations programmatically, like this:





The GoogleBigQuery adapter brings some ActiveRecord niceties out of the box:

User.first, User.last
User.find_by(name: "")
User.select("name")
User.select("name").where("name contains ?", "frank")
User.select("name, id").where("name contains ?", "frank").count
User.where("id =? and name= ?", "some-id-1393025921", "Frank")
User.where.not("admin = ?", false)

Note about Joins:

BigQuery supports two types of JOIN operations:

  • JOIN requires that the right-side table contains less than 8 MB of compressed data.
  • JOIN EACH allows join queries for tables of any size.

BigQuery supports INNER and LEFT OUTER joins. The default is INNER.
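The JOIN / JOIN EACH rule above can be applied mechanically when building legacy SQL by hand. This is a sketch under two assumptions: you already know the compressed size of the right-side table, and "8 MB" means 8 * 1024 * 1024 bytes. The helper names are hypothetical, and this does not describe whether the adapter picks JOIN EACH for you.

```ruby
# Assumed threshold: 8 MB of compressed data on the right-hand side.
JOIN_SIZE_LIMIT = 8 * 1024 * 1024

# Hypothetical helper: plain JOIN is INNER (the default); LEFT OUTER is opt-in.
# EACH is appended when the right-side table is too large for a plain JOIN.
def join_keyword(right_table_bytes, outer: false)
  base = outer ? "LEFT OUTER JOIN" : "JOIN"
  right_table_bytes < JOIN_SIZE_LIMIT ? base : "#{base} EACH"
end

# Hypothetical helper building a legacy SQL query with bracketed table names.
def join_query(left, right, on, right_table_bytes, outer: false)
  keyword = join_keyword(right_table_bytes, outer: outer)
  "SELECT * FROM [#{left}] AS l #{keyword} [#{right}] AS r ON #{on}"
end

puts join_query("dummy_dev.users", "dummy_dev.tags", "l.id = r.taggable_id", 50 * 1024 * 1024)
```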

see more at:


User.create(name: "frank capra")
@user = User.new
@user.name = "Frank"
@user.save
User.create([{name: "miki"}, {name: "jara"}])

NOTE: by default the adapter sets id values to SecureRandom.hex, and for now all foreign keys are created with the STRING type.
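To make the id convention concrete, here is what SecureRandom.hex produces and how a STRING foreign key would simply hold a parent's id. The post hash is a hypothetical example, not an adapter API.

```ruby
require 'securerandom'

# With no argument, SecureRandom.hex returns 16 random bytes encoded as
# 32 lowercase hexadecimal characters.
user_id = SecureRandom.hex

# Because foreign keys are STRING columns, a child row just stores the
# parent's hex id (this "post" row is a hypothetical example).
post = { "id" => SecureRandom.hex, "user_id" => user_id, "title" => "hello" }

puts user_id.length
puts post["user_id"] == user_id
```

These ids are random, so they carry no information about insertion order.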

Deleting and editing single rows:

BigQuery tables are append-only. The query language does not currently support either updating or deleting data. In order to update or delete data, you must delete the table, then recreate the table with new data. Alternatively, you could write a query that modifies the data and specify a new results table.
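The "write a query and specify a new results table" alternative maps onto a query job with a destination table. Below is a sketch of the raw BigQuery v2 jobs.insert request body for that pattern. The field names (destinationTable, writeDisposition, allowLargeResults) come from the public API; the helper name, the WHERE condition and the table names are illustrative assumptions. How to submit it through BigBroda's Jobs module is not shown, so check the gem's API for that.

```ruby
require 'json'

# Hypothetical helper: a query job that keeps only the rows matching
# keep_condition and writes them into dest_table, emulating a DELETE.
def rewrite_job_body(project, dataset, source_table, dest_table, keep_condition)
  {
    "configuration" => {
      "query" => {
        "query" => "SELECT * FROM [#{dataset}.#{source_table}] WHERE #{keep_condition}",
        "destinationTable" => {
          "projectId" => project,
          "datasetId" => dataset,
          "tableId"   => dest_table
        },
        "writeDisposition"  => "WRITE_TRUNCATE", # overwrite dest_table's data if it exists
        "allowLargeResults" => true              # legacy SQL: allow big result sets
      }
    }
  }
end

body = rewrite_job_body("my-project", "dummy_dev", "users", "users_v2", "admin = false")
puts JSON.pretty_generate(body)
```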

I would actually recommend creating a new table for each day. Since BigQuery charges by the amount of data your queries scan, this is the most economical option: you avoid querying entire massive datasets every time.
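A per-day layout might look like the sketch below. The weblogs_YYYYMMDD naming scheme and the helper names are assumptions for illustration; the point is that a query listing only the days it needs scans less data.

```ruby
require 'date'

# Hypothetical naming scheme: one table per day, e.g. "weblogs_20150420".
def daily_table(prefix, date)
  "#{prefix}_#{date.strftime('%Y%m%d')}"
end

# Names of the daily tables in an inclusive date range.
def daily_tables(prefix, from, to)
  (from..to).map { |day| daily_table(prefix, day) }
end

# In legacy SQL a comma-separated FROM list acts as UNION ALL, so the
# query only touches the days it lists.
def union_from(dataset, tables)
  tables.map { |t| "[#{dataset}.#{t}]" }.join(", ")
end

tables = daily_tables("weblogs", Date.new(2015, 4, 18), Date.new(2015, 4, 20))
puts "SELECT COUNT(*) FROM #{union_from('dummy_dev', tables)}"
```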


Massive Export / Import of data

Google BigQuery can import and export large datasets; the default formats are JSON and CSV. Currently the adapter can only export JSON.


The export can be accomplished very easily from an ActiveRecord model:


where destination should be a valid Google Cloud Storage URI. The adapter manages that for you, so you only need to pass the file name. Example:


The adapter converts that option to gs://[configured_database]/[file.json], so be sure to create the bucket properly in the Cloud Storage panel. If you don't pass the file argument, you get a generated URI like gs://[configured_database]/[table_name].json.
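The URI rule above amounts to the following. This is a hypothetical re-implementation for illustration, not the adapter's code.

```ruby
# Hypothetical re-implementation of the documented rule: the bucket is the
# configured database name, and the file defaults to "<table_name>.json".
def export_uri(database, table_name, file = nil)
  name = file || "#{table_name}.json"
  "gs://#{database}/#{name}"
end

puts export_uri("dummy_dev", "users", "file.json")
puts export_uri("dummy_dev", "users")
```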


There are two ways to import massive data into BigQuery: from a file in Google Cloud Storage, or from a multipart/related POST.

From Google Cloud Storage:
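The Cloud Storage import path corresponds to a BigQuery load job. The sketch below builds the raw v2 jobs.insert body for newline-delimited JSON files already in a bucket. The field names are from the public API; the helper, project, bucket and table names are placeholders, and the adapter's own import method may build this differently.

```ruby
require 'json'

# Hypothetical helper: load one or more gs:// files into dataset.table.
def load_job_body(project, dataset, table, source_uris)
  {
    "configuration" => {
      "load" => {
        "sourceUris"       => Array(source_uris),
        "sourceFormat"     => "NEWLINE_DELIMITED_JSON", # the API default is CSV
        "destinationTable" => {
          "projectId" => project,
          "datasetId" => dataset,
          "tableId"   => table
        },
        "writeDisposition" => "WRITE_APPEND" # append to any existing rows
      }
    }
  }
end

puts JSON.pretty_generate(load_job_body("my-project", "dummy_dev", "users", "gs://dummy_dev/users.json"))
```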


From multipart/related post:
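For the multipart/related route, the request carries two parts: the JSON job configuration and the raw data. The sketch below only assembles that payload with the standard library; the helper name and boundary string are assumptions, and the result would be POSTed to BigQuery's media-upload endpoint with uploadType=multipart. The to-do list at the end of this README still mentions multipart/related uploads for Jobs, so treat this as background rather than a description of the gem.

```ruby
require 'json'

# Hypothetical helper: returns [content_type_header, body] for a
# multipart/related upload (metadata part first, then the data part).
# The boundary must not occur anywhere inside the data.
def multipart_related(metadata, data, boundary = "bigbroda_boundary")
  body = [
    "--#{boundary}",
    "Content-Type: application/json; charset=UTF-8",
    "",
    JSON.generate(metadata),
    "--#{boundary}",
    "Content-Type: application/octet-stream",
    "",
    data,
    "--#{boundary}--",
    ""
  ].join("\r\n")
  ["multipart/related; boundary=#{boundary}", body]
end

# Newline-delimited JSON rows as the data part.
rows = [{ "name" => "miki" }, { "name" => "jara" }]
data = rows.map { |r| JSON.generate(r) }.join("\n")
metadata = { "configuration" => { "load" => {
  "sourceFormat" => "NEWLINE_DELIMITED_JSON",
  "destinationTable" => { "projectId" => "my-project", "datasetId" => "dummy_dev", "tableId" => "users" }
} } }

content_type, body = multipart_related(metadata, data)
puts content_type
```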



This adapter has migration support built in, with the limitations noted after these examples:

class CreateUsers < ActiveRecord::Migration
  def self.up
    create_table :users do |t|
      t.string :name
      t.record :nested_data
      t.references :taggable, :polymorphic => true
      t.boolean :admin
    end
  end

  def self.down
    drop_table :users
  end
end

class AddPublishedToUser < ActiveRecord::Migration
  def change
    add_column :users, :published, :boolean, default: true
  end
end


  • BigQuery does not provide a way to update or delete columns, so update_column and remove_column migrations are cancelled with a caught exception.
  • Also, the schema_migrations table is not created in the DB; it is created as a JSON file, db/schema_migrations.json, instead. Be sure to add this file to version control.

Standalone Client:

Configuration setup:

Configure BigBroda client:

BigBroda::Config.setup do |config|
  config.pass_phrase = "notasecret"
  config.key_file    = "/location/to_your/key_file.p12"
  config.scope       = ""   # OAuth scope for BigQuery
  config.email       = ""   # service account email
  config.retries     = 1
end

retries indicates the number of times to retry on recoverable errors (no retries if it is set to 1 or not present).

Then authorize the client:

@auth =

Then you are ready to go!




Exporting data into multiple files

BigQuery can export up to 1 GB of data per file. If you plan to export more than 1 GB, you can use a wildcard character to instruct BigQuery to export to multiple files.

Note: it may take a while.

  BigBroda::Jobs.export(project_id, dataset_id, table_id, bucket_location)
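Following the paragraph above, a wildcard in the destination URI lets BigQuery shard a large export. The sketch builds the raw v2 extract job body (sourceTable, destinationUris and destinationFormat are public API fields); BigQuery replaces the * with a zero-padded sequence number for each file. The helper name and bucket layout are assumptions, and whether the bucket_location argument of BigBroda::Jobs.export accepts such a wildcard URI is not confirmed here.

```ruby
# Hypothetical helper: an extract job whose single "*" wildcard lets BigQuery
# split output larger than 1 GB across numbered files.
def extract_job_body(project, dataset, table, bucket, prefix)
  {
    "configuration" => {
      "extract" => {
        "sourceTable" => {
          "projectId" => project,
          "datasetId" => dataset,
          "tableId"   => table
        },
        "destinationUris"   => ["gs://#{bucket}/#{prefix}-*.json"],
        "destinationFormat" => "NEWLINE_DELIMITED_JSON"
      }
    }
  }
end

body = extract_job_body("my-project", "dummy_dev", "users", "dummy_dev", "users")
puts body["configuration"]["extract"]["destinationUris"].first
```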


BigBroda::Jobs.query(@project, {"query"=> "SELECT * FROM [#{@dataset_id}.#{@table_name}] LIMIT 1000" })
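The raw BigQuery REST response to a query contains a schema plus rows whose cells are nested as {"f" => [{"v" => ...}]}, with values encoded as strings. Assuming BigBroda::Jobs.query hands back that raw structure (not confirmed here), a small helper can turn it into plain hashes. The sample response is made up, and nested RECORD columns are not handled.

```ruby
# Turns a raw query response into an array of { column_name => value } hashes.
# Values are left as the strings the REST API returns; flat schemas only.
def rows_to_hashes(response)
  names = response["schema"]["fields"].map { |field| field["name"] }
  (response["rows"] || []).map do |row| # "rows" is absent for empty results
    names.zip(row["f"].map { |cell| cell["v"] }).to_h
  end
end

# Made-up sample in the shape of a jobs.query response.
sample = {
  "schema" => { "fields" => [{ "name" => "name", "type" => "STRING" },
                             { "name" => "age",  "type" => "INTEGER" }] },
  "rows"   => [{ "f" => [{ "v" => "frank" }, { "v" => "42" }] },
               { "f" => [{ "v" => "miki" },  { "v" => nil }] }]
}

p rows_to_hashes(sample)
```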





BigBroda::Dataset.create(@project, {"datasetReference"=> { "datasetId" => @dataset_id }} )


BigBroda::Dataset.delete(@project, @dataset_id)


Updates information in an existing dataset. The update method replaces the entire dataset resource, whereas the patch method only replaces fields that are provided in the submitted dataset resource.

BigBroda::Dataset.update(@project, @dataset_id,
      {"datasetReference"=> {
       "datasetId" =>@dataset_id },
      "description"=> "foobar"} )

Patches information in an existing dataset: unlike update, only the fields provided in the submitted dataset resource are replaced (patch semantics).

BigBroda::Dataset.patch(@project, @dataset_id,
        {"datasetReference"=> {
         "datasetId" =>@dataset_id },
        "description"=> "foobar"} )

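The difference between update and patch is easiest to see on a local hash. This sketch does not call the API; it only models the semantics described above, with patch approximated as a shallow merge (a real patch on nested objects may behave differently). The field values are made up.

```ruby
# Local illustration only (no API call): how the stored dataset resource
# changes under each method.
def apply_update(_current, body)
  body.dup # the request body becomes the whole new resource
end

def apply_patch(current, body)
  current.merge(body) # only fields present in the request are overwritten
end

current = {
  "datasetReference" => { "datasetId" => "dummy_dev" },
  "friendlyName"     => "Dummy",
  "description"      => "old"
}
request = { "datasetReference" => { "datasetId" => "dummy_dev" }, "description" => "foobar" }

p apply_update(current, request).key?("friendlyName") # update drops omitted fields
p apply_patch(current, request)["friendlyName"]       # patch keeps them
```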


@table_body = { "tableReference" => {
                  "projectId" => @project,
                  "datasetId" => @dataset_id,
                  "tableId"   => @table_name },
                "schema" => { "fields" => [
                  { :name => "name",     :type => "string", :mode => "REQUIRED" },
                  { :name => "age",      :type => "integer" },
                  { :name => "weight",   :type => "float" },
                  { :name => "is_magic", :type => "boolean" }
                ] } }

BigBroda::Table.create(@project, @dataset_id, @table_body)
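Writing the schema hash by hand gets verbose for wide tables. A hypothetical helper like the one below could generate the same structure from a column => type hash; it is an illustration, not a BigBroda API, and it upper-cases type names to match the BigQuery API documentation.

```ruby
# Hypothetical helper: builds a table resource like @table_body above from a
# { column => type } hash; columns listed in required: get mode REQUIRED.
def table_body(project, dataset, table, columns, required: [])
  fields = columns.map do |name, type|
    field = { "name" => name.to_s, "type" => type.to_s.upcase }
    field["mode"] = "REQUIRED" if required.include?(name)
    field
  end
  {
    "tableReference" => { "projectId" => project, "datasetId" => dataset, "tableId" => table },
    "schema"         => { "fields" => fields }
  }
end

body = table_body("my-project", "dummy_dev", "wizards",
                  { name: :string, age: :integer, weight: :float, is_magic: :boolean },
                  required: [:name])
p body["schema"]["fields"].first
```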


    BigBroda::Table.update(@project, @dataset_id, @table_name,
        {"tableReference"=> {
         "projectId" => @project, "datasetId" =>@dataset_id, "tableId"  => @table_name },
        "description"=> "foobar"} )


BigBroda::Table.delete(@project, @dataset_id, @table_name )


    BigBroda::Table.list(@project, @dataset_id )

Table Data


Streaming data into BigQuery is free for an introductory period until January 1st, 2014. After that it will be billed at a flat rate of 1 cent per 10,000 rows inserted. The traditional jobs().insert() method will continue to be free. When choosing which import method to use, check for the one that best matches your use case. Keep using the jobs().insert() endpoint for bulk and big data loading. Switch to the new tabledata().insertAll() endpoint if your use case calls for a constantly and instantly updating stream of data.
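At the flat rate quoted above, streaming cost is simple arithmetic: rows / 10,000 cents, so 1,000,000 rows come to 100 cents ($1.00). The sketch below encodes only that rate; it is the figure from this paragraph, not a statement of current pricing.

```ruby
# Cost of streaming inserts at the flat rate quoted above:
# 1 cent per 10,000 rows. Rational keeps the result exact; the paragraph
# doesn't say how partial blocks are rounded, so no rounding is applied.
RATE_ROWS_PER_CENT = 10_000

def streaming_cost_cents(rows)
  Rational(rows, RATE_ROWS_PER_CENT)
end

cents = streaming_cost_cents(1_000_000)
dollars = format('%.2f', (cents / 100).to_f)
puts "1,000,000 rows -> #{cents.to_i} cents ($#{dollars})"
```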

@rows = { "rows" => [
            { "json" => {
                "name" => "User #{}"
            } }
          ] }

BigBroda::TableData.create(@project, @dataset_id, @table_name, @rows)
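For larger streams, plain hashes can be wrapped in the insertAll body shape and split into several requests. This is a sketch: the helper names and the batch size of 500 are assumptions, and passing each body as the last argument of BigBroda::TableData.create assumes the gem forwards the hash unchanged. insertId is a real insertAll field that BigQuery uses for best-effort de-duplication, so in practice it should stay the same when a row is retried; a fresh UUID is used here only to keep the sketch short.

```ruby
require 'securerandom'

# Hypothetical helper: wraps plain hashes in the tabledata.insertAll body
# shape ({ "rows" => [{ "insertId" => ..., "json" => {...} }] }).
def insert_all_body(records)
  { "rows" => records.map { |r| { "insertId" => SecureRandom.uuid, "json" => r } } }
end

# Splits a large list into several request bodies (batch size is an assumption).
def insert_all_batches(records, batch_size = 500)
  records.each_slice(batch_size).map { |slice| insert_all_body(slice) }
end

records = (1..1200).map { |i| { "name" => "User #{i}" } }
batches = insert_all_batches(records)
p batches.map { |b| b["rows"].size }
```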


BigBroda::TableData.list(@project, @dataset_id, @table_name)


Install deps

appraisal install

Run rspec suite for versions:

appraisal rails-3 rake spec
appraisal rails-4.0.3 rake spec
appraisal rails-4.1 rake spec
appraisal rails-4.2 rake spec


API Explorer:

Google BigQuery developer guide



  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request



  • AR migrations: copy tables in order to update them (copy to gs://, delete the table, import the table from gs://)
  • AR migrate BQ record type
  • Make id and foreign keys types and values configurable
  • Jobs make multipart/related upload