This repository has been archived by the owner. It is now read-only.
Stream data into Google BigQuery concurrently using InsertAll()
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.gitignore complete refactor of entire package Aug 28, 2016
.travis.yml add missing golang 1.7, 1.5.x, 1.6.x versions to travis Aug 28, 2016
LICENSE remove redundant line from LICENSE (still MIT, no worries) Oct 29, 2017
Makefile fix makefile, accidently skips unit tests Aug 28, 2016
README.md no longer mainting this project Oct 29, 2017
async_option_test.go complete refactor of entire package Aug 28, 2016
async_worker.go add SyncWorker.RowLen function that counts number of enqueued rows Aug 29, 2016
async_worker_group.go complete refactor of entire package Aug 28, 2016
async_worker_group_example_test.go complete refactor of entire package Aug 28, 2016
async_worker_group_options.go complete refactor of entire package Aug 28, 2016
async_worker_group_test.go complete refactor of entire package Aug 28, 2016
async_worker_test.go add SyncWorker.RowLen function that counts number of enqueued rows Aug 29, 2016
bigquery.png forgot to add bigquery.png logo in #16 Aug 28, 2016
bqstreamer.go complete refactor of entire package Aug 28, 2016
error_insert.go complete refactor of entire package Aug 28, 2016
error_insert_test.go complete refactor of entire package Aug 28, 2016
error_row.go complete refactor of entire package Aug 28, 2016
error_row_test.go complete refactor of entire package Aug 28, 2016
error_table.go complete refactor of entire package Aug 28, 2016
insert_errors_example_test.go complete refactor of entire package Aug 28, 2016
jwt.go complete refactor of entire package Aug 28, 2016
project.go complete refactor of entire package Aug 28, 2016
row.go complete refactor of entire package Aug 28, 2016
sync_example_test.go complete refactor of entire package Aug 28, 2016
sync_integration_test.go complete refactor of entire package Aug 28, 2016
sync_option_test.go complete refactor of entire package Aug 28, 2016
sync_options.go complete refactor of entire package Aug 28, 2016
sync_worker.go add SyncWorker.RowLen function that counts number of enqueued rows Aug 29, 2016
sync_worker_test.go add SyncWorker.RowLen function that counts number of enqueued rows Aug 29, 2016

README.md

Kik and me (@oryband) are no longer maintaining this repository. Thanks for all the contributions. You are welcome to fork and continue development.

BigQuery Streamer BigQuery GoDoc

Stream insert data into BigQuery fast and concurrently, using InsertAll().

Features

  • Insert rows from multiple tables, datasets, and projects, and insert them bulk. No need to manage data structures and sort rows by tables - bqstreamer does it for you.
  • Multiple background workers (i.e. goroutines) to enqueue and insert rows.
  • Insert can be done in a blocking or in the background (asynchronously).
  • Perform insert operations in predefined set sizes, according to BigQuery's quota policy.
  • Handle and retry BigQuery server errors.
  • Backoff interval between failed insert operations.
  • Error reporting.
  • Production ready, and thoroughly tested. We - at Rounds (now acquired by Kik) - are using it in our data gathering workflow.
  • Thorough testing and documentation for great good!

Getting Started

  1. Install Go, version should be at least 1.5.
  2. Clone this repository and download dependencies:
  3. Version v2: go get gopkg.in/kikinteractive/go-bqstreamer.v2
  4. Version v1: go get gopkg.in/kikinteractive/go-bqstreamer.v1
  5. Acquire Google OAuth2/JWT credentials, so you can authenticate with BigQuery.

How Does It Work?

There are two types of inserters you can use:

  1. SyncWorker, which is a single blocking (synchronous) worker.
  2. It enqueues rows and performs insert operations in a blocking manner.
  3. AsyncWorkerGroup, which employes multiple background SyncWorkers.
  4. The AsyncWorkerGroup enqueues rows, and its background workers pull and insert in a fan-out model.
  5. An insert operation is executed according to row amount or time thresholds for each background worker.
  6. Errors are reported to an error channel for processing by the user.
  7. This provides a higher insert throughput for larger scale scenarios.

Examples

Check the GoDoc examples section.

Contribute

  1. Please check the issues page.
  2. File new bugs and ask for improvements.
  3. Pull requests welcome!

Test

# Run unit tests and check coverage.
$ make test

# Run integration tests.
# This requires an active project, dataset and pem key.
$ export BQSTREAMER_PROJECT=my-project
$ export BQSTREAMER_DATASET=my-dataset
$ export BQSTREAMER_TABLE=my-table
$ export BQSTREAMER_KEY=my-key.json
$ make testintegration