inkrement/MySQLbq

This is a CLI tool (written in Python) that allows you to load your data from MySQL into Google BigQuery. Although I tried to optimize it for speed, there are some TODOs left. The script is based on Ryan's import script.

Requirements

  • libmysqlclient-dev (on Ubuntu just run: sudo apt install libmysqlclient-dev)
  • Google Cloud SDK
  • Python

Install Google Cloud SDK

The following steps are based on the official Google documentation:

export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

sudo apt-get update && sudo apt-get install google-cloud-sdk
sudo apt-get install google-cloud-sdk-app-engine-python

To configure it, run:

gcloud init

To authenticate your app, create a service account key, name it google_key.json, and place it in this folder.
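How the key is actually picked up is defined in run.py; purely as a sketch, assuming the google-cloud-bigquery client library (which may not be exactly what requirements.txt pins), such a key file could be used like this:

from google.cloud import bigquery

# Illustrative only: build a BigQuery client from the service account key
# placed in this folder. The project ID is read from the key file itself.
client = bigquery.Client.from_service_account_json("google_key.json")

# Sanity check: list the datasets the service account can access.
for dataset in client.list_datasets():
    print(dataset.dataset_id)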

Install dependencies

I would recommend using a virtualenv. Run the following command to install all dependencies using pip:

pip install -r requirements.txt
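If you go the virtualenv route, create and activate one first (the directory name ve is only an example, mirroring the PyPy section below), then run the pip command inside it:

virtualenv ve
. ve/bin/activate
pip install -r requirements.txt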

PyPy

I successfully used it with PyPy:

sudo apt install pypy pypy-dev
virtualenv -p pypy ve

Benchmarks

Small table with 14719 rows

        cpython     pypy
real    0m6.964s    0m9.186s
user    0m0.888s    0m2.500s
sys     0m0.152s    0m0.244s

1M row subset of a text table (22,040,759 rows, ~2.6 GiB), using different batch sizes:

        3000        5000        10000
real    5m8.826s    5m28.480s   5m27.202s
user    0m28.468s   0m32.820s   0m36.176s
sys     0m2.728s    0m3.724s    0m4.548s

Usage

Usage: run.py [OPTIONS]

Options:
  -h, --host TEXT           MySQL hostname
  -d, --database TEXT       MySQL database  [required]
  -u, --user TEXT           MySQL user
  -p, --password TEXT       MySQL password
  -t, --table TEXT          MySQL table  [required]
  -i, --projectid TEXT      Google BigQuery Project ID  [required]
  -n, --dataset TEXT        Google BigQuery Dataset name  [required]
  -l, --limit INTEGER       max num of rows to load
  -s, --batch_size INTEGER  max num of rows per load batch
  --help                    Show this message and exit.
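
For example, a call with placeholder host, credentials, table, and project names (adjust everything to your setup) could look like this:

python run.py -h db.example.com -u root -p secret -d mydb -t mytable -i my-gcp-project -n my_dataset -s 5000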

TODO

  • use async load?
  • adjust bucket-size
  • add support for other data types
