
gcp/bigquery support #279

Closed
fippo opened this issue Nov 16, 2019 · 5 comments

Comments

fippo commented Nov 16, 2019

https://cloud.google.com/bigquery/streaming-data-into-bigquery#bigquery_table_insert_rows-nodejs
https://cloud.google.com/bigquery-transfer/docs/redshift-migration (the features-v2 schema should just work)
cc @juandebravo @jbgwsr

fippo commented Nov 16, 2019

note: https://stackoverflow.com/questions/50470044/syntax-error-expected-in-bigquery -- BigQuery has no varchar, bigint or real types. Easy to search-and-replace in features-v2.sql.
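A minimal sketch of that search-and-replace, assuming the usual BigQuery standard SQL equivalents (the helper names here are made up, not part of the repo):

```javascript
// Hypothetical helper for porting features-v2.sql: map the Redshift
// column types BigQuery rejects onto BigQuery standard SQL types.
const TYPE_MAP = {
    varchar: 'STRING',
    bigint: 'INT64',
    real: 'FLOAT64',
};

function toBigQueryType(redshiftType) {
    // drop a VARCHAR(n) length specifier before looking the type up
    const base = redshiftType.toLowerCase().replace(/\(\d+\)$/, '');
    return TYPE_MAP[base] || redshiftType;
}

// one-shot search-and-replace over the DDL text
function translateSchema(sql) {
    return sql.replace(/\bvarchar(\(\d+\))?|\bbigint\b|\breal\b/gi,
        (match) => toBigQueryType(match));
}
```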

fippo commented Dec 8, 2019

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json -- this seems like the best way to import data; it is quite similar to how Kinesis writes to S3 and that then gets loaded into Redshift.
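A sketch of what that load path could look like with the Node.js client libraries, following the pattern from the linked docs (the dataset, table, bucket and file names are hypothetical, and `@google-cloud/bigquery`/`@google-cloud/storage` are assumed to be installed):

```javascript
// Job config for loading a newline-delimited JSON dump from a GCS
// bucket into an existing table, mirroring the kinesis -> s3 ->
// redshift path.
function loadJobConfig() {
    return {
        sourceFormat: 'NEWLINE_DELIMITED_JSON',
        writeDisposition: 'WRITE_APPEND', // append to the existing table
    };
}

async function loadDumpFromBucket(bucketName, fileName) {
    // requires live inside the function so the config above stays pure
    const {BigQuery} = require('@google-cloud/bigquery');
    const {Storage} = require('@google-cloud/storage');
    const file = new Storage().bucket(bucketName).file(fileName);
    const [job] = await new BigQuery()
        .dataset('rtcstats')  // hypothetical dataset name
        .table('features')    // hypothetical table name
        .load(file, loadJobConfig());
    // load jobs can complete with row-level errors, so check explicitly
    if (job.status.errors && job.status.errors.length > 0) {
        throw new Error(JSON.stringify(job.status.errors));
    }
}
```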

fippo commented Dec 8, 2019

so the UI for that is a bit weird... one can load into existing tables by creating a new table.
It looks like there is an issue with boolean fields: Redshift is happy to take 0/1 values, but BigQuery insists on actual boolean values. That will require making sure all features flagged as boolean return actual boolean values...

fippo commented Dec 8, 2019

ah no... extract.js converts booleans:

    if (feature === false) feature = 0;
    if (feature === true) feature = 1;

I wonder if that is legacy or actually required for Redshift. If it is, this should be moved to database.js.
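A sketch of that move, under the assumption that the 0/1 coercion really is a Redshift requirement: extract.js keeps emitting real booleans and each backend coerces in database.js (both function names here are hypothetical):

```javascript
// Redshift path: keep the historical 0/1 encoding at write time.
function coerceForRedshift(feature) {
    if (feature === false) return 0;
    if (feature === true) return 1;
    return feature;
}

// BigQuery path: BOOLEAN columns insist on actual true/false values,
// so booleans pass through untouched.
function coerceForBigQuery(feature) {
    return feature;
}
```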

fippo added a commit that referenced this issue Dec 8, 2019
ref #279
do not convert booleans to 0/1. This was done for redshift but breaks bigquery when trying to load files.
fippo added a commit that referenced this issue Dec 11, 2019
Ref #279
adds support for uploading to a GCP bucket instead of S3.
GCP automatically supports gzip-ing. This was tested similarly to #296,
by uploading a single-byte file which results in a 22-byte gzip file in storage.
@fippo fippo changed the title gcp-bigquery support gcp/bigquery support Dec 11, 2019
fippo added a commit that referenced this issue Dec 11, 2019
ref #279
Adds stub bigquery support
fippo commented Dec 11, 2019

I think the best way to load it is to buffer 100 entries (or 60 seconds' worth) as a line-delimited JSON file and then use load(): https://cloud.google.com/bigquery/docs/loading-data-local?hl=en#loading_data_from_a_local_data_source
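A minimal sketch of that buffering, assuming the flushed newline-delimited JSON is then written to a temp file and handed to table.load() as in the linked docs (the class, option and callback names here are made up):

```javascript
// Hypothetical buffer: collects rows and flushes them as one
// newline-delimited JSON string once 100 rows have accumulated or
// 60 seconds have passed, whichever comes first.
class InsertBuffer {
    constructor({maxRows = 100, maxAgeMs = 60000, onFlush} = {}) {
        this.rows = [];
        this.maxRows = maxRows;
        this.maxAgeMs = maxAgeMs;
        this.onFlush = onFlush;
        this.timer = null;
    }

    push(row) {
        this.rows.push(row);
        if (this.rows.length >= this.maxRows) {
            this.flush(); // size limit reached
        } else if (!this.timer) {
            // first row of a new batch starts the age timer
            this.timer = setTimeout(() => this.flush(), this.maxAgeMs);
        }
    }

    flush() {
        if (this.timer) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.rows.length === 0) return;
        const ndjson = this.rows.map((r) => JSON.stringify(r)).join('\n') + '\n';
        this.rows = [];
        this.onFlush(ndjson);
    }
}
```

A flush callback would write the string to a temp file and pass that file to load() with sourceFormat: 'NEWLINE_DELIMITED_JSON'.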
