Conversation
Building on the GCS support added in the previous commit, this adds support for loading datasets from GCS into BigQuery & running queries to generate other tables. There's no export function, but it would be trivial to add one (and I may in the very near future). The docs for that are at https://cloud.google.com/bigquery/exporting-data-from-bigquery . A word of warning: the tests take FOREVER to run (the simple load-data test alone takes 120s!). I'm not sure how to make this better given there are no mocks for the BQ API (and moreover there are a ton of parameters that need checking).
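For reference, an export would be another job against the same jobs API the load/query support already uses. A minimal sketch of what the job body could look like, following the `configuration.extract` shape in the docs linked above; the function name and parameters here are illustrative, not part of this PR:

```python
# Hypothetical sketch of a BigQuery extract (export) job body, per the
# REST API's configuration.extract section. Not part of this PR.
def make_extract_job_config(project_id, dataset_id, table_id,
                            destination_uri, destination_format='CSV'):
    """Build the job configuration dict for exporting a table to GCS."""
    return {
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                # One or more GCS URIs, e.g. gs://bucket/path/file.csv
                'destinationUris': [destination_uri],
                'destinationFormat': destination_format,
            }
        }
    }
```

This dict would then be submitted via the same `jobs.insert` path the load and query jobs already go through.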
Cool! :D Given how often a … You could create one Table specifier with …
Hmm, I see your point. On the other hand, the big eyesore is probably copy() with its 6 params. Maybe I could convert that to a task that takes 2 BigqueryTargets (which are effectively the namedtuple you describe)? Do you think that's closer to what you're looking for, or should I make something like a BQDataset & BQTable? Separately, the internal version of this code actually just implemented a FileSystem interface over BigQuery in the form of …
Hmm, yea, I think I started over … Also, can you tell me more about the "FileSystem interface over bigquery in the form of …"? The …
Is this syntax valid in Python 3?
Yes, it turns out. The tests pass on python3 for me.
I vote for "BQDataset & BQTable". I think those are semantically distinct objects.
Just so we're on the same page: you intend that BQDataset is a …
Updated to use tuples (and fixed docstrings). I left BigqueryTarget as a non-tuple, since I'm assuming in most cases folks won't care about BQClient internals and just want to run a load/query job.
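For readers following along, the tuple-based specifiers discussed here could be as simple as two namedtuples; the exact field names in the PR may differ from this sketch:

```python
from collections import namedtuple

# Rough sketch of the BQDataset / BQTable specifiers discussed above.
# Field names are illustrative; the PR's actual definitions may differ.
BQDataset = namedtuple('BQDataset', ['project_id', 'dataset_id'])
BQTable = namedtuple('BQTable', ['project_id', 'dataset_id', 'table_id'])


def dataset_of(table):
    """Derive the containing dataset from a table specifier."""
    return BQDataset(table.project_id, table.dataset_id)
```

With these in hand, the 6-param copy() mentioned earlier collapses to something like `copy(source_table, dest_table)` taking two BQTable values.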
As for the URI approach: yea, in our codebase we have a central "get (flag) target by uri" function that understands s3://, mock://, gs://, file://, and bq://. It comes in very useful when writing unit tests (e.g. to redirect output to mock://). Having some kind of system for doing that within luigi would be nice, but it's certainly not difficult to roll your own right now.
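Rolling your own scheme dispatch can indeed be small. A minimal sketch of such a "get target by uri" helper, assuming a registry mapping scheme to a target factory (the registry contents here are placeholders; real code would register luigi's S3Target, GCSTarget, etc.):

```python
from urllib.parse import urlparse

# Hypothetical scheme -> target-factory registry, in the spirit of the
# central "get target by uri" function described above.
_TARGET_FACTORIES = {}


def register_scheme(scheme):
    """Decorator registering a factory for a URI scheme (s3, gs, bq, ...)."""
    def wrap(factory):
        _TARGET_FACTORIES[scheme] = factory
        return factory
    return wrap


def get_target(uri):
    """Dispatch a URI to the factory registered for its scheme."""
    scheme = urlparse(uri).scheme
    try:
        return _TARGET_FACTORIES[scheme](uri)
    except KeyError:
        raise ValueError('no target registered for scheme %r' % scheme)


@register_scheme('mock')
def _mock_target(uri):
    # In unit tests, output redirected to mock:// would land in an
    # in-memory target; a tuple stands in for one here.
    return ('mock', uri)
```

Tests can then point any task's output at mock:// without touching real storage.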
friendly ping :)
Oh super sorry! I'll try to look at this tomorrow!
Oh, I didn't know you could inherit like that. Nice trick!