Currently, bigDiffy does not seem to support comparing natively partitioned BigQuery tables. I ran bigDiffy with $<partition>-decorated BQ tables on both the lhs and rhs. The $partition decorator was not retained in the arguments passed to the Dataflow job.
The job did not read any input records (according to the job graph) and failed after 15 minutes with NullPointerException messages (not 100% sure if the null pointers are related to the native partitioning).
@catherinejelder my instinct is to set this up to work automatically for inputs with a $partition decorator without specifying any additional arguments.
The updates should be incorporated here. I believe this method is not currently tested. Is that accurate, or am I missing a test somewhere? I just want to follow the existing approach if there is already a test for this method.
After going down the rabbit hole on the previous PR, I concluded that there were too many cases to cover to allow the user to simply feed in a $partition decorator on a natively partitioned table. We would need to determine what type of partitioning is implemented (date or integer), the size of the ranges (hourly, daily, monthly, some sort of integer range), and the field the table is partitioned on, before constructing a rowRestriction that selects the user's desired partition.
Much simpler is #489, adding an optional rowRestriction parameter in the CLI. Might be a nice bonus that users could use this in other ways as well, such as diffing only on "US" data to reduce the size of the data processed.
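To make the trade-off concrete, here is a rough Python sketch of what deriving a rowRestriction from a $partition decorator would involve. It is not bigDiffy code: the function name `row_restriction`, its parameters, and the quoted literal format in the output are all assumptions for illustration, and the exact filter syntax the storage API accepts should be checked against its documentation. Note that the caller already has to supply the partitioning field, the partitioning kind, and (for integer ranges) the interval, which is exactly the metadata described above. Monthly and yearly partitions are left out.

```python
from datetime import datetime, timedelta


# Hypothetical helper for illustration only; not part of bigDiffy.
def row_restriction(table_spec, field, kind, interval=None):
    """Split `table$partition` and build a filter string for that partition.

    field:    column the table is partitioned on (must be known up front)
    kind:     "day", "hour", or "integer" (must be known up front)
    interval: width of each range, required for integer-range partitioning
    """
    table, sep, decorator = table_spec.partition("$")
    if not sep or not decorator:
        raise ValueError(f"no partition decorator in {table_spec!r}")

    if kind == "integer":
        if interval is None:
            raise ValueError("integer-range partitioning needs the range interval")
        # For integer ranges the decorator is taken to be the start of the range.
        start = int(decorator)
        return table, f"{field} >= {start} AND {field} < {start + interval}"

    if kind == "day":
        start = datetime.strptime(decorator, "%Y%m%d")
        end = start + timedelta(days=1)
        fmt = "%Y-%m-%d"
    elif kind == "hour":
        start = datetime.strptime(decorator, "%Y%m%d%H")
        end = start + timedelta(hours=1)
        fmt = "%Y-%m-%d %H:%M:%S"
    else:
        raise ValueError(f"unsupported partitioning kind: {kind!r}")

    # Assumed literal format; a real filter may need an explicit CAST.
    restriction = (
        f'{field} >= "{start.strftime(fmt)}" AND {field} < "{end.strftime(fmt)}"'
    )
    return table, restriction
```

With an optional rowRestriction parameter, none of this inference is needed: the user writes the filter string directly, and the same mechanism covers non-partition filters such as restricting the diff to "US" rows.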
rowRestriction and the storage API seem very useful, thanks! I left a couple of comments and will probably ask another blizzard to take a look, since I don't work in this repo very often.
FYI @catherinejelder