This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

Update NOT_SUPPORTED_USE_CASES.md
marcin-kolda authored May 7, 2019
1 parent 09ec7e1 commit 5366106
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions NOT_SUPPORTED_USE_CASES.md
@@ -1,42 +1,44 @@
-# Rare cases in which backups are not supported by BBQ
+# Known (rare) cases where BBQ backups may not work as expected

In some rare circumstances, BBQ is not able to verify the backup's last update time. This results in either the most up-to-date backup not being created or a wrong backup being created.

## `lastModifiedTime` changed to past value when chunk of data is deleted

When part of a BigQuery table/partition is deleted and the deleted part was a whole chunk of BigQuery internal storage, `lastModifiedTime` becomes the max value taken from the other chunks, which might be earlier than the `lastModifiedTime` of the deleted chunk.
BBQ will not be able to pick up this delete operation as a data change and will not create an up-to-date backup.

#### Prerequisites
* Part of data needs to be deleted
* Deleted part is whole BigQuery internal chunk of data
-* There is no further changes to the partition
+* There are no further data changes in that partition

#### Result
* New backup is not created. BBQ stores previous version which includes deleted data
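
The case above can be illustrated with a minimal sketch. It is not BBQ's or BigQuery's actual code: the per-chunk timestamps, `table_last_modified`, and the `needs_backup` rule (back up only when the source changed after the last backup) are assumptions made for illustration.

```python
from datetime import datetime


def table_last_modified(chunks):
    """Hypothetical model: lastModifiedTime is the max over the remaining chunks."""
    return max(chunks.values())


def needs_backup(last_modified, last_backup):
    """Assumed rule: back up only if the source changed after the last backup."""
    return last_modified > last_backup


# Two internal storage chunks, written at different times (illustrative values).
chunks = {
    "chunk_a": datetime(2019, 5, 1, 8, 0),
    "chunk_b": datetime(2019, 5, 1, 12, 0),
}
last_backup = datetime(2019, 5, 1, 13, 0)  # backup taken after both writes

del chunks["chunk_b"]  # the delete removes exactly one whole chunk
after_delete = table_last_modified(chunks)  # falls back to chunk_a's timestamp

# The timestamp moved into the past, so the delete looks like "no change".
delete_detected = needs_backup(after_delete, last_backup)
```

Under these assumptions `delete_detected` is `False`, which matches the result described above: the previous backup, still containing the deleted data, stays the newest one.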

## Data stored in [__UNPARTITIONED__ partition](https://cloud.google.com/bigquery/docs/querying-partitioned-tables#ingestion-time_partitioned_tables_unpartitioned_partition) for a long time

-When data is slowly streamed to BigQuery partitioned table, then that data might be moved from `__UNPARTITIONED__` partition into correct one after several hours. `lastModifiedTime` is set to time of streaming (which might be several hours ago), not when data was moved between partitions. This results in not backing up new part of data, because BBQ looks at time of last backup.
+When data is slowly streamed to BigQuery partitioned table, then that data might be moved from `__UNPARTITIONED__` partition into correct one after several hours. `lastModifiedTime` is set to time of streaming (which might be several hours ago), not when data was moved between partitions.
+BBQ might not discover this "late-streamed" data and skip the backup.

#### Prerequisites
* Ingestion-time partitioned table
* Data is streamed to the BigQuery without specifying `partitionId`
* Ingestion is very slow, so the `__UNPARTITIONED__` partition stores data for more than 24 hours
-* There is no further changes to the partition
+* There are no further changes to the partition

#### Result
* Part of new data is not backed up
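
A minimal timeline sketch of this case, assuming a simple "changed after last backup" rule. The timestamps and the `needs_backup` helper are hypothetical and only illustrate the ordering of events, not BBQ's implementation.

```python
from datetime import datetime, timedelta


def needs_backup(last_modified, last_backup):
    """Assumed rule: back up only if the source changed after the last backup."""
    return last_modified > last_backup


streamed_at = datetime(2019, 5, 1, 9, 0)   # rows land in __UNPARTITIONED__
last_backup = datetime(2019, 5, 1, 20, 0)  # target partition backed up without those rows
moved_at = datetime(2019, 5, 2, 11, 0)     # rows moved into the target partition

# The data sat in __UNPARTITIONED__ for more than 24 hours.
stayed_unpartitioned = moved_at - streamed_at
assert stayed_unpartitioned > timedelta(hours=24)

# lastModifiedTime reflects the streaming time, not the time of the move.
partition_last_modified = streamed_at

# The move happened after the last backup, but the timestamp predates it.
late_data_backed_up = needs_backup(partition_last_modified, last_backup)
```

With these values `late_data_backed_up` is `False`: the moved rows are newer than the last backup in practice, but the timestamp BBQ can see says otherwise.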

## Backing up empty partition

-Due to asynchronous nature of scheduling backup for table/partition it is possible that:
+Due to asynchronous nature of scheduling backup for table/partition it can happen that:
1. Source data is modified and backup is scheduled for given table/partition
-1. Between scheduling copy-job task and this task execution, the data is deleted manually or by partition expiration
+1. Data is deleted manually (or by partition expiration) after copy-job is scheduled but before task execution
1. The newest version of backup is empty, the second newest has proper data.

#### Prerequisites
* Backup is scheduled for given table/partition, i.e. `lastModifiedTime` is modified
-* Between scheduling copy-job task and this task execution, the data is deleted manually or by partition expiration
+* Data is deleted manually (or by partition expiration) after copy-job is scheduled but before task execution

#### Result
* The earlier backup with proper data (if it exists) will be deleted after 7 months, as only the most recent (empty) backup will be retained
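
The ordering problem in this case can be sketched as a toy simulation. `run_copy_job`, the list-based "table", and the `task_scheduled` flag are hypothetical stand-ins, assuming only that the copy job snapshots whatever the source holds at execution time.

```python
def run_copy_job(source_rows):
    """Hypothetical copy job: snapshots the source as it is at execution time."""
    return list(source_rows)


source = ["row1", "row2"]
backups = [run_copy_job(source)]  # an earlier backup with proper data

source.append("row3")   # 1. source is modified, so a backup task gets scheduled
task_scheduled = True

source.clear()          # 2. data is deleted (manually or by expiration) before execution

if task_scheduled:      # 3. the task finally runs, against an empty source
    backups.append(run_copy_job(source))

newest, second_newest = backups[-1], backups[-2]
```

Here `newest` is empty while `second_newest` still holds the data, which is the state described in step 3 of the list above.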
