This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

YACHT-1295: documented not supported use cases #136

Merged
merged 4 commits into from
May 8, 2019
44 changes: 44 additions & 0 deletions NOT_SUPPORTED_USE_CASES.md
@@ -0,0 +1,44 @@
# Known (rare) cases where BBQ backups may not work as expected

In some rare circumstances BBQ cannot verify whether a backup is up to date. This results in either no up-to-date backup being created or an incorrect backup being created.

## `lastModifiedTime` changed to past value when chunk of data is deleted

When part of a BigQuery table/partition is deleted and the deleted part was a whole chunk of BigQuery internal storage, `lastModifiedTime` becomes the maximum value taken from the remaining chunks, which might be earlier than the `lastModifiedTime` of the deleted chunk.
BBQ will not be able to recognise this delete operation as a data change and will not create an up-to-date backup.

#### Prerequisites
* Part of data needs to be deleted
* Deleted part is whole BigQuery internal chunk of data
* There are no further data changes in that partition

#### Result
* No new backup is created. BBQ keeps the previous version, which still includes the deleted data
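
The effect can be illustrated with a small simulation. This is only a sketch: `table_last_modified` and `needs_backup` are hypothetical helpers, not BBQ or BigQuery APIs. It assumes, as described above, that the reported `lastModifiedTime` is the maximum over the remaining chunks, and it assumes a backup is triggered only when that value is later than the last backup time.

```python
from datetime import datetime


def table_last_modified(chunk_times):
    """Hypothetical model: the reported lastModifiedTime is the max over chunks."""
    return max(chunk_times)


def needs_backup(last_modified, last_backup_time):
    """Hypothetical check: back up only if the source changed after the last backup."""
    return last_modified > last_backup_time


# Two storage chunks, written on consecutive days (illustrative values).
chunks = [datetime(2019, 5, 1, 8, 0), datetime(2019, 5, 2, 9, 0)]
last_backup_time = datetime(2019, 5, 2, 12, 0)  # backup taken after both writes

# Deleting the newest chunk moves the reported lastModifiedTime back to May 1.
remaining = chunks[:-1]
after_delete = table_last_modified(remaining)

# The delete happened after the backup, yet it is not seen as a change.
backup_needed = needs_backup(after_delete, last_backup_time)
```

Under these assumptions `backup_needed` is `False`, so the stored backup still contains the deleted rows.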

## Data stored in [__UNPARTITIONED__ partition](https://cloud.google.com/bigquery/docs/querying-partitioned-tables#ingestion-time_partitioned_tables_unpartitioned_partition) for a long time

When data is slowly streamed to a BigQuery partitioned table, it might be moved from the `__UNPARTITIONED__` partition into the correct one only after several hours. `lastModifiedTime` is set to the time of streaming (which might be several hours earlier), not to the time when the data was moved between partitions.
BBQ might not discover this "late-streamed" data and might skip the backup.

#### Prerequisites
* Ingestion-time partitioned table
* Data is streamed to BigQuery without specifying `partitionId`
* Ingestion is so slow that the `__UNPARTITIONED__` partition stores data for more than 24 hours
* There are no further changes to the partition

#### Result
* Part of new data is not backed up
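
A minimal sketch of this timeline follows. The function names and the comparison rule are assumptions made for illustration, not BBQ's actual implementation. The only behaviour taken from the description above is that the partition's `lastModifiedTime` reflects the streaming time rather than the time the data was moved.

```python
from datetime import datetime, timedelta


def partition_last_modified(streamed_at, moved_at):
    """Per the description above: the timestamp reflects streaming, not the move."""
    return streamed_at


def needs_backup(last_modified, last_backup_time):
    """Hypothetical check: back up only if the source changed after the last backup."""
    return last_modified > last_backup_time


streamed_at = datetime(2019, 5, 1, 10, 0)        # rows land in __UNPARTITIONED__
last_backup_time = datetime(2019, 5, 1, 12, 0)   # partition backed up without them
moved_at = streamed_at + timedelta(hours=25)     # rows reach the partition a day later

last_modified = partition_last_modified(streamed_at, moved_at)
backup_needed = needs_backup(last_modified, last_backup_time)
```

Here `backup_needed` is `False` even though the partition gained rows after its last backup, which is the "late-streamed" case described above.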

## Backing up empty partition

Because backups for a table/partition are scheduled asynchronously, the following can happen:
1. Source data is modified and a backup is scheduled for the given table/partition
1. Data is deleted manually (or by partition expiration) after the copy job is scheduled but before the task is executed
1. The newest version of the backup is empty, while the second newest has the proper data

#### Prerequisites
* Backup is scheduled for given table/partition, i.e. `lastModifiedTime` is modified
* Data is deleted manually (or by partition expiration) after copy-job is scheduled but before task execution

#### Result
* The backup with data for the given table/partition (if it exists) will be deleted after 7 months, as only the most recent (empty) backup will be retained
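
The consequence for retention can be sketched as below. The `retained` function is a simplified, hypothetical stand-in for BBQ's retention policy: it assumes the newest backup is always kept and older ones are dropped once they are older than roughly 7 months (approximated here as 210 days).

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=7 * 30)  # assumed approximation of "7 months"


def retained(backups, now):
    """Hypothetical rule: always keep the newest backup, drop older ones past RETENTION."""
    newest = max(backups, key=lambda b: b["created"])
    return [b for b in backups if b is newest or now - b["created"] <= RETENTION]


backups = [
    {"created": datetime(2019, 1, 1), "rows": 1000},  # proper data
    {"created": datetime(2019, 1, 2), "rows": 0},     # copied after the delete
]

survivors = retained(backups, now=datetime(2019, 9, 1))
```

Under these assumptions only the empty backup survives, so the data can no longer be restored from BBQ.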
5 changes: 3 additions & 2 deletions README.md
@@ -57,8 +57,9 @@ In such scenario we're not able to restore data using BigQuery build-in features
* [external data sources](https://cloud.google.com/bigquery/external-data-sources),
* Views (you can use [GCP Census](https://github.com/ocadotechnology/gcp-census) for that),
* Dataset/table labels as they are not copied by BigQuery copy job (again, you can use [GCP Census](https://github.com/ocadotechnology/gcp-census) for that)
-* Empty partitioned tables without any partitions.
-* Clustered partitioned table.
+* Empty partitioned tables without any partitions,
+* Clustered partitioned table,
+* Tables in [very rare use cases](NOT_SUPPORTED_USE_CASES.md).

### Known caveats
* Modification of table metadata (including table description) qualifies the table to be backed up at the next cycle. This can be a problem for partitioned tables, where such a change updates the last modified time in every partition. BBQ will then back up all partitions again, even though there was no actual change in partition data,