This repository has been archived by the owner on Aug 25, 2023. It is now read-only.

YACHT-1295: documented not supported use cases #136

Merged: 4 commits, May 8, 2019

marcin-kolda
Member

No description provided.

@coveralls

Pull Request Test Coverage Report for Build 1100

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 84.234%

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 1098 | 0.0% |
| Covered Lines | 2415 |
| Relevant Lines | 2867 |

💛 - Coveralls

@coveralls

coveralls commented Apr 4, 2019

Pull Request Test Coverage Report for Build 1184

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 22 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.06%) to 84.295%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| src/backup/default_backup_predicate.py | 2 | 94.44% |
| src/commons/big_query/big_query.py | 20 | 76.74% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 1098 | 0.06% |
| Covered Lines | 2410 |
| Relevant Lines | 2859 |

💛 - Coveralls

NOT_SUPPORTED_USE_CASES.md: 5 outdated review threads (resolved)
radkomateusz previously approved these changes May 6, 2019
@@ -0,0 +1,42 @@
# Rare cases in which backups are not supported by BBQ
Review comment (Contributor):

Known (rare) cases where BBQ backups may not work as expected

NOT_SUPPORTED_USE_CASES.md (resolved thread)
#### Prerequisites
* Part of data needs to be deleted
* Deleted part is whole BigQuery internal chunk of data
* There is no further changes to the partition
Review comment (Contributor):

There are no further data changes in that partition


## Data stored in [__UNPARTITIONED__ partition](https://cloud.google.com/bigquery/docs/querying-partitioned-tables#ingestion-time_partitioned_tables_unpartitioned_partition) for a long time

When data is slowly streamed to BigQuery partitioned table, then that data might be moved from `__UNPARTITIONED__` partition into correct one after several hours. `lastModifiedTime` is set to time of streaming (which might be several hours ago), not when data was moved between partitions. This results in not backing up new part of data, because BBQ looks at time of last backup.
Review comment (Contributor):

Replace:

This results in not backing up new part of data, because BBQ looks at time of last backup.
with:
BBQ might not discover this "late-streamed" data and skip the backup.

* Ingestion-time partitioned table
* Data is streamed to the BigQuery without specifying `partitionId`
* Ingestion is very slow so that `__UNPARTITIONED__` partition store data for more than 24 hours
* There is no further changes to the partition
Review comment (Contributor):

There are no further changes to the partition
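
The `__UNPARTITIONED__` case discussed in this thread can be illustrated with a small sketch. It assumes a simplified change check that compares `lastModifiedTime` with the time of the last backup; `needs_backup` and all timestamps are hypothetical and are not taken from the BBQ code base.

```python
from datetime import datetime, timedelta


def needs_backup(last_modified_time, last_backup_time):
    """Hypothetical, simplified check: back up only if the source
    changed after the most recent backup (or was never backed up)."""
    return last_backup_time is None or last_modified_time > last_backup_time


# Illustrative timeline (all values are made up).
streamed_at = datetime(2019, 4, 1, 8, 0)         # rows streamed, land in __UNPARTITIONED__
last_backup_at = datetime(2019, 4, 1, 12, 0)     # partition backed up, without those rows
moved_at = streamed_at + timedelta(hours=30)     # rows finally moved into the partition

# Per the description above, lastModifiedTime reflects the streaming time,
# not the time the rows were moved between partitions.
reported_last_modified = streamed_at

late_data_skipped = not needs_backup(reported_last_modified, last_backup_at)
caught_if_move_time_were_used = needs_backup(moved_at, last_backup_at)

print("late-streamed data skipped:", late_data_skipped)
print("would be caught if move time were reported:", caught_if_move_time_were_used)
```

Under these assumptions the reported modification time is older than the last backup, so the late-streamed rows are not picked up unless the partition changes again.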


## Backing up empty partition

Due to asynchronous nature of scheduling backup for table/partition it is possible that:
Review comment (Contributor):

... it can happen that:


Due to asynchronous nature of scheduling backup for table/partition it is possible that:
1. Source data is modified and backup is scheduled for given table/partition
1. Between scheduling copy-job task and this task execution, the data is deleted manually or by partition expiration
Review comment (Contributor):

Data is deleted manually (or by partition expiration) after copy-job is scheduled but before task execution


#### Prerequisites
* Backup is scheduled for given table/partition, i.e. `lastModifiedTime` is modified
* Between scheduling copy-job task and this task execution, the data is deleted manually or by partition expiration
Review comment (Contributor):

Data is deleted manually (or by partition expiration) after copy-job is scheduled but before task execution
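
The ordering behind the empty-partition case can be sketched with an in-memory stand-in. This is only an illustration of the timeline described in this thread: `source`, `backups`, `schedule_backup` and `run_next_task` are hypothetical names, and no real BigQuery or task-queue API is used.

```python
from collections import deque

# In-memory stand-ins (hypothetical): partition id -> rows.
source = {"20190401": ["row-1", "row-2"]}
backups = {}
task_queue = deque()


def schedule_backup(partition_id):
    """Enqueue a copy task; nothing is copied at scheduling time."""
    task_queue.append(partition_id)


def run_next_task():
    """Run the oldest task, copying whatever the partition holds right now."""
    partition_id = task_queue.popleft()
    backups[partition_id] = list(source.get(partition_id, []))


schedule_backup("20190401")   # 1. source modified, backup scheduled
source["20190401"] = []       # 2. data deleted (manually or by expiration) before the task runs
run_next_task()               # 3. the task copies an empty partition

print(backups)
```

Because the copy reads the source at execution time rather than at scheduling time, the resulting backup in this sketch contains no rows.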

@MZatorski MZatorski merged commit b07d9a3 into master May 8, 2019
@MZatorski MZatorski deleted the YACHT-1295_document_not_supported_use_cases branch May 8, 2019 07:51
4 participants