Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add/expiration_table #290

Merged
merged 17 commits into from
Feb 7, 2023
Merged

Add/expiration_table #290

merged 17 commits into from
Feb 7, 2023

Conversation

eduwxyz
Copy link
Contributor

@eduwxyz eduwxyz commented Feb 2, 2023

Não precisamos ter algumas tabelas materializadas para sempre no bigquery. Para evitar isso, é necessário definir um tempo de expiração para essas mesmas tabelas, evitando o trabalho de ter que excluir manualmente.

@eduwxyz eduwxyz self-assigned this Feb 2, 2023
@eduwxyz eduwxyz marked this pull request as ready for review February 2, 2023 14:22
amora/models.py Outdated
@@ -128,6 +129,7 @@ class ModelConfig:
cluster_by: Optional[List[str]] = None
labels: Labels = dataclasses.field(default_factory=set)
owner: Optional[Owner] = None
expiration_table: Optional[datetime] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A API do BigQuery espera um datetime como parâmetro, porém, se colocarmos um valor fixo (ex.: datetime(2023, 2, 3, 10, 0, 0)) esse valor vai ser fixo. Ou seja, a tabela vai expirar em datetime(2023, 2, 3, 10, 0, 0) e depois essa data vai ficar no passado, provavelmente fazendo falhar a chamada na API.

O mais correto seria como o dbt fez: o parâmetro é um delta em horas para o tempo de expiração e o valor final é computado somente na hora da chamada da API

Copy link
Contributor Author

@eduwxyz eduwxyz Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No caso, só para confirmar se entendi.

Então o ideal é passarmos um parâmetro de hora, e na materialização enviar pra api o datetime.now + delta(hours)?

@@ -71,6 +74,7 @@ class TableModelByrange(AmoraModel):
cluster_by=["x", "y"],
labels={Label("freshness", "daily")},
description=uuid4().hex,
expiration_table=TABLE_EXPIRATION,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aqui é o exemplo do que falei em https://github.com/mundipagg/amora-data-build-tool/pull/290/files#r1095814416 : Esse valor sempre será o valor computado de datetime.utcnow() no momento que o bytecode desse módulo for gerado

@eduwxyz eduwxyz requested a review from thulio February 7, 2023 12:55
@eduwxyz eduwxyz marked this pull request as draft February 7, 2023 13:18
@eduwxyz eduwxyz marked this pull request as ready for review February 7, 2023 18:01


@patch("amora.materialization.Client", spec=Client)
def test_materialize_with_expiration_table_update(Client: MagicMock):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Esse cenário não vai acontecer com a implementação atual. A gente nunca atualiza uma tabela. Sempre deletamos, criamos e populamos em seguida

DATE_EXPIRATION = datetime.utcnow() + timedelta(hours=TABLE_EXPIRATION)

table = client.create_table.call_args.args[0]
assert table.expires.strftime("%Y/%m/%d %H:%M:%S") == DATE_EXPIRATION.strftime(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um teste mais simples e menos frágil é checar que table.expires > datetime.utcnow()

Copy link
Contributor Author

@eduwxyz eduwxyz Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ainda sim precisei passar o strftime, pois sem ele estava dando o erro: can't compare offset-naive and offset-aware datetimes

amora/models.py Outdated
@@ -128,6 +129,7 @@ class ModelConfig:
cluster_by: Optional[List[str]] = None
labels: Labels = dataclasses.field(default_factory=set)
owner: Optional[Owner] = None
expiration_table: Optional[int] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expiration_table: Optional[int] = None
hours_to_expire: Optional[int] = None

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tem que lembrar de atualizar a docstring acima

@eduwxyz eduwxyz requested a review from thulio February 7, 2023 19:03
@eduwxyz eduwxyz merged commit 250d822 into main Feb 7, 2023
@eduwxyz eduwxyz deleted the add/expiration_table branch February 7, 2023 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants