Add/expiration_table #290
Conversation
amora/models.py
Outdated
@@ -128,6 +129,7 @@ class ModelConfig:
     cluster_by: Optional[List[str]] = None
     labels: Labels = dataclasses.field(default_factory=set)
     owner: Optional[Owner] = None
+    expiration_table: Optional[datetime] = None
The BigQuery API expects a datetime as the parameter, but if we pass a fixed value (e.g. datetime(2023, 2, 3, 10, 0, 0)), that value will be static. In other words, the table will expire at datetime(2023, 2, 3, 10, 0, 0), and after that the date will be in the past, probably causing the API call to fail.
The more correct approach would be what dbt did: the parameter is a delta in hours until expiration, and the final value is computed only at the moment of the API call.
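A minimal sketch of the approach suggested above: the config stores only an integer delta, and the concrete timestamp is derived at API-call time. The helper name `compute_expiration` is hypothetical, not part of the amora codebase.

```python
from datetime import datetime, timedelta, timezone


def compute_expiration(hours_to_expire: int) -> datetime:
    """Compute the table expiration timestamp at API-call time.

    BigQuery's ``Table.expires`` accepts a timezone-aware datetime, so the
    value is derived from the configured hour delta only when the table is
    about to be created, never at module-import time.
    """
    return datetime.now(timezone.utc) + timedelta(hours=hours_to_expire)
```

Because the timestamp is recomputed on every call, it can never drift into the past the way a hard-coded datetime would.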
Just to confirm I understood:
So the ideal is to pass an hours parameter, and at materialization time send datetime.now + delta(hours) to the API?
tests/test_materialization.py
Outdated
@@ -71,6 +74,7 @@ class TableModelByrange(AmoraModel):
     cluster_by=["x", "y"],
     labels={Label("freshness", "daily")},
     description=uuid4().hex,
+    expiration_table=TABLE_EXPIRATION,
Here is an example of what I mentioned in https://github.com/mundipagg/amora-data-build-tool/pull/290/files#r1095814416 : this value will always be the value of datetime.utcnow() computed at the moment this module's bytecode is generated.
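The pitfall described above can be shown with a small sketch (the names below are illustrative, not from the test suite): a module-level constant is evaluated exactly once at import, while a function defers the computation until it is actually needed.

```python
from datetime import datetime, timedelta

# Pitfall: evaluated once, when the module is imported. Every test that
# reads this constant afterwards sees the same frozen timestamp.
TABLE_EXPIRATION_AT_IMPORT = datetime.utcnow() + timedelta(hours=2)


def table_expiration(hours: int = 2) -> datetime:
    # Safer: the "now" is captured at call time, not at import time.
    return datetime.utcnow() + timedelta(hours=hours)
```

Every call to `table_expiration()` produces a fresh value, so a materialization that runs long after the module was imported still gets a future expiration.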
tests/test_materialization.py
Outdated
@patch("amora.materialization.Client", spec=Client)
def test_materialize_with_expiration_table_update(Client: MagicMock):
This scenario won't happen with the current implementation. We never update a table: we always delete it, create it, and then populate it.
tests/test_materialization.py
Outdated
DATE_EXPIRATION = datetime.utcnow() + timedelta(hours=TABLE_EXPIRATION)

table = client.create_table.call_args.args[0]
assert table.expires.strftime("%Y/%m/%d %H:%M:%S") == DATE_EXPIRATION.strftime(
A simpler and less fragile test is to check that table.expires > datetime.utcnow()
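The suggestion above can be sketched as follows. The `table` object is a stand-in for the value captured from the mocked client's `create_table` call; the point is that asserting "expires at some point in the future" avoids the brittleness of comparing formatted timestamp strings.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import MagicMock

# Stand-in for: table = client.create_table.call_args.args[0]
table = MagicMock()
table.expires = datetime.now(timezone.utc) + timedelta(hours=2)

# Less fragile than equality on strftime output: a second of clock skew
# between computing the expected and actual values cannot break this.
assert table.expires > datetime.now(timezone.utc)
```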
I still had to use strftime, because without it I was getting the error: can't compare offset-naive and offset-aware datetimes
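The error mentioned above happens because BigQuery returns timezone-aware datetimes while `datetime.utcnow()` is naive. Rather than stripping that information with strftime, a sketch of the direct fix is to compare two aware values:

```python
from datetime import datetime, timezone

# Illustrative aware value, like the one BigQuery's client returns.
expires_aware = datetime(2030, 1, 1, tzinfo=timezone.utc)

try:
    # Mixing aware and naive raises TypeError.
    expires_aware > datetime.utcnow()
except TypeError as exc:
    print(exc)  # can't compare offset-naive and offset-aware datetimes

# Fix: use an aware "now" so both operands carry a timezone.
assert expires_aware > datetime.now(timezone.utc)
```

Comparing aware values directly keeps sub-second precision, which the strftime round-trip silently discards.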
amora/models.py
Outdated
@@ -128,6 +129,7 @@ class ModelConfig:
     cluster_by: Optional[List[str]] = None
     labels: Labels = dataclasses.field(default_factory=set)
     owner: Optional[Owner] = None
+    expiration_table: Optional[int] = None
- expiration_table: Optional[int] = None
+ hours_to_expire: Optional[int] = None
Remember to update the docstring above as well.
Force-pushed e56912e to 170506e
We don't need to keep some materialized tables in BigQuery forever. To avoid that, we need to set an expiration time for those tables, which saves us the work of having to delete them manually.