Contributor

@jklukas jklukas commented May 3, 2019

This explores a potential new pattern: generating static tables from a bash script that runs `bq mk` and `bq load` to load CSVs into BigQuery.

I am putting these in the static dataset, but redash and other tools don't have the same permissions there, so they can't reach the tables. We could simply put them in telemetry for now, or we could use this as an excuse to revisit how we want to lay out datasets.

@jklukas jklukas requested a review from relud May 3, 2019 19:06
Collaborator

@relud relud left a comment

lgtm only if we make table updates atomic wrt rows (descriptions being separate is fine).

It might be worth making these SQL files that run `CREATE OR REPLACE TABLE ... AS`. Then we wouldn't need the bash, the schema would be inline, and description updates would happen in the same operation.
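A minimal sketch of what the SQL-file approach could look like. The column names (`code`, `name`), the description text, and the sample rows are hypothetical, since the real schema is not shown in this thread; the `bq query` call is left as a comment because it needs credentials.

```shell
#!/bin/sh
# Sketch only: column names, description, and rows below are hypothetical.
set -eu

# A single CREATE OR REPLACE TABLE ... AS statement carries the schema, the
# description, and the rows together, so the table is swapped in one operation.
cat > country_names_v1.sql <<'SQL'
CREATE OR REPLACE TABLE
  `moz-fx-data-derived-datasets.static.country_names_v1`
OPTIONS (description = "Country names keyed by country code")
AS
SELECT * FROM UNNEST([
  STRUCT("BO" AS code, "Bolivia" AS name),
  STRUCT("FR", "France")
]);
SQL

cat country_names_v1.sql

# To apply it (requires the bq CLI and credentials; not run here):
#   bq query --use_legacy_sql=false < country_names_v1.sql
```

The trade-off is that the rows live inline in the SQL rather than in a CSV, which is manageable for small lookup tables but awkward for larger ones.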

--source_format=CSV \
--schema=<(echo "$COUNTRY_CODES_SCHEMA") \
moz-fx-data-derived-datasets:static.country_names_v1 \
country_codes.csv
Collaborator

We should do this as a single load operation, so that the update is atomic and won't cause interruptions.

Suggested change
- country_codes.csv
+ <(cat country_codes.csv && sed -E '1d;s/(.*),(.*)/\2,\1/' country_names_alternate.csv)

or after switching the columns in country_names_alternate.csv

Suggested change
- country_codes.csv
+ <(cat country_codes.csv && sed 1d country_names_alternate.csv)
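To make the first suggestion concrete, here is a small sketch of what the `sed` expression does to the alternate-names file before the two files are concatenated. The sample rows and header names are made up for illustration; the real CSV contents are not shown in this thread.

```shell
#!/bin/sh
# Sample data is hypothetical; only the cat/sed pipeline comes from the review.
set -eu

workdir="$(mktemp -d)"
cd "$workdir"

printf 'code,name\nBO,Bolivia\nFR,France\n' > country_codes.csv
printf 'name,code\nPlurinational State of Bolivia,BO\n' > country_names_alternate.csv

# 1d drops the header line of the alternate file. The substitution swaps the
# text before the last comma with the text after it, yielding code,name order.
combined="$(cat country_codes.csv && sed -E '1d;s/(.*),(.*)/\2,\1/' country_names_alternate.csv)"

printf '%s\n' "$combined"
```

With this sample input the combined stream keeps the `code,name` header once, followed by the two original rows and then `BO,Plurinational State of Bolivia`. Because `(.*)` is greedy, the swap splits on the last comma of each line, so it assumes the code column itself contains no commas.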

Contributor Author

Turns out `bq load` doesn't like getting a named pipe for the final argument, so I had to materialize the combined rows to a file.
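A rough sketch of the materialize-to-a-file workaround, under stated assumptions: the sample rows, the combined file name, the inline `code:STRING,name:STRING` schema, and the use of `--skip_leading_rows=1` are all hypothetical, and the `bq load` command is only printed rather than executed so the script runs without credentials.

```shell
#!/bin/sh
# Sketch only: data, file names, and schema are hypothetical.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

printf 'code,name\nBO,Bolivia\n' > "$workdir/country_codes.csv"
printf 'code,name\nBO,Plurinational State of Bolivia\n' > "$workdir/country_names_alternate.csv"

# Write the combined rows to a regular file instead of passing <(...) to bq.
combined_csv="$workdir/country_names_combined.csv"
{
  cat "$workdir/country_codes.csv"
  sed 1d "$workdir/country_names_alternate.csv"
} > "$combined_csv"

row_count="$(wc -l < "$combined_csv" | tr -d ' ')"
echo "materialized $row_count lines to $combined_csv"

# Print the single load job that would replace the table from that file.
echo bq load --replace --source_format=CSV --skip_leading_rows=1 \
  --schema=code:STRING,name:STRING \
  moz-fx-data-derived-datasets:static.country_names_v1 "$combined_csv"
```

Loading both sources in one job keeps the row update to a single operation, which is the property the review asked for.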

@jklukas jklukas force-pushed the country-tables branch 2 times, most recently from cd74330 to c145d92 Compare May 6, 2019 20:04
Contributor Author

jklukas commented May 6, 2019

I just need to decide what to do about permissions here before merging. We either need to ensure all users have read access to the static dataset, or we need to move this out of static to telemetry.

Collaborator

relud commented May 6, 2019

i'm in favor of adjusting permissions rather than moving the dataset, fwiw

Contributor Author

jklukas commented May 6, 2019

See https://bugzilla.mozilla.org/show_bug.cgi?id=1549573 for discussion of permissions

@jklukas jklukas changed the title Add static country_code and country_name tables Blocked: Add static country_code and country_name tables May 7, 2019
Contributor Author

jklukas commented May 7, 2019

This may be on hold for a while in order to hammer out permissions on the static dataset.

@jklukas jklukas changed the title Blocked: Add static country_code and country_name tables Add static country_code and country_name tables May 8, 2019
@jklukas jklukas merged commit 3ba5b0f into master May 8, 2019
@jklukas jklukas deleted the country-tables branch May 8, 2019 14:23
quiiver pushed a commit that referenced this pull request Jun 25, 2024
The Airflow task [did not succeed](https://workflow.telemetry.mozilla.org/log?dag_id=kpi_forecasting&task_id=kpi_forecasting_desktop_non_cumulative&execution_date=2023-05-06T04%3A00%3A00%2B00%3A00) after recent PRs. This appears to be caused by the type of `predictions["ds"]`: the column is currently `DATETIME`, but the BQ schema uses `TIMESTAMP`. From the Airflow logs:
```
Provided Schema does not match Table moz-fx-data-shared-prod:telemetry_derived.kpi_automated_forecast_v1. Field ds has changed type from TIMESTAMP to DATETIME
```

This type change was not made intentionally in the previous PRs; it is likely a side effect of updating the `prophet` package. This PR forces `predictions["ds"]` to be a `TIMESTAMP` at the time of the database write.