-
Notifications
You must be signed in to change notification settings - Fork 124
Add static country_code and country_name tables #116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
relud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm only if we make table updates atomic wrt rows (descriptions being separate is fine).
might be worth considering making these sql files that do a create or replace table ... as, and then we wouldn't need the bash, schema would be inline, and description updates would be in the same operation
sql/static/country_codes.sh
Outdated
| --source_format=CSV \ | ||
| --schema=<(echo "$COUNTRY_CODES_SCHEMA") \ | ||
| moz-fx-data-derived-datasets:static.country_names_v1 \ | ||
| country_codes.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this as a single operation, so that it is atomic and won't cause interruptions
| country_codes.csv | |
| <(cat country_codes.csv && sed -E '1d;s/(.*),(.*)/\2,\1/' country_names_alternate.csv) |
or after switching the columns in country_names_alternate.csv
| country_codes.csv | |
| <(cat country_codes.csv && sed 1d country_names_alternate.csv) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out bq load doesn't like getting a named pipe for the final argument, so had to materialize this to a file.
cd74330 to
c145d92
Compare
|
I just need to decide what to do about permissions here before merging. We either need to ensure all users have read access to the |
|
i'm in favor of adjusting permissions rather than moving the dataset, fwiw |
|
See https://bugzilla.mozilla.org/show_bug.cgi?id=1549573 for discussion of permissions |
|
This may be on hold for a while in order to hammer out permissions on the |
The Airflow task [did not succeed](https://workflow.telemetry.mozilla.org/log?dag_id=kpi_forecasting&task_id=kpi_forecasting_desktop_non_cumulative&execution_date=2023-05-06T04%3A00%3A00%2B00%3A00) after recent PRs. This appears to be due to the type of the `predictions["ds"]`; the current column type is `DATETIME`, but the BQ schema uses `TIMESTAMP`. From the Airflow logs: ``` Provided Schema does not match Table moz-fx-data-shared-prod:telemetry_derived.kpi_automated_forecast_v1. Field ds has changed type from TIMESTAMP to DATETIME ``` This type change was not intentionally made in the previous PRs, but is likely a result of updating the `prophet` package. This PR forces `predictions["ds"]` to be a `TIMESTAMP` at the time of db write.
This is exploring a potential new pattern of generating a static table based on a bash script that runs
bq mkandbq loadcommands to load CSVs to BQ.I am putting these in the
staticdataset, but we don't have the same permissions there for redash, etc. to be able to reach them. We could simply put them intelemetryfor now, or we could use this as an excuse to question how we want to lay out datasets again.