Surface Nextclade versions #467
Creates one version JSON for each Nextclade TSV and one version JSON for the metadata TSV. Because the metadata uses the Nextclade TSV columns directly, the metadata version JSON just adds `metadata_tsv_sha256sum` to the SARS-CoV-2 dataset version JSON. If we ever want to track data provenance by column, we will update the schema to include the 21L dataset version. The two Nextclade version JSONs will be used to check whether the workflow should use the existing cache. The metadata version JSON will be used to surface the version info to downstream users of the data.
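As a rough illustration of the two uses described above, here is a minimal Python sketch. It is not the code from this PR: the helper names and the keys `nextclade_version` and `nextclade_dataset_version` are hypothetical stand-ins, and only `metadata_tsv_sha256sum` comes from the description.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Hypothetical keys used to decide cache validity; the real schema may differ.
CACHE_KEYS = ("nextclade_version", "nextclade_dataset_version")


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata_version(nextclade_version: dict, metadata_tsv: Path) -> dict:
    """Copy the Nextclade version info and add the metadata TSV checksum."""
    return {**nextclade_version, "metadata_tsv_sha256sum": sha256sum(metadata_tsv)}


def can_use_cache(cached: Optional[dict], current: dict) -> bool:
    """Treat the cache as usable only if every compared key matches."""
    if cached is None:
        return False
    return all(cached.get(key) == current.get(key) for key in CACHE_KEYS)
```

Keeping the checksum out of the cache comparison reflects the split in the description: the Nextclade version JSONs drive the cache decision, while the metadata version JSON exists for downstream users.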
"nextclade_version.json": f"data/{database}/nextclade_version.json",
"nextclade_21L_version.json": f"data/{database}/nextclade_21L_version.json",
"metadata_version.json": f"data/{database}/metadata_version.json",
Open to different S3 file names for the version JSONs.
Thank you, @joverlee521! I like this approach of building a separate version JSON per file and merging the main Nextclade version JSON into the metadata version JSON.
I don't have strong feelings about the names, either. These names seem fine. Some day we'll have nextclade_25A_version.json or more than one additional file, but this approach is flexible enough to support that.
Assuming you tested locally, we could merge and see how it works on the next run?
Yup! I'll plan to merge tomorrow morning and monitor the automated runs.
The public metadata version file is available at https://data.nextstrain.org/files/ncov/open/metadata_version.json
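For downstream users, a minimal sketch of reading the public file and checking a local copy of the metadata against it might look like the following. The function names are hypothetical, and the thread does not say whether the checksum covers the compressed or uncompressed TSV, so this assumes it covers the exact bytes of the file passed in.

```python
import hashlib
import json
from urllib.request import urlopen

METADATA_VERSION_URL = "https://data.nextstrain.org/files/ncov/open/metadata_version.json"


def fetch_metadata_version(url: str = METADATA_VERSION_URL, timeout: float = 30.0) -> dict:
    """Download and parse the metadata version JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def matches_published_checksum(metadata_path: str, version: dict) -> bool:
    """Compare a local file's SHA-256 with `metadata_tsv_sha256sum`."""
    expected = version.get("metadata_tsv_sha256sum")
    if expected is None:
        return False
    digest = hashlib.sha256()
    with open(metadata_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Example usage (requires network access and a local metadata file):
# version = fetch_metadata_version()
# print(matches_published_checksum("metadata.tsv", version))
```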
Related issue(s)
Depends on #466
Resolves #458