Prepare data #3

Merged
trvrb merged 3 commits into main from prepare-data on Mar 4, 2025

trvrb (Member) commented Feb 25, 2025

A small fix to update metadata and a larger change to prepare sequence counts. This takes a more nuanced approach to preparing sequence count data (borrowing from forecasts-ncov). It subsets sequence counts to the window between min_date and max_date (relative or absolute) and only includes locations with at least location_min_seq sequences in this time period. It also collapses variants with fewer than clade_min_seq sequences into "other".

This logic takes the place of the previous threshold and the previous plotting aids loc_lst and var_lst. I didn't like how there were a bunch of countries with sparse data that were informing the hierarchical estimates, but we'd never see their frequencies or growth advantages. I'd prefer to have everything on the table that's used in the final analysis.
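
For readers who want the filtering steps spelled out, here is a minimal sketch of the logic described above. It is not the prepare_data.py script from this PR or from forecasts-ncov. The function name prepare_counts, the row keys (location, variant, date, sequences), and the choice to apply clade_min_seq to totals across all retained locations are assumptions made for illustration. Only absolute dates are handled here.

```python
from collections import defaultdict
from datetime import date


def prepare_counts(rows, min_date, max_date, location_min_seq, clade_min_seq):
    """Filter and collapse sequence counts (illustrative sketch only).

    rows: iterable of dicts with hypothetical keys
    'location', 'variant', 'date' (datetime.date), 'sequences' (int).
    """
    # 1. Keep only rows inside the [min_date, max_date] window.
    windowed = [r for r in rows if min_date <= r["date"] <= max_date]

    # 2. Keep only locations with at least location_min_seq sequences
    #    inside the window.
    per_location = defaultdict(int)
    for r in windowed:
        per_location[r["location"]] += r["sequences"]
    kept = [r for r in windowed
            if per_location[r["location"]] >= location_min_seq]

    # 3. Collapse variants with fewer than clade_min_seq sequences to "other".
    #    Assumption: the threshold applies to totals over retained locations.
    per_variant = defaultdict(int)
    for r in kept:
        per_variant[r["variant"]] += r["sequences"]

    merged = defaultdict(int)
    for r in kept:
        name = r["variant"]
        if per_variant[name] < clade_min_seq:
            name = "other"
        merged[(r["location"], name, r["date"])] += r["sequences"]

    return [
        {"location": loc, "variant": var, "date": d, "sequences": n}
        for (loc, var, d), n in sorted(merged.items())
    ]
```

Because locations are dropped before variants are collapsed, every row that survives corresponds to a location and variant that the model would actually use, which is the "everything on the table" property described above.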

trvrb requested a review from huddlej on February 25, 2025, 22:19

huddlej (Contributor) left a comment

Thanks, @trvrb! @plsteinberg and I had just chatted recently about whether to adopt this prepare_data.py script from forecasts-ncov or not. We opted not to in the other repo, just because the full script has more features than we need, but the simpler version here makes sense.

It is much nicer to know that all data that went into the model appear in the figures!

I only had a minor comment about date filtering below. I can implement that change, if you agree, or you can if you're in the zone with this work... :D

Comment thread Snakefile