Prepare data by trvrb · Pull Request #3 · nextstrain/forecasts-flu

trvrb · 2025-02-25T22:19:37Z

A small fix to update metadata and a larger change to prepare sequence counts. This takes a more nuanced approach to preparing sequence count data (borrowing from forecasts-ncov). It subsets sequence counts to the period between min_date and max_date (each relative or absolute) and only includes locations with at least location_min_seq sequences in this period. It also collapses variants with fewer than clade_min_seq sequences into "other".

This logic replaces the previous "threshold" and the previous plotting aids loc_lst and var_lst. I didn't like how a bunch of countries with sparse data were informing the hierarchical estimates even though we'd never see their frequencies or growth advantages. I'd prefer to have everything that's used in the final analysis on the table.
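The three filtering steps described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the actual prepare_data.py from this repo or forecasts-ncov. The row format, the "90D" syntax for relative dates (counted back from `today`), the choice to drop sparse locations before collapsing clades, summing clade totals across all retained locations, and the function names `resolve_date` and `prepare_counts` are all hypothetical assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def resolve_date(value, today):
    """Turn a config value into an absolute date.

    Hypothetical formats: a date object, an ISO string such as
    "2024-09-01", or a relative offset such as "90D" meaning 90 days
    before `today`.
    """
    if isinstance(value, date):
        return value
    if value.endswith("D"):
        return today - timedelta(days=int(value[:-1]))
    return date.fromisoformat(value)


def prepare_counts(rows, min_date, max_date, location_min_seq, clade_min_seq, today):
    """Filter and collapse sequence counts (illustrative sketch).

    `rows` uses an assumed format: dicts with "location", "clade",
    "date" (a datetime.date) and "sequences" (an int).
    """
    min_d = resolve_date(min_date, today)
    max_d = resolve_date(max_date, today)

    # 1. Keep only rows whose date falls within [min_d, max_d].
    window = [r for r in rows if min_d <= r["date"] <= max_d]

    # 2. Drop locations with fewer than location_min_seq sequences in the window.
    location_totals = defaultdict(int)
    for r in window:
        location_totals[r["location"]] += r["sequences"]
    window = [r for r in window if location_totals[r["location"]] >= location_min_seq]

    # 3. Relabel clades with fewer than clade_min_seq sequences (summed over all
    #    retained locations and dates) as "other", then re-aggregate the counts.
    clade_totals = defaultdict(int)
    for r in window:
        clade_totals[r["clade"]] += r["sequences"]
    merged = defaultdict(int)
    for r in window:
        clade = r["clade"] if clade_totals[r["clade"]] >= clade_min_seq else "other"
        merged[(r["location"], clade, r["date"])] += r["sequences"]

    return [
        {"location": loc, "clade": clade, "date": d, "sequences": n}
        for (loc, clade, d), n in sorted(merged.items())
    ]
```

Because sparse locations are dropped before clade totals are computed in this sketch, a clade seen only in a dropped location never keeps its own label, which matches the goal that everything feeding the model also shows up in the figures. The real script may order or scope these steps differently.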


huddlej

Thanks, @trvrb! @plsteinberg and I had just chatted recently about whether to adopt this prepare_data.py script from forecasts-ncov or not. We opted not to in the other repo, just because the full script has more features than we need, but the simpler version here makes sense.

It is much nicer to know all data that went into the model appear in the figures!

I only had a minor comment about date filtering below. I can implement that change, if you agree, or you can if you're in the zone with this work... :D

trvrb added 3 commits February 25, 2025 22:10

Fix typo in H3N2 mapping

fdd979d

Specify clade column in update metadata

7ed0529

trvrb requested a review from huddlej February 25, 2025 22:19

huddlej reviewed Feb 25, 2025


Comment thread Snakefile

trvrb merged commit 57cb748 into main Mar 4, 2025

trvrb deleted the prepare-data branch March 4, 2025 16:21

This was referenced Mar 4, 2025

Add script to help users prepare data for model fitting blab/evofr#48

Closed

Circulation windows may cause issues with inference #6

Closed

trvrb merged 3 commits into main from prepare-data