serum_passage_category should be set to "egg" instead of "cell" for CDC human pool data like "L21/22 H3-EGG HUMAN POOL" #129
Comments
@joverlee521 Maybe we can work on this together? It seems like a good opportunity for me to learn more about fauna's internal workings... |
Here's the current parsing of the serum passage category for CDC titers:
We can special case the human pool titers and use the |
Thank you for laying out the steps so clearly, @joverlee521! Special casing the human pool titers sounds reasonable. Would that logic live in the |
Hmm, I'm a little hesitant to make
diff --git a/tdb/cdc_upload.py b/tdb/cdc_upload.py
index 3a007c2..7aa6b3d 100644
--- a/tdb/cdc_upload.py
+++ b/tdb/cdc_upload.py
@@ -72,6 +72,7 @@ class cdc_upload(upload):
self.test_virus_strains.add(meas['virus_strain'])
if "Human" in meas['serum_id']:
meas['serum_host'] = 'human'
+ self.format_passage(meas, 'serum_id', 'serum_passage_category')
self.rethink_io.check_optional_attributes(meas, self.optional_fields)
self.remove_fields(meas)
if len(self.new_different_date_format) > 0: |
I know what you mean! That function is among the hairier I've seen in this repo. If we start getting human data from other CCs, though, would you want to encode the human-specific parsing in each respective upload script? Or just refactor any shared parsing logic into a new function when we need to? |
Yup, I would want to keep the human-specific parsing in each respective upload script because I'm expecting each CC to provide them in different formats...If there's any parsing logic that can be shared then we can refactor into a new function. |
Sounds good to me! |
Current Behavior
Human pool titers represent measurements for people vaccinated with either cell-passaged or egg-passaged vaccine strains. Data from the CDC represent this passage status with names like "L21/22 H3-EGG HUMAN POOL" in the serum id. Egg-passaged data appear in the cell-passaged downloads from fauna, however. For example, the following command returns a list of egg-passaged data for H3N2:
Expected behavior
These egg-passaged data should only appear in the corresponding egg-passaged titer file (e.g., data/h3n2/who_egg_fra_titers.tsv for the example above). The serum_passage_category of these records should be set to "egg" instead of "cell".
Possible solution
We may need to check each measurement's serum id for the appearance of "egg" and override the inferred serum passage status based on what we find. For example, similar logic already exists to set the "host" for each measurement based on whether "Human" appears in its serum id. There might be a cleaner fauna-style way to implement this check though.
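As a rough illustration of the check described above, here is a minimal sketch of what the override could look like. The helper name `override_human_pool_passage` and the regular expression are hypothetical and not part of fauna. The sketch assumes each measurement is a dict with `serum_id` and `serum_passage_category` keys, and that cell-passaged pools would follow the same naming pattern with "CELL" in place of "EGG", which the examples in this issue do not confirm.

```python
import re

# Hypothetical pattern, not taken from fauna: matches serum ids such as
# "L21/22 H3-EGG HUMAN POOL". The "CELL" variant is an assumed counterpart.
HUMAN_POOL_PASSAGE = re.compile(r"\b(EGG|CELL)\b.*\bHUMAN POOL\b", re.IGNORECASE)


def override_human_pool_passage(meas):
    """Override serum_passage_category when the serum id names a human pool.

    Measurements whose serum id does not match the pattern are left unchanged,
    so the passage category inferred elsewhere still applies to them.
    """
    serum_id = meas.get("serum_id") or ""
    match = HUMAN_POOL_PASSAGE.search(serum_id)
    if match:
        # Use the passage named in the serum id ("egg" or "cell").
        meas["serum_passage_category"] = match.group(1).lower()
    return meas


meas = {"serum_id": "L21/22 H3-EGG HUMAN POOL", "serum_passage_category": "cell"}
print(override_human_pool_passage(meas)["serum_passage_category"])
```

Matching on the full "HUMAN POOL" phrase rather than on "egg" alone keeps the override from touching other serum ids that merely contain the word.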