Set `serum_id` to `lot_number` for CDC titer imports #126
Comments
One small issue with changing the underlying data for Another issue is do we need to change over the Stepping back a bit, I think the current values of
Good points! I hadn't thought about repopulating the database. I was thinking more about updating the mapping now so that future imports (new records) use the lot number instead. Would that kind of change disrupt uploads of new data, though, since the uploads work from the full CDC database TSV and all older records would get an updated index? If so, would it be possible to do a one-time delete of everything from 2019 onward, followed by a fresh upload that maps the lot number to the serum id column? Then subsequent updates wouldn't change the index, right? Surfacing The issue of recreating fauna from older data could be important to figure out eventually, if we really hope to deprecate fauna in favor of a cloud-based file store solution... but this probably isn't the place to discuss that undertaking. 😅
Yup, I can track down the date that we started using the new database dump and delete records in
Oh right! The I did a little more digging into the fauna/tdb code and the parse step automatically assigns the
Never mind, I just realized the start date of the new database dump doesn't matter because the data contains tests from 2019. Like you said, we can delete records based on
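To illustrate the concern raised above about older records getting an updated index, here is a minimal sketch. It assumes, hypothetically, that each measurement's index is built from several fields including `serum_id`. The `make_index` helper, the `virus_strain` and `assay_date` field names, and the sample values are invented for illustration and are not fauna's actual schema.

```python
def make_index(record, fields=("virus_strain", "serum_id", "assay_date")):
    """Join selected fields into one index string (hypothetical scheme)."""
    return "_".join(str(record[f]) for f in fields)


# Made-up record as it might look under the current mapping.
old_record = {
    "virus_strain": "A/Example/1/2019",
    "serum_id": "F1234",
    "lot_number": "LOT-5678",
    "assay_date": "2019-03-01",
}

# The same measurement after switching serum_id to the lot number.
new_record = {**old_record, "serum_id": old_record["lot_number"]}

# True whenever the lot number differs from the old serum_id.
index_changed = make_index(old_record) != make_index(new_record)
print(index_changed)
```

Under that assumption, re-uploading the full TSV with the new mapping would write the same measurement under a different index rather than updating the existing record, which is the motivation for a one-time delete followed by a fresh upload.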
Current behavior
We currently ingest CDC titer data into fauna using the `sr_ferret` column as the `serum_id` of each measurement.

Expected behavior
However, the values in the `sr_ferret` column are not what the CDC uses to discuss these measurements, so reporting these values has no meaning. Instead, the CDC uses the `sr_lot` column to discuss measurements. We currently map the `sr_lot` column to a field in the tdb database called `lot_number`.

Possible solution
One solution would be to set the `serum_id` to the `sr_lot` column value in the CDC tdb upload script instead of using the `sr_ferret` column. The simplest way to make this change might be to change `sr_ferret` to `sr_lot` in the mapping of columns for CDC data.

Interestingly, there is some code in the upload script that effectively does this mapping of the `lot_number` to the `serum_id`, but that mapping only happens when the `serum_id` isn't already set.
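
The proposed change and the existing fallback can be sketched roughly as follows. This is not the actual fauna upload code: the column and field names (`sr_ferret`, `sr_lot`, `serum_id`, `lot_number`) come from this issue, while the mapping lists, the `apply_mapping` and `fill_missing_serum_id` helpers, and the sample row are hypothetical stand-ins.

```python
# Hypothetical (source column, database field) pairs; not the actual fauna mapping.
CURRENT_MAPPING = [("sr_ferret", "serum_id"), ("sr_lot", "lot_number")]
PROPOSED_MAPPING = [("sr_lot", "serum_id"), ("sr_lot", "lot_number")]


def apply_mapping(row, mapping):
    """Copy each mapped source column into its database field."""
    return {field: row[column] for column, field in mapping if column in row}


def fill_missing_serum_id(record):
    """Fall back to lot_number only when serum_id is not already set."""
    if not record.get("serum_id") and record.get("lot_number"):
        return {**record, "serum_id": record["lot_number"]}
    return record


# Made-up row for illustration.
row = {"sr_ferret": "F1234", "sr_lot": "LOT-5678"}

current = fill_missing_serum_id(apply_mapping(row, CURRENT_MAPPING))
proposed = fill_missing_serum_id(apply_mapping(row, PROPOSED_MAPPING))

print(current["serum_id"])   # serum_id taken from sr_ferret
print(proposed["serum_id"])  # serum_id taken from sr_lot
```

In this sketch the fallback never fires for the current mapping because `sr_ferret` has already populated `serum_id`, which matches the behavior described above.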