Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for ONS_CIS #773

Merged
merged 8 commits into from
Apr 7, 2022
Merged

Add support for ONS_CIS #773

merged 8 commits into from
Apr 7, 2022

Conversation

rebkwok
Copy link
Contributor

@rebkwok rebkwok commented Apr 5, 2022

Fixes #587

Adds patients_with_an_ons_cis_record method to the TPP backend.

Similar to ISARIC, this method allows returning any column from the dataset. There are multiple rows per patient (more or less unique by visit_id and visit_date), but as there are multiple date columns, we also have a date_filter_column that can be specified.

Many of the columns are coded - e.g. sex takes values of 1 or 2, where 1=male and 2=female. The keys are numeric, and mostly these columns are integer type, although a few are varchar. (but still numbers, stringified). Unless explicitly requested, we turn these codes into their string labels in the returned data. If the original codes are requested (with return_category_labels=False), we have to cast them to varchar, otherwise any integer-type values will return 0 when there's no data, which is often a valid code itself.

See also corresponding PRS:
restrict use of ONS_CIS dataset: opensafely-core/opensafely-cli#111
docs: opensafely/documentation#677

@rebkwok rebkwok changed the title Ons cis Add support for ONS_CIS Apr 5, 2022
@rebkwok rebkwok force-pushed the ons-cis branch 2 times, most recently from 920b741 to 3d8451d Compare April 6, 2022 08:24
@rebkwok rebkwok marked this pull request as ready for review April 6, 2022 09:14
@rebkwok
Copy link
Contributor Author

rebkwok commented Apr 6, 2022

Note this comment

There are rows that have the same visit ID for more than one patient. Where that happens, it tends to happen several times for same 2 patients across multiple different visits, which seems like it is valid, not just a data entry error. Possibly a form can be completed once for a household, not just for a single patient - waiting for confirmation from TPP

Copy link
Member

@iaindillingham iaindillingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of typo suggestions, but that's all. Really clear to read and reason about 🙂

),
# Coded categories are converted to long-form string labels
# result_tdi is a coded category value; varchar type in the db
# raw values are stringified ints, return the long form category lable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# raw values are stringified ints, return the long form category lable
# raw values are stringified ints, return the long form category label

date_format="YYYY-MM-DD",
),
# country is a coded category value, int type in the db
# raw values are ints, return the long form category lable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# raw values are ints, return the long form category lable
# raw values are ints, return the long form category label

Patient(
ONS_CIS=[
ONS_CIS(
age_at_visit="20",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a string for testing some aspect of the method? We use ints elsewhere.

Suggested change
age_at_visit="20",
age_at_visit=20,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's an int in the raw table, will update it

country=2,
self_isolating=0,
),
# duplicate record ignored in number_of_matches_in_period counts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for commenting here. It saved me a lot of head-scratching 🙂

Comment on lines +5996 to +5997
age_at_visit=[20, 30, 41],
age_at_visit_date=["2021-10-01", "2021-10-01", "2021-10-01"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for returning the dates here (and elsewhere). It made associating the fixtures with the results much easier. It's also a helpful reminder that visit_date is the default date column.

@rebkwok
Copy link
Contributor Author

rebkwok commented Apr 6, 2022

Thanks @iaindillingham - I'm having some second thoughts on whether it's sensible to determine the date_filter_column based on the returning value, and whether we should either always use the same default, or require users to explicitly specifiy which date_filter_column they want; added a comment to the issue, so I'll wait for feedback before merging this one

@rebkwok rebkwok merged commit 62328e5 into main Apr 7, 2022
@rebkwok rebkwok deleted the ons-cis branch April 7, 2022 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Developing backend functionality to analysis ONS CIS extract
2 participants