-
Notifications
You must be signed in to change notification settings - Fork 13
Summary (mean, min, max) recorded values across a range of dates #726
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't claim to have gone through this with a fine-toothed comb, but it does look correct to me and I can see it's being tested. It might benefit from someone else checking they're happy with the detail though.
Incidentally, it's great to have more people adding meaty features to cohortextractor v1 because I'm confident it will result it more people who are absolutely convinced we needed to do the databuilder rewrite 😄
Absolutely :). In databuilder I'm pretty sure we'd just get this for free (when the aggregation methods are available). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a few documentation-related nitpick comments.
cohortextractor/patients.py
Outdated
This creates a variable returning a float of the mean recorded creatine level | ||
creatinine level over a 6 month period: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated "creatine level"
cohortextractor/patients.py
Outdated
include_measurement_date: a boolean indicating if an extra column, named `date_of_bmi`, | ||
should be included in the output. | ||
include_measurement_date: a boolean indicating if an extra column, named `<variable_name>_date_measured`, | ||
should be included in the output. Can only be used if include_measurement_date is `True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to reread the sentence "Can only be used..." several times, and I'm still unsure I've understood it: Are we saying that the only permitted value of the argument is True
?
cohortextractor/patients.py
Outdated
@@ -383,7 +384,7 @@ def mean_recorded_value( | |||
This creates a variable `bp_sys` returning a float of the most recent systolic blood pressure from | |||
the record within the time period. In the event of repeated measurements on the same day, these | |||
are averaged. Patient who do not have this information | |||
available do not return a value: | |||
available do not return a value. The date of measurement is returned as `bp_sys_date_measured`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may also wish to note that the date of measurement is to year-month resolution.
cohortextractor/patients.py
Outdated
creatinine=patients.mean_recorded_value( | ||
creatinine_codes, | ||
on_most_recent_day_of_measurement=False, | ||
between=["2019-09-16", "2020-03-15"], | ||
return_expectations={ | ||
"float": {"distribution": "normal", "mean": 150, "stddev": 200}, | ||
"date": {"earliest": "2019-09-16", "latest": "2020-03-15"}, | ||
"incidence": 0.75, | ||
}, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a researcher were to add include_measurement_date=True
and date_format="YYYY-MM-DD
, what date would be returned? (Similarly for month and year, especially when the range spans months and years.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this example, because on_most_recent_day_of_measurement=False
, it would raise an exception (an AssertionError), because it doesn't make any sense to inclde the measurement date if the mean is taken over more than one day. I didn't add that to the docstring because there are various other places where cohortextractor just asserts that invalid kwargs aren't included, without explicitly documenting it. It'll fail immediately on a local run, with a message, so it should be obvious what's wrong.
If on_most_recent_day_of_measurement=True
and include_measurement_date=True
, it'll return the single date that was the most recent day of measurement. If the date format was YYYY
, it'll just return that day with year, which won't necessarily be any use to anyone if the range is a number of months within the same year, but I guess that's up to the individual researcher.
There's also another way to get the date. If you leave out the include_measurement_date=True
(or make it False), you can define another variable to get the date, by doing patients.date_of(creatinine, date_format="YYYY=MM-DD")
(as we do in the tests), which also lets you give it a custom name. I'm guessing that this is the more recent way to do it, but I followed the documentation of the existing mean_recorded_value
. Maybe it makes sense to include that option in the docstrings too.
assert ( | ||
include_date_of_match is False | ||
), "Can only include measurement date if on_most_recent_day_of_measurement is True" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answers my earlier question 🙂
Fixes #724
See issue for rationale for adding this functionality.
Extends
mean_recorded_value
to optionally cover the entire provided range, not just the most recent day of measurement, and adds the equivalent methods for min_recorded_valueand
max_recorded_value`