Skip to content

Summary (mean, min, max) recorded values across a range of dates #726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 27, 2022

Conversation

rebkwok
Copy link
Contributor

@rebkwok rebkwok commented Jan 26, 2022

Fixes #724
See issue for rationale for adding this functionality.

Extends mean_recorded_value to optionally cover the entire provided range, not just the most recent day of measurement, and adds the equivalent methods for min_recorded_valueandmax_recorded_value`

@rebkwok rebkwok changed the title Summary recorded values Summary (mean, min, max) recorded values across a range of dates Jan 26, 2022
Copy link
Contributor

@evansd evansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't claim to have gone through this with a fine-toothed comb, but it does look correct to me and I can see it's being tested. It might benefit from someone else checking they're happy with the detail though.

Incidentally, it's great to have more people adding meaty features to cohortextractor v1 because I'm confident it will result it more people who are absolutely convinced we needed to do the databuilder rewrite 😄

@rebkwok
Copy link
Contributor Author

rebkwok commented Jan 27, 2022

Incidentally, it's great to have more people adding meaty features to cohortextractor v1 because I'm confident it will result it more people who are absolutely convinced we needed to do the databuilder rewrite smile

Absolutely :). In databuilder I'm pretty sure we'd just get this for free (when the aggregation methods are available).

Copy link
Member

@iaindillingham iaindillingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few documentation-related nitpick comments.

Comment on lines 402 to 403
This creates a variable returning a float of the mean recorded creatine level
creatinine level over a 6 month period:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated "creatine level"

include_measurement_date: a boolean indicating if an extra column, named `date_of_bmi`,
should be included in the output.
include_measurement_date: a boolean indicating if an extra column, named `<variable_name>_date_measured`,
should be included in the output. Can only be used if include_measurement_date is `True`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reread the sentence "Can only be used..." several times, and I'm still unsure I've understood it: Are we saying that the only permitted value of the argument is True?

@@ -383,7 +384,7 @@ def mean_recorded_value(
This creates a variable `bp_sys` returning a float of the most recent systolic blood pressure from
the record within the time period. In the event of repeated measurements on the same day, these
are averaged. Patient who do not have this information
available do not return a value:
available do not return a value. The date of measurement is returned as `bp_sys_date_measured`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may also wish to note that the date of measurement is to year-month resolution.

Comment on lines 405 to 414
creatinine=patients.mean_recorded_value(
creatinine_codes,
on_most_recent_day_of_measurement=False,
between=["2019-09-16", "2020-03-15"],
return_expectations={
"float": {"distribution": "normal", "mean": 150, "stddev": 200},
"date": {"earliest": "2019-09-16", "latest": "2020-03-15"},
"incidence": 0.75,
},
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a researcher were to add include_measurement_date=True and date_format="YYYY-MM-DD, what date would be returned? (Similarly for month and year, especially when the range spans months and years.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this example, because on_most_recent_day_of_measurement=False, it would raise an exception (an AssertionError), because it doesn't make any sense to inclde the measurement date if the mean is taken over more than one day. I didn't add that to the docstring because there are various other places where cohortextractor just asserts that invalid kwargs aren't included, without explicitly documenting it. It'll fail immediately on a local run, with a message, so it should be obvious what's wrong.

If on_most_recent_day_of_measurement=True and include_measurement_date=True, it'll return the single date that was the most recent day of measurement. If the date format was YYYY, it'll just return that day with year, which won't necessarily be any use to anyone if the range is a number of months within the same year, but I guess that's up to the individual researcher.

There's also another way to get the date. If you leave out the include_measurement_date=True (or make it False), you can define another variable to get the date, by doing patients.date_of(creatinine, date_format="YYYY=MM-DD") (as we do in the tests), which also lets you give it a custom name. I'm guessing that this is the more recent way to do it, but I followed the documentation of the existing mean_recorded_value. Maybe it makes sense to include that option in the docstrings too.

Comment on lines +814 to +816
assert (
include_date_of_match is False
), "Can only include measurement date if on_most_recent_day_of_measurement is True"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answers my earlier question 🙂

@rebkwok rebkwok merged commit 80ae55c into main Jan 27, 2022
@rebkwok rebkwok deleted the summary-recorded-values branch January 27, 2022 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mean/min/max methods for recorded value over a time period
3 participants