Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add date expressions for first and last day of NHS financial year #723

Merged
merged 1 commit into from
Jan 24, 2022

Conversation

rebkwok
Copy link
Contributor

@rebkwok rebkwok commented Jan 24, 2022

Fixes #719

session.commit()
study = StudyDefinition(
index_date=index_date,
population=patients.with_these_clinical_events(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you tested against the EMIS backend, you tested the expressions with the context of population and event_date; the former because it's special and the latter because it's not, I assume. Why not do the same here?

I'm sure there's a good reason 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the tpp tests first, and it made sense to just test against the patient ids that were returned. population on its own just gets you a set of patient_id in the results. I tried to do the same thing in the emis tests, but the emis backend generates patient ids (I confess I didn't look too closely at exactly what it does), so the patient ids in the results were always 0 and 1. The emis one is only that way to have some way to checking that the 2 patients it returns are the expected ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, of course! Yes, the EMIS backend handles patient IDs differently, making is much harder to test against just population. More here:

def truncate_patient_id(patient_id):
"""
EMIS patient IDs are 128 byte strings which significantly bloat our output
files. However, because they are actually hex-encoded SHA-512 hashes we can
get away with throwing away all but the first 63 bits and storing them as
int64s (63 bits so we can safely store them as signed int64s without
worrying about negative patient IDs, which might upset expectations
elsewhere). 63 bits still gives us plenty of entropy as sqrt(2^63) is about
2 billion so well above the threshold where we'd need to worry about
collisions with a population of a few tens of millions.
"""
return int(patient_id[:16], 16) >> 1

Copy link
Member

@iaindillingham iaindillingham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clear! I feel I've learnt a lot more about cohort-extractor, just from looking at this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New helper function: first/last_day_of_nhs_financial_year
2 participants