Skip to content

Add support for ICD10MultiCodeString type#2274

Merged
rw251 merged 12 commits into
mainfrom
rw/list-of-icd10-codes-as-string
Dec 10, 2024
Merged

Add support for ICD10MultiCodeString type#2274
rw251 merged 12 commits into
mainfrom
rw/list-of-icd10-codes-as-string

Conversation

@rw251

@rw251 rw251 commented Dec 4, 2024

Copy link
Copy Markdown
Contributor

fixes #2010

Notes

  • I've currently just implemented this for the all_diagnoses field of apcs. If successful I will then also implement the all_procedures field as an OPCS4MultiCodeString.
  • We don't want people to do something like all_diagnoses == "N170", which would run previously (but return no results). I'm overriding __eq__ (and __ne__) so that the error is more helpful. I've also updated the docstrings so it's clear that it doesn't make sense to use these operators.
  • There are other methods that are inherited which don't make sense for this field type e.g. is_in, map_values, and count_distinct_for_patients. They all appear in the docs (reference/language/#MultiCodeStringEventSeries). I could have overriden them as well, but thought a better solution (in the future) would be to have a better way to hide, or remove, these methods in some way
  • I don't understand the serializer.py file enough to know if my changes just made the tests pass. I wasn't sure if I needed a method with a @marshal.register(BaseMultiCodeString) decorator - or if the fact that it's just a string means that it's fine.

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Dec 4, 2024

Copy link
Copy Markdown

Deploying databuilder-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 905e801
Status: ✅  Deploy successful!
Preview URL: https://be0fec54.databuilder.pages.dev
Branch Preview URL: https://rw-list-of-icd10-codes-as-st.databuilder.pages.dev

View logs

@evansd evansd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good. I think the ratio of usefulness of feature to lines of production code is particularly high here, which is always nice to see.

I've made a few minor comments. I think the major thing we still need to do is validation for strings, but that should be reasonably straightforward to add.

There's also some messing around with exactly where tests live which isn't significant in one sense but worth getting right.

I agree with your call to leave the inherited methods in place, even thought they don't quite make sense. The cleanest way to deal with this would be to fiddle with the class hierarchy a bit so we have some lighter weight base classes to inherit from and then classes providing the standard methods which other types can use by default. But that's definitely out of scope here.

Oh micro-nit, but the PR title could do with changing from the default!

Comment thread ehrql/query_language.py Outdated
Comment thread ehrql/tables/tpp.py
Comment thread tests/integration/tables/test_tpp.py Outdated
Comment thread ehrql/query_language.py Outdated
Comment thread tests/integration/tables/test_tpp.py Outdated
Comment thread ehrql/serializer.py
@rw251

rw251 commented Dec 5, 2024

Copy link
Copy Markdown
Contributor Author

Couple of other things to note about the implementation - mainly around the fact we want a spec test table to contain || characters but not have any parsers split on them.

  • I've done it with a regex that matches single |s but not double |s and tried to explain that in comments
  • I haven't changed the logic that splits the headers, so that's still a split("|") call
  • I notice that there is also something like this happening in in_memory_database.py i.e. bits like line.split("|"). I haven't changed any of these, and everything works, which makes me think either it's fine, or I'm not testing something correctly.

@evansd evansd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'm happy this does all the right things. A few minor comments but nothing that's immediately blocking.

Comment thread ehrql/codes.py Outdated
Comment thread tests/spec/multi_code_string_series_ops/test_containment.py Outdated
Comment thread tests/spec/conftest.py
@evansd evansd changed the title List of icd10 codes as string Add support for ICD10MultiCodeString type Dec 6, 2024
@rw251 rw251 merged commit 331f8f9 into main Dec 10, 2024
@rw251 rw251 deleted the rw/list-of-icd10-codes-as-string branch December 10, 2024 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Better validation for "list of codes shoved in a string" columns

2 participants