
Schemas for contained resources are all jumbled together - can we do better? #250

Open
mikix opened this issue Jul 21, 2023 · 0 comments
Labels
enhancement New feature or request

mikix commented Jul 21, 2023

Right now, all contained resources are just equal members of the contained array.

And so when we write to an output target that expects a schema (like Delta Lake), the ETL ends up creating a Frankenstein schema that is the union of all the contained resources it finds in the parent resource.
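To make the "Frankenstein schema" concrete, here is a minimal sketch, assuming a much-simplified notion of schema inference (top-level field name mapped to observed JSON value types). Delta Lake's real inference is nested and typed differently; `infer_union_schema` and the sample data are hypothetical and only illustrate how one shared struct for `contained` picks up every field from every resource type.

```python
from typing import Any


def infer_union_schema(resource: dict[str, Any]) -> dict[str, set[str]]:
    """Merge the top-level fields of every contained resource into one schema.

    Simplified stand-in (an assumption, not Delta Lake's algorithm) for a writer
    that needs a single struct type for all entries of `contained`: each field
    name maps to the set of Python/JSON value types observed for it.
    """
    schema: dict[str, set[str]] = {}
    for contained in resource.get("contained", []):
        for field, value in contained.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical parent resource holding two unrelated contained resource types.
med_request = {
    "resourceType": "MedicationRequest",
    "id": "rx1",
    "contained": [
        {"resourceType": "Medication", "id": "med1", "code": {"text": "Aspirin"}},
        {"resourceType": "Patient", "id": "pat1", "gender": "female"},
    ],
}

union = infer_union_schema(med_request)
print(sorted(union))  # ['code', 'gender', 'id', 'resourceType']
```

In the merged schema, the Medication row carries an always-null `gender` column and the Patient row an always-null `code` column, which is the jumbling the issue describes.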

In practice, this doesn't seem to be a high priority issue for two reasons:

  • Most EHRs don't stuff many different kinds of contained resources into the same parent resource. For example, a MedicationRequest might contain a Medication but is unlikely to also hold a Patient.
  • Most FHIR fields with the same name but on different resources don't have different types (do they ever?)
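One way to check the second point against real data is to scan contained resources for top-level fields whose JSON type differs. The sketch below is a hypothetical helper, not part of the ETL. The sample data uses one pair that does differ in FHIR R4: Patient.name is a list of HumanName objects, while Organization.name is a plain string. The parent resource itself is made up for illustration.

```python
from collections import defaultdict
from typing import Any


def find_type_conflicts(resource: dict[str, Any]) -> dict[str, set[str]]:
    """Return top-level fields whose JSON type differs across contained resources."""
    seen: dict[str, set[str]] = defaultdict(set)
    for contained in resource.get("contained", []):
        for field, value in contained.items():
            seen[field].add(type(value).__name__)
    # Only fields observed with more than one type would clash in a merged schema.
    return {field: types for field, types in seen.items() if len(types) > 1}


# Hypothetical parent resource containing a Patient and an Organization.
claim = {
    "resourceType": "Claim",
    "id": "c1",
    "contained": [
        {"resourceType": "Patient", "id": "p1", "name": [{"family": "Smith"}]},
        {"resourceType": "Organization", "id": "o1", "name": "Acme Health"},
    ],
}

conflicts = find_type_conflicts(claim)
print({field: sorted(types) for field, types in conflicts.items()})
# {'name': ['list', 'str']}
```

This only looks at top-level fields of a single parent; a fuller check would recurse into nested structs and aggregate across all parents written to the same table.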

Still, this is an odd situation that could potentially cause trouble in the future.

Brainstormed solutions:

  • Modify incoming data and split the contained resources into per-type lists like containedMedications. But that feels like a whole can of worms as well. Doing it in a way that still writes out valid FHIR would be tricky: while we can stuff our new arrays into an extension, internal references would still expect to find their targets in the main contained list. Maybe a modifierExtension would let us avoid that.
  • Write out new tables for contained resources, like medicationrequest_contained_medication or similar. Then you'd have to change the referencing ID so that consumers know which table to look in. And you'd have to handle the contained resource changing IDs or being dropped. Hmm. This all kind of sucks.
  • Move contained resources into the external table where they belong. That might break a few current assumptions of the ETL (e.g. that running task A will only modify a known set of tables) and would require inventing an ID scheme that wouldn't conflict with any existing IDs. And you'd still have to handle contained IDs changing or being dropped.
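The second and third ideas both need contained resources grouped by type and given an ID that can't collide with ordinary server IDs. Below is a minimal sketch of one possible scheme, assuming we hash the parent's type, the parent's ID, and the local contained ID together. `contained_id`, `split_contained`, and the `contained-` prefix are all hypothetical names chosen for illustration, not anything the ETL defines.

```python
import hashlib
from collections import defaultdict
from typing import Any


def contained_id(parent: dict[str, Any], local_id: str) -> str:
    """Hypothetical ID scheme: derive a stable ID from the parent and the local ID.

    Hashing keeps the result deterministic across runs; the "contained-" prefix
    is an assumed convention meant to make clashes with server IDs unlikely.
    The result (42 chars of letters, digits, hyphens) fits FHIR's 64-char id limit.
    """
    key = f"{parent['resourceType']}/{parent['id']}#{local_id}"
    return "contained-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]


def split_contained(parent: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Group contained resources by resourceType, assigning each a new ID."""
    by_type: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for res in parent.get("contained", []):
        local_id = res.get("id")
        if local_id is None:
            continue  # nothing can reference it, so there is no ID to map
        # Shallow copy with the new ID; the parent's own data is left untouched.
        moved = dict(res, id=contained_id(parent, local_id))
        by_type[res["resourceType"]].append(moved)
    return dict(by_type)


# Hypothetical MedicationRequest with one contained Medication.
rx = {
    "resourceType": "MedicationRequest",
    "id": "rx1",
    "contained": [{"resourceType": "Medication", "id": "med1"}],
    "medicationReference": {"reference": "#med1"},
}

tables = split_contained(rx)
print(list(tables))  # ['Medication']
```

This deliberately leaves the open problems from the list untouched: the parent's "#med1" reference is not rewritten, and a contained resource that is renamed or dropped in a later export would leave a stale row behind.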

Something to think about.

mikix added the enhancement label Jul 21, 2023