Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Materialise calculated dates used repeatedly in a dataset definition #2024

Open
evansd opened this issue May 16, 2024 · 0 comments
Open

Materialise calculated dates used repeatedly in a dataset definition #2024

evansd opened this issue May 16, 2024 · 0 comments

Comments

@evansd
Copy link
Contributor

evansd commented May 16, 2024

A common pattern in dataset definitions is to determine an index date, which varies for each patient , and then use it to restrict the time period for other queries in the dataset e.g.

index_date = maximum_of(study_end_date, date_of_death, intervention_date)
...
where(events.date.is_on_or_between(index_date - years(1), index_date)

Because of the fiddliness involved in date arithmetic, and in calculating row-wise maxima, the SQL expression corresponding to is_on_or_between(index_date - years(1), index_date) can often be quite complex. And they are repeated over and over again for each query that uses them.

It would be good to spot this pattern, calculate the pair of dates once for each patient, write it to a temporary table and then just re-use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant