-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support parameterising study definitions #860
Conversation
d1e5672
to
805b3ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One typo and some questions that are all along the lines of "what happens if a user does this weird thing"; feel free to ignore if they're irrelevant
expectations_population, | ||
dummy_data_file, | ||
index_date_range=index_date_range, | ||
skip_existing=skip_existing, | ||
output_format=output_format, | ||
output_name=output_name or "input", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urgh
class PatchStack(ExitStack): | ||
""" | ||
Apply multiple `patch` context managers without nesting or using the ugly multi | ||
argument form |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
This is fab! Couple of questions: If I do Can I do |
🌊 🐦 : have you done documentation and "platform news" for this? Cos it's cool. |
Thanks @wjchulme. This is interesting feedback because it already exposes one misunderstanding which is that you can't write That said, I think we can make the space separated version work with a bit of fiddling. We'd need to let You can definitely include whitespace. If we make the change above then your example would work exactly as written. Otherwise you'd have to write it as As for types, there isn't a neat solution with this approach. I think the only sane option is to make it clear that everything is a string and needs to be explicitly converted if you want something else. There is a totally different approach we could take though that would give type support, and that's to use the generate_cohort:
run: cohortextractor:latest generate_cohort --study-definition study_definition
config:
param_1: some string
param_2: true
param_3: 10
outputs:
highly_sensitive:
cohort: output/input.csv I actually think that would be less work than trying to add the support for spaces I described above. So maybe that's the way to go? |
No, but was planning to do the docs once it was merged. And maybe the platform news after at least one person has tried it first? |
This is how the equivalent functionality works in R, so at least there's some consistency. The problem is that most users won't know how to do the conversion in python. These are edges cases though -- I reckon a vast majority of parameters used will be needed as strings (even if conceptually they're numeric, eg |
I think users would quickly get used to having to use I do like the |
This allows specifying the full path to the output file, including the output directory and extension. We need this in order to support parameterising study definitions because at the moment the output file name is determined by the study definition name, but we want to use the same study definition to produce different output files. We could acheive the same thing by adding an `--output-name` argument and combining these with the pre-existing `--output-directory` and `--output-format` arguments. But it seems much simpler to specify all three together in a single argument.
77ce53b
to
856faff
Compare
Yeah, I go a bit back and forth on this. It relies on coupling bits of the system together in ways which make me a bit uncomfortable. Not saying we definitely shouldn't do it, but I'd rather avoid baking that behaviour in at this stage. For now I've just kept the |
This teaches `generate_cohort` to accept a new mutli-valued `--param` argument e.g. generate_cohort --param key1=value1 --param key2=value2 --param key3 The study definition can access these via the `params` dict in the `cohortextractor` module: from cohortextractor import params print(params) # {'key1': 'value1', 'key2': 'value2', 'key3': ''}
856faff
to
df6ec7d
Compare
* Document the `--param` argument to cohortextractor As added in: opensafely-core/cohort-extractor#860
This teaches
generate_cohort
to accept a new, multi-valued--param
argument e.g.The study definition can access these via the
params
dict in thecohortextractor
module:There was a suggestion earlier of using
--arg
for this, but I think--param
avoids confusion with other kinds of thing which are also called "arguments", and fits neatly with the term "parameterised study definition".This was all much harder than it ought to be because implicit in this feature is the need for the output file name to be customised. Previously the output directory and format could be changed, but the file name was determined by the study definition name e.g.
study_definition_test.py
would always produce a file called<output_dir>/input_test.<extension>
.So we now support an
--output-file
argument which sets the directory, base name and file format in one go e.g.some_dir/my_results.feather
.