Use dags package for implementation of DAG backend. #399
In GETTSIM, we currently proceed in 2 steps:
I think it would be beneficial to keep the functionality of having the DAG as an object within GETTSIM.
I, hence, suggest that I adjust dags such that it allows returning the DAG. This would likely just split the @janosg @hmgaudecker, what do you think?
Having said that: I see no problem in adding a
I would prefer to refactor I'd prefer that to changing the return types. Calling the Once that is in place, it will be trivial to have the plotting code in either location; once we have something in GETTSIM, we can decide whether it is general enough to port it to dags or not. Finally, the above suggests it might be a good idea to swap the order of applying decorators and creating the DAG right away, before doing anything else.
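To illustrate keeping the DAG available as an object without changing what the computation returns, here is a minimal sketch that separates building the graph from executing it. It uses only the standard library (inspect, graphlib); build_dag and execute_dag are hypothetical names, not the dags API, and the convention that a function's argument names refer to other nodes is assumed from the discussion.

```python
import inspect
from graphlib import TopologicalSorter


def build_dag(functions):
    """Map each function name to the names of its arguments (its parents).

    Hypothetical helper for illustration; not the dags API.
    """
    return {
        name: set(inspect.signature(func).parameters)
        for name, func in functions.items()
    }


def execute_dag(functions, dag, data):
    """Evaluate functions in topological order. Names that are not in
    `functions` are root nodes and must be supplied in `data`."""
    results = dict(data)
    for name in TopologicalSorter(dag).static_order():
        if name in functions and name not in results:
            func = functions[name]
            results[name] = func(**{arg: results[arg] for arg in dag[name]})
    return results


# Usage: the DAG is a plain object that can be inspected or plotted
# before (or without) running any computation.
def income(wage, hours):
    return wage * hours


def tax(income):
    return 0.25 * income


functions = {"income": income, "tax": tax}
dag = build_dag(functions)
results = execute_dag(functions, dag, {"wage": 10.0, "hours": 40})
print(results["tax"])  # 100.0
```

Keeping the two steps apart means plotting code can consume the output of build_dag directly, wherever that code ends up living.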
That's what I have in mind. Will do! Sounds all very good! However, I am not sure whether I understand correctly what you mean by decorators/higher-order functions. The most important higher-order changes we are currently doing are:
What I meant is that I would not modify the dag after it is created. Instead, I would always modify a built-in or user-provided function dict and let the dag do the rest. E.g. if a function, say This second step often requires renaming arguments of functions. This is where the decorators are helpful.
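A minimal sketch of the kind of decorator meant here, assuming the DAG is built from argument names read via inspect.signature. rename_arguments is a hypothetical helper (not taken from GETTSIM or dags): it returns a wrapper that exposes renamed arguments, so only the function dict changes and the DAG picks up the new edges on its own.

```python
import functools
import inspect


def rename_arguments(func, mapping):
    """Return a wrapper of `func` whose arguments are renamed via `mapping`
    (old name -> new name). Hypothetical helper for illustration."""
    old_sig = inspect.signature(func)
    # Reverse lookup used when calling: new name -> old name.
    new_to_old = {mapping.get(old, old): old for old in old_sig.parameters}
    new_sig = old_sig.replace(
        parameters=[
            param.replace(name=mapping.get(name, name))
            for name, param in old_sig.parameters.items()
        ]
    )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind against the new names, then call `func` with its original names.
        bound = new_sig.bind(*args, **kwargs)
        return func(**{new_to_old[k]: v for k, v in bound.arguments.items()})

    # inspect.signature() prefers __signature__, so DAG builders see new names.
    wrapper.__signature__ = new_sig
    return wrapper


# Usage: compute a tax from household-level income instead of individual income.
def tax(income, rate):
    return income * rate


tax_hh = rename_arguments(tax, {"income": "income_hh"})
print(list(inspect.signature(tax_hh).parameters))  # ['income_hh', 'rate']
print(tax_hh(income_hh=100, rate=0.5))  # 50.0
```

The sketch assumes plain positional-or-keyword arguments; functions with *args or **kwargs would need extra handling.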
Thanks! 2 questions
Based on
Yes, I guess we could just remove functions that are part of the columns? Maybe easiest if
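A sketch of removing functions that are overridden by data columns, under the assumption that a column sharing a function's name should replace that function. remove_overridden_functions is a hypothetical helper; once the function is dropped, the column simply becomes a root node of the DAG.

```python
def remove_overridden_functions(functions, columns):
    """Drop every function whose name is also a data column, so that the
    column is used as a root node of the DAG instead of being computed.
    Hypothetical helper for illustration."""
    column_names = set(columns)
    return {name: f for name, f in functions.items() if name not in column_names}


# Usage: "income" is supplied as data, so only "tax" remains to be computed.
functions = {"income": lambda wage: wage * 40, "tax": lambda income: 0.25 * income}
remaining = remove_overridden_functions(functions, ["income", "wage"])
print(list(remaining))  # ['tax']
```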
I would have simply created all those functions and let the dag figure out which of them are needed. Creating such functions is almost cost-free compared to calculating the taxes and transfers.
Ok, that is in line with what I meant. 👍
We currently just look at the set of all functions and all arguments of those functions to determine the automatic aggregation specs. Janos' solution of creating all potential aggregation functions also seems fine, but might not be necessary.
Yes.
Yes, but that is the core functionality of dags and we would just be repeating it.
To find out the set of all functions and all arguments, yes. But determining the automatic aggregation specs based on the naming convention seems very specific to GETTSIM. I would rather not add this functionality to dags unless another project would also use it.
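For illustration, a GETTSIM-side sketch of deriving automatic aggregation specs from a naming convention. It assumes that the suffixes _hh and _tu mark household and tax-unit level variables with group identifiers hh_id and tu_id, and that the aggregation is a sum; derive_aggregation_specs, the spec keys, and the toy functions are hypothetical.

```python
import inspect

# Assumed naming convention: suffix -> column holding the group identifier.
GROUP_SUFFIXES = {"_hh": "hh_id", "_tu": "tu_id"}


def derive_aggregation_specs(functions, columns):
    """Return sum-aggregation specs for argument names such as 'x_hh' that
    no function or column provides, but whose individual-level base 'x' does."""
    defined = set(functions) | set(columns)
    arguments = {
        arg
        for func in functions.values()
        for arg in inspect.signature(func).parameters
    }
    specs = {}
    for name in sorted(arguments - defined):
        for suffix, group_id in GROUP_SUFFIXES.items():
            if name.endswith(suffix) and name[: -len(suffix)] in defined:
                specs[name] = {
                    "source": name[: -len(suffix)],
                    "group_id": group_id,
                    "aggregation": "sum",
                }
    return specs


# Usage with two toy functions.
def child_benefit(age):
    return [250 if a < 18 else 0 for a in age]


def housing_benefit(child_benefit_hh, rent):
    return [r - c for r, c in zip(rent, child_benefit_hh)]


specs = derive_aggregation_specs(
    {"child_benefit": child_benefit, "housing_benefit": housing_benefit},
    columns=["age", "rent", "hh_id"],
)
print(specs)
# {'child_benefit_hh': {'source': 'child_benefit', 'group_id': 'hh_id', 'aggregation': 'sum'}}
```

Only the first part (collecting all functions and arguments) overlaps with what a DAG library already does; the suffix logic is the GETTSIM-specific part.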
This is what I meant. How could we do the second step without repeating this first step? I'd think just creating all functions would require much less code on GETTSIM's side and certainly much less sophisticated code.
I have problems understanding this comment. What exactly do you mean by second step? And are you proposing to create aggregation functions for all functions (for all supported groupings) independent of whether the aggregated function is requested somewhere and of whether it is defined somewhere explicitly?
First step:
Second step:
Yes, anything that is not required will be weeded out by the DAG. According to @janosg, that should be very quick!
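A rough sketch of the "create all functions and let the DAG weed them out" approach: build a candidate sum aggregation for every function or column and every group level, then keep only what the targets actually depend on. The suffix-to-group-id mapping, the list-based data, and the helpers make_group_sum, add_all_aggregations and prune are assumptions for illustration, not GETTSIM code.

```python
import inspect

# Assumed naming convention: suffix -> column holding the group identifier.
GROUP_SUFFIXES = {"_hh": "hh_id", "_tu": "tu_id"}


def make_group_sum(source, group_id):
    """Return a function that sums `source` within groups of `group_id` and
    broadcasts each group total back to all members."""

    def aggregate(**kwargs):
        values, ids = kwargs[source], kwargs[group_id]
        totals = {}
        for value, group in zip(values, ids):
            totals[group] = totals.get(group, 0) + value
        return [totals[group] for group in ids]

    # Expose the real argument names so a DAG builder can wire the inputs.
    aggregate.__signature__ = inspect.Signature(
        [inspect.Parameter(n, inspect.Parameter.KEYWORD_ONLY) for n in (source, group_id)]
    )
    return aggregate


def add_all_aggregations(functions, columns):
    """Create a candidate aggregation for every function/column and group level,
    unless that name is already defined explicitly."""
    candidates = dict(functions)
    for base in [*functions, *columns]:
        if base in GROUP_SUFFIXES.values():
            continue  # never aggregate the group identifiers themselves
        for suffix, group_id in GROUP_SUFFIXES.items():
            name = base + suffix
            if name not in candidates and name not in columns:
                candidates[name] = make_group_sum(base, group_id)
    return candidates


def prune(functions, targets):
    """Keep only functions that the targets depend on, directly or transitively."""
    needed, stack = set(), list(targets)
    while stack:
        name = stack.pop()
        if name in needed or name not in functions:
            continue  # already visited, or a root node supplied by the data
        needed.add(name)
        stack.extend(inspect.signature(functions[name]).parameters)
    return {name: functions[name] for name in needed}


# Usage: many candidates are created, but only one aggregation is needed.
functions = {
    "benefit": lambda age: [100 if a < 18 else 0 for a in age],
    "housing": lambda benefit_hh: [b / 2 for b in benefit_hh],
}
candidates = add_all_aggregations(functions, columns=["age", "hh_id"])
needed = prune(candidates, targets=["housing"])
print(sorted(needed))  # ['benefit', 'benefit_hh', 'housing']
```

Since the candidate functions are only closures, creating the unused ones should be cheap relative to the actual tax and transfer calculations; that is the reasoning given in the thread, not a measured result.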
Codecov Report
Base: 92.88% // Head: 92.76% // Decreases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## main #399 +/- ##
==========================================
- Coverage 92.88% 92.76% -0.12%
==========================================
Files 76 74 -2
Lines 3905 3896 -9
==========================================
- Hits 3627 3614 -13
- Misses 278 282 +4
Thanks, good idea! I moved the respective functions. I also adjusted the comments in
In the process of changing the docstring, I also changed
in 14afe03. Hope that is uncontroversial?
Almost there, I think, excellent!!!
Please double-check my commits, I think they all should be fairly uncontroversial, adding some clarity / consistency (I hope).
I did not go through interface.py in detail yet. Two things, first minor, second important imho.
- Can we get by without that KeyErrorMessage class? I find it distracting and I'd hope that by re-using format_errors_and_warnings … Could you check, please?
- The module is very large and likely to grow. This might be a good point to think about the order of the functions (reproduced just below). I do not see any particular structure in that order right now, which makes the module difficult to navigate. Could you make a suggestion here? We can discuss and then implement later. Thanks!

Functions in interface.py:
compute_taxes_and_transfers
set_up_dag
_process_and_check_data
_round_and_partial_parameters_to_functions
_create_input_data
_prepare_results
_fail_if_columns_overriding_functions_are_not_in_dag
_convert_data_to_correct_types
_fail_if_group_variables_not_constant_within_groups
_fail_if_columns_overriding_functions_are_not_in_data
_reorder_columns
_fail_if_root_nodes_are_missing
_reduce_to_necessary_data
_fail_if_pid_is_non_unique
_fail_if_duplicates_in_columns
_root_nodes
_add_rounding_to_one_function
_add_rounding_to_functions
Apparently, … An alternative to using the … is to read the message from the exception's args:

    try:
        raise KeyError("This is a \n Line break")
    except KeyError as e:
        print(e.args[0])

My preferred alternative would be to keep
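Background on why the line break gets lost: KeyError.__str__ returns the repr() of its single argument, and tracebacks display str() of the exception, so a multi-line message shows up escaped and quoted. The sketch below contrasts reading e.args[0] with one common way to write a KeyErrorMessage-style class (a str subclass whose __repr__ returns the plain text); this implementation is an assumption, not necessarily the one in GETTSIM.

```python
class KeyErrorMessage(str):
    """A str whose repr is the plain text, so that KeyError (which displays
    repr() of its single argument) keeps real line breaks in tracebacks."""

    def __repr__(self):
        return str(self)


msg = "This is a \n Line break"

# Plain KeyError: str() is the repr of the message, so the line break
# appears escaped as '\n' and the text is wrapped in quotes.
assert str(KeyError(msg)) == repr(msg)

# Reading the raw message via args keeps the line break.
try:
    raise KeyError(msg)
except KeyError as e:
    print(e.args[0])  # prints the message with an actual line break

# Wrapping the message restores the readable form in str() and tracebacks.
assert str(KeyError(KeyErrorMessage(msg))) == msg
```

The args approach only helps where the exception is caught and printed explicitly; the wrapper class also affects uncaught exceptions, which is presumably why such a class is used at all.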
Proposal for order of functions is below.
public functions
data checks and data processing before calculations(
function processing(
data processing after calculations
Looks great, thanks! Can you implement that? I think I'll want to inline
…s/gettsim into use_dags
Great! Just a couple of little notes left.
Small comment on the commit message in 52f97cb: It should be "Merge branch 'main'." or so -- whether that required resolving merge conflicts is irrelevant. This way I needed to open it in a GUI to understand what was happening.
Thanks! Will consider next time. I added four new test cases and opened the issue. Will merge as soon as tests have run through.
What problem do you want to solve?
dag.py.
Todo
compute_taxes_and_transfers
plot_dag
CHANGES.rst.