New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass through parameters on Range tasks to of tasks #1675
Conversation
Parameters to tasks wrapped by RangeBase and its subclasses can now be passed through the Range to the 'of' task. This both makes it easier to specify parameters to tasks running inside of ranges in Python code and allows for Luigi to differentiate between multiple Range instances running with the same 'of' parameter. closes spotify#1627
Current coverage is 75.14%@@ master #1675 diff @@
==========================================
Files 94 94
Lines 10236 10259 +23
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
- Hits 8049 7709 -340
- Misses 2187 2550 +363
Partials 0 0
|
Actually, I don't think bulk_complete can work correctly with this change. You'd have to pass in the arguments both explicitly and using the of_params field. I'm working on a fix, but I don't know if there's a way to preserve the existing interface of bulk_complete. The problem being that RangeDaily and RangeHourly were both providing a list of parameters to those functions rather than a complete argument tuple. The way I'm approaching a change to this is to pass a list of dictionaries. If there's a way to make python use a single value as both the positional and keyword arguments for a function, that'd be nice, but I'm not aware of it. |
I see that MixinBulkComplete is set to handle the case that the parameter tuples can be any of a single value, a positional argument tuple, or a keyword argument tuple. However, the BigQuery code from contribs doesn't expect this behavior. |
In RangeDaily and RangeHourly, the bulk_complete parameter was invoked with a list of the date/hour parameters to be used for the of task. This prevents of_params from being provided to the bulk_complete code. Instead, we change the invocation of the bulk_complete method to use a partial with the of_params values curried. Thus, any code using the cls parameter of bulk_complete can behave as normal while still allowing the of_params to be passed through. For code that would rather subclass and override missing_datetimes, two new helper methods are provided: datetime_to_parameters and parameters_to_datetime. These are parallel to datetime_to_parameter and parameter_to_datetime and rather than handling only the first argument to the of class, handle all the required arguments.
Changed the determination of the first positional argument to feed through on the Range tasks to actually check whether an argument was positional or not
Found one more small issue. Seems to be working correctly in our system now though. |
Docs are missing my friend. ^^ They can either be added in the code module or in the docs which mention range (you can |
Sorry. Hope that addresses the docs issue or is there a specific area that needs to be addressed? |
|
||
Alternatively, you can specify parameters at the task family level (as described :ref:`here <Parameter-class-level-parameters>`), | ||
however these will not appear in the task name for the upstream Range task which | ||
can have implications in how the scheduler and visualizer handle task instances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very well written. Thanks! :)
A bit unfortunate that we have to resort to code with Other than that I'm fine with this. @ulzha, do you want to take a quick look? Other than that I'll merge day this in a day or two. |
Actually let's not merge this until we've resolved #1657 (comment). In particular it seems like execution summary is broken for anything using DictParameter. |
Ah, I noticed that earlier but thought it must have been some other bug on my end. Since all the tests passed, I figured it was fine. Makes sense to wait for a fix on that before putting in DictParameters retroactively. Please let me know when a solution is available so I can perform refactoring as necessary. I'll be using this solution for the time being since I actually don't have an issue if these error out after completing. Will update if I encounter any weirdness. I considered changing the interface of bulk_complete, but figured it'd be better to make it a little weird for now and not significantly break the expected API. Then later at a point where you are comfortable making a backwards incompatible change, move it to a staticmethod. |
There's no need for you to change your implementation once that bug-fix is done. You can just sit back and relax for now. ^^ Yea. I think you did the right thing regarding the |
Should that be in the docs for Range or for Task? |
@nugend I guess for Task right? Because when somebody looks at how to implement |
The reason I asked was because nothing actually changed in Task and any other usage of bulk_complete is likely custom created (and likely does not have any sort of similar issue). Would it make sense to add bulk_complete to RangeBase so that it could be documented inside the Range classes themselves? |
I don't think it makes sense to add |
Yes, sorry. I didn't yet have my coffee when I replied earlier. What I mean is just that this isn't really a change in how you use bulk_complete. It's a change in how bulk_complete is invoked when used with a RangeBase derived class. Strictly speaking, there could be some other user implemented WrapperTask which uses bulk_complete to calculate the required child tasks and that usage of bulk_complete wouldn't change and the ability to use MixinNaiveBulkComplete would support that just as happily. To the larger point, depending on how it is decided that bulk_complete should change, maybe just more thorough documentation of what the possible types parameter_tuples can inhabit would be sufficient for ensuring correct usage. Or possibly, a helper to generate the tasks associated with each parameter_tuple |
Ok. Let's wait with the docs change. This PR is still blocked on #1657 though. :) |
This is a fix up for #1675 I used this as reference https://github.com/spotify/luigi/pull/1675/files#diff-bcfedf11b58a3378d447759fcaf6c843R837
This is a fix up for #1675 I used this as reference https://github.com/spotify/luigi/pull/1675/files#diff-bcfedf11b58a3378d447759fcaf6c843R837
Description
I've added "of_params" as an additional parameter on the RangeBase class. The approach is based on discussion with @Tarrasch
Motivation and Context
Please see issue from #1627
Have you tested this? If so, how?
I have included unit tests.
I have not yet tested this in our system, but I will do so if the maintainers think the approach used is fine and report back on the behavior.