Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass through parameters on Range tasks to of tasks #1675

Merged
merged 6 commits into from Jun 9, 2016

Conversation

nugend
Copy link
Contributor

@nugend nugend commented May 2, 2016

Description

I've added "of_params" as an additional parameter on the RangeBase class. The approach is based on discussion with @Tarrasch

Motivation and Context

Please see issue from #1627

Have you tested this? If so, how?

I have included unit tests.

I have not yet tested this in our system, but I will do so if the maintainers think the approach used is fine and report back on the behavior.

Parameters to tasks wrapped by RangeBase and its subclasses can now be
passed through the Range to the 'of' task. This both makes it easier to
specify parameters to tasks running inside of ranges in Python code and
allows for Luigi to differentiate between multiple Range instances
running with the same 'of' parameter.

closes spotify#1627
@mention-bot
Copy link

By analyzing the blame information on this pull request, we identified @Tarrasch, @joeshaw and @ulzha to be potential reviewers

@codecov-io
Copy link

codecov-io commented May 2, 2016

Current coverage is 75.14%

Merging #1675 into master will decrease coverage by -3.49%

@@             master      #1675   diff @@
==========================================
  Files            94         94          
  Lines         10236      10259    +23   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits           8049       7709   -340   
- Misses         2187       2550   +363   
  Partials          0          0          
  1. 6 files (not in diff) in luigi/contrib were modified. more
    • Misses +288
    • Hits -287
  2. 2 files (not in diff) in luigi were modified. more
    • Misses +68
    • Hits -68

Powered by Codecov. Last updated by 7cbc0fe...8dc302c

@nugend
Copy link
Contributor Author

nugend commented May 3, 2016

Actually, I don't think bulk_complete can work correctly with this change. You'd have to pass in the arguments both explicitly and using the of_params field. I'm working on a fix, but I don't know if there's a way to preserve the existing interface of bulk_complete. The problem being that RangeDaily and RangeHourly were both providing a list of parameters to those functions rather than a complete argument tuple.

The way I'm approaching a change to this is to pass a list of dictionaries. If there's a way to make python use a single value as both the positional and keyword arguments for a function, that'd be nice, but I'm not aware of it.

@nugend
Copy link
Contributor Author

nugend commented May 3, 2016

I see that MixinBulkComplete is set to handle the case that the parameter tuples can be any of a single value, a positional argument tuple, or a keyword argument tuple.

However, the BigQuery code from contribs doesn't expect this behavior.

Dan Nugent added 2 commits May 3, 2016 19:04
In RangeDaily and RangeHourly, the bulk_complete parameter was invoked
with a list of the date/hour parameters to be used for the of task. This
prevents of_params from being provided to the bulk_complete code.
Instead, we change the invocation of the bulk_complete method to use a
partial with the of_params values curried. Thus, any code using the cls
parameter of bulk_complete can behave as normal while still allowing the
of_params to be passed through.

For code that would rather subclass and override missing_datetimes, two
new helper methods are provided: datetime_to_parameters and
parameters_to_datetime. These are parallel to datetime_to_parameter and
parameter_to_datetime and rather than handling only the first argument
to the of class, handle all the required arguments.
Changed the determination of the first positional argument to feed
through on the Range tasks to actually check whether an argument was
positional or not
@nugend
Copy link
Contributor Author

nugend commented May 4, 2016

Found one more small issue. Seems to be working correctly in our system now though.

@Tarrasch
Copy link
Contributor

Tarrasch commented May 4, 2016

Docs are missing my friend. ^^

They can either be added in the code module or in the docs which mention range (you can git grep Range for it).

@nugend
Copy link
Contributor Author

nugend commented May 4, 2016

Sorry. Hope that addresses the docs issue or is there a specific area that needs to be addressed?


Alternatively, you can specify parameters at the task family level (as described :ref:`here <Parameter-class-level-parameters>`),
however these will not appear in the task name for the upstream Range task which
can have implications in how the scheduler and visualizer handle task instances.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well written. Thanks! :)

@Tarrasch
Copy link
Contributor

Tarrasch commented May 5, 2016

A bit unfortunate that we have to resort to code with __func__ and stuff. And actually bulk_complete won't be a @classmethod any more.

Other than that I'm fine with this. @ulzha, do you want to take a quick look? Other than that I'll merge day this in a day or two.

@Tarrasch
Copy link
Contributor

Tarrasch commented May 5, 2016

Actually let's not merge this until we've resolved #1657 (comment). In particular it seems like execution summary is broken for anything using DictParameter.

@nugend
Copy link
Contributor Author

nugend commented May 5, 2016

Ah, I noticed that earlier but thought it must have been some other bug on my end. Since all the tests passed, I figured it was fine. Makes sense to wait for a fix on that before putting in DictParameters retroactively. Please let me know when a solution is available so I can perform refactoring as necessary. I'll be using this solution for the time being since I actually don't have an issue if these error out after completing. Will update if I encounter any weirdness.

I considered changing the interface of bulk_complete, but figured it'd be better to make it a little weird for now and not significantly break the expected API. Then later at a point where you are comfortable making a backwards incompatible change, move it to a staticmethod.

@Tarrasch
Copy link
Contributor

Tarrasch commented May 6, 2016

There's no need for you to change your implementation once that bug-fix is done. You can just sit back and relax for now. ^^

Yea. I think you did the right thing regarding the bulk_complete. But perhaps you should add an implementation note that one "must" use the cls variable to instantiate the class and not use YourTaskName as a constructor?

@nugend
Copy link
Contributor Author

nugend commented May 6, 2016

Should that be in the docs for Range or for Task?

@Tarrasch
Copy link
Contributor

Tarrasch commented May 9, 2016

@nugend I guess for Task right? Because when somebody looks at how to implement bulk_complete, they're looking at the documentation for Tasks in general.

@nugend
Copy link
Contributor Author

nugend commented May 9, 2016

The reason I asked was because nothing actually changed in Task and any other usage of bulk_complete is likely custom created (and likely does not have any sort of similar issue).

Would it make sense to add bulk_complete to RangeBase so that it could be documented inside the Range classes themselves?

@Tarrasch
Copy link
Contributor

I don't think it makes sense to add bulk_complete to RangeBase. Because it's supposed to be overridden by classes that aren't subclasses of RangeBase. Right?

@nugend
Copy link
Contributor Author

nugend commented May 10, 2016

Yes, sorry. I didn't yet have my coffee when I replied earlier. What I mean is just that this isn't really a change in how you use bulk_complete. It's a change in how bulk_complete is invoked when used with a RangeBase derived class. Strictly speaking, there could be some other user implemented WrapperTask which uses bulk_complete to calculate the required child tasks and that usage of bulk_complete wouldn't change and the ability to use MixinNaiveBulkComplete would support that just as happily.

To the larger point, depending on how it is decided that bulk_complete should change, maybe just more thorough documentation of what the possible types parameter_tuples can inhabit would be sufficient for ensuring correct usage. Or possibly, a helper to generate the tasks associated with each parameter_tuple

@Tarrasch
Copy link
Contributor

Ok. Let's wait with the docs change. This PR is still blocked on #1657 though. :)

@Tarrasch Tarrasch merged commit 859095a into spotify:master Jun 9, 2016
@nugend nugend deleted the pass-through branch June 9, 2016 15:50
This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants