Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to RangeByMinutes backfilling tasks #1863

Merged
merged 6 commits into from
Sep 26, 2016

Conversation

j-santander
Copy link
Contributor

Description

This change adds RangeByMinutesBase (inheriting from existing RangeBase) and RangeByMinutes (inheriting from RangeByMinutesBase) providing support for back-filling classes operating in units of minutes.

Behavior (and coding) follows closely the existing RangeHourly and RangeDaily classes. Only addition is
the ability to specify the number of minutes considered as one interval (defaulting to 5 minutes)

Motivation and Context

In our project we produce reports on 5 minutes intervals and the existing backfilling tasks do not cover
such a short interval.

Have you tested this? If so, how?

We've tested the code using on our existing jobs, working succesfully.

In addition, I've added additional test cases to test/range_test.py, following the same structure as existing cases for hours and days intervals.

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I first thought of RangeMinuetly but that doesn't make sense now does it? ^^

Overall very good. I just don't get the =5 default.

"in minutes from current time. Prevents infinite loop when stop is none")

minutes_interval = luigi.IntParameter(
default=5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems way to arbitrary. Why not just default=1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tarrasch, thanks for reviewing it.
Not sure what you mean with your first comment, is it just the choice of name of RangeByMinutes vs RangeMinutely? English is not my mother tongue, but minutely didn't sound correct to me. Please suggest a name that you think is best.

About the default value of 5 minutes interval. Ultimately it responds to the fact it is the interval that was needed in our project. I could justify it as being a reasonable interval in the minutes range, making every minute a too frequent interval... but it is not trouble to change it.

@Tarrasch
Copy link
Contributor

@j-santander I think your name of RangeByMinutes makes sense. You're right that RangeMinutely is weird English. I can't think of a better name than RangeByMinutes. Does @ulzha have suggestions?


Thanks for changing the default to =1. I also realize your point that as low as 1 very rarely is useful as it's so frequent.


A lost point which I would like you to fix. Can you throw a ParameterException if the value does not evently divide 60? Surely it's still possible to have 7 or 8, but it's certainly a very bad idea. Can you add that to the patch and have a small test case alongside it?

@j-santander
Copy link
Contributor Author

Added the validations, let me know if anything else is needed.
Thanks
Julian

Copy link
Contributor

@Tarrasch Tarrasch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost ready to merge. I added some comments to further improve quality. After that I think this can be merged! :)

@@ -460,6 +460,11 @@ def finite_datetimes(self, finite_start, finite_stop):
"""
Simply returns the points in time that correspond to a whole number of minutes intervals.
"""
# Validate that the minutes_interval can divide 60 and it is greater than 0.
if self.minutes_interval <= 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're at it, can you change this to

if not (0 < self.minutes_interval < 60)

task.requires()
except luigi.parameter.ParameterException:
return
self.fail("Expected a parameter exception")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for task_path in task_a_paths:
MockTarget(task_path)
# this test takes a few seconds. Since stop is not defined,
# finite_datetimes constitute many years to consider
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this comment? (and some for the other test)

@Tarrasch Tarrasch merged commit 338b64d into spotify:master Sep 26, 2016
@Tarrasch
Copy link
Contributor

Thanks @j-santander! :)

This was referenced Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants