-
Notifications
You must be signed in to change notification settings - Fork 33
Conversation
job_definition.output.should =~ /\Apath-\d{10}-\d{5}\Z/ | ||
end | ||
|
||
it 'should default to the job name when dir is not set' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I may quote something that someone, I believe you, wisely wrote in another pull request only hours ago:
While I'm aware that it's common to include "should" in your example descriptions, I prefer not to as it makes you seem uncertain about what your code actually does.
I realize you're just trying to follow the style of the rest of the tests, although you're not consistent in doing that, but just like you I really prefer assertive descriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard for me to say I don't agree with what you're saying, and I will consistently use assertive descriptions throughout.
I kind of like the idea, but I'm a bit torn about whether or not I want this in Rubydoop. I'd like to get rid of the job definition DSL completely, so adding to it might not be right. On the other hand, this feature would probably fit with whatever it's replaced by. |
I think the gain over something like the following would be quite small. Given that the DSL should be reworked, I'm not sure the added functionality makes sense. second_tmp_path = Dir::Tmpname.make_tmpname('second', nil)
job 'second-step' do
input first_path
output second_tmp_path
end
job 'third-step' do
input second_tmp_path
output output_path
end |
I thought about the same thing as well, but I like my solution a bit better for two reasons; firstly, it hides away the "unique" path generation from the user, and secondly, it makes it a little bit harder to make mistakes, someone might for instance use another variable for the output path of the second step in your example without changing the definition of the third step. Having that said, I don't know whether this should be merged or not, given that the DSL may be removed soon. But as @iconara said, this should fit with a non-DSL based interface as well. |
What's the consensus of this? I don't mind including this in a v1.2 release. A v2.0 with a new DSL isn't exactly round the corner, so if this helps people I'm not against it. |
The rest of the tests were updated to RSpec 3 in #28, which was merged before.
I've managed to have Hadoop jobs fail due to using the same intermediate path for multiple steps on more than one occasion, and have thought a bit about how to avoid it. I'm not sure how much this particular solution actually helps with the problem, but I thought it might at least serve as a decent start for a discussion.
The idea is that you should be able to configure jobs with multiple steps like:
The argument handling in
JobDefinition#output
is rather hacky, as it can be used as both a getter and a setter, and with a semi-optionaldir
argument. But I figured that it is better to make it complex than to make using it complex. Then again, I wouldn't mind it not being so hacky.