Skip to content
This repository has been archived by the owner on Jan 2, 2023. It is now read-only.

Unique output paths #26

Closed
wants to merge 5 commits into from
Closed

Unique output paths #26

wants to merge 5 commits into from

Conversation

jowl
Copy link
Collaborator

@jowl jowl commented Apr 23, 2015

I've managed to have Hadoop jobs fail due to using the same intermediate path for multiple steps on more than one occasion, and have thought a bit about how to avoid it. I'm not sure how much this particular solution actually helps with the problem, but I thought it might at least serve as a decent start for a discussion.

The idea is that you should be able to configure jobs with multiple steps like:

Rubydoop.configure do |input_path, output_path|
  first_step = job 'first-step' do
    input input_path
    output intermediate: true
    # rest of definition
  end

  second_step = job 'second-step' do
    input first_step.output
    output intermediate: true
    # rest of definition
  end

  job 'third-step' do
    input second_step.output
    output output_path
    # rest of definition
  end
end

The argument handling in JobDefinition#output is rather hacky, as it can be used as both a getter and a setter, and with a semi-optional dir argument. But I figured that it is better to make it complex than to make using it complex. Then again, I wouldn't mind it not being so hacky.

job_definition.output.should =~ /\Apath-\d{10}-\d{5}\Z/
end

it 'should default to the job name when dir is not set' do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may quote something that someone, I believe you, wisely wrote in another pull request only hours ago:

While I'm aware that it's common to include "should" in your example descriptions, I prefer not to as it makes you seem uncertain about what your code actually does.

I realize you're just trying to follow the style of the rest of the tests, although you're not consistent in doing that, but just like you I really prefer assertive descriptions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard for me to say I don't agree with what you're saying, and I will consistently use assertive descriptions throughout.

@iconara
Copy link
Owner

iconara commented Apr 23, 2015

I kind of like the idea, but I'm a bit torn about whether or not I want this in Rubydoop. I'd like to get rid of the job definition DSL completely, so adding to it might not be right. On the other hand, this feature would probably fit with whatever it's replaced by.

@grddev
Copy link
Collaborator

grddev commented Apr 24, 2015

I think the gain over something like the following would be quite small. Given that the DSL should be reworked, I'm not sure the added functionality makes sense.

  second_tmp_path = Dir::Tmpname.make_tmpname('second', nil)
  job 'second-step' do
    input first_path
    output second_tmp_path
  end

  job 'third-step' do
    input second_tmp_path
    output output_path
  end

@jowl
Copy link
Collaborator Author

jowl commented Apr 25, 2015

I thought about the same thing as well, but I like my solution a bit better for two reasons; firstly, it hides away the "unique" path generation from the user, and secondly, it makes it a little bit harder to make mistakes, someone might for instance use another variable for the output path of the second step in your example without changing the definition of the third step.

Having that said, I don't know whether this should be merged or not, given that the DSL may be removed soon. But as @iconara said, this should fit with a non-DSL based interface as well.

@iconara
Copy link
Owner

iconara commented May 20, 2015

What's the consensus of this? I don't mind including this in a v1.2 release. A v2.0 with a new DSL isn't exactly round the corner, so if this helps people I'm not against it.

iconara added a commit that referenced this pull request May 21, 2015
The rest of the tests were updated to RSpec 3 in #28, which was merged before.
@iconara
Copy link
Owner

iconara commented May 21, 2015

Merged in 3d3c672. I also update the tests since #28 upgraded RSpec (see dd90f1a).

@iconara iconara closed this May 21, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants