-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various fixes to embedded gobblin to allow it to run on mr mode. #1344
Conversation
bdd9370
to
286f539
Compare
@chavdar can you review? |
Changes Unknown when pulling 286f539 on ibuenros:embedded-working into * on linkedin:master*. |
Changes Unknown when pulling 286f539 on ibuenros:embedded-working into * on linkedin:master*. |
111b1bc
to
0d3bc00
Compare
0d3bc00
to
71dba84
Compare
@@ -140,6 +140,10 @@ | |||
|
|||
final FileSystem sourceFs = getSourceFileSystem(state); | |||
final FileSystem targetFs = getTargetFileSystem(state); | |||
|
|||
log.info(String.format("Identified source file system at %s and target file system at %s.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.info("Identified source file system at {} and target file system at {}., ...) is more efficient.
* uses {@link #_sysConfig}, which is only initialized when the user runs {@link #withSysConfig(Configurable)} after | ||
* construction. | ||
*/ | ||
public Launcher initialize() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like initialize() methods. How about the getMetrics() method check if _metrics is null and set it on the fly?
@@ -358,6 +370,9 @@ public Launcher withSysConfig(Configurable sysConfig) { | |||
/** Parent Gobblin instance */ | |||
public Launcher withGobblinInstanceEnvironment(GobblinInstanceEnvironment gobblinInstance) { | |||
_gobblinEnv = Optional.of(gobblinInstance); | |||
if (!_sysConfig.isPresent()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getSysConfig() is already doing that through getDefaultSysConfig().
throws IOException { | ||
TimingEvent distributedCacheSetupTimer = | ||
this.eventSubmitter.getTimingEvent(TimingEvent.RunJobTimings.MR_DISTRIBUTED_CACHE_SETUP); | ||
|
||
Path jarFileDir = new Path(this.mrJobDir, JARS_DIR_NAME); | ||
Path jarFileDir = this.jarsDir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is dangerous as jarsDir will be shared across different jobs/runs. This may cause jar with mismatched versions to be stored.
I think you have two options:
- Use both a job-specific and global directory. Use the latter if the name and size match, otherwise upload to the job-specific dir.
- Have a more complex layout for the jar cache which allows different versions to be stored (like the gradle cache).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jars are added individually to distributed classpath, so the same directory can safely have multiple versions of the same jar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I understand. Aren't all embedded gobblin jobs re-use the same path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 362 in MRJobLauncher
is this: DistributedCache.addFileToClassPath(destJarFile, this.conf, this.fs);
, so the jars get added individually to the distributed classpath, the directory is just a container for all jars.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. If the jars are already versioned, it should work.
@chavdar addressed comments. |
Major changes: