Fix Mapreduce startup (based on pull request #36) #46
$GOBBLIN_HOME/build.gradle:
- Bump the Avro version to 1.7.6 to match the version of avro-mapred
- Force Guava version 15.0: otherwise Gradle may pick up Guava 17.0 during the build and include it in the /lib directory instead of the declared version 15.0. (This may lead to errors like IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat)

$GOBBLIN_HOME/conf/gobblin-env.sh:
- A script, called by gobblin-standalone.sh and gobblin-mapreduce.sh, that initializes environment variables

$GOBBLIN_HOME/bin/gobblin-mapreduce.sh:
- Job configuration is passed as a command line parameter
- JobTracker/ResourceManager and FileSystem arguments are now optional
- Accepts multiple job jars as command line arguments
- $GOBBLIN_WORK_DIR can be overridden by the --workdir parameter

$GOBBLIN_HOME/bin/gobblin-standalone.sh:
- Job configuration is passed as a command line parameter
- Accepts multiple job jars as command line arguments
- $GOBBLIN_WORK_DIR can be overridden by the --workdir parameter

$GOBBLIN_HOME/gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/CliMRJobLauncher.java:
- Use GenericOptionsParser so that both Gobblin and Hadoop command line arguments are picked up

Others:
- Removed internal LinkedIn dependencies from LIBJARS
- Added the missing command line toggles (sysconfig and jobconfig)
- Gobblin dependencies are loaded first in HADOOP_CLASSPATH to avoid jar hell
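The command-line handling described above (job config as a parameter, multiple job jars, a --workdir override, Gobblin jars first in HADOOP_CLASSPATH) can be sketched roughly as follows. This is a hypothetical illustration, not the script from this PR: only `--workdir` and `$GOBBLIN_WORK_DIR` come from the description, while `--conf`, `--jars`, and the helpers `parse_args` and `prepend_classpath` are placeholder names.

```shell
# Hypothetical sketch of launcher argument handling. Only --workdir and
# GOBBLIN_WORK_DIR come from the PR description; --conf and --jars are
# placeholder option names.

parse_args() {
  JOB_CONFIG_FILE=""
  JOB_JARS=""
  while [ $# -gt 0 ]; do
    case "$1" in
      --conf|--workdir|--jars)
        # Each of these options takes exactly one value.
        if [ $# -lt 2 ]; then
          echo "Missing value for $1" >&2
          return 1
        fi
        case "$1" in
          --conf)    JOB_CONFIG_FILE="$2" ;;
          --workdir) GOBBLIN_WORK_DIR="$2" ;;  # overrides any inherited value
          --jars)    JOB_JARS="$2" ;;          # comma-separated list of job jars
        esac
        shift 2
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}

# Place the given jars (comma-separated) ahead of any entries already in
# HADOOP_CLASSPATH, then export the result.
prepend_classpath() {
  local cp
  cp="$(printf '%s' "$1" | tr ',' ':')"
  if [ -n "$HADOOP_CLASSPATH" ]; then
    HADOOP_CLASSPATH="$cp:$HADOOP_CLASSPATH"
  else
    HADOOP_CLASSPATH="$cp"
  fi
  export HADOOP_CLASSPATH
}
```

A typical call would be `parse_args --conf job.pull --workdir /tmp/gobblin --jars a.jar,b.jar` followed by `prepend_classpath "$JOB_JARS"`. Note that prepending only fixes the order of entries within HADOOP_CLASSPATH; where Hadoop's own launcher places HADOOP_CLASSPATH relative to its bundled jars is a separate concern.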
@lbendig, thanks for the effort! Why do you want to force Guava version 15.0? It is actually safe to use the latest version, 18.0, in Gobblin.
@liyinan926 The build.gradle declares Guava 15.0 as an external dependency, but unless that version was forced, a newer one (17.0) was pulled in during the build, which caused issues with other Hadoop versions (see HADOOP-10961).
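One way to confirm that the pinned version is what actually lands in the distribution is to inspect the lib directory after the build. A minimal sketch, assuming the jars are collected in a `lib/` directory and named `guava-<version>.jar`; `check_guava_version` is a hypothetical helper. (On the Gradle side, `resolutionStrategy.force` is one common way to pin a dependency version; the thread does not show which mechanism the PR used.)

```shell
# Hypothetical post-build check: assumes the distribution's jars sit in a
# lib/ directory and Guava jars are named guava-<version>.jar.
check_guava_version() {
  local lib_dir="$1" expected="$2"
  local found=()
  local jar
  # Collect every Guava jar; an unmatched glob stays literal and is skipped.
  for jar in "$lib_dir"/guava-*.jar; do
    [ -e "$jar" ] && found+=("$(basename "$jar")")
  done
  # Fail unless exactly one Guava jar is present and it is the expected one.
  if [ "${#found[@]}" -ne 1 ] || [ "${found[0]}" != "guava-$expected.jar" ]; then
    echo "Expected only guava-$expected.jar in $lib_dir, found: ${found[*]:-none}" >&2
    return 1
  fi
}
```

For example, `check_guava_version "$GOBBLIN_HOME/lib" 15.0` would fail if the build had also (or instead) copied guava-17.0.jar into lib/.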
echo $USAGE
exit 1
if [ -z "$JOB_CONFIG_FILE" ]; then
  die "No job configuration set!"
"No job configuration file set".
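For reference, a runnable version of the reviewed check using the suggested message. The script's own `die` is not shown in this thread, so the stand-in below (print to stderr, exit 1) is an assumption, as are the `usage` and `require_job_config` wrappers.

```shell
# Assumed stand-in for the script's die(): message to stderr, exit status 1.
die() {
  echo "$@" >&2
  exit 1
}

usage() {
  # Quoting preserves any line breaks in the usage text.
  echo "$USAGE"
  exit 1
}

# Hypothetical wrapper around the reviewed check, with the closing fi.
require_job_config() {
  if [ -z "$JOB_CONFIG_FILE" ]; then
    die "No job configuration file set"
  fi
}
```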
LGTM, ready to be merged. @sahilTakiar, can you also take a look?
LGTM! It looks ready to be merged. One comment, which can be addressed in another JIRA: I don't think we need to force the user to specify "--workdir" or $GOBBLIN_WORK_DIR for the script to run. Instead, the user should set GOBBLIN_WORK_DIR in the gobblin-mapreduce.properties file. The launch scripts should only take parameters that are needed to launch the job; the properties needed to run the job belong in the ".properties" file. GOBBLIN_WORK_DIR is not needed to set up and launch the job, only at runtime, so it makes more sense to put it in the ".properties" file. The layout of the .properties file can be simple:
gobblin.work.dir=working/
@liyinan926, thoughts?
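A rough sketch of that proposal: the work directory is taken from `gobblin.work.dir` in a .properties file, so `--workdir` no longer has to be mandatory. The helpers `read_property` and `resolve_work_dir` are hypothetical, and so is the precedence (an explicit GOBBLIN_WORK_DIR still wins). The parser assumes simple `key=value` lines with no spaces around `=`, `#` or `!` comments, and no line continuations.

```shell
# Print the value of KEY from a simple key=value properties FILE.
# Returns 1 if the key is not present.
read_property() {
  local file="$1" key="$2" line k
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*|'!'*) continue ;;   # skip blank lines and comments
    esac
    k="${line%%=*}"               # text before the first '='
    if [ "$k" = "$key" ]; then
      printf '%s\n' "${line#*=}"  # text after the first '='
      return 0
    fi
  done < "$file"
  return 1
}

# Prefer an explicit GOBBLIN_WORK_DIR; otherwise fall back to the file.
resolve_work_dir() {
  local props="$1"
  if [ -n "$GOBBLIN_WORK_DIR" ]; then
    printf '%s\n' "$GOBBLIN_WORK_DIR"
  else
    read_property "$props" gobblin.work.dir
  fi
}
```

With a file containing `gobblin.work.dir=working/` and GOBBLIN_WORK_DIR unset, `resolve_work_dir` prints `working/`.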
I agree that this can be addressed in a separate pull request. But I think the default value of
Thanks a lot, @lbendig, for contributing! Merging it now.
Fix Mapreduce startup (based on pull request #36)
@liyinan926 @sahilTakiar Great, thank you for reviewing the patches!
I've incorporated the requirements discussed in #36 and changed the handling of the command line parameters accordingly.