Fix Mapreduce startup (based on pull request #36) #46

Merged: 3 commits merged on Mar 12, 2015

@lbendig lbendig commented Mar 9, 2015

I've incorporated the requirements discussed in #36. Handling of the command line parameters has been changed accordingly:

$GOBBLIN_HOME/build.gradle:

  • bump avro version to 1.7.6 to conform with the version of avro-mapred
  • force guava version 15.0: otherwise during the build gradle may pick up guava 17.0 and will include it in
    the /lib directory instead of the defined version 15.0. (This may lead to issues like
    IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class
    org.apache.hadoop.mapreduce.lib.input.FileInputFormat)

$GOBBLIN_HOME/conf/gobblin-env.sh:

  • A script that initializes environment variables; it is called by gobblin-standalone.sh and gobblin-mapreduce.sh

$GOBBLIN_HOME/bin/gobblin-mapreduce.sh:

  • Job configuration as command line parameter
  • JobTracker/ResourceManager and FileSystem arguments became optional
  • Accepts multiple job jars as command line arguments
  • $GOBBLIN_WORK_DIR can be overridden by the --workdir parameter

$GOBBLIN_HOME/bin/gobblin-standalone.sh:

  • Job configuration as command line parameter
  • Accepts multiple job jars as command line arguments
  • $GOBBLIN_WORK_DIR can be overridden by the --workdir parameter

$GOBBLIN_HOME/gobblin-runtime/src/main/java/gobblin/runtime/mapreduce/CliMRJobLauncher.java:

  • Use GenericOptionsParser to pick up Gobblin-specific as well as Hadoop-related command line arguments

Others:

  • Removed internal LinkedIn dependencies from LIBJARS
  • Added the missing command line toggles (sysconfig and jobconfig)
  • Put Gobblin-related dependencies first on HADOOP_CLASSPATH to avoid jar hell
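
To illustrate the last point, here is a minimal sketch of putting the Gobblin jars ahead of whatever is already on HADOOP_CLASSPATH. The helper names `join_jars` and `prepend_to_classpath` and the `$GOBBLIN_HOME/lib` location are assumptions made for the example; the scripts in this patch may do it differently.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join every jar in a directory into a colon-separated list.
join_jars() {
  local dir="$1" jar list=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal pattern when no jar matches
    list="${list:+$list:}$jar"
  done
  printf '%s' "$list"
}

# Hypothetical helper: place the Gobblin jars before the existing entries,
# without leaving a stray ':' when either side is empty.
prepend_to_classpath() {
  local gobblin_jars="$1" existing="$2"
  if [ -z "$gobblin_jars" ]; then
    printf '%s' "$existing"
  else
    printf '%s' "${gobblin_jars}${existing:+:$existing}"
  fi
}
```

A launch script could then run `export HADOOP_CLASSPATH="$(prepend_to_classpath "$(join_jars "$GOBBLIN_HOME/lib")" "$HADOOP_CLASSPATH")"` before invoking Hadoop.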

@liyinan926

@lbendig, thanks for the efforts! I am wondering why you want to force guava version 15.0? Actually, it is safe to use the latest version 18.0 in Gobblin.

lbendig commented Mar 10, 2015

@liyinan926 build.gradle defines Guava 15.0 as an external dependency, but when that version was not forced, a newer one (17.0) was pulled in during the build, which caused issues with other Hadoop versions (see HADOOP-10961).

echo $USAGE
exit 1
if [ -z "$JOB_CONFIG_FILE" ]; then
die "No job configuration set!"

"No job configuration file set".

@liyinan926

LGTM to be merged. @sahilTakiar, can you also take a look?

@sahilTakiar

LGTM! It looks ready to be merged.

One comment, which can be addressed in another JIRA: I don't think we need to force the user to specify "--workdir" or $GOBBLIN_WORK_DIR in order for the script to run. Rather, the user should set GOBBLIN_WORK_DIR in the gobblin-mapreduce.properties file.

The reason is that the launch scripts should only take parameters that are needed to launch the job. The properties needed to run the job should be in the ".properties" file. GOBBLIN_WORK_DIR is not needed to set up and launch the job, only at runtime. Thus, I think it makes more sense to put GOBBLIN_WORK_DIR in the ".properties" file.

The layout of the .properties file can be simple:

gobblin.work.dir=working/
writer.staging.dir=${gobblin.work.dir}/task-staging
etc.

@liyinan926 thoughts?

@liyinan926

I agree that this can be addressed in a separate pull request. But I think the default value of gobblin.work.dir should not be hard-coded. Instead, it should always use ${env:GOBBLIN_WORK_DIR} and we should set GOBBLIN_WORK_DIR to a default in the launch script, say, working/ if it is not set.
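
A rough sketch of that defaulting in the launch script, assuming an explicit --workdir value should still win over the environment variable. The function name `resolve_work_dir` and the variable `WORK_DIR_ARG` are hypothetical and only serve the example:

```shell
#!/usr/bin/env bash

# Resolve the work directory: a non-empty --workdir value wins, then the
# GOBBLIN_WORK_DIR environment variable, then the default "working/".
resolve_work_dir() {
  local cli_value="$1"
  if [ -n "$cli_value" ]; then
    printf '%s' "$cli_value"
  else
    printf '%s' "${GOBBLIN_WORK_DIR:-working/}"
  fi
}
```

The script would then run `export GOBBLIN_WORK_DIR="$(resolve_work_dir "$WORK_DIR_ARG")"` after parsing its arguments, so that ${env:GOBBLIN_WORK_DIR} always has a value when the properties file is loaded.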

@liyinan926

Thanks a lot @lbendig for contributing! Merging it now.

liyinan926 added a commit that referenced this pull request Mar 12, 2015
Fix Mapreduce startup (based on pull request #36)
@liyinan926 liyinan926 merged commit d04ccfc into apache:master Mar 12, 2015
lbendig commented Mar 12, 2015

@liyinan926 @sahilTakiar Great, thank you for reviewing the patches!
