This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Add hive-versions argument to hive-script command #36

Closed
wants to merge 1 commit into from

Conversation

mtibben

@mtibben mtibben commented Sep 25, 2012

This solves the issue reported here https://forums.aws.amazon.com/thread.jspa?messageID=350496

The issue is that hive-script tries to run the wrong version of Hive.

In log/hadoop/steps/2/stderr I get:

sh: /home/hadoop/.versions/hive-0.7.1/bin/hive: No such file or directory
Command exiting with ret '255'

Adding these arguments works around the issue:

's3://elasticmapreduce/libs/hive/',
'--install-hive',
'--hive-versions',
'latest'
Author


just fixed indenting here

@rslifka
Owner

rslifka commented Sep 25, 2012

Hi Michael,

Thanks for submitting the issue and pull!

Those arguments are present in .aws_installation_step later in the file, a separate step that gets sent up as part of executing a job flow. I've tested this manually and captured it in my integration tests as well, so I'm a bit surprised it's not working for you. That error message dates from before the recent 2.x fix, when EMR changed that script.

Can you let me know how you're using Elasticity? And if you're on the latest point release?

Rob

@mtibben
Author

mtibben commented Sep 25, 2012

Hey Rob,

I'm using the latest Elasticity, 2.4.0.

I'm running a Snowplow ETL job; the code I'm using can be found here: https://github.com/snowplow/snowplow/blob/feature/emr-etl-runner/3-etl/emr-etl-runner/lib/emr_jobs.rb

I set jobflow.hadoop_version = '1.0.3'

The installation step (Step 1) seems to complete successfully and installs Hive to /home/hadoop/.versions/hive-0.8.1

Step1 controller:

2012-09-25T06:41:04.862Z INFO Fetching jar file.
2012-09-25T06:41:08.560Z INFO Working dir /mnt/var/lib/hadoop/steps/1
2012-09-25T06:41:08.560Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-tools-1.0.3.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/1 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/1/tmp -Djava.library.path=/home/hadoop/native/Linux-i386-32 org.apache.hadoop.util.RunJar /mnt/var/lib/hadoop/steps/1/script-runner.jar s3://elasticmapreduce/libs/hive/hive-script --base-path s3://elasticmapreduce/libs/hive/ --install-hive --hive-versions latest

But then Step 2 fails

controller:

2012-09-25T06:42:10.656Z INFO Executing /usr/lib/jvm/java-6-sun/bin/java -cp /home/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/home/hadoop:/home/hadoop/hadoop-core-1.0.3.jar:/home/hadoop/hadoop-tools.jar:/home/hadoop/hadoop-tools-1.0.3.jar:/home/hadoop/hadoop-core.jar:/home/hadoop/lib/*:/home/hadoop/lib/jetty-ext/* -Xmx1000m -Dhadoop.log.dir=/mnt/var/log/hadoop/steps/2 -Dhadoop.log.file=syslog -Dhadoop.home.dir=/home/hadoop -Dhadoop.id.str=hadoop -Dhadoop.root.logger=INFO,DRFA -Djava.io.tmpdir=/mnt/var/lib/hadoop/steps/2/tmp -Djava.library.path=/home/hadoop/native/Linux-i386-32 org.apache.hadoop.util.RunJar /mnt/var/lib/hadoop/steps/2/script-runner.jar s3://elasticmapreduce/libs/hive/hive-script --run-hive-script --args -f s3n://mybucket/snowplow/assets/daily-etl.q -d CLOUDFRONT_LOGS=s3n://mybucket/snowplow/in/ -d DATA_DATE=2012-09-24 -d EVENTS_TABLE=s3n://mybucket/snowplow/out/ -d SERDE_FILE=s3n://mybucket/snowplow/serde/snowplow-log-deserializers-0.4.8.jar

stdout:

Downloading 's3://elasticmapreduce/libs/hive/hive-script' to '/mnt/var/lib/hadoop/steps/2/.'
2012-09-25 06:42:15 GMT - ERROR missing required argument base-path
2012-09-25 06:42:15 GMT - INFO Running: /home/hadoop/.versions/hive-0.7.1/bin/hive '-f' 's3n://mybucket/snowplow/assets/daily-etl.q' '-d' 'CLOUDFRONT_LOGS=s3n://mybucket/snowplow/in/' '-d' 'DATA_DATE=2012-09-24' '-d' 'EVENTS_TABLE=s3n://mybucket/snowplow/out/' '-d' 'SERDE_FILE=s3n://mybucket/snowplow/serde/snowplow-log-deserializers-0.4.8.jar'
2012-09-25 06:42:15 GMT - ERROR Error executing cmd: /home/hadoop/.versions/hive-0.7.1/bin/hive '-f' 's3n://mybucket/snowplow/assets/daily-etl.q' '-d' 'CLOUDFRONT_LOGS=s3n://mybucket/snowplow/in/' '-d' 'DATA_DATE=2012-09-24' '-d' 'EVENTS_TABLE=s3n://mybucket/snowplow/out/' '-d' 'SERDE_FILE=s3n://mybucket/snowplow/serde/snowplow-log-deserializers-0.4.8.jar'

stderr:

sh: /home/hadoop/.versions/hive-0.7.1/bin/hive: No such file or directory
Command exiting with ret '255'

What do you think is going on? Am I missing something?

Michael

@ghost ghost assigned rslifka Sep 28, 2012
@rslifka
Owner

rslifka commented Sep 28, 2012

I think it's exactly as that poster specified - you're having to be specific because hive-script isn't working as it should (i.e. when there is only one version of Hive installed, use that). I'm toying around a bit with this at the moment. Will have something for you soon.

@rslifka
Owner

rslifka commented Sep 29, 2012

Oh Amazon, you salty dog. hive-script gives an error when base-path isn't specified, even when it's not required (only during installation).

2012-09-28 23:43:09 GMT - ERROR missing required argument base-path

Overly specifying this as of cd5cedc :)

@mtibben
Author

mtibben commented Sep 29, 2012

aha! nice find :)

@rslifka
Owner

rslifka commented Sep 29, 2012

Or perhaps it is working as it should? This method hasn't been updated for the new(er) versions of hadoop:

def determine_hive_version
  # No hive_versions given = old Ruby client (pre-versioning).
  # Install default version - for backwards compatibility.
  if ! @hive_versions then
    if @hadoop_version == "0.18" then
      @hive_versions = "0.4"
    elsif @hadoop_version == "0.20" then
      @hive_versions = "0.5"
    else
      @hive_versions = "0.7.1.3"
    end
  else
    if @hive_versions == "latest" then
      # latest = new Ruby client.
      @hive_versions = LATEST_HIVE_VERSIONS[@hadoop_version]
    end
  end
end

You're correct, my man. Will add in your patch and push a release out tonight.
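
To make the fall-through concrete, here is a standalone sketch of the same branching. It is a restatement for illustration, not Amazon's code: `resolve_hive_version` is a hypothetical name, and the `LATEST_HIVE_VERSIONS` table below is an assumed stand-in (its single entry is guessed from the Step 1 log, which shows Hive installed under hive-0.8.1 on Hadoop 1.0.3). The real table's contents are not shown in this thread.

```ruby
# Illustrative table only; the real LATEST_HIVE_VERSIONS lives in Amazon's
# hive-script and its contents are not shown in this thread.
LATEST_HIVE_VERSIONS = { '1.0.3' => '0.8.1' }.freeze

# Standalone restatement of the branching in determine_hive_version.
def resolve_hive_version(hadoop_version, hive_versions = nil)
  if hive_versions.nil?
    # No version given: only Hadoop 0.18 and 0.20 are special-cased,
    # so anything newer falls through to the 0.7.1.3 default.
    case hadoop_version
    when '0.18' then '0.4'
    when '0.20' then '0.5'
    else '0.7.1.3'
    end
  elsif hive_versions == 'latest'
    LATEST_HIVE_VERSIONS[hadoop_version]
  else
    hive_versions
  end
end

puts resolve_hive_version('1.0.3')            # prints 0.7.1.3
puts resolve_hive_version('1.0.3', 'latest')  # prints 0.8.1 (from the illustrative table)
```

Under these assumptions, a run step that omits `--hive-versions` on Hadoop 1.0.3 resolves to a 0.7.1 build, which matches the hive-0.7.1 path in the Step 2 stderr, while the install step that passed `latest` resolved to the newer version.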

@rslifka
Owner

rslifka commented Sep 29, 2012

Fixed as of bc668b3
