HiveRunner occasionally throws exceptions trying to get job status #3

patduin · 2015-02-25T09:14:50Z

We are using a fork of your hive13-hadoop-1 branch with the following dependency changes:
hive -> hive-0.13.0
hadoop -> hdp-2.4.0 jars
No code changes.

When running our HiveRunner tests we occasionally get an error of the form:

java.io.IOException: Could not find status of job:job_local738995590_0016
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:294)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at com.klarna.hiverunner.HiveServerContainer.executeScript(HiveServerContainer.java:104)
at com.klarna.hiverunner.builder.HiveShellBase.executeScriptsUnderTest(HiveShellBase.java:202)
at com.klarna.hiverunner.builder.HiveShellBase.start(HiveShellBase.java:97)

Looks like we are running into the error described in the comments of hive's HadoopJobExecHelper:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper:

        RunningJob newRj = jc.getJob(rj.getID());
        if (newRj == null) {
          // under exceptional load, hadoop may not be able to look up status
          // of finished jobs (because it has purged them from memory). From
          // hive's perspective - it's equivalent to the job having failed.
          // So raise a meaningful exception
          throw new IOException("Could not find status of job:" + rj.getID());

I think we may have run into the issue described in https://issues.apache.org/jira/browse/HIVE-4009.
So I applied the patch from that ticket to the hive release-0.13.0 branch, hoping that would fix things.
The patch sets a boolean and uses it to skip the block of code that causes our issue:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper:

final boolean localMode = ShimLoader.getHadoopShims().isLocalMode(job);

Unfortunately, I ran into an issue with HiveRunner: it overrides isLocalMode(job) in the class StandaloneHiveServerContext.
com.klarna.hiverunner.StandaloneHiveServerContext:

  protected void configureJobTrackerMode(HiveConf conf) {
    /*
     * Overload shims to make sure that org.apache.hadoop.hive.ql.exec.MapRedTask#runningViaChild validates to false.
     * Search for usage of org.apache.hadoop.hive.shims.HadoopShims#isLocalMode to find other affects of this.
     */
    ReflectionUtils.setStaticField(ShimLoader.class, "hadoopShims", new Hadoop20SShims() {
      @Override
      public boolean isLocalMode(Configuration conf) {
        return false;
      }
    });
  }

Thus the pending fix in hive-1.1.0 (I know, still far away from 0.13.0) won't work for HiveRunner: with isLocalMode always returning false, the patched code never skips the status lookup.
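To make the interaction concrete, here is a minimal sketch of why the override defeats the patch. It does not use the real Hive classes: `LocalModeShim`, `JobStatusGuardSketch` and `checkStatus` are hypothetical stand-ins, and the guard logic is only an assumption based on the description of the patch above (a boolean that skips the failing block).

```java
import java.io.IOException;

/** Hypothetical stand-in for the one shim method that matters here. */
interface LocalModeShim {
    boolean isLocalMode();
}

class JobStatusGuardSketch {

    /**
     * Mimics the guarded lookup as described in the post: the "could not find
     * status" failure is skipped only when the shim reports local mode.
     *
     * @param shim        reports whether the job runs in local mode
     * @param statusFound whether the job's status could still be looked up
     */
    static String checkStatus(LocalModeShim shim, boolean statusFound) throws IOException {
        final boolean localMode = shim.isLocalMode();
        if (localMode) {
            return "skipped"; // patched path: no status lookup in local mode
        }
        if (!statusFound) {
            throw new IOException("Could not find status of job");
        }
        return "checked";
    }

    public static void main(String[] args) {
        LocalModeShim unmodified = () -> true;   // shim that really reports local mode
        LocalModeShim overridden = () -> false;  // what the HiveRunner override returns

        try {
            System.out.println("unmodified shim: " + checkStatus(unmodified, false));
            System.out.println("overridden shim: " + checkStatus(overridden, false));
        } catch (IOException e) {
            System.out.println("overridden shim: " + e.getMessage());
        }
    }
}
```

With the shim forced to return false, the guard is never taken, so a purged job still ends in the IOException.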

Long story, but my question is: has anyone run into similar problems, and is there an easier way to fix this?

Edit: forgot to say we really like HiveRunner! :)

PelleUllberg · 2015-10-02T14:53:52Z

Sorry for the very late response. Once we moved up to Hive 14 we experienced the same issue. Configuring the poll interval to 1000 ms got rid of the problem for us.

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.17</version>
        <configuration>
            <systemProperties>
                <hiveconf_hive.exec.counters.pull.interval>1000</hiveconf_hive.exec.counters.pull.interval>
            </systemProperties>
        </configuration>
    </plugin>
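For readers wondering how that system property reaches Hive, here is a rough sketch. It assumes that system properties starting with `hiveconf_` are copied into the Hive configuration with the prefix removed. That is inferred from the property name and not confirmed in this thread, and `HiveConfOverridesSketch` and `extractOverrides` are hypothetical names, not HiveRunner API.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class HiveConfOverridesSketch {

    // Prefix taken from the property name in the Surefire snippet;
    // how HiveRunner actually handles it is an assumption here.
    static final String PREFIX = "hiveconf_";

    /** Collects prefixed properties and strips the prefix from their names. */
    static Map<String, String> extractOverrides(Properties props) {
        Map<String, String> overrides = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                overrides.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return overrides;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hiveconf_hive.exec.counters.pull.interval", "1000");
        props.setProperty("user.dir", "/tmp"); // unrelated property, ignored
        System.out.println(extractOverrides(props));
    }
}
```

Under that assumption, the Surefire setting ends up as hive.exec.counters.pull.interval=1000 in the test's Hive configuration.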

Please verify whether this solves your problem.

/Pelle

patduin · 2015-10-02T15:02:31Z

Yeah I haven't seen it happening anymore, thanks

PelleUllberg added a commit that referenced this issue Sep 10, 2015

Merge pull request #3 in ODIN/hive-runner from timeout to master

d455ca4

* commit '2521143fb1fa65d5443729665756041a8af7da8a': GBL-20935 Added comment to README about timeout

PelleUllberg pushed a commit that referenced this issue Oct 2, 2015

Set pollinterval to 1000ms to hopefully get around the intermittent IOException mentioned here: #3

dffa0e5

patduin closed this as completed Oct 2, 2015
