HiveRunner occasionally throws exceptions trying to get job status #3

Closed
patduin opened this issue Feb 25, 2015 · 2 comments


@patduin
Collaborator

patduin commented Feb 25, 2015

We are using a fork of your hive13-hadoop-1 branch with the following dependency changes:
hive -> hive-0.13.0
hadoop -> hdp-2.4.0 jars
No code changes.

When running our HiveRunner tests, we occasionally get an error of the form:

java.io.IOException: Could not find status of job:job_local738995590_0016
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:294)
at org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:547)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:426)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at com.klarna.hiverunner.HiveServerContainer.executeScript(HiveServerContainer.java:104)
at com.klarna.hiverunner.builder.HiveShellBase.executeScriptsUnderTest(HiveShellBase.java:202)
at com.klarna.hiverunner.builder.HiveShellBase.start(HiveShellBase.java:97)

Looks like we are running into the error described in the comments of hive's HadoopJobExecHelper:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper:

        RunningJob newRj = jc.getJob(rj.getID());
        if (newRj == null) {
          // under exceptional load, hadoop may not be able to look up status
          // of finished jobs (because it has purged them from memory). From
          // hive's perspective - it's equivalent to the job having failed.
          // So raise a meaningful exception
          throw new IOException("Could not find status of job:" + rj.getID());
        }

I think we may have run into the issue described in https://issues.apache.org/jira/browse/HIVE-4009.
So I applied the patch from that ticket to the Hive release-0.13.0 branch, hoping that would fix things.
The patch sets a boolean and uses it to skip the block of code that causes our issue:
org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper:

final boolean localMode = ShimLoader.getHadoopShims().isLocalMode(job);

Unfortunately, this does not work with HiveRunner, because it overrides isLocalMode(job) in the class StandaloneHiveServerContext:
com.klarna.hiverunner.StandaloneHiveServerContext:

  protected void configureJobTrackerMode(HiveConf conf) {
    /*
     * Overload shims to make sure that org.apache.hadoop.hive.ql.exec.MapRedTask#runningViaChild validates to false.
     * Search for usage of org.apache.hadoop.hive.shims.HadoopShims#isLocalMode to find other affects of this.
     */
    ReflectionUtils.setStaticField(ShimLoader.class, "hadoopShims", new Hadoop20SShims() {
      @Override
      public boolean isLocalMode(Configuration conf) {
        return false;
      }
    });
  }

So the pending fix in hive-1.1.0 (a long way from 0.13.0, I know) won't work for HiveRunner, because isLocalMode always returns false there.
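
To make the interaction concrete, here is a minimal, self-contained sketch of why a local-mode guard has no effect once isLocalMode is overridden to return false. It is not Hive or HiveRunner code: the `Shims` class, the `JobLookup` interface, the `progress` signature and the `SKIPPED_LOOKUP` marker are hypothetical stand-ins, and the guard's shape is only an assumption based on the description of the HIVE-4009 patch above.

```java
import java.io.IOException;

public class LocalModeGuardSketch {

    /** Hypothetical stand-in for the Hadoop shim; only models isLocalMode. */
    static class Shims {
        public boolean isLocalMode(String jobTracker) {
            return "local".equals(jobTracker);
        }
    }

    /** Hypothetical stand-in for the job client; returns null for a purged job. */
    interface JobLookup {
        String findStatus(String jobId);
    }

    /** Assumed shape of the guarded lookup: skip the status check in local mode. */
    static String progress(Shims shims, String jobTracker, JobLookup lookup, String jobId)
            throws IOException {
        final boolean localMode = shims.isLocalMode(jobTracker);
        if (localMode) {
            // Guard taken: the lookup that can return null is never performed.
            return "SKIPPED_LOOKUP";
        }
        String status = lookup.findStatus(jobId);
        if (status == null) {
            throw new IOException("Could not find status of job:" + jobId);
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        JobLookup purged = jobId -> null; // simulates a job already purged from memory

        // Default shim: the guard skips the lookup, so no exception is thrown.
        System.out.println(progress(new Shims(), "local", purged, "job_local_0001"));

        // Shim overridden to always report non-local mode: the guard is bypassed.
        Shims forcedNonLocal = new Shims() {
            @Override
            public boolean isLocalMode(String jobTracker) {
                return false;
            }
        };
        try {
            progress(forcedNonLocal, "local", purged, "job_local_0001");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the default shim the purged job is tolerated; with the overridden shim the same lookup fails with the "Could not find status of job" message, which matches the behaviour described above.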

Long story short, my question is: has anyone run into similar problems, and is there an easier way to fix this?

Edit: forgot to say we really like HiveRunner! :)

PelleUllberg added a commit that referenced this issue Sep 10, 2015
* commit '2521143fb1fa65d5443729665756041a8af7da8a':
  GBL-20935 Added comment to README about timeout
PelleUllberg pushed a commit that referenced this issue Oct 2, 2015
@PelleUllberg
Contributor

Sorry for the very late response. Once we stepped up to Hive 14, we experienced the same issue. Setting the counters pull interval (hive.exec.counters.pull.interval) to 1000 ms got rid of the problem for us.

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.17</version>
        <configuration>
            <systemProperties>
                <hiveconf_hive.exec.counters.pull.interval>1000</hiveconf_hive.exec.counters.pull.interval>
            </systemProperties>
        </configuration>
    </plugin>
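
For readers unfamiliar with the `hiveconf_` prefix, the sketch below illustrates the idea behind the system property above: a JVM system property named `hiveconf_<name>` is treated as an override for the Hive property `<name>`. This is an illustration only, under the assumption that the prefix is simply stripped; the class and method names are hypothetical and this is not HiveRunner's actual implementation.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class HiveConfPrefixSketch {

    // Prefix used in the Surefire configuration above.
    static final String PREFIX = "hiveconf_";

    /** Collects properties that start with the prefix, keyed by the name without it. */
    static Map<String, String> extractOverrides(Properties props) {
        Map<String, String> overrides = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                overrides.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return overrides;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hiveconf_hive.exec.counters.pull.interval", "1000");
        props.setProperty("user.dir", "/tmp"); // unrelated property, ignored
        // prints {hive.exec.counters.pull.interval=1000}
        System.out.println(extractOverrides(props));
    }
}
```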

Please verify whether this solves your problem.

/Pelle

@patduin
Collaborator Author

patduin commented Oct 2, 2015

Yeah, I haven't seen it happen anymore, thanks.

@patduin patduin closed this as completed Oct 2, 2015