
Failed when querying Hive table with custom InputFormat #4850

Closed · litao-buptsse opened this issue on Mar 23, 2016 · 4 comments

@litao-buptsse

I have a Hive table with a custom InputFormat. When I query the table using Presto, it throws the following exception:

com.facebook.presto.spi.PrestoException: Error opening Hive split viewfs://nsX/user/hive/warehouse/default.db/web/uigs/web_uigs_web/201603/20160323/2016032305/web_uigs_web.location_pointer.2016032305 (offset=0, length=74) using com.sogou.datadir.plugin.SymlinkLzoTextInputFormat: org.apache.hadoop.mapred.FileSplit cannot be cast to com.sogou.datadir.plugin.SymlinkLzoTextInputFormat$SymlinkTextInputSplit
    at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:165)
    at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createHiveRecordCursor(GenericHiveRecordCursorProvider.java:47)
    at com.facebook.presto.hive.HivePageSourceProvider.getHiveRecordCursor(HivePageSourceProvider.java:129)
    at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:107)
    at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
    at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:48)
    at com.facebook.presto.operator.TableScanOperator.createSourceIfNecessary(TableScanOperator.java:268)
    at com.facebook.presto.operator.TableScanOperator.isFinished(TableScanOperator.java:210)
    at com.facebook.presto.operator.Driver.processInternal(Driver.java:377)
    at com.facebook.presto.operator.Driver.processFor(Driver.java:303)
    at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:577)
    at com.facebook.presto.execution.TaskExecutor$PrioritizedSplitRunner.process(TaskExecutor.java:529)
    at com.facebook.presto.execution.TaskExecutor$Runner.run(TaskExecutor.java:665)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to com.sogou.datadir.plugin.SymlinkLzoTextInputFormat$SymlinkTextInputSplit
    at com.sogou.datadir.plugin.SymlinkLzoTextInputFormat.getRecordReader(SymlinkLzoTextInputFormat.java:110)
    at com.facebook.presto.hive.HiveUtil.lambda$createRecordReader$2(HiveUtil.java:162)
    at com.facebook.presto.hive.HiveUtil$$Lambda$575/173808831.call(Unknown Source)
    at com.facebook.presto.hive.RetryDriver.run(RetryDriver.java:136)
    at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:162)
    ... 15 more

Here is my SymlinkLzoTextInputFormat:

public class SymlinkLzoTextInputFormat extends SymbolicInputFormat implements
    InputFormat<LongWritable, Text>, JobConfigurable,
    ContentSummaryInputFormat, ReworkMapredInputFormat {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split,
      JobConf job, Reporter reporter) throws IOException {
    InputSplit fileSplit = ((SymlinkTextInputSplit)split).getTargetSplit();
    TextInputFormat inputFormat = new DeprecatedLzoTextInputFormat();
    inputFormat.configure(job);
    RecordReader innerReader = null;
    try {
      innerReader = inputFormat.getRecordReader(fileSplit, job, reporter);
    } catch (Exception e) {
      innerReader = HiveIOExceptionHandlerUtil
          .handleRecordReaderCreationException(e, job);
    }
    return innerReader;
  }

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits)
      throws IOException {
    Path[] symlinksDirs = FileInputFormat.getInputPaths(job);
    if (symlinksDirs.length == 0) {
      throw new IOException("No input paths specified in job.");
    }
    List<Path> targetPaths = new ArrayList<Path>();
    List<Path> symlinkPaths = new ArrayList<Path>();
    ......
    for (InputSplit is : iss) {
      result.add(new SymlinkTextInputSplit(symlinkPath, (FileSplit) is));
    }
    return result.toArray(new InputSplit[result.size()]);
  }
}

I found that com.facebook.presto.hive.HiveUtil.createRecordReader() constructs a new FileSplit object itself, instead of obtaining the InputSplit from SymlinkLzoTextInputFormat.getSplits(), which returns an array of SymlinkTextInputSplit. That is what causes the exception org.apache.hadoop.mapred.FileSplit cannot be cast to com.sogou.datadir.plugin.SymlinkLzoTextInputFormat$SymlinkTextInputSplit.

public static RecordReader<?, ?> createRecordReader(Configuration configuration, Path path, long start, long length, Properties schema, List<HiveColumnHandle> columns)
    {
        // determine which hive columns we will read
        List<HiveColumnHandle> readColumns = ImmutableList.copyOf(filter(columns, not(HiveColumnHandle::isPartitionKey)));
        List<Integer> readHiveColumnIndexes = ImmutableList.copyOf(transform(readColumns, HiveColumnHandle::getHiveColumnIndex));

        // Tell hive the columns we would like to read, this lets hive optimize reading column oriented files
        setReadColumns(configuration, readHiveColumnIndexes);

        InputFormat<?, ?> inputFormat = getInputFormat(configuration, schema, true);
        JobConf jobConf = new JobConf(configuration);
        FileSplit fileSplit = new FileSplit(path, start, length, (String[]) null);

        // propagate serialization configuration to getRecordReader
        schema.stringPropertyNames().stream()
                .filter(name -> name.startsWith("serialization."))
                .forEach(name -> jobConf.set(name, schema.getProperty(name)));

        try {
            return retry()
                    .stopOnIllegalExceptions()
                    .run("createRecordReader", () -> inputFormat.getRecordReader(fileSplit, jobConf, Reporter.NULL));
        }
        catch (Exception e) {
            throw new PrestoException(HIVE_CANNOT_OPEN_SPLIT, format("Error opening Hive split %s (offset=%s, length=%s) using %s: %s",
                    path,
                    start,
                    length,
                    getInputFormatName(schema),
                    e.getMessage()),
                    e);
        }
    }

How about obtaining the InputSplit from the inputFormat object's getSplits() method? I think that may help fix this problem.
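
A minimal sketch of that idea, assuming the JobConf already has the table's input paths set and that the format's splits extend FileSplit; the helper name findMatchingSplit is mine, not Presto's:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: instead of constructing "new FileSplit(...)", ask the
// InputFormat for its own splits and return the one covering the requested
// range, so custom split subclasses (like SymlinkTextInputSplit) survive.
static InputSplit findMatchingSplit(InputFormat<?, ?> inputFormat, JobConf jobConf,
        Path path, long start, long length)
        throws IOException
{
    for (InputSplit split : inputFormat.getSplits(jobConf, 0)) {
        if (split instanceof FileSplit) {
            FileSplit fileSplit = (FileSplit) split;
            if (fileSplit.getPath().equals(path)
                    && fileSplit.getStart() == start
                    && fileSplit.getLength() == length) {
                return split;
            }
        }
    }
    throw new IOException("No split found for " + path
            + " (offset=" + start + ", length=" + length + ")");
}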

@litao-buptsse (Author)

Any response?

@cawallin (Member)

Fixed by #7002.

@arnaboss commented on Mar 19, 2019

Hi, I am still getting this error, shown below:

java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to amazon.conexio.hive.EDXManifestHiveInputFormat$EDXManifestHiveSplit
	at amazon.conexio.hive.EDXManifestHiveInputFormat.getRecordReader(EDXManifestHiveInputFormat.java:79)
	at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:220)
	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:72)
	at com.facebook.presto.hive.authentication.UserGroupInformationUtils.lambda$executeActionInDoAs$0(UserGroupInformationUtils.java:29)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:360)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1824)
	at com.facebook.presto.hive.authentication.UserGroupInformationUtils.executeActionInDoAs(UserGroupInformationUtils.java:27)
	at com.facebook.presto.hive.authentication.ImpersonatingHdfsAuthentication.doAs(ImpersonatingHdfsAuthentication.java:39)
	at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:80)
	at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:71)
	at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:187)
	at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:95)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44)
	at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56)
	at com.facebook.presto.operator.TableScanOperator.getOutput(TableScanOperator.java:239)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:379)
	at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:276)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1065)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:483)
	at com.facebook.presto.$gen.Presto_0_215____20190318_190434_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

This is the input format's getRecordReader:

    public RecordReader<BytesWritable, BytesWritable> getRecordReader(final InputSplit split, final JobConf job, final Reporter reporter) throws IOException {
        LOG.info("Getting record reader");

        EDXManifestHiveSplit hiveSplit = (EDXManifestHiveSplit) split;

        JobConf confWithTableProperties = HiveJobConfUtil.copyTablePropertiesToJobConf(job);

        return myFormat.getRecordReader(hiveSplit.mySplit, confWithTableProperties, reporter);
    }

and this is the custom split:

    public static final class EDXManifestHiveSplit extends FileSplit {
        private EDXManifestSplit mySplit;

        public EDXManifestHiveSplit() {
            this(null, null);
        }

        public EDXManifestHiveSplit(final EDXManifestSplit split, final Path tablePath) {
            super(tablePath, 0, 0, (String[]) null);
            mySplit = split;
        }

        /** {@inheritDoc} */
        @Override
        public void write(final DataOutput out) throws IOException {
            super.write(out);
            mySplit.write(out);
        }

        /** {@inheritDoc} */
        @Override
        public void readFields(final DataInput in) throws IOException {
            super.readFields(in);
            mySplit = new EDXManifestSplit();
            mySplit.readFields(in);
        }

        @Override
        public int hashCode() {
            // 17 and 31 are two randomly chosen prime numbers.
            // If deriving: appendSuper(super.hashCode()).
            return new HashCodeBuilder(17, 31)
                    .append(mySplit)
                    .toHashCode();
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof EDXManifestHiveSplit)) {
                return false;
            }
            if (obj == this) {
                return true;
            }

            EDXManifestHiveSplit rhs = (EDXManifestHiveSplit) obj;
            // If deriving: appendSuper(super.equals(obj)).
            return new EqualsBuilder()
                    .append(mySplit, rhs.mySplit)
                    .isEquals();
        }

        @Override
        public String toString() {
            return "EDXManifestHiveSplit{" +
                    "mySplit=" + mySplit +
                    '}';
        }
    }
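
For what it's worth, a defensive variant of getRecordReader that avoids the unconditional cast might look like the sketch below. This is only an illustration: Presto's HiveUtil.createRecordReader builds a bare FileSplit rather than calling the format's getSplits(), and whether myFormat can actually read from that plain split is an assumption.

    @Override
    public RecordReader<BytesWritable, BytesWritable> getRecordReader(final InputSplit split,
            final JobConf job, final Reporter reporter) throws IOException {
        JobConf confWithTableProperties = HiveJobConfUtil.copyTablePropertiesToJobConf(job);

        // Presto hands in a plain org.apache.hadoop.mapred.FileSplit, so an
        // unconditional cast to EDXManifestHiveSplit throws ClassCastException.
        if (split instanceof EDXManifestHiveSplit) {
            return myFormat.getRecordReader(((EDXManifestHiveSplit) split).mySplit,
                    confWithTableProperties, reporter);
        }

        // Assumption: myFormat can read the raw file range directly; if it
        // cannot, the custom split has to be reconstructed from split's path.
        return myFormat.getRecordReader(split, confWithTableProperties, reporter);
    }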

Any idea how I can solve this? @litao-buptsse @cawallin

@arnaboss

@litao-buptsse How did you solve the problem?
