Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue while joining two tables stored on Elasticsearch using HiveQL #180

Closed
skhuntia opened this Issue Mar 28, 2014 · 13 comments

Comments

Projects
None yet
2 participants
@skhuntia
Copy link

commented Mar 28, 2014

I tried to join two tables but it doesnt work.
The query didnt fail as such, but in result I got "no data available".

I am using "elasticsearch-hadoop-1.3.0.M2.jar" and "elasticsearch-1.0.0".

This is how I created two hive external tables

CREATE EXTERNAL TABLE activity(description string,username string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/activity/',
'es.host' = '10.309.500.163',
'es.port' = '9200');

CREATE EXTERNAL TABLE user(username string,ethnicity string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = '/test/user/',
'es.host' = '10.309.500.163',
'es.port' = '9200');

and the query that I ran is:

SELECT activity.,user. FROM activity JOIN user ON (activity.username = user.username)

The result was "no data available"
That means somehow the join condition failed.
But The "username" column/field in both tables has a lot of common records.

I dont understand why I am not getting the desired result. Am I doing anything wrong?

Thanks

@costin

This comment has been minimized.

Copy link
Member

commented Mar 28, 2014

This has been fixed in master - please use that instead.

@costin costin closed this Mar 28, 2014

@costin costin added doc and removed doc labels Mar 30, 2014

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 1, 2014

Thanks for the reply,
Like you said I used the master.
But still it doesn't work for me.
When I try to join the tables this is the error that I got.

Job Submission failed with exception 'org.elasticsearch.hadoop.EsHadoopIllegalArgumentException(Field(s) [[block__offset__inside__file, input__file__name]] not found in the Elasticsearch mapping specified; did you mean []?)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask

I have tried it with both ES1.0.0 and ES1.0.0RC2

@costin

This comment has been minimized.

Copy link
Member

commented Apr 1, 2014

This is a regression available in master which I'm investigating.
Can you please open up a separate issue next time?

Thanks!

@costin

This comment has been minimized.

Copy link
Member

commented Apr 1, 2014

@skhuntia scratch the 'open up a new issue' - you already did that, I confused the issue.

@costin costin reopened this Apr 1, 2014

@costin costin removed the duplicate label Apr 1, 2014

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 2, 2014

Thanks for the info.
I will be waiting for this issue to be fixed.

@costin costin closed this in 9db0317 Apr 3, 2014

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 7, 2014

I tried it again, but this time I got the below exception.

Exception in thread "main" java.lang.NoSuchFieldError: VIRTUAL_COLUMNS
at org.elasticsearch.hadoop.hive.HiveUtils.columnToAlias(HiveUtils.java:80)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:106)
at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:51)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:388)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:516)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:496)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1474)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:270)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

@costin

This comment has been minimized.

Copy link
Member

commented Apr 7, 2014

What Hive version are you using?

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 7, 2014

I am using hive- 0.10.0
cloudera dist 4.4.0

@costin

This comment has been minimized.

Copy link
Member

commented Apr 7, 2014

I've pushed a fix for it and triggered a nightly build - the new jars should be ready in 10'. You can build the sources in the meantime if you want.
I'll ping you once the build completes so you can try it out.

Cheers,

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 7, 2014

Thanks a lot Costin.

@costin

This comment has been minimized.

Copy link
Member

commented Apr 7, 2014

Can you try now to the latest snapshot - 20140407.114803-90? Let me know how it goes.

@skhuntia

This comment has been minimized.

Copy link
Author

commented Apr 7, 2014

This works!!!
I am able to join two tables and all other operations.

Thanks

@costin

This comment has been minimized.

Copy link
Member

commented Apr 7, 2014

Great - thanks for confirming!

costin added a commit that referenced this issue Apr 8, 2014

Improve validation of fields and change default to warn
Filter out Hive 'virtual' columns
Filter out Elasticsearch built-in types
Change validation default to warn to cope with cases where the schema
is incomplete
Fix #180
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.