MapJoin failed, Configuration and input path are inconsistent #169

Closed
shenguoquan opened this Issue Mar 19, 2014 · 16 comments


shenguoquan commented Mar 19, 2014

Recently I came across a strange problem. I want to use elasticsearch-1.0.0 as backend storage for Hive, and I used elasticsearch-hadoop-1.3.0.M2 to create Hive tables on Elasticsearch. The Hive SQL is as follows:

create external table supplier_es (S_SUPPKEY BIGINT, S_NAME STRING, S_ADDRESS STRING, S_NATIONKEY BIGINT, S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='q9/supplier','es.index.auto.create'='true','es.nodes' = 'localhost:9200');

create external table nation_es (N_NATIONKEY BIGINT, N_NAME STRING, N_REGIONKEY BIGINT, N_COMMENT STRING) stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource'='q9/nation','es.index.auto.create'='true','es.nodes' = 'localhost:9200');

The join query is as follows:

select s_suppkey, n_name from supplier_es s join nation_es n on n.n_nationkey = s.s_nationkey;

The error messages (taken from the log file):
2014-03-19 15:16:39,447 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
2014-03-19 15:16:39,448 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: fpath:hdfs://server-220:8020/user/hive/warehouse/nation_es
2014-03-19 15:16:39,462 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: getPathToAliases
2014-03-19 15:16:39,463 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias s to work list for file hdfs://server-220:8020/user/hive/warehouse/supplier_es
2014-03-19 15:16:39,465 ERROR [main] org.apache.hadoop.hive.ql.exec.MapOperator: Configuration does not have any alias for path: hdfs://server-220:8020/user/hive/warehouse/nation_es
2014-03-19 15:16:39,480 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper_aroundBody2(MapTask.java:434)
at org.apache.hadoop.mapred.MapTask$AjcClosure3.run(MapTask.java:1)
at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 19 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
... 24 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 27 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:142)
... 32 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:419)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:110)
... 32 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:413)
... 33 more
I have tried to figure out the problem, but I can't find the cause. Any help would be appreciated. Thanks very much.

Member

costin commented Mar 19, 2014

Adding my reply from the mailing list
Hi, The issue might be caused by the fact that M2 doesn't support different input and output indices for the same job; that is to use ES both as input and output within the same job (which is essentially what you are doing with the select). This has been fixed in master - can you try the latest nightly build or potentially build master yourself?

costin added the bug label Mar 19, 2014

Author

shenguoquan commented Mar 19, 2014

Hi Costin,
Thank you very much for your reply. I will try the version of elasticsearch-hadoop you mentioned. By the way, you said my problem is using Elasticsearch as both input and output, but I only use it as input storage, not output. Please correct me if I am wrong.

Member

costin commented Mar 19, 2014

You are not doing so directly, but to perform the join Hive might create some jobs that look at both tables. I'm not certain that's the case; however, using the latest master should give the answer.
By the way, what distro and version of Hadoop and Hive are you using? I'm assuming the Intel one, but I'm interested in the versions of all the aforementioned components.

Thanks,

Author

shenguoquan commented Mar 20, 2014

I use Hadoop version 2.2.0 and Hive version 0.12.0. By the way, I tried the latest master version, and the problem happened again.

Member

costin commented Mar 21, 2014

Hey,

I've wasted several hours today trying to reproduce this but the VMs I had gave me just grief. I'll try it again over the weekend.

costin added a commit that referenced this issue Mar 21, 2014

Improve initialization of Hive In/OutputFormat
Lazy initialize settings in a Hive environment
Separate table properties per input/output to prevent clashing
Save input properties (as Hive doesn't pass them in)
relates to #169

Member

costin commented Mar 21, 2014

@guoquans I think I've fixed the issue in master - can you please give the master a try and let me know whether it works for you?
Thanks!

costin added a commit that referenced this issue Mar 21, 2014

Always copy table properties to job properties
Rather than saving the table properties into our own properties, use
the job properties which seem to be per table. The old logic is still
in place just in case.
Relates to #169

Author

shenguoquan commented Mar 23, 2014

@costin, I have fixed the issue. Can I contribute my code to you?

Member

costin commented Mar 23, 2014

@guoquans Sure - see the contributing link - most important piece is signing the CLA.

Does that mean the fix I pushed in master yesterday did not fix your issue?

Author

shenguoquan commented Mar 24, 2014

@costin I'm sorry for the late response to your commit. Yes, I think we both know why the MapJoin failed. The original code fails when joining two tables that are both stored in Elasticsearch. I debugged the code and found that the problem is mixed-up configuration. Take the es.resource.read setting as an example: the EsStorageHandler for one table uses the job configuration to set es.resource.read='xxx', and the EsStorageHandler for the other table uses the same job configuration to set es.resource.read as well. Because the job configuration is a global variable, the later setting overwrites the earlier one. @costin, I think you have fixed the problem, but I think it would be better if you also added the HiveValueWriter, HiveBytesConverter and HiveValueReader settings. I fixed the issue with my own approach and tested it on a complicated case. I would appreciate it if you could look at my code and test case. Thank you very much.
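
The overwrite described above can be illustrated with a small simulation. This is only a sketch, not elasticsearch-hadoop code: a plain `Map` stands in for the shared Hadoop job configuration, `configureTable` is a hypothetical helper, and the resource values are taken from the `es.resource` table properties in the original report.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified simulation (an assumption, not the library's code): a plain Map
// stands in for the job configuration that all tables in the query share.
class SharedConfClash {

    // Hypothetical stand-in for the per-table setup step: every table writes
    // its own resource under the same key of the same shared map.
    static void configureTable(Map<String, String> jobConf, String resource) {
        jobConf.put("es.resource.read", resource);
    }

    // Configures both tables of the join against one shared map and returns
    // the value that a reader of the supplier table would then find.
    static String resourceSeenBySupplier() {
        Map<String, String> jobConf = new HashMap<>();
        configureTable(jobConf, "q9/supplier");
        configureTable(jobConf, "q9/nation"); // replaces the supplier value
        return jobConf.get("es.resource.read");
    }

    public static void main(String[] args) {
        // The supplier side expected "q9/supplier" but finds "q9/nation".
        System.out.println(resourceSeenBySupplier());
    }
}
```

With a single shared map, whichever table is configured last wins, which matches the "later setting overwrites the earlier one" behaviour reported here.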

Member

costin commented Mar 24, 2014

@guoquans I'm not sure I understand what you are saying:

> But I think if you add the HiveValueWriter,HiveBytesConverter and HiveValueReader setting is much better.

Not sure what you mean by this. The settings are currently added - is that a problem or not?

> I fixed the issue with my idea and test through the complicate case. I'm so appreciate if you can look at my code and test case.

For various reasons, the best way to move forward is to look at the [contributing](https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/CONTRIBUTING.md) guide - which in short means that after you sign the CLA, you can post the code either as a gist or pull request; I can look at it and see where we take it from there - potentially change my fix, integrate some of yours, etc.

That being said, have you tried the fix in master? Can you confirm whether it works or not?

Thanks!

Author

shenguoquan commented Mar 24, 2014

Hi @costin. I'm sorry that my English is poor, so sometimes I can't express precisely what I think.
--> But I think if you add the HiveValueWriter,HiveBytesConverter and HiveValueReader setting is much better.
By this I meant that I guessed you had forgotten to set the valueReader and valueWriter settings for EsStorageHandler, because I couldn't see that code in the master branch.

Member

costin commented Mar 24, 2014

No need to be sorry :)
The Writer/Converter/Reader are now set in a lazy manner - to postpone clashes - inside the Serialized - see the full commit.

Hence me asking whether you managed to try the master or not against your example; that's the ultimate test that everything runs as expected.
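
As a generic illustration of setting values "in a lazy manner", the sketch below defers reading the properties until they are first needed. It is not the code from the referenced commit: the `LazySettings` class and the `example.value.reader` key are hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative sketch of lazy initialization; the class and key names are
// hypothetical and do not come from elasticsearch-hadoop.
class LazySettings {
    private final Supplier<Map<String, String>> source;
    private Map<String, String> resolved; // stays null until first use
    private int loads = 0;

    LazySettings(Supplier<Map<String, String>> source) {
        this.source = source;
    }

    // Copies the properties on the first call only; later calls reuse the copy.
    synchronized Map<String, String> get() {
        if (resolved == null) {
            resolved = new HashMap<>(source.get());
            loads++;
        }
        return resolved;
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        LazySettings settings = new LazySettings(() -> tableProps);
        // A value added after construction is still picked up, because
        // nothing has been read from the source yet.
        tableProps.put("example.value.reader", "HiveValueReader");
        System.out.println(settings.get().get("example.value.reader"));
        settings.get();
        System.out.println(settings.loadCount()); // the source was read once
    }
}
```

The design point is only that nothing is written or read at construction time, so the values are resolved as late as possible.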

Member

costin commented Mar 24, 2014

@guoquans Hi - just noticed your pull request (#173). For some reason GitHub didn't notify me and I didn't check for it until a few minutes ago - sorry for the confusion.
It looks like you went for a similar approach to mine - that is, using jobProperties to propagate the settings as opposed to using the configuration directly.

Cheers,

Author

shenguoquan commented Mar 24, 2014

@costin Hi, that's right. Using the job properties instead of the job configuration fixes the issue. I have already tested it.
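
A minimal sketch of why per-table job properties avoid the clash, under the assumption (taken from the commit message in this thread) that job properties are kept per table. The maps, the `configureTable` helper, and the lookup by alias are simplifications for illustration, not the actual fix; the aliases `s` and `n` and the resource names come from the query and table definitions in the report.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified simulation (an assumption, not the actual fix): each table gets
// its own properties map instead of writing into one shared configuration.
class PerTableProperties {

    // Hypothetical helper: builds the properties for a single table.
    static Map<String, String> configureTable(String resource) {
        Map<String, String> jobProperties = new HashMap<>();
        jobProperties.put("es.resource.read", resource);
        return jobProperties;
    }

    // Configures both sides of the join, keyed by the query's table aliases.
    static Map<String, Map<String, String>> configureJoin() {
        Map<String, Map<String, String>> byAlias = new HashMap<>();
        byAlias.put("s", configureTable("q9/supplier"));
        byAlias.put("n", configureTable("q9/nation"));
        return byAlias;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> props = configureJoin();
        // Each alias keeps its own resource; neither overwrites the other.
        System.out.println(props.get("s").get("es.resource.read"));
        System.out.println(props.get("n").get("es.resource.read"));
    }
}
```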

Author

shenguoquan commented Mar 24, 2014

@costin Thank you very much for your patient answers.

Member

costin commented Mar 24, 2014

Thank you for reporting and testing out the fix! I'll close the issue since it seems to be resolved.

Cheers!

costin closed this Mar 24, 2014

costin added a commit that referenced this issue Apr 8, 2014

Improve initialization of Hive In/OutputFormat
Lazy initialize settings in a Hive environment
Separate table properties per input/output to prevent clashing
Save input properties (as Hive doesn't pass them in)
relates to #169

costin added a commit that referenced this issue Apr 8, 2014

Always copy table properties to job properties
Rather than saving the table properties into our own properties, use
the job properties which seem to be per table. The old logic is still
in place just in case.
Relates to #169