Hive dependency required when not needed #165
We're on Hadoop 2.2.0 and have a project with many Hadoop jobs. As soon as I bring in the M2 build of the elasticsearch-hadoop dependency, existing unrelated jobs blow up:
Exception in thread "main" java.lang.VerifyError: class org.apache.hadoop.yarn.proto.YarnProtos$URLProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
This is because of a dependency convergence issue. The optional Hive dependency in elasticsearch-hadoop is trying to bring in protobuf-java 2.4.1, whereas Hadoop 2.2.0 brings in 2.5.0. We're not using Hive at all, so I'd just as soon exclude the unneeded dependency, which is what I did.
Yay, that unbreaks our existing jobs. However, jobs that use ESOutputFormat now break:
8:31:20.739 [Thread-12] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local272885569_0001
What's happening is that WritableValueWriter checks "instanceof ShortWritable" before "instanceof AbstractMapWritable", and class loading fails at that check because the Hive artifact has been excluded. Since you have your own HiveValueWriter and almost all the Hive code is contained within org.elasticsearch.hadoop.hive, it looks a little out of place to have a Hive class loaded in WritableValueWriter.
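To illustrate the failure mode described above, here is a minimal sketch of one way to check a value against an optional type without hard-linking to it. This is not the fix that landed in elasticsearch-hadoop. The names `OptionalTypeGuard`, `isInstanceOfOptional`, and `describe` are hypothetical, and the Hive class name is only used as a string and is an assumption about where that `ShortWritable` lives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: checks a value against a type that may be absent
// from the classpath, so a missing optional dependency does not break
// unrelated type checks.
final class OptionalTypeGuard {

    // Assumed fully qualified name of the optional class (illustrative only).
    static final String OPTIONAL_SHORT = "org.apache.hadoop.hive.serde2.io.ShortWritable";

    // Returns true only if the class can be loaded AND value is an instance of it.
    // A missing class (or a class that fails to link) yields false, not an error.
    public static boolean isInstanceOfOptional(String className, Object value) {
        try {
            Class<?> type = Class.forName(className, false,
                    OptionalTypeGuard.class.getClassLoader());
            return type.isInstance(value);
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Dispatches on type; types that are always available are checked first,
    // and the optional type is checked reflectively.
    public static String describe(Object value) {
        if (value instanceof Map) {
            return "map";
        }
        if (isInstanceOfOptional(OPTIONAL_SHORT, value)) {
            return "short";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(describe(new HashMap<String, String>()));
        System.out.println(describe(42));
    }
}
```

The reflective check trades a little speed for robustness. In a real writer the lookup result would likely be cached once rather than repeated per value.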
Fixed in master. The Hive dependency issue is unfortunate. It might be fixed if we move to dedicated artifacts (one for MR, one for Cascading, one for Hive, etc.).
This is now fixed in master (with proper fall backs to Hadoop1) - please try it out and let us know whether it works for you or not.
For what it's worth, in the upcoming M3 (and already in master) in addition to the 'big' single jar, we also ship one jar per module. You can already download