Implement GeoWaveInputFormat for mapreduce #84

Closed
rfecher opened this issue Oct 15, 2014 · 10 comments

@rfecher
Contributor

rfecher commented Oct 15, 2014

This should be an intuitive abstraction on top of GeoWave with the value being the decoded/deserialized entry value and the key being data ID and adapter ID. It should also make the best attempt at providing a re-usable pattern for de-duplication.
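
A minimal sketch of the key/value shape and de-duplication pattern described above, assuming the same logical entry can reach a job more than once. Every name here (`DedupSketch`, `EntryKey`, `Entry`, `deduplicate`) is hypothetical and not taken from GeoWave; the IDs are plain strings only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: none of these names come from GeoWave itself.
public class DedupSketch {

    /** Composite key: adapter ID plus data ID identify one logical entry. */
    static final class EntryKey {
        final String adapterId;
        final String dataId;

        EntryKey(String adapterId, String dataId) {
            this.adapterId = adapterId;
            this.dataId = dataId;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EntryKey)) return false;
            EntryKey other = (EntryKey) o;
            return adapterId.equals(other.adapterId) && dataId.equals(other.dataId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(adapterId, dataId);
        }
    }

    /** One key/value pair, with the value already decoded. */
    static final class Entry<V> {
        final EntryKey key;
        final V value;

        Entry(EntryKey key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Keeps the first value seen for each key, preserving encounter order. */
    static <V> List<V> deduplicate(List<Entry<V>> entries) {
        Map<EntryKey, V> seen = new LinkedHashMap<>();
        for (Entry<V> e : entries) {
            if (!seen.containsKey(e.key)) {
                seen.put(e.key, e.value);
            }
        }
        return new ArrayList<>(seen.values());
    }

    public static void main(String[] args) {
        List<Entry<String>> rows = new ArrayList<>();
        rows.add(new Entry<>(new EntryKey("roads", "id-1"), "POINT (1 1)"));
        rows.add(new Entry<>(new EntryKey("roads", "id-1"), "POINT (1 1)")); // same entry again
        rows.add(new Entry<>(new EntryKey("rivers", "id-1"), "POINT (2 2)")); // same data ID, other adapter
        System.out.println(deduplicate(rows));
    }
}
```

Because equality covers both IDs, two entries that share a data ID under different adapters stay distinct. In a real job the same idea would more likely live in a reducer or combiner keyed on the composite key than in an in-memory map.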

@rfecher rfecher added this to the Current milestone Oct 15, 2014
@rfecher rfecher self-assigned this Oct 15, 2014
rfecher added a commit that referenced this issue Nov 5, 2014
rfecher added a commit that referenced this issue Nov 5, 2014
rfecher added a commit that referenced this issue Nov 5, 2014
rfecher added a commit that referenced this issue Nov 5, 2014
rfecher added a commit that referenced this issue Nov 5, 2014
rfecher added a commit that referenced this issue Nov 6, 2014
rfecher added a commit that referenced this issue Nov 7, 2014
rfecher added a commit that referenced this issue Nov 7, 2014
rfecher added a commit that referenced this issue Nov 7, 2014
rfecher added a commit that referenced this issue Nov 10, 2014
rfecher added a commit that referenced this issue Nov 10, 2014
rfecher added a commit that referenced this issue Nov 10, 2014
@chrisbennight
Contributor

This looks good to me, unless there was something else I missed? Closing, re-open if I'm wrong

@dlyle65535
Contributor

Small issue trying to compile against Apache Hadoop: userClassesTakesPrecedence isn't a method in JobContext. May be easiest to simply remove it, but I can get it from Configuration. Do you want me to open a separate issue or re-open this one?

@chrisbennight
Contributor

Which version of hadoop are you building against?

@dlyle65535
Contributor

I've tried 2.6.0, 2.5.0 and 2.4.0. All fail with:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3.2:compile (default-compile) on project geowave-accumulo: Compilation failure: Compilation failure:
[ERROR] /Users/dml/projects/geowave/geowave-accumulo/target/munged/main/mil/nga/giat/geowave/accumulo/mapreduce/NativeReduceContext.java:[203,24] error: cannot find symbol
[ERROR]
[ERROR] KEYIN extends Object declared in class NativeReduceContext
[ERROR] VALUEIN extends Object declared in class NativeReduceContext
[ERROR] /Users/dml/projects/geowave/geowave-accumulo/target/munged/main/mil/nga/giat/geowave/accumulo/mapreduce/NativeReduceContext.java:[201,1] error: method does not override or implement a method from a supertype
[ERROR] /Users/dml/projects/geowave/geowave-accumulo/target/munged/main/mil/nga/giat/geowave/accumulo/mapreduce/NativeMapContext.java:[193,16] error: cannot find symbol
[ERROR]
[ERROR] KEYIN extends Object declared in class NativeMapContext
[ERROR] VALUEIN extends Object declared in class NativeMapContext
[ERROR] /Users/dml/projects/geowave/geowave-accumulo/target/munged/main/mil/nga/giat/geowave/accumulo/mapreduce/NativeMapContext.java:[191,1] error: method does not override or implement a method from a supertype
[ERROR] -> [Help 1]

Looks as if there was some churn in the MapReduce V2 API. The NativeMapContext and NativeReduceContext classes override the userClassesTakesPrecedence method by delegating to the same method on the wrapped MapContext/ReduceContext. That method isn't in Apache Hadoop.
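
One way to read the "get it from Configuration" option mentioned earlier in the thread, as a hedged sketch rather than the fix that was actually merged: look the flag up by property name instead of calling a method that only some distributions expose. To stay runnable without Hadoop on the classpath, a plain `Map` stands in for `org.apache.hadoop.conf.Configuration`. The property name is, to my knowledge, the one Apache Hadoop 2.x defines as `MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST`; treat it as an assumption and check it against your Hadoop version. The class and helper names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's Configuration so the
// example compiles and runs without any Hadoop dependency.
public class UserClasspathFlagSketch {

    // Assumed property name (believed to match Apache Hadoop 2.x MRJobConfig).
    static final String USER_CLASSPATH_FIRST = "mapreduce.job.user.classpath.first";

    /** Stand-in for Configuration#getBoolean(name, defaultValue). */
    static boolean getBoolean(Map<String, String> conf, String name, boolean defaultValue) {
        String raw = conf.get(name);
        if (raw == null) {
            return defaultValue;
        }
        String value = raw.trim();
        if (value.equalsIgnoreCase("true")) {
            return true;
        }
        if (value.equalsIgnoreCase("false")) {
            return false;
        }
        return defaultValue; // unrecognized text falls back to the default
    }

    /** Reads the flag by name, so no distribution-specific method is needed. */
    static boolean userClassesTakesPrecedence(Map<String, String> conf) {
        return getBoolean(conf, USER_CLASSPATH_FIRST, false);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(userClassesTakesPrecedence(conf));
        conf.put(USER_CLASSPATH_FIRST, "true");
        System.out.println(userClassesTakesPrecedence(conf));
    }
}
```

With real Hadoop types, the equivalent call would be `getConfiguration().getBoolean(...)` on the job context. A wrapper method written this way should also drop its `@Override` annotation, since the supertype may not declare the method; that annotation is what triggers the "method does not override or implement a method from a supertype" error in the log above.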

@rfecher
Contributor Author

rfecher commented Feb 18, 2015

Thanks, that is confirmed based on this list of changes: http://doc.mapr.com/display/MapR/Recompiling+MapReduce+V1+Applications

JobContext removed userClassesTakesPrecedence. But our default compilation at this time is against Hadoop 2.5.0 (cdh5.2.0). Maybe we should start using the vanilla Hadoop dependencies rather than the Cloudera distribution?

@dlyle65535
Contributor

I went ahead and submitted a pull request: #240. I did have a test failure building geowave-test, but I get the same failure on master, so I'm thinking it could be my environment. Curious to hear what you think.

@chrisbennight
Contributor

We definitely want it to work with native Apache Hadoop distros (we will probably swap the default over shortly as part of a push to get on Maven Central). It looks like Cloudera may have backported that method.
We probably need to add those versions to the build matrix as well to catch issues like this in the future.

Looks like there is some weirdness with the pull requests, profiles, and the way we were setting the Travis build matrix. I'll run it down after lunch if the latest build doesn't fix it.

@chrisbennight
Contributor

Created a new ticket for this - #241

I also tested a quick tweak where everything seems to work. It has your re-implemented methods, and I just added the Hadoop version to the build matrix. I didn't bother with profiles; they didn't seem worthwhile to me here, since the artifacts uniquely identified the version and we have multiple configurations (i.e. multiple versions of Cloudera, Accumulo, etc.).
https://github.com/ngageoint/geowave/tree/apache-hadoop

It's running right now, but looks like it will pass.

@dlyle65535
Contributor

Sure you don't want them? I just finished a branch with everything working. :) I'll put a pull request on the new issue, but if you'd rather skip it, I understand. We will need to add an additional repo to the pom for hadoop-jetty for Hortonworks.

@chrisbennight
Contributor

Go for it, I don't have a strong preference one way or another - just wanted to see if there were any other back-ported classes that cropped up as problematic. I canceled the Travis builds in progress to free up worker instances - we can accept the pull request as soon as it finishes and passes.


3 participants