Multiple changes running 4mz in Spark 2.2 #27
Modern Hadoop does not require core-site.xml configurations for codecs. With this change, the codec works in Spark once the jar is added to the classpath.
I found that these changes were required to get 4mz working with newAPIHadoopFile. Here is an example Spark shell reader.
Why hasn't this been merged yet? In particular, commit f6a57e3 contains a basic fix that ZSTD needs in order to work properly. I would also add that FourMcTextInputFormat needs the LongWritable and Text generic type parameters, as FourMzTextInputFormat has in your version.
I can volunteer as a maintainer. I can also make an official repo if you want to avoid notifications.
I'd like to merge the first part of the pull request.
P.S.: I could use your help to rebuild the library on all platforms.
I should have pushed the external index code to a separate branch. I was experimenting with timestamp-indexing the data in a 4mz file. Let me fix the pull request.
I have removed the incorrect index code commit from this pull request.
For platform builds, I will open a separate pull request with a Travis CI integration file that can build on Linux and OS X. I do not have Windows build machines.
Yes, that'd be perfect, even though Linux is not an issue.
Modern Hadoop does not require core-site.xml configurations for codecs.
This allows the codec to work in Spark by adding the jar to the classpath.
You can copy the jar into Spark's jars directory.
Hadoop implementations that lack the Java services lookup will behave the same as they did without this META-INF data.
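The META-INF data mentioned above relies on the standard java.util.ServiceLoader mechanism: a jar ships a text file named META-INF/services/<interface name> (for Hadoop codecs, presumably META-INF/services/org.apache.hadoop.io.compress.CompressionCodec) that lists implementation class names, and any code calling ServiceLoader.load finds them on the classpath without extra configuration. The sketch below is a self-contained illustration only: Codec and DemoFourMzCodec are hypothetical stand-ins, not the real Hadoop or 4mc classes, and a temporary directory simulates the jar. It assumes Java 11+.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class Main {
    // Hypothetical stand-in for Hadoop's CompressionCodec interface (heavily simplified).
    public interface Codec {
        String getDefaultExtension();
    }

    // Hypothetical stand-in for a codec class shipped inside the 4mc jar.
    // ServiceLoader requires a public class with a public no-arg constructor.
    public static class DemoFourMzCodec implements Codec {
        public DemoFourMzCodec() {}

        @Override
        public String getDefaultExtension() {
            return ".4mz";
        }
    }

    // Simulates a jar containing META-INF/services data, then discovers codecs
    // through ServiceLoader and returns their default extensions.
    static List<String> discoverExtensions() {
        try {
            // Temporary directory acting as the jar root; it is not deleted afterwards.
            Path root = Files.createTempDirectory("fake-codec-jar");
            Path servicesDir = root.resolve("META-INF").resolve("services");
            Files.createDirectories(servicesDir);
            // File name = binary name of the service interface ("Main$Codec");
            // content = one provider class name per line.
            Files.writeString(servicesDir.resolve(Codec.class.getName()),
                    DemoFourMzCodec.class.getName() + "\n");

            List<String> extensions = new ArrayList<>();
            // Adding the directory to a class loader plays the role of adding the jar to the classpath.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, Main.class.getClassLoader())) {
                for (Codec codec : ServiceLoader.load(Codec.class, loader)) {
                    extensions.add(codec.getDefaultExtension());
                }
            }
            return extensions;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverExtensions());
    }
}
```

Running main prints [.4mz]. Because discovery depends only on what is on the classpath, code that never calls ServiceLoader simply ignores the services file, which matches the note that implementations without this lookup behave as before.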