Multiple changes running 4mz in Spark 2.2 #27

snoe925 · 2017-10-24T18:26:45Z

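Modern Hadoop does not require core-site.xml configuration (the io.compression.codecs property) to register codecs, because it also discovers them through Java ServiceLoader entries under META-INF/services.

This allows the codec to work in Spark just by adding the jar to the classpath, for example by copying the jar into Spark's jars directory.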

Hadoop versions that do not use ServiceLoader ("JavaServices") discovery will behave exactly as before; they simply ignore this META-INF data.
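For reference, the META-INF data is a Java ServiceLoader provider-configuration file. Hadoop's CompressionCodecFactory loads providers of `org.apache.hadoop.io.compress.CompressionCodec`, so the jar would carry `META-INF/services/org.apache.hadoop.io.compress.CompressionCodec` listing the codec classes. The sketch below only illustrates how that file format is read (one fully qualified class name per line, `#` starts a comment, blank lines and surrounding whitespace ignored, duplicates dropped); `ProviderConfig` and `parse` are hypothetical names, and the exact codec class names in the 4mc jar are not assumed here.

```scala
object ProviderConfig {
  // Parse the text of a ServiceLoader provider-configuration file, e.g.
  // META-INF/services/org.apache.hadoop.io.compress.CompressionCodec.
  // Rules: one fully qualified class name per line, everything after '#'
  // is a comment, surrounding whitespace and blank lines are ignored,
  // and duplicate names are dropped (keeping the first occurrence).
  def parse(text: String): List[String] =
    text
      .split('\n')
      .map(line => line.takeWhile(_ != '#').trim) // strip comment, then whitespace (also removes '\r')
      .filter(_.nonEmpty)
      .toList
      .distinct
}
```

The `distinct` step mirrors ServiceLoader's rule that a provider named more than once is only loaded once.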


snoe925 · 2017-10-24T21:04:53Z

I found that these changes were required to get 4mz working with newAPIHadoopFile. Here is an example reader for the Spark shell:

```scala
sc.newAPIHadoopFile("data.4mz", classOf[com.hadoop.mapreduce.FourMzTextInputFormat], classOf[org.apache.hadoop.io.LongWritable], classOf[org.apache.hadoop.io.Text])
```

jordiolivares · 2017-12-21T16:15:09Z

Why hasn't this been merged yet?

Specifically, commit f6a57e3 contains a small but necessary fix for ZSTD (4mz) to work properly. I would also add that FourMcTextInputFormat needs the same LongWritable and Text generic type parameters that FourMzTextInputFormat has in your version.
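The generic parameters matter for Scala callers because Spark declares `newAPIHadoopFile[K, V, F <: InputFormat[K, V]](path, fClass: Class[F], kClass: Class[K], vClass: Class[V], conf)`. An input format whose key/value types are not fixed to `LongWritable`/`Text` likely cannot satisfy the `F <: InputFormat[K, V]` bound, so the call fails to typecheck. The dependency-free sketch below mimics that bound with stand-in types; `InputFormatBound`, `InputFormat`, `TypedTextFormat` and `readAs` are hypothetical, not Hadoop or Spark classes.

```scala
object InputFormatBound {
  // Hypothetical stand-in for Hadoop's InputFormat<K, V>; not the real API.
  trait InputFormat[K, V] {
    def read(raw: Seq[String]): Seq[(K, V)]
  }

  // Fixes its key/value types, the way FourMzTextInputFormat fixes them to
  // LongWritable/Text (Long/String here to stay dependency-free).
  class TypedTextFormat extends InputFormat[Long, String] {
    // Key is the line index; Hadoop's text formats use the byte offset instead.
    def read(raw: Seq[String]): Seq[(Long, String)] =
      raw.zipWithIndex.map { case (line, i) => (i.toLong, line) }
  }

  // Same type-parameter shape as SparkContext.newAPIHadoopFile: F must be an
  // InputFormat[K, V] for the K and V given by kClass and vClass. Spark builds
  // F reflectively from a Class[F]; this sketch takes an instance instead.
  def readAs[K, V, F <: InputFormat[K, V]](
      format: F,
      kClass: Class[K],
      vClass: Class[V],
      raw: Seq[String]): Seq[(K, V)] =
    format.read(raw)
}
```

As in the spark-shell call, no explicit type arguments are needed: `K` and `V` are inferred from the class arguments, and the compiler then checks that the format type satisfies the bound.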

snoe925 · 2017-12-22T17:51:48Z

I can volunteer as a maintainer. I can also make an official repo if you want to avoid notifications.

carlomedas · 2017-12-31T10:56:34Z

I'd like to merge the first part of this pull request.
However, the index changes to the 4mc CLI are not clear to me.
What are they doing? The index in 4mc/4mz files is already stored inside the file itself.

carlomedas · 2017-12-31T10:56:59Z

P.S.: I could use your help to rebuild the lib on all platforms.

snoe925 · 2018-01-02T15:12:24Z

I should have pushed the external index code on a branch. I was doing an experiment on timestamp indexing the data in a 4mz. Let me fix the pull request.

snoe925 · 2018-01-02T15:18:09Z

I have removed the incorrect index code commit from this pull request.

snoe925 · 2018-01-02T15:20:47Z

For platform builds, I will open a separate pull request with a Travis CI integration file, which can build on Linux and OS X. I do not have Windows build machines.

carlomedas · 2018-01-02T17:42:14Z

Yes, that'd be perfect, even though Linux is not an issue.
I'm going to rebuild a new version of the lib soon, and Mac is also easy.
The only issue I have now is with Windows, where you need cygwin64 to build it correctly so that it works with the latest JRE 7/8 on recent Windows versions.
Since I don't think many people are using it, we could even consider releasing without it, unless we find the time to recreate the build system I unfortunately lost in the past year.

snoe925 added 4 commits October 24, 2017 14:18

Add JavaServices locators for codecs

f52afa7

Modern Hadoop does not require core-site.xml configurations for codecs. This allows the codec to work in Spark by adding the jar to the classpath.

Fix NPE when returning codec to pool

84bcfcf

Fix NPE on reuse of codec in Spark 2.2

0570476

Correct typo in base class for FourMzTextInputFormat

f6a57e3
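Commit 84bcfcf fixes an NPE when a codec is returned to its pool; the commit message does not say where the null came from. As a hedged illustration of that failure mode only (not the actual change), here is a minimal Scala pool whose return method ignores null instead of dereferencing it. `PoolSketch`, `Resource`, `Pool`, `borrow` and `giveBack` are hypothetical names, not Hadoop's `CodecPool` API.

```scala
object PoolSketch {
  // Hypothetical pooled resource standing in for a codec's (de)compressor.
  final class Resource(val id: Int) {
    var resets = 0
    def reset(): Unit = resets += 1
  }

  // Minimal pool that treats returning null as a no-op instead of throwing
  // a NullPointerException. Illustrative only; not the change in 84bcfcf.
  final class Pool(create: () => Resource) {
    private var free: List[Resource] = Nil

    // Reuse a pooled resource if one is available, otherwise create one.
    def borrow(): Resource = synchronized {
      free match {
        case r :: rest => free = rest; r
        case Nil       => create()
      }
    }

    // A caller that never obtained a resource may hand back null; skip it.
    def giveBack(r: Resource): Unit = synchronized {
      if (r != null) {
        r.reset() // clear per-stream state before the next reuse
        free = r :: free
      }
    }

    def size: Int = synchronized { free.size }
  }
}
```

The null guard sits in the pool rather than in every caller, so code paths that fail before obtaining a resource can still return "whatever they have" safely.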

snoe925 changed the title from "Add JavaServices locators for codecs" to "Multiple changes running 4mz in Spark 2.2" on Oct 24, 2017

jordiolivares mentioned this pull request Dec 21, 2017

Fix Typo in FourMzTextInputFormat #30 (open)

carlomedas merged commit 0ab3864 into fingltd:master Jan 2, 2018
