4mc files not splitting #3

Closed
pbutler opened this issue Jan 23, 2015 · 2 comments

pbutler commented Jan 23, 2015

I am running Hadoop 2.4.1 and run my jobs through mrjob (if that matters). When I run against an uncompressed file, splits happen and I automatically get more mappers than files. However, when I run against *.4mc files, no splitting occurs. Running hadoop fs -text file.4mc works, so I know decompression is fine, and a job against *.4mc files completes; it just never splits the input.

One other thing I noticed: if I run hadoop fs -text file.lz4_uc on files with the .lz4_uc extension, I get the following error:

15/01/23 01:47:58 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Exception in thread "main" java.lang.InternalError: LZ4_decompress_safe returned: -2

I am not sure if that's related or not.

@carlomedas
Collaborator

Please make sure to configure the 4mc input with:
job.setInputFormatClass(FourMcTextInputFormat.class);

You can have a look at related example here: https://github.com/carlomedas/4mc/blob/master/java/hadoop-4mc/src/examples/text/TestTextInput.java
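
To illustrate why the input format changes the mapper count, here is a simplified model of how a file-based input format decides the number of splits. It is a sketch, not Hadoop's or 4mc's actual code: the class name SplitCountSketch and the method countSplits are hypothetical, and it assumes (the thread does not state this) that the default text input format treats a .4mc file as a single non-splittable compressed stream, whereas FourMcTextInputFormat can cut it at block boundaries. The 10% slop on the last split mirrors the behavior of Hadoop's FileInputFormat as I understand it.

```java
// Simplified model of split counting; names are hypothetical, for illustration only.
class SplitCountSketch {
    // Assumed tolerance: the final split may run up to 10% over the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Returns how many input splits (and so how many mappers) one non-empty file yields.
    static long countSplits(long fileLength, long splitSize, boolean splittable) {
        if (!splittable) {
            return 1; // the whole file goes to a single mapper
        }
        long splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024 * 1024;
        long splitSize = 128L * 1024 * 1024;
        System.out.println(countSplits(oneGiB, splitSize, true));  // prints 8
        System.out.println(countSplits(oneGiB, splitSize, false)); // prints 1
    }
}
```

Under this model, a 1 GiB file read through a splittable input format fans out to 8 mappers, while the same file read as a non-splittable stream gets exactly one, which matches the symptom described in the report.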

@carlomedas
Collaborator

I consider this configuration issue closed. Please reopen if you can still reproduce it.
