Add support for filecrushing on Elastic MapReduce #2

Open - wants to merge 4 commits into base: master

alexanderdean commented Apr 7, 2013

Work-in-progress PR - do not pull yet

Hi @edwardcapriolo - this is an open pull request to add support for using filecrush on EMR.

There are three main things to fix:

  1. Instantiate the right type of `FileSystem`
  2. Fix the location of `tmpDir` - I think we should reference `${hadoop.tmp.dir}` rather than a raw `new Path("tmp/crush-" + UUID.randomUUID())`
  3. Replace the `fs.makeQualified(dir).toUri().getPath()` pattern with something that doesn't strip important S3 bucket information

Item 1 is done (see the PR). Item 2 is doable. Item 3 is a bit harder - I am working through this for EMR, but might need some help from you to make sure my changes don't break filecrush on standard HDFS.
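
To illustrate the third point, here is a minimal sketch of why taking only `getPath()` loses the bucket. It uses plain `java.net.URI` (the type Hadoop's `Path.toUri()` returns) rather than Hadoop itself, so it runs without a cluster. The class name, method names, bucket name and paths are all hypothetical, and `withBucket` is just one possible replacement, not what the PR actually does.

```java
import java.net.URI;

public class QualifiedPathDemo {

    // Mirrors the tail of fs.makeQualified(dir).toUri().getPath():
    // only the path component survives, scheme and authority are dropped.
    static String pathOnly(URI qualified) {
        return qualified.getPath();
    }

    // Hypothetical alternative: keep the full URI string so the scheme
    // and the authority (the S3 bucket) stay attached to the path.
    static String withBucket(URI qualified) {
        return qualified.toString();
    }

    public static void main(String[] args) {
        // Example locations only; neither comes from the PR.
        URI s3 = URI.create("s3n://example-bucket/logs/2013-04-07");
        URI hdfs = URI.create("hdfs://namenode:8020/logs/2013-04-07");

        // Both print /logs/2013-04-07, so the two locations become indistinguishable.
        System.out.println(pathOnly(s3));
        System.out.println(pathOnly(hdfs));

        // Prints s3n://example-bucket/logs/2013-04-07, bucket intact.
        System.out.println(withBucket(s3));
    }
}
```

On plain HDFS with a single default filesystem the path-only form is usually harmless, which is probably why the pattern works there; any replacement would still need checking against that case.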

Hoping this is the start of a collaboration! We're really excited about filecrush here at Snowplow.

Owner

edwardcapriolo commented Apr 7, 2013

It all looks good so far. Just let me know when you want me to merge.

alexanderdean commented Dec 11, 2014

We ended up not using this library. :-) You can merge as-is if you like, or close. I'll delete our fork in a few days.
