merge to hadoop? #4

Tagar · 2015-04-28T20:53:28Z

I'm surprised it's not yet part of the Apache Hadoop project :)
LZO is a pain to index. Plus it has some licensing issues.
Great project.

carlomedas · 2015-04-29T09:17:29Z

Thanks for the good feedback.
On Hadoop 2.x you have the LZ4 codec by default, but its compression ratio is not configurable and it does not actually provide any splittability.
I would be happy to see this as a patch to Hadoop 2.x, but so far I have not even been able to get the attention of the ElephantBird guys to work on integrating 4mc into EB to replace LZO.
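
For readers unfamiliar with what "splittability" and a configurable compression ratio mean here, a rough sketch may help. It is not 4mc's actual on-disk format or API: `zlib` from the Python standard library stands in for LZ4, and `BLOCK_SIZE`, `compress_blocks` and `read_split` are hypothetical names invented for illustration. The idea is only that compressing in independent blocks and recording each block's offset lets separate workers decompress their own range without reading from the start of the file.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # hypothetical uncompressed block size, not taken from 4mc


def compress_blocks(data: bytes, level: int = 6):
    """Compress data block by block and return (payload, index).

    index[i] is the (offset, length) of compressed block i inside payload.
    `level` is the knob that trades speed for compression ratio.
    """
    payload = bytearray()
    index = []
    for start in range(0, len(data), BLOCK_SIZE):
        chunk = zlib.compress(data[start:start + BLOCK_SIZE], level)
        index.append((len(payload), len(chunk)))
        payload += chunk
    return bytes(payload), index


def read_split(payload: bytes, index, first: int, last: int) -> bytes:
    """Decompress only blocks first..last-1, as one worker would for its split."""
    return b"".join(
        zlib.decompress(payload[off:off + size])
        for off, size in index[first:last]
    )


data = bytes(range(256)) * 1024  # 256 KiB of sample input
payload, index = compress_blocks(data)

# Two "workers" each decompress half of the blocks independently.
mid = len(index) // 2
first_half = read_split(payload, index, 0, mid)
second_half = read_split(payload, index, mid, len(index))
assert first_half + second_half == data
```

A single continuous compressed stream has no such block boundaries to seek to, so in general one reader has to consume it from the beginning.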

Tagar · 2015-04-29T15:34:29Z

I just emailed Cloudera folks to have a look and file a JIRA ticket to integrate it in.
Hopefully this will get integrated. Thanks a lot!

carlomedas · 2015-04-30T07:07:23Z

Thanks!

svravitej · 2015-08-26T13:08:15Z

Please let us know when it is integrated.

Waiting for integration with Hadoop.

ianoc · 2015-10-21T23:44:06Z

EB as in elephantbird from Twitter? Do you have a PR/issue to add support?

(Replacing isn't really an option for something like a serialization library, since people have TBs/PBs of data written with existing formats.)

carlomedas · 2015-10-22T07:41:14Z

Yes, sorry, 'replacing' is wrong here; 'add support' makes much more sense.
I got in touch with some EB devs but never had positive feedback about the idea of integration, so I never opened a PR/issue on EB about it.

ianoc · 2015-10-22T16:58:05Z

I think we'd be fine with the integration, though we @ Twitter aren't super likely to use it. I'd still like to try it out, but will probably do that outside EB. We have discussed getting off those container formats in EB, so if we were to migrate, it would more likely be to something sequence-file based for ourselves (which handles splitting regardless of compression). But I do plan on trying out the extra options and such from 4mc to see how they perform for our existing LZ4 use cases.

carlomedas · 2015-10-22T17:23:30Z

Very good, let me know what you think and how you find it.
I also agree with your approach: using a protobuf container is not the best option from a performance point of view when you already have a super-packet containing other info. In our tests we saw a slight performance degradation when moving from our data blocks (compressed with LZ4 anyway) to EB/4mc (also inside only C++ native code). Of course it was more than acceptable given the scalability we have in the Hadoop/EB architecture and, most of all, given that the EB framework is already coded and bug-free :)

svravitej · 2016-08-04T00:33:19Z

Hi,

I don't think I am connected to this mail in any way.
Please remove me from the notifications.

Regards,
Ravitej

carlomedas closed this as completed Jul 25, 2016
