
copyFromLocal not implemented? #37

Open · interskh opened this issue Dec 4, 2013 · 50 comments

@interskh (Contributor) commented Dec 4, 2013

I notice copyFromLocal exists in commandlineparser.py but not in client.py. Is it not implemented yet?

Thanks!

@wouterdebie (Contributor) commented Dec 4, 2013

Yes, that shouldn't be there. put was commented out, but I forgot copyFromLocal. I'll submit a patch this week, because this is confusing.

@interskh (Contributor, Author) commented Dec 5, 2013

Thanks.

@BlondAngel commented Dec 18, 2013

So, this means that copyFromLocal/put is not implemented? Do we use 'hadoop fs -copyFromLocal' instead?

I note that in the spotify blog [http://labs.spotify.com/2013/05/07/snakebite/], it states:
there are plans to also implement actions that also involve interaction with the DataNode

In addition, the documentation [http://spotify.github.io/snakebite/] has a 'To Do' section where it states:
put [paths] dst copy sources from local file system to destination

What is the timeline for this 'put'/'copyFromLocal' feature?

@wouterdebie (Contributor) commented Mar 4, 2014

Sorry for the late reply, but we haven't prioritized this. Would be nice to have (just like full YARN support).

@sodul commented Jun 5, 2014

+1
I want to use snakebite to replace several slow steps in our deployment automation; unfortunately we use copyFromLocal a lot, so this is definitely a must-have feature for a lot of people.

Thanks for the good work.

@carolinux commented Sep 17, 2014

seconding sodul's comment

@ravwojdyla self-assigned this Sep 17, 2014
@briancline commented Sep 29, 2014

Thanks for an excellent and straightforward client -- just throwing in a makeshift vote for the ability to use put/copyFromLocal to speed up a few data ingress scripts.

@ptrxyz commented Dec 13, 2014

Great work, keep it up. Would also like to see put/copyFromLocal in the future.

@DandyDev commented Jan 31, 2015

Still no word on this?
If communicating through protobuf makes it hard to implement features that require direct access to DataNodes (such as the put and append operations), it would be wise to have a look at WebHDFS. Using WebHDFS in Snakebite instead of Protobuf would make it trivial to implement copyFromLocal/put and other file write operations.

I think it's a shame that such a promising project gets stuck on something that is really needed, like copyFromLocal.

@wouterdebie (Contributor) commented Jan 31, 2015

@ravwojdyla and I have been discussing this and currently there doesn't seem to be much time to implement this, so it's very hard to give any ETA on this feature.
I don't think we want to add WebHDFS support, since that sort of defeats the purpose of snakebite and requires additional infrastructure.

@simonellistonball commented Jan 31, 2015

I agree with @wouterdebie; WebHDFS wouldn't have the speed of snakebite. I'm working on implementing put over RPC at the moment; if anyone has any thoughts or progress they can share to accelerate it, it would be great to work together.

@DandyDev commented Jan 31, 2015

Where can I find the RPC documentation?

@zachmullen commented Mar 4, 2015

Has there been progress toward implementing put? I was going to take a crack at it for a project I'm working on, and was considering contributing it upstream, but don't want to duplicate effort if someone already has a handle on this.

@Tarrasch (Contributor) commented Mar 5, 2015

I'm pretty sure it has not, maybe @ravwojdyla can confirm.

@ravwojdyla (Contributor) commented Mar 9, 2015

I started working on this feature some time ago; I can probably upload what I have right now (it's far from complete). That said, if anyone feels like working on this problem, please create issues for what you plan to work on, and if you need help, please ping me/us. Thanks!

@zachmullen commented Mar 9, 2015

@ravwojdyla I'd love to help. I started to do it, but the problem that ended up blocking me was that I couldn't find documentation on which RPCs I should even call to do something like an append, and the ones I tried didn't return what they claimed in the auto-generated protobuf spec. I might be able to help with this effort if you could point me to good documentation about the protocol, but I was unable to find any in sufficient detail.

@wouterdebie (Contributor) commented Mar 9, 2015

The problem with Hadoop is that protocols are pretty badly documented. When I started snakebite, I spent a lot of time reading Hadoop code and tcpdumping to figure out what was going on...

@aman572 commented May 1, 2015

Is there any ETA on when copyFromLocal/put support will be available?

@tothandor commented Aug 6, 2015

+1

@ligao101 commented Aug 31, 2015

+1

@mbultrow commented Sep 18, 2015

+1 :)

@ctimmins commented Oct 9, 2015

In the meantime:

import subprocess

subprocess.check_call(['hdfs', 'dfs', '-put', '/path/to/src', 'path/to/dst'], shell=False)
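A slightly more defensive sketch of the same shell-out workaround (the helper names and paths are only illustrative, and it still pays the JVM startup cost snakebite was built to avoid):

```python
import subprocess

def build_put_cmd(src, dst, hdfs_bin="hdfs"):
    # argv list for `hdfs dfs -put <src> <dst>`; passing a list with
    # shell=False (the default) sidesteps shell-quoting pitfalls
    return [hdfs_bin, "dfs", "-put", src, dst]

def hdfs_put(src, dst):
    # raises subprocess.CalledProcessError if the copy exits non-zero
    subprocess.check_call(build_put_cmd(src, dst))
```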

@jtaryma commented Oct 14, 2015

+1

@jwszolek commented Oct 28, 2015

@ravwojdyla - is there a separate branch for that issue? Did you have a chance to push what you had already done? Thanks!

@aeroevan commented Nov 7, 2015

It looks like a Go library similar to snakebite has started making progress on writing to HDFS:
colinmarc/hdfs#12

@Condla commented Dec 22, 2015

+1

@sodul commented Dec 22, 2015

An alternative that is relatively snappy is to use HttpFS, a service that provides an HTTP interface to HDFS. We actually ended up writing our own REST API in Groovy to access HDFS and the HBase shell (which has no API).

https://hadoop.apache.org/docs/current/hadoop-hdfs-httpfs/index.html
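For anyone going this route, a minimal sketch of the WebHDFS/HttpFS upload handshake (hostnames, port, and helper names are illustrative, and put_file assumes a reachable endpoint): a CREATE is two HTTP PUTs, where the first returns a 307 redirect and the second sends the bytes to the redirect target.

```python
import urllib.error
import urllib.parse
import urllib.request

def create_url(host, path, user, port=14000, overwrite=False):
    # HttpFS listens on 14000 by default; WebHDFS served by the NameNode
    # itself uses a different port. This builds the first-leg CREATE URL.
    query = urllib.parse.urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": str(overwrite).lower(),
    })
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, query)

def put_file(host, path, user, data):
    # Leg 1: PUT with no body; the server answers 307 and the Location
    # header names where to send the data. urllib surfaces the 307 as an
    # HTTPError because it won't auto-redirect a PUT.
    first = urllib.request.Request(create_url(host, path, user), method="PUT")
    try:
        urllib.request.urlopen(first)
        raise RuntimeError("expected a 307 redirect")
    except urllib.error.HTTPError as err:
        target = err.headers["Location"]
    # Leg 2: PUT the actual file bytes to the redirect target.
    second = urllib.request.Request(target, data=data, method="PUT")
    return urllib.request.urlopen(second)
```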

@tworec commented Jan 7, 2016

+1

@crorella commented Feb 17, 2016

👍

@wouterdebie (Contributor) commented Feb 17, 2016

Because it was never implemented.

@austintrombley commented Mar 17, 2016

+1

@philpot commented Mar 24, 2016

+1

@zachmullen commented Mar 24, 2016

GitHub recently added a handy new feature to avoid all the "+1" comment spam: you can now vote +1 on an issue by going to the top comment, clicking the little smiley face in the upper right, and then clicking the thumbs-up.

@agrebin commented May 31, 2016

+1
I've given my smiley but, just in case :)

@cherrot commented Jun 22, 2016

I planned to implement a storage service for big files using snakebite because I really like its implementation. Sadly, it doesn't support writing files.

Maybe I'll switch back to it when this feature has been implemented :)

@francisar commented Jul 27, 2016

Waiting for this to be implemented.

@wesmadrigal commented Aug 24, 2016

This was opened 3 years ago and still not implemented...wtf?

@printfxyz commented Sep 1, 2016

Use webhdfs instead.

@9nix00 commented Sep 2, 2016

Unbelievable, 3 years 😱
I started using snakebite yesterday.

@zachmullen commented Sep 2, 2016

If this feature is truly critical to you, I'd suggest checking out hdfs3; it's BSD-licensed and implements this capability. It also supports Python 3.

@9nix00 commented Sep 3, 2016

libhdfs3 is so hard to configure on Mac OS X 😓 I use webhdfs to write data.

@arudyk commented Sep 13, 2016

3 years later...

@wouterdebie (Contributor) commented Sep 13, 2016

Honestly, I'm not sure how complaining and pointing out the obvious is going to help getting this implemented.

Yes, this feature has been open for a very long time, but writing is a complicated operation in HDFS. Snakebite was conceived to work around long JVM startup times, which matter mostly for operations that you do often and that should be relatively short (ls, test, etc.). In cases where you read or write, the overhead of the JVM startup time has less impact. At Spotify we haven't had the need to invest time in write functionality, but of course if someone feels like it, please do so. That said, please refrain from complaining when software is open source, since people do this in their spare time or companies invest in getting software out there.


@DandyDev commented Sep 13, 2016

To be fair, this feature was mentioned both in an earlier version of the documentation and in the original blog post announcing Snakebite. You can't blame people for interpreting this as some sort of promise :)

Snakebite was conceived to work around long JVM startup times

Is that really all there is to it? I suggested WebHDFS before, which is just a REST API, so it doesn't have anything to do with JVM startup times, but it does give you a much easier path to implementing features than the undocumented Protobuf interface.
I was told back then that using WebHDFS "defeats the purpose of Snakebite", but now I don't see how, if the purpose is to circumvent the JVM startup times that using hdfs dfs incurs. WebHDFS eliminates the JVM overhead just as well as Protobuf does.

@tworec commented Sep 28, 2016

@DandyDev maybe this SO thread will (at least partially) explain why WebHDFS is not exactly what we want:

http://stackoverflow.com/questions/31580832/hdfs-put-vs-webhdfs

It seems that WebHDFS is about 4x slower.
Furthermore, if we were using WebHDFS, there would have been no need to read Hadoop code and port it to Python.

@tworec commented Jan 6, 2017

Nice reading for this thread:
http://wesmckinney.com/blog/python-hdfs-interfaces/

@baiyunping333 commented Jun 10, 2017

I will implement the feature!

@spyzzz commented Apr 10, 2018

Still not implemented?
I really need this feature :|

@brunocampos01 commented Aug 26, 2020

I suggest: https://pypi.org/project/snakebite-py3/

@derekcat commented Dec 17, 2020

> I suggest: https://pypi.org/project/snakebite-py3/

Why do you suggest that version? It does not appear to have put/copyFromLocal support either.

Aeroevan's response seems to be the only thing that's actually a viable alternative at the moment: https://github.com/colinmarc/hdfs/

Though it's a bit unclear what exactly is required, as opposed to the ~/.snakebiterc (straight copies of the core-site.xml and hdfs-site.xml files from a NameNode in HDFS?), and it's unfortunate that it's not available as a package (doubly unfortunate for me, since I'm trying to add some sort of HDFS client to all our developer utility boxes).
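For what it's worth, a ~/.snakebiterc is a small JSON file pointing at the NameNode(s), not a copy of the XML configs. Roughly like the following (hostname illustrative; double-check the exact field names against the snakebite configuration docs):

```json
{
  "config_version": 2,
  "skiptrash": true,
  "namenodes": [
    {"host": "namenode.example.com", "port": 8020, "version": 9}
  ]
}
```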
