gcsfuse entrypoint conflicts with Google Cloud Platform's fuse solution #135

Open
AneeshSachdeva opened this issue Feb 26, 2019 · 6 comments

Comments

@AneeshSachdeva

AneeshSachdeva commented Feb 26, 2019

I use gcsfs to easily load and dump data in a distributed fashion with Dask. Recently I needed to install Google Cloud's official FUSE tool, called gcsfuse, and I ran into a conflict between Google Cloud's "gcsfuse" CLI and this repo's "gcsfuse" CLI. I had to uninstall gcsfs to fix the conflict.

  1. What are the differentiators between Dask's gcsfuse and Google Cloud's?
  2. Would it be possible to rename the console_scripts entrypoint in setup.py from "gcsfuse" to "gcsfs" so that it remains consistent with the naming of this repo?

Happy to make the change in 2 if needed.
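
For illustration, the rename proposed in point 2 might look roughly like the sketch below. It only builds the `entry_points` mapping that `setup.py` would pass to `setup()`. The target `gcsfs.cli.gcsfuse:main` is an assumed module path and is not confirmed anywhere in this thread, and `console_scripts` is a hypothetical helper name.

```python
# Sketch of the relevant part of a setup.py.
# ASSUMPTION: the FUSE command is implemented at "gcsfs.cli.gcsfuse:main";
# adjust to whatever setup.py actually points at.
FUSE_TARGET = "gcsfs.cli.gcsfuse:main"


def console_scripts(command_name):
    """Build the entry_points mapping that installs `command_name` on PATH."""
    return {"console_scripts": [f"{command_name} = {FUSE_TARGET}"]}


# Behaviour described in this issue: an executable named `gcsfuse` is installed,
# which collides with Google Cloud's tool of the same name.
current = console_scripts("gcsfuse")

# Proposed rename: install `gcsfs` instead, leaving Google's `gcsfuse` untouched.
proposed = console_scripts("gcsfs")

# In setup.py this would be used as: setup(..., entry_points=proposed)
```

Only the name to the left of `=` changes, so the Python code behind the command would stay the same; existing users of the `gcsfuse` command would need to switch to the new name.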

@martindurant
Member

That is unfortunate.

It would be a pity to change the command name, although it isn't massively used - but it is in use, for example, by the pangeo group for remote hdf5 files. Perhaps the change could be done in the context of an eventual move to fsspec ( https://github.com/martindurant/filesystem_spec/pull/34 ).

Question: do you have a genuine use for both gcsfs and google's own code? If you use gcsfs but not the fuse component, is it not enough to unlink the CLI executable?

@mrocklin
Contributor

mrocklin commented Mar 1, 2019 via email

@AneeshSachdeva
Author

@martindurant I use Google's gcsfuse to set up the FUSE file system for user data, and then gcsfs to easily load model-related data for Dask within a Python environment. It's more practical for me to use gcsfs to point to remote datasets than to use FUSE for that, and vice versa for FUSE with user data.

How do you recommend I unlink the CLI executable? That solution is definitely sufficient for my use-case.

@martindurant
Member

Where the gcsfuse from gcsfs ends up depends on your system, but you can find it with `which` or its Windows equivalent (`where`). Is gcsfs's fuse substantially worse than Google's? (I wouldn't be surprised; the library was not designed with this in mind.)

@AneeshSachdeva
Author

Sorry for the late reply on this. I was able to cleanly unlink gcsfs's fuse via `which gcsfuse | xargs rm` (note that `rm` does not read paths from standard input, so the path has to be passed as an argument). For anyone else using this solution, keep in mind that I installed gcsfs after installing Google's gcsfuse, which is why that command selected gcsfs's fuse tool for removal.
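
Since two unrelated programs share the name `gcsfuse`, it may be worth checking which one a path refers to before deleting anything. The sketch below is one way to do that on a POSIX system. It relies on a heuristic that is an assumption rather than something stated in this thread: a pip-installed console script is a text file starting with `#!` that mentions `gcsfs`, whereas Google's tool is a compiled binary. The function name `find_gcsfs_fuse_script` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def find_gcsfs_fuse_script(command="gcsfuse", search_path=None):
    """Return the first `command` on the search path that looks like gcsfs's
    Python console script, or None if no such file is found.

    HEURISTIC (assumption): the wrapper starts with a shebang and mentions
    "gcsfs" near the top; a compiled binary does not match.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        # Restrict the lookup to one directory at a time so that every
        # executable named `command` is inspected, not just the first.
        candidate = shutil.which(command, path=directory)
        if candidate is None:
            continue
        try:
            with open(candidate, "rb") as handle:
                head = handle.read(2048)
        except OSError:
            continue  # unreadable file: skip rather than guess
        if head.startswith(b"#!") and b"gcsfs" in head:
            return Path(candidate)
    return None


# Example use (the removal is left commented out on purpose):
# script = find_gcsfs_fuse_script()
# if script is not None:
#     print("gcsfs wrapper found at", script)
#     # script.unlink()
```

Unlike `which gcsfuse`, this does not depend on installation order or on which copy comes first on `PATH`.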

From my experience with the two tools I think google's gcsfuse covers everything that gcsfs's intends to, but is a focused repository on its own and will likely make "faster" progress. Although it's also listed as beta-quality software, so if it's really important for gcsfs to have a fuse component maybe it could instead wrap around google's gcsfuse?

Regardless, I've found it really useful to use both in my workflow. gcsfuse sets up a shared filesystem to help orchestrate all of my distributed components, and gcsfs makes it really easy to load and dump data not related to system orchestration.

@martindurant
Member

My past experience with google developing rapidly is mixed :) We shall see. I don't think there's much point in gcsfs wrapping google's library, but maybe the fuse component here can eventually be dropped if indeed google's tool becomes popular. Indeed, whenever I get around to integrating all the filesystems into fsspec, the call for setting up fuse will be different anyway (support is there, https://github.com/martindurant/filesystem_spec/pull/34 , but no CLI at all yet).
