
[Feature Request] Use AWS S3 as Storage Backend #674

Open
dnzprmksz opened this issue Nov 20, 2019 · 7 comments
Labels: enhancement (New feature or request), feature-idea (A feature request or idea for something to implement in the future), infra

Comments

@dnzprmksz

First of all, I switched from Apache Zeppelin to Polynote last week and it is really great, thanks.

However, if you are working on a remote cluster that is not up all the time, you need to download your notebooks and upload them again next time to continue working. This is a bit tiring, and Zeppelin has a feature to use AWS S3 as its storage backend to overcome this issue. It would be awesome if Polynote could use S3 for storage, too!

Please see this related section in Zeppelin docs.
https://zeppelin.apache.org/docs/0.8.2/setup/storage/storage.html#notebook-storage-in-s3

@dnzprmksz dnzprmksz changed the title [Feature] Use AWS S3 as Storage Backend [Feature Request] Use AWS S3 as Storage Backend Nov 20, 2019
@jeremyrsmith
Contributor

Welcome! Thanks for the suggestion!

I agree that this would be great and it's on the roadmap.

@dnzprmksz
Author

Thanks! Is there a public place where we can see the roadmap? I checked the milestones before opening this issue and couldn't find it, which is why I opened it. Having a public roadmap might help reduce duplicate issues 🤔 For example, there is no sum operation in the built-in visualization for DataFrames; it is so easy and minor that I am sure it is known, yet there is no visible issue about it.

@JD557
Contributor

JD557 commented Aug 24, 2020

As a temporary solution, you can use something like https://github.com/s3fs-fuse/s3fs-fuse to mount an S3 bucket as a disk partition on your instance and store the notebooks in that partition.

Everything should automagically sync to S3 :)

@dnzprmksz
Author

Hey @JD557, thanks for the suggestion! It makes sense to try out s3fs.

@jonathanindig added the enhancement, feature-idea, and infra labels Sep 2, 2020
@tmnd1991
Contributor

tmnd1991 commented Dec 5, 2020

I would like to work on this. Can anyone give me a couple of pointers on how and where "save" happens?

@tmnd1991
Contributor

I had a look at it, and it does look doable. The thing I am unsure about is whether we want to add a dependency on the AWS S3 SDK to the polynote-server module, since right now only the Spark-related modules have it (transitively, from Spark itself). I also think this feature should be transparent to the user, i.e. entering a path with the s3 scheme should switch to that implementation. Is it feasible to add the S3 dependency to the server module, or is that something we want to avoid? If we want to avoid it, we could instead have a separate module that enables S3 support when added to the server's classpath, and without which S3 paths won't work. That would keep dependencies small, but it might also over-complicate things for newcomers.
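The scheme-based switch described above could be sketched roughly as below. This is a hypothetical illustration only (written in Java for brevity; Polynote itself is Scala): the `NotebookRepository` and `S3NotebookRepository` names do not correspond to actual Polynote classes, and a real implementation would load the S3 backend reflectively so that the server module carries no hard AWS dependency.

```java
import java.net.URI;

// Hypothetical sketch: choose a notebook storage backend from the URI scheme,
// falling back to the local filesystem when no scheme (or "file") is given.
interface NotebookRepository {
    String description();
}

class LocalNotebookRepository implements NotebookRepository {
    public String description() { return "local"; }
}

class S3NotebookRepository implements NotebookRepository {
    public String description() { return "s3"; }
}

public class RepositoryDispatch {
    static NotebookRepository forPath(String path) {
        String scheme = URI.create(path).getScheme(); // null for plain paths like /opt/notebooks
        if ("s3".equals(scheme)) {
            // A real implementation could look up an optional module here, e.g.
            // Class.forName("polynote.storage.S3NotebookRepository"), and fail
            // with a clear error if the S3 module is not on the classpath.
            return new S3NotebookRepository();
        }
        return new LocalNotebookRepository();
    }

    public static void main(String[] args) {
        System.out.println(forPath("s3://my-bucket/notebooks").description());
        System.out.println(forPath("/opt/notebooks").description());
    }
}
```

The reflective lookup is one way to get the "optional module" behavior discussed above: the server only needs the interface, and S3 support appears automatically when the extra jar is present.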
