
[Feature Request] Use AWS S3 as Storage Backend #674

Open
dnzprmksz opened this issue Nov 20, 2019 · 7 comments
Labels: enhancement (New feature or request), feature-idea (A feature request or idea for something to implement in the future), infra

Comments

@dnzprmksz

First of all, I switched from Apache Zeppelin to Polynote last week and it is really great, thanks.

However, if you are working on a remote cluster that is not up all the time, you need to download your notebooks and upload them again next time to continue working. This is a bit tiring, and Zeppelin has a feature to use AWS S3 as its storage backend to overcome this issue. It would be awesome if Polynote could use S3 for storage, too!

Please see this related section in Zeppelin docs.
https://zeppelin.apache.org/docs/0.8.2/setup/storage/storage.html#notebook-storage-in-s3

@dnzprmksz dnzprmksz changed the title [Feature] Use AWS S3 as Storage Backend [Feature Request] Use AWS S3 as Storage Backend Nov 20, 2019
@jeremyrsmith
Contributor

Welcome! Thanks for the suggestion!

I agree that this would be great and it's on the roadmap.

@dnzprmksz
Author

Thanks! Is there a public place where we can see the roadmap? I checked the milestones before opening this issue and couldn't find it, which is why I opened it. Having a public roadmap might help reduce duplicate issues 🤔 For example, there is no sum operation in the built-in visualization for DataFrames; it is so easy and minor that I am sure it is known, yet there is no visible issue about it.

@JD557
Contributor

JD557 commented Aug 24, 2020

As a temporary solution, you can use something like https://github.com/s3fs-fuse/s3fs-fuse to mount an S3 bucket as a disk partition on your instance and store the notebooks in that partition.

Everything should automagically sync to S3 :)

@dnzprmksz
Author

Hey @JD557, thanks for the suggestion! It makes sense to try out s3fs.

@jonathanindig added the enhancement, feature-idea, and infra labels Sep 2, 2020
@tmnd1991
Contributor

tmnd1991 commented Dec 5, 2020

I would like to work on this. Can anyone give me a couple of pointers on how and where "save" happens?

@tmnd1991
Contributor

I had a look at it, and it does look doable. The thing I am unsure about is whether we want to add a dependency on the AWS S3 SDK to the polynote-server module, since right now only the Spark-related modules have it (transitively, from Spark itself). I also think this feature should be transparent to the user, i.e. entering a path with the s3 scheme should switch to that implementation. Is it feasible to add the S3 dependency to the server module, or is that something we want to avoid? If we want to avoid it, we could instead have a separate module that enables S3 support when added to the server's classpath, and without which S3 paths won't work. That would keep dependencies small, but it might also over-complicate things for newcomers.
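The scheme-based switch described above could be sketched roughly as below. This is a hypothetical illustration only (written in Java for brevity; Polynote itself is Scala): the `NotebookRepository` and `S3NotebookRepository` names do not correspond to actual Polynote classes, and a real implementation would load the S3 backend reflectively so that the server module carries no hard AWS dependency.

```java
import java.net.URI;

// Hypothetical sketch: choose a notebook storage backend from the URI scheme,
// falling back to the local filesystem when no scheme (or "file") is given.
interface NotebookRepository {
    String description();
}

class LocalNotebookRepository implements NotebookRepository {
    public String description() { return "local"; }
}

class S3NotebookRepository implements NotebookRepository {
    public String description() { return "s3"; }
}

public class RepositoryDispatch {
    static NotebookRepository forPath(String path) {
        String scheme = URI.create(path).getScheme(); // null for plain paths like /opt/notebooks
        if ("s3".equals(scheme)) {
            // A real implementation could look up an optional module here, e.g.
            // Class.forName("polynote.storage.S3NotebookRepository"), and fail
            // with a clear error if the S3 module is not on the classpath.
            return new S3NotebookRepository();
        }
        return new LocalNotebookRepository();
    }

    public static void main(String[] args) {
        System.out.println(forPath("s3://my-bucket/notebooks").description());
        System.out.println(forPath("/opt/notebooks").description());
    }
}
```

The reflective lookup is one way to get the "optional module" behavior discussed above: the server only needs the interface, and S3 support appears automatically when the extra jar is present.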
