Add API to replace FS access #832

fionera · 2021-03-14T14:13:03Z

Hi, I would like to use Navidrome, but my files are in an object storage and I would like to interact with it directly. How about an API that makes it possible to use other storage providers? This could also allow using, for example, rclone as a storage source.

deluan · 2021-03-14T16:48:29Z

Hi @fionera, thanks for trying Navidrome!

I already thought about introducing this FS abstraction, but after moving my library to GDrive and setting up RClone to mount it, I don't see any advantage of incorporating this functionality directly into Navidrome's codebase. It would be more code to maintain and it would never be better than something dedicated to it, like RClone. Think about all the tweaks and fine-tuning one has to do to make the access as fast and transparent as possible.

Having said that, I may still implement this in a future version, but it is definitely not a priority now because, as I said, you can already achieve this by using RClone.

Let me know your thoughts, and if I'm missing any important aspect of this.

fionera · 2021-03-14T17:42:55Z

I have a big library that I can't just mount with rclone, so I would implement my own driver for it anyway. If it's wanted, I could take a look into implementing this.

deluan · 2021-03-14T23:02:31Z

I'm all for contributions, but I still don't get why you say you can't just mount it. My library is 70K songs, mostly FLAC, and it works pretty well, taking less than 30 seconds to add a new album. I know some folks who have more than a million songs and also mount using rclone. Can you elaborate a bit more on why this is not possible in your case? And in the issue description you said:

> This could also allow using, for example, rclone as a storage source

So, I'm kind of confused here: why do you say using rclone as a storage source is good, but mounting with it is not?

Anyway, if we decide to implement something like this, I'd like to use an approach based on external plugins, maybe using HashiCorp's go-plugin library.

fionera · 2021-03-14T23:24:22Z

The library I want to use and index currently has over 20 million songs and is over 500TB in size. It's split across multiple S3 buckets and not sorted in a human-readable way. There are multiple abstractions in there. If I mount it, that adds far more overhead.

Using rclone was just an example, so that I am not the only one who would use that interface ;) I don't really like the native Go plugin system, since it has multiple problems, and the go-plugin lib from HashiCorp uses gRPC under the hood, which afaik also adds overhead since you stream real files, though for the normal user that shouldn't be a problem.

deluan · 2021-03-16T17:58:04Z

Wow! Are you trying to launch your own streaming service? LOL!

Seriously though, abstracting away the FS access is a Good Thing ™️, but I'm not sure how we would handle TagLib's access to the files (it is a C++ library, not Go).

By the way, in your case you may encounter other issues with this huuuuge library, e.g. I'm not sure SQLite3 is the best option for storing a library this size.

fionera · 2021-03-16T20:31:59Z

> Wow! Are you trying to launch your own streaming service? LOL!

Nah, not really. I am just experimenting with some open storage systems and want to access them through a nice UI. In this case I want to try to index some items from Archive.org.

Regarding TagLib... Aren't there other libraries written in Go for that? Maybe switch?
SQLite3 shouldn't be an issue, since I would probably also implement Elasticsearch or something like that as the search engine :)

deluan · 2021-03-17T04:35:04Z

> Regarding TagLib... Aren't there other libraries written in Go for that? Maybe switch?

Unfortunately TagLib is the best option out there, and there are upcoming features that depend on some of its features, like multi-valued tags. Maybe we can create a File class in C++ whose methods call the fs.File methods, and pass it to TagLib::Open (or whatever it is called). Anyway, we will have to find a solution for that...

xorander00 · 2021-08-03T16:34:00Z

I'll throw in my $0.02, as I think object store support (i.e. S3) would be very useful...

Mounting an S3 bucket using tools like rclone or s3fs isn't ideal because it's faking a POSIX compliant file system on top of an S3 bucket.
It can be fragile from the perspective of reading/writing since file systems are generally expected to have far less latency than object stores, which in the broader scope can affect the consistency model.
Local file systems are not portable, whereas an S3 bucket can be local or remote and is understood through the native API. This is great for larger datasets & also availability guarantees. I currently self-host Minio, but if I want to utilize Amazon S3, I can do so without much hassle.
S3 uses HTTP as its substrate, which gets through firewalls and can also be secured via TLS. The same can't be said for local file system access.
This is just a contrived example (though I do have a setup like this), but what if I have 10 physical servers (9 meant to run services + 1 file server), and I want to update/reboot the server that happens to be running Navidrome on it? Not a huge deal if my music isn't available, but it's a pain in the neck because it's now down when it doesn't have to be down. Ideally, something like HAproxy would just switch it over to another instance of Navidrome on the other server. No need to deal with local storage for it, since S3 is available on the file server without any issues. Also, if it's taken a step further, then storage can also be clustered (i.e. Minio in my case), so if I have two file servers & one goes down, there's no service interruption.
S3 can be streamed without an intermediary being required. That is, instead of requiring Navidrome directly to always stream the actual media, Navidrome could actually (if so desired) give the client the URL on the S3 bucket and the client could stream it directly from there. In this scenario, if Navidrome goes down for a minute, there's no interruption to media currently being streamed. Funkwhale actually supports something like this.
S3 supports signed URLs & expirations, which offers another option (de facto standardized in this case) for access control. I want to share a song with some random person and not have to go through the trouble of creating a user account for this one-off thing? Sure! Just ask S3 for a signed URL to the song, set the expiration (e.g. 24h), and send that to them.

Anyway, those are some initial reasons I can think of right now. I wrote this quickly & ad-hoc, so if anything is unclear, let me know and I'll see if I can expand further. If I'm wrong about something, please don't hesitate to correct me.

I'd take a shot at trying to implement this myself, but I'm not really a golang guy, mostly a Rust + TypeScript guy. I have worked with go-plugin a bit though (along with gRPC for HashiCorp clustering tools), so I hope it wouldn't be too foreign, but who knows. I'm also really short on time currently and don't want to do a quick/crappy job, so it might take me a month or two to get around to it.

Thoughts?

EDIT: Gah, I just noticed PR #851 lol :)

deluan · 2021-09-07T03:03:25Z

@xorander00, good points, and I'm starting to think it may be worth implementing something like this. But I still think it would be better implemented as a plugin, and I don't think gRPC would cause any performance issues, as the main bottleneck would be the communication with the remote storage.

As I said before, we would need to write a C++ File class wrapper for fs.File, to allow TagLib to read tags from the remote fs. I think TagLib tries to read the whole file, which could also be an issue for large libraries.

aaronhuggins · 2021-09-15T12:07:31Z

@deluan I'm interested in picking this up. Do you already have a plan in place to implement?

Also, I'm not sure what your thoughts were on the fs.File wrapper, but we may not have to wrap the File instance explicitly and may be able to pass a stream of bytes instead using IOStream.

https://taglib.org/api/classTagLib_1_1IOStream.html

deluan · 2021-09-16T00:34:40Z

> @deluan I'm interested in picking this up. Do you already have a plan in place to implement?

I think the first step is what @fionera started doing (still doing?): abstract away all access to the filesystem with calls to fs.FS. See his PR: #851. This is probably outdated and has too many conflicts, but it gives a good idea of what needs to be done.

Next would be to make taglib and ffmpeg work through fs.FS as well.

For ffmpeg, I think we could read the file with Go code as a Reader and pass it via stdin (instead of as an argument to -i), both for the probe command (to read tags) and for the transcoding.

For taglib, your idea of using IOStream could work, but loading a whole music file into memory just to extract tags can be problematic. It would have to read the whole file, so the scanning process would be very slow, as it would have to download every file from the cloud provider. This already happens if we use rclone mount, so it wouldn't be worse than what we currently have. Another issue is that large files, like "Close to the Edge" (130MB in my library), would use too much of the server's memory, especially if we start doing this in parallel. Because of these issues, I'd rather implement a C++ File class wrapper for fs.File, as I suggested above, so we could read the file in chunks.

What do you think?

aaronhuggins · 2021-09-16T00:54:34Z

I'd like to take a closer look at IOStream, but you may be right about memory constraints especially in parallel. I do have some experience with the Kodi codebase and they use TagLib too, so I may take a look at how they've implemented it under the hood to see if it provides any useful hints. My instinct could be wrong, and the fs.File wrapper could be more viable.

I'm very interested in this feature, as I would like to be able to natively use Navidrome with WebDAV. I care a little about the File class wrapper vs. a buffer, but not enough to keep me from hacking on this if you are REEALLY tied to the wrapper. I'll get back to you as I'm able.

github-actions · 2023-03-07T02:13:04Z

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Navidrome team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

virtualdxs · 2023-03-30T21:28:47Z

I'm still very interested in this; rclone mount has been unreliable for me, so I'd really like native S3 support.

unbelauscht · 2023-08-22T15:52:19Z

Keep in mind that Navidrome needs to read the tags and will do a request per file. This may get expensive if you're not careful.

github-actions · 2024-02-19T01:43:42Z

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Navidrome team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

ChrisHorrocks · 2024-02-29T10:40:59Z

> Keep in mind that Navidrome needs to read the tags and will do a request per file. This may get expensive if you're not careful.

Pretty sure if Navidrome were hosted on an EC2 instance in the same region as the S3 bucket you wouldn't incur transfer-out cost on scan, only on client access.

Of course if you're hosting Navidrome on EC2 you could also just use a mounted EBS volume or even EFS, but that would be several times the cost.

jvoisin added the enhancement, go, and help wanted labels Mar 14, 2021

fionera mentioned this issue Mar 16, 2021

Use Go 1.16 fs package for FS interaction #851

Closed

aaronhuggins mentioned this issue Sep 14, 2021

Add remote file storage support (Looking to discuss and potentially contribute) #1345

Closed

github-actions bot added the stale label Mar 7, 2023

github-actions bot removed the stale label Mar 31, 2023

deluan mentioned this issue Jan 23, 2024

WIP: New scanner #2709

Draft

github-actions bot added the stale label Feb 19, 2024

github-actions bot removed the stale label Mar 1, 2024
