Add API to replace FS access #832

Open
fionera opened this issue Mar 14, 2021 · 17 comments

Comments

@fionera

fionera commented Mar 14, 2021

Hi, I would like to use Navidrome, but my files are in object storage and I would like Navidrome to interact with it directly. How about an API that makes it possible to use other storage providers? This could also allow using, for example, rclone as a storage source.

@deluan
Member

deluan commented Mar 14, 2021

Hi @fionera, thanks for trying Navidrome!

I already thought about introducing this FS abstraction, but after moving my library to GDrive and setting up RClone to mount it, I don't see any advantage in incorporating this functionality directly into Navidrome's codebase. It would be more code to maintain, and it would never be better than something dedicated to the job, like RClone. Think about all the tweaks and fine-tuning one has to do to make the access as fast and transparent as possible.

Having said that, I may still implement this in a future version, but it is definitely not a priority now because, as I said, you can already achieve this by using RClone.

Let me know your thoughts, and if I'm missing any important aspect of this.

@fionera
Author

fionera commented Mar 14, 2021

I have a big library that I can't just mount with rclone, and I would implement my own driver for it anyway. If it's wanted, I could look into implementing this.

@deluan
Member

deluan commented Mar 14, 2021

I'm all for contributions, but I still don't get why you say you can't just mount it. My library is 70K songs, mostly FLAC, and it works pretty well, taking less than 30 seconds to add a new album. I know some folks who have more than a million songs and also mount them using rclone. Can you elaborate a bit more on why this is not possible in your case? Also, in the issue description you said:

This could also allow using, for example, rclone as a storage source.

So, I'm kind of confused here: why do you say using rclone as a storage source is good, but mounting with it is not?

Anyway, if we decide to implement something like this, I'd like to use an approach based on external plugins, maybe using HashiCorp's go-plugin library.

@fionera
Author

fionera commented Mar 14, 2021

The library I want to use and index currently has over 20 million songs and is over 500 TB in size. It's split across multiple S3 buckets and not sorted in a human-readable way. There are multiple abstractions in there. If I mount it, it adds far more overhead.

Using rclone was just an example, so that I am not the only one who would use that interface ;) I don't really like the native Go plugin system, since it has multiple problems, and the go-plugin lib from HashiCorp uses gRPC under the hood, which afaik also adds overhead since you stream real files, though for the normal user that shouldn't be a problem.

@deluan
Member

deluan commented Mar 16, 2021

Wow! Are you trying to launch your own streaming service? LOL!

Seriously though, abstracting away the FS access is a Good Thing ™️, but I'm not sure how we would handle TagLib's access to the files (it is a C++ library, not Go).

By the way, in your case, you may encounter other issues with this huuuuge library. For example, I'm not sure SQLite3 is the best option to store a library this size.

@fionera
Author

fionera commented Mar 16, 2021

Wow! Are you trying to launch your own streaming service? LOL!

Nah, not really. I am just experimenting with some open storage systems and want to access them through a nice UI. In this case I want to try to index some items from Archive.org.

Regarding TagLib... Aren't there other libraries written in Go for that? Maybe switch?
SQLite3 shouldn't be an issue, since I would probably also implement Elasticsearch or something like that as the search engine :)

@deluan
Member

deluan commented Mar 17, 2021

Regarding TagLib... Aren't there other libraries written in Go for that? Maybe switch?

Unfortunately, TagLib is the best option out there, and there are upcoming features that depend on some of its capabilities, like multi-valued tags. Maybe we can create a File class in C++ whose methods call the fs.File methods, and pass it to TagLib::Open (or whatever it is called). Anyway, we will have to find a solution for that...

@xorander00

xorander00 commented Aug 3, 2021

I'll throw in my $0.02, as I think object store support (i.e. S3) would be very useful...

  • Mounting an S3 bucket using tools like rclone or s3fs isn't ideal because it's faking a POSIX compliant file system on top of an S3 bucket.
  • It can be fragile from the perspective of reading/writing since file systems are generally expected to have far less latency than object stores, which in the broader scope can affect the consistency model.
  • Local file systems are not portable, whereas an S3 bucket can be local or remote and is understood by the native API. This is great for larger datasets and also for availability guarantees. I currently self-host Minio, but if I want to use Amazon S3, I can do so without much hassle.
  • S3 uses HTTP as its substrate, which gets through firewalls and can also be secured via TLS. The same can't be said for local file system access.
  • This is just a contrived example (though I do have a setup like this), but what if I have 10 physical servers (9 meant to run services + 1 file server), and I want to update/reboot the server that happens to be running Navidrome on it? Not a huge deal if my music isn't available, but it's a pain in the neck because it's now down when it doesn't have to be down. Ideally, something like HAproxy would just switch it over to another instance of Navidrome on the other server. No need to deal with local storage for it, since S3 is available on the file server without any issues. Also, if it's taken a step further, then storage can also be clustered (i.e. Minio in my case), so if I have two file servers & one goes down, there's no service interruption.
  • S3 can be streamed without an intermediary being required. That is, instead of requiring Navidrome directly to always stream the actual media, Navidrome could actually (if so desired) give the client the URL on the S3 bucket and the client could stream it directly from there. In this scenario, if Navidrome goes down for a minute, there's no interruption to media currently being streamed. Funkwhale actually supports something like this.
  • S3 supports signed URLs & expirations, which offers another option (de facto standardized in this case) for access control. I want to share a song with some random person and not have to go through the trouble of creating a user account for this one-off thing? Sure! Just ask S3 for a signed URL to the song, set the expiration (e.g. 24h), and send that to them.

Anyway, those are some initial reasons I can think of right now. I wrote this quickly & ad-hoc, so if anything is unclear, let me know and I'll see if I can expand further. If I'm wrong about something, please don't hesitate to correct me.

I'd take a shot at trying to implement this myself, but I'm not really a golang guy, mostly a Rust + TypeScript guy. I have worked with go-plugin a bit though (along with gRPC for HashiCorp clustering tools), so I hope it wouldn't be too foreign, but who knows. I'm also really short on time currently and don't want to do a quick/crappy job, so it might take me a month or two to get around to it.

Thoughts?

EDIT: Gah, I just noticed PR #851 lol :)

@deluan
Member

deluan commented Sep 7, 2021

@xorander00, good points, and I'm starting to think it may be worth implementing something like this. But I still think it would be better implemented as a plugin, and I don't think gRPC would cause any performance issues, as the main bottleneck would be the communication with the remote storage.

As I said before, we would need to write a C++ File class wrapper for fs.File, to allow TagLib to read tags from the remote fs. I think TagLib tries to read the whole file, which could also be an issue for large libraries.

@aaronhuggins

@deluan I'm interested in picking this up. Do you already have a plan in place to implement?

Also, I'm not sure what your thoughts were on the fs.File wrapper, but we may not have to wrap the File instance explicitly and may be able to pass a stream of bytes instead using IOStream.

https://taglib.org/api/classTagLib_1_1IOStream.html

@deluan
Member

deluan commented Sep 16, 2021

@deluan I'm interested in picking this up. Do you already have a plan in place to implement?

I think the first step is what @fionera started doing (still doing?): abstract away all access to the filesystem with calls to fs.FS. See his PR: #851. This is probably outdated and has too many conflicts, but it gives a good idea of what needs to be done.

Next would be to make taglib and ffmpeg work through fs.FS as well.

For ffmpeg, I think we could read the file with Go code as a Reader and pass it as stdin (instead of an argument as -i), both for the probe command (to read tags) and for the transcoding.

For taglib, your idea of using IOStream could work, but loading a whole music file in memory just to extract tags can be problematic: it will have to read the whole file, so the scanning process will be very slow, as it will have to download all files from the cloud provider. This already happens if we use rclone mount, so it won't be worse than what we currently have. Another issue is that large files, like "Close to the Edge" (130MB in my library), would use too much of the server's memory, especially if we start doing this in parallel. Because of these issues, I'd rather implement a C++ File class wrapper for fs.File, as I suggested above, so we could read the file in chunks.

What do you think?

@aaronhuggins

aaronhuggins commented Sep 16, 2021

I'd like to take a closer look at IOStream, but you may be right about memory constraints especially in parallel. I do have some experience with the Kodi codebase and they use TagLib too, so I may take a look at how they've implemented it under the hood to see if it provides any useful hints. My instinct could be wrong, and the fs.File wrapper could be more viable.

I'm very interested in this feature, as I would like to be able to natively use Navidrome with webdav. I care a little about File class wrapper vs buffer, but not enough to keep me from hacking on this if you are REEALLY tied to the wrapper. I'll get back to you as I'm able.

@github-actions

github-actions bot commented Mar 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Navidrome team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

@github-actions github-actions bot added the stale label Mar 7, 2023
@virtualdxs

I'm still very interested in this; rclone mount has been unreliable for me, so I'd really like native S3 support.

@github-actions github-actions bot removed the stale label Mar 31, 2023
@unbelauscht

Keep in mind that Navidrome needs to read the tags and will do a request per file. This may get expensive if you're not careful.

@github-actions github-actions bot added the stale label Feb 19, 2024
@ChrisHorrocks

Keep in mind that Navidrome needs to read the tags and will do a request per file. This may get expensive if you're not careful.

Pretty sure if Navidrome were hosted on an EC2 instance in the same region as the S3 bucket you wouldn't incur transfer-out cost on scan, only on client access.

Of course if you're hosting Navidrome on EC2 you could also just use a mounted EBS volume or even EFS, but that would be several times the cost.

@github-actions github-actions bot removed the stale label Mar 1, 2024
8 participants