Big Feature Request: Support for audio books (e.g. m4b) #1419

Closed
sandreas opened this issue Oct 19, 2021 · 19 comments
Comments

@sandreas

Describe the solution you'd like

This might be a (too) big feature request... I would love to see support for audio books, since this kind of feature is pretty rare on mature streaming server solutions. The one I'm currently using is https://github.com/advplyr/audiobookshelf, but it would be awesome to have an all in one solution and I really prefer navidrome.

Best case scenario:

Maybe this request should be divided into multiple ones, but I just would like to ask, if this is beyond the scope of navidrome?

Best,
sandreas

@certuna
Contributor

certuna commented Oct 19, 2021

Part of it is implemented already, through the Subsonic API (i.e. if you have a Subsonic player that supports audiobooks): #245

Filtering on Audiobooks will probably come with support for Release Type: #369

@deluan
Member

deluan commented Oct 19, 2021

Yes most of it is supported already if you use a client with audiobook support (ex: DSub). The only thing I didn't understand is:

Supporting playlists

Navidrome already supports playlists. What do you have in mind?

Re: filtering on audiobooks, this will also be possible when we introduce multiple libraries, so you could have a library only for audiobooks, apart from your music library.

@sandreas
Author

@deluan @certuna thx for the quick response and the tips. I have not checked that yet, so I'll close the issue until I have gathered further impressions of these apps.

@sandreas
Author

sandreas commented Oct 19, 2021

Ok, I made some experiments. First let me say that navidrome is awesome (really, the docker image worked nearly out of the box, even behind an nginx-proxy with Basic Auth - I used https://username:password@myserver.domain/). Since I know audio book stuff pretty well (see https://github.com/sandreas/m4b-tool), here are my thoughts, where I see room for improvement:

  • Currently, there is no way to globally "filter" the mediatype (music, audio book, podcast, video, etc.) in the web interface
    • In my case I have music and audiobooks all mixed up in the UI
    • I don't think it's useful to have them all in the same view; playing a song followed by an audiobook, or vice versa, makes no sense to me
    • Having a dropdown in the main menu to set a global filter for the media type would be awesome
  • Manually setting the media type for files or folders would also be nice, in order to support mp3-based audiobooks
  • Chapters support would be nice for players (m4b native chapters and mp3 files with chapters via Chapter Frame Addendum - see https://id3.org/id3v2-chapters-1.0)

So this feature request comes all down to supporting the "media type" with mp4 format files (e.g. m4b):

  • ITUNESMEDIATYPE or stik
  • Possible values
    0 = Movie (old)
    1 = Normal (Music)
    2 = Audiobook
    6 = Music Video
    9 = Movie
    10 = TV Show
    11 = Booklet
    14 = Ringtone
    23 = iTunes U
    

as well as setting the media type manually for other file types, being able to filter on it globally, and lastly supporting embedded chapters.
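As a rough illustration of the filtering idea, the stik values listed above could be modelled as a small Go type. This is only a sketch: `MediaType`, `Track` and `FilterByMediaType` are hypothetical names made up for this example and are not part of Navidrome.

```go
package main

import "fmt"

// MediaType mirrors the iTunes "stik" values listed in this comment.
type MediaType int

const (
	MediaTypeMovieOld   MediaType = 0
	MediaTypeMusic      MediaType = 1
	MediaTypeAudiobook  MediaType = 2
	MediaTypeMusicVideo MediaType = 6
	MediaTypeMovie      MediaType = 9
	MediaTypeTVShow     MediaType = 10
	MediaTypeBooklet    MediaType = 11
	MediaTypeRingtone   MediaType = 14
	MediaTypeITunesU    MediaType = 23
)

// Track is a hypothetical, heavily simplified library entry.
type Track struct {
	Title     string
	MediaType MediaType
}

// FilterByMediaType returns only the tracks of the requested media type,
// which is what a global media type filter in the UI would boil down to.
func FilterByMediaType(tracks []Track, want MediaType) []Track {
	var out []Track
	for _, t := range tracks {
		if t.MediaType == want {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	library := []Track{
		{Title: "Some Song", MediaType: MediaTypeMusic},
		{Title: "Some Audiobook", MediaType: MediaTypeAudiobook},
	}
	for _, t := range FilterByMediaType(library, MediaTypeAudiobook) {
		fmt.Println(t.Title)
	}
}
```

Files without a stik value (for example mp3 based audiobooks) would need the manual override mentioned above before a filter like this could pick them up.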

I used the app substreamer with my iPhone and while the raw playback of audio books worked well, the same media type filtering problem applies to the app as well. Chapters did not work either - next/prev button did nothing on audio books.

What do you think?

@sandreas sandreas reopened this Oct 19, 2021
@certuna
Contributor

certuna commented Oct 20, 2021

  • I think the idea is that selecting/filtering by media type in the Web UI would come as part of the bigger Release Type feature, especially in combination with Smart Playlists you'd then be able to make a sub-section of your library with just Audiobooks.
  • manual setting of Audiobooks media type is probably best done outside of Navidrome with a tag editor or other music manager, in principle ND doesn't write anything to the audio files
  • chapter support in the Web UI player would be cool but is difficult to add since the web player component isn't developed by Navidrome, we're using this upstream project so ideally support for chapters needs to be added there (basically, the behaviour that "if the file has chapters, prev/next goes to the next chapter instead of next track")
  • chapters are not part of the Subsonic API specs but 3rd party players could in principle add support for that themselves already, by parsing tags clientside. For example, DSub supports ReplayGain tags in a similar way.

@sandreas
Author

I think the idea is that selecting/filtering by media type in the Web UI would come as part of the bigger Release Type feature

Looks good to me. One problem I can think of is, that Release Type is a bit unclear / unspecific. Although there is a release type "audio book", I'm not sure if ONE data field "release type" is enough to represent two pieces of information:

  • The form, how it is released (LP, CD, MP3 Download, etc.)
  • The descriptive information, what content is contained (music, podcast, audiobook, video, etc.)

However, you should definitely consider adding a global filter on the "release type" property, to prevent different types of media from being mixed up in the user interface.

especially in combination with Smart Playlists you'd then be able to make a sub-section of your library with just Audiobooks.

Yes, smart playlists would be awesome. In my opinion smart playlists would be nothing more than "stored filters" and could be used ANYWHERE (including search inputs and music suggestions), which would make them very powerful. One implementation I was very impressed by is the JsonApi.NET one: https://www.jsonapi.net/usage/reading/filtering.html
Maybe you can get some further inspiration here, I especially liked the URL representations of complex filters without having to use JSON.

chapter support in the Web UI player would be cool but is difficult to add since the web player component isn't developed by Navidrome

I understand. You're right, chapter support should go to the upstream project. As far as I can tell, though, chapter support would not be that complex if you just provide a list of timestamps + titles and "seek" to the corresponding chapter marker's timestamp. It is also possible to use renderCustomUI or at least addExtendsContent:
https://github.com/lijinke666/react-music-player/blob/8ec6ff6bcaf374c6790d20002a4ff863e7158849/example/example.js#L762
https://github.com/lijinke666/react-music-player/blob/8ec6ff6bcaf374c6790d20002a4ff863e7158849/example/example.js#L479
just to display a list of these markers

chapters are not part of the Subsonic API specs but 3rd party players could in principle add support for that themselves already, by parsing tags clientside. For example, DSub supports ReplayGain tags in a similar way.

I see. Well, maybe there would be a way to work around this using virtual bookmarks? Bookmarks are normally user-specific, but they carry a "comment", and the comment could contain the chapter title prefixed with a special marker that signals to the player that this is not a user-defined or editable bookmark, but an auto-generated one that represents a chapter... Well, maybe this is not ideal, since:

If a bookmark already exists for this file it will be overwritten.

which would mean that chapter markers prevent users from creating a bookmark at a chapter's timestamp... I don't know, just an idea :-)
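To make the marker idea a bit more concrete, here is a minimal sketch in Go. The `[chapter] ` prefix, the `Bookmark` type and both helper functions are assumptions invented for this example; the Subsonic API does not define any such convention.

```go
package main

import (
	"fmt"
	"strings"
)

// chapterMarker is a hypothetical prefix chosen for this sketch; the Subsonic
// API does not define any such convention.
const chapterMarker = "[chapter] "

// Bookmark is a simplified stand-in for a Subsonic bookmark.
type Bookmark struct {
	PositionMs int64
	Comment    string
}

// NewChapterBookmark builds an auto-generated bookmark that represents a chapter.
func NewChapterBookmark(positionMs int64, title string) Bookmark {
	return Bookmark{PositionMs: positionMs, Comment: chapterMarker + title}
}

// ChapterTitle reports whether b is a chapter bookmark and, if so, returns
// the chapter title with the marker stripped.
func (b Bookmark) ChapterTitle() (string, bool) {
	if !strings.HasPrefix(b.Comment, chapterMarker) {
		return "", false
	}
	return strings.TrimPrefix(b.Comment, chapterMarker), true
}

func main() {
	bookmarks := []Bookmark{
		NewChapterBookmark(0, "Chapter 1"),
		{PositionMs: 93000, Comment: "stopped here"},
	}
	for _, b := range bookmarks {
		if title, ok := b.ChapterTitle(); ok {
			fmt.Printf("chapter %q at %d ms (read-only)\n", title, b.PositionMs)
		} else {
			fmt.Printf("user bookmark %q at %d ms\n", b.Comment, b.PositionMs)
		}
	}
}
```

A client that does not know the marker would simply show the prefixed comment, so the scheme degrades gracefully, but it does not solve the overwrite problem quoted above.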

I hope you get me right... I'm not trying to tell you how to do your job, I just did some research what it would take to implement this myself, because I really like the project. Unfortunately I doubt that I find the time to contribute code... :-/

Thank you for the polite and meaningful clarifications. Maybe you can also draw some inspiration from my comments; at least I hope so.

Keep up the good work.

@sandreas
Author

Ok, I did some further research and took a look at navidrome code. Seems that you are using ffmpeg to extract file metadata, which looks like a solid decision to me. Although I'm not sure that the parser is accurate / follows the specs (I have to verify that), I did not see any hints of a chapter parsing attempt.

So here is what I would like to do:

  • Rewrite the ffmpeg output parser to be spec compliant and support chapters as tags["chapters"][0] = [{start:0,name:"Chapter 1"}, {start:3434343, length:4500, name:"Last Chapter"}]. start and length should be time.Duration, but this does not work (see golang/go#10275, "all: decide what can implement TextMarshaler/TextUnmarshaler"):
    // Fields are exported so that encoding/json can serialize them.
    type Chapter struct {
        Start  int    `json:"start"`
        Length int    `json:"length"` // only required for the last chapter, or if the next chapter's start does not match this chapter's end
        Name   string `json:"name"`
    }
  • Now that we have the chapters extracted, we could add chapters to the MediaFile struct:
    mf := &model.MediaFile{}
  • Next we could store the chapters as normalized json into a new database field (I don't think, that chapters have to be normalized to an extra table) or we could store them directly as bookmarks
    • Here we could add a bookmarks.readonly flag to the database to prevent deletion or even better an internal field bookmarks.type(int) that is part of the primary key, so it would be possible to create user bookmarks and have a chapter at the same timestamp
  • Last step would be to create either virtual bookmarks based on the chapters json in the database field or just provide the stored bookmarks with a deletion prevention for type chapter:
    user, _ := request.UserFrom(r.ctx)

Here is some code for parsing chapters out of ffmpeg output I wrote some time ago (as is, untested and not complete):

func LoadAudioBookMeta(path string, ffmpegExecutable string, duration time.Duration) (*types.Item, string, error) {
	meta := new(types.Item)
	absPath, err := filepath.Abs(path)
	if err != nil {
		println("==> error on getting absolute path of", path)
		return nil, "", err
	}
	// ffmpeg, "-i", f.getAbsolutePath(), "-f", "ffmetadata", "-"
	cmdArgs := []string{
		"-i", absPath, "-f", "ffmetadata", "-",
	}
	stdOut, stdErr, err := shell.ExecWithTimeout(ffmpegExecutable, cmdArgs, duration)
	if err != nil {
		println("==> error on shell exec with timeout while loading meta", err.Error())
		return nil, stdOut + stdErr, err
	}
	var currentChapter *types.Item
	timeBase := time.Millisecond
	scanner := bufio.NewScanner(strings.NewReader(stdOut))
	for scanner.Scan() {
		line := scanner.Text()
		// ffmpeg could contain newlines in form of "description=here is text \
		// here is new line text"
		for strings.HasSuffix(line, "\\") {
			// stop if the input ends in the middle of a continued line
			if !scanner.Scan() {
				break
			}
			line = strings.TrimSuffix(line, "\\") + "\n" + scanner.Text()
		}

		if strings.HasPrefix(line, ";") {
			continue
		}

		if strings.ToLower(line) == "[chapter]" {
			if currentChapter != nil {
				meta.AddChapter(currentChapter)
			}
			currentChapter = new(types.Item)
			continue
		}

		if currentChapter == nil {
			if !strings.Contains(line, "=") {
				log.Printf("line %s should be a key-value-pair but does not contain a = separator\n", line)
				continue
			}
			err = meta.SetPair(types.NewKeyValuePairFromString(line, "="))
			if err != nil {
				log.Printf("line %s results in an empty or unsupported key-value-pair\n", line)
				continue
			}
		} else {
			timeBase = handleChapterMetaData(line, timeBase, currentChapter)
		}
	}
	if currentChapter != nil {
		meta.AddChapter(currentChapter)
	}
	return meta, stdOut, nil
}

func handleChapterMetaData(line string, timeBase time.Duration, currentChapter *types.Item) time.Duration {
	pair := types.NewKeyValuePairFromString(line, "=")
	lowerKey := strings.ToLower(pair.Key)
	switch lowerKey {
	case "timebase":
		timeBasePair := types.NewIndexTotalItemFromString(pair.Value, "/")
		if timeBasePair.Index > 0 && timeBasePair.Total > 0 {
			// TIMEBASE=num/den means one tick lasts num/den seconds
			timeBase = time.Duration(timeBasePair.Index) * time.Second / time.Duration(timeBasePair.Total)
		}
	case "start":
		if startInt, err := strconv.Atoi(pair.Value); err == nil {
			currentChapter.Start = time.Duration(startInt) * timeBase
		}
	case "end":
		if endInt, err := strconv.Atoi(pair.Value); err == nil {
			currentChapter.End = time.Duration(endInt) * timeBase
		}
	case "title":
		currentChapter.Title = pair.Value
	}
	return timeBase
}
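The time-base handling is the easiest part of this to get wrong, so here is a small, self-contained sketch of just that conversion. `ticksToDuration` is a hypothetical helper written for this example; it assumes TIMEBASE has the form num/den (seconds per tick) and that den is small enough (roughly below 9e9) for the remainder arithmetic not to overflow.

```go
package main

import (
	"fmt"
	"time"
)

// ticksToDuration converts a chapter START/END value, counted in ticks of
// num/den seconds each, into a time.Duration. Invalid time bases yield 0.
func ticksToDuration(ticks, num, den int64) time.Duration {
	if num <= 0 || den <= 0 {
		return 0
	}
	total := ticks * num
	// Split into whole seconds and a remainder so that large tick counts
	// (for example nanosecond ticks) do not overflow when scaled.
	whole := total / den
	rem := total % den
	return time.Duration(whole)*time.Second +
		time.Duration(rem)*time.Second/time.Duration(den)
}

func main() {
	// 1500 ticks at a time base of 1/1000 are 1.5 seconds.
	fmt.Println(ticksToDuration(1500, 1, 1000))
	// One hour expressed in nanosecond ticks.
	fmt.Println(ticksToDuration(3600000000000, 1, 1000000000))
}
```

Converting each value directly, instead of first rounding the time base itself to a time.Duration, avoids losing precision for time bases finer than one nanosecond.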

I could try to take care of this, but I see a lot of open pull requests and I'm not sure if this would be a valid solution. My questions:

  • Would you like me to give it a try and are you willing to provide some help?
  • If I put in enough effort and everything looks good to you (including tests), is there a chance to get this merged in the near future?

@deluan
Member

deluan commented Oct 24, 2021

Wow! This is a big reply and I'll try my best to address all of it.

Although I'm not sure that the parser is accurate / follows the specs (I have to verify that), I did not see any hints of a chapter parsing attempt.

I'm not actually using the FFMETADATA info from ffmpeg, I'm using the stderr output and extracting the values from there. We need some info that is not present in the FFMETADATA output (like channels, bitrate, size and duration)

Also, keep in mind that we also support using taglib to extract tags, actually the default extractor is taglib. We need to support both, as taglib has support for multi-valued tags (ffmpeg does not), but it cannot extract tags from some types of files (ex: .DSF), which ffmpeg can.

Next we could store the chapters as normalized json into a new database field (I don't think, that chapters have to be normalized to an extra table) or we could store them directly as bookmarks

I'd rather store the chapters normalized, and probably in their own table. We can "union" them with bookmarks to implement your suggestion of pseudo-support for chapters.

By the way, the primary key for bookmarks is the id of the track, so it only supports one bookmark per track, as per Subsonic API requirements:

ID of the media file to bookmark. If a bookmark already exists for this file it will be overwritten.

We can potentially store multiple bookmarks per track (or union them with the chapters as I said above), but I'm not sure how Subsonic clients will behave if they receive more than one bookmark for a given track.... Needs some testing

For Navidrome's UI, we can be more "creative": we could potentially add a MenuButton to the player with addExtendsContent that lists all chapters for the current audiobook, if the track playing is actually an audiobook. When selected, these menu items would seek to the chapter position, as you said. I think this is feasible.
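The seek logic itself is independent of the player component. As a language-neutral illustration (written in Go here, although the web player is JavaScript), the lookup could work roughly as follows. `Chapter`, `chapterIndexAt` and `nextChapterStart` are hypothetical names, and the sketch assumes the chapters are sorted by start time.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Chapter is a simplified chapter marker: a title and a start offset.
type Chapter struct {
	Title string
	Start time.Duration
}

// chapterIndexAt returns the index of the chapter containing pos, or -1 if
// pos lies before the first chapter. chapters must be sorted by Start.
func chapterIndexAt(chapters []Chapter, pos time.Duration) int {
	// sort.Search finds the first chapter that starts after pos;
	// the chapter just before it is the one containing pos.
	i := sort.Search(len(chapters), func(i int) bool {
		return chapters[i].Start > pos
	})
	return i - 1
}

// nextChapterStart returns the position to seek to when "next" is pressed,
// or false if playback is already in the last chapter.
func nextChapterStart(chapters []Chapter, pos time.Duration) (time.Duration, bool) {
	i := chapterIndexAt(chapters, pos) + 1
	if i >= len(chapters) {
		return 0, false
	}
	return chapters[i].Start, true
}

func main() {
	chapters := []Chapter{
		{Title: "Chapter 1", Start: 0},
		{Title: "Chapter 2", Start: 10 * time.Minute},
		{Title: "Chapter 3", Start: 25 * time.Minute},
	}
	pos := 12 * time.Minute
	fmt.Println("current:", chapters[chapterIndexAt(chapters, pos)].Title)
	if next, ok := nextChapterStart(chapters, pos); ok {
		fmt.Println("next starts at:", next)
	}
}
```

When `nextChapterStart` reports false, the player could fall back to its normal "next track" behaviour.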

I could try to take care of this, but I see a lot of open pull requests

We basically have 3 categories of open PRs:

  • DependaBot automatically updating dependencies: We have 14 of these ATM, and I only take care of them once or twice each release, unless it is a critical update
  • GSoC craziness, when a lot of students submitted nice PRs but also a lot of half-baked ones. From time to time I take a look at some of them, ask questions, see if anyone replies, or try to finish them myself. Sometimes I just close them. I just don't have the time to go through all of them at once... :(
  • Good PRs that are actually pending my review and validation, and will most probably be merged, like Infinite Scroll, ListenBrainz support, Work/Movement support, Auth Rework...

To increase the chances of your PR being merged, we should discuss any implementation beforehand (as we are doing here), you can ask questions on Discord (easier for me to reply during the day), and you should break the whole implementation into small PRs, which are easier to review and validate.

Let me know if you want to implement this and we can discuss further.

@sandreas
Author

sandreas commented Oct 24, 2021

Wow! This is a big reply and I'll try my best to address all of it.

Sry, I'll try to keep this one shorter :-) Let's summarize:

  • taglib is used by default for extracting tags and ffmpeg is mainly used for stream info
  • You prefer storing the chapters as a separate table
  • Subsonic is limited to one bookmark per track; using it for more would not be 100% spec compliant
  • A custom audio player feature supporting chapters would be ok
  • If my PRs are prepared, of smaller size and good quality, they have a good chance to get merged

So here is my suggestion. I'm from Germany (timezones...) and my time is very limited (family, job), so Discord would be a bit difficult, but I would really love to see this happen and I'm passionate about open source. Since I cannot guarantee that I will submit something good, I'd like to start with a small piece: the chapter storage and extraction.

I'll think over this and get back to you soon. My thoughts so far:

  • Chapters in audiobooks are often called Chapter 1, Chapter 2, etc.
  • This would mean a lot of duplicate data in the database (500 Audiobooks, 100 Chapters each = 50000 items in the database)
  • Chapter names don't need to be searchable and are tied to one single media file
  • The Subsonic API is pretty limited for this purpose. I'm not sure there is an elegant, API-compliant way to implement this, but it would be a pity to lose the ability to integrate existing Android / iOS apps automatically
  • Maybe an on the fly extraction non subsonic API endpoint with caching would be the right choice (/api/chapters/{media-id} returning only the chapters)

Thank you very much for investing the time to work this out. I really hope we can manage this together.

@deluan
Member

deluan commented Oct 27, 2021

This would mean a lot of duplicate data in the database (500 Audiobooks, 100 Chapters each = 50000 items in the database)

I don't see that as an issue, the amount of data would be the same if the chapters were embedded in a text field in the media_file. But if you say there's no search based on it, then it may be fine to embed them.

The Subsonic API is pretty limited for this purpose. I'm not sure there is an elegant, API-compliant way to implement this, but it would be a pity to lose the ability to integrate existing Android / iOS apps automatically

Yeah, we will have to be "creative" if we want this to be supported by Subsonic. Maybe it is not possible at all, let's see

Maybe an on the fly extraction non subsonic API endpoint with caching would be the right choice (/api/chapters/{media-id} returning only the chapters)

I'd rather do that at scan time. Some libraries are stored in cloud storage (like my own library), and I'd rather avoid accessing the files unless I want to play them. Chapters could also be returned as a field of the media_file (song in the native API); I don't see a need for a separate endpoint.

@sandreas
Author

sandreas commented Oct 28, 2021

Ok, I did some further research...

Database

The best case in my opinion would be to create a single table storing the chapters referenced to media-id (like in Genres and its struct).

Chapters

  • id
  • name
  • start
  • length

API

The only way to support subsonic API constraints I could find was the playQueue (see here).

Returns the state of the play queue for this user (as set by savePlayQueue). This includes the tracks in the play queue, the currently playing track, and the position within this track. Typically used to allow a user to move between different clients/apps while retaining the same play queue (for instance when listening to an audio book).

Official Example: http://www.subsonic.org/pages/inc/api/examples/playQueue_example_1.xml

The XSD specification tells us more details, e.g. that:

  • There are properties current with position that could be used for the current chapter
  • There are properties for duration and bookmarkPosition for every Child, which could be used for a list of all chapters, although they would all point to the same media file

So here is my suggestion:

  • Create a database table chapters similar to genres as well as according repositories, media file extensions and tests (in contrast to a json-field, this leaves more freedom for future extensions and still is not totally overengineered)
  • Extend the directory reader to store the chapters in the new table
  • Extend the API to support playQueue response with current and position using entry elements for chapters with duration and bookmarkPosition
  • Extend the frontend player to support chapters
  • (Optional) Extend the native API to support chapters

What do you think?

@deluan
Member

deluan commented Oct 29, 2021

The only way to support subsonic API constraints I could find was the playQueue (see here).

playQueue won't be a good fit for this, as only a few clients support it, and those that do only get/save the playQueue automatically; they don't allow the user to trigger it. It is basically meant for a different use case. I think the best bet is to try the bookmarks API. It's not worth "bending" the specification if we don't have clients that support the different usage.

Create a database table chapters similar to genres

Well, it will be a bit different from genres: chapters will have a one-to-many relationship with media_file, whereas genres have a many-to-many relationship. But yeah, one table with the fields you listed above plus media_file_id.

I think a good first PR would include the first two items in your task list. Just adding an optional array of chapters in model/mediafile.go, with proper json mapping, is enough to expose the chapters in the Native API. You just have to left join the chapters table with the media_file
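A minimal sketch of what such a JSON mapping might look like follows. The `Chapter` and `MediaFile` types here are simplified, hypothetical stand-ins (the real model.MediaFile has many more fields), and the field names and millisecond units are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Chapter is a hypothetical chapter model; the field names follow the
// id/name/start/length list discussed in this thread.
type Chapter struct {
	Name     string `json:"name"`
	StartMs  int64  `json:"start"`
	LengthMs int64  `json:"length"`
}

// MediaFile is a heavily simplified stand-in for model.MediaFile.
// With omitempty, tracks without chapters serialize exactly as before.
type MediaFile struct {
	ID       string    `json:"id"`
	Title    string    `json:"title"`
	Chapters []Chapter `json:"chapters,omitempty"`
}

// toJSON encodes a media file the way a JSON API response would.
func toJSON(mf MediaFile) (string, error) {
	b, err := json.Marshal(mf)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	book := MediaFile{
		ID:       "1",
		Title:    "Some Audiobook",
		Chapters: []Chapter{{Name: "Chapter 1", StartMs: 0, LengthMs: 4500}},
	}
	song := MediaFile{ID: "2", Title: "Some Song"}
	for _, mf := range []MediaFile{book, song} {
		s, err := toJSON(mf)
		if err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Because the chapters field is optional, existing clients that ignore unknown fields should not be affected by the addition.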

Once we have it in the DB, we can do some experimentation with the bookmarks Subsonic endpoint

@sandreas
Author

I think a good first PR would include the first two items in your task list. Just adding an optional array of chapters in model/mediafile.go, with proper json mapping, is enough to expose the chapters in the Native API. You just have to left join the chapters table with the media_file

Awesome, I'll try it.

@sandreas
Author

@deluan
I'm really sorry, but I have to tell you that, for a while now, I have no longer been in a position to open PRs this big (personal reasons). My time is VERY limited at the moment and unfortunately it is not enough to develop this feature further.

So if this can't be delegated and you would not like to develop it yourself, this issue can be closed (although I would love to see it in the future).

@smolenskij

@sandreas If you have any work up until this point, is it in a branch at all? This is a feature I'm also very interested in, and may be able to take a look at things in the next few months

@sandreas
Author

If you have any work up until this point, is it in a branch at all? This is a feature I'm also very interested in, and may be able to take a look at things in the next few months

@smolenskij Only what I posted here. I did some experiments but discarded them, because I'm still not sure whether navidrome has a usable foundation for audio book support. It may work, but the sheer amount of work that would have to be invested is just not worth it in my opinion.

There are already solutions that work well (audiobookshelf, jellyfin, etc.) and I don't want to reinvent the wheel.

However, I am currently working on a cross-platform UI application (Windows, Linux, Mac, Android, iOS, eventually WASM) that comes close to the user interface of my good old iPod Nano, with a few tweaks for modern streaming support. It is planned to support multiple data sources / APIs (like navidrome, audiobookshelf, jellyfin, etc.) to remove any limitation of self-hosted backends.

It is based on AvaloniaUI and LibVLCSharp and currently it looks like this, but this is a HUGE project and maybe I will never finish it to a releasable state :-)

Image: toneuieliho.jpg on abload.de

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Navidrome team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

@sandreas
Author

For audio books I'm now using audiobookshelf + Audiobookshelf App, while for music I still use Navidrome and Substreamer App. Thanks for making this.


github-actions bot commented Jan 9, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 9, 2024