Request: Bookmark Summary #80

Open
iamhenry opened this issue Apr 12, 2024 · 7 comments

@iamhenry

This will probably require a lot of work, but for years I've been looking for an app that can take my bookmarks and create a summary from them.

Use case: take all my audio bookmarks and transcribe them to text, with a maximum duration of 60 seconds per bookmark to transcribe.

Basically what the Snipd podcast app is doing: it takes all my bookmarks from a podcast, generates a list, and provides them as notes for me to review and dive deeper into that topic.

lmk what you think 😊

(screenshot attached)

@rasmuslos
Owner

The idea is pretty cool, but this would depend on ABS providing transcriptions. I have looked into whisper and whisper.cpp to transcribe audio files, but I have not found the time to implement anything yet. Transcription would have to be added to ABS first, then transcripts shown in the now playing view, and only after that bookmark summaries.

I would also recommend opening an issue in the ABS repo for this feature, as this should probably be implemented server-side, too.

@iamhenry
Author

Is that the only solution? Is it possible to use an LLM API in the cloud to generate summaries on the fly, without having transcriptions?

@iamhenry
Author

Looks like there's a discussion around this that has gone a bit stale due to a lack of engineering resources.

Someone does mention Snipd, which is exactly what I was hoping we could have for ABS/ShelfPlayer:

advplyr/audiobookshelf#1723

@rasmuslos
Owner

While it is possible to upload the audio file to an LLM provider like OpenAI and prompt it to generate a short summary, it's really not ideal.
I am pretty sure this gets expensive really fast if you upload large audio files, which is required to give the model enough context. I am also not sure about the legal implications, e.g. whether you are even allowed to upload copyrighted works.
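
For illustration only, a minimal sketch of what that cloud approach might look like with the openai Python SDK; the clip path, model names and prompt are placeholders, not anything that exists in ABS or ShelfPlayer today:

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Upload a short bookmarked clip for transcription (billed per minute of audio).
with open("bookmark_clip.mp3", "rb") as audio:  # placeholder file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# Ask a chat model to turn the transcript into a short bookmark note.
completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system", "content": "Summarise this audiobook excerpt in two sentences."},
        {"role": "user", "content": transcript.text},
    ],
)
print(completion.choices[0].message.content)
```

A short clip keeps the cost down but also limits the context the model sees, which is part of the trade-off described above.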

I have looked into whisper & whisper.cpp, tools that can be used to transcribe an item, and they work pretty well. While word-synced transcripts are not really possible, extracting timestamped sentences works fine. But I could not find the time to implement anything in audiobookshelf yet.
Using something like https://github.com/jzhang38/TinyLlama would probably suffice to then create summaries, but this requires the transcripts to exist in the first place.
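
As a rough sketch of that local route, the openai-whisper Python package (whisper.cpp exposes the same idea via its CLI) already returns timestamped segments; the file name and model size below are placeholders:

```python
import whisper

# "base" is one of the smaller checkpoints (~140 MB); larger ones transcribe better.
model = whisper.load_model("base")
result = model.transcribe("chapter_03.mp3")  # placeholder audio file

# Each segment carries start/end timestamps in seconds plus its text,
# which is enough for timestamped sentences (no word-level sync here).
for segment in result["segments"]:
    print(f'[{segment["start"]:8.1f} -> {segment["end"]:8.1f}] {segment["text"].strip()}')
```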

And including an open-source multimodal model to do the transcription locally is not really an option: the app is around 15 MB right now, and including even a small model would inflate that to at least 6 GB.

@iamhenry
Author

iamhenry commented May 3, 2024

I think someone in the ABS community will be attempting to solve this issue with an initial prototype.

I've been tracking the conversation here: advplyr/audiobookshelf#1723 (comment)

@iamhenry
Author

iamhenry commented Jul 12, 2024

Snipd just released a huge update related to this. I was curious to see what you thought and whether you have any aspirations to add this feature: https://x.com/snipd_app/status/1811024587292864948

The feature allows me to upload any audio file and convert it into chapters and a transcript, while also being able to create highlights with auto-generated AI titles.

I understand this is a huge task, but no other app I have checked is even thinking about this enhancement, and it could be a game changer for this app.

Attaching a few screenshots of the highlights and generated chapters:

(three screenshots attached)

@rasmuslos
Owner

I think the actual features are easy enough to implement. Generating a transcript using whisper and then feeding it, together with a timestamp and a good prompt, into an LLM like Llama is not that hard; the question is where you run these models.

The Snipd app is around 120 MB, but I don't think the models are included (Whisper Base is around 140 MB, Llama even bigger), so including them in the app binary is not possible. The memory consumption is also considerable (500 MB for whisper and multiple GB for Llama).
Snipd runs them on its servers, which is why it charges a subscription, a business model not suitable for ShelfPlayer. The AI features would have to be implemented in ABS, where large binaries, huge memory consumption and long program runtimes are possible. I looked into doing this but was so unfamiliar with the codebase that I didn't pull through.
I may try again in the winter, but until someone adds these features to ABS it's not feasible to add them to ShelfPlayer.
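
To make the "transcript + timestamp + prompt" step concrete, here is a hypothetical helper (the function name, window size and prompt wording are made up for this sketch) that picks the Whisper-style segments around a bookmark and builds a prompt a Llama-class model could summarise server-side:

```python
def bookmark_prompt(segments, bookmark_s, window_s=60.0):
    """Build a summary prompt from Whisper-style segments around a bookmark.

    segments: list of dicts with "start", "end" (seconds) and "text".
    bookmark_s: bookmark position in seconds.
    window_s: how much audio context to include on each side of the bookmark.
    """
    nearby = [
        s["text"].strip()
        for s in segments
        if s["end"] >= bookmark_s - window_s and s["start"] <= bookmark_s + window_s
    ]
    excerpt = " ".join(nearby)
    return (
        "Summarise the following audiobook excerpt in one or two sentences, "
        "suitable as a bookmark note:\n\n" + excerpt
    )

# Example: a bookmark placed 42 minutes 10 seconds into the chapter.
# prompt = bookmark_prompt(result["segments"], bookmark_s=42 * 60 + 10)
```

The heavy parts (Whisper and the LLM itself) would still have to run server-side in ABS, as described above.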
