Retrieve metadata from online sources #11

Open
1 of 4 tasks
gotson opened this issue Oct 8, 2019 · 41 comments
Labels
enhancement New feature or request

Comments

@gotson
Owner

gotson commented Oct 8, 2019

Mimicking Plex, Komga could manage metadata for series and books, and retrieve metadata from online providers.

See also #48 for manual metadata edition.

Potential providers:

In addition Komga should be able to:

  • Manually override metadata that was retrieved from online sources
@gotson added the enhancement label Oct 8, 2019
@bayang

bayang commented Oct 25, 2019

I use https://leagueofcomicgeeks.com/ for series information and also for tracking comics (like trakt for comics) but unfortunately, no API provided 😞

@gotson
Owner Author

gotson commented Oct 26, 2019

I use https://leagueofcomicgeeks.com/ for series information and also for tracking comics (like trakt for comics) but unfortunately, no API provided 😞

Indeed, the ajax methods return html directly :(

It also lacks completeness information about a series (whether it is ongoing, finished, abandoned, or on hiatus), which is one of the most important pieces of metadata I am looking for!

@The7thSage

The7thSage commented Oct 26, 2019

Oh, I just read the last checkbox, so you can disregard the rest of this...
Then "manually override" is the same as "editing" metadata: same result.

This is some pie-in-the-sky stuff, brace yourself...
What about local metadata? As long as it is read from an editable file (not a database?) at some point, you don't have to worry too much about sources.

As long as you give us a template and location to write the data. That would let the user leverage any data-scraping (or elbow-grease copy-pasting) to write metadata not covered by direct functionality (and hopefully mildly future-proofing the feature).

Another example of a pre-set metadata format would be ComicRack's ComicInfo.xml file within a .cbz. I am only suggesting this one because I use it alongside a pickier .json (both provided by HDoujin Downloader).

Just throwing out some ideas, no pressure.
Not in any hurry, I am using a separate database/reader for tagged manga, and reading in

If I had any idea how plugins work and how to code them, I'd have made one already (probably leveraging HPX, my other database I'm running alongside this).

@gotson
Owner Author

gotson commented Oct 29, 2019

Hi all, I would be interested to know a bit more about the metadata you are after, and how you are using it, so i get a better idea of what to implement and how to implement it.

Could you give some insights about:

  • what kind of metadata fields are you interested in (Author, Description...)
  • is that metadata for individual books, or for a series
  • how you use (or plan to use) this metadata, and how it impacts your workflow or reading experience

I'll throw in some that are of particular interest for me:

  • I would like to have the completeness status of a Series, which is whether it is complete (all books published), ongoing, or in hiatus/abandoned, so that I can filter my library and start reading completed series only.
  • I would like to have the list of all books in a Series, and match this with the list of books I have in my library, so that I can know which books I am missing.
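
The two wishes above (a completeness status and a "which books am I missing" check) can be sketched in a few lines. This is only an illustration: `SeriesStatus`, `ProviderSeries`, `missing_books`, and `is_complete_in_library` are hypothetical names, not part of Komga, and the sketch assumes a provider can return the full list of published book numbers.

```python
from dataclasses import dataclass
from enum import Enum


class SeriesStatus(Enum):
    # Hypothetical status values mirroring the ones discussed in this thread.
    ONGOING = "ongoing"
    ENDED = "ended"
    HIATUS = "hiatus"
    ABANDONED = "abandoned"


@dataclass(frozen=True)
class ProviderSeries:
    """Assumed shape of what an online provider might return for a series."""
    title: str
    status: SeriesStatus
    book_numbers: tuple  # every published book number, e.g. (1, 2, 3)


def missing_books(provider, owned_numbers):
    """Return the published book numbers that are absent from the library."""
    owned = set(owned_numbers)
    return [n for n in provider.book_numbers if n not in owned]


def is_complete_in_library(provider, owned_numbers):
    """True when the series has ended and every published book is owned."""
    return (provider.status is SeriesStatus.ENDED
            and not missing_books(provider, owned_numbers))
```

With such a helper, a "completed series only" filter would reduce to keeping the series for which `is_complete_in_library` returns `True`.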

@bayang

bayang commented Oct 29, 2019

For me, following the Plex analogy, given a filename/folder hierarchy, Komga should be able to automatically retrieve series information and information for each book in the series.
At least:
For series:
Authors, title, description/summary, list of all books, completeness status (also like pull lists for comics), and a picture (cover/thumbnail) that a client can display
In series, for each book:
Authors, title, description/summary, number in series, and a picture (cover/thumbnail) that a client can display
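
As a rough illustration of the "given a filename/folder hierarchy" idea, the sketch below guesses a series name and a book number from a path before any online lookup. It assumes a layout of one folder per series with the number at the end of the file name; `guess_series_and_number` is a hypothetical helper, and real libraries will need more forgiving rules.

```python
import re
from pathlib import PurePosixPath

# A trailing number in the file name, e.g. "Saga 001" or "One Piece v12.5".
_TRAILING_NUMBER = re.compile(r"(\d+(?:\.\d+)?)\s*$")


def guess_series_and_number(path):
    """Guess (series, number) from a path such as 'library/Saga/Saga 001.cbz'.

    Assumes the parent folder is the series name; number is None when the
    file name does not end with digits.
    """
    p = PurePosixPath(path)
    series = p.parent.name
    match = _TRAILING_NUMBER.search(p.stem)
    number = float(match.group(1)) if match else None
    return series, number
```

The guessed series name would then be the search key sent to a provider, and the number would map the file onto the provider's list of books.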

Ideally komga should provide everything a client would need to satisfy at least basic needs.
The client would be in charge of the reading part only.

Considering tracking (like trakt for videos or anilist and kitsu for manga/anime), I'm not sure if it is the responsibility of komga or of the client.

Currently I use YACReader to manage my comics library and reading, but it has no information about books or series, no metadata, and no tracking.
On mobile I use Tachiyomi for manga (which also serves my tracking purpose), but a desktop client is missing (I'm currently having a look at https://github.com/xgi/houdoku, and I think Komga could be implemented as a plugin for it).

So if I could use komga on desktop AND mobile (with the tachiyomi plugin), for comics AND manga, that would be nice.

@gotson
Owner Author

gotson commented Oct 30, 2019

Thanks for the detailed answer!

For the thumbnail, at the moment Komga generates one from the first page of each book. For a series, it's the thumbnail of the first book in the series. Do you think there would be a need to have a thumbnail coming from external sources?

Considering tracking (like trakt for videos or anilist and kitsu for manga/anime), I'm not sure if it is the responsibility of komga or of the client.

I will add tracking (read status) in Komga at some point. It might be manual to start with, because the implementation in the clients is not in my hands. For example, in Tachiyomi it is not managed as an extension, but in the main app. There are also some questions on how to track: should the tracking be done on the matched series/books, using recognized IDs like ComicVine's, or using the internal Komga ID (which can change if you move your files on disk, for example)?

@bayang

bayang commented Oct 30, 2019

No, the thumbnails seem fine.
And tracking is indeed hard to get right; I don't have many ideas for now.

@WillowMist

WillowMist commented Jan 2, 2020

I'm not sure where you're currently pulling information from, but there are two fairly common sources to check for, which I would suggest honoring before trying to pull from ComicVine (which would require each installation to get a CV API key, and should be throttled so you don't try to pull metadata for over 200 books in an hour):

ComicInfo.xml may exist inside an archive, especially if Mylar or ComicRack have been involved in the process of curating the books.

Additionally, ComicBookLover tags may exist in the zipfile comments (for CBZ only)
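
A minimal sketch of checking those two embedded sources with Python's standard library, before falling back to an online provider. `read_embedded_metadata` is a hypothetical helper; it returns the zip comment as raw text and makes no assumption about how ComicBookLover structures it.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_embedded_metadata(cbz):
    """Return (comicinfo_fields, zip_comment) for a CBZ path or file object."""
    fields = {}
    with zipfile.ZipFile(cbz) as archive:
        # Match the entry case-insensitively; tools differ in capitalisation.
        name = next(
            (n for n in archive.namelist()
             if n.rsplit("/", 1)[-1].lower() == "comicinfo.xml"),
            None,
        )
        if name is not None:
            root = ET.fromstring(archive.read(name))
            # Keep only simple text elements (Series, Number, Writer, ...).
            fields = {
                child.tag: (child.text or "").strip()
                for child in root
                if len(child) == 0
            }
        # The archive comment is stored as bytes; it is empty when absent.
        comment = archive.comment.decode("utf-8", errors="replace")
    return fields, comment
```

Only when both results come back empty would a throttled call to an online provider be needed.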

@frameset

frameset commented Jan 3, 2020

Mylar plus a viewer such as Ubooquity or one of the ComicStreamer forks is a common usage scenario for many of us, so our collections come with metadata embedded in the file.

I'd love to replace Ubooquity with Komga as Komga is open source and has a seemingly much friendlier developer. 😉

I've got both running side by side for now, and I'm excited to see Komga get even better.

@WillowMist

Yeah, if we get some control over how the OPDS is presented to the client (like with custom filters, or reading lists, etc) then this will be the perfect complement to Mylar. :)

@gotson
Owner Author

gotson commented Jan 3, 2020

Yeah, if we get some control over how the OPDS is presented to the client (like with custom filters, or reading lists, etc) then this will be the perfect complement to Mylar. :)

Opds is quite flexible, and I plan to add reading lists to it later 😊

@MI3Guy

MI3Guy commented Jan 4, 2020

Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.

@gotson
Owner Author

gotson commented Jan 5, 2020

Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.

Planned in #54

@GlassedSilver

Personally, I would really like to see ComicInfo.xml support. I prefer having metadata embedded in the files.

Planned in #54

Excellent, most importantly, I would really like to have some staging going on.

What I mean is: ComicInfo.xml should always override the online source, and maybe the result of the online scrape should be written into a ComicInfo.xml (if there is none) in the folder/zip/etc. Why? Because it would be really important to be able to move that metadata out of Komga easily. This is one of my biggest gripes with Plex.

That way should a source ever return false information your earlier scrapes are always safe.

I usually like to manually verify the metadata, knowing that the information that is shown will not be altered unless I explicitly force override would be nice. With Plex I'm always a bit skeptical.

Another feat: even if I have to rebuild the entire library with a new database, the previous scrapes' metadata will be transferred.
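
The precedence described above could look roughly like this: manual edits win over embedded ComicInfo.xml values, which win over anything scraped online. `merge_metadata` and the three source names are illustrative assumptions, not Komga's actual design.

```python
def merge_metadata(manual, embedded, online):
    """Merge three field dicts; earlier sources win over later ones.

    Returns (merged, origin), where origin records which source supplied
    each field so a later re-scrape can leave trusted values untouched.
    """
    merged = {}
    origin = {}
    sources = (("manual", manual), ("embedded", embedded), ("online", online))
    for source_name, source in sources:
        for key, value in source.items():
            if value in (None, ""):
                continue  # an empty value never overrides anything
            if key not in merged:
                merged[key] = value
                origin[key] = source_name
    return merged, origin
```

Keeping the `origin` map is what makes the result portable: fields that came from `online` are the ones worth writing back into a ComicInfo.xml so they survive a database rebuild.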

BONUS: another key metadata source is doujinshi.org for all the doujinshi collectors out there. :) (this would also mean another media "kind". Doujinshi typically aren't published by companies, but "circles" and that's pretty much an important nomenclature. Also very important: usually doujinshis are released at conventions like Comiket for example (most famous example) and not only is there a certain numbering system derived from that (C50 for example would be an identifier usually found in the beginning of a doujinshi's digital file name) but also the importance to have a field for which convention it was released at.

If you look at doujinshi.org at sample entries (NSFW warning, it's very mixed content there and there is definitely no setting to view the site in SFW mode :D) you'll find which metadata is important to include.
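
To make the convention prefix concrete, here is a small parsing sketch. It assumes file names shaped like `(C50) [Circle Name] Title.zip`; only the leading event tag is described above, so the bracketed circle part and the helper name `parse_doujinshi_filename` are assumptions for illustration.

```python
import re

_DOUJIN_NAME = re.compile(
    r"^\((?P<event>[^)]+)\)\s*"          # leading "(C50)" style event tag
    r"(?:\[(?P<circle>[^\]]+)\]\s*)?"    # optional "[Circle Name]"
    r"(?P<title>.+?)"                    # the remaining title
    r"(?:\.(?:zip|cbz|rar|cbr|7z))?$",   # optional archive extension
    re.IGNORECASE,
)


def parse_doujinshi_filename(filename):
    """Split a file name into event, circle and title, or return None."""
    match = _DOUJIN_NAME.match(filename)
    if match is None:
        return None
    return {k: (v.strip() if v else v) for k, v in match.groupdict().items()}
```

The extracted `event` and `circle` values would map onto the dedicated convention and circle fields suggested above.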

@Gin-no-kami

Just wanted to put forward a "better" metadata source for manga, MangaUpdates. It has more up-to-date and detailed information about manga than Kitsu or MyAnimeList.

@GlassedSilver

GlassedSilver commented Mar 23, 2020

Just wanted to put forward a "better" metadata source for manga, MangaUpdates. It has more up-to-date and detailed information about manga than Kitsu or MyAnimeList.

No db "has it all". (I realize you didn't try to imply this)
We should probably try to diversify imho. One day a source might die and then you gotta move the tables again anyhow. With a plug-in design one could even go really gung-ho and have a local db to connect to.
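
A plug-in design along those lines might be sketched as a small provider interface plus a fallback chain. Everything here (`MetadataProvider`, `DictProvider`, `first_match`) is hypothetical; `DictProvider` stands in for any source, including a local database.

```python
from typing import Optional, Protocol


class MetadataProvider(Protocol):
    """Hypothetical interface that every provider plug-in would implement."""
    name: str

    def lookup(self, title: str) -> Optional[dict]: ...


class DictProvider:
    """Stand-in provider backed by an in-memory dict (e.g. a local db)."""

    def __init__(self, name, entries):
        self.name = name
        self._entries = {k.lower(): v for k, v in entries.items()}

    def lookup(self, title):
        return self._entries.get(title.lower())


def first_match(providers, title):
    """Ask each provider in order; return (provider_name, metadata) or None."""
    for provider in providers:
        try:
            result = provider.lookup(title)
        except Exception:
            continue  # a dead or failing source should not block the others
        if result:
            return provider.name, result
    return None
```

Because the chain only depends on the interface, a source that dies can be dropped from the list without touching the rest.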

@gotson changed the title from "Metadata management" to "Retrieve metadata from online sources" Jun 10, 2020
@omgthegreenranger

I use Mylar for my comic post-processing, but it uses a modified ComicTagger script internally. I like CT a lot on its own because it allows more flexibility in data. I wonder if you could incorporate that tool into Komga somehow and give us some customization of the metadata field mapping. Have a basic default, but let us muck about if we want to.

I'd love to see the ability to grab story arc data, including upcoming and previous storyarcs that are pulled from ComicVine and it collects all issues into one - or a collection based on character appearances, etc. CT seems to fill in all of those details (I don't know if Mylar does).

@chubits

chubits commented Jul 1, 2020

It would be great to add "doujinshi.org" as a source for manga/doujinshi.

@GlassedSilver

It would be great to add a source "doujinshi.org" for " Manga/Doujinshi"

Indeed! Two more really good sources are nhentai and exhentai.

For reference, for those two one can look at the HappyPandaX project, specifically in the plugins repo:

https://github.com/happypandax/plugins

There's also a plugin that reads metadata files created by two very popular downloaders. ("File Metadata" plugin)

Overall a pretty nifty project for any doujinshi lover and I've been using it for a few weeks now. Right now I'm having a few issues with importing, but I think I borked something. I'll sit down for that issue later.

So far my strategy is to use both Komga and HPX, but to have feature overlap would be terrific, there's a lot each project can learn from the other. I'm very happy both exist! <3

@vmdude

vmdude commented Jan 11, 2021

This feature would be awesome! 👍

@Ludo9743

Hi!

Could you add the site manga-news as a source of French-language metadata for manga? It's really complete and has a lot of information about manga sold in French. Unfortunately, the site does not seem to have an API.

Thank you.

@MKH-42
Sponsor Contributor

MKH-42 commented Mar 28, 2021

Read metadata for comics with an ISBN from Google Books.
Comics with a manually or automatically filled ISBN should be looked up in Google Books for metadata.
It should import:

  • title,
  • authors
  • publisher
  • publish date
  • description
  • ISBN 13 when only ISBN 10 was the input
  • language
  • (page count)

Google Books API:
https://www.googleapis.com/books/v1/volumes?q=isbn:1234567890123
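
A sketch of how that lookup might be wired, without performing any network call. The URL is the one given above; `isbn10_to_isbn13`, `lookup_url`, and `map_volume` are hypothetical helpers, and the field names read in `map_volume` (`volumeInfo`, `industryIdentifiers`, `publishedDate`, `pageCount`) are my reading of the Google Books volume format and should be checked against its documentation.

```python
from urllib.parse import urlencode

GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def isbn10_to_isbn13(isbn10):
    """Convert an ISBN-10 to ISBN-13: prefix 978, recompute the check digit."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10:
        raise ValueError("expected a 10-character ISBN")
    core = "978" + digits[:9]  # drop the old ISBN-10 check character
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def lookup_url(isbn):
    """Build the query URL for a given ISBN."""
    return GOOGLE_BOOKS_URL + "?" + urlencode({"q": "isbn:" + isbn})


def map_volume(volume):
    """Map one item of the response onto the fields listed above."""
    info = volume.get("volumeInfo", {})
    ids = {i.get("type"): i.get("identifier")
           for i in info.get("industryIdentifiers", [])}
    return {
        "title": info.get("title"),
        "authors": info.get("authors", []),
        "publisher": info.get("publisher"),
        "publish_date": info.get("publishedDate"),
        "description": info.get("description"),
        "isbn13": ids.get("ISBN_13"),
        "language": info.get("language"),
        "page_count": info.get("pageCount"),
    }
```

When the response carries no ISBN-13 identifier, `isbn10_to_isbn13` can fill it in from the ISBN-10 that was used as input.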

@AniUrbz

AniUrbz commented May 3, 2021

Hi, could you add AniList to the list for manga metadata, please? In practice, AniList has been more complete for manga and manhwa than MyAnimeList, from one-shots to independent artists.

Here is the documentation and the api of the site. Thanks for your attention.
https://github.com/AniList/ApiV2-GraphQL-Docs
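
For illustration, a request body for a manga lookup might be built as below. The endpoint, the `Media(search:, type: MANGA)` query shape, and the status values are taken from my reading of the AniList documentation linked above and should be verified there; `build_request_body`, `to_series_status`, and the target status names are hypothetical.

```python
import json

ANILIST_ENDPOINT = "https://graphql.anilist.co"  # assumed; check the docs

QUERY = """
query ($search: String) {
  Media(search: $search, type: MANGA) {
    title { romaji english }
    status
    volumes
    chapters
  }
}
"""

# Assumed mapping from AniList status values to the completeness states
# discussed earlier in this thread.
STATUS_MAP = {
    "FINISHED": "ended",
    "RELEASING": "ongoing",
    "HIATUS": "hiatus",
    "CANCELLED": "abandoned",
}


def build_request_body(title):
    """Return the JSON body to POST to the GraphQL endpoint."""
    return json.dumps({"query": QUERY, "variables": {"search": title}})


def to_series_status(anilist_status):
    """Translate a provider status, falling back to 'unknown'."""
    return STATUS_MAP.get(anilist_status, "unknown")
```

The `status` field is the interesting part here, since it is the kind of completeness information requested at the top of the thread.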

@Bitwolfies

Would this feature embed the data into the cbz like ComicTagger? Or would it exist only in Komga? (Or an option for either.)

@MKH-42
Sponsor Contributor

MKH-42 commented Jun 3, 2021

My wish is to add it to Komga only.
Maybe we can also create a feature request to export the metadata to ComicInfo.xml and include it in the comics.
For me, the automatic request is only the initial step during the registration or import of books.
Then you can also edit it manually.

@Bitwolfies

My wish is to add it to Komga only.
Maybe we can also create a feature request to export the metadata to ComicInfo.xml and include it in the comics.
For me, the automatic request is only the initial step during the registration or import of books.
Then you can also edit it manually.

Personally I'd like the opposite, and would like to embed if possible, especially when his new comic metadata standard is ready. But both should be options.

@Kussie
Contributor

Kussie commented Jun 3, 2021

The easiest approach from a developer standpoint would probably be to start with the metadata being added to Komga only as the first phase. The second phase would then probably be to add an export function that would populate formats like ComicInfo.xml into the book files. The third would be to add the ability to automatically export to book files when metadata is changed.
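
The second phase (an export function) could be sketched with the standard library as follows. This is an assumption-laden illustration: `render_comicinfo` and `export_to_cbz` are hypothetical, the element names are whatever keys the caller passes, and the sketch writes a new archive rather than modifying the original in place.

```python
import zipfile
import xml.etree.ElementTree as ET


def render_comicinfo(fields):
    """Render a flat dict of fields as ComicInfo.xml bytes."""
    root = ET.Element("ComicInfo")
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def export_to_cbz(source, target, fields):
    """Copy a CBZ to `target`, replacing any top-level ComicInfo.xml."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_STORED) as dst:
        for item in src.infolist():
            if item.filename.lower() == "comicinfo.xml":
                continue  # drop the stale copy; a fresh one is written below
            dst.writestr(item, src.read(item.filename))
        dst.writestr("ComicInfo.xml", render_comicinfo(fields))
```

Writing to a separate target keeps the original file intact if the export fails midway, which matters for the third phase where exports would run automatically.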

@Bitwolfies

The easiest approach from a developer standpoint would probably be to start with the metadata being added to Komga only as the first phase. The second phase would then probably be to add an export function that would populate formats like ComicInfo.xml into the book files. The third would be to add the ability to automatically export to book files when metadata is changed.

Sounds about right. Normally I would prefer just Komga data, but I feel like books are a format where metadata should be embedded, much like music.

@gotson
Owner Author

gotson commented Jun 4, 2021

Would this feature embed the data into the cbz like ComicTagger?

Already requested here: #82

@Inervo

Inervo commented Jan 12, 2022

Hi. As Komga is getting better and better with each update, with a nice metadata feature, I'm curious if this feature is still being considered (I hope 🤞 ).
What prerequisites do you wish to implement or have in place before working on this feature?
Can we help somehow? :)

Thank you for this marvelous software

@tomandocubatas

Hello,
Any progress with this functionality?
I think it would be a giant step in the functionality of the application.
Being able to fill in the metadata based on certain online websites is very, very interesting.

In any case, the work you are doing seems incredible to me. Thanks a lot!!!

@Pfuenzle

Pfuenzle commented Feb 21, 2022

As there is no integration in Komga for an Anime Metadata provider, I made my own using the metadata from Anisearch.
https://github.com/Pfuenzle/AnisearchKomga. It supports all languages that are available on Anisearch and pushes the metadata directly to Komga.
If someone wants to help me to port it to Java to implement it in Komga it would be great
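
For readers curious what "pushes the metadata directly to Komga" can look like from an external script, here is a hedged sketch that only builds the HTTP request and does not send it. The `/api/v1/series/{id}/metadata` path, the `PATCH` method, and the `status` field are assumptions to verify against Komga's REST API documentation; authentication is omitted, and `build_series_metadata_patch` is a hypothetical helper that is not taken from AnisearchKomga.

```python
import json
from urllib.request import Request


def build_series_metadata_patch(base_url, series_id, fields):
    """Build (but do not send) a request updating one series' metadata."""
    # Assumed endpoint shape; check the Komga API docs before relying on it.
    url = f"{base_url.rstrip('/')}/api/v1/series/{series_id}/metadata"
    body = json.dumps(fields).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be a matter of passing the request to `urllib.request.urlopen` with credentials added, which is left out here on purpose.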

@Inervo

Inervo commented Feb 21, 2022

As this is no integration in Komga for Metadata provider, I made my own using the metadata from Anisearch. https://github.com/Pfuenzle/AnisearchKomga. It supports all languages that are available on Anisearch and pushes the metadata directly to Komga. If someone wants to help me to port it to Java to implement it in Komga it would be great

Great work!

If there's the same sort of metadata for comics and BD (French/Belgian comics), I would love this!

@gotson
Owner Author

gotson commented Feb 21, 2022

If someone wants to help me to port it to Java to implement it in Komga it would be great

Komga is in Kotlin 😉

The metadata retrieval is much more than hitting an api and mapping fields though. That bit is probably only 10% of what I envision for metadata retrieval.

@Inervo

Inervo commented Feb 22, 2022

Thanks for your reply gotson :)

Can you share with us the main components or behavior you envision for metadata retrieval?
Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this feature

@gotson removed the pinned label May 2, 2022
@Inervo

Inervo commented Jan 12, 2023

Thanks for your reply gotson :)

Can you share with us the main components or behavior you envision for metadata retrieval? Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this feature

Hi @gotson. Happy new year !! :)
If that's okay with you, could you share with us your vision regarding the main components for metadata retrieval?
Maybe some of us can help you a bit for some part of the code ;) And by doing so, speed up the release date of this incredible feature and contribute to the great software you created :)

@chu-shen

chu-shen commented Feb 8, 2023

Bangumi metadata scraper for Komga👉https://github.com/chu-shen/BangumiKomga

Inspired by https://github.com/Pfuenzle/AnisearchKomga Thanks❤️

@Inervo

Inervo commented Feb 15, 2023

In the meantime, for our French friends who wish to refresh their BD metadata from Bedetheque, here is a small metadata scraper I've written 👉 https://github.com/Inervo/BedethequeKomga

Inspired by chu-shen/BangumiKomga and aubustou/bedetheque_scraper. Thanks a lot ❤️

NB: it's been ages since I've written any code, so it's far from perfect. Don't hesitate to raise an issue or to contribute :)

@knguyen1

If someone wants to help me to port it to Java to implement it in Komga it would be great

Komga is in Kotlin 😉

The metadata retrieval is much more than hitting an api and mapping fields though. That bit is probably only 10% of what I envision for metadata retrieval.

Don't let "perfect" be the enemy of "good". ;) I'm sure if you start something others will contribute.

@Lreaper

Lreaper commented Apr 5, 2024

I consider this the most crucial feature still missing in Komga. Besides the obvious benefits of metadata scraping this would also greatly assist in tracking the current status of a series.

@BushBoogie

This comment was marked as off-topic.
