Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate Mangas/chapters ID handling #112

Open
MikeZeDev opened this issue Mar 18, 2023 · 15 comments
Open

Duplicate Mangas/chapters ID handling #112

MikeZeDev opened this issue Mar 18, 2023 · 15 comments
Labels
bug Something isn't working

Comments

@MikeZeDev
Copy link
Contributor

Case : https://raw.senmanga.com/mou-hatarakitakunaindesu-boukensha-nanka-yamete-yaru-imasara-taiguu-wo-kaerukara-to-onegai-sarete-mo-okotowaridesu-boku-wa-zettai-hatarakimasen

Its not caused by a bad CSS selector in Haru.
There are 2 chapters 3.1, with the same id.

When selecting the manga it cause errors, spinning stays, and subsequent click on others mangas keeps the spinning thingie, chapters are added BELOW IT

image

image

@MikeZeDev
Copy link
Contributor Author

MikeZeDev commented Mar 18, 2023

I see no reasons why a chapter id would be duplicated. Different language but same ID? Website must be able to differentiate what to load therefore Haru plugin should compose an identifier using id and language to make it unique anyway.

  1. Would it be doable to filter dupes by default?
  2. Or since it an edge case UI should at least handle it properly when we switch to another manga.

@Sheepux
Copy link
Contributor

Sheepux commented Mar 18, 2023

Oh interresting unit test that i should add in SAM
Is that an engine issue or an UI issue ? (going to test)

@MikeZeDev
Copy link
Contributor Author

MikeZeDev commented Mar 18, 2023

Error is throwed in node_modules/svelte/internal/index.mjs.

Is that an engine issue or an UI issue ? (going to test)

Both i guess. Error occurs because engine dont filter duplicate chapters, but UI should have be cleaned after we switched manga.

@Sheepux
Copy link
Contributor

Sheepux commented Mar 18, 2023

That's because i'm using a unique key identifier:
https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244
{#each filteredItems as item (item.Identifier)}
looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

@MikeZeDev
Copy link
Contributor Author

MikeZeDev commented Mar 18, 2023

You're not wrong, its the website that is wrong ofc. Nonetheless, maybe something can be done to clean MedialistItem list UI (or whatever you call it) when changing manga (like, force remove spinner??)

Or remove spinner in case of error when fetching chapters? not sure if this one is easily catchable.

@Sheepux
Copy link
Contributor

Sheepux commented Mar 18, 2023

The only way to not get the internal svelte to crash is to dedupe the list.
This is something that MUST be dore by the engine.

@ronny1982
Copy link
Contributor

This is an easy fix, but there are some open questions:

  • If the website shows the same chapter twice, should HaruNeko do the same?
  • If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?

image

@MikeZeDev
Copy link
Contributor Author

If the website shows the same chapter twice, should HaruNeko do the same?

I think the answer is obvious. If the chapter id is the same, it means links leads to the same resource. Even if the link set a cookie for language and link to the same chapter (url) but displayed differently according to the cookie then it should be treated as 2 different chapters, Haruneko should generate 2 different ID. If its the same ID, then those are identical, there is no reasons to display 2.

If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?

We are talking about identical ID, referencing the same resource. In other words we are creating a workaround for a website error. I see no harm in arbitrarily choosing any of them, being the first or the second, as long as we stick to our choice.
"Keep the longest title" is not a really good criteria after all.

@Sheepux
Copy link
Contributor

Sheepux commented Apr 9, 2023

This is an easy fix, but there are some open questions:

  • If the website shows the same chapter twice, should HaruNeko do the same?
  • If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?
    We have 2 issues, Titles and Identifiers (i'm using the identifier as key, but perhaps that's wrong)

T = Title, L = Link

T1/L1 + T1/L1 = Same

Ignore duplicate (show only 1)

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder).

Append "(x)" (x being the current count of duplicates of the same title)

T1/L1 + T2/L1 = same target but different name.

Leave it as it is (yes it would duplicate the content) but that's up to the website.
But that means that I shouldn't be using the identifier as key but the name ?

@MikeZeDev
Copy link
Contributor Author

MikeZeDev commented May 10, 2023

Case : https://w31.holymanga.net/manga-list/page-331/

image

It ends the Manga list loop 🤷

A whole page of dupe manga

@MikeZeDev MikeZeDev changed the title Duplicate chapter ID handling Duplicate Mangas/chapters ID handling May 10, 2023
@MikeZeDev
Copy link
Contributor Author

T1/L1 + T1/L1 = Same
Ignore duplicate (show only 1) 👍

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder).
Append "(x)" (x being the current count of duplicates of the same title)

Those will get unique id anyway, so its not a filtering problem but a download sanitizing problem
Therefore its not a plugin issue but something to be settled in download manager.

And ofc there is the issue of recognizing duplicate folders like that in UI, since we display downloaded folders i think, idk?
Also users can different name convention when downloading :D

T1/L1 + T2/L1 = same target but different name.
Leave it as it is (yes it would duplicate the content) but that's up to the website.
But that means that I shouldn't be using the identifier as key but the name ?

Ultimately if target is the same, its the same content. I see no use case where the content behind would be different with the same url. If its different content (like, url is the same but there is a dataset parameter on the link for different language idk, and script put a cookie idk before redirecting) then the plugin have to generate an unique identifier no matter what
aka JSON.stringify(id, title, language, whatever) as unique ID.

@MikeZeDev MikeZeDev added bug Something isn't working engine and removed engine labels Jul 25, 2023
@MikeZeDev
Copy link
Contributor Author

I suppose we can code a default filtering after FetchMangas & FetchChapters in the engine?

Something that could be overwritten for particular websites.

As decorator functions maybe? idk

My point is :

  • same id = same ressource : take either
  • if for some reason the same id point to different ressources (ie, langage set by session/cookie) the Fetch function MUST create 2 separate id

@ronny1982
Copy link
Contributor

That's because i'm using a unique key identifier: https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244 {#each filteredItems as item (item.Identifier)} looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

Is it possible to use the item (object reference) itself as identifier instead of its ID?

{#each filteredItems as item (item)}

@ronny1982
Copy link
Contributor

There are other places as well, e.g. in Bookmarks.

It is a valid use case to have duplicate Identifiers in bookmarks (a user may have the same manga bookmarked from different websites which are using the same path style, such as seen in various WordPressMangaStream websites).

@Sheepux
Copy link
Contributor

Sheepux commented Oct 6, 2023

Is it possible to use the item (object reference) itself as identifier instead of its ID?

Looks like it's valid (tested it quickly on the tutorial)
https://learn.svelte.dev/tutorial/keyed-each-blocks

You can use any object as the key, as Svelte uses a Map internally — in other words you could do (thing) instead of (thing.id). Using a string or number is generally safer, however, since it means identity persists without referential equality, for example when updating with fresh data from an API server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants