Duplicate Mangas/chapters ID handling #112

MikeZeDev · 2023-03-18T11:48:49Z

Case : https://raw.senmanga.com/mou-hatarakitakunaindesu-boukensha-nanka-yamete-yaru-imasara-taiguu-wo-kaerukara-to-onegai-sarete-mo-okotowaridesu-boku-wa-zettai-hatarakimasen

Its not caused by a bad CSS selector in Haru.
There are 2 chapters 3.1, with the same id.

When selecting the manga it cause errors, spinning stays, and subsequent click on others mangas keeps the spinning thingie, chapters are added BELOW IT

MikeZeDev · 2023-03-18T11:58:37Z

I see no reasons why a chapter id would be duplicated. Different language but same ID? Website must be able to differentiate what to load therefore Haru plugin should compose an identifier using id and language to make it unique anyway.

Would it be doable to filter dupes by default?
Or since it an edge case UI should at least handle it properly when we switch to another manga.

Sheepux · 2023-03-18T12:36:21Z

Oh interresting unit test that i should add in SAM
Is that an engine issue or an UI issue ? (going to test)

MikeZeDev · 2023-03-18T13:30:24Z

Error is throwed in node_modules/svelte/internal/index.mjs.

Is that an engine issue or an UI issue ? (going to test)

Both i guess. Error occurs because engine dont filter duplicate chapters, but UI should have be cleaned after we switched manga.

Sheepux · 2023-03-18T13:33:21Z

That's because i'm using a unique key identifier:
https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244
{#each filteredItems as item (item.Identifier)}
looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

MikeZeDev · 2023-03-18T14:16:34Z

You're not wrong, its the website that is wrong ofc. Nonetheless, maybe something can be done to clean MedialistItem list UI (or whatever you call it) when changing manga (like, force remove spinner??)

Or remove spinner in case of error when fetching chapters? not sure if this one is easily catchable.

Sheepux · 2023-03-18T14:18:04Z

The only way to not get the internal svelte to crash is to dedupe the list.
This is something that MUST be dore by the engine.

ronny1982 · 2023-04-07T10:46:49Z

This is an easy fix, but there are some open questions:

If the website shows the same chapter twice, should HaruNeko do the same?
If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?

MikeZeDev · 2023-04-07T15:40:47Z

If the website shows the same chapter twice, should HaruNeko do the same?

I think the answer is obvious. If the chapter id is the same, it means links leads to the same resource. Even if the link set a cookie for language and link to the same chapter (url) but displayed differently according to the cookie then it should be treated as 2 different chapters, Haruneko should generate 2 different ID. If its the same ID, then those are identical, there is no reasons to display 2.

If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?

We are talking about identical ID, referencing the same resource. In other words we are creating a workaround for a website error. I see no harm in arbitrarily choosing any of them, being the first or the second, as long as we stick to our choice.
"Keep the longest title" is not a really good criteria after all.

Sheepux · 2023-04-09T18:40:19Z

This is an easy fix, but there are some open questions:

If the website shows the same chapter twice, should HaruNeko do the same?

If duplicate chapters are to be filtered out, which chapter shall be kept in case the title is different?
We have 2 issues, Titles and Identifiers (i'm using the identifier as key, but perhaps that's wrong)

T = Title, L = Link

T1/L1 + T1/L1 = Same

Ignore duplicate (show only 1)

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder).

Append "(x)" (x being the current count of duplicates of the same title)

T1/L1 + T2/L1 = same target but different name.

Leave it as it is (yes it would duplicate the content) but that's up to the website.
But that means that I shouldn't be using the identifier as key but the name ?

MikeZeDev · 2023-05-10T14:56:27Z

Case : https://w31.holymanga.net/manga-list/page-331/

It ends the Manga list loop 🤷

A whole page of dupe manga

MikeZeDev · 2023-07-25T11:34:31Z

T1/L1 + T1/L1 = Same
Ignore duplicate (show only 1) 👍

T1/L1 + T1/L2 = different target but cannot have same name (would create the same folder).
Append "(x)" (x being the current count of duplicates of the same title)

Those will get unique id anyway, so its not a filtering problem but a download sanitizing problem
Therefore its not a plugin issue but something to be settled in download manager.

And ofc there is the issue of recognizing duplicate folders like that in UI, since we display downloaded folders i think, idk?
Also users can different name convention when downloading :D

T1/L1 + T2/L1 = same target but different name.
Leave it as it is (yes it would duplicate the content) but that's up to the website.
But that means that I shouldn't be using the identifier as key but the name ?

Ultimately if target is the same, its the same content. I see no use case where the content behind would be different with the same url. If its different content (like, url is the same but there is a dataset parameter on the link for different language idk, and script put a cookie idk before redirecting) then the plugin have to generate an unique identifier no matter what
aka JSON.stringify(id, title, language, whatever) as unique ID.

MikeZeDev · 2023-09-09T13:55:31Z

I suppose we can code a default filtering after FetchMangas & FetchChapters in the engine?

Something that could be overwritten for particular websites.

As decorator functions maybe? idk

My point is :

same id = same ressource : take either
if for some reason the same id point to different ressources (ie, langage set by session/cookie) the Fetch function MUST create 2 separate id

ronny1982 · 2023-10-06T18:30:42Z

That's because i'm using a unique key identifier: https://github.com/manga-download/haruneko/blob/master/web/src/frontend/classic/components/MediaItemSelect.svelte#L244 {#each filteredItems as item (item.Identifier)} looks like that having the same item.Identifier is a big issue for svelte (seems logicical tho)

Is it possible to use the item (object reference) itself as identifier instead of its ID?

{#each filteredItems as item (item)}

ronny1982 · 2023-10-06T18:37:06Z

There are other places as well, e.g. in Bookmarks.

It is a valid use case to have duplicate Identifiers in bookmarks (a user may have the same manga bookmarked from different websites which are using the same path style, such as seen in various WordPressMangaStream websites).

Sheepux · 2023-10-06T18:58:23Z

Is it possible to use the item (object reference) itself as identifier instead of its ID?

Looks like it's valid (tested it quickly on the tutorial)
https://learn.svelte.dev/tutorial/keyed-each-blocks

You can use any object as the key, as Svelte uses a Map internally — in other words you could do (thing) instead of (thing.id). Using a string or number is generally safer, however, since it means identity persists without referential equality, for example when updating with fresh data from an API server.

MikeZeDev changed the title ~~Duplicate chapter ID handling~~ Duplicate Mangas/chapters ID handling May 10, 2023

MikeZeDev added bug Something isn't working engine and removed engine labels Jul 25, 2023

MikeZeDev mentioned this issue Oct 2, 2023

Filter Duplicates mangas & chapters #228

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicate Mangas/chapters ID handling #112

Duplicate Mangas/chapters ID handling #112

MikeZeDev commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 •

edited

Sheepux commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 •

edited

Sheepux commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 •

edited

Sheepux commented Mar 18, 2023

ronny1982 commented Apr 7, 2023

MikeZeDev commented Apr 7, 2023

Sheepux commented Apr 9, 2023 •

edited

MikeZeDev commented May 10, 2023 •

edited

MikeZeDev commented Jul 25, 2023

MikeZeDev commented Sep 9, 2023

ronny1982 commented Oct 6, 2023

ronny1982 commented Oct 6, 2023

Sheepux commented Oct 6, 2023 •

edited

Duplicate Mangas/chapters ID handling #112

Duplicate Mangas/chapters ID handling #112

Comments

MikeZeDev commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 • edited

Sheepux commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 • edited

Sheepux commented Mar 18, 2023

MikeZeDev commented Mar 18, 2023 • edited

Sheepux commented Mar 18, 2023

ronny1982 commented Apr 7, 2023

MikeZeDev commented Apr 7, 2023

Sheepux commented Apr 9, 2023 • edited

MikeZeDev commented May 10, 2023 • edited

MikeZeDev commented Jul 25, 2023

MikeZeDev commented Sep 9, 2023

ronny1982 commented Oct 6, 2023

ronny1982 commented Oct 6, 2023

Sheepux commented Oct 6, 2023 • edited

MikeZeDev commented Mar 18, 2023 •

edited

MikeZeDev commented Mar 18, 2023 •

edited

MikeZeDev commented Mar 18, 2023 •

edited

Sheepux commented Apr 9, 2023 •

edited

MikeZeDev commented May 10, 2023 •

edited

Sheepux commented Oct 6, 2023 •

edited