[Question] How to stop kemono.party from downloading duplicates ? #2886

Open
maxman2103 opened this issue Sep 2, 2022 · 8 comments

Comments

@maxman2103

Some of the artist accounts on kemono.party download the same images under different names. Accounts such as:

https://kemono.party/fanbox/user/11701235
https://kemono.party/fanbox/user/8252709

These two, and some others, downloaded the same images under different names. It happens when I try to download only the new images from the site: instead of skipping the existing ones, gallery-dl just downloads them again. For example, for the post https://kemono.party/fanbox/user/8252709/post/3919619, two copies of every image were downloaded, and the names for one of the images looked like this:

3919619_2022 6月号_10_6bafe37f-34a7-4f18-bde5-fe5d324540c6
3919619_2022 6月号_10_6c251ea2-033c-4c58-8a60-5ed92e545d82

Same image with two different names.

How do I stop this from happening again?

More info: I use gallery-dl.conf, and most of it is default except for Twitter, where I just added my account name and password.
(English is not my first language, so I apologize if anything is hard to understand.)

@maxman2103 maxman2103 changed the title How to stop kemono.party from downloading duplicates ? [Question] How to stop kemono.party from downloading duplicates ? Sep 2, 2022
@mikf
Owner

mikf commented Sep 2, 2022

I think you can use a download archive with archive-format set to "{hash}". This should prevent it from downloading the same file multiple times.
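
To illustrate the idea only: a download archive can be thought of as a small database of keys, where a file is skipped when its key is already present. The sketch below is not gallery-dl's actual implementation; the table name, the helper names, and the hash values are all made up for the example.

```python
import sqlite3

def make_archive(path=":memory:"):
    """Open (or create) a tiny archive database holding one key per file."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS archive (entry TEXT PRIMARY KEY)")
    return db

def should_download(db, key):
    """Return False if the key was seen before; otherwise record it and return True."""
    seen = db.execute("SELECT 1 FROM archive WHERE entry = ?", (key,)).fetchone()
    if seen:
        return False
    db.execute("INSERT INTO archive (entry) VALUES (?)", (key,))
    db.commit()
    return True

# Hypothetical hashes: the second file has the same content as the first,
# so keying the archive on the hash means it would be skipped.
db = make_archive()
results = [should_download(db, h) for h in ["aaa111", "aaa111", "bbb222"]]
print(results)
```

With the key set to the file hash, two files with different names but identical content map to the same entry, so only the first one is fetched.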

@maxman2103
Author

I am not very knowledgeable about this. How do I download and set up a download archive and set the hash?

@mikf
Owner

mikf commented Sep 4, 2022

Get the default config file, put it somewhere gallery-dl will load it automatically (or use -c), and add the following next to the other sites' options (adjust path/etc if necessary):

        "kemonoparty": {
            "archive": "%APPDATA%/gallery-dl/kemono.sqlite3",
            "archive-format": "{hash}"
        },

You might also want to use a different filename than the default.

@biggestsonicfan

I can confirm (in my quest to deduplicate my own archives) that kemonoparty does change their hashes if the type changes from an attachment type to a file type or vice-versa. So checking the hash against a database of hashes will only get you so far.

@maxman2103
Author

@cglmrfreeman
Should I use something like
""archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}",
or should I change it?
I found it in #2740 and modified it a little.
Sorry if I made a mistake; I am not very knowledgeable about coding.

@biggestsonicfan

You can't use ""archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}", because you'd have a duplicate ": in there and the config would not be valid.
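
A quick way to check whether a fragment like this is valid is to run it through a JSON parser. The snippet below is only a sketch using Python's standard json module; wrapping the line in braces and the "fixed" variant (the same line with the stray quote and the extra ": removed) are assumptions made for the example, not a recommended setting.

```python
import json

def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# The line as posted, wrapped in braces so it forms a complete object.
broken = '{""archive-format": ": "{id}_{title}_{num:>02}_{hash}.{extension}",}'

# Assumed correction: one opening quote, one colon, no trailing comma.
fixed = '{"archive-format": "{id}_{title}_{num:>02}_{hash}.{extension}"}'

broken_ok = is_valid_json(broken)
fixed_ok = is_valid_json(fixed)
print(broken_ok, fixed_ok)
```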

However, as I said previously, I am finding that kemonoparty changes the hashes of some of its files; hell, Patreon itself, outside of kemonoparty, also changes the hashes of its files. So if you want 100% no duplicates and 100% of all posts, I would remove the hash entirely to get this: "archive-format": "{service}_{user}_{id}_{num}". You get the service, the user, the ID of the post, and the position of the image within the post. If the filename or hash changes, the id and num will not.
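
The difference between the two key formats can be sketched with plain Python str.format, used here only as a stand-in for gallery-dl's own formatter. The service, user, and post id come from the example post earlier in this thread; the two hash values are invented to represent the same image before and after its hash changed.

```python
HASH_FORMAT = "{hash}"
STABLE_FORMAT = "{service}_{user}_{id}_{num}"

# Same image in the same post, seen twice; only the (hypothetical) hash differs.
first_seen = {"service": "fanbox", "user": "8252709", "id": "3919619",
              "num": 10, "hash": "aaa111"}
later_seen = dict(first_seen, hash="bbb222")

def archive_key(fmt, meta):
    """Build an archive key by filling the format string from the metadata."""
    return fmt.format(**meta)

# Keyed on the hash, the two sightings look like different files.
hash_keys_match = archive_key(HASH_FORMAT, first_seen) == archive_key(HASH_FORMAT, later_seen)

# Keyed on service/user/id/num, they collapse into one entry.
stable_keys_match = archive_key(STABLE_FORMAT, first_seen) == archive_key(STABLE_FORMAT, later_seen)

print(hash_keys_match, stable_keys_match)
print(archive_key(STABLE_FORMAT, first_seen))
```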

@Yasand123

Yasand123 commented Sep 3, 2023

Is there a way to apply these changes in file name/format retroactively? Will it rename already downloaded files?

@biggestsonicfan

I ran into this issue, and the only approach I found that works is to run a preprocessor, passing it the old filename, the new filename, and the hash; the preprocessor verifies the old file against the hash and then renames the file.
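
A minimal sketch of that verify-then-rename step, written as a standalone Python script rather than as anything built into gallery-dl. It assumes the hash is a SHA-256 hex digest of the file contents; the function names and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def rename_if_verified(old_path, new_path, expected_hash):
    """Rename old_path to new_path only if the file's hash matches expected_hash."""
    old_path, new_path = Path(old_path), Path(new_path)
    # Refuse to act on a missing source or to overwrite an existing target.
    if not old_path.is_file() or new_path.exists():
        return False
    if sha256_of(old_path) != expected_hash.lower():
        return False
    old_path.rename(new_path)
    return True

# Demo in a temporary directory so no real downloads are touched.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_name.bin"
    new = Path(tmp) / "new_name.bin"
    old.write_bytes(b"example")
    renamed = rename_if_verified(old, new, sha256_of(old))
    # A wrong hash must leave the file where it is.
    rejected = rename_if_verified(new, Path(tmp) / "other.bin", "0" * 64)

print(renamed, rejected)
```

Checking the hash before renaming guards against pairing a new name with the wrong file, and refusing to overwrite an existing target keeps a mistaken mapping from destroying data.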
