
[Bug?] [Imgur] [Reddit] Downloading an imgur link uses reddit keywords in --list-keywords but not when actually downloading #1687

Closed
Scripter17 opened this issue Jul 11, 2021 · 5 comments


@Scripter17
Contributor

Running gallery-dl -K https://www.reddit.com/r/tumblr/comments/oi6hf0/all_librarian_lives_matter/ lists reddit keywords, but running gallery-dl https://www.reddit.com/r/tumblr/comments/oi6hf0/all_librarian_lives_matter/ to actually download uses imgur keywords. This probably happens with other sites as well, but I haven't tested it.

I'm not sure how to properly resolve this messing up archives, though. The only real solution I can think of is letting reddit access the linked site's keywords, but trying to put that in an extractor.*.filename would be a mess, even with conditional filenames. Alternatively, you could pass the entire config into the filename and do something obscene like "filename":"reddit-{subreddit}-{config[extractors][{linked_category}][filename]}". Look me in the eye and tell me that is a good idea.

For the time being, I'm only downloading one user, so I can just hard-code them into a special directory, but there are almost certainly people for whom this causes problems.

 

Side note: I hope I'm not being annoying with how many issues/feature requests I'm submitting. I'm trying to keep them high quality, but I wouldn't be surprised if I got caught up in a spam filter sometimes.

@Hrxn
Contributor

Hrxn commented Jul 12, 2021

Not sure I follow...
Which keywords did you expect that were missing?

Do you use the parent-metadata or the category-transfer options?

Because I seem to get the expected keywords for reddit, e.g.:

PS E:\> $exampleURL = "https://www.reddit.com/r/tumblr/comments/oi6hf0/all_librarian_lives_matter/"
PS E:\> gallery-dl -K -v $exampleURL | sls -NoEmphasis -Context 0,1 "author|category|subreddit|subcategory"
[gallery-dl][debug] Version 1.18.1
[gallery-dl][debug] Python 3.9.6 - Windows-10-10.0.19042-SP0
[gallery-dl][debug] requests 2.25.1 - urllib3 1.26.6
[gallery-dl][debug] Starting KeywordJob for 'https://www.reddit.com/r/tumblr/comments/oi6hf0/all_librarian_lives_matter/'
[reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/r/tumblr/comments/oi6hf0/all_librarian_lives_matter/'
[reddit][info] Refreshing private access token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.reddit.com:443
[urllib3.connectionpool][debug] https://www.reddit.com:443 "POST /api/v1/access_token HTTP/1.1" 200 201
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/oi6hf0/.json?limit=20&raw_json=1 HTTP/1.1" 200 4216

> author
    Thryloz
> author_flair_background_color
    None
> author_flair_css_class
    None
> author_flair_richtext[]
> author_flair_template_id
    None
> author_flair_text
    None
> author_flair_text_color
    None
> author_flair_type
    text
> author_fullname
    t2_bdthneg5
> author_patreon_flair
    False
> author_premium
    True
> category
    reddit
> removed_by_category
    None
> subcategory
    submission
> subreddit
    tumblr
> subreddit_id
    t5_2r7hk
> subreddit_name_prefixed
    r/tumblr
> subreddit_subscribers
    1019946
> subreddit_type
    public
> author
    Thryloz
> author_flair_background_color
    None
> author_flair_css_class
    None
> author_flair_richtext[]
> author_flair_template_id
    None
> author_flair_text
    None
> author_flair_text_color
    None
> author_flair_type
    text
> author_fullname
    t2_bdthneg5
> author_patreon_flair
    False
> author_premium
    True
> category
    reddit
> removed_by_category
    None
> subcategory
    submission
> subreddit
    tumblr
> subreddit_id
    t5_2r7hk
> subreddit_name_prefixed
    r/tumblr
> subreddit_subscribers
    1019946
> subreddit_type
    public

PS E:\>

@Scripter17
Contributor Author

If you actually download the provided link, it uses the imgur extractor and thus the imgur keywords.

@Hrxn
Contributor

Hrxn commented Jul 12, 2021

Yes, the image in this case is hosted on Imgur. I believe this is how the reddit extractor has usually worked.
There is no real "correct" default here. Which metadata should you get: from reddit or from imgur? Both are legitimate.
Have you tried it with "parent-metadata" set to true?
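For reference, a minimal config sketch of that suggestion. It assumes gallery-dl's usual JSON config file with options nested under "extractor", and assumes the option is read from the parent (reddit) extractor's section. With true, the Imgur files reached through a reddit post should receive that post's reddit metadata, overwriting keys that both extractors provide. The "#" entry is only a note; gallery-dl does not read that key.

```json
{
    "extractor": {
        "reddit": {
            "#": "hypothetical example: files found via reddit links get the reddit post's metadata",
            "parent-metadata": true
        }
    }
}
```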

@Scripter17
Copy link
Contributor Author

I wasn't aware of that option. It works pretty much perfectly for my purposes.
There are probably edge cases where you'd only want to overwrite parts of the metadata (maybe with alias support), but that'd be a separate issue.

Though this doesn't fix -K showing different keywords than the ones actually used.

Either way, thanks for your help!

@mikf
Owner

mikf commented Jul 15, 2021

https://www.reddit.com/r/tumblr/comments/oi6hf0 uses Imgur keywords because the URL in this Reddit post points to Imgur, and gallery-dl internally uses an entirely new and mostly independent Imgur extractor to handle it, which also uses any config options from imgur.

It does more or less the same as:

```
$ gallery-dl -g https://www.reddit.com/r/tumblr/comments/oi6hf0
https://i.imgur.com/CtXWfp2.jpg
$ gallery-dl https://i.imgur.com/CtXWfp2.jpg
/tmp/imgur/imgur_CtXWfp2_All Librarian lives matter.jpg
```

and you can get all available keywords by using -K with that Imgur URL. Granted, -K should do this automatically, but for reasons I can't remember, it currently only shows keywords for URLs whose site it knows beforehand (URLs that have an _extractor entry in their metadata).

There are several parent-… options to better deal with child extractors like that: the already mentioned parent-metadata, which quite recently got an update so that it no longer necessarily overwrites all metadata keys (#1651 (comment)); parent-skip; and parent-directory, which puts any child extractor files inside their parent's directory.
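A sketch combining these parent-… options, assuming they are set in the parent (reddit) extractor's section of gallery-dl's JSON config. parent-directory and parent-skip are shown as plain booleans. The string value for parent-metadata is an assumption about newer releases: it is meant to store the reddit post's metadata under a separate field instead of overwriting the Imgur keys, so an imgur filename format could refer to something like {reddit[subreddit]}. The field name "reddit" is arbitrary and hypothetical, and the "#" entry is only a note that gallery-dl does not read.

```json
{
    "extractor": {
        "reddit": {
            "#": "hypothetical example: child files go into reddit's directory; reddit metadata kept under 'reddit'",
            "parent-directory": true,
            "parent-skip": true,
            "parent-metadata": "reddit"
        }
    }
}
```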

> Side note: I hope I'm not being annoying with how many issues/feature requests I'm submitting. I'm trying to keep them high quality but I wouldn't be surprised if I get caught up in a spam filter sometimes.

You aren't, no need to worry. Quite a lot of your feature requests are complicated, though, or at least not as simple as, for example, changing a regex.

rautamiekka pushed a commit to rautamiekka/gallery-dl that referenced this issue Jul 17, 2021
mikf closed this as completed Sep 4, 2021

3 participants