[reddit] AttributeError: 'NoneType' object has no attribute 'startswith' (cw: NSFW Links) #2913

Closed
sinclairkosh opened this issue Sep 13, 2022 · 2 comments

@sinclairkosh

Hi,

The error I'm getting is as follows:

```
❯ ~ docker run -it --rm --tty trg/gd --verbose https://www.reddit.com/r/WincestTexts/comments/bvjx0v/ss_the_photos_part_1/
[gallery-dl][debug] Version 1.23.0
[gallery-dl][debug] Python 3.10.6 - Linux-5.10.104-linuxkit-x86_64-with-glibc2.31
[gallery-dl][debug] requests 2.28.1 - urllib3 1.26.12
[gallery-dl][debug] Starting DownloadJob for 'https://www.reddit.com/r/WincestTexts/comments/bvjx0v/ss_the_photos_part_1/'
[reddit][debug] Using RedditSubmissionExtractor for 'https://www.reddit.com/r/WincestTexts/comments/bvjx0v/ss_the_photos_part_1/'
[reddit][info] Requesting public access token
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): www.reddit.com:443
[urllib3.connectionpool][debug] https://www.reddit.com:443 "POST /api/v1/access_token HTTP/1.1" 200 151
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): oauth.reddit.com:443
[urllib3.connectionpool][debug] https://oauth.reddit.com:443 "GET /comments/bvjx0v/.json?limit=0&raw_json=1 HTTP/1.1" 200 2984
[reddit][error] An unexpected error occurred: AttributeError - 'NoneType' object has no attribute 'startswith'. Please run gallery-dl again with the --verbose flag, copy its output and report this issue on https://github.com/mikf/gallery-dl/issues .
[reddit][debug]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gallery_dl/job.py", line 82, in run
    for msg in extractor:
  File "/usr/local/lib/python3.10/site-packages/gallery_dl/extractor/reddit.py", line 52, in items
    if url.startswith("https://i.redd.it/"):
AttributeError: 'NoneType' object has no attribute 'startswith'
```
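For readers less familiar with Python: the last traceback line means that `url` held `None` at the moment `.startswith()` was called. A minimal, self-contained reproduction, assuming a hypothetical, trimmed-down submission object whose `url` field is a JSON `null` (the real reddit response has many more keys):

```python
import json

# Hypothetical, trimmed-down submission object; only the relevant field is kept.
submission = json.loads('{"title": "example", "url": null}')

url = submission["url"]
print(url is None)  # JSON null decodes to Python None

message = ""
try:
    # Same call as in the traceback; None has no string methods.
    url.startswith("https://i.redd.it/")
except AttributeError as exc:
    message = str(exc)
    print(f"AttributeError: {message}")
```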

The Docker image is a standard Python base image that installs a zipped copy of gallery-dl (gd), downloaded directly from GitHub, via pip with no other changes. I've tried both the latest master and the latest release of gallery-dl, and I get the same error with each.

I first saw this error a few days ago on an older build of master (from around the end of May 2022, IIRC). Until then, that build had been working flawlessly for months.

I've tried both with and without my usual custom config, and the error is the same.

I thought the subreddit itself might be the problem (it is quarantined), but gallery-dl works on other quarantined subreddits, and it also works when run on this subreddit as a whole.

I also thought it might have something to do with downloading an individual post, but it works on other individual posts in this subreddit, and on individual posts in other subreddits, quarantined or not. I even redid OAuth with a new token, with no change in results.

Although the author's account is deleted, both the post and the imgur link it points to still exist.

The problem seems to affect a random assortment of older posts in this subreddit, with no pattern that I can detect.

Sadly, Python is not my thing, so I've reached the end of my diagnostic abilities. I've tried everything I can think of to make sure this is an actual issue and not just something on my end.

And thus I turn the mystery over to you :)

Please LMK if there is any other info you need.

Many Thanks,

SK

@mikf
Owner

mikf commented Sep 14, 2022

The problem with this reddit post is that the top comment has no URL in the API response:

    "url": null,

The exception is fixed with 35eddaa, but gallery-dl still cannot grab the imgur link, since it is just not there. The HTML page has it, but for some reason not the JSON representation, regardless of cookies and/or OAuth tokens.
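The general shape of such a fix is a `None` check before any string method is called on the field. The sketch below is hypothetical and is not the code from 35eddaa: `classify_url` and its return labels are invented for illustration, and only the `https://i.redd.it/` prefix comes from the traceback.

```python
from typing import Optional


def classify_url(url: Optional[str]) -> str:
    """Hypothetical helper: decide how to handle a submission's 'url' field.

    Treats None (a JSON null) and empty strings as "no URL" before
    calling any string method on the value.
    """
    if not url:
        # Nothing to download from the JSON; the link may only exist in the HTML page.
        return "skip"
    if url.startswith("https://i.redd.it/"):
        return "reddit-image"
    return "external"


for candidate in (None, "https://i.redd.it/abc.jpg", "https://imgur.com/a/xyz"):
    print(candidate, "->", classify_url(candidate))
```

Returning a "skip" marker instead of raising lets a caller move on to the next item rather than aborting the whole run. As noted above, though, no client-side check can recover a link that the JSON response simply does not contain.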

Also, thank you for the detailed error report. It makes figuring out what went wrong a lot easier.

mikf added the bug label Sep 14, 2022
@sinclairkosh
Author

Many thanks for the quick and comprehensive response, and for your ongoing outstanding work on this project. I've dealt with A LOT of open source maintainers over my time, and you're a shining example of how to do it right.

As a follow-up, in case you come across this again: I think I know how... OK, not how, but possibly WHY something like this might have happened.

A few months ago, the sub's moderator deleted basically everything. The "community", as it were, kicked up a stink and told him he was reading the rules wrong, etc. Eventually he relented and handed the sub over to someone else, but said he'd undelete everything, which, to give him his due, he did. Since then there have been some interesting quirks, like the all-time top lists being completely out of whack. Still, it seemed like most, if not all, of the posts came back, and this is the first time I've run into any "issues". So it may be that not everything came back "right", and there are a few "undead" posts floating around.

Happy to help where I can, and I might finally get around to learning Python ;)

SK
