
Crash with KeyError when trying to fetch post author's username #824

Closed
stijn-uva opened this issue Sep 28, 2020 · 4 comments
Labels: bug

Comments

@stijn-uva
Contributor

stijn-uva commented Sep 28, 2020

Describe the bug
When scraping data, the scraper occasionally crashes with a KeyError while retrieving the username of a post's poster. A field seems to be missing from the JSON response returned by Instagram: KeyError: 'entry_data'.

To Reproduce
The code was part of a larger project but the relevant part can be boiled down to:

```python
import instaloader

instagram = instaloader.Instaloader(
    quiet=True,
    download_pictures=False,
    download_videos=False,
    download_comments=True,
    download_geotags=False,
    download_video_thumbnails=False,
    compress_json=False,
    save_metadata=True,
)

# The hashtag name is passed without the leading '#'
chunk = instagram.get_hashtag_posts("blessed")
for post in chunk:
    print(post.owner_username)
```

Unfortunately, the hashtag I tried this with is very high-volume, so I was not able to find the specific post this happened with. I have observed it on multiple occasions, though. I will try to find a specific post for which this happens and will update the issue if I do.

Expected behavior
No crash!

Error messages and tracebacks

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\[path]\instagram_scraper.py", line 132, in scrape
    "author": post.owner_username,
  File "C:\[path]\site-packages\instaloader\structures.py", line 205, in owner_username
    return self.owner_profile.username
  File "C:\[path]\site-packages\instaloader\structures.py", line 198, in owner_profile
    owner_struct = self._full_metadata['owner']
  File "C:\[path]\site-packages\instaloader\structures.py", line 161, in _full_metadata
    self._obtain_metadata()
  File "C:\[path]\site-packages\instaloader\structures.py", line 153, in _obtain_metadata
    + json.dumps(pic_json['entry_data'], indent=2))
KeyError: 'entry_data'
```

Instaloader version
4.5.3 on Python 3.7/Windows

Additional context
I believe this bug was reported earlier in #752.

stijn-uva added the bug label Sep 28, 2020
@stijn-uva
Contributor Author

stijn-uva commented Sep 28, 2020

Here's a post URL this happened for: https://www.instagram.com/p/CFsmkKfJLpB/. I checked it seconds after the error occurred, but the post was (already?) deleted.

Perhaps the post was deleted between the index scrape and fetching the username?

@aandergr
Member

aandergr commented Oct 5, 2020

Thanks for reporting this! It might be that the post was deleted between the index scrape and fetching the username; however, that should not lead Instaloader to fail. The KeyError is thrown in a code block that is executed when the metadata fetch fails:

```python
if self._full_metadata_dict is None:
    # issue #449
    self._context.error("Fetching Post metadata failed (issue #449). "
                        "The following data has been returned:\n"
                        + json.dumps(pic_json['entry_data'], indent=2))
    raise BadResponseException("Fetching Post metadata failed.")
```

The KeyError is raised when accessing the nonexistent entry_data key here, and dumping that data is not really necessary.
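To see the failure mode in isolation, here is a minimal, self-contained sketch that mimics the error branch quoted above. The function names, the stand-in BadResponseException class, and the sample response dict are all hypothetical (this is not Instaloader's actual code), and the sketch assumes, as the deleted-post example suggests, that the returned JSON simply lacks entry_data. It shows that the dump turns the intended BadResponseException into a KeyError, while the branch without the dump raises the intended exception.

```python
import json
import logging


class BadResponseException(Exception):
    """Hypothetical stand-in for Instaloader's BadResponseException."""


def error_path_original(pic_json: dict) -> None:
    """Mimics the quoted branch: the dump indexes 'entry_data' directly."""
    # json.dumps(...) is evaluated before logging.error() is called, so a
    # response without 'entry_data' raises KeyError here and the intended
    # BadResponseException below is never reached.
    logging.error("Fetching Post metadata failed (issue #449). "
                  "The following data has been returned:\n"
                  + json.dumps(pic_json['entry_data'], indent=2))
    raise BadResponseException("Fetching Post metadata failed.")


def error_path_without_dump(pic_json: dict) -> None:
    """The same branch with the error message and dump removed."""
    # pic_json is deliberately unused: nothing in the response is indexed,
    # so only the intended exception can escape.
    raise BadResponseException("Fetching Post metadata failed.")


if __name__ == "__main__":
    # Hypothetical response for a deleted post: no 'entry_data' key at all.
    deleted_post_response = {"config": {}}
    for handler in (error_path_original, error_path_without_dump):
        try:
            handler(deleted_post_response)
        except Exception as err:
            print(handler.__name__, "->", type(err).__name__)
    # error_path_original -> KeyError
    # error_path_without_dump -> BadResponseException
```

If the response does contain entry_data, both variants end in BadResponseException; the dump only matters in the case where the key is missing.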

@lorenzobn
Contributor

Hi @aandergr,
by "not really necessary", do you mean that the dump (line 153) can be removed?
If so, the error message at line 151 could be kept, without any additional information on the returned data.
What do you think?

@aandergr
Member

aandergr commented Nov 2, 2020

> Hi @aandergr,
> by not really necessary, you mean that the dump (line 153) can be removed?
> If so, the error message at line 151 could be maintained, without any other information on data returned.

Yes, the error message (lines 150-153) can be removed.
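Even with the message removed, a post deleted between the index scrape and the metadata fetch would still fail, just with BadResponseException instead of KeyError. A long-running scraper over a high-volume hashtag may prefer to log and skip such posts rather than stop. The sketch below assumes only that each post exposes shortcode and owner_username (both are attributes of instaloader.Post); FakePost and owner_usernames are hypothetical helpers so the example runs offline. With Instaloader installed, one would presumably pass skip=(KeyError, instaloader.exceptions.InstaloaderException), since BadResponseException is a subclass of InstaloaderException.

```python
import logging
from typing import Iterable, Iterator, Optional, Tuple, Type


def owner_usernames(
    posts: Iterable,
    skip: Tuple[Type[BaseException], ...] = (KeyError,),
) -> Iterator[Tuple[str, str]]:
    """Yield (shortcode, owner_username), logging and skipping posts whose
    owner lookup raises one of the exception types in `skip`."""
    for post in posts:
        try:
            username = post.owner_username
        except skip as err:
            logging.warning("Skipping post %s: %r", post.shortcode, err)
            continue
        # Yield outside the try block so exceptions raised by the consumer
        # are never swallowed here.
        yield post.shortcode, username


class FakePost:
    """Hypothetical offline stand-in for instaloader.Post."""

    def __init__(self, shortcode: str, username: Optional[str] = None):
        self.shortcode = shortcode
        self._username = username

    @property
    def owner_username(self) -> str:
        if self._username is None:
            # Simulates the failure reported in this issue for a deleted post.
            raise KeyError('entry_data')
        return self._username


if __name__ == "__main__":
    posts = [FakePost("AAA", "alice"), FakePost("CFsmkKfJLpB"), FakePost("BBB", "bob")]
    print(list(owner_usernames(posts)))
    # [('AAA', 'alice'), ('BBB', 'bob')]
```

Exception types not listed in skip still propagate, so unrelated failures (for example, network errors, if they are left out of skip) are not silently hidden.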
