Getting 404 when trying to scrape hashtag #2144

Open
mapto opened this issue Dec 21, 2023 · 16 comments
Labels: bug

Comments


mapto commented Dec 21, 2023

Describe the bug
I get JSON Query to explore/tags/foodie/: 404 Not Found [retrying; skip with ^C] when I try to scrape a hashtag.
Clearly, https://www.instagram.com/explore/tags/foodie/ is a normal, active hashtag.
I've also tried other hashtags and get the same error.

To Reproduce

from instaloader import Instaloader, Hashtag
from secret import USER

L = Instaloader()
L.load_session_from_file(USER)  # session file duly created
print(f"Logged in as: {L.test_login()}")

h = Hashtag.from_name(L.context, "foodie")

Expected behavior
I'd like to iterate over h.get_posts_resumable()

Error messages and tracebacks

JSON Query to explore/tags/foodie/: 404 Not Found [retrying; skip with ^C]
JSON Query to explore/tags/foodie/: 404 Not Found [retrying; skip with ^C]

---------------------------------------------------------------------------
QueryReturnedNotFoundException            Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:405, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    404 if resp.status_code == 404:
--> 405     raise QueryReturnedNotFoundException("404 Not Found")
    406 if resp.status_code == 429:

QueryReturnedNotFoundException: 404 Not Found

During handling of the above exception, another exception occurred:

QueryReturnedNotFoundException            Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:405, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    404 if resp.status_code == 404:
--> 405     raise QueryReturnedNotFoundException("404 Not Found")
    406 if resp.status_code == 429:

QueryReturnedNotFoundException: 404 Not Found

During handling of the above exception, another exception occurred:

QueryReturnedNotFoundException            Traceback (most recent call last)
File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:405, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    404 if resp.status_code == 404:
--> 405     raise QueryReturnedNotFoundException("404 Not Found")
    406 if resp.status_code == 429:

QueryReturnedNotFoundException: 404 Not Found

The above exception was the direct cause of the following exception:

QueryReturnedNotFoundException            Traceback (most recent call last)
Cell In[5], line 6
      2 hashtag = "foodie"
      4 print(L.context)
----> 6 h = Hashtag.from_name(L.context, hashtag)
      7 print(h)
      8 h

File /usr/local/lib/python3.10/dist-packages/instaloader/structures.py:1662, in Hashtag.from_name(cls, context, name)
   1660 # pylint:disable=protected-access
   1661 hashtag = cls(context, {'name': name.lower()})
-> 1662 hashtag._obtain_metadata()
   1663 return hashtag

File /usr/local/lib/python3.10/dist-packages/instaloader/structures.py:1676, in Hashtag._obtain_metadata(self)
   1674 def _obtain_metadata(self):
   1675     if not self._has_full_metadata:
-> 1676         self._node = self._query({"__a": 1, "__d": "dis"})
   1677         self._has_full_metadata = True

File /usr/local/lib/python3.10/dist-packages/instaloader/structures.py:1671, in Hashtag._query(self, params)
   1670 def _query(self, params):
-> 1671     json_response = self._context.get_json("explore/tags/{0}/".format(self.name), params)
   1672     return json_response["graphql"]["hashtag"] if "graphql" in json_response else json_response["data"]

File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:435, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    433         if is_other_query:
    434             self._rate_controller.handle_429('other')
--> 435     return self.get_json(path=path, params=params, host=host, session=sess, _attempt=_attempt + 1,
    436                          response_headers=response_headers)
    437 except KeyboardInterrupt:
    438     self.error("[skipped by user]", repeat_at_end=False)

File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:435, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    433         if is_other_query:
    434             self._rate_controller.handle_429('other')
--> 435     return self.get_json(path=path, params=params, host=host, session=sess, _attempt=_attempt + 1,
    436                          response_headers=response_headers)
    437 except KeyboardInterrupt:
    438     self.error("[skipped by user]", repeat_at_end=False)

File /usr/local/lib/python3.10/dist-packages/instaloader/instaloadercontext.py:423, in InstaloaderContext.get_json(self, path, params, host, session, _attempt, response_headers)
    421 if _attempt == self.max_connection_attempts:
    422     if isinstance(err, QueryReturnedNotFoundException):
--> 423         raise QueryReturnedNotFoundException(error_string) from err
    424     else:
    425         raise ConnectionException(error_string) from err

QueryReturnedNotFoundException: JSON Query to explore/tags/foodie/: 404 Not Found
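
The nesting in this traceback can be easier to follow with a small model of the retry recursion. The sketch below is not Instaloader's code: it is a simplified stand-in that assumes a limit of three connection attempts (consistent with the three 404s above) and uses a hypothetical `fetch` callable in place of the real HTTP request. Each retry happens inside the `except` block of the previous attempt, which is why the final exception is reported as "During handling of the above exception, another exception occurred".

```python
class QueryReturnedNotFoundException(Exception):
    """Stand-in for the Instaloader exception of the same name."""


def get_json(path, fetch, max_connection_attempts=3, _attempt=1):
    """Simplified model of the retry recursion seen in the traceback.

    `fetch` is a hypothetical callable returning (status_code, body);
    the real library performs an HTTP request here instead.
    """
    try:
        status, body = fetch(path)
        if status == 404:
            raise QueryReturnedNotFoundException("404 Not Found")
        return body
    except QueryReturnedNotFoundException as err:
        error_string = f"JSON Query to {path}: {err}"
        if _attempt == max_connection_attempts:
            # Last attempt: re-raise with the full message, chained to the cause.
            raise QueryReturnedNotFoundException(error_string) from err
        print(f"{error_string} [retrying; skip with ^C]")
        # The retry runs inside the except block, so a later failure
        # is reported as occurring "during handling" of this one.
        return get_json(path, fetch, max_connection_attempts, _attempt + 1)
```

Calling `get_json("explore/tags/foodie/", lambda path: (404, None))` prints two "retrying" lines and then raises, matching the log at the top of this section. A persistent 404 is therefore not a transient failure that more retries would fix.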

Instaloader version
4.10.2

mapto added the bug label Dec 21, 2023

leofidus commented Jan 1, 2024

This also happens with the CLI. instaloader #foodie just returns:

JSON Query to explore/tags/foodie/: 404 Not Found [retrying; skip with ^C]
JSON Query to explore/tags/foodie/: 404 Not Found [retrying; skip with ^C]
Get hashtag #foodie: JSON Query to explore/tags/foodie/: 404 Not Found

Errors or warnings occurred:
Get hashtag #foodie: JSON Query to explore/tags/foodie/: 404 Not Found

This also happens with older versions of Instaloader that used to work. Maybe it's related to the Instagram app and web interface now only showing top posts for each tag? For example, https://www.instagram.com/explore/tags/foodie/ only shows 28 posts, despite clearly stating that 249,155,265 exist.


leofidus commented Jan 2, 2024

Actually, in the app you still get an infinite-scrolling version; only the web version limits the number of visible posts. So there has to be an API somewhere that still returns the information we seek.


mapto commented Jan 22, 2024

Looking at the code, it seems instaloader puts {"__a": 1, "__d": "dis"} in the payload. However, on 4 October 2023 on StackOverflow it was suggested that a solution should also contain a max_id parameter. I see that in the instaloader code this is also used, but how should I specify it for the first request when I don't have it yet?

@Takhir-18

> Looking at the code, it seems instaloader puts {"__a": 1, "__d": "dis"} in the payload. However, on 4 October 2023 on StackOverflow it was suggested that a solution should also contain a max_id parameter. I see that in the instaloader code this is also used, but how should I specify it for the first request when I don't have it yet?

I'm not sure whether this is exactly what you need, but when you search for a tag on Instagram in your browser, it returns JSON containing 'next_max_id'. So you can go to https://www.instagram.com/explore/tags/{your tag}, open the inspector, switch to the network tab, and find the GET request to https://www.instagram.com/api/v1/tags/web_info/?tag_name={your tag}, which returns a JSON document. Within it, look for 'data' -> 'top' -> 'next_max_id'. As an alternative, you can try the method offered by Daniel Choi (comment made on 8th April, 2018) in the StackOverflow link that you provided, where you set 'id' to 17875800862117404 and 'after' to next_max_id. I hope this solves the problem.
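
The lookup described above could be sketched as follows. This is a minimal illustration, not a tested scraper: the URL and the 'data' -> 'top' -> 'next_max_id' path come from this comment, while the helper names, the `fetch_page` callable, and the idea that the first request simply omits the cursor are assumptions made for the example. No network request is made here.

```python
from urllib.parse import urlencode

# Endpoint taken from the comment above; its behaviour is not guaranteed.
WEB_INFO_URL = "https://www.instagram.com/api/v1/tags/web_info/"


def web_info_url(tag_name):
    """Build the web_info URL for a tag (hypothetical helper)."""
    return f"{WEB_INFO_URL}?{urlencode({'tag_name': tag_name})}"


def next_max_id(payload):
    """Return payload['data']['top']['next_max_id'], or None if any level is missing."""
    node = payload
    for key in ("data", "top", "next_max_id"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def iter_pages(fetch_page, max_pages=10):
    """Yield successive payloads, starting without a cursor.

    `fetch_page(cursor)` is a hypothetical callable that returns the parsed
    JSON for one page; the first call receives cursor=None.
    """
    cursor = None
    for _ in range(max_pages):
        payload = fetch_page(cursor)
        yield payload
        cursor = next_max_id(payload)
        if cursor is None:
            break
```

Under these assumptions, the first request needs no max_id at all: the cursor for the second request is read from the first response, and iteration stops when a response carries no 'next_max_id'.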

@Ghodawalaaman

I have tried making a request to https://www.instagram.com/api/v1/tags/web_info/?tag_name=anime but it gives me this error:
{ "message": "useragent mismatch", "status": "fail" }

I don't know what I am doing wrong.
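
The "useragent mismatch" message suggests the server is checking the User-Agent header of the request. One thing that may be worth trying, and this is an unconfirmed assumption rather than a known fix, is sending the same User-Agent string as the browser session in which the page loads correctly. The sketch below only builds such a request with the standard library and does not send it; the function name and the user agent value passed in are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_web_info_request(tag_name, user_agent):
    """Build (but do not send) a GET request with an explicit User-Agent.

    Hypothetical helper: `user_agent` should be whatever string the working
    browser session sends; no particular value is known to be accepted.
    """
    url = ("https://www.instagram.com/api/v1/tags/web_info/?"
           + urlencode({"tag_name": tag_name}))
    return Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to `urllib.request.urlopen`. Whether the endpoint then answers may also depend on login cookies, which this sketch does not handle.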


jackmat commented Feb 5, 2024

Is this solved? I am getting the same problem.

@pankajyadav05

+1 Getting same error.

@carl5431

+1, also getting the error.

@coloteong

+1 also getting the error, can't find any workaround

@goodbyenary

same here. :(

@thamaradias

Me too


giovsta commented Apr 9, 2024

I keep getting this same error as well, with different hashtags. I log in with a cookies file and use a domestic connection and a user agent.

@johnnylegend

@Ghodawalaaman you should run this code using the Instaloader source code and replace the URL in from_username. To do this you need to log in, though, and Instagram may block the login if you try too many times.

@debora-alayo

Today I have this same problem :c

@dffesalbon

having the same problem too

@stupiding

same problem too
