TwitGet Bad Request 400 #100

Copy-link · 2020-08-11T23:44:22Z

I'm getting this problem using the old method, the one that doesn't involve a headless browser. It started just a few hours ago. At first I thought it was IP-based, as if I'd hit some sort of request limit, but not only did it not go away when I switched to a VPN, it seems to be erroring out on the Twitter profile page, before even reaching the step for the search API JSON.

This strikes me as very odd and makes me wonder if the error that is being thrown by xA-Scraper is even accurate.

Main.TwitGet.StatusMgr - INFO - GetArtist - veyopixel (ID: 437)
Main.WebRequest - INFO - Fetching content at URL: https://twitter.com/veyopixel
Main.WebRequest - INFO - Have additional GET parameters!
Main.WebRequest - INFO -        Item: 'Accept' -> 'application/json, text/javascript, */*; q=0.01'
Main.WebRequest - INFO -        Item: 'Referer' -> 'https://twitter.com/veyopixel'
Main.WebRequest - INFO -        Item: 'User-Agent' -> 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8'
Main.WebRequest - INFO -        Item: 'X-Twitter-Active-User' -> 'yes'
Main.WebRequest - INFO -        Item: 'X-Requested-With' -> 'XMLHttpRequest'
Main.WebRequest - INFO -        Item: 'Accept-Language' -> 'en-US'
Main.WebRequest - WARNING - Error opening page: https://twitter.com/veyopixel at Tue Aug 11 18:30:39 2020 On Attempt 1.
Main.WebRequest - WARNING - Error Code: HTTP Error 400: Bad Request
Main.WebRequest - WARNING - Original URL: https://twitter.com/veyopixel
Main.WebRequest - INFO - Have additional GET parameters!
Main.WebRequest - INFO -        Item: 'Accept' -> 'application/json, text/javascript, */*; q=0.01'
Main.WebRequest - INFO -        Item: 'Referer' -> 'https://twitter.com/veyopixel'
Main.WebRequest - INFO -        Item: 'User-Agent' -> 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8'
Main.WebRequest - INFO -        Item: 'X-Twitter-Active-User' -> 'yes'
Main.WebRequest - INFO -        Item: 'X-Requested-With' -> 'XMLHttpRequest'
Main.WebRequest - INFO -        Item: 'Accept-Language' -> 'en-US'
Main.WebRequest - ERROR - Failed to retrieve Website : https://twitter.com/veyopixel at Tue Aug 11 18:30:53 2020 All Attempts Exhausted
Main.WebRequest - CRITICAL - Critical Failure to retrieve page! https://twitter.com/veyopixel at Tue Aug 11 18:30:53 2020, attempt 2
Main.WebRequest - CRITICAL - Error:
Main.WebRequest - CRITICAL - Exiting
Main.TwitGet.StatusMgr - ERROR - Traceback (most recent call last):
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\twitScrape.py", line 256, in go
Main.TwitGet.StatusMgr - ERROR -     errored |= self.getArtist(aid=aid, artist=name, ctrlNamespace=ctrlNamespace)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\twitScrape.py", line 206, in getArtist
Main.TwitGet.StatusMgr - ERROR -     for tweet in intf.get_all_tweets(artist, min_date):
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 281, in get_all_tweets
Main.TwitGet.StatusMgr - ERROR -     interval_start = self.get_joined_date(username, twit_headers)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 149, in get_joined_date
Main.TwitGet.StatusMgr - ERROR -     ctnt = self.stateful_get("https://twitter.com/{user}".format(user=user), headers=twit_headers)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 22, in stateful_get
Main.TwitGet.StatusMgr - ERROR -     return self.__stateful_get("getpage", url, headers, params)
Main.TwitGet.StatusMgr - ERROR -   File "D:\xA-Scraper\xascraper\modules\twit\vendored_twitter_scrape.py", line 54, in __stateful_get
Main.TwitGet.StatusMgr - ERROR -     page = func(url, addlHeaders=headers)
Main.TwitGet.StatusMgr - ERROR -   File "C:\Python38\lib\site-packages\WebRequest\WebRequestClass.py", line 195, in getpage
Main.TwitGet.StatusMgr - ERROR -     return self._unwaf_func("_getpage", requestedUrl, *args, **kwargs)
Main.TwitGet.StatusMgr - ERROR -   File "C:\Python38\lib\site-packages\WebRequest\WebRequestClass.py", line 160, in _unwaf_func
Main.TwitGet.StatusMgr - ERROR -     return target_func(requestedUrl, *args, **kwargs)
Main.TwitGet.StatusMgr - ERROR -   File "C:\Python38\lib\site-packages\WebRequest\WebRequestClass.py", line 658, in _getpage
Main.TwitGet.StatusMgr - ERROR -     raise Exceptions.FetchFailureError("Failed to retreive page", requestedUrl,
Main.TwitGet.StatusMgr - ERROR - WebRequest.Exceptions.FetchFailureError: <FetchFailureError 400 -> 'Bad Request' for url: https://twitter.com/veyopixel ({b''})>
Main.TwitGet.StatusMgr - ERROR -


Copy-link · 2020-08-11T23:49:19Z

Shit. I just realized what the problem is... or at least I think this is what's wrong. They fucked with the joined date, again.

The relevant code, which is now broken...

	def get_joined_date(self, user, twit_headers):

		ctnt = self.stateful_get("https://twitter.com/{user}".format(user=user), headers=twit_headers)
		html = HTML(html=ctnt)
		joined_items = html.find(".ProfileHeaderCard-joinDateText")
		if not joined_items:
			raise exceptions.AccountDisabledException("Could not retreive artist joined date. "
				"This usually means the account has been disabled!")

		assert len(joined_items) == 1, "Too many joined items?"
		joined = joined_items[0]

		posttime = dateparser.parse(joined.attrs['title'])

		self.log.info("User %s joined twitter at %s", user, posttime)

		return posttime

.ProfileHeaderCard-joinDateText no longer exists, and now one would have to look up the text within div[data-testid="UserProfileHeader_Items"] > span, but I'm not entirely sure how to look up attributes other than class and id with this Python library.
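For what it's worth, div[data-testid="UserProfileHeader_Items"] > span is already a valid CSS attribute selector, so if HTML here is the requests_html class (an assumption based on the HTML(html=...) and .find() calls), passing that string to html.find() should work, though this is untested against the live page. As a dependency-free illustration of the same lookup, here is a minimal sketch using only the standard library. The class name HeaderItems, the helper find_joined_text, and the assumption that the join date text starts with "Joined " are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HeaderItems(HTMLParser):
    """Collect the text of <span> elements that are direct children of the
    div whose data-testid attribute matches `testid`."""

    def __init__(self, testid="UserProfileHeader_Items"):
        super().__init__()
        self.testid = testid
        self.depth = 0          # nesting depth inside the target div (0 = outside)
        self.span_depth = None  # depth of the currently open direct-child span
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            # Depth 2 means "direct child of the target div".
            if tag == "span" and self.depth == 2 and self.span_depth is None:
                self.span_depth = 2
                self.items.append("")
        elif tag == "div" and dict(attrs).get("data-testid") == self.testid:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        if tag == "span" and self.span_depth == self.depth:
            self.span_depth = None
        self.depth -= 1

    def handle_data(self, data):
        if self.span_depth is not None:
            self.items[-1] += data


def find_joined_text(page_source):
    """Return the first header item starting with 'Joined ', or None."""
    parser = HeaderItems()
    parser.feed(page_source)
    for item in parser.items:
        if item.strip().startswith("Joined "):
            return item.strip()
    return None


sample = ('<div data-testid="UserProfileHeader_Items">'
          '<span>Somewhere</span><span><svg></svg>Joined August 2012</span>'
          '</div><span>outside</span>')
print(find_joined_text(sample))
```

Note that this only helps if the markup is present in the fetched HTML at all; a page rendered by JavaScript would not contain it.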

I don't understand why this is throwing '400 Bad Request' instead of 'Could not retreive artist joined date.', however. Either more than one thing is wrong, or it's just not tripping if not joined_items for some reason.
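One reading of the traceback in the first post: FetchFailureError is raised inside stateful_get (called from line 149 of vendored_twitter_scrape.py), which runs before html.find, so the joined_items check is never reached. A minimal sketch of that ordering follows. The two exception classes are stubs standing in for the real ones, and failing_fetch and which_error are hypothetical helpers for illustration only.

```python
class FetchFailureError(Exception):
    """Stub standing in for WebRequest.Exceptions.FetchFailureError."""


class AccountDisabledException(Exception):
    """Stub standing in for the scraper's own exception."""


def failing_fetch(url):
    # Hypothetical fetcher that mimics the 400 response seen in the log.
    raise FetchFailureError("400 Bad Request for url: " + url)


def get_joined_date(user, fetch):
    # Step 1: fetch the profile page. A 400 raises here ...
    ctnt = fetch("https://twitter.com/{user}".format(user=user))
    # Step 2: ... so this check only runs when the fetch succeeded.
    joined_items = []  # placeholder for html.find(...) returning nothing
    if not joined_items:
        raise AccountDisabledException("Could not retrieve artist joined date.")
    return joined_items[0]


def which_error(user, fetch):
    """Return the name of the exception raised by get_joined_date, if any."""
    try:
        get_joined_date(user, fetch)
    except Exception as exc:
        return type(exc).__name__
    return None


print(which_error("veyopixel", failing_fetch))
```

With a fetcher that returns a page but no matching element, the same function would raise AccountDisabledException instead, which fits the idea that the 400 is a separate failure from the missing selector.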

fake-name · 2020-08-12T06:31:46Z

Dammit, I hate minified/obfuscated CSS.

fake-name · 2020-08-12T06:33:24Z

The reason you're seeing the 400 error is probably because they added more UA/header sniffing, which is catching that WebRequest isn't acting exactly like a browser.

More and more, I'm considering trying to create a library around either the Firefox or Chromium HTTP(S) client code.

Copy-link · 2020-08-12T14:09:54Z

Not sniffing, as it turns out. It's now directly checking whether JavaScript was loaded, and blocking you if it wasn't.

<script nonce="ZmY4Y2NjZGUtNjZkMi00ZTY4LWIyZWEtMWE0ZDM1YmE2MDg4">
  if (!window.__SCRIPTS_LOADED__['main']) {
    document.getElementById('ScriptLoadFailure').style.display = 'block';
  }
</script>
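A scraper could at least notice when it has been served a page carrying this guard, rather than failing later on a missing selector. A minimal sketch, assuming the two identifiers in the snippet above (__SCRIPTS_LOADED__ and ScriptLoadFailure) stay stable, which is not guaranteed. The function name has_script_load_guard is hypothetical.

```python
# Markers taken from the inline script quoted above; Twitter may rename them.
SCRIPT_GUARD_MARKERS = ("__SCRIPTS_LOADED__", "ScriptLoadFailure")


def has_script_load_guard(page_source):
    """Return True if the page contains both markers of the JS-load guard.

    A True result suggests the page expects JavaScript to run, so a plain
    HTTP client is unlikely to find profile data in the static HTML.
    """
    return all(marker in page_source for marker in SCRIPT_GUARD_MARKERS)


guarded = """
<script>
  if (!window.__SCRIPTS_LOADED__['main']) {
    document.getElementById('ScriptLoadFailure').style.display = 'block';
  }
</script>
"""
print(has_script_load_guard(guarded))
print(has_script_load_guard("<html><body>plain page</body></html>"))
```

This is only a heuristic: the guard is also present when the page loads fine in a real browser, so it says "this page needs JavaScript", not "this request was blocked".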

Copy-link · 2020-08-12T14:16:37Z

This does look like something I'd have to switch to your headless browser method for, though I have a feeling that even if I did, it would need updating, since the way to obtain the join date changed so drastically.

Copy-link · 2020-08-12T15:08:35Z

Temporary workaround.

	def get_joined_date(self, user, twit_headers):

		ctnt = self.stateful_get("https://nitter.net/{user}".format(user=user), headers=twit_headers)
		html = HTML(html=ctnt)
		joined_items = html.find(".profile-joindate > span > div")
		if not joined_items:
			raise exceptions.AccountDisabledException("Could not retreive artist joined date. "
				"This usually means the account has been disabled!")

		assert len(joined_items) == 1, "Too many joined items?"
		joined = joined_items[0]

		posttime = dateparser.parse(joined.text.replace("Joined ",""))

		self.log.info("User %s joined twitter at %s", user, posttime)

		return posttime
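If dateparser ever chokes on the text, a stricter standard-library fallback is possible. This is a sketch only: the candidate formats below are guesses at what the "Joined ..." text might look like, not a confirmed description of nitter's output, and parse_joined_text is a hypothetical helper. It strips "Joined " only as a prefix, whereas replace() would remove it anywhere in the string. Month names are matched in the current locale, so this assumes an English or C locale.

```python
from datetime import datetime

JOINED_PREFIX = "Joined "
# Guessed formats, tried in order; extend as needed.
CANDIDATE_FORMATS = ("%B %Y", "%d %b %Y", "%B %d, %Y")


def parse_joined_text(text):
    """Parse text like 'Joined August 2012' into a datetime, or return None."""
    text = text.strip()
    if text.startswith(JOINED_PREFIX):
        text = text[len(JOINED_PREFIX):]
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
    return None


print(parse_joined_text("Joined August 2012"))
```

Returning None instead of raising lets the caller decide whether an unparseable date should count as a disabled account or as a scraper bug.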

God bless nitter.net
