TwitGet Bad Request 400 #100
Comments
Shit. I just realized what the problem is, or at least I think this is what's wrong. They fucked with the joined date, again. The relevant code, which is now broken:

    def get_joined_date(self, user, twit_headers):
        ctnt = self.stateful_get("https://twitter.com/{user}".format(user=user), headers=twit_headers)
        html = HTML(html=ctnt)
        joined_items = html.find(".ProfileHeaderCard-joinDateText")
        if not joined_items:
            raise exceptions.AccountDisabledException("Could not retreive artist joined date. "
                                                      "This usually means the account has been disabled!")
        assert len(joined_items) == 1, "Too many joined items?"
        joined = joined_items[0]
        posttime = dateparser.parse(joined.attrs['title'])
        self.log.info("User %s joined twitter at %s", user, posttime)
        return posttime

I don't understand why this is throwing '400 Bad Request' instead of 'Could not retreive artist joined date.', however. Either more than one thing is wrong, or it's just not tripping
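One possible explanation, sketched below: if the fetch helper raises on a non-2xx response, the 400 surfaces before the selector check ever runs. This is only an assumption, since the thread never shows `stateful_get`. `fake_get`, `FetchError`, and `get_joined_marker` are hypothetical stand-ins, not xA-Scraper code.

```python
class FetchError(Exception):
    """Hypothetical stand-in for whatever the real fetch helper raises."""


class AccountDisabledException(Exception):
    """Stand-in for exceptions.AccountDisabledException in the snippet above."""


def fake_get(status, body):
    # Hypothetical fetch helper: ASSUMED to raise on any non-2xx status.
    if not 200 <= status < 300:
        raise FetchError("%d error" % status)
    return body


def get_joined_marker(status, body):
    # A 400 response raises inside the fetch call...
    ctnt = fake_get(status, body)
    # ...so this check only runs when the fetch itself succeeded.
    if "ProfileHeaderCard-joinDateText" not in ctnt:
        raise AccountDisabledException("Could not retreive artist joined date.")
    return True
```

Under that assumption, the 'Could not retreive artist joined date.' message would only appear for a page that loads successfully but lacks the join-date element.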
Dammit, I hate minified/obfuscated CSS.
The reason you're seeing the 400 error is probably that they added more UA/header sniffing, which is catching that. More and more, I'm considering trying to create a library around either the Firefox or Chromium HTTP(S) client code.
Not sniffing, as it turns out. It's now directly checking whether JavaScript was loaded, and blocking you if not:

    <script nonce="ZmY4Y2NjZGUtNjZkMi00ZTY4LWIyZWEtMWE0ZDM1YmE2MDg4">
      if (!window.__SCRIPTS_LOADED__['main']) {
        document.getElementById('ScriptLoadFailure').style.display = 'block';
      }
    </script>

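A minimal sketch of how a scraper could flag this case explicitly instead of failing later on a missing selector. The two marker strings come from the script quoted above; the helper name `looks_like_js_wall` is hypothetical, and whether the markers appear on every affected page is an assumption.

```python
# Marker strings taken from the inline script quoted above.
_JS_WALL_MARKERS = ("ScriptLoadFailure", "__SCRIPTS_LOADED__")


def looks_like_js_wall(html_text):
    """Return True if the page source contains the script-load check.

    A client that does not execute JavaScript will presumably see the
    'ScriptLoadFailure' fallback rather than profile content on such a page.
    """
    return all(marker in html_text for marker in _JS_WALL_MARKERS)
```

Calling this on the fetched page text before running any selectors would let the scraper report "page requires JavaScript" rather than a misleading account-disabled error.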
This does look like something I'd have to switch to your headless browser method for, though I have a feeling that even if I did, updates would still be necessary, since the way to obtain the join date has changed so drastically.
Temporary workaround:

    def get_joined_date(self, user, twit_headers):
        ctnt = self.stateful_get("https://nitter.net/{user}".format(user=user), headers=twit_headers)
        html = HTML(html=ctnt)
        joined_items = html.find(".profile-joindate > span > div")
        if not joined_items:
            raise exceptions.AccountDisabledException("Could not retreive artist joined date. "
                                                      "This usually means the account has been disabled!")
        assert len(joined_items) == 1, "Too many joined items?"
        joined = joined_items[0]
        posttime = dateparser.parse(joined.text.replace("Joined ", ""))
        self.log.info("User %s joined twitter at %s", user, posttime)
        return posttime

God bless nitter.net
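For anyone wanting to check the text-parsing step on its own, here is a rough stdlib-only sketch. It swaps `dateparser.parse` for `datetime.strptime` and assumes the element text looks like "Joined March 2019" (month name plus year); that format is an assumption, not something confirmed in this thread. `parse_joined_text` is a hypothetical helper name.

```python
from datetime import datetime


def parse_joined_text(text):
    """Parse text such as 'Joined March 2019' into a datetime.

    ASSUMPTION: the text is 'Joined <English month name> <year>'.
    Raises ValueError if the text does not match that shape.
    """
    cleaned = text.strip()
    prefix = "Joined "
    if cleaned.startswith(prefix):
        cleaned = cleaned[len(prefix):]
    # %B matches the full month name in the current locale (English assumed).
    return datetime.strptime(cleaned, "%B %Y")
```

Unlike `dateparser`, this fails loudly with a `ValueError` on any unexpected format, which may be useful for noticing the next markup change early.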
I'm getting this problem using the old method, the one that doesn't involve a headless browser. It started just a few hours ago. At first I thought it was IP-based, as if I had hit some sort of request limit, but not only did it not go away when I switched on a VPN, it also seems to be erroring out on the Twitter profile page, before even getting to the step for the search API JSON.
This strikes me as very odd and makes me wonder if the error that is being thrown by xA-Scraper is even accurate.