In light of changes to Twitter's API coming Feb 9 #120

Open
robertoszek opened this issue Feb 3, 2023 · 25 comments
Labels
breaking changes As it says on the tin

Comments

@robertoszek
Owner

robertoszek commented Feb 3, 2023

I guess adding scraping capabilities to the bot has become a priority.

Using RSS feeds as a source will hopefully continue to work after February 9th (if you can find a working Nitter instance, RSSHub or some other third-party site that's still able to generate an RSS feed).
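For reference, a feed from Nitter or RSSHub is ordinary RSS 2.0 XML, so its items can be read with the Python standard library alone. The sketch below is illustrative only: `parse_feed`, `nitter_rss_url` and the sample feed are hypothetical, and the `https://<instance>/<username>/rss` path is the pattern Nitter instances have commonly used, not something every instance guarantees.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of an RSS 2.0 feed such as a Nitter instance might serve.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example / @example</title>
    <item>
      <title>Second tweet</title>
      <link>https://nitter.example/example/status/2#m</link>
      <pubDate>Sat, 04 Feb 2023 21:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First tweet</title>
      <link>https://nitter.example/example/status/1#m</link>
      <pubDate>Fri, 03 Feb 2023 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def nitter_rss_url(instance, username):
    """Build a feed URL, assuming the common Nitter path /<username>/rss."""
    return f"https://{instance}/{username}/rss"


def parse_feed(xml_text):
    """Return one (title, link, pubDate) tuple per <item>, in feed order."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items
```

To use a live feed, fetch the bytes first (for example with `urllib.request.urlopen(url).read()`) and pass them to `parse_feed`.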

robertoszek pinned this issue Feb 3, 2023
robertoszek added the "breaking changes" label Feb 3, 2023
@edel79

edel79 commented Feb 3, 2023

Hello, I've been using your script for a few days and I agree with your statement.
I was wondering about support for the Twint Python library (https://github.com/twintproject/twint), which is capable of scraping Twitter content. It could be a good place to start.

@tomakun

tomakun commented Feb 3, 2023

Saw that earlier, it sucks...

Just to confirm, @robertoszek: if you get paid access to the Twitter API, you could theoretically still use the bot as is, right? Provided you use a valid Twitter token, of course.

@robertoszek
Owner Author

Potentially, yes.
Assuming they don't change the behavior of the baseline API endpoints or add extra steps to authenticate with a paid token, the bot would theoretically continue to work.

The thing is, nobody really knows how it's going to change or be implemented.
We'll have to wait until the 9th and, once the dust settles, see what our options are going forward.

@edel79

edel79 commented Feb 4, 2023

As a potential replacement, this scraper seems good too, and quite lightweight: https://github.com/JustAnotherArchivist/snscrape
It's working great as of today.

@robertoszek
Owner Author

It seems to use the unofficial GraphQL endpoint for scraping data:
https://github.com/JustAnotherArchivist/snscrape/blob/23ebdd2a3ce6c3e93012e2b5bc7c2b02c749aaf2/snscrape/modules/twitter.py#L1704

In addition to https://api.twitter.com/2/search/adaptive.json:
https://github.com/JustAnotherArchivist/snscrape/blob/23ebdd2a3ce6c3e93012e2b5bc7c2b02c749aaf2/snscrape/modules/twitter.py#L1549

We already use https://api.twitter.com/2/search/adaptive.json with guest tokens on the bot currently:

"https://twitter.com/i/api/2/search/adaptive.json"

However, the adaptive.json endpoint was severely limited recently: non-logged-in users now only get top results, with no option to scrape by latest.

I'll look into how feasible it would be to use the GraphQL endpoint for our own scraping too.

@edel79

edel79 commented Feb 4, 2023

Using snscrape, I just requested the last 100 tweets for a specific Twitter user (@transportsidf), and it worked well. I don't know what the limits are, but if we can get at least 100 tweets at a time, I think that's enough for a bot.
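Capping a scrape at "the last N tweets" is straightforward when the scraper yields results lazily, as snscrape's `get_items()` generator does. A minimal sketch, assuming only that the source is an iterator of tweets; `latest_tweets` is a hypothetical helper, and a stand-in generator replaces the real scraper so the example runs offline.

```python
from itertools import islice


def latest_tweets(items, limit=100):
    """Take at most `limit` items from a (possibly endless) lazy iterator."""
    # islice stops pulling from the iterator after `limit` items, so a lazy
    # scraper only fetches as many pages as those items require.
    return list(islice(items, limit))


if __name__ == "__main__":
    # With snscrape installed, a real source could look like (not run here):
    #   import snscrape.modules.twitter as sntwitter
    #   items = sntwitter.TwitterUserScraper("transportsidf").get_items()
    # A stand-in generator keeps this sketch runnable without network access.
    fake_items = (f"tweet {n}" for n in range(1000))
    print(len(latest_tweets(fake_items)))
```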
However, using pleroma-bot in guest mode gives me this error (same Twitter account):

Gathering tweets... 0
✖ 2023-02-04 21:17:59,995 - pleroma_bot - ERROR - Unable to retrieve tweets. Is the account protected? If so, you need to provide the following OAuth 1.0a fields in the user config:

  • consumer_key
  • consumer_secret
  • access_token_key
  • access_token_secret (cli.py:645)

If I use my API token, it works fine. I don't know whether I'm doing something wrong or whether this is a limitation of, or change in, how guest mode works.
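The error message names the four OAuth 1.0a fields that must be present in a protected account's user config. A hypothetical pre-flight check like the one below (not part of pleroma-bot) reports which ones are missing or empty; the `user` dict is a stand-in for one parsed entry of the config's user list, and its `twitter_username` key is included only for illustration.

```python
# The four field names are taken from the pleroma-bot error message above.
REQUIRED_OAUTH_FIELDS = (
    "consumer_key",
    "consumer_secret",
    "access_token_key",
    "access_token_secret",
)


def missing_oauth_fields(user_config):
    """Return the OAuth 1.0a field names that are absent or empty."""
    return [name for name in REQUIRED_OAUTH_FIELDS if not user_config.get(name)]


if __name__ == "__main__":
    # Hypothetical user entry with only one of the four fields filled in.
    user = {"twitter_username": "transportsidf", "consumer_key": "xxx"}
    print(missing_oauth_fields(user))
```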

@nemobis
Contributor

nemobis commented Feb 9, 2023

Quoting @robertoszek: "I guess adding scraping capabilities to the bot has become a priority."

As a bridge solution, maybe pleroma-bot could scrape a Nitter instance? I'd be happy to set up a Nitter instance for my own pleroma-bot to scrape.

Then there's zedeus/nitter#389

@dawnerd

dawnerd commented Mar 30, 2023

Looks like it's finally here https://tapbots.social/@paul/110109551743991074

@dawnerd

dawnerd commented Jun 10, 2023

We just saw our access revoked overnight :/

@gigantuar

gigantuar commented Jun 10, 2023

Same here, it finally stopped working yesterday. I’ll need to start experimenting with using RSS via Nitter.

Edit: https://github.com/mahrtayyab/tweety looks like a great alternative to use instead of polling RSS.

@edel79

edel79 commented Jun 11, 2023

My API key was switched back to the free plan, so I can't extract tweets anymore either.
As I mentioned previously, snscrape still works for retrieving tweets.

@dawnerd

dawnerd commented Jun 11, 2023

I switched to using rsshub, tried nitter but that was very buggy. I think adopting the full graph endpoints would be the best path forward.

@edel79

edel79 commented Jun 11, 2023

This one is very simple and also works: https://gitlab.com/jeancf/twoot
It uses random Nitter instances to extract tweets.

@edel79

edel79 commented Jun 12, 2023

@robertoszek, any chance of future development to handle the end of the free API using one of the solutions above?

@dawnerd

dawnerd commented Jun 12, 2023

RSSHub isn't perfect either; HTML ends up being embedded:
[screenshot showing the embedded HTML]

@Vardor

Vardor commented Jun 15, 2023

I'm also having problems with the Twitter API. My bots no longer work, and I can't get them working with an RSS source.
I've found a Python scraper for Nitter called pnyter and I'm starting to explore it to see what I can do.
I've created a Matrix channel in case anyone wants to join and exchange ideas: #pletomabot:matrix.org https://matrix.to/#/!DmKYBjBcZXoeKlRmMU:matrix.org?via=matrix.org

@edel79

edel79 commented Jun 16, 2023

Hello @AltGrCarlos, the main problem here is that the creator of this bot isn't active at the moment to make the necessary fixes. I would say 75% of the code still works, and this bot does more than a simple scraper: it also updates the user profile, which is great, and posts tweets to Mastodon.
So the part that needs fixing is the scraping from Twitter. Everything else can be kept as-is.
If you could create a fork with an updated and functional scraper, that would be great.

PS: I don't know about Matrix; for live chat, Discord is probably more widely used.

@Vardor

Vardor commented Jun 16, 2023

Hi. I'm not a very good programmer, but I'm trying to understand the code before making any modifications. I'm also trying to develop my own Nitter scraper to get the specific information I need from Twitter.

@edel79

edel79 commented Jun 19, 2023

While waiting for a fix to get pleroma-bot working again, I've set up Twoot (mentioned previously) as a replacement. It works fine without an API key.

@us3r1d

us3r1d commented Jul 7, 2023

After last week's API changes breaking nitter, I'm now using https://github.com/12joan/twitter-client to generate RSS for stork.

Just so you know stork is still working and still useful. :-)

It'd be nice if I could find some way to get profile updates happening while still getting the tweets from RSS; I'll post here again if I figure out a way to do that.

@robertoszek
Owner Author

Hey, sorry for being a lot less active.

I've been moving across countries during the last 6 months and between all the logistics and bureaucracy involved (getting a visa, a work permit, finding an apartment, packing, etc.) in addition to keeping a day job, it basically left little to no time to do anything else.

I'm glad this project was still somewhat useful for some of you during that time with the scraping functionality implementation still pending.
My intention is to get back to it and try to make it work in the current state of affairs.
Thank you all for sharing the different projects you've found success with, I'll take a look at their approach and see what works and doesn't at the moment.

@robertoszek
Owner Author

Got profile info and pinned tweet gathering working. c96943e
The user timeline scraping seems a lot more involved, requiring "guest accounts".

These guest accounts seem to be restricted by IP, so only a limited amount can be created from the same host/IP.

I'm thinking about adding a flag so they can be created easily on demand:

$ pleroma-bot --create-guest-account

with the accounts dumped to guest_accounts.json, for example.

And if you have access to a list of proxies that could be used to generate more accounts at the same time, perhaps passing them as a text file (by a flag or on the config file):

$ pleroma-bot --create-guest-account --proxies-file my_proxies.txt
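The proposed flags are not implemented in this thread, so the following is only a sketch of the file handling they imply. It assumes a proxies file with one proxy URL per line (blank lines and `#` comments ignored) and stores guest accounts as a JSON list; the function names and the file format are hypothetical.

```python
import json
from pathlib import Path


def parse_proxies(text):
    """Return one proxy URL per non-blank, non-comment line (assumed format)."""
    proxies = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            proxies.append(line)
    return proxies


def load_proxies(path):
    """Read a proxies file such as the my_proxies.txt example."""
    return parse_proxies(Path(path).read_text(encoding="utf-8"))


def save_guest_accounts(accounts, path="guest_accounts.json"):
    """Dump the list of guest-account records as pretty-printed JSON."""
    Path(path).write_text(json.dumps(accounts, indent=2), encoding="utf-8")


def load_guest_accounts(path="guest_accounts.json"):
    """Load previously created accounts; an absent file means none exist yet."""
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text(encoding="utf-8"))
```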

And of course the bot would also need to try generating additional guest accounts in the middle of a run if it gets rate limited.
I need to think about it a bit more but there's definitely some progress being made.
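One way to handle rate limiting mid-run, as described above, is a pool that retires an account when a request is rate limited and provisions a new one when the pool runs dry. This is a sketch under stated assumptions: `RateLimited`, `GuestAccountPool`, `create_account` and `do_request` are hypothetical names, and a real implementation would still need to map Twitter's HTTP 429 responses onto `RateLimited`.

```python
from collections import deque


class RateLimited(Exception):
    """Hypothetical error a request function raises when rate limited."""


class GuestAccountPool:
    """Round-robin pool that retires rate-limited guest accounts.

    `create_account` is a hypothetical callable that provisions a new guest
    account (for example through a proxy) whenever the pool is empty.
    """

    def __init__(self, accounts, create_account):
        self._accounts = deque(accounts)
        self._create_account = create_account

    def fetch(self, do_request, max_attempts=5):
        for _ in range(max_attempts):
            if not self._accounts:
                # Pool exhausted mid-run: generate a fresh account.
                self._accounts.append(self._create_account())
            account = self._accounts[0]
            try:
                result = do_request(account)
            except RateLimited:
                self._accounts.popleft()  # retire the rate-limited account
                continue
            self._accounts.rotate(-1)  # move it to the back to spread load
            return result
        raise RuntimeError("every attempt was rate limited")
```

Rotating after each successful request spreads calls across accounts, so no single account exhausts its limit far ahead of the others.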

@dawnerd

dawnerd commented Oct 18, 2023

I have ~50 accounts in my config and, as a workaround, currently run pleroma-bot every 15 minutes against a Nitter RSS feed. The guest accounts last for 30 days, and I've ended up needing ~6k guest accounts to keep it running the whole time without erroring out. I use geonode for proxies, FYI.

@edel79

edel79 commented Oct 18, 2023

As long as you use a working Nitter instance as the source, you don't have to deal with guest accounts: those are used to scrape Twitter.
Your bot is reading already-scraped content, the content provided by Nitter.
Well, when using the Nitter RSS feed, at least.

@dawnerd

dawnerd commented Oct 18, 2023

With these changes a lot of nitter instances have either turned off rss or asked people not to scrape them. I run my own so I'm not eating up guest tokens from someone else. Just keep that in mind. Generating guest tokens is extremely cheap on geonode too.

8 participants