In light of changes to Twitter's API coming Feb 9 #120

robertoszek · 2023-02-03T01:39:54Z

I guess adding scraping capabilities to the bot has become a priority.

Using RSS feeds as a source will hopefully continue to work after February 9th (if you can find a working Nitter instance, RSSHub or some other third-party site that's still able to generate an RSS feed).
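For anyone wanting to test a feed before pointing the bot at it, here is a minimal sketch of reading an RSS 2.0 feed with only the Python standard library. The sample feed, the `nitter.example` host and the `parse_feed` helper are illustrative assumptions, not part of pleroma-bot, and real feeds from Nitter or RSSHub may carry extra fields.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed content; a real feed would be downloaded from the instance.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example_user / @example_user</title>
    <item>
      <title>Second post</title>
      <link>https://nitter.example/example_user/status/2</link>
      <pubDate>Fri, 03 Feb 2023 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First post</title>
      <link>https://nitter.example/example_user/status/1</link>
      <pubDate>Thu, 02 Feb 2023 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return the feed items as dicts, sorted oldest first.

    Assumes every <item> has a <pubDate> in RFC 822 format.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return sorted(items, key=lambda entry: entry["published"])


if __name__ == "__main__":
    for entry in parse_feed(SAMPLE_FEED):
        print(entry["published"].isoformat(), entry["link"])
```

Sorting oldest first means the items can be reposted in their original order.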

edel79 · 2023-02-03T07:56:07Z

Hello, I've been using your script for a few days and I agree with your statement.
I was wondering about support for the Twint Python library (https://github.com/twintproject/twint), which is capable of scraping Twitter content. Adding support for it could be a good start.

tomakun · 2023-02-03T12:15:53Z

Saw that earlier, it sucks...

Just to confirm, if you get paid access to the Twitter API, you can theoretically still use it as is, right @robertoszek? Provided you use a valid Twitter token, of course.

robertoszek · 2023-02-03T12:33:14Z

> Just to confirm, if you get paid access to the Twitter API, you can theoretically still use it as is, right @robertoszek? Provided you use a valid Twitter token, of course.

Potentially, yes.
Assuming they don't change the baseline API endpoints' behavior or add additional steps to authenticate with a paid token, the bot would theoretically continue to work.

The thing is, nobody really knows how it's going to change or be implemented.
We'll have to wait until the 9th and see, once the dust settles, what our options are going forward.

edel79 · 2023-02-04T19:22:01Z

As a potential replacement, this scraper also seems good, and quite lightweight: https://github.com/JustAnotherArchivist/snscrape
It's working great as of today.

robertoszek · 2023-02-04T19:53:44Z

> As a potential replacement, this scraper also seems good, and quite lightweight: https://github.com/JustAnotherArchivist/snscrape It's working great as of today.

It seems to use the unofficial GraphQL endpoint for scraping data:
https://github.com/JustAnotherArchivist/snscrape/blob/23ebdd2a3ce6c3e93012e2b5bc7c2b02c749aaf2/snscrape/modules/twitter.py#L1704

In addition to https://api.twitter.com/2/search/adaptive.json:
https://github.com/JustAnotherArchivist/snscrape/blob/23ebdd2a3ce6c3e93012e2b5bc7c2b02c749aaf2/snscrape/modules/twitter.py#L1549

We already use https://api.twitter.com/2/search/adaptive.json with guest tokens on the bot currently:

pleroma-bot/pleroma_bot/_twitter.py

Line 565 in 9a64891

"https://twitter.com/i/api/2/search/adaptive.json"

However, the adaptive.json endpoint was severely limited recently (to only top results for non-logged-in users, removing any option to scrape by latest).

I'll look into how feasible it would be to use the GraphQL endpoint for our own scraping too.

edel79 · 2023-02-04T20:23:39Z

Using snscrape, I just requested the last 100 tweets for a specific Twitter user (@transportsidf), and it worked well. I don't know what the limits are, but if we can get at least 100 tweets at a time, that seems like enough for a bot, I think.
However, using pleroma-bot in guest mode gives me this error (same Twitter account):

Gathering tweets... 0
✖ 2023-02-04 21:17:59,995 - pleroma_bot - ERROR - Unable to retrieve tweets. Is the account protected? If so, you need to provide the following OAuth 1.0a fields in the user config:

consumer_key
consumer_secret
access_token_key
access_token_secret (cli.py:645)

If I use my API token, it works fine. I don't know if I'm doing something wrong or if it's a limitation/change in how guest mode works.

nemobis · 2023-02-09T20:22:29Z

> I guess adding scraping capabilities to the bot has become a priority.

As a bridge solution, maybe pleroma-bot could scrape a Nitter instance? I'd be happy to set up a Nitter instance for my own pleroma-bot to scrape.

Then there's zedeus/nitter#389

dawnerd · 2023-03-30T01:09:20Z

Looks like it's finally here https://tapbots.social/@paul/110109551743991074

dawnerd · 2023-06-10T16:18:12Z

We just saw our access revoked overnight :/

gigantuar · 2023-06-10T16:27:02Z

Same here, it finally stopped working yesterday. I’ll need to start experimenting with using RSS via Nitter.

Edit: https://github.com/mahrtayyab/tweety looks like a great alternative to use instead of polling RSS.

edel79 · 2023-06-11T05:56:24Z

My API key was switched back to the free plan, so I can't extract tweets anymore either.
As I previously mentioned, snscrape is still working to retrieve tweets.

dawnerd · 2023-06-11T05:58:22Z

I switched to using rsshub. I tried nitter, but it was very buggy. I think adopting the full graph endpoints would be the best path forward.

edel79 · 2023-06-11T06:47:39Z

This one is very simple and is working too: https://gitlab.com/jeancf/twoot
It uses random nitter instances to extract tweets.

edel79 · 2023-06-12T19:42:45Z

@robertoszek any chance of future development to handle the end of the free API using one of the above solutions?

dawnerd · 2023-06-12T20:51:48Z

rsshub isn't perfect either, HTML ends up being embedded.
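One possible workaround for embedded HTML is to reduce each feed entry to plain text before posting. This is only a sketch using the standard library's `html.parser`; the `html_to_text` helper and the choice to turn `<br>` and `</p>` into newlines are assumptions for illustration, not something rsshub or pleroma-bot provides.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops all tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; for us.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag == "p":
            self._chunks.append("\n")

    def handle_data(self, data):
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks).strip()


def html_to_text(fragment):
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(html_to_text("<p>Line A<br>Line B &amp; more</p>"))
```

Link targets are discarded in this version; keeping them would mean reading the `href` attribute in `handle_starttag`.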

Vardor · 2023-06-15T20:31:45Z

I'm also having problems with the Twitter API. My bots are no longer working and I can't make them work with an RSS source.
I've found a Python scraper for nitter called pnyter and I'm starting to explore it to see what I can do.
I've created a Matrix channel in case anyone wants to join and exchange ideas: #pletomabot:matrix.org https://matrix.to/#/!DmKYBjBcZXoeKlRmMU:matrix.org?via=matrix.org

edel79 · 2023-06-16T07:37:50Z

Hello @AltGrCarlos, the main problem here is that the creator of this bot is not currently active to make the necessary fixes. I would say 75% of the code is still working, and this bot does more than a simple scraper: it also updates the user profile, which is great, and posts tweets to Mastodon.
So the part needing a fix is the scraping-from-Twitter part. Everything else can be kept as is.
If you could create a fork with an updated and functional scraper, that would be great.

PS: I don't know about Matrix; in terms of live chat, Discord is probably more widely used.

Vardor · 2023-06-16T21:55:45Z

> Hello @AltGrCarlos, the main problem here is that the creator of this bot is not currently active to make the necessary fixes. I would say 75% of the code is still working, and this bot does more than a simple scraper: it also updates the user profile, which is great, and posts tweets to Mastodon. So the part needing a fix is the scraping-from-Twitter part. Everything else can be kept as is. If you could create a fork with an updated and functional scraper, that would be great.
>
> PS: I don't know about Matrix; in terms of live chat, Discord is probably more widely used.

Hi. I'm not a very good programmer, but I'm trying to understand the code before making any modifications. I'm also trying to develop my own nitter scraper in order to get the specific information I need from Twitter.

edel79 · 2023-06-19T19:59:06Z

While waiting for a fix to make pleroma-bot work again, I have set up Twoot (previously mentioned) as a replacement. It's working fine without an API key.

us3r1d · 2023-07-07T20:01:02Z

After last week's API changes breaking nitter, I'm now using https://github.com/12joan/twitter-client to generate RSS for stork.

Just so you know stork is still working and still useful. :-)

It'd be nice if I could find some way to get profile updates happening while still getting the tweets from RSS; I'll post here again if I figure out a way to do that.

robertoszek · 2023-10-16T22:40:22Z

Hey, sorry for being a lot less active.

I've been moving across countries during the last 6 months and between all the logistics and bureaucracy involved (getting a visa, a work permit, finding an apartment, packing, etc.) in addition to keeping a day job, it basically left little to no time to do anything else.

I'm glad this project was still somewhat useful for some of you during that time with the scraping functionality implementation still pending.
My intention is to get back to it and try to make it work in the current state of affairs.
Thank you all for sharing the different projects you've found success with, I'll take a look at their approach and see what works and doesn't at the moment.

robertoszek · 2023-10-18T17:27:57Z

Got profile info and pinned tweet gathering working. c96943e
The user timeline scraping seems a lot more involved, requiring "guest accounts".

These guest accounts seem to be restricted by IP, so only a limited amount can be created from the same host/IP.

I'm thinking about adding a flag so they can be created easily on demand:

$ pleroma-bot --create-guest-account

being dumped to guest_accounts.json, for example.

And if you have access to a list of proxies that could be used to generate more accounts at the same time, perhaps passing them as a text file (by a flag or on the config file):

$ pleroma-bot --create-guest-account --proxies-file my_proxies.txt

And of course the bot would also need to try generating additional guest accounts in the middle of a run if it gets rate limited.
I need to think about it a bit more but there's definitely some progress being made.
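To make the idea concrete, here is a rough sketch of rotating through stored guest accounts and creating a new one when all of them are rate limited. Everything in it is an assumption for illustration: the `GuestAccountPool` class, the `RateLimited` exception, the cooldown value and the layout of `guest_accounts.json` (a JSON list of account objects) are hypothetical and do not reflect the actual implementation.

```python
import json
import time


class RateLimited(Exception):
    """Raised by the fetch callable when an account hits a rate limit."""


class GuestAccountPool:
    """Hypothetical pool that rotates guest accounts on rate limits."""

    def __init__(self, accounts, create_account=None, cooldown=900,
                 max_accounts=10, clock=time.time):
        self.accounts = list(accounts)
        self.create_account = create_account  # callable returning a new account
        self.cooldown = cooldown              # seconds to rest a limited account
        self.max_accounts = max_accounts      # cap on accounts created mid-run
        self.clock = clock
        self._blocked_until = {}              # account index -> unix timestamp

    @classmethod
    def from_file(cls, path, **kwargs):
        # Assumes the file holds a JSON list of account objects.
        with open(path, encoding="utf-8") as handle:
            return cls(json.load(handle), **kwargs)

    def _next_available(self):
        now = self.clock()
        for index, account in enumerate(self.accounts):
            if self._blocked_until.get(index, 0) <= now:
                return index, account
        return None

    def run(self, fetch):
        """Call fetch(account), switching accounts whenever one is limited."""
        while True:
            slot = self._next_available()
            if slot is None:
                can_create = (self.create_account is not None
                              and len(self.accounts) < self.max_accounts)
                if not can_create:
                    raise RuntimeError("all guest accounts are rate limited")
                self.accounts.append(self.create_account())
                continue
            index, account = slot
            try:
                return fetch(account)
            except RateLimited:
                self._blocked_until[index] = self.clock() + self.cooldown
```

A caller would pass its own `fetch` function that performs the request with the given account and raises `RateLimited` when the response indicates throttling. A proxy list could be handed to `create_account` in the same way.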

dawnerd · 2023-10-18T17:32:58Z

I have ~50 accounts in my config and run pleroma-bot every 15 minutes against a nitter RSS feed right now as a workaround. The guest accounts last for 30 days and I've ended up needing ~6k guest accounts to keep it running the whole time without erroring out. I use geonode for proxies FYI.

edel79 · 2023-10-18T18:31:27Z

As long as you use a working Nitter instance as the source, you don't have to deal with guest accounts: they are used to scrape Twitter.
Your bot is reading already-scraped content, the content provided by Nitter.
Well, when using the Nitter RSS feed, at least.

dawnerd · 2023-10-18T18:33:20Z

With these changes a lot of nitter instances have either turned off rss or asked people not to scrape them. I run my own so I'm not eating up guest tokens from someone else. Just keep that in mind. Generating guest tokens is extremely cheap on geonode too.

robertoszek pinned this issue Feb 3, 2023

robertoszek added the "breaking changes" label Feb 3, 2023
