Some tweets have their links or media skipped (unified cards) #79

Open
nemobis opened this issue Apr 10, 2022 · 15 comments

Comments

@nemobis
Contributor

nemobis commented Apr 10, 2022

Some fancy accounts seem to be using some Twitter feature which pleroma-bot doesn't support yet.

This is typically spotted in tweets that follow the trend of containing a mere "↓" as a warning that the main content of the update is actually somewhere else, like this one: https://respublicae.eu/@EU_Commission/108092396818818757 https://nitter.eu/EU_Commission/status/1512123194785898503 which is just a link to https://ec.europa.eu/commission/presscorner/detail/en/statement_22_2331 . These tweets look like any other tweet whose main URL has been "eaten" by Twitter and shown only as an attached "card", but they seem to be different.

Others are more complicated, like https://respublicae.eu/@EU_Commission/108103776666586079 https://nitter.eu/EU_Commission/status/1512777762909655043 which contains a "broadcast": https://nitter.eu/i/broadcasts/1BRJjnyZoZdJw . I guess there isn't much to do about these, other than documenting them somewhere so that people can make informed decisions about the nitter and signature configs.

@robertoszek
Owner

Yeah, Twitter v2 API's response for the example tweet you provided (1512123194785898503) doesn't seem to include the link anywhere (even with all the expansions set):

{
   "data":[
      {
         "conversation_id":"1512123194785898503",
         "text":"President @vonderleyen has visited Stockholm to give the green light to Sweden's €3.3 billion recovery and resilience plan.\n\nSweden is a renewable energy pioneer. \n\nRenewables are bound to make up half of the country's energy mix by the end of the decade. ↓\n\n#NextGenerationEU",
         "lang":"en",
         "entities":{
            "mentions":[
               {
                  "start":10,
                  "end":22,
                  "username":"vonderleyen",
                  "id":"1146329871418843136"
               }
            ],
            "hashtags":[
               {
                  "start":259,
                  "end":276,
                  "tag":"NextGenerationEU"
               }
            ],
            "annotations":[
               {
                  "start":35,
                  "end":43,
                  "probability":0.9802,
                  "type":"Place",
                  "normalized_text":"Stockholm"
               },
               {
                  "start":72,
                  "end":77,
                  "probability":0.9972,
                  "type":"Place",
                  "normalized_text":"Sweden"
               },
               {
                  "start":125,
                  "end":130,
                  "probability":0.9456,
                  "type":"Place",
                  "normalized_text":"Sweden"
               }
            ]
         },
         "public_metrics":{
            "retweet_count":59,
            "reply_count":25,
            "like_count":197,
            "quote_count":2
         },
         "created_at":"2022-04-07T17:40:38.000Z",
         "possibly_sensitive":false,
         "id":"1512123194785898503",
         "source":"Twitter for Advertisers.",
         "author_id":"157981564",
         "context_annotations":[
            {
               "domain":{
                  "id":"10",
                  "name":"Person",
                  "description":"Named people in the world like Nelson Mandela"
               },
               "entity":{
                  "id":"1151432219002454016",
                  "name":"Ursula von der Leyen",
                  "description":"President of European Commission"
               }
            },
            {
               "domain":{
                  "id":"35",
                  "name":"Politician",
                  "description":"Politicians in the world, like Joe Biden"
               },
               "entity":{
                  "id":"1151432219002454016",
                  "name":"Ursula von der Leyen",
                  "description":"President of European Commission"
               }
            },
            {
               "domain":{
                  "id":"30",
                  "name":"Entities [Entity Service]",
                  "description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
               },
               "entity":{
                  "id":"848920371311001600",
                  "name":"Technology",
                  "description":"Technology and computing"
               }
            },
            {
               "domain":{
                  "id":"30",
                  "name":"Entities [Entity Service]",
                  "description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
               },
               "entity":{
                  "id":"848920371311001600",
                  "name":"Technology",
                  "description":"Technology and computing"
               }
            },
            {
               "domain":{
                  "id":"30",
                  "name":"Entities [Entity Service]",
                  "description":"Entity Service top level domain, every item that is in Entity Service should be in this domain"
               },
               "entity":{
                  "id":"898654185146560512",
                  "name":"Energy Technology",
                  "description":"Energy Technology"
               }
            }
         ]
      }
   ],
   "includes":{
      "users":[
         {
            "id":"157981564",
            "name":"European Commission 🇪🇺",
            "username":"EU_Commission"
         },
         {
            "id":"1146329871418843136",
            "name":"Ursula von der Leyen",
            "username":"vonderleyen"
         }
      ],
      "tweets":[
         
      ],
      "media":[
         
      ],
      "polls":[
         
      ]
   },
   "meta":{
      "result_count":1
   }
}

It looks like the only way to obtain info about the cards is using the Twitter Ads API:
https://developer.twitter.com/en/docs/twitter-ads-api/creatives/guides/identifying-cards

And that would require applying for and creating an additional Twitter Ads API application (with a separate token, etc.) 😖
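As a rough illustration of the problem described above, here is a minimal Python sketch that checks a v2 tweet object for URL entities and flags tweets that may carry a card the API did not expose. The helper names (`expanded_urls`, `may_have_hidden_card`) are hypothetical and not part of pleroma-bot, and treating the "Twitter for Advertisers." source as a hint is only an assumption based on the single response shown in this thread.

```python
from typing import Any, Dict, List


def expanded_urls(tweet: Dict[str, Any]) -> List[str]:
    """Return the expanded URLs listed under entities.urls, or [] if there are none."""
    urls = tweet.get("entities", {}).get("urls", [])
    return [u["expanded_url"] for u in urls if u.get("expanded_url")]


def may_have_hidden_card(tweet: Dict[str, Any]) -> bool:
    """Heuristic (an assumption, not a documented rule): the tweet was posted
    through the ads tooling and the API response lists no URL entities."""
    return not expanded_urls(tweet) and "Advertisers" in tweet.get("source", "")


# Trimmed copy of the tweet object from the response above.
tweet = {
    "id": "1512123194785898503",
    "source": "Twitter for Advertisers.",
    "entities": {
        "mentions": [{"start": 10, "end": 22, "username": "vonderleyen"}],
        "hashtags": [{"start": 259, "end": 276, "tag": "NextGenerationEU"}],
    },
}

# Made-up tweet that does expose its link, for comparison.
with_link = {
    "id": "0",
    "source": "Twitter Web App",
    "entities": {"urls": [{"expanded_url": "https://example.org/"}]},
}

print(may_have_hidden_card(tweet))
print(may_have_hidden_card(with_link))
```

A check like this can only say that a link is missing from the response; recovering the link itself still needs another data source.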

@nemobis
Contributor Author

nemobis commented Apr 10, 2022

Wow, that's nasty! No wonder nitter is forced to use the "unofficial API" aka web scraping. zedeus/nitter@111927a

@robertoszek
Owner

Funnily enough, I'm able to get some card metadata with the endpoints used by guest tokens.
So I've made some changes to extract the URL and media from a card:
e815211

You can try it out on 1.1.1rc47:
pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple pleroma-bot==1.1.1rc47
Keep in mind it will only work when using guest tokens (either by omitting the twitter_token mapping or adding guest: true in your config).
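The rule in the previous sentence can be sketched in Python as follows. This is not pleroma-bot's actual code: `uses_guest_token` is a hypothetical helper, and the dictionaries stand in for per-user entries of the config. Only the `twitter_token` and `guest` keys come from this thread; the other keys and values are made up for illustration.

```python
from typing import Any, Dict


def uses_guest_token(user_cfg: Dict[str, Any]) -> bool:
    """Guest mode applies when `guest` is true or when no `twitter_token` is set."""
    return bool(user_cfg.get("guest")) or not user_cfg.get("twitter_token")


# Illustrative per-user entries (values are placeholders).
no_token = {"twitter_username": "EU_Commission"}
explicit_guest = {"twitter_username": "EU_Commission", "twitter_token": "XXXX", "guest": True}
regular = {"twitter_username": "EU_Commission", "twitter_token": "XXXX"}

for cfg in (no_token, explicit_guest, regular):
    print(cfg.get("twitter_username"), uses_guest_token(cfg))
```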

robertoszek changed the title from "Some tweets have their links or media skipped" to "Some tweets have their links or media skipped (unified cards)" on Dec 4, 2022

@nemobis
Contributor Author

nemobis commented Dec 4, 2022 via email

@robertoszek
Owner

robertoszek commented Dec 4, 2022

No, if a user in your config is marked as "guest", it will use the guest token on all the calls associated with that user.

I've been working a bit more on it to get this feature ready for the next stable release:
get pinned tweet if using guest token (5b29832)
get poll from card if using guest token (2ade63b)
Those commits are included in 1.1.1rc48.

So the current limitations are listed here:
https://github.com/robertoszek/pleroma-bot/blob/develop/docs/gettingstarted/beforerunning.md#guest-tokens

The inability to obtain protected tweets makes sense, as that will never work with a guest token.

So the only main difference between using regular Twitter tokens and guest tokens is the limit of 20 tweets per user, and I'm going to try to find out whether there's a way around it.

@robertoszek
Owner

I figured out how to force it to paginate using guest tokens:
57aece6

I've managed to gather more than 4000 tweets for a user using this method; I'm not sure if it has a hard limit (apart from hitting rate limits).

That commit is included in version 1.1.1rc49.
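For readers unfamiliar with the idea, the following is a generic sketch of cursor-based pagination in Python. It does not reproduce what commit 57aece6 does. `collect_tweets` and `fake_fetch` are hypothetical names, and the page size of 20 only echoes the per-request limit mentioned earlier in this thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A page is a list of tweets plus the cursor for the next page (None when done).
Page = Tuple[List[Dict[str, str]], Optional[str]]


def collect_tweets(fetch_page: Callable[[Optional[str]], Page], max_tweets: int) -> List[Dict[str, str]]:
    """Keep requesting pages until there are no more or max_tweets is reached."""
    tweets: List[Dict[str, str]] = []
    cursor: Optional[str] = None
    while len(tweets) < max_tweets:
        page, next_cursor = fetch_page(cursor)
        if not page:
            break
        tweets.extend(page)
        # Stop when the server gives no new cursor, to avoid looping forever.
        if next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor
    return tweets[:max_tweets]


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for a real request: serves 45 fake tweets, 20 per page."""
    total, page_size = 45, 20
    start = int(cursor or 0)
    data = [{"id": str(i)} for i in range(start, min(start + page_size, total))]
    next_cursor = str(start + page_size) if start + page_size < total else None
    return data, next_cursor


print(len(collect_tweets(fake_fetch, max_tweets=4000)))
```

In a real client the loop would also need to respect rate limits, which is the only ceiling reported above.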

@nemobis
Contributor Author

nemobis commented Dec 5, 2022 via email

@robertoszek
Owner

Hmm...

Does running version 1.1.1rc52 make any difference?

@nemobis
Contributor Author

nemobis commented Dec 5, 2022 via email

@nemobis
Contributor Author

nemobis commented Dec 5, 2022 via email

@nemobis
Contributor Author

nemobis commented Dec 5, 2022 via email

@robertoszek
Owner

Weird, 1597718716837335040 seems to show up only on the search API endpoint; doing the same query here:

https://twitter.com/search?q=(from%3ANigel_Farage)%20since_time%3A1669593600%20until_time%3A1670307727%20include%3Anativeretweets&src=typed_query&f=live

doesn't seem to include it in the results. You would think that using from:account wouldn't return quotes from other random accounts 😅 (and it looks like it only does so on the API endpoint).

I've added another pass to filter any tweets that don't originate from the mirrored user, just in case.
b70327d
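A filtering pass of this kind could look roughly like the Python sketch below. It is an illustration, not the code from b70327d: `only_from_user` is a hypothetical helper, the `author_id` field is the one visible in the v2 response earlier in this thread, and the IDs in the sample data are placeholders.

```python
from typing import Any, Dict, List


def only_from_user(tweets: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
    """Drop every tweet whose author_id is not the mirrored user's ID."""
    return [t for t in tweets if t.get("author_id") == user_id]


# Placeholder search results: one tweet by the mirrored user, one stray quote
# by another account, and one entry with no author information at all.
results = [
    {"id": "1", "author_id": "1000"},
    {"id": "2", "author_id": "2000"},
    {"id": "3"},
]

print(only_from_user(results, "1000"))
```

Tweets without an `author_id` are dropped too, which seems the safer default for a mirror.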

Regarding the 404s, I tried replicating them on my end to no avail (a reply to a deleted tweet, a reply to a tweet that quotes a deleted tweet, and a retweet of a deleted tweet didn't trigger it for me).
I've made some changes to try to handle it nonetheless:
812e94b

Both commits are included in 1.1.1rc53. Let me know if it still results in unhandled errors on your end.

@robertoszek
Owner

Oh, and the weird 403s you were getting when providing the token should be fixed in 1.1.1rc54.
There was a parameter that resulted in "403 Client is not authorized to perform this action" even when using an Elevated Twitter token (but it was fine with guest tokens):
a0e01b8

@robertoszek
Owner

Oh, I forgot to mention I added some retries for cases when an HTTP 503 is returned by Twitter's API:
3ffe5f3

It was included in the latest stable release, v1.2.0.

There's not much else we can do other than retry a few times; Twitter's API usually starts returning 503 when their servers are overloaded or over capacity at the time of the request.
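A retry loop for this case might look like the Python sketch below. It is not the code from 3ffe5f3: `call_with_retries` is a hypothetical helper, the exponential backoff and the default of three retries are assumptions, and `request` stands for any callable that performs the HTTP call and returns a status code and body.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]


def call_with_retries(
    request: Callable[[], Response],
    retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call request(); while it returns HTTP 503, wait and try again, up to
    `retries` extra attempts. The last response is returned either way."""
    status, body = request()
    for attempt in range(retries):
        if status != 503:
            break
        sleep(delay * (2 ** attempt))  # back off: delay, 2*delay, 4*delay, ...
        status, body = request()
    return status, body


# Stand-in for a flaky endpoint: two 503 responses, then success.
responses = iter([(503, "over capacity"), (503, "over capacity"), (200, "ok")])

# The sleep function is replaced here so the example runs without waiting.
print(call_with_retries(lambda: next(responses), sleep=lambda seconds: None))
```

Other status codes are returned immediately, so errors such as 403 or 404 are left to the caller.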
