Twitter extractor not getting full text? (retweets) #4690

Closed
brachna opened this issue Oct 21, 2023 · 5 comments

Comments


brachna commented Oct 21, 2023

While checking the metadata JSON files, I noticed truncated text instead of the full text.
I replaced the line
content = text.unescape(note["text"] if note else tget("full_text") or tget("text") or "")
with
content = tget("full_text")
and it seems to work fine now.
Is that intended behavior?

Owner

mikf commented Oct 21, 2023

This is obviously not intended behavior.
I assumed that a "note", when present, contained the entire Tweet text. Apparently that's not the case ...

> content = tget("full_text")
> and it seems to work fine now.

For Tweets with really long text, this does not work, or at least it didn't.
https://twitter.com/i/web/status/1629193457112686592
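
To illustrate why `tget("full_text")` alone falls short for long Tweets, here is a minimal sketch of the fallback order in the original line. It is not the extractor's actual code: `pick_content` and the `long_tweet` payload are hypothetical, the `tweet.get("legacy", tweet)` lookup is an assumption about the payload shape, and `html.unescape` from the standard library stands in for the project's `text.unescape`.

```python
import html


def pick_content(tweet):
    """Return the Tweet text, preferring the note when one is present."""
    # Assumption: the classic fields live under "legacy" when that key exists.
    legacy = tweet.get("legacy", tweet)
    tget = legacy.get

    # Long Tweets carry their text in a separate "note" object.
    note = None
    if "note_tweet" in tweet:
        note = tweet["note_tweet"]["note_tweet_results"]["result"]

    # Same order as the original line: note text, then full_text, then text.
    return html.unescape(
        note["text"] if note else tget("full_text") or tget("text") or ""
    )


# Hypothetical, heavily trimmed payload for a long Tweet.
long_tweet = {
    "legacy": {"full_text": "A very long Tweet that gets cut off\u2026"},
    "note_tweet": {"note_tweet_results": {"result": {
        "text": "A very long Tweet that gets cut off here &amp; runs to the end.",
    }}},
}

print(pick_content(long_tweet))           # text taken from the note
print(long_tweet["legacy"]["full_text"])  # what full_text alone would give
```

With this payload, reading only `full_text` yields the shortened string, while the chained fallback returns the note's text.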

Author

brachna commented Oct 21, 2023

Thanks for the example. I'm currently rate-limited and will check tomorrow.

Author

brachna commented Oct 22, 2023

So I ran some checks and you're right: ['text'] holds the full text, ['full_text'] doesn't (nice API, Twitter team).
But the extractor still doesn't get the full text for retweets: https://twitter.com/elonmusk/status/1714527871681613846
For retweets, the required ["note_tweet"] is inside ['legacy']['retweeted_status_result']['result'].
For now I added this, and it grabs the full text for both examples:

# For retweets, the note lives inside the retweeted Tweet's result.
retweet = tweet.get("legacy", {}).get("retweeted_status_result", {}).get("result", {})
if retweet.get("note_tweet") is not None:
    note = retweet["note_tweet"]["note_tweet_results"]["result"]
elif "note_tweet" in tweet:
    note = tweet["note_tweet"]["note_tweet_results"]["result"]
else:
    note = None

It's ugly, but works.
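
One way to make the nested lookup less ugly is a small path helper. This is only a sketch under the key layout described in this comment: `dig`, `find_note`, `NOTE_PATH` and `RETWEET_PATH` are hypothetical names, not part of the extractor. Unlike the snippet above, it returns `None` instead of raising when an intermediate key is missing, and it falls through to the Tweet's own note if the retweet's note is empty.

```python
def dig(obj, *keys):
    """Follow a chain of dict keys, returning None as soon as one is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


# Key paths as described in the comment above (assumed, not verified here).
NOTE_PATH = ("note_tweet", "note_tweet_results", "result")
RETWEET_PATH = ("legacy", "retweeted_status_result", "result")


def find_note(tweet):
    """Look for the note in the retweeted Tweet first, then in the Tweet itself."""
    return dig(tweet, *RETWEET_PATH, *NOTE_PATH) or dig(tweet, *NOTE_PATH)


# Hypothetical, heavily trimmed payloads.
retweet = {"legacy": {"retweeted_status_result": {"result": {
    "note_tweet": {"note_tweet_results": {"result": {"text": "full retweeted text"}}},
}}}}
own = {"note_tweet": {"note_tweet_results": {"result": {"text": "full own text"}}}}

print(find_note(retweet))
print(find_note(own))
print(find_note({"legacy": {}}))
```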

brachna changed the title from "Twitter extractor not getting full text?" to "Twitter extractor not getting full text? (retweets)" on Oct 22, 2023
Author

brachna commented Oct 22, 2023

Oops, that only covered retweets with ['note_tweet']. I had to add this to make other retweets work:

content = ''
if 'retweeted_status_result' in tweet:
    rget = tweet['retweeted_status_result']['result']['legacy'].get
    content = text.unescape(note["text"] if note else rget("full_text") or rget("text") or "")
else:
    content = text.unescape(note["text"] if note else tget("full_text") or tget("text") or "")
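
The two branches differ only in which dict supplies the fallback fields, so they could be folded into one expression. The sketch below assumes its argument is the Tweet's legacy dict (the level at which `retweeted_status_result` was reported earlier in this thread). `content_for` and the sample payloads are hypothetical, and `html.unescape` stands in for the project's `text.unescape`.

```python
import html


def content_for(legacy, note=None):
    """Pick the text for a Tweet's legacy dict, following retweets to their source."""
    source = legacy
    if "retweeted_status_result" in legacy:
        # Per the thread, the retweet's own fields hold truncated text,
        # so read the retweeted Tweet's legacy dict instead.
        source = legacy["retweeted_status_result"]["result"]["legacy"]
    get = source.get
    return html.unescape(
        note["text"] if note else get("full_text") or get("text") or ""
    )


# Hypothetical, heavily trimmed payloads.
plain = {"full_text": "hello &amp; goodbye"}
retweet = {
    "full_text": "RT @someone: hello wor\u2026",
    "retweeted_status_result": {"result": {"legacy": {
        "full_text": "hello world, in full",
    }}},
}

print(content_for(plain))
print(content_for(retweet))
print(content_for(retweet, note={"text": "note text"}))
```

A note, when supplied, still wins over both fallbacks, which matches the order in the snippet above.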

Contributor

Hrxn commented Oct 23, 2023

Hey, if it works, it works 😄
