
Archive import #23

Open
wants to merge 5 commits into
base: master
from

Conversation

@russss
Contributor

russss commented Jun 27, 2019

This PR is a bit messy for reasons mentioned below. It's working fine for me so I'm going to raise this PR mostly so I don't forget about it.

This PR adds a feature to use a Twitter archive export to initially populate the semiphemeral DB. This has the following advantages:

  • It's a lot faster as it hits fewer rate limits
  • It allows you to delete tweets which don't appear in the API due to... timeline anomalies, such as when you accidentally run semiphemeral with an incomplete dataset, which is what I did.

You run it by requesting your archive, unzipping it, and then running semiphemeral import path/to/archive_dir.
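For readers unfamiliar with the archive layout, here is a minimal sketch of reading tweets out of an unzipped archive directory. It is not the code from this PR. It assumes that tweet.js is a JavaScript assignment (something like `window.YTD.tweet.part0 = [ ... ]`) wrapping a JSON array, and that some archives wrap each entry as `{"tweet": {...}}`; the function name `load_archive_tweets` is hypothetical.

```python
import json
import os


def load_archive_tweets(archive_dir):
    """Return the list of tweet dicts stored in <archive_dir>/tweet.js."""
    path = os.path.join(archive_dir, "tweet.js")
    with open(path, "r", encoding="utf-8") as f:
        raw = f.read()
    # Assumption: the file is a JavaScript assignment, so everything before
    # the first "[" is dropped and the remainder is parsed as JSON.
    payload = raw[raw.index("["):].strip().rstrip(";")
    entries = json.loads(payload)
    # Assumption: some archives wrap each entry as {"tweet": {...}}.
    return [entry.get("tweet", entry) for entry in entries]
```

Calling `load_archive_tweets("path/to/archive_dir")` would then give a list of dicts that could be inserted into a database.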

Issues

  • The archive I just downloaded from Twitter does not appear to flag retweets as such - they just appear (and get inserted into the DB) as normal tweets. I can't work out if this is a Twitter bug or not. I'm not too concerned about this as I want to delete retweets as well.
  • The Twitter archive has likes, but with extremely little metadata so they'd require a metadata fetch for each like. I don't have the patience for that so I didn't implement importing of likes.
  • There's probably a bit of scope for more refactoring.
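If unflagged retweets ever needed to be told apart from normal tweets, one rough heuristic is to check for the conventional "RT @" text prefix. This is only a sketch under assumptions: the field names `full_text` and `text` are assumed, the helper name is hypothetical, and a manually typed "RT @..." tweet would be misclassified.

```python
def looks_like_retweet(tweet):
    """Guess whether an archive tweet dict is a retweet from its text."""
    # Assumption: the text lives under "full_text" or "text".
    text = tweet.get("full_text") or tweet.get("text") or ""
    # Heuristic only: retweets conventionally start with "RT @user: ".
    return text.startswith("RT @")
```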

This depends on #22.

russss added 5 commits Jun 27, 2019
Tweepy by default only fetches 140 character tweets. This changes it to
fetch the full 280 characters so the sqlite database is more useful.
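As an illustration of the commit message above (not necessarily how the commit itself does it), Tweepy accepts `tweet_mode="extended"` on timeline calls, in which case the untruncated text is exposed as `full_text` rather than `text`. In this sketch, `api` is assumed to be an authenticated `tweepy.API` instance, and both helper names are hypothetical.

```python
def status_text(status):
    """Return the untruncated text of a status when it is available."""
    # In extended mode the text is in `full_text`; otherwise fall back to `text`.
    return getattr(status, "full_text", None) or status.text


def fetch_full_text_tweets(api, **kwargs):
    """Fetch one page of the user timeline and return each tweet's text.

    Assumption: `api` is an authenticated tweepy.API instance.
    """
    statuses = api.user_timeline(tweet_mode="extended", **kwargs)
    return [status_text(s) for s in statuses]
```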
@KonradIT

KonradIT commented Jun 28, 2019

This works beautifully, thanks @russss .

To get your Twitter archive zip, go here: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive


def import_dump(self, filepath):
    if self.common.settings.get('delete_tweets'):
        with open(os.path.join(filepath, 'tweet.js'), 'r') as f:

@chk1

chk1 Jul 7, 2019

With my archive I received this encoding error: 'charmap' codec can't decode byte 0x81

I fixed it for myself by adding the parameter encoding='UTF-8' to the open() call.
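A small self-contained sketch of this failure and the fix: the temporary file below is a hypothetical stand-in for an archive's tweet.js, and cp1252 is assumed to be the platform default encoding that produced the 'charmap' error (typical on Windows).

```python
import os
import tempfile

# Hypothetical stand-in for a tweet.js file containing non-ASCII text.
archive_dir = tempfile.mkdtemp()
path = os.path.join(archive_dir, "tweet.js")
with open(path, "w", encoding="utf-8") as f:
    f.write('window.YTD.tweet.part0 = [ {"full_text": "\u00c1gua"} ]')

# Reading with a Windows code page fails: the UTF-8 encoding of the
# accented letter contains byte 0x81, which cp1252 does not define.
try:
    with open(path, "r", encoding="cp1252") as f:
        f.read()
    decoded_with_cp1252 = True
except UnicodeDecodeError as exc:
    decoded_with_cp1252 = False
    error_message = str(exc)

# Passing the encoding explicitly reads the same file cleanly.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()
```

Specifying the encoding explicitly makes the read independent of the platform's default locale.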

@wohali

wohali commented Oct 21, 2019

Thanks for this PR. With this branch I was able to successfully import all of my tweets in, even those that radiergummi wasn't able to delete. semiphemeral delete (master branch) is now working its way through removing all those older tweets.

This adds more evidence for the theory that if you have a large 'hole' in your tweets, Twitter's API is not able to see past that hole to older tweets that may still be undeleted, meaning semiphemeral can't access them without this kind of import.
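One way to check this theory on your own account is to compare the tweet IDs in the archive with the IDs the API returned. The following is only a sketch: it assumes archive tweets are dicts with an `id_str` key, and the function name is hypothetical.

```python
def ids_missing_from_api(archive_tweets, api_tweet_ids):
    """Return archive tweet IDs (as strings) that the API did not return."""
    api_ids = {str(i) for i in api_tweet_ids}
    # Assumption: each archive tweet dict carries its ID under "id_str".
    missing = {t["id_str"] for t in archive_tweets if t["id_str"] not in api_ids}
    # Sort numerically so the oldest (smallest) IDs come first.
    return sorted(missing, key=int)
```

A long run of old IDs in the result would be consistent with the API not seeing past a hole.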

4 participants