Script to import JSON files from Twitter's new archive feature. Only tested with my data so far. Feedback appreciated.
Added folder for JSON archive files exported from Twitter
Made tweetid unique => You need to run upgrade.php on your existing tweet nest instance
Script to load tweets from archive files into database based on loadt…
Basic instructions added
Select most recent tweet based on time rather than id
Use markdown for readme to get nice indentation.
I did the mysql table change manually and then copied over the loadarchive.php and ran it. Worked just fine. 👍
Worked great for me, too!
Worked fine for me, too!
works fine here...
Looks like a bunch of people are finally getting their Twitter archive. Thanks for the feedback.
42K tweets are taking quite a while to import, but it works very nicely, including full Unicode (which the csv export from Twitter lacks), so kudos!
Thanks for this @tralafiti!
To anyone importing a Twitter Tweet Archive into an existing Tweet Nest install for the first time, keep in mind that you'll need to clear your database of existing tweets (by running TRUNCATE on tn_tweets, tn_tweetwords, and tn_words) or manually remove the .js files for the tweets already imported. Otherwise, running loadarchive.php will result in duplicate tweets.
This only happens the first time, as loadarchive.php keeps track of which .js files have already been imported. If you've never imported a Twitter Tweet Archive, you may have existing tweets that loadarchive.php doesn't know about.
Since Tweet Nest imports new Tweets automatically (which loadarchive.php won't know about), you'll need to be careful anytime you import a Twitter Tweet Archive into an existing Tweet Nest install, or you risk having duplicate tweets.
You're welcome @raamdev.
Did you run upgrade.php? It marks the tweetid column as unique to prevent the duplication of tweets on existing instances. If you did, this is indeed a bug that should be fixed.
@tralafiti I did run upgrade.php but I got an error that said something like "Duplicate entry ‘44794062607360000’ for key ‘tweetid’". I proceeded to run loadarchive.php (from the command line) which seemed to work, but then I noticed I had duplicate entries.
@raamdev This means there were already some duplicated tweets in your database, which prevented upgrade.php from applying the unique constraint. Maybe the script should clean up these entries when it encounters this edge case, or at least stop the process with a meaningful error message. Thanks for the hint.
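The cleanup suggested above could look roughly like the following. This is a minimal sketch, not what upgrade.php does: it uses Python's sqlite3 module as a stand-in for MySQL, the table name tn_tweets and the column tweetid come from this thread, and the rest of the schema, the index name idx_tweetid, and the helper add_unique_tweetid are assumptions made for illustration.

```python
import sqlite3

INDEX_SQL = "CREATE UNIQUE INDEX idx_tweetid ON tn_tweets (tweetid)"  # index name is hypothetical


def add_unique_tweetid(conn):
    """Add a unique index on tweetid; if duplicates block it, remove them and retry.

    Returns the number of duplicate rows that had to be deleted.
    """
    try:
        conn.execute(INDEX_SQL)
        return 0
    except sqlite3.DatabaseError:
        # Duplicates exist: keep only the row with the lowest id per tweetid.
        cur = conn.execute(
            "DELETE FROM tn_tweets WHERE id NOT IN "
            "(SELECT MIN(id) FROM tn_tweets GROUP BY tweetid)"
        )
        conn.execute(INDEX_SQL)
        return cur.rowcount


# Assumed, simplified schema; the real tn_tweets table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY AUTOINCREMENT, tweetid TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO tn_tweets (tweetid, text) VALUES (?, ?)",
    [
        ("44794062607360000", "first copy"),
        ("44794062607360000", "second copy"),
        ("44794062607360001", "another tweet"),
    ],
)
removed = add_unique_tweetid(conn)
conn.commit()
```

Keeping the lowest id per tweetid is an arbitrary choice here; any rule that leaves exactly one row per tweetid would let the unique constraint go through.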
you should have a look at https://github.com/amwhalen/archive-my-tweets which has a similar feature.
Thanks for the patch; great work, and just what I was looking for.
I found that while the import (into an existing Tweetnest install) worked beautifully, I subsequently didn't get any new tweets grabbed into my database by the normal tweetnest loadtweets.php. Looks like loadtweets.php finds the latest tweet by finding the latest tweetid using ORDER BY id DESC -- so if you import a bunch of older tweets, it gets confused as something that's not your latest tweet ends up with the highest id.
I worked around it by finding my latest tweet, re-inserting it as the newest row in the database, and then deleting the original entry for it. I'd guess a better way for loadtweets.php to find the latest tweet would be to use the tweet's time or Twitter's tweetid (which I think is always an incrementing "integer", even though it's actually a string).
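The ordering problem described above can be reproduced in a few lines. This is only a sketch under assumptions: it uses Python's sqlite3 in place of MySQL, the column name time and the sample values are made up for illustration, and it does not reproduce the actual loadtweets.php query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema: auto-increment row id, Twitter's tweetid, Unix timestamp.
conn.execute(
    "CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY AUTOINCREMENT, tweetid TEXT, time INTEGER)"
)

# The newest tweet was fetched live first; older archive tweets were imported
# afterwards, so they received higher row ids.
conn.executemany(
    "INSERT INTO tn_tweets (tweetid, time) VALUES (?, ?)",
    [("300", 1700), ("100", 1000), ("200", 1200)],
)

# Ordering by row id picks the last imported row, not the newest tweet.
latest_by_id = conn.execute(
    "SELECT tweetid FROM tn_tweets ORDER BY id DESC LIMIT 1"
).fetchone()[0]

# Ordering by tweet time picks the tweet that is actually the most recent.
latest_by_time = conn.execute(
    "SELECT tweetid FROM tn_tweets ORDER BY time DESC LIMIT 1"
).fetchone()[0]
```

Here latest_by_id ends up as the archive tweet "200", while latest_by_time is the genuinely newest tweet "300".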
@gothick You sure you applied this commit, which is part of this branch, too? It should take care of the problem you ran into tralafiti@e9ed808
@tralafiti You know what? Turns out I'm an idiot. I'd applied that commit, but managed not to upload it to my server along with the other changes. Sorry to trouble you!
Hi! I first started by importing straight from Twitter, but it only grabbed my last 3200 tweets, so I tried to import the missing months from the downloaded tweet archives. When I clicked through to different months, Tweetnest only showed me the same tweets, my latest ones, no matter which month I clicked (although the counts differ per month and appear correct). I TRUNCATEd the appropriate tables and reimported EVERY month's "archive".js file from the very beginning of when I opened my Twitter account. All the counts are correct again, but clicking through to any month still shows me only the same tweets, my most recent ones. Any idea what to do? Here's my tweetnest: http://tweets.richardarchambault.ca
Experienced this issue today. I had not upgraded to the latest Tweet Nest version for about a year.
Had to manually clean up the duplicates in the DB (phpMyAdmin). Maybe this will be helpful for someone:
SELECT * FROM tn_tweets WHERE tweetid IN (
    SELECT tweetid FROM tn_tweets
    GROUP BY tweetid
    HAVING COUNT(tweetid) > 1
);
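For anyone who wants to try the duplicate lookup before touching a live database, here is a minimal sketch run against an in-memory SQLite table from Python. The table and column names tn_tweets and tweetid come from this thread; the simplified schema, the sample rows, the added ORDER BY, and the helper find_duplicates are assumptions for illustration.

```python
import sqlite3

# Lists every row whose tweetid occurs more than once in tn_tweets.
DUPLICATES_SQL = """
SELECT * FROM tn_tweets WHERE tweetid IN (
    SELECT tweetid FROM tn_tweets
    GROUP BY tweetid
    HAVING COUNT(tweetid) > 1
)
ORDER BY tweetid, id
"""


def find_duplicates(conn):
    """Return all rows that share their tweetid with at least one other row."""
    return conn.execute(DUPLICATES_SQL).fetchall()


# Assumed, simplified schema with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY, tweetid TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tn_tweets (id, tweetid, text) VALUES (?, ?, ?)",
    [(1, "111", "a"), (2, "222", "b"), (3, "111", "a again")],
)
duplicates = find_duplicates(conn)
```

With the sample rows, duplicates contains rows 1 and 3, which share tweetid "111"; row 2 is not listed.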