Import twitter archive #45

Merged
merged 7 commits into from Jul 2, 2013

Contributor

tralafiti commented Dec 21, 2012

Script to import json files from twitter's new archive feature. Only tested with my data so far. Feedback appreciated.

badboy Jan 24, 2013

I made the MySQL table change manually, then copied over loadarchive.php and ran it. Worked just fine. 👍

alanmoo Feb 13, 2013

Worked great for me, too!

gr4y Feb 14, 2013

Worked fine for me, too!

xsteadfastx Feb 15, 2013

works fine here...

tralafiti Feb 15, 2013

Contributor

Looks like a bunch of people are finally getting their Twitter archive. Thanks for the feedback.

seefood Feb 17, 2013

42K tweets are taking quite a while to import, but it works very nicely, including full Unicode (which the csv export from Twitter lacks), so kudos!

raamdev Mar 25, 2013

Thanks for this @tralafiti!

To anyone importing a Twitter Tweet Archive into an existing Tweet Nest install for the first time: keep in mind that you'll need to either clear your database of existing tweets (by running TRUNCATE on tn_tweets, tn_tweetwords, and tn_words) or manually remove the .js files for the tweets already imported. Otherwise, running loadarchive.php will result in duplicate tweets.

This only happens the first time, as loadarchive.php keeps track of which .js files have already been imported. If you've never imported a Twitter Tweet Archive, you may have existing tweets that loadarchive.php doesn't know about.

Since Tweet Nest imports new Tweets automatically (which loadarchive.php won't know about), you'll need to be careful anytime you import a Twitter Tweet Archive into an existing Tweet Nest install, or you risk having duplicate tweets.

tralafiti Mar 25, 2013

Contributor

You're welcome @raamdev.

Did you run upgrade.php? It marks the tweetid column as unique to prevent the duplication of tweets on existing instances. If you did, this is indeed a bug that should be fixed.

raamdev Mar 25, 2013

@tralafiti I did run upgrade.php but I got an error that said something like "Duplicate entry ‘44794062607360000’ for key ‘tweetid’". I proceeded to run loadarchive.php (from the command line) which seemed to work, but then I noticed I had duplicate entries.

tralafiti Mar 28, 2013

Contributor

@raamdev This means there were already some duplicated tweets in your database, which left upgrade.php unable to make the unique alteration. Maybe the script should clean up these entries upon encountering this edge case, or at least stop the process with a meaningful error message. Thanks for the hint.
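
The "stop with a meaningful error message" idea could look roughly like the sketch below. It is not what upgrade.php does: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the text column and the function names are made up for illustration, and only tn_tweets, tweetid, and the example ID come from this thread.

```python
import sqlite3

# In-memory SQLite table standing in for the MySQL tn_tweets table.
# The "text" column is an assumption made for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY, tweetid TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tn_tweets (tweetid, text) VALUES (?, ?)",
    [
        ("44794062607360000", "first copy"),
        ("44794062607360000", "second copy"),
        ("44794062607360001", "some other tweet"),
    ],
)


def find_duplicate_tweetids(conn):
    """Return every tweetid that occurs more than once."""
    rows = conn.execute(
        "SELECT tweetid FROM tn_tweets "
        "GROUP BY tweetid HAVING COUNT(*) > 1 ORDER BY tweetid"
    ).fetchall()
    return [row[0] for row in rows]


def make_tweetid_unique(conn):
    """Add the unique index, but refuse with a clear message if duplicates exist."""
    duplicates = find_duplicate_tweetids(conn)
    if duplicates:
        raise RuntimeError(
            f"Cannot make tweetid unique: {len(duplicates)} duplicated value(s), "
            f"for example {duplicates[0]}. Remove the duplicates and run again."
        )
    conn.execute("CREATE UNIQUE INDEX tweetid_unique ON tn_tweets (tweetid)")


duplicates = find_duplicate_tweetids(conn)
try:
    make_tweetid_unique(conn)
    index_created = True
    message = ""
except RuntimeError as error:
    index_created = False
    message = str(error)

print(message)
```

Checking for duplicates before altering the table means the upgrade either succeeds or stops before any import runs, rather than continuing without the unique constraint in place.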

ali0une May 5, 2013

you should have a look at https://github.com/amwhalen/archive-my-tweets which has a similar feature.

gothick May 10, 2013

Thanks for the patch; great work, and just what I was looking for.

I found that while the import (into an existing Tweetnest install) worked beautifully, I subsequently didn't get any new tweets grabbed into my database by the normal tweetnest loadtweets.php. Looks like loadtweets.php finds the latest tweet by finding the latest tweetid using ORDER BY id DESC -- so if you import a bunch of older tweets, it gets confused as something that's not your latest tweet ends up with the highest id.

I worked around it by finding my latest tweet, re-inserting it as the latest row in the database, and then deleting the original entry for it. I'd guess a better way might be to use the tweet's time or Twitter's tweetid (which I think is always an incrementing "integer", even though it's actually a string) to find the latest tweet in loadtweets.php?
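
A small sketch of the ordering problem described above, assuming (as the comment guesses) that tweetid values grow over time. It uses Python's sqlite3 module rather than MySQL, the tweet IDs are invented, and it is not a description of what loadtweets.php or the commit on this branch actually does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY, tweetid TEXT)")

# Recent tweets were loaded first; older archive tweets were imported later,
# so the older tweets end up with the higher auto-increment row ids.
conn.executemany(
    "INSERT INTO tn_tweets (tweetid) VALUES (?)",
    [
        ("332989796304949248",),  # newest tweet, row id 1
        ("99647570861830144",),   # older archive tweet (17 digits), row id 2
        ("121647570861830144",),  # older archive tweet, row id 3
    ],
)


def latest(order_by):
    """Return the tweetid of the first row under the given ORDER BY clause."""
    return conn.execute(
        f"SELECT tweetid FROM tn_tweets ORDER BY {order_by} DESC LIMIT 1"
    ).fetchone()[0]


by_row_id = latest("id")                            # picks an old archive tweet
by_string = latest("tweetid")                       # text comparison: '9...' sorts highest
by_tweet_id = latest("CAST(tweetid AS INTEGER)")    # numeric comparison

print(by_row_id, by_string, by_tweet_id)
```

Because the IDs are stored as strings, a plain text comparison is only safe while every ID has the same number of digits, which is why the last query casts to an integer before sorting.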

tralafiti May 11, 2013

Contributor

@gothick Are you sure you applied this commit, which is also part of this branch? It should take care of the problem you ran into: tralafiti@e9ed808

gothick May 12, 2013

@tralafiti You know what? Turns out I'm an idiot. I'd applied that commit, but managed not to upload it to my server along with the other changes. Sorry to trouble you!

richardmtl Aug 6, 2013

Hi! I first started by importing straight from Twitter, but it only grabbed my last 3200 tweets, so I tried to import the missing months from the downloaded tweet archives. When I clicked through to different months, Tweetnest only showed me the same tweets, my latest ones, no matter which month I clicked (although the counts differ per month and appear correct). I TRUNCATEd the appropriate tables and reimported EVERY month's "archive" .js file from the very beginning of when I opened my Twitter account. All the counts are correct again, but clicking through to any month still shows me only the same tweets, my most recent. Any idea what to do? Here's my tweetnest: http://tweets.richardarchambault.ca

Thanks!

liberborn Oct 19, 2013

Experienced this issue today. I had not upgraded to the latest Tweet Nest version for about a year.

I had to manually clean up the duplicates in the DB (phpMyAdmin). Maybe this will be helpful for someone:

  • Find duplicate ids by running query:

SELECT tweetid
FROM tn_tweets
GROUP BY tweetid
HAVING count(tweetid) > 1;

  • Find duplicate rows by query:

SELECT * FROM tn_tweets WHERE tweetid IN (
'121647570861830144',
'132989796304949248',
...);

  • Remove duplicate items.
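
The last step can also be done in a single statement instead of by hand. The sketch below keeps the row with the lowest id for each tweetid and deletes the rest. It runs against SQLite through Python's sqlite3 module; MySQL may need a differently phrased DELETE, the third tweet ID and the text column are made up, and rows in related tables such as tn_tweetwords are not touched, so treat it as a starting point and back up first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tn_tweets (id INTEGER PRIMARY KEY, tweetid TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO tn_tweets (tweetid, text) VALUES (?, ?)",
    [
        ("121647570861830144", "tweet A"),
        ("121647570861830144", "tweet A, imported again"),
        ("132989796304949248", "tweet B"),
        ("132989796304949248", "tweet B, imported again"),
        ("140000000000000000", "tweet C (hypothetical ID, no duplicate)"),
    ],
)

# Keep the earliest row (smallest id) per tweetid and delete every other copy.
deleted = conn.execute(
    "DELETE FROM tn_tweets "
    "WHERE id NOT IN (SELECT MIN(id) FROM tn_tweets GROUP BY tweetid)"
).rowcount

remaining = conn.execute("SELECT id, tweetid FROM tn_tweets ORDER BY id").fetchall()

# With the duplicates gone, the unique index can be created without an error.
conn.execute("CREATE UNIQUE INDEX tweetid_unique ON tn_tweets (tweetid)")

print(deleted, remaining)
```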
