# Crystal1Johnson Tweets

@Crystal1Johnson's Twitter account was swept up in Twitter's suspension of Russian disinformation campaigns. This notebook explores which of her tweets are in the Internet Archive's Wayback Machine, using the [wayback](https://wayback.readthedocs.io/en/stable/) Python library that the kind folks at [EDGI](https://envirodatagov.org/) have created.

In [1]:
import wayback

First, let's see which days her profile page, https://twitter.com/Crystal1Johnson, was archived on.

In [11]:
wb = wayback.WaybackClient()

for result in wb.search('https://twitter.com/Crystal1Johnson'):
    print(result.timestamp, result.view_url)

2016-06-09 01:12:22 http://web.archive.org/web/20160609011222/https://twitter.com/Crystal1Johnson
2016-07-10 02:20:05 http://web.archive.org/web/20160710022005/https://twitter.com/Crystal1Johnson
2016-09-23 21:56:05 http://web.archive.org/web/20160923215605/https://twitter.com/crystal1johnson
2016-09-28 20:57:22 http://web.archive.org/web/20160928205722/https://twitter.com/Crystal1Johnson
2016-10-05 20:57:22 http://web.archive.org/web/20161005205722/https://twitter.com/Crystal1Johnson
2016-11-17 04:55:01 http://web.archive.org/web/20161117045501/https://twitter.com/Crystal1Johnson
2016-11-21 14:48:11 http://web.archive.org/web/20161121144811/https://twitter.com/crystal1johnson
2016-12-01 01:53:20 http://web.archive.org/web/20161201015320/https://twitter.com/Crystal1Johnson
2016-12-01 01:53:30 http://web.archive.org/web/20161201015330/https://twitter.com/Crystal1Johnson
2016-12-01 01:53:39 http://web.archive.org/web/20161201015339/https://twitter.com/Crystal1Johnson
2016-12-01 01:53:50 

You can see that there was some sporadic coverage and then a ton of archive requests on 2016-12-01. Something must have happened on Twitter that day to prompt so many requests.
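To put a number on that burst, the captures can be tallied per calendar day. The following is a minimal sketch, not part of the original analysis: `captures_per_day` is a hypothetical helper, and the sample timestamps are made up to resemble the output above. It assumes each timestamp is a `datetime`, which is what the printed results suggest.

```python
from collections import Counter
from datetime import datetime


def captures_per_day(timestamps):
    """Count how many captures fall on each calendar day."""
    return Counter(ts.date() for ts in timestamps)


# Hypothetical sample timestamps, modeled on the listing above.
sample = [
    datetime(2016, 11, 21, 14, 48, 11),
    datetime(2016, 12, 1, 1, 53, 20),
    datetime(2016, 12, 1, 1, 53, 30),
    datetime(2016, 12, 1, 1, 53, 39),
]

# Days with the most captures are printed first.
for day, count in captures_per_day(sample).most_common():
    print(day, count)
```

With the real client, the same helper could presumably be fed `(r.timestamp for r in wb.search('https://twitter.com/Crystal1Johnson'))`.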

Let's write out the days on which her profile page was archived, along with the Wayback URL for viewing each capture, for Kevin. We'll ignore the multiple pages on the same day assuming that 

In [None]:
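import csv

# csv.writer needs an open file object, not a path string.
with open('data/crystal1johnson-profile.csv', 'w', newline='') as fh:
    out = csv.writer(fh)
    out.writerow(['date', 'wayback url'])

    seen_days = set()
    for result in wb.search('https://twitter.com/Crystal1Johnson'):
        # Keep only the first capture for each calendar day.
        day = result.timestamp.date()
        if day not in seen_days:
            seen_days.add(day)
            out.writerow([day, result.view_url])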

Now let's see which individual tweets have been archived. Since you can't edit tweets, we are really only interested in unique URLs that match the pattern https://twitter.com/Crystal1Johnson/status/{id}. It's also worth pointing out that Twitter screen names are case-insensitive, so https://twitter.com/Crystal1Johnson/status/123 is the same as https://twitter.com/crystal1johnson/status/123.
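The matching rule can be tried in isolation before running it against the archive. This is a small sketch using only the standard library; `normalize_tweet_url` and `STATUS_PATH` are hypothetical names introduced here for illustration, and the status id `123` is the placeholder from the text rather than a real tweet.

```python
import re
from urllib.parse import urlparse

# Hypothetical names: a status path for this account, matched in any letter case.
STATUS_PATH = re.compile(r'/crystal1johnson/status/\d+$', re.IGNORECASE)


def normalize_tweet_url(raw_url):
    """Return a lowercased scheme://host/path tweet URL, or None if it is not a tweet."""
    url = urlparse(raw_url)
    if STATUS_PATH.match(url.path):
        # Rebuilding from scheme, host, and path drops any query string.
        return '{0.scheme}://{0.netloc}{0.path}'.format(url).lower()
    return None


# Both spellings of the screen name collapse to the same key.
print(normalize_tweet_url('https://twitter.com/Crystal1Johnson/status/123'))
print(normalize_tweet_url('https://twitter.com/crystal1johnson/status/123?lang=en'))
# Non-tweet pages under the profile are rejected.
print(normalize_tweet_url('https://twitter.com/Crystal1Johnson/with_replies'))
```

Lowercasing the whole URL is safe here because the only case-sensitive-looking part, the status id, is numeric.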

In [39]:
import re
from urllib.parse import urlparse

tweets = set()
archive_urls = {}

for result in wb.search('https://twitter.com/Crystal1Johnson', matchType='prefix'):
    url = urlparse(result.url)
    if re.match(r'/crystal1johnson/status/\d+$', url.path, re.IGNORECASE):
        tweet_url = '{0.scheme}://{0.netloc}{0.path}'.format(url).lower()
        tweets.add(tweet_url)
        archive_urls[tweet_url] = result.view_url
    
# Print each unique tweet URL with a Wayback URL where it can be viewed.
for url in tweets:
    print('{0} archived at {1}'.format(url, archive_urls[url]))


https://twitter.com/crystal1johnson/status/798959788033851392 archived at http://web.archive.org/web/20161201022642/https://twitter.com/crystal1johnson/status/798959788033851392
https://twitter.com/crystal1johnson/status/759466110957654016 archived at http://web.archive.org/web/20160730195618/https://twitter.com/crystal1johnson/status/759466110957654016
https://twitter.com/crystal1johnson/status/786740005049806851 archived at http://web.archive.org/web/20161201035026/https://twitter.com/crystal1johnson/status/786740005049806851
https://twitter.com/crystal1johnson/status/795746641194217472 archived at http://web.archive.org/web/20161201024916/https://twitter.com/crystal1johnson/status/795746641194217472
https://twitter.com/crystal1johnson/status/786733410035326977 archived at http://web.archive.org/web/20161201035035/https://twitter.com/crystal1johnson/status/786733410035326977
https://twitter.com/crystal1johnson/status/789555607015534592 archived at http://web.archive.org/web/201612010

It might be interesting to see how many times each tweet was archived, since this can be a proxy for user attention on the tweet. So let's do it again using a counter.
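As a quick illustration of the counting idea, here is a toy sketch with made-up capture URLs (the status ids `111` and `222` are placeholders, not real tweets). It shows why lowercasing matters for the tally: captures that differ only in the letter case of the screen name are counted as the same tweet.

```python
from collections import Counter

# Hypothetical capture URLs; the status ids are invented for illustration.
captured = [
    'https://twitter.com/Crystal1Johnson/status/111',
    'https://twitter.com/crystal1johnson/status/111',
    'https://twitter.com/Crystal1Johnson/status/222',
    'https://twitter.com/CRYSTAL1JOHNSON/status/111',
]

# Lowercase each URL so case variants share a single counter key.
counts = Counter(url.lower() for url in captured)

# most_common() yields (url, count) pairs, highest count first.
for url, count in counts.most_common():
    print(url, count)
```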

In [45]:
from collections import Counter

tweet_counter = Counter()
archived_urls = {}

for result in wb.search('https://twitter.com/Crystal1Johnson', matchType='prefix'):
    url = urlparse(result.url)
    if re.match(r'/crystal1johnson/status/\d+$', url.path, re.IGNORECASE):
        tweet_url = '{0.scheme}://{0.netloc}{0.path}'.format(url).lower()
        tweet_counter.update([tweet_url])
        archived_urls[tweet_url] = result.view_url
    
for url, count in tweet_counter.most_common():
    print(url, count)

https://twitter.com/crystal1johnson/status/780922359075119104 87
https://twitter.com/crystal1johnson/status/868243673330442240 56
https://twitter.com/crystal1johnson/status/799408315901976576 51
https://twitter.com/crystal1johnson/status/802708209672667136 50
https://twitter.com/crystal1johnson/status/849715099753299968 49
https://twitter.com/crystal1johnson/status/871189735246573569 49
https://twitter.com/crystal1johnson/status/834511794093842432 48
https://twitter.com/crystal1johnson/status/839539757474435072 48
https://twitter.com/crystal1johnson/status/884836416458342400 48
https://twitter.com/crystal1johnson/status/793893878587883522 38
https://twitter.com/crystal1johnson/status/808803189873147904 31
https://twitter.com/crystal1johnson/status/805951974696976389 21
https://twitter.com/crystal1johnson/status/793903016554487810 18
https://twitter.com/crystal1johnson/status/807320901276631040 18
https://twitter.com/crystal1johnson/status/853736578866446338 18
https://twitter.com/cryst

https://twitter.com/crystal1johnson/status/881605241497460736 1
https://twitter.com/crystal1johnson/status/882688357054259201 1
https://twitter.com/crystal1johnson/status/883387774560120832 1
https://twitter.com/crystal1johnson/status/883458220538318848 1
https://twitter.com/crystal1johnson/status/883506466463768576 1
https://twitter.com/crystal1johnson/status/884478641714757633 1
https://twitter.com/crystal1johnson/status/884840532475650048 1
https://twitter.com/crystal1johnson/status/885226052645265408 1
https://twitter.com/crystal1johnson/status/885312250009071616 1
https://twitter.com/crystal1johnson/status/886664883835678720 1
https://twitter.com/crystal1johnson/status/887029879434293248 1
https://twitter.com/crystal1johnson/status/888128024880726018 1
https://twitter.com/crystal1johnson/status/888822003133579264 1
https://twitter.com/crystal1johnson/status/888862920498597888 1
https://twitter.com/crystal1johnson/status/889226363147882497 1
https://twitter.com/crystal1johnson/stat

Now I'm going to write these out to a CSV to give to Kevin :-)

In [46]:
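import csv

# Each row must follow the header's column order: tweet url, count, archive url.
with open('data/crystal1johnson.csv', 'w', newline='') as fh:
    out = csv.writer(fh)
    out.writerow(['tweet url', 'count', 'archive url'])

    for url, count in tweet_counter.most_common():
        out.writerow([url, count, archived_urls[url]])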