# Zola

I saw [this post](https://hyperallergic.com/541775/zola-sundance-janicza-bravo-jeremy-o-harris/?utm_medium=social&utm_source=twitter&utm_campaign=sf) about a Twitter thread that has been turned into a movie. I thought it would be interesting to find the original thread on Twitter, but [the account](https://twitter.com/_zolarmoon) has been suspended! I sure hope the creator of the thread is being paid in some way.

Anyway, I wanted to look in the Internet Archive's Wayback Machine to see which tweets from the account it might have archived. If you know a URL it's easy to look it up, but I was also interested in finding the URLs of specific tweets. That's where their [CDX API](https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server) comes in: it lets you query the archive's index of captures directly.

## Get URLs

I created a function that returns the timestamp and URL of every capture under the prefix https://twitter.com/_zolarmoon. The CDX API lets you match on a URL prefix and filter by mimetype, which is handy.
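To see what the function below actually sends, the same query string can be built with the standard library alone. This sketch only constructs the URL and fetches nothing; the `page` parameter is left out for brevity. Opening the printed URL in a browser should return the same JSON rows the function parses.

```python
from urllib.parse import urlencode

# Same parameters as the get_snapshots() request, minus "page".
params = {
    "url": "twitter.com/_zolarmoon",
    "matchType": "prefix",            # match every URL that starts with this
    "filter": "mimetype:text/html",   # keep only HTML captures
    "output": "json",
    "limit": 1000,
}
query_url = "http://web.archive.org/cdx/search/cdx?" + urlencode(params)
print(query_url)
# http://web.archive.org/cdx/search/cdx?url=twitter.com%2F_zolarmoon&matchType=prefix&filter=mimetype%3Atext%2Fhtml&output=json&limit=1000
```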

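```python
import requests

def get_snapshots():
    """Yield (timestamp, original URL) for every HTML capture under the prefix."""
    page_num = 0
    while True:
        params = {
            "url": "twitter.com/_zolarmoon",
            "matchType": "prefix",
            "filter": "mimetype:text/html",
            "output": "json",
            "limit": 1000,
            "page": page_num,
        }
        resp = requests.get("http://web.archive.org/cdx/search/cdx", params=params)

        # an empty body means we have run past the last page
        if not resp.text.strip():
            return
        snapshots = resp.json()
        # an empty JSON list also means there is nothing left to read
        if not snapshots:
            return

        # remove the header row (the field names)
        snapshots.pop(0)

        # row columns: urlkey, timestamp, original, ...
        for snap in snapshots:
            yield (snap[1], snap[2])

        page_num += 1
```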
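The `snap[1]` and `snap[2]` indexes depend on the column order of the JSON output, where the first row names the fields. A slightly more self-documenting alternative is to zip each row with that header. The sample here is hypothetical: it assumes the server's default field list (`urlkey`, `timestamp`, `original`, `mimetype`, `statuscode`, `digest`, `length`), reuses the timestamp and URL from the first result printed below, and fills the other columns with placeholders.

```python
# Hypothetical CDX-style JSON response: header row first, then one row per capture.
# Only the timestamp and original URL are real; the rest are placeholders.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,twitter)/_zolarmoon", "20160112132933", "https://twitter.com/_zolarmoon",
     "text/html", "200", "PLACEHOLDERDIGEST", "0"],
]

def rows_to_dicts(rows):
    """Map each capture row onto the field names from the header row."""
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, row)) for row in captures]

records = rows_to_dicts(sample)
print(records[0]["timestamp"], records[0]["original"])
# 20160112132933 https://twitter.com/_zolarmoon
```

Looking fields up by name means the code keeps working if a `fl=` parameter ever changes which columns come back.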


We can try out `get_snapshots()` by printing its first result:

```python
print(next(get_snapshots()))
```

```
('20160112132933', 'https://twitter.com/_zolarmoon')
```
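Each (timestamp, URL) pair maps directly onto a Wayback Machine playback address of the form `https://web.archive.org/web/<timestamp>/<url>`, the same pattern as the example link further down. A small sketch with two hypothetical helpers, `playback_url` and `capture_time`, that build that address and parse the 14-digit timestamp:

```python
from datetime import datetime

def playback_url(timestamp, url):
    """Build a Wayback Machine playback URL from a CDX timestamp and original URL."""
    return f"https://web.archive.org/web/{timestamp}/{url}"

def capture_time(timestamp):
    """Parse a 14-digit CDX timestamp (YYYYMMDDhhmmss, UTC) into a naive datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")

ts, url = ("20160112132933", "https://twitter.com/_zolarmoon")
print(playback_url(ts, url))
# https://web.archive.org/web/20160112132933/https://twitter.com/_zolarmoon
print(capture_time(ts))
# 2016-01-12 13:29:33
```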


## Collect

Now we can use our function to count how many captures each URL has in the Internet Archive.

```python
from collections import Counter

# count captures per URL exactly as archived, so variants such as
# "?lang=en" or a trailing slash are counted separately
url_counter = Counter()
for date, url in get_snapshots():
    url_counter[url] += 1
```

```python
for url, count in url_counter.most_common():
    print(url, count)
```

```
https://twitter.com/_zolarmoon 22
https://twitter.com/_zolarmoon/status/915460088151859200 6
https://twitter.com/_zolarmoon?lang=en 5
https://twitter.com/_zolarmoon/ 4
https://twitter.com/_zolarmoon/status/659227068266381313 4
https://twitter.com/_zolarmoon/status/659227281810944000 4
https://twitter.com/_zolarmoon/status/659227506965352448 4
https://twitter.com/_zolarmoon/status/659227813753593857 4
https://twitter.com/_zolarmoon/status/659228012228059136 4
https://twitter.com/_zolarmoon/status/659228190485925888 4
https://twitter.com/_zolarmoon/status/659228459345051648 4
https://twitter.com/_zolarmoon/status/659228769190879232 4
https://twitter.com/_zolarmoon/status/659229109638340608 4
https://twitter.com/_zolarmoon/status/659229333022752768 4
https://twitter.com/_zolarmoon/status/659229535322394624 4
https://twitter.com/_zolarmoon/status/659229730256891904 4
https://twitter.com/_zolarmoon/status/659230271884144640 4
https://twitter.com/_zolarmoon/status/659230684146475012 4
https:
```
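The listing counts the same profile page several times under different spellings (`?lang=en`, a trailing slash). One possible normalization, shown as a hypothetical `normalize` helper, drops the query string and any trailing slash before counting. It is applied here to three rows copied from the output above so the result can be checked by hand; in practice it would go inside the loop over `get_snapshots()`.

```python
from collections import Counter
from urllib.parse import urlparse

def normalize(url):
    """Hypothetical normalization: keep host and path, drop query and trailing slash."""
    uri = urlparse(url)
    return "https://" + uri.netloc + uri.path.rstrip("/")

# Three raw URLs and their capture counts, taken from the listing above.
raw_counts = Counter({
    "https://twitter.com/_zolarmoon": 22,
    "https://twitter.com/_zolarmoon?lang=en": 5,
    "https://twitter.com/_zolarmoon/": 4,
})

normalized = Counter()
for url, count in raw_counts.items():
    normalized[normalize(url)] += count

print(normalized.most_common())
# [('https://twitter.com/_zolarmoon', 31)]
```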

Unfortunately, when I eyeball some of these tweet URLs, they don't seem to render in the Wayback Machine. For example:

https://web.archive.org/web/20160429223327/https://twitter.com/_zolarmoon/status/659227068266381313

Sad.
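Even without working captures, the status URLs carry some information. Tweet IDs issued since late 2010 are Snowflake IDs whose upper bits hold a millisecond timestamp relative to Twitter's epoch (1288834974657 ms), so sorting the IDs numerically should put the thread's tweets back in posting order. A sketch, assuming that ID scheme, using two IDs from the listing above; `status_id` and `tweet_time` are hypothetical helper names.

```python
import re
from datetime import datetime, timezone

# Twitter's Snowflake epoch in milliseconds (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657
STATUS_RE = re.compile(r"/status/(\d+)")

def status_id(url):
    """Return the numeric tweet ID from a status URL, or None for other URLs."""
    match = STATUS_RE.search(url)
    return int(match.group(1)) if match else None

def tweet_time(tweet_id):
    """Decode creation time: bits above the low 22 are ms since the Snowflake epoch."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

urls = [
    "https://twitter.com/_zolarmoon/status/659227281810944000",
    "https://twitter.com/_zolarmoon/status/659227068266381313",
    "https://twitter.com/_zolarmoon",
]
# keep only status URLs, then sort IDs so older tweets come first
ids = sorted(i for i in map(status_id, urls) if i is not None)
for i in ids:
    print(i, tweet_time(i).isoformat())
```

Feeding it the keys of `url_counter` instead of the hard-coded list would order every archived status URL the same way.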



