# Maryland State Parks Twitter Purge

According to [this article](https://www.baltimoresun.com/news/maryland/investigations/bs-md-state-park-social-media-accounts-merging-20190109-story.html) in the Baltimore Sun, the Maryland Park Service has decided to consolidate all the individual state park social media accounts, including those on Twitter. This would effectively remove the historical record of feeds that people have followed. Let's use [twarc](https://github.com/docnow/twarc) to find these accounts and see how many followers and tweets they have.

In [1]:
import twarc

The text of the tweet that each park had to tweet out looked like this:

> Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks. Please be sure to follow that account today to keep up-to-date with events and news! This account will be closed on January 31.

We can use some of that text to identify the Park accounts:

In [2]:
t = twarc.Twarc()
tweets = t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks')

Now let's go through each one and print out the user account, along with the number of followers and tweets it has:

In [3]:
for tweet in tweets:
    print(tweet['user']['screen_name'], tweet['user']['followers_count'], tweet['user']['statuses_count'])

JanesIslandSP 1286 459
DeepCreekLakeSP 2635 591
PointLookoutSP 1573 468
TuckahoeSP 1639 314
SenecaCreekSP 1421 632
robinsnewswire 25742 1409604
HerringtonMnrSP 1694 543
PocomokeRiverSP 1671 911
RocksStatePark 1322 179
SusquehannaSP 1540 232
TubmanSP 1674 2617
ReneeHawk1956 881 11771
GreenbrierSP 1974 998
CunninghamFalls 2028 467
NewGermanySP 2752 1739
SmallwoodSP 1051 319
GunpowderSP 2298 1549
FortFrederickSP 1231 665
RockyGapSP 2746 3835
fairhillsp 984 293
PatapscoSP 2955 3095
AssateagueSP 4028 1510


It looks like some users, such as [@robinsnewswire](https://twitter.com/robinsnewswire), have retweeted that message, so let's ignore the retweets.

In [4]:
for tweet in t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks'):
    if 'retweeted_status' in tweet:
        continue
    print(tweet['user']['screen_name'], tweet['user']['followers_count'], tweet['user']['statuses_count'])

JanesIslandSP 1286 459
DeepCreekLakeSP 2635 591
PointLookoutSP 1573 468
TuckahoeSP 1639 314
SenecaCreekSP 1421 632
HerringtonMnrSP 1694 543
PocomokeRiverSP 1671 911
RocksStatePark 1322 179
SusquehannaSP 1540 232
TubmanSP 1674 2617
GreenbrierSP 1974 998
CunninghamFalls 2028 467
NewGermanySP 2752 1739
SmallwoodSP 1051 319
GunpowderSP 2298 1549
FortFrederickSP 1231 665
RockyGapSP 2746 3835
fairhillsp 984 293
PatapscoSP 2955 3095
AssateagueSP 4028 1510
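
The retweet check can be tried out without calling the API. The following is a minimal sketch that assumes tweets are plain dictionaries shaped like the ones twarc yields; the `sample_tweets` list and the `is_retweet` helper are hypothetical names made up for illustration, and the contents of `retweeted_status` are placeholder data.

```python
# Hypothetical sample data: only the fields this notebook reads are included.
sample_tweets = [
    {"user": {"screen_name": "JanesIslandSP",
              "followers_count": 1286, "statuses_count": 459}},
    {"user": {"screen_name": "robinsnewswire",
              "followers_count": 25742, "statuses_count": 1409604},
     # A retweet carries the original tweet under this key (placeholder value).
     "retweeted_status": {"id_str": "0"}},
    {"user": {"screen_name": "TuckahoeSP",
              "followers_count": 1639, "statuses_count": 314}},
]


def is_retweet(tweet):
    """Return True when the tweet dict has a 'retweeted_status' key."""
    return "retweeted_status" in tweet


# Keep only the original announcements and list who posted them.
originals = [tw for tw in sample_tweets if not is_retweet(tw)]
original_names = [tw["user"]["screen_name"] for tw in originals]
print(original_names)
```

Only the two park accounts survive the filter here, which mirrors what the loop above does with the real search results.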


Let's do the search again, but this time put the users into a list that we can then use without going back to the API.

In [5]:
users = []
for tweet in t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks'):
    if 'retweeted_status' in tweet:
        continue
    users.append(tweet['user'])

from pprint import pprint
pprint([u['screen_name'] for u in users])

['JanesIslandSP',
 'DeepCreekLakeSP',
 'PointLookoutSP',
 'TuckahoeSP',
 'SenecaCreekSP',
 'HerringtonMnrSP',
 'PocomokeRiverSP',
 'RocksStatePark',
 'SusquehannaSP',
 'TubmanSP',
 'GreenbrierSP',
 'CunninghamFalls',
 'NewGermanySP',
 'SmallwoodSP',
 'GunpowderSP',
 'FortFrederickSP',
 'RockyGapSP',
 'fairhillsp',
 'PatapscoSP',
 'AssateagueSP']


Now we can print out the total number of tweets generated by these accounts:

In [6]:
print(sum([u['statuses_count'] for u in users]))

21416


Or the combined follower count across the accounts (someone who follows several parks is counted once per account):

In [7]:
print(sum([u['followers_count'] for u in users]))

38502


The [Twitter API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html) will only allow you to get the last 3,200 tweets for a given user. All of the accounts except [@RockyGapSP](https://twitter.com/RockyGapSP), which has sent 3,835 tweets, are below this limit.
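
The tweet counts printed earlier can be checked against that limit directly. This is a small sketch: the numbers are copied from the output above, while `TIMELINE_LIMIT`, `reported_counts`, `over_limit` and `retrievable_at_most` are names invented here for illustration.

```python
TIMELINE_LIMIT = 3200  # most recent tweets the timeline endpoint returns per user

# statuses_count for each park account, copied from the output above.
reported_counts = {
    "JanesIslandSP": 459, "DeepCreekLakeSP": 591, "PointLookoutSP": 468,
    "TuckahoeSP": 314, "SenecaCreekSP": 632, "HerringtonMnrSP": 543,
    "PocomokeRiverSP": 911, "RocksStatePark": 179, "SusquehannaSP": 232,
    "TubmanSP": 2617, "GreenbrierSP": 998, "CunninghamFalls": 467,
    "NewGermanySP": 1739, "SmallwoodSP": 319, "GunpowderSP": 1549,
    "FortFrederickSP": 665, "RockyGapSP": 3835, "fairhillsp": 293,
    "PatapscoSP": 3095, "AssateagueSP": 4028 - 2518,  # 1510 tweets
}

# Accounts with more tweets than the API will return, and by how many.
over_limit = {
    name: count - TIMELINE_LIMIT
    for name, count in reported_counts.items()
    if count > TIMELINE_LIMIT
}

# Upper bound on what the timeline endpoint could return across all accounts.
retrievable_at_most = sum(min(c, TIMELINE_LIMIT) for c in reported_counts.values())

print(over_limit)
print(retrievable_at_most)
```

Under these numbers only one account is over the limit, so nearly all of the tweets should in principle be within reach.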

Let's use twarc to get what we can from the Twitter API. We will also use [tqdm](https://github.com/tqdm/tqdm) to create a little progress bar.

In [8]:
from tqdm import tqdm

tweets = []
for user in users:
    progress = tqdm(
        desc='{:20}'.format(user['screen_name']),
        total=user['statuses_count'],
        unit='tweet'
    )
    for tweet in t.timeline(screen_name=user['screen_name']):
        tweets.append(tweet)
        progress.update(1)
    progress.close()

JanesIslandSP       : 100%|██████████| 459/459 [00:02<00:00, 206.15tweet/s]
DeepCreekLakeSP     : 100%|██████████| 591/591 [00:03<00:00, 196.93tweet/s]
PointLookoutSP      : 100%|██████████| 468/468 [00:01<00:00, 238.12tweet/s]
TuckahoeSP          : 100%|██████████| 314/314 [00:00<00:00, 384.37tweet/s]
SenecaCreekSP       : 100%|█████████▉| 631/632 [00:02<00:00, 295.66tweet/s]
HerringtonMnrSP     : 100%|█████████▉| 541/543 [00:01<00:00, 279.15tweet/s]
PocomokeRiverSP     : 100%|█████████▉| 910/911 [00:03<00:00, 261.30tweet/s]
RocksStatePark      : 100%|██████████| 179/179 [00:00<00:00, 240.69tweet/s]
SusquehannaSP       : 100%|██████████| 232/232 [00:00<00:00, 234.49tweet/s]
TubmanSP            :  99%|█████████▉| 2600/2617 [00:10<00:00, 239.20tweet/s]
GreenbrierSP        : 100%|█████████▉| 997/998 [00:04<00:00, 212.36tweet/s]
CunninghamFalls     : 100%|██████████| 467/467 [00:02<00:00, 174.02tweet/s]
NewGermanySP        : 100%|██████████| 1739/1739 [00:05<00:00, 308.28tweet/s]
Smallwoo

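Notice how some of the progress bars didn't quite complete (e.g. RockyGapSP)? For RockyGapSP a shortfall is expected, since its 3,835 tweets exceed the 3,200-tweet limit. For the others, such as SenecaCreekSP (631 of 632) and TubmanSP (2600 of 2617), it appears that there is a discrepancy between the number of tweets they have sent (as reported by Twitter) and the number of tweets that can be retrieved. Perhaps `statuses_count` includes deleted tweets that are not retrievable from Twitter?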
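
One way to quantify the gap is to count the retrieved tweets per account and subtract that from `statuses_count`. The sketch below assumes the same dictionary shapes as the `users` and `tweets` lists in this notebook; the `missing_counts` helper and the `ExamplePark*` sample data are hypothetical and only there to show the idea.

```python
from collections import Counter


def missing_counts(users, tweets):
    """Map screen_name -> (reported - retrieved) for accounts where they differ."""
    retrieved = Counter(tw["user"]["screen_name"] for tw in tweets)
    gaps = {}
    for user in users:
        name = user["screen_name"]
        gap = user["statuses_count"] - retrieved[name]
        if gap != 0:
            gaps[name] = gap
    return gaps


# Hypothetical data: ExampleParkA reports 3 tweets but only 2 were retrieved.
users_sample = [
    {"screen_name": "ExampleParkA", "statuses_count": 3},
    {"screen_name": "ExampleParkB", "statuses_count": 2},
]
tweets_sample = [
    {"user": {"screen_name": "ExampleParkA"}},
    {"user": {"screen_name": "ExampleParkA"}},
    {"user": {"screen_name": "ExampleParkB"}},
    {"user": {"screen_name": "ExampleParkB"}},
]

gaps_sample = missing_counts(users_sample, tweets_sample)
print(gaps_sample)
```

Calling `missing_counts(users, tweets)` on the lists built above would give the per-account shortfall without reading it off the progress bars.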

At any rate, let's write the tweets we were able to get as CSV using twarc:

In [9]:
import csv
from twarc.json2csv import get_headings, get_row

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('data/md-state-parks.csv', 'w', newline='') as fh:
    writer = csv.writer(fh)
    writer.writerow(get_headings())
    for tweet in tweets:
        writer.writerow(get_row(tweet))
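
The CSV holds only the columns that `get_headings()` defines, so it may also be worth keeping the full tweet JSON next to it. Here is a minimal sketch using only the standard library, writing one JSON object per line; `write_jsonl`, `read_jsonl`, the sample records and the temporary file path are all hypothetical names chosen for illustration, not part of twarc.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write each record as one line of JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read the records back, skipping any blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical stand-ins for the tweet dictionaries collected above.
sample_records = [
    {"id_str": "1", "user": {"screen_name": "ExampleParkA"}, "full_text": "Trail open"},
    {"id_str": "2", "user": {"screen_name": "ExampleParkB"}, "full_text": "Campfire talk"},
]

# Write to a throwaway directory so the sketch leaves the data/ folder alone.
jsonl_path = os.path.join(tempfile.mkdtemp(), "md-state-parks.jsonl")
write_jsonl(sample_records, jsonl_path)
restored_records = read_jsonl(jsonl_path)
print(len(restored_records))
```

In the notebook itself, `write_jsonl(tweets, 'data/md-state-parks.jsonl')` would do the same for the real data, assuming the `data` directory exists.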