# Maryland State Parks Twitter Purge

According to [this article](https://www.baltimoresun.com/news/maryland/investigations/bs-md-state-park-social-media-accounts-merging-20190109-story.html) in the Baltimore Sun, the Maryland Park Service has decided to consolidate all the individual state park social media accounts, including those on Twitter. Closing these accounts would effectively remove the historical record of feeds that people have followed. Let's use [twarc](https://github.com/docnow/twarc) to find these accounts and see how many followers and tweets they have.

```python
import twarc
```

The announcement that each park had to tweet looked like this:

> Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks. Please be sure to follow that account today to keep up-to-date with events and news! This account will be closed on January 31.

We can search for some of that text to identify the park accounts:

```python
# with no arguments, Twarc reads the API keys saved by `twarc configure`
t = twarc.Twarc()
tweets = t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks')
```

Now let's go through each one and print the account's screen name, its number of followers, and its number of tweets:

```python
for tweet in tweets:
    print(tweet['user']['screen_name'], tweet['user']['followers_count'], tweet['user']['statuses_count'])
```

```
JanesIslandSP 1288 459
DeepCreekLakeSP 2639 593
PointLookoutSP 1574 468
TuckahoeSP 1639 314
SenecaCreekSP 1421 632
robinsnewswire 25743 1410129
HerringtonMnrSP 1695 543
PocomokeRiverSP 1671 912
RocksStatePark 1321 179
SusquehannaSP 1540 232
TubmanSP 1674 2624
ReneeHawk1956 879 11820
GreenbrierSP 1974 998
CunninghamFalls 2028 467
NewGermanySP 2756 1741
SmallwoodSP 1051 319
GunpowderSP 2299 1550
FortFrederickSP 1232 665
RockyGapSP 2748 3838
fairhillsp 984 293
PatapscoSP 2962 3096
AssateagueSP 4030 1510
```


It looks like some users have retweeted that message, like [@robinsnewswire](https://twitter.com/robinsnewswire) and [@ReneeHawk1956](https://twitter.com/ReneeHawk1956), so let's ignore the retweets.

```python
for tweet in t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks'):
    if 'retweeted_status' in tweet:
        continue
    print(tweet['user']['screen_name'], tweet['user']['followers_count'], tweet['user']['statuses_count'])
```

```
JanesIslandSP 1288 459
DeepCreekLakeSP 2639 593
PointLookoutSP 1574 468
TuckahoeSP 1639 314
SenecaCreekSP 1421 632
HerringtonMnrSP 1695 543
PocomokeRiverSP 1671 912
RocksStatePark 1321 179
SusquehannaSP 1540 232
TubmanSP 1674 2624
GreenbrierSP 1974 998
CunninghamFalls 2028 467
NewGermanySP 2756 1741
SmallwoodSP 1051 319
GunpowderSP 2299 1550
FortFrederickSP 1232 665
RockyGapSP 2748 3838
fairhillsp 984 293
PatapscoSP 2962 3096
AssateagueSP 4030 1510
```


Since `t.search()` returns a generator that is used up after one pass, let's run the search again, this time collecting the users into a list that we can reuse without going back to the API.

```python
from pprint import pprint

users = []
for tweet in t.search('Happy New Year! As part of our resolution to streamline communications from Maryland State Parks, we are merging this account with @MDStateParks'):
    if 'retweeted_status' in tweet:
        continue
    users.append(tweet['user'])

pprint([u['screen_name'] for u in users])
```

```
['JanesIslandSP',
 'DeepCreekLakeSP',
 'PointLookoutSP',
 'TuckahoeSP',
 'SenecaCreekSP',
 'HerringtonMnrSP',
 'PocomokeRiverSP',
 'RocksStatePark',
 'SusquehannaSP',
 'TubmanSP',
 'GreenbrierSP',
 'CunninghamFalls',
 'NewGermanySP',
 'SmallwoodSP',
 'GunpowderSP',
 'FortFrederickSP',
 'RockyGapSP',
 'fairhillsp',
 'PatapscoSP',
 'AssateagueSP']
```
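
Each park shows up exactly once here, but nothing guarantees that: if an account had posted the announcement twice, it would appear twice in `users` and be double-counted in the totals below. A minimal sketch of a safer collection step, assuming the tweets are Twitter v1.1 JSON dicts like the ones twarc yields; `unique_users` and the sample data are hypothetical:

```python
def unique_users(tweets):
    """Collect the author of each non-retweet, keeping one entry per account.

    Assumes Twitter v1.1 tweet dicts, as yielded by twarc's search().
    """
    seen = set()
    users = []
    for tweet in tweets:
        # retweets carry the original tweet under 'retweeted_status'
        if 'retweeted_status' in tweet:
            continue
        user = tweet['user']
        # skip accounts we've already recorded (e.g. a repeated announcement)
        if user['id_str'] in seen:
            continue
        seen.add(user['id_str'])
        users.append(user)
    return users


# hypothetical sample data shaped like the search results above
sample = [
    {'user': {'id_str': '1', 'screen_name': 'JanesIslandSP'}},
    {'user': {'id_str': '2', 'screen_name': 'robinsnewswire'},
     'retweeted_status': {}},
    {'user': {'id_str': '1', 'screen_name': 'JanesIslandSP'}},
    {'user': {'id_str': '3', 'screen_name': 'TuckahoeSP'}},
]
print([u['screen_name'] for u in unique_users(sample)])
# ['JanesIslandSP', 'TuckahoeSP']
```

Keying on `id_str` rather than `screen_name` means the check still works if an account is renamed.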


Now we can print out the total number of tweets generated by these accounts:

```python
print(sum([u['statuses_count'] for u in users]))
```

```
21433
```


Or the total number of followers across the accounts (someone who follows several parks is counted once for each of them):

```python
print(sum([u['followers_count'] for u in users]))
```

```
38526
```
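
Because summing `followers_count` counts each (account, follower) pair, 38,526 is the number of follows rather than the number of distinct people. Counting people would mean fetching every account's follower IDs and taking their union. The sketch below only illustrates the arithmetic with made-up follower ID sets; it does not call the API:

```python
# Hypothetical follower ID sets for three park accounts (made-up values;
# real IDs would have to be fetched from the Twitter API).
followers = {
    'JanesIslandSP': {'101', '102', '103'},
    'TuckahoeSP': {'102', '104'},
    'PatapscoSP': {'101', '104', '105'},
}

# What summing followers_count measures: one count per (account, follower) pair
total_follows = sum(len(ids) for ids in followers.values())

# Distinct people: the union of all the follower sets
distinct_followers = set().union(*followers.values())

print(total_follows)            # 8
print(len(distinct_followers))  # 5
```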


The [Twitter API](https://developer.twitter.com/en/docs/tweets/timelines/api-reference/get-statuses-user_timeline.html) will only let you get the most recent 3,200 tweets for a given user. All of the accounts except [@RockyGapSP](https://twitter.com/RockyGapSP) are below this limit.
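
Using the `statuses_count` values printed earlier, a quick check shows which accounts exceed the limit and by how much. This is a minimal sketch: the counts are copied from the output above (in the notebook you could read them from `users` instead), and `TIMELINE_LIMIT` is just a name chosen here for the 3,200 figure:

```python
# The user_timeline endpoint only reaches back about 3,200 tweets per account
TIMELINE_LIMIT = 3200

# statuses_count for each park account, copied from the output above
statuses = {
    'JanesIslandSP': 459, 'DeepCreekLakeSP': 593, 'PointLookoutSP': 468,
    'TuckahoeSP': 314, 'SenecaCreekSP': 632, 'HerringtonMnrSP': 543,
    'PocomokeRiverSP': 912, 'RocksStatePark': 179, 'SusquehannaSP': 232,
    'TubmanSP': 2624, 'GreenbrierSP': 998, 'CunninghamFalls': 467,
    'NewGermanySP': 1741, 'SmallwoodSP': 319, 'GunpowderSP': 1550,
    'FortFrederickSP': 665, 'RockyGapSP': 3838, 'fairhillsp': 293,
    'PatapscoSP': 3096, 'AssateagueSP': 1510,
}

# accounts over the limit, mapped to how many older tweets are out of reach
over_limit = {
    name: count - TIMELINE_LIMIT
    for name, count in statuses.items()
    if count > TIMELINE_LIMIT
}
print(over_limit)  # {'RockyGapSP': 638}
```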

Let's use twarc to get what we can from the Twitter API. We will also use [tqdm](https://github.com/tqdm/tqdm) to create a little progress bar.

```python
from tqdm import tqdm

tweets = []
for user in users:
    progress = tqdm(
        desc='{:20}'.format(user['screen_name']),
        total=user['statuses_count'],
        unit='tweet'
    )
    for tweet in t.timeline(screen_name=user['screen_name']):
        tweets.append(tweet)
        progress.update(1)
    progress.close()
```

```
JanesIslandSP       : 100%|██████████| 459/459 [00:00<00:00, 487.35tweet/s]
DeepCreekLakeSP     : 100%|██████████| 593/593 [00:01<00:00, 470.11tweet/s]
PointLookoutSP      : 100%|██████████| 468/468 [00:01<00:00, 396.07tweet/s]
TuckahoeSP          : 100%|██████████| 314/314 [00:00<00:00, 546.23tweet/s]
SenecaCreekSP       : 100%|█████████▉| 631/632 [00:01<00:00, 459.43tweet/s]
HerringtonMnrSP     : 100%|█████████▉| 541/543 [00:00<00:00, 561.74tweet/s]
PocomokeRiverSP     : 100%|█████████▉| 911/912 [00:01<00:00, 473.73tweet/s]
RocksStatePark      : 100%|██████████| 179/179 [00:00<00:00, 534.08tweet/s]
SusquehannaSP       : 100%|██████████| 232/232 [00:00<00:00, 376.54tweet/s]
TubmanSP            :  99%|█████████▉| 2608/2624 [00:06<00:00, 399.71tweet/s]
GreenbrierSP        : 100%|█████████▉| 997/998 [00:01<00:00, 558.02tweet/s]
CunninghamFalls     : 100%|██████████| 467/467 [00:01<00:00, 397.08tweet/s]
NewGermanySP        : 100%|██████████| 1741/1741 [00:02<00:00, 594.58tweet/s]
Smallwoo
```

Notice how some of the progress bars didn't quite complete (e.g. TubmanSP, which stopped at 2,608 of 2,624)? There appears to be a discrepancy between the number of tweets an account has sent (as reported by Twitter) and the number of tweets that can be retrieved. Perhaps `statuses_count` includes deleted tweets that are no longer retrievable from Twitter?
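
Rather than eyeballing the progress bars, we could compute the gap for every account by comparing `statuses_count` with the number of tweets actually collected per screen name. A minimal sketch; `shortfall` is a hypothetical helper and the sample data is made up (on the real data you would call `shortfall(users, tweets)`). A gap for @RockyGapSP is expected regardless, because of the 3,200-tweet limit:

```python
from collections import Counter


def shortfall(users, tweets):
    """Return {screen_name: statuses_count - tweets retrieved} for every
    account where Twitter reported more tweets than were collected."""
    retrieved = Counter(t['user']['screen_name'] for t in tweets)
    gaps = {}
    for user in users:
        missing = user['statuses_count'] - retrieved[user['screen_name']]
        if missing > 0:
            gaps[user['screen_name']] = missing
    return gaps


# hypothetical, scaled-down data shaped like `users` and `tweets` above
sample_users = [
    {'screen_name': 'TuckahoeSP', 'statuses_count': 3},
    {'screen_name': 'TubmanSP', 'statuses_count': 5},
]
sample_tweets = (
    [{'user': {'screen_name': 'TuckahoeSP'}}] * 3
    + [{'user': {'screen_name': 'TubmanSP'}}] * 4
)
print(shortfall(sample_users, sample_tweets))  # {'TubmanSP': 1}
```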

At any rate, let's use twarc to write the tweets we were able to get to a CSV file:

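```python
import csv
import os
from twarc.json2csv import get_headings, get_row

# make sure the output directory exists
os.makedirs('data', exist_ok=True)

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('data/md-state-parks.csv', 'w', newline='', encoding='utf-8') as fh:
    writer = csv.writer(fh)
    writer.writerow(get_headings())
    for tweet in tweets:
        writer.writerow(get_row(tweet))
```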

It might be interesting to see what the most retweeted tweet was. Let's take a look using [pandas](https://pandas.pydata.org/).

```python
import pandas

# raise the column display limit (default 50 characters) so the tweet URLs aren't truncated
pandas.set_option('display.max_colwidth', 80)

# read in the csv
df = pandas.read_csv('data/md-state-parks.csv', parse_dates=['created_at'])

# sort the dataframe by retweet count
df = df.sort_values('retweet_count', ascending=False)

# show the top 5
df[0:5][['retweet_count', 'tweet_url', 'tweet_type']]
```

| | retweet_count | tweet_url | tweet_type |
|---|---|---|---|
| 3717 | 170046 | https://twitter.com/PocomokeRiverSP/status/690953235771731969 | retweet |
| 265 | 31788 | https://twitter.com/JanesIslandSP/status/621087503219621888 | retweet |
| 4615 | 16150 | https://twitter.com/TubmanSP/status/1059096922986868737 | retweet |
| 15139 | 8571 | https://twitter.com/RockyGapSP/status/569526645821968385 | retweet |
| 14102 | 6808 | https://twitter.com/RockyGapSP/status/665322332685123584 | retweet |


You can see that the most retweeted tweet was not originally sent by one of the Maryland parks but is actually a retweet of the National Zoo. (A retweet carries the `retweet_count` of the original tweet, which is why these numbers are so large.) We can use the *tweet_type* column to keep only the park accounts' original tweets:

```python
original = df.query('tweet_type == "original"')
original[0:5][['retweet_count', 'tweet_url']]
```

| | retweet_count | tweet_url |
|---|---|---|
| 20031 | 62 | https://twitter.com/AssateagueSP/status/511511328012632065 |
| 19482 | 57 | https://twitter.com/AssateagueSP/status/938179687330861056 |
| 12898 | 54 | https://twitter.com/RockyGapSP/status/968157714865180672 |
| 6464 | 50 | https://twitter.com/TubmanSP/status/865333841401180162 |
| 19296 | 49 | https://twitter.com/AssateagueSP/status/1069978136589283329 |
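
The retweets filtered out here still say something about the accounts: in the raw Twitter v1.1 JSON, each retweet embeds the original tweet under `retweeted_status`, including its author, so we could tally whose tweets the parks passed along most often. A minimal sketch; `retweeted_authors` and the sample data are hypothetical (in the notebook you would pass the `tweets` list):

```python
from collections import Counter


def retweeted_authors(tweets):
    """Count the original authors of the retweets in a list of v1.1 tweet dicts."""
    return Counter(
        tweet['retweeted_status']['user']['screen_name']
        for tweet in tweets
        if 'retweeted_status' in tweet
    )


# hypothetical sample: three retweets and one original park tweet
sample = [
    {'retweeted_status': {'user': {'screen_name': 'MDStateParks'}}},
    {'retweeted_status': {'user': {'screen_name': 'MDStateParks'}}},
    {'retweeted_status': {'user': {'screen_name': 'MDDNRWildlife'}}},
    {'full_text': 'an original park tweet'},
]
print(retweeted_authors(sample).most_common(2))
# [('MDStateParks', 2), ('MDDNRWildlife', 1)]
```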


One of the nice things about the individual accounts is that they can provide local context and respond more personally to people. Let's see if this is reflected in the data by printing out the replies:

```python
reply_count = 0
for tweet in tweets:
    if tweet['in_reply_to_screen_name']:
        reply_count += 1
        print('@{} ➜ @{}'.format(tweet['user']['screen_name'], tweet['in_reply_to_screen_name']))
        print(tweet['full_text'])
        print('https://twitter.com/{}/status/{}'.format(tweet['user']['screen_name'], tweet['id_str']))
        print('')
```

```
@JanesIslandSP ➜ @MDStateParks
@MDStateParks Way to go Ranger Sarah!!!
https://twitter.com/JanesIslandSP/status/889271970201784321

@JanesIslandSP ➜ @murphy380
@murphy380 @JanesIslandSP   Thanks Larry!! Enjoy the RV Life!!
https://twitter.com/JanesIslandSP/status/767404744813846528

@JanesIslandSP ➜ @PatapscoSP
@PatapscoSP - Congratulations Ranger Jamie!!
https://twitter.com/JanesIslandSP/status/681134103337545728

@JanesIslandSP ➜ @JanesIslandSP
The 2016 annual Maryland State Parks passport is now available 4 purchase @JanesIslandSP or  https://t.co/Xi1QVrg8w3 https://t.co/Olz2R2kUiJ
https://twitter.com/JanesIslandSP/status/649667630652108800

@JanesIslandSP ➜ @wboc
@wboc
https://twitter.com/JanesIslandSP/status/388518830689243137

@JanesIslandSP ➜ @MWeimer42
@MWeimer42 can reserve them at Park Store: 410-968-1565. $7 per platform/2-4 people. 3 Sites - 2 to 15+ Max.
https://twitter.com/JanesIslandSP/status/236886494780989441

@JanesIslandSP ➜ @MDStateParks
@MDStateParks @JanesIslandSP
...
https://twitter.com/TubmanSP/status/841051429980766209

@TubmanSP ➜ @AssateagueSP
Thank you @AssateagueSP so glad we're among friends!
https://twitter.com/TubmanSP/status/841041470475890688

@TubmanSP ➜ @cwroadtrip
Wonderful @cwroadtrip so glad you could make it! @TubmanUGRRNPS
https://twitter.com/TubmanSP/status/840739833563774977

@TubmanSP ➜ @robbiethompson
@robbiethompson @TubmanUGRRNPS We're so glad to see so many people here this weekend.
https://twitter.com/TubmanSP/status/840673750353862657

@TubmanSP ➜ @KandyLanae
We're proud to honor Harriet, our hero. @KandyLanae
https://twitter.com/TubmanSP/status/840667871655129089

@TubmanSP ➜ @Omeakiasmiles
@Omeakiasmiles Happy to welcome visitors today!
https://twitter.com/TubmanSP/status/840630247263461376

@TubmanSP ➜ @JGSbmore
Good morning @MrGrantSkinner, we're looking forward to it!
https://twitter.com/TubmanSP/status/840538485664686080

@TubmanSP ➜ @MdPublicSchools
We are excited to open the doors to the Harriet #TubmanVC tomorrow
...
https://twitter.com/RockyGapSP/status/728773807691993088

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff @MDDNRWildlife :( You could do a GM pulling party for your neighborhood! They are easy to pull, just tedious! #party 🎉🎉🎉
https://twitter.com/RockyGapSP/status/728743713686601730

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff Love this view!
https://twitter.com/RockyGapSP/status/728333809385639937

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff @MDDNRWildlife Good observation but power lines often do spray to control veg. Let's see what Wildlife &amp;Heritage thinks...
https://twitter.com/RockyGapSP/status/728044417760829444

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff @MDDNRWildlife any chance someone has sprayed these plants
https://twitter.com/RockyGapSP/status/728043170769080320

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff @MDDNRWildlife Certain weevils impact garlic mustard...thoughts, WHS?
https://twitter.com/RockyGapSP/status/728042996898406400

@RockyGapSP ➜ @WestMDstuff
@WestMDstuff @MDDNRWildlife Keep
...
https://twitter.com/PatapscoSP/status/607722716469280770

@PatapscoSP ➜ @GartrellRealtor
@GartrellRealtor yes, we close when there are no longer parking spots.
https://twitter.com/PatapscoSP/status/597471343270068224

@PatapscoSP ➜ @PatapscoSP
@PatapscoSP and I wanna buy All of it!
https://twitter.com/PatapscoSP/status/596507453409333248

@PatapscoSP ➜ @moonmaners
@moonmaners unfortunately not
https://twitter.com/PatapscoSP/status/590164710655406080

@PatapscoSP ➜ @jac63057
@jac63057 just the park is year round
https://twitter.com/PatapscoSP/status/588371020077228032

@PatapscoSP ➜ @jac63057
@jac63057 what do you want to know? Check this site and let us know if you have any questions http://t.co/8RrESkJocc
https://twitter.com/PatapscoSP/status/587709568714694658

@PatapscoSP ➜ @Rharvley
@Rharvley on a trail! :) I can narrow it down to about 170 miles
https://twitter.com/PatapscoSP/status/587325071850614785

@PatapscoSP ➜ @DonHuber1
@DonHuber1 oh Don lol, there hasn't been camping there
...
```

```python
print('There were {} replies!'.format(reply_count))
```

```
There were 1685 replies!
```
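
Not all of those 1,685 replies are conversations with visitors: the output above includes self-replies such as @JanesIslandSP ➜ @JanesIslandSP, which are usually thread continuations. A minimal sketch that splits the count per account into replies to others and replies to self; `reply_breakdown` and the sample data are hypothetical (in the notebook you would pass `tweets`). Screen names are compared case-insensitively, since Twitter treats them that way:

```python
from collections import Counter


def reply_breakdown(tweets):
    """Tally replies per account, separating replies to others from self-replies."""
    to_others = Counter()
    to_self = Counter()
    for tweet in tweets:
        target = tweet.get('in_reply_to_screen_name')
        if not target:
            continue  # not a reply
        author = tweet['user']['screen_name']
        if target.lower() == author.lower():
            to_self[author] += 1
        else:
            to_others[author] += 1
    return to_others, to_self


# hypothetical sample shaped like the v1.1 tweets collected above
sample = [
    {'user': {'screen_name': 'PatapscoSP'}, 'in_reply_to_screen_name': 'jac63057'},
    {'user': {'screen_name': 'PatapscoSP'}, 'in_reply_to_screen_name': 'PatapscoSP'},
    {'user': {'screen_name': 'TubmanSP'}, 'in_reply_to_screen_name': 'AssateagueSP'},
    {'user': {'screen_name': 'TubmanSP'}, 'in_reply_to_screen_name': None},
]
others, own = reply_breakdown(sample)
print(dict(others))  # {'PatapscoSP': 1, 'TubmanSP': 1}
print(dict(own))     # {'PatapscoSP': 1}
```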
