# Twitter Data Sharing

In this lesson, we're going to learn how to share Twitter data and access Twitter data that has been shared by others with the Python/command line tool [twarc](https://github.com/DocNow/twarc). This tool was developed by a project called [Documenting the Now](https://www.docnow.io/). The DocNow team develops tools and ethical frameworks for social media research.

This lesson presumes that you've already installed and configured twarc, which was covered in [a previous lesson](https://melaniewalsh.github.io/Intro-Cultural-Analytics/Collecting-Cultural-Data/Twitter-Data.html#Install-and-Configure-Twarc).

## Tweet IDs

Twitter discourages developers and researchers from sharing full Twitter data openly on the web. They instead encourage developers and researchers to share *tweet IDs*:

> [If you provide Twitter Content to third parties, including downloadable datasets or via an API, you may only distribute **Tweet IDs**, Direct Message IDs, and/or User IDs.](https://developer.twitter.com/en/developer-terms/policy#4-e)

Tweet IDs are unique identifiers assigned to every tweet. They look like a random string of numbers: 1189206626135355397. Each tweet ID can be used to download the full data associated with that tweet (if the tweet still exists). This is a process called "hydration."

<img src="https://cdn.pixabay.com/photo/2013/07/12/19/24/sapling-154734_960_720.png" width=100% >

**Hydration: a young tweet ID sprouts into a full tweet (to be read in David Attenborough's voice)**

There are actually two reasons that you might want to dehydrate tweets and/or hydrate tweet IDs: first, to responsibly share Twitter data with others and/or access Twitter data shared by others; second, to get more information about the Twitter data that you yourself collected.

If you collected tweets in real time, for example, you captured them immediately after they were published, which means that they will not yet reflect any meaningful retweet or favorite activity. Nobody's had time to retweet them yet! So if you'd like to retroactively get up-to-date retweet and favorite counts for your tweets, you can dehydrate them and then rehydrate them.

## Dehydrate Tweets

`twarc2 dehydrate tweets.jsonl > tweet_ids.txt`

To transform your Twitter data into a list of tweet IDs (so that you can share your data openly on the web), you can run the twarc command `twarc2 dehydrate` with the name of your JSONL file, followed by the output operator `>` and the desired name of your tweet ID text file.

> tweet ID —> tweet = hydration <br>
> tweet ID <— tweet = dehydration
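To make the idea of dehydration more concrete, here is a minimal Python sketch of what "keeping only the IDs" means. It is not twarc's actual implementation. The function name `extract_tweet_ids` and the sample lines are hypothetical, and the sketch assumes that each JSONL line is either a single tweet object with an `"id"` field or a page of results whose `"data"` field holds one or more tweets.

```python
import json

def extract_tweet_ids(jsonl_lines):
    """Yield the ID of every tweet found in an iterable of JSONL lines."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: tweets live under "data" (a list or a single object);
        # if there is no "data" key, treat the record itself as the tweet.
        data = record.get("data", record)
        tweets = data if isinstance(data, list) else [data]
        for tweet in tweets:
            if "id" in tweet:
                yield str(tweet["id"])

# Hypothetical sample lines standing in for a real JSONL file
sample_lines = [
    '{"data": [{"id": "1298279565932900354", "text": "example"}, '
    '{"id": "1298267213544058885", "text": "example"}]}',
    '{"id": "1298263845320798212", "text": "example"}',
]

extracted_ids = list(extract_tweet_ids(sample_lines))
print("\n".join(extracted_ids))
```

With a real file, you could pass an open file object instead of `sample_lines`. In practice, `twarc2 dehydrate` does this work for you.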

Let's dehydrate the Twitter data that we collected about "Infinite Jest" from only verified Twitter accounts.

In [1]:
!twarc2 dehydrate twitter-data/infinite_jest_verified_tweets.jsonl > twitter-data/infinite_jest_verified_tweets.txt

If we `open()` and `.read()` the tweet IDs file that we just created, it looks something like this:

In [2]:
tweet_ids = open("twitter-data/infinite_jest_verified_tweets.txt", encoding="utf-8").read()

In [3]:
print(tweet_ids)

1298279565932900354
1298267213544058885
1298263845320798212
1298258323364405251
1298256212849393664
1298255758027440129
1298253692995399680
1298252102918115328
1298248048506744835
1298247922035970048
1298245254156636161
1298240308308840448
1298239661987573763
1298230417137586177
1298226636115017728
1298221655765004291
1298214719178838016
1298212546646769672
1298206253487984640
1298204812375396353
1298191515408310277
1298191332482007040
1298176670323671042
1298176239836987393
1298140325110788096
1298138751345074176
1298132602377965568
1298121065714126848
1298114474562527238
1298108385850806272
1298107956974813186
1298107618615951361
1298107444548308993
1298098822716035072
1298074920291885056
1298074601226776577
1298073476914438145
1298073309528170496
1298072952236376065
1298067694621609984
1298064623808081920
1298061207216230402
1298056539991871488
1298056358256873474
1298051410668511233
1298041637545758720
1298040868356665345
1298040136450613253
1298038419386400769
1298036077970763776
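
Because the file holds one tweet ID per line, it is often handier to work with the IDs as a Python list than as one long string. The following is a small sketch, with a hypothetical helper name `parse_tweet_ids` and a short sample string standing in for the contents of the file read above.

```python
def parse_tweet_ids(text):
    """Split the contents of a tweet ID file into a list of ID strings."""
    return [line.strip() for line in text.splitlines() if line.strip()]

# Sample text standing in for the contents of the tweet IDs file
sample_text = "1298279565932900354\n1298267213544058885\n1298263845320798212\n"

id_list = parse_tweet_ids(sample_text)
print(len(id_list), "tweet IDs")
print(id_list[0])
```

The IDs are kept as strings here on purpose. Tweet IDs are larger than the biggest integer that spreadsheet programs and JavaScript can represent exactly, so treating them as text avoids silent rounding if the list is later opened in other tools.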


## Hydrate Tweets

`twarc2 hydrate tweet_ids.txt > tweets.jsonl`

To transform a list of tweet IDs into full Twitter data, you can run the twarc command `twarc2 hydrate` with the name of your tweet IDs text file, followed by the output operator `>` and the desired name of your JSONL file.

> tweet ID —> tweet = hydration <br>
> tweet ID <— tweet = dehydration

Now let's rehydrate the Twitter data that we collected a few weeks ago, using the tweet IDs that we just dehydrated.

In [4]:
!twarc2 hydrate twitter-data/infinite_jest_verified_tweets.txt > twitter-data/infinite_jest_verified_tweets_REHYDRATED.jsonl

In [5]:
tweet_json = open("twitter-data/infinite_jest_verified_tweets_REHYDRATED.jsonl", encoding="utf-8").read()

In [7]:
print(tweet_json)
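
Printing the raw JSONL is hard to read, so you may prefer to parse each line with Python's `json` module and pull out the updated engagement counts. The sketch below makes several assumptions: the helper name `iter_tweets` is hypothetical, the counts in the sample line are invented for illustration, and it assumes that each hydrated tweet stores its counts under a `public_metrics` field, as in version 2 of the Twitter API.

```python
import json

def iter_tweets(jsonl_lines):
    """Yield each tweet object found in an iterable of JSONL lines."""
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: tweets live under "data" (a list or a single object);
        # otherwise the record itself is treated as the tweet.
        data = record.get("data", record)
        yield from (data if isinstance(data, list) else [data])

# Invented sample line standing in for one line of the rehydrated file
sample_lines = [
    '{"data": [{"id": "1298279565932900354", '
    '"public_metrics": {"retweet_count": 3, "like_count": 10}}]}',
]

hydrated_tweets = list(iter_tweets(sample_lines))
for tweet in hydrated_tweets:
    metrics = tweet.get("public_metrics", {})
    print(tweet["id"], metrics.get("retweet_count"), metrics.get("like_count"))
```

To run this on the real data, you could replace `sample_lines` with an open file object for the rehydrated JSONL file.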




## Deleted Tweets & The Right To Be Forgotten

What happens if someone decides to delete their tweet between the time when the tweet is first collected and the time when the tweet is "hydrated"? The deleted tweet will **not** be hydrated. The deleted tweet will no longer be accessible.
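
One way to see how many tweets have disappeared is to compare the IDs you tried to hydrate with the IDs that actually came back. Here is a minimal sketch with a hypothetical helper named `find_missing_ids`. The two short lists stand in for the IDs in your tweet ID file and the IDs found in your rehydrated JSONL file.

```python
def find_missing_ids(original_ids, hydrated_ids):
    """Return the IDs from original_ids that are absent from hydrated_ids."""
    hydrated = set(hydrated_ids)  # a set makes each membership check fast
    return [tweet_id for tweet_id in original_ids if tweet_id not in hydrated]

# Sample lists standing in for the shared IDs and the successfully hydrated IDs
original_ids = ["1298279565932900354", "1298267213544058885", "1298263845320798212"]
hydrated_ids = ["1298279565932900354", "1298263845320798212"]

missing_ids = find_missing_ids(original_ids, hydrated_ids)
print(len(missing_ids), "tweet(s) could not be hydrated:", missing_ids)
```

A missing ID does not tell you why the tweet is gone. It only tells you that the tweet was not available at the time of hydration.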

## Where to Find Tweet IDs

- DocNow Catalog: https://catalog.docnow.io/

- George Washington University Tweet IDs: https://dataverse.harvard.edu/dataverse/gwu-libraries