# Collect Tweets for a Twitter Trend

If you haven't done so already, open a terminal, run `twarc2 configure`, and enter your bearer token.

To get tweet counts from the past week, we can use [`twarc2 counts`](https://twarc-project.readthedocs.io/en/latest/twarc2/#counts) followed by a search query.

## Get Tweets (Standard Track)

To actually collect tweets and their associated metadata, we can use the command `twarc2 search` followed by a query. By default, `twarc2 search` uses the standard track of the Twitter API, which only returns tweets from the past week.

In [None]:
!twarc2 search --limit 1000 "University of Washington" > tweets.jsonl
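
As a quick sanity check on the collected file, the sketch below counts tweets in JSONL text. It assumes that each line is one API response page holding its tweets in a `data` list, which is how twarc2 generally writes search results. The sample lines and the `count_tweets` helper are made up for illustration; to check a real file, you would pass the lines of `tweets.jsonl` instead.

```python
import json

# Hypothetical two-line sample standing in for the contents of tweets.jsonl.
# Assumption: each line is one response page with its tweets under "data".
sample_jsonl = "\n".join([
    json.dumps({"data": [{"id": "1", "text": "Go Huskies!"},
                         {"id": "2", "text": "UW campus in spring"}]}),
    json.dumps({"data": [{"id": "3", "text": "University of Washington news"}]}),
])


def count_tweets(lines):
    """Count tweets across JSONL lines, skipping blank lines."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        page = json.loads(line)
        # Pages without a "data" key contribute zero tweets.
        total += len(page.get("data", []))
    return total


tweet_count = count_tweets(sample_jsonl.splitlines())
print(tweet_count)
```

Because one line can hold many tweets, the number of lines in the file will usually be smaller than the number of tweets collected.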

## Convert JSONL to CSV

To make our Twitter data easier to work with, we can convert our JSONL file to a CSV file with the [`twarc-csv`](https://pypi.org/project/twarc-csv/) plugin, which needs to be installed separately.

In [None]:
!pip install twarc-csv

Once it is installed, we can run the plugin through `twarc2 csv`, giving it the input JSONL filename and a desired output filename for the CSV file.

In [None]:
!twarc2 csv tweets.jsonl > tweets.csv

## Read in CSV

Now we're ready to explore the data!

To work with our tweet data, we can read in our CSV file with pandas and parse the `created_at` column as dates.

In [11]:
import pandas as pd
pd.options.display.max_colwidth = 400

In [None]:
tweets_df = pd.read_csv('tweets.csv',
                        parse_dates = ['created_at'])
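
To see what `parse_dates` does, here is a minimal sketch that reads a tiny, made-up CSV from memory instead of `tweets.csv`. The column names and values are illustrative assumptions, not real twarc-csv output.

```python
import io

import pandas as pd

# Hypothetical CSV content with an ISO-formatted timestamp column.
sample_csv = io.StringIO(
    "id,created_at,text\n"
    "1,2021-05-01T12:00:00.000Z,first tweet\n"
    "2,2021-05-02T08:30:00.000Z,second tweet\n"
)

# parse_dates converts the listed columns from strings to datetimes.
sample_df = pd.read_csv(sample_csv, parse_dates=["created_at"])

# A datetime dtype (rather than object) confirms the parsing worked.
print(sample_df["created_at"].dtype)
```

With a datetime column, date-based operations such as sorting chronologically or extracting the year with `.dt.year` work directly.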

In [None]:
tweets_df

If we ask for a list of all the columns in the DataFrame, we can see that there are more than 90 columns here!

In [None]:
tweets_df.columns

## Rename and Select Columns

To make the data more readable, we're going to rename a number of columns.

In [None]:
tweets_df = tweets_df.rename(columns={'created_at': 'date',
                                      'public_metrics.retweet_count': 'retweets',
                                      'author.username': 'username',
                                      'author.name': 'name',
                                      'author.verified': 'verified',
                                      'public_metrics.like_count': 'likes',
                                      'public_metrics.quote_count': 'quotes',
                                      'public_metrics.reply_count': 'replies',
                                      'author.description': 'user_bio'})

Then we're going to select only the columns that we're interested in. Depending on your project and research question, you should change and customize these columns.

In [None]:
tweets_df = tweets_df[['date', 'username', 'name', 'verified', 'text', 'retweets',
            'likes', 'replies',  'quotes', 'user_bio']]
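
The rename-then-select pattern can be tried on a small, made-up DataFrame. The sketch below uses only a few of the column names from above, plus a hypothetical `lang` column that gets dropped by the selection.

```python
import pandas as pd

# Hypothetical miniature of the tweets DataFrame.
toy_df = pd.DataFrame({
    "created_at": ["2021-05-01", "2021-05-02"],
    "author.username": ["uw", "huskies"],
    "public_metrics.like_count": [10, 3],
    "lang": ["en", "en"],
})

# Rename the dotted column names to shorter, more readable ones.
toy_df = toy_df.rename(columns={
    "created_at": "date",
    "author.username": "username",
    "public_metrics.like_count": "likes",
})

# Keep only the columns of interest; "lang" is left out.
toy_df = toy_df[["date", "username", "likes"]]

print(toy_df.columns.tolist())
```

One thing to watch for: by default `rename` silently ignores keys that are not in the DataFrame, while selecting a column that does not exist raises a `KeyError`. If the selection fails, check `tweets_df.columns` for the exact names.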

Now we can view our more focused DataFrame!

In [None]:
tweets_df

# Collect Tweets for Your Twitter Trend!

With your group, try to answer the following questions:

4. After examining the actual tweets and metadata below, were you collecting the tweets that you thought you were collecting in the previous exercise (getting tweet counts)?

5. What does a closer analysis of the tweets show you that the tweet counts alone didn't show you?

6. What other steps or analysis would you be interested in pursuing from here?

Run your query, collect tweets, and transform them into a DataFrame:

In [None]:
!twarc2 search --limit 1000 "#Your query here" > tweets.jsonl

!twarc2 csv tweets.jsonl > tweets.csv
tweets_df = pd.read_csv('tweets.csv',
                        parse_dates = ['created_at'])
tweets_df = tweets_df.rename(columns={'created_at': 'date',
                                      'public_metrics.retweet_count': 'retweets',
                                      'author.username': 'username',
                                      'author.name': 'name',
                                      'author.verified': 'verified',
                                      'public_metrics.like_count': 'likes',
                                      'public_metrics.quote_count': 'quotes',
                                      'public_metrics.reply_count': 'replies',
                                      'author.description': 'user_bio'})
tweets_df = tweets_df[['date', 'username', 'name', 'verified', 'text', 'retweets',
           'likes', 'replies',  'quotes', 'user_bio']]
tweets_df

## Sort By Top Retweets

We can sort by number of retweets to see the most circulated tweets. Let's examine the top 5. The column is titled "retweets."

In [None]:
# Sort top 5 tweets
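
If you get stuck, here is one possible approach, sketched on a made-up DataFrame rather than your collected tweets. It assumes a numeric `retweets` column and uses `sort_values` with `ascending=False` followed by `head(5)`.

```python
import pandas as pd

# Hypothetical tweets with made-up retweet counts.
retweet_df = pd.DataFrame({
    "text": ["a", "b", "c", "d", "e", "f"],
    "retweets": [5, 42, 0, 17, 8, 3],
})

# Sort from most to fewest retweets, then keep the first 5 rows.
top_retweeted = retweet_df.sort_values(by="retweets", ascending=False).head(5)

print(top_retweeted)
```

The same pattern should carry over to `tweets_df` by swapping in your own DataFrame name.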

## Sort By Date

We can sort from the earliest tweets to the latest tweets. Let's examine the earliest 5 tweets. The column is titled "date."

In [53]:
# Sort to show the 5 earliest tweets
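
As a hint, sorting by date works the same way as sorting by a number, as long as the column holds datetimes. The sketch below uses a small, made-up DataFrame and relies on the default ascending order, so the earliest rows come first.

```python
import pandas as pd

# Hypothetical tweets with made-up dates, deliberately out of order.
dated_df = pd.DataFrame({
    "text": ["c", "a", "b"],
    "date": pd.to_datetime(["2021-05-03", "2021-05-01", "2021-05-02"]),
})

# Ascending order is the default, so the earliest dates come first.
earliest = dated_df.sort_values(by="date").head(5)

print(earliest)
```

Because the sample has only three rows, `head(5)` simply returns all of them.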
