# Get Tweet Counts

In this notebook, we will learn how to get tweet counts with twarc2 and plot them over time.

## But First, New Concepts!

### Make a DataFrame From a List of Dictionaries

We can make a DataFrame from a list of dictionaries!

In [3]:
{'month': 'October'}

{'month': 'October'}

In [4]:
type({'year': 2020})

dict

In [2]:
import pandas as pd

In [9]:
pd.DataFrame([{'month': 'October'}, {'month': 'November'}])

Unnamed: 0,month
0,October
1,November


In [10]:
pd.DataFrame([{'month': 'October', 'tweets': 100}, {'month': 'November', 'tweets': 1000}])

Unnamed: 0,month,tweets
0,October,100
1,November,1000
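
Each dictionary becomes one row, and the dictionary keys become the column names. As a small sketch (the `rows` list and its values are made up for illustration), a dictionary that lacks a key gets a missing value (`NaN`) in that column:

```python
import pandas as pd

# Hypothetical example data: the second dictionary has no 'tweets' key
rows = [
    {'month': 'October', 'tweets': 100},
    {'month': 'November'},
    {'month': 'December', 'tweets': 500},
]

example_df = pd.DataFrame(rows)

# Keys become columns; the missing 'tweets' value shows up as NaN
print(example_df.columns.tolist())
print(example_df['tweets'].isna().tolist())
```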


### Make Plots with Plotly

In [11]:
tweet_df = pd.DataFrame([{'month': 'October', 'tweets': 100}, {'month': 'November', 'tweets': 1000}])

In [3]:
import plotly.express as px
import plotly.io as pio
pio.renderers.default = 'iframe'

In [9]:
px.line(tweet_df, x='month', y='tweets', title='Tweets Per Month')

## Get Tweet Counts

The [tweet counts API endpoint](https://twittercommunity.com/t/introducing-new-tweet-counts-endpoints-to-the-twitter-api-v2/155997) is a convenient feature of the v2 API (first introduced in 2021) that allows us to get a sense of how many tweets will be returned for a given query before we actually collect all the tweets that match the query.

The tweet counts API endpoint is perhaps even more useful for research projects that are primarily interested in tracking the volume of a Twitter conversation over time. In this case, tweet counts enable a researcher to retrieve this information in a way that's faster and easier than retrieving all tweets and relevant metadata.

To get tweet counts, we can use [`twarc2 counts`](https://twarc-project.readthedocs.io/en/latest/twarc2/#counts) followed by a search query. Queries can combine the following search operators:



| Search Operator | Explanation |
|:--------------------:|:----------------------------------------------------------------------------------------------:|
| keyword | Matches a keyword within the body of a Tweet. `so sweet and so cold` |
| "exact phrase match" | Matches the exact phrase within the body of a Tweet. `"so sweet and so cold" OR "plums in the icebox"` |
| - | Excludes Tweets that match the keyword or operator that follows. `baldwin -alec`, `walt whitman -bridge` |
| # | Matches any Tweet containing a recognized hashtag. `#arthistory` |
| from:, to: | Matches any Tweet from or to a specific user. `from:KingJames` `to:KingJames` |
| place: | Matches Tweets tagged with the specified location or Twitter place ID. `place:"new york city" OR place:seattle` |
| is:reply, is:quote | Returns only replies or quote tweets. `DFW bro is:reply` `David Foster Wallace bro is:quote` |
| is:verified | Returns only Tweets whose authors are verified by Twitter. `DFW bro is:verified` |
| has:media | Matches Tweets that contain a media object, such as a photo, GIF, or video, as determined by Twitter. `I Think You Should Leave has:media` |
| has:images, has:videos | Matches Tweets that contain a recognized URL to an image (`has:images`) or a native Twitter video (`has:videos`). `i'm gonna tell my kids that this was has:images` |
| has:geo | Matches Tweets that have Tweet-specific geolocation data provided by the Twitter user. `pyramids has:geo` |
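
The operators above can be combined in a single query. The sketch below shows one way to assemble a query string in Python before passing it to `twarc2 counts`. `build_query` is a hypothetical helper written for this example, not part of twarc, and the parentheses it adds rely on the grouping syntax of Twitter's query language, which the table above does not cover.

```python
def build_query(phrases, operators=()):
    """Join exact phrases with OR, then append any extra operators.

    Hypothetical helper for illustration only; not part of twarc.
    """
    # Wrap each phrase in double quotes for an exact phrase match
    quoted = ' OR '.join(f'"{phrase}"' for phrase in phrases)
    # Parentheses group the OR clause so the operators apply to every phrase
    parts = [f'({quoted})'] + list(operators)
    return ' '.join(parts)


query = build_query(
    ['so sweet and so cold', 'plums in the icebox'],
    operators=['has:images', '-is:reply'],
)
print(query)
```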

We will also use the `--csv` flag to output the data as CSV and the `--granularity day` flag to get tweet counts per day (other options include `hour` and `minute`; see [twarc's documentation](https://twarc-project.readthedocs.io/en/latest/twarc2/#counts) for more). Finally, we redirect the output to a CSV file with `>`.

In [6]:
!twarc2 counts "virginia" --csv --granularity day > tweet-counts.csv

We can read in this CSV file with pandas, parse the date columns, and sort from earliest to latest. Then we can make a quick plot of tweets per day with [plotly](https://plotly.com/python/line-charts/). The code below is largely [borrowed from Ed Summers](https://github.com/edsu/notebooks/blob/master/Black%20Lives%20Matter%20Counts.ipynb). Thanks, Ed!

In [43]:
# Read in CSV as DataFrame and parse dates
tweet_counts_df = pd.read_csv('tweet-counts.csv', parse_dates=['start', 'end'])

# Sort values by earliest date
tweet_counts_df = tweet_counts_df.sort_values(by='start')

tweet_counts_df

Unnamed: 0,start,end,day_count
0,2021-11-02 08:23:52+00:00,2021-11-03 00:00:00+00:00,407015
1,2021-11-03 00:00:00+00:00,2021-11-04 00:00:00+00:00,1231983
2,2021-11-04 00:00:00+00:00,2021-11-05 00:00:00+00:00,321488
3,2021-11-05 00:00:00+00:00,2021-11-06 00:00:00+00:00,159428
4,2021-11-06 00:00:00+00:00,2021-11-07 00:00:00+00:00,99062
5,2021-11-07 00:00:00+00:00,2021-11-08 00:00:00+00:00,58689
6,2021-11-08 00:00:00+00:00,2021-11-09 00:00:00+00:00,72148
7,2021-11-09 00:00:00+00:00,2021-11-09 08:23:52+00:00,19639
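
Because each row is a count for one time window, summing the `day_count` column gives a rough estimate of how many tweets a full search would return, and the largest value marks the peak of the conversation. The sketch below re-creates the counts shown above from an inline string so that it runs without Twitter credentials. The variable names are chosen for this example.

```python
import io

import pandas as pd

# The counts from the output above, inlined so the example is self-contained
csv_text = """start,end,day_count
2021-11-02 08:23:52+00:00,2021-11-03 00:00:00+00:00,407015
2021-11-03 00:00:00+00:00,2021-11-04 00:00:00+00:00,1231983
2021-11-04 00:00:00+00:00,2021-11-05 00:00:00+00:00,321488
2021-11-05 00:00:00+00:00,2021-11-06 00:00:00+00:00,159428
2021-11-06 00:00:00+00:00,2021-11-07 00:00:00+00:00,99062
2021-11-07 00:00:00+00:00,2021-11-08 00:00:00+00:00,58689
2021-11-08 00:00:00+00:00,2021-11-09 00:00:00+00:00,72148
2021-11-09 00:00:00+00:00,2021-11-09 08:23:52+00:00,19639
"""

counts_df = pd.read_csv(io.StringIO(csv_text), parse_dates=['start', 'end'])

# Total tweets across the whole window
total_tweets = int(counts_df['day_count'].sum())

# Row with the highest daily count
peak_row = counts_df.loc[counts_df['day_count'].idxmax()]
peak_count = int(peak_row['day_count'])

print(f'Total tweets: {total_tweets:,}')
print(f"Peak day: {peak_row['start'].date()} ({peak_count:,} tweets)")
```

Note that the first and last rows cover partial days (they start or end at 08:23:52), so their counts are not directly comparable to the full days in between.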


In [14]:
# Make a line plot and create x and y axis labels and a title

px.line(tweet_counts_df, x='start', y='day_count', title='Tweets Mentioning "Virginia"',
        labels={'start': 'Time', 'day_count': 'Tweets per Day'})

With a plotly line chart, we can hover over points to see more information, and we can use the toolbar in the upper right corner to zoom in on or pan across different parts of the graph. We can also press the camera button to download an image of the graph at any pan or zoom level.

To return to the original view, double-click on the plot.

## Plot Multiple Twitter Trends

In [44]:
!twarc2 counts "texas" --csv --granularity day > texas-tweet-counts.csv

In [9]:
import plotly.graph_objects as go

In [66]:
# Read in CSV as DataFrame and parse dates
texas_tweet_counts_df = pd.read_csv('texas-tweet-counts.csv', parse_dates=['start', 'end'])

# Sort values by earliest date
texas_tweet_counts_df = texas_tweet_counts_df.sort_values(by='start')

In [48]:
tweet_counts_df['Subject'] = 'Virginia'

In [49]:
texas_tweet_counts_df['Subject'] = 'Texas'

In [55]:
# DataFrame.append was removed in pandas 2.0, so combine the DataFrames with pd.concat
df = pd.concat([tweet_counts_df, texas_tweet_counts_df])

In [65]:
df

Unnamed: 0,start,end,day_count,Subject
0,2021-11-02 08:23:52+00:00,2021-11-03 00:00:00+00:00,407015,Virginia
1,2021-11-03 00:00:00+00:00,2021-11-04 00:00:00+00:00,1231983,Virginia
2,2021-11-04 00:00:00+00:00,2021-11-05 00:00:00+00:00,321488,Virginia
3,2021-11-05 00:00:00+00:00,2021-11-06 00:00:00+00:00,159428,Virginia
4,2021-11-06 00:00:00+00:00,2021-11-07 00:00:00+00:00,99062,Virginia
5,2021-11-07 00:00:00+00:00,2021-11-08 00:00:00+00:00,58689,Virginia
6,2021-11-08 00:00:00+00:00,2021-11-09 00:00:00+00:00,72148,Virginia
7,2021-11-09 00:00:00+00:00,2021-11-09 08:23:52+00:00,19639,Virginia
0,2021-11-02 08:45:32+00:00,2021-11-03 00:00:00+00:00,114002,Texas
1,2021-11-03 00:00:00+00:00,2021-11-04 00:00:00+00:00,152682,Texas


In [64]:
px.line(df, x='start', y='day_count', color='Subject', title='Virginia v. Texas Tweets',
        labels={'start': 'Time', 'day_count': 'Tweets per Day'})
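
The same pattern should extend to any number of queries: label each DataFrame with a subject, stack them into one long DataFrame, and let `color` split the lines. The sketch below uses small made-up counts in place of the CSV files written by twarc2, and `counts_by_subject` is a name chosen for this example.

```python
import pandas as pd

# Made-up counts standing in for the CSV files written by twarc2
dates = pd.to_datetime(['2021-11-02', '2021-11-03'])
counts_by_subject = {
    'Virginia': pd.DataFrame({'start': dates, 'day_count': [10, 30]}),
    'Texas': pd.DataFrame({'start': dates, 'day_count': [5, 8]}),
    'Ohio': pd.DataFrame({'start': dates, 'day_count': [2, 4]}),
}

# Label each DataFrame with its subject, then stack them into one long DataFrame
combined_df = pd.concat(
    [subject_df.assign(Subject=subject)
     for subject, subject_df in counts_by_subject.items()],
    ignore_index=True,
)

print(combined_df)
```

Passing `combined_df` to `px.line` with `x='start'`, `y='day_count'`, and `color='Subject'` would then draw one line per subject.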