# How often is 'x' mentioned on Twitter?
##### *Counts pulled with the Twarc library. [Check it out](https://twarc-project.readthedocs.io/en/latest/twarc2_en_us/).*

#### Load Python tools

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import altair as alt

In [3]:
pd.options.display.max_columns = 1000
pd.options.display.max_rows = 1000

---

#### Read data

In [4]:
mentioned = "Elon Musk"
src = pd.read_csv(
    "../../data/raw/twitter_mentions/elonmusk_mentions_daily_full_name.csv",
    parse_dates=["start", "end"],
).sort_values("start", ascending=False)
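
To see what `parse_dates` does here without the real file, the sketch below reads a three-row stand-in from an in-memory string. The rows are copied from the `src.head()` output shown in this notebook; `sample_csv` and `sample` are names made up for the example, and the unnamed first column is an assumption about how the raw CSV stores its index.

```python
import io

import pandas as pd

# Stand-in for the real CSV: same columns, three rows taken from the head() output.
# The leading comma gives an unnamed first column, which pandas labels "Unnamed: 0".
sample_csv = """,start,end,day_count
28,2022-04-12 00:00:00+00:00,2022-04-13 00:00:00+00:00,83130
29,2022-04-13 00:00:00+00:00,2022-04-14 00:00:00+00:00,81318
30,2022-04-14 00:00:00+00:00,2022-04-14 20:09:09+00:00,694825
"""

# parse_dates turns the two timestamp columns into datetimes;
# sort_values(ascending=False) puts the newest day first.
sample = pd.read_csv(
    io.StringIO(sample_csv),
    parse_dates=["start", "end"],
).sort_values("start", ascending=False)

print(sample.dtypes)
print(sample.iloc[0]["day_count"])
```

Because the frame is sorted newest-first, `.head()` shows the latest days and `.tail()` the earliest ones.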

#### Process dates

In [6]:
src["year"] = pd.to_datetime(src["start"]).dt.strftime("%Y")
src["month_year"] = pd.to_datetime(src["start"]).dt.strftime("%Y-%m")
src["date"] = pd.to_datetime(src["start"]).dt.strftime("%Y-%m-%d")

#### First five rows

In [7]:
src.head()

Unnamed: 0,start,end,day_count,year,month_year,date
30,2022-04-14 00:00:00+00:00,2022-04-14 20:09:09+00:00,694825,2022,2022-04,2022-04-14
29,2022-04-13 00:00:00+00:00,2022-04-14 00:00:00+00:00,81318,2022,2022-04,2022-04-13
28,2022-04-12 00:00:00+00:00,2022-04-13 00:00:00+00:00,83130,2022,2022-04,2022-04-12
27,2022-04-11 00:00:00+00:00,2022-04-12 00:00:00+00:00,182429,2022,2022-04,2022-04-11
26,2022-04-10 00:00:00+00:00,2022-04-11 00:00:00+00:00,95818,2022,2022-04,2022-04-10


#### Slim down and re-order the dataframe

In [8]:
df = src[["year", "month_year", "date", "day_count"]].copy()
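
The `year`, `month_year` and `date` columns are plain strings produced by `strftime`, not datetimes, which matters for the comparisons later in the notebook. A minimal sketch on a two-row toy frame (the name `toy` and its values are illustrative, not the real data):

```python
import pandas as pd

# Toy frame with a timezone-aware "start" column, like the parsed CSV
toy = pd.DataFrame(
    {
        "start": pd.to_datetime(["2022-04-14", "2022-04-13"], utc=True),
        "day_count": [694825, 81318],
    }
)

# strftime formats each timestamp as text
toy["year"] = toy["start"].dt.strftime("%Y")
toy["month_year"] = toy["start"].dt.strftime("%Y-%m")
toy["date"] = toy["start"].dt.strftime("%Y-%m-%d")

# Keep only the derived columns plus the count, in the desired order
slim = toy[["year", "month_year", "date", "day_count"]].copy()
print(slim)
```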

#### When was the first mention? 

In [9]:
df[df["day_count"] > 0].tail(1)

Unnamed: 0,year,month_year,date,day_count
5501,2007,2007-03,2007-03-21,4


#### Define that as a variable

In [10]:
first = df[df["day_count"] > 0]["month_year"].tail(1).iloc[0]

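In [11]:
# .tail(1) keeps the last row, which is the earliest date because the frame is sorted newest-first;
# .iloc[0] then pulls that single value out of the resulting one-element Series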

In [12]:
first

'2007-03'
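
The `.tail(1).iloc[0]` pattern relies on the newest-first sort order. A small sketch with made-up rows (`toy`, `nonzero` and `first_month` are illustrative names) shows the same lookup, plus an order-independent alternative using `.min()`:

```python
import pandas as pd

# Toy frame sorted newest-first, with zero-count days at the bottom
toy = pd.DataFrame(
    {
        "month_year": ["2007-04", "2007-03", "2007-03", "2007-02"],
        "day_count": [2, 4, 0, 0],
    }
)

nonzero = toy[toy["day_count"] > 0]

# Last non-zero row = earliest mention, because of the sort order
first_month = nonzero["month_year"].tail(1).iloc[0]

# Alternative that does not depend on how the frame is sorted
first_month_min = nonzero["month_year"].min()

print(first_month, first_month_min)
```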

#### How many total mentions? 

In [13]:
df.day_count.sum()

33466375

#### Average mentions per day?

In [14]:
df.day_count.mean()

5702.227807122167
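
This mean is taken over every row in `df`, so any zero-count days before the first mention pull it down. The sketch below, on invented numbers, contrasts the two averages; `toy`, `mean_all` and `mean_since_first` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy frame: two days with mentions, two earlier days with none
toy = pd.DataFrame(
    {
        "date": ["2007-03-22", "2007-03-21", "2007-03-20", "2007-03-19"],
        "day_count": [6, 4, 0, 0],
    }
)

# Average over all rows, zeros included
mean_all = toy["day_count"].mean()

# Average counting only from the first day with a mention
first_date = toy.loc[toy["day_count"] > 0, "date"].min()
mean_since_first = toy.loc[toy["date"] >= first_date, "day_count"].mean()

print(mean_all, mean_since_first)
```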

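#### Make a new dataframe starting from the month of the first mention

In [15]:
# "date" is a "YYYY-MM-DD" string, so comparing it with the "YYYY-MM" value in `first`
# is a lexicographic comparison that keeps every day from that month onward
df_complete = df[df["date"] > first]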
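
The filter works because both sides are strings: a `"YYYY-MM-DD"` value compares greater than the `"YYYY-MM"` prefix it starts with. A quick sketch with a few hand-picked dates (the `dates` Series is illustrative):

```python
import pandas as pd

first = "2007-03"
dates = pd.Series(["2007-02-28", "2007-03-01", "2007-03-21", "2007-04-01"])

# String comparison, character by character: any day in 2007-03 or later is "greater"
mask = dates > first
print(mask.tolist())
```

An equivalent and arguably more readable filter would be `df["month_year"] >= first`.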

#### Which day had the most mentions?

In [16]:
df_complete[df_complete["day_count"] == df_complete["day_count"].max()]

Unnamed: 0,year,month_year,date,day_count
30,2022,2022-04,2022-04-14,694825
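
In the `src.head()` output, the 2022-04-14 row ends at 20:09:09, so the top count appears to cover only part of that day. As an alternative to the boolean filter, `idxmax` or `nlargest` find the peak row directly. A sketch on three rows copied from that output (`toy`, `busiest` and `top` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "date": ["2022-04-14", "2022-04-13", "2022-04-12"],
        "day_count": [694825, 81318, 83130],
    }
)

# idxmax returns the index label of the largest value; .loc fetches that row
busiest = toy.loc[toy["day_count"].idxmax()]

# nlargest returns a one-row DataFrame instead of a Series
top = toy.nlargest(1, "day_count")

print(busiest["date"], top["day_count"].iloc[0])
```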


#### Chart it

In [19]:
# The frame is sorted newest-first, so head(365 * 2) is roughly the most recent two years
alt.Chart(df_complete.head(365 * 2)).mark_area(color="red").encode(
    x="date:T", y="day_count:Q",
).properties(width=900)
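
Taking `head(365 * 2)` depends on the row order and on there being one row per day. A sketch of an explicit date cutoff, assuming the dates are converted to datetimes first (the notebook stores them as strings; `toy`, `cutoff` and `recent` are made-up names):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-04-14", "2021-04-14", "2020-04-15", "2020-04-13", "2019-01-01"]
        ),
        "day_count": [5, 4, 3, 2, 1],
    }
)

# Two calendar years back from the latest date in the data
cutoff = toy["date"].max() - pd.DateOffset(years=2)

# Keep only rows after the cutoff, regardless of how the frame is sorted
recent = toy[toy["date"] > cutoff]
print(recent)
```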

---

## Aggregate 

#### Group by month-year

In [25]:
# Pass the aggregation as the string "sum"; recent pandas versions warn about the builtin sum
df_complete.groupby(["month_year"]).agg({"day_count": "sum"}).sort_values(
    "month_year"
).reset_index()

Unnamed: 0,month_year,day_count
0,2007-03,4
1,2007-04,0
2,2007-05,4
3,2007-06,0
4,2007-07,0
5,2007-08,2
6,2007-09,2
7,2007-10,3
8,2007-11,0
9,2007-12,4
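
The same monthly roll-up can be written with named aggregation and `as_index=False`, which skips the separate `reset_index` call. A sketch on a five-row toy frame whose totals mimic the first rows of the output (`toy` and `monthly` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "month_year": ["2007-05", "2007-05", "2007-04", "2007-03", "2007-03"],
        "day_count": [1, 3, 0, 4, 0],
    }
)

# groupby sorts the group keys by default, so the months come out in ascending order
monthly = toy.groupby("month_year", as_index=False).agg(
    day_count=("day_count", "sum")
)
print(monthly)
```

A quick sanity check is that the monthly totals add up to the overall total of the daily counts.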


#### Which month-year was max? 
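
One way to answer this, sketched on made-up numbers rather than the real data, is to total the counts per month and take the label of the largest total with `idxmax`. `toy`, `monthly_totals`, `top_month` and `top_total` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "month_year": ["2022-04", "2022-04", "2022-03", "2022-03"],
        "day_count": [10, 5, 7, 2],
    }
)

# Series of totals indexed by month_year
monthly_totals = toy.groupby("month_year")["day_count"].sum()

top_month = monthly_totals.idxmax()  # label of the largest total
top_total = monthly_totals.max()  # the total itself

print(top_month, top_total)
```

On the notebook's data the same two lines would run against `df_complete` instead of `toy`.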

---

#### Exports

In [28]:
df_complete.to_csv(f"data/twitter_mentions_{mentioned}.csv", index=False)
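
Because `mentioned` is `"Elon Musk"`, the exported filename contains a space and capital letters, and `to_csv` expects the `data/` directory to exist already. If a tidier filename is preferred, one option is to build a slug first. This is a sketch only: `slug`, `out_dir` and the temporary directory are assumptions for the example, not part of the notebook.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

mentioned = "Elon Musk"

# Lowercase, then collapse anything that is not a letter or digit into underscores
slug = re.sub(r"[^a-z0-9]+", "_", mentioned.lower()).strip("_")

# One-row stand-in for df_complete
toy = pd.DataFrame({"date": ["2022-04-14"], "day_count": [694825]})

# Write into a temporary directory so the example does not touch the project tree
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / f"twitter_mentions_{slug}.csv"
toy.to_csv(out_path, index=False)

print(out_path.name)
```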