# How often is 'x' mentioned on Twitter?
##### *Daily tweet counts pulled with the Twarc library. [Check it out](https://twarc-project.readthedocs.io/en/latest/twarc2_en_us/).*

#### Load Python tools

In [1]:
%load_ext lab_black

In [2]:
import pandas as pd
import altair as alt

In [3]:
pd.options.display.max_columns = 1000
pd.options.display.max_rows = 1000

---

#### Read data

In [4]:
# mentioned = "USC"

# src = pd.read_csv(
#     "../data/raw/usc_mentions_daily.csv", parse_dates=["start", "end"]
# ).sort_values("start", ascending=False)

In [42]:
mentioned = "elonmusk"

src = pd.read_csv(
    "https://raw.githubusercontent.com/stiles/usc/main/data/raw/elonmusk_mentions_daily_full_name.csv",
    parse_dates=["start", "end"],
).sort_values("start", ascending=False)

#### Process dates

In [6]:
# Convert "start" once (a no-op if read_csv already parsed it) and derive string labels from it
start = pd.to_datetime(src["start"])
src["year"] = start.dt.strftime("%Y")
src["month_year"] = start.dt.strftime("%Y-%m")
src["date"] = start.dt.strftime("%Y-%m-%d")
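The date-labelling step can be tried on a tiny input. This is a minimal sketch on a hypothetical two-row CSV with the same `start`, `end`, and `day_count` columns as the Twarc export. It shows that `parse_dates` plus the `.dt` accessor turns UTC timestamps into the `year`, `month_year`, and `date` string labels used below.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as the Twarc counts CSV
csv_text = """start,end,day_count
2022-04-13 00:00:00+00:00,2022-04-14 00:00:00+00:00,81340
2022-04-12 00:00:00+00:00,2022-04-13 00:00:00+00:00,83145
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["start", "end"])

# utc=True guarantees a timezone-aware datetime column even if parsing was skipped
start = pd.to_datetime(sample["start"], utc=True)
sample["year"] = start.dt.strftime("%Y")
sample["month_year"] = start.dt.strftime("%Y-%m")
sample["date"] = start.dt.strftime("%Y-%m-%d")

print(sample[["year", "month_year", "date", "day_count"]])
```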

In [8]:
src.head()

|    | start | end | day_count | year | month_year | date |
|---:|---|---|---:|---|---|---|
| 30 | 2022-04-14 00:00:00+00:00 | 2022-04-14 17:43:29+00:00 | 529321 | 2022 | 2022-04 | 2022-04-14 |
| 29 | 2022-04-13 00:00:00+00:00 | 2022-04-14 00:00:00+00:00 | 81340 | 2022 | 2022-04 | 2022-04-13 |
| 28 | 2022-04-12 00:00:00+00:00 | 2022-04-13 00:00:00+00:00 | 83145 | 2022 | 2022-04 | 2022-04-12 |
| 27 | 2022-04-11 00:00:00+00:00 | 2022-04-12 00:00:00+00:00 | 182471 | 2022 | 2022-04 | 2022-04-11 |
| 26 | 2022-04-10 00:00:00+00:00 | 2022-04-11 00:00:00+00:00 | 95829 | 2022 | 2022-04 | 2022-04-10 |


#### Slim down and re-order the dataframe

In [9]:
df = src[["year", "month_year", "date", "day_count"]].copy()

#### When was the first mention? 

In [10]:
df[df["day_count"] > 0].tail(1)

|      | year | month_year | date | day_count |
|---:|---|---|---|---:|
| 5501 | 2007 | 2007-03 | 2007-03-21 | 4 |


#### Define that as a variable

In [19]:
# df is sorted newest-first, so the last row with a nonzero count is the first mention
first = df.loc[df["day_count"] > 0, "date"].iloc[-1]

In [20]:
first

'2007-03-21'
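Taking the last nonzero row relies on `df` being sorted newest-first. A minimal sketch of an order-independent alternative, on hypothetical unsorted counts, filters to days with at least one mention and takes the minimum date.

```python
import pandas as pd

# Hypothetical daily counts, deliberately out of order
toy = pd.DataFrame(
    {
        "date": ["2007-03-22", "2007-03-20", "2007-03-21", "2007-03-19"],
        "day_count": [7, 0, 4, 0],
    }
)

# Keep days with at least one mention, then take the earliest one.
# "YYYY-MM-DD" strings sort chronologically, so min() on them is safe.
first_mention = toy.loc[toy["day_count"] > 0, "date"].min()
print(first_mention)  # 2007-03-21
```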

#### How many total mentions? 

In [21]:
df.day_count.sum()

33300103

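#### Average daily mentions?

This averages over every row in `df`, which still includes the days before the first mention, so those zero-count days pull the figure down.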

In [22]:
df.day_count.mean()

5673.897256772874
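The difference between averaging over all days and averaging from the first mention onward can be seen on a small example. A minimal sketch with hypothetical counts:

```python
import pandas as pd

# Hypothetical counts: two zero days before the first mention
toy = pd.DataFrame(
    {
        "date": ["2007-03-19", "2007-03-20", "2007-03-21", "2007-03-22"],
        "day_count": [0, 0, 4, 8],
    }
)

# Mean over every row: (0 + 0 + 4 + 8) / 4
mean_all = toy["day_count"].mean()

# Mean over days on or after the first mention: (4 + 8) / 2
first = toy.loc[toy["day_count"] > 0, "date"].min()
mean_since_first = toy.loc[toy["date"] >= first, "day_count"].mean()

print(mean_all, mean_since_first)  # 3.0 6.0
```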

#### Make a new dataframe starting from first mention

In [26]:
df_complete = df[df["date"] >= first]

#### Which day had the most mentions?

In [27]:
df_complete[df_complete["day_count"] == df_complete["day_count"].max()]

|    | year | month_year | date | day_count |
|---:|---|---|---|---:|
| 30 | 2022 | 2022-04 | 2022-04-14 | 529321 |
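To see the busiest few days rather than only the single maximum, `DataFrame.nlargest` is one option. A minimal sketch using the four daily counts shown in the `src.head()` output above:

```python
import pandas as pd

# Daily counts copied from the head() output for 2022-04-11 through 2022-04-14
toy = pd.DataFrame(
    {
        "date": ["2022-04-11", "2022-04-12", "2022-04-13", "2022-04-14"],
        "day_count": [182471, 83145, 81340, 529321],
    }
)

# Rows with the three highest day_count values, largest first
top_days = toy.nlargest(3, "day_count")
print(top_days)
```

Unlike the boolean mask on `max()`, `nlargest` returns exactly `n` rows by default; passing `keep="all"` keeps any ties at the cutoff.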


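#### Chart daily mentions for the most recent two years

`df_complete` is sorted newest-first, so `head(365 * 2)` keeps roughly the latest 730 days.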

In [36]:
alt.Chart(df_complete.head(365 * 2)).mark_area(color="red").encode(
    x="date:T", y="day_count:Q",
).properties(width=900)

---

## Aggregate 

#### Group by month

In [40]:
elon_months = (
    df_complete.groupby(["month_year"])
    # "sum" as a string alias avoids the FutureWarning newer pandas raises for the builtin sum
    .agg({"day_count": "sum"})
    .sort_values("month_year", ascending=False)
    .reset_index()
)
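The same monthly totals can also be computed by selecting the column and calling `.sum()`. A minimal sketch on hypothetical daily rows (the March values are made up for illustration):

```python
import pandas as pd

# Hypothetical daily rows already labelled with month_year
toy = pd.DataFrame(
    {
        "month_year": ["2022-04", "2022-04", "2022-03", "2022-03", "2022-03"],
        "day_count": [529321, 81340, 100, 200, 300],
    }
)

monthly = (
    toy.groupby("month_year", as_index=False)["day_count"]
    .sum()  # one row per month with the summed daily counts
    .sort_values("month_year", ascending=False)  # newest month first
    .reset_index(drop=True)
)
print(monthly)
```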

In [41]:
alt.Chart(elon_months).mark_area(color="red").encode(
    x="month_year:T", y="day_count:Q",
).properties(width=900)

#### Which month had the most mentions?
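One way to answer this is to take `idxmax` of the monthly totals and pull that row. A minimal sketch on hypothetical monthly totals shaped like `elon_months` (the numbers are made up for illustration):

```python
import pandas as pd

# Hypothetical monthly totals in the same shape as elon_months
months = pd.DataFrame(
    {
        "month_year": ["2022-04", "2022-03", "2022-02"],
        "day_count": [610661, 600, 1500],
    }
)

# idxmax returns the index label of the largest day_count; loc fetches that row
peak = months.loc[months["day_count"].idxmax()]
print(peak["month_year"], peak["day_count"])  # 2022-04 610661
```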

---

#### Exports

In [44]:
df_complete.to_csv(f"../data/processed/twitter_mentions_{mentioned}.csv", index=False)
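`to_csv` does not create missing folders, so the export fails if `../data/processed` does not exist yet. A minimal sketch that creates the folder first, using a temporary base directory and a hypothetical one-row frame in place of `df_complete`:

```python
import tempfile
from pathlib import Path

import pandas as pd

mentioned = "elonmusk"
df_out = pd.DataFrame({"date": ["2022-04-14"], "day_count": [529321]})

# Temporary base directory for the demo; the notebook writes under ../data instead
base = Path(tempfile.mkdtemp())
out_dir = base / "processed"
out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

out_path = out_dir / f"twitter_mentions_{mentioned}.csv"
df_out.to_csv(out_path, index=False)

# Read the file back to confirm the round trip
check = pd.read_csv(out_path)
print(check)
```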