## Plotting word counts through time with Pandas, Resample, and Plotly

***

### 1. Open and prepare the dataset 

***

This task works with an example dataset of old-fashioned Dutch boys' names, chosen because they are short and, in my humble opinion, sound quite cool. If you are working with your own data, you need a dataframe with at least two columns: 
* one column with a date string (e.g. "1950") that can be turned into a Pandas datetime object (here we use years, but `pd.to_datetime` also recognizes formats such as "10/11/12", which is parsed month-first as "2012-10-11" by default; pass `dayfirst=True` to get "2012-11-10").
* and one column with a text string (here we use a few boys' names, but full newspaper articles, for example, would work too). 
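
To check how a given date string will be parsed before building the dataframe, a minimal sketch like the following can help. The variable names are only illustrative, and the explicit `format='%Y'` is an assumption that the column holds four-digit years.

```python
import pandas as pd

# Year strings: an explicit format avoids any guessing by the parser
years = pd.to_datetime(pd.Series(['1950', '1951']), format='%Y')
print(years.dt.year.tolist())

# An ambiguous string is read month-first by default (October 11, 2012)
month_first = pd.to_datetime('10/11/12')
print(month_first)

# dayfirst=True reads the same string day-first (November 10, 2012)
day_first = pd.to_datetime('10/11/12', dayfirst=True)
print(day_first)
```

If your dates are written day-first, as is common in Dutch sources, passing `dayfirst=True` (or an explicit `format`) is the safer choice.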

In [1]:
# Import some packages and create an example dataset
import warnings
warnings.filterwarnings('ignore') # only use this when you know the script and want to suppress unnecessary warnings

# Create a dataframe
import pandas as pd
data = {
    'year': ['1950', '1951', '1952', '1953', '1954'],
    'text': ['Cees Aart Arie Jan Otto Gijs Sef Toon',
             'Cees Aart Arie Jan Otto Gijs Sef Toon Cees Aart Arie Jan Otto Gijs Sef Toon',
             'Aart Arie Toon',
             'Jan Otto',
             'Gijs'],
}
df = pd.DataFrame(data, index=['0', '1', '3', '4', '5'])
# In Jupyter Notebooks, putting the name of a dataframe (e.g. "df") on the last line of a cell displays it
df

| | year | text |
|---|---|---|
| 0 | 1950 | Cees Aart Arie Jan Otto Gijs Sef Toon |
| 1 | 1951 | Cees Aart Arie Jan Otto Gijs Sef Toon Cees Aar... |
| 3 | 1952 | Aart Arie Toon |
| 4 | 1953 | Jan Otto |
| 5 | 1954 | Gijs |


In [2]:
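# Convert the date string in the column "year" (e.g. "1950") to a Pandas datetime object
# format='%Y' states that the strings are four-digit years; errors='coerce' turns unparseable values into NaT
df['datetime'] = pd.to_datetime(df['year'], format='%Y', errors='coerce')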
df

| | year | text | datetime |
|---|---|---|---|
| 0 | 1950 | Cees Aart Arie Jan Otto Gijs Sef Toon | 1950-01-01 |
| 1 | 1951 | Cees Aart Arie Jan Otto Gijs Sef Toon Cees Aar... | 1951-01-01 |
| 3 | 1952 | Aart Arie Toon | 1952-01-01 |
| 4 | 1953 | Jan Otto | 1953-01-01 |
| 5 | 1954 | Gijs | 1954-01-01 |


***

### 2. Getting the word counts

***

In [3]:
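# Word counts for the term of interest per year
# You can resample by year, month, or day, i.e. 'A-DEC', 'M', or 'D' (pandas 2.2+ prefers 'YE-DEC' and 'ME' for the first two)
# For example, if you have daily time series data, you can aggregate observations by day, month, or year.
# In this example, we work with yearly time series data, so we can aggregate observations by year.
# str.count treats the pattern as a regular expression, so 'Aart*' would also match "Aar";
# r'\bAart\w*' counts whole words that start with "Aart"
df['term_of_interest'] = df['text'].str.count(r'\bAart\w*')
df_word = df.set_index('datetime').resample('A-DEC')['term_of_interest'].sum()
df_word = df_word.reset_index()
# Sum only the count column; recent pandas versions raise an error when summing a datetime column
print(df_word[['term_of_interest']].sum())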
df_word

term_of_interest    4
dtype: int64


| | datetime | term_of_interest |
|---|---|---|
| 0 | 1950-12-31 | 1 |
| 1 | 1951-12-31 | 2 |
| 2 | 1952-12-31 | 1 |
| 3 | 1953-12-31 | 0 |
| 4 | 1954-12-31 | 0 |
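
The same approach works at a finer granularity. As a minimal sketch of the monthly case, the following uses a small made-up daily dataset (the dates and texts are invented for illustration and are not part of the example above). It counts the name with a word-boundary pattern, so a longer name such as "Aartje" is not counted by accident, and it uses the `'MS'` (month start) alias, which labels each month by its first day.

```python
import pandas as pd

# Hypothetical daily data, invented for this sketch
daily = pd.DataFrame({
    'date': ['1950-01-03', '1950-01-17', '1950-02-05', '1950-04-20'],
    'text': ['Aart Jan', 'Aart Aart Otto', 'Gijs Aartje', 'Aart'],
})
daily['datetime'] = pd.to_datetime(daily['date'], format='%Y-%m-%d')

# \b marks a word boundary, so 'Aartje' is not counted as 'Aart'
daily['term_of_interest'] = daily['text'].str.count(r'\bAart\b')

# Aggregate the daily counts per calendar month;
# months without any rows (March here) get a sum of 0
monthly = daily.set_index('datetime')['term_of_interest'].resample('MS').sum()
monthly = monthly.reset_index()
print(monthly)
```

Months with no observations still appear in the result, which keeps the time axis continuous when the counts are plotted.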


In [6]:
# Optional: use a Plotly bar chart to plot the data
# To run this code, you need to install the Plotly package first (see the How to get started with Python page in my Blog section)
import plotly.express as px
fig = px.bar(df_word, x='datetime', y='term_of_interest')
fig.update_layout(showlegend=False,
    xaxis_rangeslider_visible=False,
    width=450,
    height=450)  
fig.update_layout(paper_bgcolor='rgba(0,0,0,0)', plot_bgcolor='rgba(0,0,0,0)')
fig.update_xaxes(title_text="Year", showgrid=True, gridwidth=0.3, gridcolor='LightGrey')
fig.update_yaxes(title_text="# Reference to term of interest", showgrid=True, gridwidth=0.3, gridcolor='LightGrey')
# fig.show() uses the default renderer; 'img' is not a valid renderer name
# For a static image, fig.show('png') works if the kaleido package is installed
fig.show()