# Trends

Here we re-import the texts by year. First we load the combined dataset and look for overall trends; then we load the `only` and `plus` datasets separately to see whether any differences are worth noting. The goal is to see which words trend, both to learn about TED talks as a developing collection of events and, possibly, to compare the trends glimpsed here against trends in the BYU corpus or Google Trends.


## Imports and Data Load

In [1]:
import pandas as pd
import re
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (20,10)

In [2]:
%matplotlib inline

In [6]:
df = pd.read_csv('../output/TEDall_years.csv', index_col=0)
df.shape
df.head()

| Unnamed: 0 | public_url | event | published | text | year |
|---|---|---|---|---|---|
| 0 | https://www.ted.com/talks/al_gore_on_averting_... | TED2006 | 6/27/06 | Thank you so much, Chris. And it's truly a g... | 2006 |
| 1 | https://www.ted.com/talks/david_pogue_says_sim... | TED2006 | 6/27/06 | (Music: "The Sound of Silence," Simon & Garf... | 2006 |
| 2 | https://www.ted.com/talks/majora_carter_s_tale... | TED2006 | 6/27/06 | If you're here today — and I'm very happy th... | 2006 |
| 3 | https://www.ted.com/talks/ken_robinson_says_sc... | TED2006 | 6/27/06 | Good morning. How are you? (Laughter) ... | 2006 |
| 4 | https://www.ted.com/talks/hans_rosling_shows_t... | TED2006 | 6/27/06 | About 10 years ago, I took on the task to te... | 2006 |


## Working with the Years

Okay, now the data analysis begins: we sort the talks into year bins, count terms within each bin, and then work out which words, if any, show notable dynamism. We will also have to decide how to define dynamism.

Our hypothesis is that TED events will have some topicality, so we expect to see single-event dynamism, but we also want to find words that rise and fall over two or more years.
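As a starting point for that decision, here is a minimal sketch of one possible approach: count terms per year, convert the counts to yearly shares, and rank terms by the spread between their highest and lowest share. The `toy` frame, the `tokenize` helper, and the spread measure itself are all assumptions made for illustration; the real definition of dynamism is still open.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical toy data standing in for the real `df` (columns `year` and `text`).
toy = pd.DataFrame({
    "year": [2006, 2006, 2007, 2008],
    "text": [
        "climate change is real",
        "design matters",
        "climate climate data",
        "data design data",
    ],
})


def tokenize(text):
    """Lowercase a talk and split it into alphabetic tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


# Count every term within each year bin.
counts = {
    year: Counter(tok for text in group["text"] for tok in tokenize(text))
    for year, group in toy.groupby("year")
}

# Rows are years, columns are terms; terms absent from a year count as zero.
term_counts = pd.DataFrame.from_dict(counts, orient="index").fillna(0).sort_index()

# Normalize by each year's total so that years with more talks do not dominate.
rel_freq = term_counts.div(term_counts.sum(axis=1), axis=0)

# One crude candidate for "dynamism": highest yearly share minus lowest yearly share.
dynamism = (rel_freq.max() - rel_freq.min()).sort_values(ascending=False)
print(dynamism.head())
```

A spread like this cannot tell a one-event spike from a multi-year rise and fall, so it would only be a first filter before looking at the shape of each term's curve.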

In [None]:
dfyears = sorted(df.year.unique().tolist())
print(dfyears)

A quick check shows how many talks we have for each year. As `df.groupby('year').size()` reveals, the first year with a substantial number of talks is 2002, so we can probably safely start our analysis there.

In [None]:
df.groupby('year').size()

In [None]:
df.groupby('year').size().plot()
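If we do start the analysis at the first substantial year, the cutoff can be derived from the per-year counts rather than hard-coded. The sketch below uses a hypothetical `toy` frame and an assumed threshold `MIN_TALKS`; neither comes from the dataset.

```python
import pandas as pd

# Hypothetical toy data; only the `year` column matters here.
toy = pd.DataFrame({"year": [1998, 2002, 2002, 2002, 2003, 2003, 2003]})

talks_per_year = toy.groupby("year").size()

# MIN_TALKS is an assumed cutoff for "substantial", not a value taken from the data.
MIN_TALKS = 3
first_year = talks_per_year[talks_per_year >= MIN_TALKS].index.min()

# Keep only the talks from that year onward.
recent = toy[toy["year"] >= first_year]
print(first_year, len(recent))
```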

Our next step is to filter by year. A small year like 1998, with 6 talks, is a good place to begin building our code:

In [None]:
# `year` is read in as an integer, so compare against 1998, not the string '1998'
year_1998 = df.loc[df['year'] == 1998]
year_1998.head()

Okay, so now we need code that does the same thing for every year:

```python 
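# Collect each year's talks in a dict keyed by year,
# rather than overwriting one variable on every pass
talks_by_year = {}
for year in dfyears:
    talks_by_year[year] = df.loc[df['year'] == year]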
```

Or maybe I'm thinking about this wrong. Maybe there's a **pandas** way of doing this.
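One likely candidate is `groupby`, which splits the frame by year without an explicit filter for each one. The following is a minimal sketch on a hypothetical `toy` frame with the same `year` and `text` columns as `df`; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `df`.
toy = pd.DataFrame({
    "year": [2006, 2006, 2007],
    "text": ["first talk", "second talk", "third talk"],
})

# groupby splits the frame by year in one step.
by_year = toy.groupby("year")

# Iterate over (year, sub-frame) pairs ...
for year, talks in by_year:
    print(year, len(talks))

# ... or pull out a single year's talks on demand.
talks_2006 = by_year.get_group(2006)

# Join each year's texts into one string, ready for term counting.
text_by_year = by_year["text"].apply(" ".join)
```

The grouped object also makes the earlier size check and any per-year aggregation available without building the sub-frames by hand.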