### Following sentdex's video 1 and video 2 of [Data Analysis w/ Python](https://www.youtube.com/playlist?list=PLQVvvaa0QuDfSfqQuee6K8opKtZsh7sA9)

In [None]:
import pandas as pd
%matplotlib inline
df = pd.read_csv('../input/avocado.csv')

In [None]:
# Let's glance at what we have here
df.head()

### The columns we probably care about are Date, AveragePrice, region and maybe TotalBags

#### Date is a string; let's convert it to a datetime object, since we will use it as the index

In [None]:
df['Date'] = pd.to_datetime(df['Date'])
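A quick sanity check of what the conversion does, sketched on a tiny made-up frame (the dates below are illustrative assumptions, not rows read from the CSV). After `pd.to_datetime`, the column has a datetime dtype and exposes the `.dt` accessor.

```python
import pandas as pd

# Toy frame with hypothetical dates; only the column name matches the notebook
toy = pd.DataFrame({"Date": ["2015-12-27", "2015-12-20", "2015-12-13"]})

toy["Date"] = pd.to_datetime(toy["Date"])

# The column is now a datetime column, so date parts are directly available
is_datetime = pd.api.types.is_datetime64_any_dtype(toy["Date"])
years = toy["Date"].dt.year.tolist()
print(is_datetime, years)
```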

In [None]:
# Pick only the Albany rows with a boolean mask, then copy so later edits don't touch df
albany_df = df[df['region'] == 'Albany'].copy()
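Here is a minimal sketch of how that kind of selection works, on a made-up frame (the rows and the `flag` column are hypothetical, for illustration only). Comparing a column to a value gives a boolean mask, indexing with the mask keeps the `True` rows, and `.copy()` gives an independent frame.

```python
import pandas as pd

# Toy data; the values are assumptions, only the column names mirror the notebook
toy = pd.DataFrame({
    "region": ["Albany", "Boston", "Albany"],
    "AveragePrice": [1.33, 1.10, 1.35],
})

mask = toy["region"] == "Albany"   # one True/False per row
albany_toy = toy[mask].copy()      # independent copy of the matching rows

# Adding a column to the copy leaves the original frame unchanged
albany_toy["flag"] = True
print(albany_toy)
```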

In [None]:
albany_df = albany_df.set_index("Date")

In [None]:
albany_df['AveragePrice'].plot()

Uh oh! The plot above looks very noisy, with lots of sharp ups and downs!

Let us do a rolling window average to smooth things out a bit.
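Before applying it to the real data, here is a minimal sketch of what a rolling mean does, on a made-up series (the values and the window of 3 are assumptions for illustration). Each output value is the mean of the current row and the rows before it within the window, so the first `window - 1` entries are NaN.

```python
import pandas as pd

# Hypothetical prices, just to show the mechanics
prices = pd.Series([1.0, 3.0, 2.0, 4.0, 6.0])

# Mean over the current value and the two before it
smoothed = prices.rolling(3).mean()
print(smoothed.tolist())  # first two entries are NaN, then 2.0, 3.0, 4.0
```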

In [None]:
albany_df['AveragePrice'].rolling(25).mean().plot()

Something is off with the ordering of the data: the graph looks messed up, so let's look at how the dates are ordered.

In [None]:
albany_df.index

Let's make sure the dates are in proper order, so that our graph is an accurate representation.

In [None]:
albany_df.sort_index(inplace=True)
albany_df['AveragePrice'].rolling(25).mean().plot()
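Why sorting matters can be sketched on a toy series (dates and values are made up). With an integer window, `rolling` walks the rows in the order they are stored, not in date order, so an unsorted index averages neighbours that are not actually adjacent in time.

```python
import pandas as pd

# Hypothetical weekly prices stored out of chronological order
dates = pd.to_datetime(["2015-01-18", "2015-01-04", "2015-01-25", "2015-01-11"])
s = pd.Series([3.0, 1.0, 4.0, 2.0], index=dates)

unsorted_ma = s.rolling(2).mean()             # pairs rows by position
sorted_ma = s.sort_index().rolling(2).mean()  # pairs rows by date

day = pd.Timestamp("2015-01-25")
print(unsorted_ma.loc[day], sorted_ma.loc[day])  # 2.5 vs 3.5
```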

#### Neat! The average price seems to follow a seasonal pattern. Now let's store this moving average as a column in our DataFrame

In [None]:
albany_df['price25ma'] = albany_df['AveragePrice'].rolling(25).mean()

### Next up: we will plot the trends for all the regions. We will create a new DataFrame whose columns are the moving averages of each region and whose rows are the dates

#### We will take type = "organic", since the source data has two rows for each date and region: one for type = "organic" and one for type = "conventional"

In [None]:
# Keep only the organic rows, then sort them chronologically
organic_df = df[df['type'] == 'organic'].copy()
organic_df['Date'] = pd.to_datetime(organic_df['Date'])
organic_df.sort_values(by='Date', ascending=True, inplace=True)

In [None]:
# Now transform organic_df into the structure described above
graph_df = pd.DataFrame()

for region in organic_df['region'].unique():
    # Rows for this region only, indexed and sorted by date
    region_df = organic_df[organic_df['region'] == region].copy()
    region_df.set_index('Date', inplace=True)
    region_df.sort_index(inplace=True)
    region_df[f'{region}_price25ma'] = region_df['AveragePrice'].rolling(25).mean()

    if graph_df.empty:
        # First region: start the frame with its moving-average column
        graph_df = region_df[[f'{region}_price25ma']]
    else:
        # Later regions: join on the Date index
        graph_df = graph_df.join(region_df[f'{region}_price25ma'])

### The structure looks good: rolling mean of the average price, indexed by date, one column per region

In [None]:
graph_df.tail()
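As an aside, the same wide layout could probably be built without the loop by pivoting and then rolling. This is an alternative sketch and not the approach from the videos. The toy data, the window of 2, and the `_price2ma` suffix are assumptions for illustration, and `pivot` needs each (Date, region) pair to be unique, which is why the data is filtered to one type first.

```python
import pandas as pd

# Toy long-format data: two hypothetical regions, three dates each, out of order
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-11", "2015-01-04", "2015-01-18"] * 2),
    "region": ["Albany"] * 3 + ["Boston"] * 3,
    "AveragePrice": [2.0, 1.0, 3.0, 5.0, 4.0, 6.0],
})

# One column per region, one row per date, in chronological order
wide = toy.pivot(index="Date", columns="region", values="AveragePrice").sort_index()

# Rolling mean down each column, then rename the columns in the notebook's style
toy_graph_df = wide.rolling(2).mean().add_suffix("_price2ma")
print(toy_graph_df)
```

One difference from the loop: the loop joins onto the dates of the first region, while the pivot keeps the union of all dates, so the two can differ when regions have missing dates.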

In [None]:
# Plotting! Make the plot a bit bigger so the graph is easier to read, and turn off the legend.
# Drop NaN rows first: the rolling average is NaN for the first 24 rows of each region.
graph_df.dropna().plot(figsize=(14, 10), legend=False)