# Avocado Market - Time Series Regression

### by ReDay Zarra

This project uses **Facebook Prophet** to analyze time series data for avocados from the avocado.csv dataset. The data includes observation dates, average unit price, unit type, region, volume and more. The fitted model is then used to **predict avocado prices** at any given date. The project showcases a step-by-step implementation of the model, along with in-depth notes on customizing it further for higher accuracy.

## Importing the necessary libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import seaborn as sns
from prophet import Prophet

> Pandas is a library used for data frame manipulations. NumPy is a package used for numerical analysis. Matplotlib and Seaborn are used for data visualization. Random will be used to generate random values. Facebook Prophet is the library used for time series forecasting.

In [2]:
# Set seaborn style
sns.set_style('darkgrid')

## Importing the dataset

The dataset is loaded with the .read_csv function from Pandas and stored in the avocado_df variable. Built-in Pandas methods such as .head() give a quick glimpse of the data.

In [3]:
avocado_df = pd.read_csv('avocado.csv')

In [4]:
avocado_df.head()

| Unnamed: 0.1 | Unnamed: 0 | Date | AveragePrice | Total Volume | 4046 | 4225 | 4770 | Total Bags | Small Bags | Large Bags | XLarge Bags | type | year | region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2015-12-27 | 1.33 | 64236.62 | 1036.74 | 54454.85 | 48.16 | 8696.87 | 8603.62 | 93.25 | 0.0 | conventional | 2015 | Albany |
| 1 | 1 | 2015-12-20 | 1.35 | 54876.98 | 674.28 | 44638.81 | 58.33 | 9505.56 | 9408.07 | 97.49 | 0.0 | conventional | 2015 | Albany |
| 2 | 2 | 2015-12-13 | 0.93 | 118220.22 | 794.7 | 109149.67 | 130.5 | 8145.35 | 8042.21 | 103.14 | 0.0 | conventional | 2015 | Albany |
| 3 | 3 | 2015-12-06 | 1.08 | 78992.15 | 1132.0 | 71976.41 | 72.58 | 5811.16 | 5677.4 | 133.76 | 0.0 | conventional | 2015 | Albany |
| 4 | 4 | 2015-11-29 | 1.28 | 51039.6 | 941.48 | 43838.39 | 75.78 | 6183.95 | 5986.26 | 197.69 | 0.0 | conventional | 2015 | Albany |
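
By default, read_csv loads the Date column as plain strings. The following is a minimal sketch, not part of the original notebook, of parsing the dates while loading. The inline `sample_csv` is a hypothetical stand-in for avocado.csv, built from two rows of the preview above and reduced to three columns.

```python
import io

import pandas as pd

# Two rows copied from the preview above, standing in for avocado.csv
# (illustration only; the real file has more columns and rows).
sample_csv = io.StringIO(
    "Date,AveragePrice,region\n"
    "2015-12-27,1.33,Albany\n"
    "2015-12-20,1.35,Albany\n"
)

# parse_dates converts the Date column to a datetime dtype while reading.
sample_df = pd.read_csv(sample_csv, parse_dates=["Date"])
print(sample_df.dtypes)
```

With real datetimes, Matplotlib can space the x-axis ticks by time rather than treating every date string as a separate category.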


## Visualizing the dataset

This section visualizes the dataset and conducts exploratory data analysis to find patterns and trends. I will plot the data on different kinds of plots to compare the components that I find interesting.

### Date vs. Average Price

In [5]:
avocado_df = avocado_df.sort_values("Date")

> The .sort_values method **sorts the dataframe** by the values of the Date column, so the rows end up in chronological order.
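
Because the dates are written in ISO YYYY-MM-DD form, sorting them as strings already yields chronological order. Here is a small sketch on three rows taken from the preview; the names `demo` and `demo_sorted` are illustrative and not part of the notebook.

```python
import pandas as pd

# Three rows from the preview, deliberately out of order.
demo = pd.DataFrame({
    "Date": ["2015-12-27", "2015-11-29", "2015-12-13"],
    "AveragePrice": [1.33, 1.28, 0.93],
})

# sort_values returns a new frame; reset_index discards the old row labels.
demo_sorted = demo.sort_values("Date").reset_index(drop=True)
print(demo_sorted)
```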

In [None]:
plt.figure(figsize = (15, 15))
plt.plot(avocado_df['Date'], avocado_df['AveragePrice'])

In [None]:
plt.figure(figsize = (30, 12))
plt.xticks(rotation = 45)
sns.countplot(x = 'region', data = avocado_df)

In [None]:
sns.countplot(x = 'year', data = avocado_df)

In [None]:
df_prophet = avocado_df[['Date', 'AveragePrice']]

In [None]:
df_prophet

## Time Series Regression

In [None]:
df_prophet = df_prophet.rename(columns = {'Date':'ds', 'AveragePrice':'y'})
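
Prophet expects its input frame to contain a `ds` column holding the dates and a `y` column holding the value to forecast, which is why the columns are renamed. The following is a minimal sketch of the same step on two rows from the preview; `demo` and `demo_prophet` are illustrative names.

```python
import pandas as pd

# Two rows from the preview, reduced to the columns Prophet needs.
demo = pd.DataFrame({
    "Date": ["2015-12-20", "2015-12-27"],
    "AveragePrice": [1.35, 1.33],
})

# Rename to the column names Prophet looks for.
demo_prophet = demo.rename(columns={"Date": "ds", "AveragePrice": "y"})

# Converting ds to datetimes up front makes the dtype explicit.
demo_prophet["ds"] = pd.to_datetime(demo_prophet["ds"])
print(demo_prophet.columns.tolist())
```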

In [None]:
p = Prophet()

In [None]:
p.fit(df_prophet)

In [None]:
wanted = p.make_future_dataframe(periods = 365)
forecast = p.predict(wanted)
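
By default, `make_future_dataframe(periods = 365)` keeps the training dates and appends 365 daily dates after the last one, so `predict` returns fitted values for the history plus roughly one year of forecasts. Below is a rough pandas-only sketch of the dates this produces. The helper `future_dates` is hypothetical and only approximates the Prophet method for illustration.

```python
import pandas as pd


def future_dates(history: pd.Series, periods: int) -> pd.DataFrame:
    """Hypothetical stand-in: history dates plus `periods` daily dates after the last one."""
    history = pd.to_datetime(history)
    last = history.max()
    # The first appended date is the day after the last observed date.
    extra = pd.date_range(start=last + pd.Timedelta(days=1), periods=periods, freq="D")
    all_dates = pd.concat([history, pd.Series(extra)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


demo_future = future_dates(pd.Series(["2015-12-20", "2015-12-27"]), periods=3)
print(demo_future)
```

The observations in the preview are spaced a week apart, while the appended dates are daily, so the forecast frame is denser than the training data.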

In [None]:
forecast

In [None]:
figure = p.plot(forecast, xlabel = 'Date', ylabel = 'Price')

In [None]:
figure = p.plot_components(forecast)

## Region Based - Time Series Regression

In [None]:
df_prophet = pd.read_csv('avocado.csv')

In [None]:
df_prophet.head()

In [None]:
df_prophet_region = df_prophet[df_prophet['region'] == 'West']
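
The region filter relies on boolean indexing: the comparison produces a True/False Series, and indexing the dataframe with it keeps only the matching rows. The following is a small sketch of the mechanism. The Albany rows come from the preview, while the West row is a made-up placeholder used purely for illustration.

```python
import pandas as pd

# Illustrative rows only: the Albany values come from the preview,
# the West row is a made-up placeholder.
demo = pd.DataFrame({
    "Date": ["2015-12-27", "2015-12-20", "2015-12-27"],
    "AveragePrice": [1.33, 1.35, 1.00],
    "region": ["Albany", "Albany", "West"],
})

# The comparison builds a boolean Series; indexing with it keeps the True rows.
mask = demo["region"] == "West"
demo_west = demo[mask]
print(demo_west)
```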

In [None]:
df_prophet_region = df_prophet_region.sort_values("Date")

In [None]:
df_prophet_region.head()

In [None]:
plt.figure(figsize = (10, 10))
plt.plot(df_prophet_region['Date'], df_prophet_region['AveragePrice'])

In [None]:
df_prophet_region = df_prophet_region[['Date', 'AveragePrice']]

In [None]:
df_prophet_region = df_prophet_region.rename(columns = {'Date':'ds', 'AveragePrice':'y'})

In [None]:
df_prophet_region.head()

In [None]:
r = Prophet()

In [None]:
r.fit(df_prophet_region)

In [None]:
new = r.make_future_dataframe(periods = 365)
newfore = r.predict(new)

In [None]:
fig = r.plot(newfore, xlabel = 'Date', ylabel = 'Price')

In [None]:
fig = r.plot_components(newfore)