# Avocado Prices Analysis and Prediction

- Hello everyone! I am new to time series, and this is my first notebook on the topic. In it, I aim to analyze how average avocado prices change over time and to predict future prices.

**Yunus Emre Gündoğmuş - September 2018**

# Introduction
1. [Import Libraries](#ch0)
1. [Data Preprocessing](#ch1)
1. [Visualization](#ch2)
1. [Time Series Analysis And Prediction](#ch3)
1. [Source](#ch4)

## 1 - Import Libraries
<a id="ch0"></a>

In [None]:
#data analysis libraries 
import numpy as np
import pandas as pd
import datetime

#visualization libraries
import matplotlib.pyplot as plt
from matplotlib import pyplot
import seaborn as sns
%matplotlib inline

#ignore warnings
import warnings
warnings.filterwarnings('ignore')

## 2 - Data Preprocessing
<a id="ch1"></a>

In [None]:
data = pd.read_csv('../input/avocado.csv') #read to data
data = data.drop(['Unnamed: 0'], axis = 1) #drop the useless column
names = ["date", "avprice", "totalvol", "small","large","xlarge","totalbags","smallbags","largebags","xlargebags","type","year","region"] #get new column names
data = data.rename(columns=dict(zip(data.columns, names))) #rename columns
data.head()

- I want to look at the data types first and decide on the next steps accordingly.

In [None]:
data.info()

- When I look at the data types, I see that date is not a datetime, so I will fix that and sort the data by date. Then I will split each date into year, month, and day for a better analysis.

In [None]:
data['date'] = pd.to_datetime(data['date'], format="%Y-%m-%d")  # parse the strings into datetimes
data = data.sort_values('date').reset_index(drop=True)  # sort whole rows so each price stays with its own date
data['Year'] = data['date'].dt.year
data['Month'] = data['date'].dt.month
data['Day'] = data['date'].dt.day
data.head(10)
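
- A minimal sketch of the same idea on a few made-up rows (the values below are invented for illustration, not taken from the avocado data). It parses the date strings, sorts the whole rows so every price stays next to its own date, and then pulls out year, month, and day with the .dt accessor.

```python
import pandas as pd

# Made-up rows, deliberately out of date order (not the real avocado data)
toy = pd.DataFrame({
    "date": ["2015-03-01", "2015-01-04", "2015-02-08"],
    "avprice": [1.30, 1.10, 1.20],
})

# Parse the strings into real datetimes
toy["date"] = pd.to_datetime(toy["date"], format="%Y-%m-%d")

# Sort whole rows, so each price stays attached to its own date
toy = toy.sort_values("date").reset_index(drop=True)

# Pull the calendar parts out of the datetime column
toy["Year"] = toy["date"].dt.year
toy["Month"] = toy["date"].dt.month
toy["Day"] = toy["date"].dt.day

print(toy)
```

- Sorting only the date column on its own would leave the prices in their old order, so sorting the whole frame is the safer choice.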

- Now that the data is sorted, let's do some visualizations.

## 3 - Visualizations
<a id="ch2"></a>

In [None]:
plt.figure(figsize=(12,5))
plt.title("Price Distribution Graph")
ax = sns.distplot(data["avprice"], color = 'y')

- Here we can see that prices are concentrated around $1.15.
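
- To back up a visual reading like this with a number, one option is to find the histogram bin that holds the most observations. This is only a sketch on made-up prices; the bin range and bin count are assumptions chosen for the example.

```python
import numpy as np

# Made-up prices for illustration only (not the real avocado data)
prices = np.array([0.95, 1.05, 1.10, 1.12, 1.15, 1.18, 1.20, 1.45, 1.80, 2.10])

# Count observations in equal-width bins between $0.50 and $2.50
counts, edges = np.histogram(prices, bins=8, range=(0.5, 2.5))

# The fullest bin is where the distribution peaks
peak = np.argmax(counts)
print(f"Most prices fall between ${edges[peak]:.2f} and ${edges[peak + 1]:.2f}")
```

- On the real data, the same steps could be run on data["avprice"] instead of the toy array.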

### Weight distribution of prices

In [None]:
import seaborn as sns
fig, ax = plt.subplots()
fig.set_size_inches(10,5)
sns.violinplot(data.dropna(subset = ['avprice']).avprice)

### Average price over time

In [None]:
dategroup = data.groupby('date')['avprice'].mean()  # mean price on each date
plt.figure(figsize=(12,5))
dategroup.plot()
plt.title('Average Price')

### Change of average price by month of the year

- This is important because seasonal changes can affect prices.

In [None]:
dategroup = data.groupby('Month')['avprice'].mean()  # mean price for each month
fig, ax = plt.subplots(figsize=(12,5))
ax.xaxis.set(ticks=range(0,13)) # Manually set x-ticks
dategroup.plot(ax=ax)
plt.title('Average Price by Month')

### Changes in prices by day of month
- This is also a significant chart for us, because discounts offered on certain days may affect prices.

In [None]:
dategroup = data.groupby('Day')['avprice'].mean()  # mean price for each day of the month
fig, ax = plt.subplots(figsize=(12,5))
ax.xaxis.set(ticks=range(0,31)) # Manually set x-ticks
dategroup.plot(ax=ax)
plt.title('Average Price by Day')

### Yearly Average Price in Each Region
- Again an extremely important chart for us, because it shows how prices differ between regions.

In [None]:
plt.figure(figsize=(20,20))
sns.set_style('whitegrid')
sns.pointplot(x='avprice',y='region',data=data, hue='year',join=False)
plt.xticks(np.linspace(1,2,5))
plt.xlabel('Average Price',{'fontsize' : 'large'})  # x-axis holds avprice
plt.ylabel('Region',{'fontsize':'large'})  # y-axis holds region
plt.title("Yearly Average Price in Each Region",{'fontsize':20})

### Type Average Price in Each Region
- In this chart, we can see how price varies by type within each region. As a rule, organic avocados are more expensive.

In [None]:
plt.figure(figsize=(12,20))
sns.set_style('whitegrid')
sns.pointplot(x='avprice',y='region',data=data, hue='type',join=False)
plt.xticks(np.linspace(1,2,5))
plt.xlabel('Average Price',{'fontsize' : 'large'})  # x-axis holds avprice
plt.ylabel('Region',{'fontsize':'large'})  # y-axis holds region
plt.title("Type Average Price in Each Region",{'fontsize':20})

- Here we look at the type distribution in the dataset.

In [None]:
print(data['type'].value_counts())
plt.figure(figsize=(12,5))
sns.countplot(x='type', data=data)
plt.show()

## 4 - Time Series Analysis
<a id="ch3"></a>

- We first import the necessary libraries and start the time series analysis process

In [None]:
%matplotlib inline
import pandas as pd
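try:
    from prophet import Prophet  # package name from version 1.0 onward
except ImportError:
    from fbprophet import Prophet  # name used by older releases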

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

In [None]:
df = data.loc[:, ["date","avprice"]]
df['date'] = pd.DatetimeIndex(df['date'])
df.dtypes

- Prophet expects the input DataFrame to have a date column named ds and a value column named y, so I renamed the columns accordingly.

In [None]:
df = df.rename(columns={'date': 'ds',
                        'avprice': 'y'})
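
- A small sketch of the same rename on made-up rows, followed by a few sanity checks that can be run before handing a frame to Prophet. The toy values and the name prophet_df are placeholders for illustration only.

```python
import pandas as pd

# Made-up rows for illustration (not the real avocado data)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
    "avprice": [1.10, 1.20, 1.15],
})

# Prophet looks for the timestamp in `ds` and the value to forecast in `y`
prophet_df = toy.rename(columns={"date": "ds", "avprice": "y"})

# Quick sanity checks before fitting
assert list(prophet_df.columns) == ["ds", "y"]
assert pd.api.types.is_datetime64_any_dtype(prophet_df["ds"])
assert pd.api.types.is_numeric_dtype(prophet_df["y"])

print(prophet_df.dtypes)
```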

In [None]:
ax = df.set_index('ds').plot(figsize=(20, 12))
ax.set_ylabel('Average Price of Avocado')
ax.set_xlabel('Date')

plt.show()

### Call Prophet Model
- Now we create the Prophet model and fit it to the data. Here I ask it to predict the next 900 days; you can increase or decrease this number.

In [None]:
my_model = Prophet()
my_model.fit(df)

future_dates = my_model.make_future_dataframe(periods=900)
forecast = my_model.predict(future_dates)
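
- As a rough illustration of what periods=900 means: with its default daily frequency, make_future_dataframe appends 900 daily dates after the last date in the training data, and by default it also keeps the historical dates. The sketch below imitates only the appended part with plain pandas. The last training date is an assumption taken from the date filters used later in this notebook.

```python
import pandas as pd

# Assumed last date in the training data (it matches the date filters used later in this notebook)
last_train_date = pd.Timestamp("2018-03-25")
periods = 900

# Daily dates starting the day after the last training date
future_only = pd.date_range(
    start=last_train_date + pd.Timedelta(days=1),
    periods=periods,
    freq="D",
)

print(future_only[0].date(), "->", future_only[-1].date())
```

- Under that assumption, the horizon ends on 2020-09-10, which is the upper bound used when the forecast is filtered later on.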

- Now let's look at how the model builds its price estimate by plotting the forecast components, such as the trend and the seasonal patterns. The shaded blue area in the trend panel covers the period that the model predicts.

In [None]:
fig2 = my_model.plot_components(forecast)

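- In this section we split the model output into the part that covers the training period and the part that covers the future.
- We first build a DataFrame named forecastnew that holds the date (ds) and the predicted price (yhat) for every row of the forecast. Then we filter it into two pieces.
- forecastnew = the model's fitted values over the training dates
- forecastedvalues = the model's predictions for the future dates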

In [None]:
# keep only the date and the predicted price
forecastnew = forecast[['ds', 'yhat']]

# predictions from 2018-03-25 through the end of the 900-day horizon
mask = (forecastnew['ds'] > "2018-03-24") & (forecastnew['ds'] <= "2020-09-10")
forecastedvalues = forecastnew.loc[mask]

# fitted values over the training period, up to and including 2018-03-25
mask = (forecastnew['ds'] > "2015-01-04") & (forecastnew['ds'] <= "2018-03-25")
forecastnew = forecastnew.loc[mask]
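
- The same split can also be written around a single cutoff date instead of hard-coded date strings. This is only a sketch on a made-up forecast table; the names toy_forecast, cutoff, in_sample, and out_of_sample are hypothetical, and the yhat values are invented.

```python
import pandas as pd

# Made-up stand-in for Prophet's output: one row per date with a prediction (yhat)
toy_forecast = pd.DataFrame({
    "ds": pd.date_range("2018-03-22", periods=7, freq="D"),
    "yhat": [1.30, 1.31, 1.32, 1.33, 1.34, 1.35, 1.36],
})

# Hypothetical cutoff: the last date the model was trained on
cutoff = pd.Timestamp("2018-03-25")

# Rows on or before the cutoff are fitted values on known dates
in_sample = toy_forecast[toy_forecast["ds"] <= cutoff]

# Rows after the cutoff are genuine out-of-sample predictions
out_of_sample = toy_forecast[toy_forecast["ds"] > cutoff]

print(len(in_sample), "fitted rows,", len(out_of_sample), "predicted rows")
```

- In the notebook itself, the cutoff could be taken from the training frame with df['ds'].max(), so the split keeps working if the data or the forecast horizon changes.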

## Final Results

- Our model predicts that prices, after rising, will go down in the coming years.

In [None]:
fig, ax1 = plt.subplots(figsize=(16, 8))
ax1.plot(forecastnew.set_index('ds'), color='b')
ax1.plot(forecastedvalues.set_index('ds'), color='r')
ax1.set_ylabel('Average Prices')
ax1.set_xlabel('Date')
print("Red = Predicted Values, Blue = Fitted Values on the Training Period")

## 5 - Source
<a id="ch4"></a>
- [Explore avocados from all sides!](https://www.kaggle.com/hely333/explore-avocados-from-all-sides)
- [A Guide to Time Series Forecasting with Prophet in Python 3](https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-prophet-in-python-3)
- [Time Series Analysis in Python: An Introduction](https://towardsdatascience.com/time-series-analysis-in-python-an-introduction-70d5a5b1d52a)

**Thank You For Reading, All Feedback Is Welcome!**