## Avocado prices


Nature and rationale of the data:

The data represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU’s) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table.


Some relevant columns in the dataset:
* Date - the date of the observation
* AveragePrice - the average price of a single avocado
* type - conventional or organic
* year - the year of the observation
* region - the city or region of the observation
* Total Volume - total number of avocados sold
* 4046 - Total number of avocados with PLU 4046 sold
* 4225 - Total number of avocados with PLU 4225 sold
* 4770 - Total number of avocados with PLU 4770 sold

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
df = pd.read_csv('../input/avocado-prices/avocado.csv')

In [None]:
df

## Exploratory Analysis

In [None]:
df = df.drop(columns = ['Unnamed: 0'])

In [None]:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')

# Analyzing the 'Date' column, we can see that the data is collected once a week, with 108 observations per date.
# Looking at the regions, we find that there are 54 unique regions and 2 types of avocados (54 x 2 = 108).
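
The counts quoted in the comments above can be checked directly. The sketch below uses a tiny made-up frame (the name `toy` and its values are illustrative, not taken from avocado.csv) with `Date`, `type` and `region` columns; the lowercase `region` column name is an assumption about the CSV. Run against the real `df`, the same calls would be expected to report the 108 rows per date, 54 regions and 2 types mentioned above.

```python
import pandas as pd

# Toy stand-in for df: 2 dates x 2 types x 3 regions (made-up values).
toy = pd.DataFrame(
    [(d, t, r)
     for d in ("2018-01-07", "2018-01-14")
     for t in ("conventional", "organic")
     for r in ("Albany", "Boston", "Chicago")],
    columns=["Date", "type", "region"],
)
toy["Date"] = pd.to_datetime(toy["Date"], errors="coerce")

rows_per_date = toy.groupby("Date").size()   # observations per date
n_regions = toy["region"].nunique()          # distinct regions
n_types = toy["type"].nunique()              # distinct avocado types

# Gaps between consecutive distinct dates; a single 7-day value means weekly data.
week_gaps = toy["Date"].drop_duplicates().sort_values().diff().dropna()

print(rows_per_date.unique())
print(n_regions, n_types)
print(week_gaps.unique())
```

If `rows_per_date` holds more than one distinct value on the real data, some dates are missing observations for a region or type.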

In [None]:
df.info()

In [None]:
df

### Types of Avocados:
In this section we will analyze the different types of avocados in this dataset.
There are two types:
* Conventional
* Organic

In [None]:
# Sum only 'Total Volume'; summing every column fails on the datetime 'Date' column in recent pandas
df.groupby('type')['Total Volume'].sum().plot(kind='pie')

The volume of organic avocados represents a small percentage of total avocado sales.
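
To put a number on "small", the share of each type can be computed from the same grouped sum that feeds the pie chart. This is a minimal sketch on made-up volumes (the `toy` frame and its figures are illustrative, not values from the dataset); the same two lines applied to `df` would give the actual shares.

```python
import pandas as pd

# Made-up volumes, for illustration only.
toy = pd.DataFrame({
    "type": ["conventional", "conventional", "organic", "organic"],
    "Total Volume": [900.0, 1000.0, 40.0, 60.0],
})

# Total volume per type, then each type's share of the overall total.
volume_by_type = toy.groupby("type")["Total Volume"].sum()
share_pct = 100 * volume_by_type / volume_by_type.sum()

print(share_pct.round(1))
```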

In [None]:
# List of types
types = ['conventional', 'organic']

# Iterate through the two avocado types
for i in types:
    # Subset to the current type
    subset = df[df['type'] == i]
    
    # Draw the density plot (kdeplot replaces the deprecated distplot)
    sns.kdeplot(subset['AveragePrice'], linewidth = 3, label = i)
    
# Plot formatting
plt.legend(prop={'size': 16}, title = 'Types')
plt.title('Avocado Avg price by Type')
plt.xlabel('Avg. Price')
plt.ylabel('Density')

We can see that organic avocados tend to have higher prices than conventional ones.
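
The density plot can be backed up with summary statistics per type. The sketch below assumes a frame shaped like `df` but uses made-up prices (the `toy` frame is illustrative only); on the real data the same `groupby` call would show by how much the two distributions differ.

```python
import pandas as pd

# Made-up prices, for illustration only.
toy = pd.DataFrame({
    "type": ["conventional"] * 3 + ["organic"] * 3,
    "AveragePrice": [1.0, 1.2, 1.1, 1.5, 1.7, 1.6],
})

# Mean, median and standard deviation of the price for each type.
summary = toy.groupby("type")["AveragePrice"].agg(["mean", "median", "std"])

print(summary)
```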

### Time Series analysis and pattern recognition

In [None]:
conventional = df[df['type'] == 'conventional']
organic = df[df['type'] == 'organic']

import matplotlib.dates as mdates
years = mdates.YearLocator() 
months = mdates.MonthLocator()  # every month
years_fmt = mdates.DateFormatter('%Y')


fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
conventional.set_index('Date').plot(y='AveragePrice', ax=ax1)

# format the ticks
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)
ax1.set_title('CONVENTIONAL')

organic.set_index('Date').plot(y='AveragePrice', ax=ax2, color ='green')

# format the ticks
ax2.xaxis.set_major_locator(years)
ax2.xaxis.set_major_formatter(years_fmt)
ax2.xaxis.set_minor_locator(months)
ax2.set_title('ORGANIC')

In [None]:
# There are about 108 prices per date (54 per type, one for each region), so let's plot the mean.
fig,ax = plt.subplots(figsize=(15,6))
df.groupby(['Date','type'])['AveragePrice'].mean().unstack().plot(ax=ax)
plt.title('Avg price of avocado over time')

Looking at the graph, we can see some seasonal patterns in the data. The maximum price of the year falls around August to October and the minimum in February, which is probably related to the seasonal production of avocados.
On the other hand, the price has been increasing over time.
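
One way to check the seasonal pattern without reading it off the chart is to average the price by calendar month. This is a sketch on made-up prices (the `toy` frame and its values are illustrative, chosen only to mimic the pattern described above); on `df` the same grouping would show which months are actually the most and least expensive.

```python
import pandas as pd

# Made-up weekly prices, for illustration only.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2017-02-05", "2017-06-04", "2017-09-03", "2018-02-04"]),
    "AveragePrice": [1.0, 1.4, 1.8, 1.1],
})

# Mean price per calendar month (1 = January ... 12 = December), pooled over years.
monthly = toy.groupby(toy["Date"].dt.month)["AveragePrice"].mean()

priciest_month = int(monthly.idxmax())
cheapest_month = int(monthly.idxmin())

print(monthly)
print(priciest_month, cheapest_month)
```

Pooling all years into one monthly average mixes the seasonal effect with the long-term trend, so grouping by both `year` and month is a reasonable refinement.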

In [None]:
fig,ax = plt.subplots(figsize=(15,6))
df.groupby(['Date','type'])['Total Volume'].sum().unstack().plot(ax=ax)
plt.title('Total volume of avocados over time')

Comparing this graph with the previous one, the mean price appears to be related to the total volume of avocados sold.
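
The relationship between the two series can be quantified with a correlation coefficient. The sketch below aggregates a made-up frame (the `toy` frame and its values are illustrative; the sign and size of the result on the real data are not assumed here) to one row per date and computes the Pearson correlation between mean price and total volume.

```python
import pandas as pd

# Made-up observations: 3 dates x 2 regions, for illustration only.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-07"] * 2 + ["2018-01-14"] * 2 + ["2018-01-21"] * 2),
    "AveragePrice": [1.0, 1.2, 1.3, 1.5, 1.6, 1.8],
    "Total Volume": [500.0, 400.0, 300.0, 350.0, 200.0, 250.0],
})

# One row per date: mean price across regions and summed volume.
weekly = toy.groupby("Date").agg(
    mean_price=("AveragePrice", "mean"),
    total_volume=("Total Volume", "sum"),
)

# Pearson correlation between the two weekly series.
r = weekly["mean_price"].corr(weekly["total_volume"])

print(weekly)
print(round(r, 3))
```

On the real data it would make sense to run this separately for each `type`, since conventional and organic volumes differ greatly in scale.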

So far, we have worked mainly with the average price, total volume, and type of avocado. The analysis could continue by examining the regions one by one, or by checking how the price is influenced by sales of avocados in bags of different sizes or by the variety of avocado.