Hello 🙌, welcome to my notebook. In this notebook we will explore avocado price data and build time-series visualizations. Feel free to reach out if you have any questions or suggestions. Thank you!

# Task & Description

- Context: It is a well-known fact that Millennials LOVE Avocado Toast. It's also a well-known fact that all Millennials live in their parents' basements. Clearly, they aren't buying homes because they are buying too much Avocado Toast! But maybe there's hope… if a Millennial could find a city with cheap avocados, they could live out the Millennial American Dream.

- Content: This data was downloaded from the Hass Avocado Board website in May of 2018 & compiled into a single CSV. Here's how the Hass Avocado Board describes the data on their website:

- The table below represents weekly 2018 retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers' cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLU's) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table. Some relevant columns in the dataset:

    1. Date - The date of the observation
    2. AveragePrice - the average price of a single avocado
    3. type - conventional or organic
    4. year - the year
    5. region - the city or region of the observation
    6. Total Volume - Total number of avocados sold
    7. 4046 - Total number of avocados with PLU 4046 sold
    8. 4225 - Total number of avocados with PLU 4225 sold
    9. 4770 - Total number of avocados with PLU 4770 sold


- Inspiration

    1. In which cities can millenials have their avocado toast AND buy a home?

    2. Was the Avocadopocalypse of 2017 real?
    
    3. Do the price, volume, and type of avocado change over time?

    4. Does the number of avocados sold differ across regions and cities?

    5. Is there a preference for certain avocado sizes? Did 2017 change those preferences?

# Data Importing & Preview

In [None]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

data1 = pd.read_csv('../input/avocado-prices/avocado.csv')

In [None]:
data1.head()

In [None]:
data1.info()

In [None]:
data1.shape

In [None]:
data1.describe()

In [None]:
data1.describe(include=['object'])

In [None]:
'''Missing Value Chart'''
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
data1.isnull().mean(axis=0).plot.barh()
plt.title("Ratio of missing values per columns")

In [None]:
data1.isnull().values.sum() #total missing values

In [None]:
'''Checking Duplicate'''

print('Duplicate entries: {}'.format(data1.duplicated().sum()))
# data1.drop_duplicates(inplace=True)

In [None]:
obj = data1.dtypes[data1.dtypes == "object"].index.tolist()
print(obj)

In [None]:
'''Nunique Columns'''

def nunique_counts(data):
   for i in data.columns:
       count = data[i].nunique()
       print(i, ": ", count)
    
nunique_counts(data1)

In [None]:
data1['Date'] = pd.to_datetime(data1['Date'])

In [None]:
data1.rename(columns={'year':'Year'}, inplace=True)  # rename column

In [None]:
# Rename the PLU columns so their meaning is explicit
data1.rename(columns={'4046': 'PLU 4046',
                      '4225': 'PLU 4225',
                      '4770': 'PLU 4770'}, inplace=True)

In [None]:
data1['Year'] = data1['Date'].dt.year
data1['Month'] = data1['Date'].dt.month
data1['Day'] = data1['Date'].dt.day

In [None]:
data1.head()

In [None]:
data1.info()

In [None]:
data1['Year'].min(), data1['Year'].max()

In [None]:
data1.drop('Unnamed: 0', axis=1, inplace=True)

In [None]:
data1['Total Volume'].sum() == data1['PLU 4046'].sum() + data1['PLU 4225'].sum() + data1['PLU 4770'].sum()

In [None]:
data1['Total Volume'].sum(), data1['PLU 4046'].sum() + data1['PLU 4225'].sum() + data1['PLU 4770'].sum()

In [None]:
data1['Total Revenue'] = data1['AveragePrice'] * data1['Total Volume']
data1['Total Revenue PLU 4046'] = data1['AveragePrice'] * data1['PLU 4046']
data1['Total Revenue PLU 4225'] = data1['AveragePrice'] * data1['PLU 4225']
data1['Total Revenue PLU 4770'] = data1['AveragePrice'] * data1['PLU 4770']
data1['Total Revenue Small Bags'] = data1['AveragePrice'] * data1['Small Bags']
data1['Total Revenue Large Bags'] = data1['AveragePrice'] * data1['Large Bags']
data1['Total Revenue XLarge Bags'] = data1['AveragePrice'] * data1['XLarge Bags']

In [None]:
data1.head()

- The data has 18249 rows and 14 features
- After checking for missing values and duplicates, the data looks ready to use
- The Date column is converted to datetime format so it can be used for time-series analysis later
- Year, Month and Day are also extracted from Date into separate columns
- I wanted to find out whether Total Volume is the sum of the PLU 4046, 4225 and 4770 volumes, and the two totals are not the same
- Finally, a bit of feature engineering estimates revenue as AveragePrice multiplied by volume, for the total volume and for each PLU and bag size. Because AveragePrice is a single average per row, the per-PLU and per-bag figures are approximations
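The volume check above compares two floating-point sums with `==`, which can fail on rounding alone. A minimal sketch of a row-wise check with a tolerance follows. The sample values are made up, and the idea that bag volumes account for the gap (through a `Total Bags` column) is an assumption to verify against the real CSV, not something this notebook has confirmed.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample using the notebook's renamed columns.
# "Total Bags" is ASSUMED to exist in the CSV; values are illustrative only.
sample = pd.DataFrame({
    "Total Volume": [100.0, 250.5],
    "PLU 4046": [40.0, 100.0],
    "PLU 4225": [30.0, 80.5],
    "PLU 4770": [10.0, 20.0],
    "Total Bags": [20.0, 50.0],
})

# Row-wise sum of the three PLU volume columns.
plu_sum = sample[["PLU 4046", "PLU 4225", "PLU 4770"]].sum(axis=1)

# Compare with a tolerance instead of exact float equality.
matches_plu_only = np.isclose(sample["Total Volume"], plu_sum)
matches_with_bags = np.isclose(sample["Total Volume"], plu_sum + sample["Total Bags"])

print("share of rows matching PLU sum only:", matches_plu_only.mean())
print("share of rows matching PLU sum + bags:", matches_with_bags.mean())
```

Running the same comparison on `data1` would show, row by row, whether the hypothesis holds for the real data.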

# Volume by Avocado Type

In [None]:
import plotly.express as px
import plotly.offline as py 
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go 
import plotly.tools as tools
import warnings
from collections import Counter
from plotly.subplots import make_subplots

custom_aggregation = {}
custom_aggregation["Total Volume"] = "sum"
custom_aggregation["PLU 4046"] = "sum"
custom_aggregation["PLU 4225"] = "sum"
custom_aggregation["PLU 4770"] = "sum"

data2 = data1.groupby("type").agg(custom_aggregation)
data2['Avocado Type'] = data2.index

fig = make_subplots(rows=1, 
                    cols=2, 
                    specs=[[{'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=('Total Volume',
                                    'PLU 4046'))


labels = data2['Avocado Type'].tolist()
values = data2['Total Volume'].tolist()
values1 = data2['PLU 4046'].tolist()
values2 = data2['PLU 4225'].tolist()
values3 = data2['PLU 4770'].tolist()

fig.add_trace(go.Pie(
                    labels=labels,
                    values=values, 
                    name=''),
                    1,1)

fig.add_trace(go.Pie(
                    labels=labels,
                    values=values1, 
                    name=''),
                    1,2)

fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig['layout'].update(height=400, 
                     width=800, 
                     title='Volume by Avocado Type',
                    legend_title="Type")
fig.show()

In [None]:
fig = make_subplots(rows=1, 
                    cols=2, 
                    specs=[[{'type':'domain'}, {'type':'domain'}]],
                    subplot_titles=('PLU 4225',
                                    'PLU 4770'))

values2 = data2['PLU 4225'].tolist()
values3 = data2['PLU 4770'].tolist()

fig.add_trace(go.Pie(
                    labels=labels,
                    values=values2, 
                    name=''),
                    1,1)

fig.add_trace(go.Pie(
                    labels=labels,
                    values=values3, 
                    name=''),
                    1,2)

fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig['layout'].update(height=400, 
                     width=800, 
                     title='Volume by Avocado Type',
                    legend_title="Type")
fig.show()

- Looking at total volume (overall and by PLU), the majority of avocados sold are conventional. Organic avocados make up a very small share, roughly 1%-2% of all avocados sold
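The shares read off the pie charts can also be computed directly. This is a small sketch on made-up numbers; on the real data the same two lines would be applied to `data1`.

```python
import pandas as pd

# Made-up volumes, only to illustrate the calculation.
sample = pd.DataFrame({
    "type": ["conventional", "conventional", "organic", "organic"],
    "Total Volume": [900.0, 1050.0, 20.0, 30.0],
})

# Sum volume per type, then express each type as a percentage of the total.
volume_by_type = sample.groupby("type")["Total Volume"].sum()
share_pct = (volume_by_type / volume_by_type.sum() * 100).round(1)

print(share_pct)
```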

# Total Revenue by Avocado Type

In [None]:
custom_aggregation = {}
custom_aggregation["Total Revenue"] = "sum"
data2 = data1.groupby("type").agg(custom_aggregation)
data2['Avocado Type'] = data2.index

labels = data2['Avocado Type'].tolist()
values = data2['Total Revenue'].tolist()

fig = px.pie(data2, values=values, names=labels)
fig.update_traces(textposition='inside')
fig.update_layout(uniformtext_minsize=12, uniformtext_mode='hide')
fig['layout'].update(height=400, width=800, title='Total Revenue by Avocado Type', legend_title="Type:")
fig.show()

- As with volume, conventional avocados generate almost all of the revenue compared to organic avocados

# Type & Total Volume on Each Bags

In [None]:
custom_aggregation = {}
custom_aggregation["Small Bags"] = "sum"
custom_aggregation["Large Bags"] = "sum"
custom_aggregation["XLarge Bags"] = "sum"

data2 = data1.groupby("type").agg(custom_aggregation)
data2['Avocado Type'] = data2.index

sb = go.Bar(
    x = data2['Avocado Type'].value_counts().index.sort_values(),
    y = data2["Small Bags"],
    name='Small Bags')

lb = go.Bar(
    x = data2['Avocado Type'].value_counts().index.sort_values(),
    y = data2["Large Bags"],
    name='Large Bags')

xlb = go.Bar(
    x = data2['Avocado Type'].value_counts().index.sort_values(),
    y = data2["XLarge Bags"],
    name='XLarge Bags')


data = [sb, lb, xlb]

fig = make_subplots(rows=1, 
                          cols=1)

fig.append_trace(sb, 1, 1)
fig.append_trace(lb, 1, 1)
fig.append_trace(xlb, 1, 1)

fig['layout'].update(height=400, 
                     width=800, 
                     title='Type & Total Volume on Each Bags',
                    xaxis_title="Avocado Type",
                    yaxis_title="Total Volume",
                    legend_title="Bags:")

py.iplot(fig, filename='combined-savings')

- This chart shows that most avocados, whether conventional or organic, are sold in Small Bags.

# Total Volume by Region

In [None]:
custom_aggregation = {}
custom_aggregation["Total Volume"] = "sum"
data2 = data1.groupby("region").agg(custom_aggregation)
data2['Region'] = data2.index

fig = px.bar(data2, x='Region', y="Total Volume", color="Region")
fig['layout'].update(height=400, width=800, title='Total Volume by Region')
fig.show()

- Most of the avocado volume is sold in the West, South Central and California regions
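One caveat when ranking regions: the `region` column may mix individual cities with broader aggregates. The sketch below assumes an aggregate label such as `TotalUS` exists (check `data1['region'].unique()` first; the label set here is hypothetical) and drops it before ranking. The numbers are made up.

```python
import pandas as pd

# HYPOTHETICAL set of aggregate labels; confirm against the real data.
AGGREGATE_REGIONS = {"TotalUS"}

# Made-up sample rows.
sample = pd.DataFrame({
    "region": ["TotalUS", "West", "Houston", "West"],
    "Total Volume": [1000.0, 300.0, 100.0, 200.0],
})

# Keep only rows whose region is not an aggregate, then rank by volume.
without_aggregates = sample[~sample["region"].isin(AGGREGATE_REGIONS)]
volume_by_region = (
    without_aggregates.groupby("region")["Total Volume"]
    .sum()
    .sort_values(ascending=False)
)

print(volume_by_region)
```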

# Total Revenue by Region

In [None]:
custom_aggregation = {}
custom_aggregation["Total Revenue"] = "sum"
data2 = data1.groupby("region").agg(custom_aggregation)
data2['Region'] = data2.index

fig = px.bar(data2, x='Region', y="Total Revenue", color="Region")
fig['layout'].update(height=400, width=800, title='Total Revenue by Region')
fig.show()

# Average Price by Region

In [None]:
custom_aggregation = {}
custom_aggregation["AveragePrice"] = "mean"
data2 = data1.groupby("region").agg(custom_aggregation)
data2['Region'] = data2.index

fig = px.bar(data2, x='Region', y="AveragePrice", color="Region")
fig['layout'].update(height=400, width=800, title='Average Price by Region')
fig.show()

- The regions with the highest average avocado prices are Hartford Springfield, San Francisco and California, while South Central, Houston and Dallas have the lowest

# Avocado Revenue by Label & Region

In [None]:
custom_aggregation = {}
custom_aggregation["Total Revenue PLU 4046"] = "sum"
custom_aggregation["Total Revenue PLU 4225"] = "sum"
custom_aggregation["Total Revenue PLU 4770"] = "sum"

data2 = data1.groupby("region").agg(custom_aggregation)
data2['Region'] = data2.index

# Split the regions into five contiguous groups, one per subplot row
data3 = data2[:12]
data4 = data2[12:24]
data5 = data2[24:36]
data6 = data2[36:48]
data7 = data2[48:]

plu_1 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_2 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_3 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["Total Revenue PLU 4770"],
    name='PLU 4770')

#------------------------------------------------------------------

plu_4 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_5 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_6 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["Total Revenue PLU 4770"],
    name='PLU 4770')


#------------------------------------------------------------------

plu_7 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_8 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_9 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["Total Revenue PLU 4770"],
    name='PLU 4770')

#------------------------------------------------------------------

plu_10 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_11 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_12 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["Total Revenue PLU 4770"],
    name='PLU 4770')

#------------------------------------------------------------------

plu_13 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_14 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_15 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["Total Revenue PLU 4770"],
    name='PLU 4770')



data = [plu_1, plu_2, plu_3,plu_4,plu_5,plu_6,
       plu_7,plu_8,plu_9,plu_10,plu_11,plu_12,
       plu_13,plu_14,plu_15]

fig = make_subplots(rows=5, 
                          cols=1)

fig.append_trace(plu_1, 1, 1)
fig.append_trace(plu_2, 1, 1)
fig.append_trace(plu_3, 1, 1)

fig.append_trace(plu_4, 2, 1)
fig.append_trace(plu_5, 2, 1)
fig.append_trace(plu_6, 2, 1)

fig.append_trace(plu_7, 3, 1)
fig.append_trace(plu_8, 3, 1)
fig.append_trace(plu_9, 3, 1)

fig.append_trace(plu_10, 4, 1)
fig.append_trace(plu_11, 4, 1)
fig.append_trace(plu_12, 4, 1)

fig.append_trace(plu_13, 5, 1)
fig.append_trace(plu_14, 5, 1)
fig.append_trace(plu_15, 5, 1)


fig['layout'].update(height=1700, 
                     width=900, 
                     title='Avocado Revenue by Label & Region',
                     showlegend=False)

py.iplot(fig, filename='combined-savings')

- The idea here is to find out which avocado type (PLU) each region tends to choose
- Across the US overall, the market mostly chooses PLU 4046 over PLU 4225, while PLU 4770 accounts for a very small share
- In certain regions, however, the market prefers PLU 4225 over PLU 4046, for example Baltimore, Chicago, Great Lakes and Hartford Springfield
- This is useful information to consider when optimizing avocado sales in each region
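The regional preference can also be extracted as a table rather than read from the bars. A minimal sketch, assuming a frame shaped like the per-region aggregate above (the revenue values below are made up): `idxmax(axis=1)` returns, for each region, the column with the largest revenue.

```python
import pandas as pd

revenue_cols = [
    "Total Revenue PLU 4046",
    "Total Revenue PLU 4225",
    "Total Revenue PLU 4770",
]

# Made-up per-region revenue, indexed by region like the groupby result.
sample = pd.DataFrame(
    {
        "Total Revenue PLU 4046": [500.0, 120.0, 300.0],
        "Total Revenue PLU 4225": [200.0, 480.0, 310.0],
        "Total Revenue PLU 4770": [10.0, 30.0, 5.0],
    },
    index=pd.Index(["Houston", "Chicago", "Baltimore"], name="region"),
)

# For each region, the name of the column holding the largest revenue.
dominant_plu = sample[revenue_cols].idxmax(axis=1)

# Regions where PLU 4225 brings in the most revenue.
prefers_4225 = dominant_plu[dominant_plu == "Total Revenue PLU 4225"].index.tolist()

print(dominant_plu)
print(prefers_4225)
```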

# Avocado Volume by Bags & Region

In [None]:
custom_aggregation = {}
custom_aggregation["Small Bags"] = "sum"
custom_aggregation["Large Bags"] = "sum"
custom_aggregation["XLarge Bags"] = "sum"

data2 = data1.groupby("region").agg(custom_aggregation)
data2['Region'] = data2.index

# Split the regions into five contiguous groups, one per subplot row
data3 = data2[:12]
data4 = data2[12:24]
data5 = data2[24:36]
data6 = data2[36:48]
data7 = data2[48:]

plu_1 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["Small Bags"],
    name='Small Bags')

plu_2 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["Large Bags"],
    name='Large Bags')

plu_3 = go.Bar(
    x = data3['Region'].value_counts().index.sort_values(),
    y = data3["XLarge Bags"],
    name='XLarge Bags')

#------------------------------------------------------------------

plu_4 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["Small Bags"],
    name='Small Bags')

plu_5 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["Large Bags"],
    name='Large Bags')

plu_6 = go.Bar(
    x = data4['Region'].value_counts().index.sort_values(),
    y = data4["XLarge Bags"],
    name='XLarge Bags')


#------------------------------------------------------------------

plu_7 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["Small Bags"],
    name='Small Bags')

plu_8 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["Large Bags"],
    name='Large Bags')

plu_9 = go.Bar(
    x = data5['Region'].value_counts().index.sort_values(),
    y = data5["XLarge Bags"],
    name='XLarge Bags')

#------------------------------------------------------------------

plu_10 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["Small Bags"],
    name='Small Bags')

plu_11 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["Large Bags"],
    name='Large Bags')

plu_12 = go.Bar(
    x = data6['Region'].value_counts().index.sort_values(),
    y = data6["XLarge Bags"],
    name='XLarge Bags')

#------------------------------------------------------------------

plu_13 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["Small Bags"],
    name='Small Bags')

plu_14 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["Large Bags"],
    name='Large Bags')

plu_15 = go.Bar(
    x = data7['Region'].value_counts().index.sort_values(),
    y = data7["XLarge Bags"],
    name='XLarge Bags')


data = [plu_1, plu_2, plu_3,plu_4,plu_5,plu_6,
       plu_7,plu_8,plu_9,plu_10,plu_11,plu_12,
       plu_13,plu_14,plu_15]

fig = make_subplots(rows=5, 
                          cols=1)

fig.append_trace(plu_1, 1, 1)
fig.append_trace(plu_2, 1, 1)
fig.append_trace(plu_3, 1, 1)

fig.append_trace(plu_4, 2, 1)
fig.append_trace(plu_5, 2, 1)
fig.append_trace(plu_6, 2, 1)

fig.append_trace(plu_7, 3, 1)
fig.append_trace(plu_8, 3, 1)
fig.append_trace(plu_9, 3, 1)

fig.append_trace(plu_10, 4, 1)
fig.append_trace(plu_11, 4, 1)
fig.append_trace(plu_12, 4, 1)

fig.append_trace(plu_13, 5, 1)
fig.append_trace(plu_14, 5, 1)
fig.append_trace(plu_15, 5, 1)


fig['layout'].update(height=1700, 
                     width=900, 
                     title='Avocado Volume by Bags & Region',
                     showlegend=False)

py.iplot(fig, filename='combined-savings')

- Most avocados are sold in Small Bags, possibly because these serve the retail market.
- In several regions, such as Dayton and Denver, the market prefers Large Bags over Small Bags
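The five-panel charts split the per-region table by hand, which makes it easy to skip a row at a slice boundary. A small helper can produce contiguous slices instead. This is a sketch: `split_into_panels` is a hypothetical name, not part of pandas or plotly, and the sample frame is made up.

```python
import pandas as pd

def split_into_panels(frame, panel_size):
    """Return consecutive row slices of at most `panel_size` rows each."""
    return [
        frame.iloc[start:start + panel_size]
        for start in range(0, len(frame), panel_size)
    ]

# Made-up frame with ten "regions".
sample = pd.DataFrame(
    {"Small Bags": range(10)},
    index=[f"region_{i:02d}" for i in range(10)],
)

panels = split_into_panels(sample, panel_size=4)
print([len(panel) for panel in panels])  # [4, 4, 2]
```

Each element of `panels` could then be passed to the same `go.Bar` calls in a loop, one subplot row per slice.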

# Monthly Avocado Price & Revenue

In [None]:
fig = make_subplots(rows=2, 
                    cols=1,
                    subplot_titles=(['Monthly Avocado Price',
                                    'Monthly Total Revenue']))

custom_aggregation = {}
custom_aggregation["AveragePrice"] = "mean"
custom_aggregation["Total Revenue"] = "sum"

data2 = data1.set_index(pd.DatetimeIndex(data1['Date']))
data0 = data2.resample('M').agg(custom_aggregation)
data0.columns = ["Average Price", 'Total Revenue']
data0['Date'] = data0.index

x = data0['Date'].tolist()
y = data0['Average Price'].tolist()
y2 = data0['Total Revenue'].tolist()

fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='darkgreen', width=2)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='green', width=2)), 
                         2, 1)


fig['layout'].update(height=600, 
                     width=900,
                    showlegend=False)

fig['layout']['xaxis']['title']='Time'
fig['layout']['xaxis2']['title']='Time'
fig['layout']['yaxis']['title']='Price of 1 Avocado ($)'
fig['layout']['yaxis2']['title']='Total Revenue ($)'

fig.show()

- Avocado prices reached their lowest points in May 2016 and Feb 2017, and their highest in October 2016 and Sep 2017
- At a glance, the avocado price looks cyclical: it rises from the beginning of the year to a peak around September, then falls until roughly February to May.
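One way to probe the cyclical pattern is to average the price by calendar month across all years. The sketch below uses made-up weekly prices that were built to peak in October, so it only illustrates the mechanics, not the real result.

```python
import pandas as pd

# Two years of made-up weekly observations.
sample = pd.DataFrame({"Date": pd.date_range("2015-01-04", periods=104, freq="W")})

# Synthetic price: highest in October, lower the further the month is from it.
month = sample["Date"].dt.month
sample["AveragePrice"] = 1.5 - 0.05 * (month - 10).abs()

# Average price per calendar month, pooled over all years.
monthly_profile = sample.groupby(sample["Date"].dt.month)["AveragePrice"].mean()

print(monthly_profile)
print("highest month:", monthly_profile.idxmax())
print("lowest month:", monthly_profile.idxmin())
```

On the real data, a profile with a clear single peak and trough would support the seasonal reading of the chart.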

In [None]:
fig = make_subplots(rows=2, 
                    cols=1,
                    subplot_titles=(['Monthly Avocado Price',
                                    'Monthly Total Revenue']))

custom_aggregation = {}
custom_aggregation["AveragePrice"] = "mean"
custom_aggregation["Total Revenue"] = "sum"
data_ = data1.loc[data1['type'] == 'conventional']
data__ = data1.loc[data1['type'] == 'organic']

data2 = data_.set_index(pd.DatetimeIndex(data_['Date']))
data0 = data2.resample('M').agg(custom_aggregation)
data0['Date'] = data0.index

x = data0['Date'].tolist()
y1 = data0['AveragePrice'].tolist()
y2 = data0['Total Revenue'].tolist()

fig.add_trace(go.Scatter(x=x, 
                         y=y1,
                         line=dict(color='darkgreen', width=2),
                         name='Conventional Price'), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='darkgreen', width=2),
                         name='Conventional Revenue'), 
                         2, 1)


#--------------------------------------------------------------------------

data2 = data__.set_index(pd.DatetimeIndex(data__['Date']))
data0 = data2.resample('M').agg(custom_aggregation)
data0['Date'] = data0.index

x = data0['Date'].tolist()
y1 = data0['AveragePrice'].tolist()
y2 = data0['Total Revenue'].tolist()

fig.add_trace(go.Scatter(x=x, 
                         y=y1,
                         line=dict(color='blue', width=2),
                         name='Organic Price'), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='blue', width=2),
                         name='Organic Revenue'), 
                         2, 1)

#----------------------------------------------------------------------------

fig['layout'].update(height=1000, 
                     width=900,
                    showlegend=True)

fig['layout']['xaxis']['title']='Time'
fig['layout']['xaxis2']['title']='Time'

fig['layout']['yaxis']['title']='Avocado Price ($)'
fig['layout']['yaxis2']['title']='Total Revenue ($)'

fig.show()



- From the chart, conventional and organic avocado prices do not seem to differ significantly
- The majority of revenue comes from conventional avocados rather than organic ones
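A visual comparison of two price lines can be misleading, so a numeric summary per type is a useful cross-check. A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up prices, only to show the calculation.
sample = pd.DataFrame({
    "type": ["conventional", "conventional", "organic", "organic"],
    "AveragePrice": [1.00, 1.20, 1.50, 1.70],
})

# Summary statistics of the price for each avocado type.
price_by_type = sample.groupby("type")["AveragePrice"].agg(["mean", "median", "std"])

# Difference between the mean organic and mean conventional price.
gap = price_by_type.loc["organic", "mean"] - price_by_type.loc["conventional", "mean"]

print(price_by_type)
print("mean price gap (organic - conventional):", round(gap, 2))
```

Applying the same `groupby` to `data1` would show whether the gap between the two types is really negligible.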

In [None]:
fig = make_subplots(rows=3, 
                    cols=1,
                    subplot_titles=(['Monthly PLU 4046 Total Revenue',
                                    'Monthly PLU 4225 Total Revenue',
                                    'Monthly PLU 4770 Total Revenue']))

custom_aggregation = {}
custom_aggregation["Total Revenue PLU 4046"] = "sum"
custom_aggregation["Total Revenue PLU 4225"] = "sum"
custom_aggregation["Total Revenue PLU 4770"] = "sum"

data2 = data1.set_index(pd.DatetimeIndex(data1['Date']))
data0 = data2.resample('M').agg(custom_aggregation)
data0.columns = ["Total Revenue PLU 4046",'Total Revenue PLU 4225','Total Revenue PLU 4770']
data0['Date'] = data0.index

x = data0['Date'].tolist()
y = data0['Total Revenue PLU 4046'].tolist()
y2 = data0['Total Revenue PLU 4225'].tolist()
y3 = data0['Total Revenue PLU 4770'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='darkgreen', width=2)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='blue', width=2)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='black', width=2)), 
                         3, 1)

fig['layout'].update(height=1000, 
                     width=900,
                    showlegend=False)

fig['layout']['xaxis']['title']='Time'
fig['layout']['xaxis2']['title']='Time'
fig['layout']['xaxis3']['title']='Time'

fig['layout']['yaxis']['title']='Total Revenue PLU 4046 ($)'
fig['layout']['yaxis2']['title']='Total Revenue PLU 4225 ($)'
fig['layout']['yaxis3']['title']='Total Revenue PLU 4770 ($)'

fig.show()

In [None]:
fig = make_subplots(rows=3, 
                    cols=1,
                    subplot_titles=(['Monthly Small Bags Total Revenue',
                                    'Monthly Large Bags Total Revenue',
                                    'Monthly XLarge Bags Total Revenue']))

custom_aggregation = {}
custom_aggregation["Total Revenue Small Bags"] = "sum"
custom_aggregation["Total Revenue Large Bags"] = "sum"
custom_aggregation["Total Revenue XLarge Bags"] = "sum"

data2 = data1.set_index(pd.DatetimeIndex(data1['Date']))
data0 = data2.resample('M').agg(custom_aggregation)
data0.columns = ["Total Revenue Small Bags",'Total Revenue Large Bags','Total Revenue XLarge Bags']
data0['Date'] = data0.index

x = data0['Date'].tolist()
y = data0['Total Revenue Small Bags'].tolist()
y2 = data0['Total Revenue Large Bags'].tolist()
y3 = data0['Total Revenue XLarge Bags'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='darkgreen', width=2)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='blue', width=2)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='black', width=2)), 
                         3, 1)

fig['layout'].update(height=1000, 
                     width=900,
                    showlegend=False)

fig['layout']['xaxis']['title']='Time'
fig['layout']['xaxis2']['title']='Time'
fig['layout']['xaxis3']['title']='Time'

fig['layout']['yaxis']['title']='Total Revenue Small Bags ($)'
fig['layout']['yaxis2']['title']='Total Revenue Large Bags ($)'
fig['layout']['yaxis3']['title']='Total Revenue XLarge Bags ($)'

fig.show()

- An interesting finding: revenue from XLarge Bags peaks in July every year and then declines until December
- Could it be that market segments that need avocados in large volumes, such as companies and restaurants, tend to buy in July? This needs further inspection
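The July peak can be checked directly by finding the month with the highest XLarge Bags revenue in each year. A sketch on made-up numbers, using the `Year` and `Month` columns created earlier in the notebook:

```python
import pandas as pd

# Made-up monthly revenue for two years.
sample = pd.DataFrame({
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Month": [6, 7, 8, 6, 7, 8],
    "Total Revenue XLarge Bags": [10.0, 30.0, 20.0, 15.0, 40.0, 25.0],
})

# Year x Month table of summed revenue.
revenue_table = sample.pivot_table(
    index="Year",
    columns="Month",
    values="Total Revenue XLarge Bags",
    aggfunc="sum",
)

# For each year, the month (column label) with the largest revenue.
peak_month = revenue_table.idxmax(axis=1)

print(peak_month)
```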

# Avocado Monthly Revenue by Avocado Type

In [None]:
custom_aggregation = {}
custom_aggregation["Total Revenue PLU 4046"] = "sum"
custom_aggregation["Total Revenue PLU 4225"] = "sum"
custom_aggregation["Total Revenue PLU 4770"] = "sum"

data_ = data1.loc[data1['Year'] == 2015]
data__ = data1.loc[data1['Year'] == 2016]
data___ = data1.loc[data1['Year'] == 2017]

#--------------------------------------------------------------------------------------------------------
fig = make_subplots(rows=3, 
                          cols=1,
                        specs=[[{'type':'bar'}],
                              [{'type':'bar'}],
                              [{'type':'bar'}]],
                       subplot_titles=(['2015',
                                        '2016',
                                       '2017']))

data2 = data_.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_1 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_2 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_3 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4770"],
    name='PLU 4770')

fig.append_trace(plu_1, 1, 1)
fig.append_trace(plu_2, 1, 1)
fig.append_trace(plu_3, 1, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue PLU 4046'].tolist()
y2 = data2['Total Revenue PLU 4225'].tolist()
y3 = data2['Total Revenue PLU 4770'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         1, 1)

#--------------------------------------------------------------------------------------------------------


data2 = data__.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_4 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_5 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_6 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4770"],
    name='PLU 4770')

fig.append_trace(plu_4, 2, 1)
fig.append_trace(plu_5, 2, 1)
fig.append_trace(plu_6, 2, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue PLU 4046'].tolist()
y2 = data2['Total Revenue PLU 4225'].tolist()
y3 = data2['Total Revenue PLU 4770'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         2, 1)

#--------------------------------------------------------------------------------------------------------

data2 = data___.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_7 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4046"],
    name='PLU 4046')

plu_8 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4225"],
    name='PLU 4225')

plu_9 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue PLU 4770"],
    name='PLU 4770')

fig.append_trace(plu_7, 3, 1)
fig.append_trace(plu_8, 3, 1)
fig.append_trace(plu_9, 3, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue PLU 4046'].tolist()
y2 = data2['Total Revenue PLU 4225'].tolist()
y3 = data2['Total Revenue PLU 4770'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         3, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         3, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         3, 1)

#--------------------------------------------------------------------------------------------------------


data = [plu_1, plu_2, plu_3,
       plu_4,plu_5,plu_6,
       plu_7,plu_8,plu_9]

fig['layout'].update(height=900, 
                     width=900, 
                     title='Avocado Monthly Revenue by Avocado Type in:',
                     showlegend=False)

fig['layout']['xaxis']['title']='Month'
fig['layout']['xaxis2']['title']='Month'
fig['layout']['xaxis3']['title']='Month'

fig['layout']['yaxis']['title']='Total Revenue ($)'
fig['layout']['yaxis2']['title']='Total Revenue ($)'
fig['layout']['yaxis3']['title']='Total Revenue ($)'

py.iplot(fig, filename='combined-savings')

- From Aug 2015 to Aug 2016, market preference shifted toward PLU 4225 relative to PLU 4046 (the red and blue lines get closer).
- From Aug 2016 to Dec 2017, market preference for PLU 4046 and PLU 4225 was roughly balanced.
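
The "lines getting closer" reading can also be checked numerically by turning monthly revenue into shares and looking at the gap between two PLUs. The snippet below is only a minimal, dependency-free sketch: the helper names `revenue_shares` and `share_gap` are hypothetical, and the numbers are made up for illustration, not taken from the dataset.

```python
def revenue_shares(monthly_revenue):
    """Convert per-month revenue totals into per-month shares.

    monthly_revenue maps month -> {category_name: revenue}.
    Returns month -> {category_name: share of that month's total}.
    """
    shares = {}
    for month, by_category in monthly_revenue.items():
        total = sum(by_category.values())
        shares[month] = {
            name: (revenue / total if total else 0.0)
            for name, revenue in by_category.items()
        }
    return shares


def share_gap(shares, name_a, name_b):
    """Share of name_a minus share of name_b, per month."""
    return {month: s[name_a] - s[name_b] for month, s in shares.items()}


# Made-up example values (NOT from the avocado data).
example = {
    1: {"PLU 4046": 600.0, "PLU 4225": 300.0, "PLU 4770": 100.0},
    2: {"PLU 4046": 450.0, "PLU 4225": 450.0, "PLU 4770": 100.0},
}

shares = revenue_shares(example)
gap = share_gap(shares, "PLU 4046", "PLU 4225")
print(gap)
```

A gap that shrinks toward zero over the months would support the claim that preference between the two PLUs is becoming balanced.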

# Avocado Monthly Revenue by Avocado Bags

In [None]:
custom_aggregation = {}
custom_aggregation["Total Revenue Small Bags"] = "sum"
custom_aggregation["Total Revenue Large Bags"] = "sum"
custom_aggregation["Total Revenue XLarge Bags"] = "sum"

data_ = data1.loc[data1['Year'] == 2015]
data__ = data1.loc[data1['Year'] == 2016]
data___ = data1.loc[data1['Year'] == 2017]

#--------------------------------------------------------------------------------------------------------
fig = tools.make_subplots(rows=3, 
                          cols=1,
                        specs=[[{'type':'bar'}],
                              [{'type':'bar'}],
                              [{'type':'bar'}]],
                       subplot_titles=(['2015',
                                        '2016',
                                       '2017']))

data2 = data_.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_1 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Small Bags"],
    name='Small Bags')

plu_2 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Large Bags"],
    name='Large Bags')

plu_3 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue XLarge Bags"],
    name='XLarge Bags')

fig.append_trace(plu_1, 1, 1)
fig.append_trace(plu_2, 1, 1)
fig.append_trace(plu_3, 1, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue Small Bags'].tolist()
y2 = data2['Total Revenue Large Bags'].tolist()
y3 = data2['Total Revenue XLarge Bags'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         1, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         1, 1)

#--------------------------------------------------------------------------------------------------------


data2 = data__.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_4 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Small Bags"],
    name='Small Bags')

plu_5 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Large Bags"],
    name='Large Bags')

plu_6 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue XLarge Bags"],
    name='XLarge Bags')

fig.append_trace(plu_4, 2, 1)
fig.append_trace(plu_5, 2, 1)
fig.append_trace(plu_6, 2, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue Small Bags'].tolist()
y2 = data2['Total Revenue Large Bags'].tolist()
y3 = data2['Total Revenue XLarge Bags'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         2, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         2, 1)

#--------------------------------------------------------------------------------------------------------

data2 = data___.groupby("Month").agg(custom_aggregation)
data2['Month'] = data2.index

plu_7 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Small Bags"],
    name='Small Bags')

plu_8 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue Large Bags"],
    name='Large Bags')

plu_9 = go.Bar(
    x = data2['Month'].value_counts().index.sort_values(),
    y = data2["Total Revenue XLarge Bags"],
    name='XLarge Bags')

fig.append_trace(plu_7, 3, 1)
fig.append_trace(plu_8, 3, 1)
fig.append_trace(plu_9, 3, 1)


x = data2['Month'].tolist()
y = data2['Total Revenue Small Bags'].tolist()
y2 = data2['Total Revenue Large Bags'].tolist()
y3 = data2['Total Revenue XLarge Bags'].tolist()


fig.add_trace(go.Scatter(x=x, 
                         y=y,
                         line=dict(color='blue', width=1)), 
                         3, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y2,
                         line=dict(color='red', width=1)), 
                         3, 1)

fig.add_trace(go.Scatter(x=x, 
                         y=y3,
                         line=dict(color='green', width=1)), 
                         3, 1)

#--------------------------------------------------------------------------------------------------------


data = [plu_1, plu_2, plu_3,
       plu_4,plu_5,plu_6,
       plu_7,plu_8,plu_9]

fig['layout'].update(height=900, 
                     width=900, 
                     title='Avocado Monthly Revenue by Avocado Bags in:',
                     showlegend=False)

fig['layout']['xaxis']['title']='Month'
fig['layout']['xaxis2']['title']='Month'
fig['layout']['xaxis3']['title']='Month'

fig['layout']['yaxis']['title']='Total Revenue ($)'
fig['layout']['yaxis2']['title']='Total Revenue ($)'
fig['layout']['yaxis3']['title']='Total Revenue ($)'

py.iplot(fig, filename='combined-savings')

- For bags, there seems to be no significant change in market preference. Every month, avocados in Small Bags sell more than those in Large and XLarge Bags.
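
One way to back up the "every month" statement without reading it off the chart is to find the bag size with the largest revenue in each month. This is a rough sketch in plain Python; `leading_category` is a hypothetical helper and the values are invented for illustration, not real results.

```python
def leading_category(monthly_revenue):
    """Return month -> name of the category with the highest revenue."""
    return {
        month: max(by_category, key=by_category.get)
        for month, by_category in monthly_revenue.items()
    }


# Made-up example values (NOT from the avocado data).
example_bags = {
    1: {"Small Bags": 500.0, "Large Bags": 200.0, "XLarge Bags": 10.0},
    2: {"Small Bags": 650.0, "Large Bags": 180.0, "XLarge Bags": 15.0},
    3: {"Small Bags": 700.0, "Large Bags": 260.0, "XLarge Bags": 12.0},
}

leaders = leading_category(example_bags)

# True only if Small Bags lead in every month of the example.
small_bags_always_lead = all(name == "Small Bags" for name in leaders.values())
print(leaders, small_bags_always_lead)
```

Applied to the real monthly totals, a single leader across all months would confirm that the preference does not change.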

# Conclusion

1. In which cities can millennials have their avocado toast AND buy a home?

    - Regions with low avocado prices are South Central, Houston, and Dallas.
    
    
2. Do the price, volume, and type of avocado change over time?

    - Yes. Avocado prices reached their lowest points in May 2016 and Feb 2017, and their highest points in October 2016 and Sep 2017. At a glance, the price looks cyclical: it rises from the beginning of the year until around September, when it peaks for the year, and then falls until Feb to May.


3. Does the number of avocados sold differ across regions and cities?

    - Yes, it does. Regions with high avocado sales are West, South Central, and California.
    

4. Is there a preference for certain avocado sizes? Did 2017 change those preferences?

    - Yes. From Aug 2015 to Aug 2016, market preference shifted toward PLU 4225 relative to PLU 4046, and from Aug 2016 to Dec 2017 preference for PLU 4046 and PLU 4225 was roughly balanced.


5. Was the Avocadopocalypse of 2017 real?

    - I think not; it looks normal, because avocado prices are cyclical, and in 2017 they reached their highest point. Many factors affect avocado prices, especially in the US. We would need to check where the avocados come from (imported from another country, or grown on US farms?). Climate also plays a significant role in avocado prices.
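
The cyclical-price argument above rests on when each year's lowest and highest prices occur. A minimal sketch of that check follows, assuming the data can be reduced to `(year, month, average_price)` records; the function name `price_extremes_by_year` is hypothetical and the records are made up for illustration.

```python
from collections import defaultdict


def price_extremes_by_year(records):
    """Find the months with the lowest and highest mean price in each year.

    records is an iterable of (year, month, average_price) tuples.
    """
    # Accumulate [sum of prices, number of observations] per (year, month).
    sums = defaultdict(lambda: [0.0, 0])
    for year, month, price in records:
        acc = sums[(year, month)]
        acc[0] += price
        acc[1] += 1

    # Mean price per month, grouped by year.
    by_year = defaultdict(dict)
    for (year, month), (total, count) in sums.items():
        by_year[year][month] = total / count

    return {
        year: {
            "lowest_month": min(means, key=means.get),
            "highest_month": max(means, key=means.get),
        }
        for year, means in by_year.items()
    }


# Made-up example records (NOT from the avocado data).
example_records = [
    (2016, 1, 1.3), (2016, 5, 1.0), (2016, 5, 1.2), (2016, 10, 1.6),
    (2017, 2, 1.1), (2017, 6, 1.4), (2017, 9, 1.8), (2017, 9, 1.9),
]

extremes = price_extremes_by_year(example_records)
print(extremes)
```

If the lowest and highest months fall in similar parts of the calendar from year to year, that is consistent with a seasonal cycle rather than a one-off shock.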


That's it. Don't forget to upvote! Thank you! :)