In this notebook I present several useful plot types built with matplotlib, plus a heatmap that shows the correlation between some of the variables in the 'Avocado prices' dataset.

I chose this dataset because it contains useful information about volumes sold, the bag formats in which avocados were sold, and prices.

The dataset spans four years, 2015-2018, but 2018 covers only three months, so I removed it to keep the remaining years comparable in terms of period.

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


*Importing the dataset and having a look at it.*

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

avocado_db = pd.read_csv('/kaggle/input/avocado-prices/avocado.csv')
avocado_db.head()

*It's interesting to see how sales evolve by year, so I group by year and remove the incomplete year, 2018. (Note that the date column does not survive the aggregation in a usable form; I'm not using it right now, so I leave it that way.)*

In [None]:
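# Sum the numeric columns per year; numeric_only=True skips text columns such as dates.
# Summed averages (for example prices) are not meaningful; only volume and bag columns are used below.
avocado_db = avocado_db.groupby('year').sum(numeric_only=True).reset_index()
# Drop 2018, which covers only part of the year.
avocado_db = avocado_db[avocado_db['year'] < 2018]
# round() returns a new frame, so call it last to display the rounded totals.
avocado_db.round(0)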
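
The decision to drop 2018 rests on how many months each year covers, and that can be checked directly. The following is a minimal sketch on a synthetic frame rather than the real file; the `Date` column name and the weekly frequency are assumptions made for illustration.

```python
import pandas as pd

# Synthetic stand-in for the real data: weekly rows, 2017 complete, 2018 ending in March.
# The 'Date' column name is an assumption about the CSV layout.
dates = pd.date_range('2017-01-01', '2018-03-25', freq='7D')
sample = pd.DataFrame({'Date': dates, 'Total Volume': 1.0})

# Count the distinct calendar months observed in each year.
months_per_year = (
    sample.assign(year=sample['Date'].dt.year, month=sample['Date'].dt.month)
          .groupby('year')['month']
          .nunique()
)

# Keep only the years in which all 12 months are present.
complete_years = months_per_year[months_per_year == 12].index.tolist()
print(months_per_year)
print(complete_years)
```

Run against the real data (before the yearly aggregation), the same check would show which years are safe to compare with each other.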

1. PLOT - plotting volume of sales per year

In [None]:
year = avocado_db['year'].astype(str)
volume = avocado_db['Total Volume']/10000

plt.title('Volume by year')
plt.xlabel('year')
plt.ylabel('volume (/ 10k)')
plt.plot(year,volume)

2. SCATTER PLOT

In [None]:
plt.title('Volume by year')
plt.xlabel('year')
plt.ylabel('volume (/ 10k)')
plt.scatter(year,volume, alpha = 0.5)

3. BAR CHARTS

In [None]:
plt.bar(year, volume, align='center', alpha=0.5)

4. MULTIPLE PLOTS - plotting volume per year per type of bag

In [None]:

# Explicit imports replace the star imports, which would silently shadow
# builtins such as sum, min and max. numpy is used in the next section.
import numpy as np

plt.title('Sales per type of bag by year')
plt.xlabel('year')
plt.ylabel('volume (/ 10k)')
small_bags = avocado_db['Small Bags']/10000
large_bags = avocado_db['Large Bags']/10000
xlarge_bags = avocado_db['XLarge Bags']/10000
plt.plot(year, small_bags, c='k', label='small bags')
plt.plot(year, large_bags, c='b', label='large bags')
plt.plot(year, xlarge_bags, c='r', label='very large bags')
plt.legend()

5. BAR CHARTS WITH MULTIPLE X VARIABLES

In [None]:
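import numpy as np

N = 3                # number of years on the x axis
ind = np.arange(N)   # x position of each year's group
width = 0.25         # three bars per group must fit within one unit

plt.bar(ind, small_bags, width, label='small')
plt.bar(ind + width, large_bags, width, label='large')
# Offset the third series by two widths so it does not cover the large bags.
plt.bar(ind + 2 * width, xlarge_bags, width, label='xlarge')

plt.ylabel('volume (/ 10k)')
plt.title('Sales per type of bag by year')

# ind + width is the centre of each three-bar group.
plt.xticks(ind + width, ('2015', '2016', '2017'))
plt.legend(loc='best')
plt.show()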
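
The offsets in a grouped bar chart are easy to get wrong by hand, so it can help to compute them. This is a small sketch of one way to do it; `grouped_bar_positions` and its `group_width` parameter are hypothetical names introduced here, not part of matplotlib.

```python
import numpy as np

def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return (positions, bar_width, tick_positions) for a grouped bar chart.

    positions[i] holds the x coordinates of series i, one per group.
    """
    bar_width = group_width / n_series
    base = np.arange(n_groups)
    # Shift each series by one bar width so the bars sit side by side.
    positions = [base + i * bar_width for i in range(n_series)]
    # Centre the tick under the middle of each group.
    tick_positions = base + bar_width * (n_series - 1) / 2
    return positions, bar_width, tick_positions

positions, bar_width, ticks = grouped_bar_positions(n_groups=3, n_series=3, group_width=0.75)
```

Each `positions[i]` can then be passed to `plt.bar` together with `bar_width`, and `ticks` to `plt.xticks`, which keeps the layout correct when the number of series changes.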

6. STACKED BAR CHARTS



In [None]:
years = ['2015', '2016', '2017']

ind = [x for x, _ in enumerate(years)]

plt.bar(ind, small_bags, width=0.8, label='small', color='gold', bottom=large_bags+xlarge_bags)
plt.bar(ind, large_bags, width=0.8, label='large', color='silver', bottom=xlarge_bags)
plt.bar(ind, xlarge_bags, width=0.8, label='xlarge', color='#CD853F')

plt.xticks(ind, years)
plt.ylabel("sales")
plt.xlabel("years")
plt.legend(loc="upper left")
plt.title("Sales per year")

plt.show()
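
In a stacked bar chart, the `bottom` of each series is the sum of every series drawn beneath it. With more than a few series, writing those sums by hand gets error-prone; a cumulative sum gives the same result. The sketch below uses made-up numbers, not values from the dataset.

```python
import numpy as np

# Made-up values for three series over three years (not taken from the dataset).
series = np.array([
    [1.0, 2.0, 3.0],   # drawn first, sits on the axis
    [4.0, 5.0, 6.0],   # stacked on the first series
    [7.0, 8.0, 9.0],   # stacked on the first two
])

# bottoms[i] is the sum of all series drawn before series i.
bottoms = np.cumsum(series, axis=0) - series

# The total height of each stacked bar.
totals = series.sum(axis=0)
```

Passing `series[i]` as the heights and `bottoms[i]` as `bottom=` to `plt.bar` would reproduce the stacking for any number of series.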

7. HEATMAPS - showcasing correlations
For this part I use the variables 4046, 4225 and 4770, which indicate different kinds of avocados. It's interesting to see whether some kinds are more strongly correlated with specific types of bags than others.

In [None]:
avocado_db2 = pd.read_csv('/kaggle/input/avocado-prices/avocado.csv')
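# dropna returns a new frame, so assign the result; thresh=1 drops only rows that are entirely empty.
avocado_db2 = avocado_db2.dropna(thresh=1)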
avocados_4225 = avocado_db2[['4225','Large Bags','Small Bags','XLarge Bags']]
avocados_4046 = avocado_db2[['4046','Large Bags','Small Bags','XLarge Bags']]
avocados_4770 = avocado_db2[['4770','Large Bags','Small Bags','XLarge Bags']]

4225

In [None]:
import seaborn as sns
correlation = avocados_4225.corr(method = 'pearson')
ax = sns.heatmap(correlation)

*The correlation seems fairly strong and positive; this type of avocado seems most correlated with the small bags.*

4046

In [None]:
correlation = avocados_4046.corr(method = 'pearson')
ax = sns.heatmap(correlation)

4770

In [None]:
correlation = avocados_4770.corr(method = 'pearson')
ax = sns.heatmap(correlation)

*4046 and 4770 both seem more correlated with the small bags, but 4770 has a weaker relationship than 4046.*
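
Comparing colours across separate heatmaps is imprecise, so it can help to read the coefficients as numbers. The sketch below does that on synthetic data, constructed so that one column tracks `Small Bags` closely and the other only loosely; the values are random and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: '4046' follows 'Small Bags' closely, '4770' only loosely.
rng = np.random.default_rng(0)
small = rng.normal(size=200)
sample = pd.DataFrame({
    '4046': small + rng.normal(scale=0.1, size=200),
    '4770': small + rng.normal(scale=2.0, size=200),
    'Small Bags': small,
})

# One Pearson correlation matrix for all columns at once.
corr = sample.corr(method='pearson')

# Rank the avocado columns by their correlation with 'Small Bags'.
ranking = corr['Small Bags'].drop('Small Bags').sort_values(ascending=False)
print(ranking)
```

On the real data, the same ranking could be computed for each bag column, and `sns.heatmap(corr, annot=True)` would print the coefficients inside the cells.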