
In [1]:
import pandas as pd
from warnings import filterwarnings

filterwarnings(action='ignore', category=FutureWarning)

SALES = '/kaggle/input/europe-sales-records/Europe Sales Records.csv'

df = pd.read_csv(filepath_or_buffer=SALES, parse_dates=['Order Date', 'Ship Date']).drop(columns=['Region'])
df['year'] = df['Order Date'].dt.year
df.head()

|   | Country | Item Type | Sales Channel | Order Priority | Order Date | Order ID | Ship Date | Units Sold | Unit Price | Unit Cost | Total Revenue | Total Cost | Total Profit | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Czech Republic | Beverages | Offline | C | 2011-09-12 | 478051030 | 2011-09-29 | 4778 | 47.45 | 31.79 | 226716.1 | 151892.62 | 74823.48 | 2011 |
| 1 | Bosnia and Herzegovina | Clothes | Online | M | 2013-10-14 | 919133651 | 2013-11-04 | 927 | 109.28 | 35.84 | 101302.56 | 33223.68 | 68078.88 | 2013 |
| 2 | Austria | Cereal | Offline | C | 2014-08-13 | 987410676 | 2014-09-06 | 5616 | 205.7 | 117.11 | 1155211.2 | 657689.76 | 497521.44 | 2014 |
| 3 | Bulgaria | Office Supplies | Online | L | 2010-10-31 | 672330081 | 2010-11-29 | 6266 | 651.21 | 524.96 | 4080481.86 | 3289399.36 | 791082.5 | 2010 |
| 4 | Estonia | Fruits | Online | L | 2016-09-28 | 579463422 | 2016-11-01 | 4958 | 9.33 | 6.92 | 46258.14 | 34309.36 | 11948.78 | 2016 |


Let's rank our countries by mean total profit per order.

In [2]:
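from plotly.express import bar

# Mean Total Profit per order for each country, highest first
by_country = df[['Country', 'Total Profit']].groupby(by='Country').mean().reset_index()
bar(data_frame=by_country.sort_values(by='Total Profit', ascending=False),
    x='Country', y='Total Profit', title='Mean total profit by country')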
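Because the chart above averages profit per order, a country with a few very large orders can outrank a country with many moderate ones. Putting the mean, the total, and the order count side by side makes that distinction visible. The sketch below is a minimal pandas version of that comparison; `profit_summary` is a hypothetical helper, and the five-row `sample` frame simply reproduces the `df.head()` rows so the snippet runs on its own (in the notebook, pass the full `df` instead).

```python
import pandas as pd

# Five rows copied from the df.head() output above; in the notebook, use the full `df`.
sample = pd.DataFrame({
    'Country': ['Czech Republic', 'Bosnia and Herzegovina', 'Austria', 'Bulgaria', 'Estonia'],
    'Total Profit': [74823.48, 68078.88, 497521.44, 791082.5, 11948.78],
})


def profit_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean, total, and order count of Total Profit per country."""
    return (
        frame.groupby('Country')['Total Profit']
        .agg(mean_profit='mean', total_profit='sum', orders='count')
        .sort_values('mean_profit', ascending=False)  # same ranking as the bar chart
    )


summary = profit_summary(sample)
print(summary)
```

On the full data, a country near the top of `mean_profit` but near the bottom of `orders` would suggest its lead comes from a handful of large orders rather than steady volume.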

In [3]:
# Sum Total Profit per country and year; year is numeric, so plotly colors it on a continuous scale
bar(data_frame=df[['Country', 'Total Profit', 'year']].groupby(by=['Country', 'year']).sum().reset_index().sort_values(ascending=True, by=['year', 'Total Profit']),
    x='Country', y='Total Profit', color='year', title='Total profit by country by year')

Weirdly, Andorra is the source of our greatest profit. Let's take another look at profit, this time broken down by factors we might be able to control: in principle, we can increase profits by focusing on the components of profit shown in the breakdowns below.

In [4]:
for breakdown in ['year', 'Sales Channel', 'Order Priority', 'Item Type']:
    # Mean Total Profit per order for each country within each category of the breakdown
    grouped = df[['Country', 'Total Profit', breakdown]].groupby(by=['Country', breakdown]).mean().reset_index()
    bar(data_frame=grouped.sort_values(ascending=True, by=['Total Profit', breakdown]),
        x='Country', y='Total Profit', color=breakdown,
        title='Mean total profit by {}'.format(breakdown), height=900).show()

In [5]:
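from plotly.express import pie

# Each slice is a category's mean Total Profit per order; plotly normalizes the slices to sum to 100%
for breakdown in ['year', 'Sales Channel', 'Order Priority', 'Item Type']:
    pie(data_frame=df[['Total Profit', breakdown]].groupby(by=breakdown).mean().reset_index(),
        names=breakdown, values='Total Profit', title='Mean total profit by {}'.format(breakdown)).show()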

This analysis suggests that we should focus on making more sales in our profit-driving categories, namely Cosmetics, Household, and Office Supplies. The other pie-chart breakdowns don't tell us much:
* The year breakdown shows that our mean profit fluctuates from year to year, but not by much.
* The order priority and sales channel breakdowns describe variables we have little control over: we can choose to invest in marketing for different channels, but our customers will set their own priorities.
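The pie charts above are built from per-order means, so each slice shows how a category's mean compares with the other categories' means, not how total profit is split: a category with large but rare orders still gets a big slice. Deciding where to push more sales arguably needs both views. The following is a sketch of computing them together for any breakdown column; `mean_vs_share` is a hypothetical helper, and the sample rows are the five `df.head()` rows (one per item type), standing in for the full `df`.

```python
import pandas as pd

# Rows from the df.head() output above; in the notebook, use the full `df`.
sample = pd.DataFrame({
    'Item Type': ['Beverages', 'Clothes', 'Cereal', 'Office Supplies', 'Fruits'],
    'Total Profit': [74823.48, 68078.88, 497521.44, 791082.5, 11948.78],
})


def mean_vs_share(frame: pd.DataFrame, breakdown: str) -> pd.DataFrame:
    """Hypothetical helper: mean profit per order vs. share of total profit per category."""
    out = frame.groupby(breakdown)['Total Profit'].agg(mean_profit='mean', total_profit='sum')
    # Fraction of overall profit contributed by each category (sums to 1)
    out['share_of_total'] = out['total_profit'] / out['total_profit'].sum()
    return out.sort_values('share_of_total', ascending=False)


table = mean_vs_share(sample, 'Item Type')
print(table.round(3))
```

If a category ranks high on `mean_profit` but low on `share_of_total`, it earns a lot per order but is ordered rarely, which is exactly where additional sales effort could pay off.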

In [6]:
from plotly.express import imshow
# Pairwise Pearson correlations among the numeric order columns, drawn as a heatmap
imshow(img=df[['Units Sold', 'Unit Price', 'Unit Cost', 'Total Revenue', 'Total Cost', 'Total Profit']].corr())

While maximizing total profit is the obvious business objective, we probably want to focus on predicting units sold. Profit per unit (Unit Price minus Unit Cost) is a hidden variable here: the per-unit cost and profit act as independent variables, and multiplied by Units Sold they produce our totals.
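The relationship described above can be checked directly: Total Revenue = Units Sold × Unit Price, Total Cost = Units Sold × Unit Cost, and Total Profit = Total Revenue − Total Cost, so Total Profit = Units Sold × (Unit Price − Unit Cost). These formulas are inferred from the columns rather than documented for the data set, so the sketch below verifies them on the five `df.head()` rows, where all three hold to the cent. `check_identities` is a hypothetical helper, and the one-cent tolerance is an assumption to absorb floating-point rounding.

```python
import pandas as pd

# The five rows shown by df.head() above (numeric columns only).
sample = pd.DataFrame({
    'Units Sold': [4778, 927, 5616, 6266, 4958],
    'Unit Price': [47.45, 109.28, 205.7, 651.21, 9.33],
    'Unit Cost': [31.79, 35.84, 117.11, 524.96, 6.92],
    'Total Revenue': [226716.1, 101302.56, 1155211.2, 4080481.86, 46258.14],
    'Total Cost': [151892.62, 33223.68, 657689.76, 3289399.36, 34309.36],
    'Total Profit': [74823.48, 68078.88, 497521.44, 791082.5, 11948.78],
})


def check_identities(frame: pd.DataFrame, tol: float = 0.01) -> dict:
    """Hypothetical helper: does each total match its formula to within `tol` on every row?"""
    revenue = frame['Units Sold'] * frame['Unit Price']
    cost = frame['Units Sold'] * frame['Unit Cost']
    profit = frame['Total Revenue'] - frame['Total Cost']
    return {
        'revenue': bool(((frame['Total Revenue'] - revenue).abs() <= tol).all()),
        'cost': bool(((frame['Total Cost'] - cost).abs() <= tol).all()),
        'profit': bool(((frame['Total Profit'] - profit).abs() <= tol).all()),
    }


result = check_identities(sample)
print(result)
```

Running the same check on the full `df` would confirm whether the totals are purely derived columns, which would also explain why they correlate strongly with each other in the heatmap.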

We would like to build a model here, but our available data doesn't seem to contain enough information to drive further insights.
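One way to test whether the data really lacks information for a model: if Unit Price and Unit Cost never vary within an Item Type, then profit per order is fully determined by the item type and the number of units sold, leaving Units Sold as the only quantity with real variation to predict. The sketch below counts distinct prices and costs per item type. `price_variation` is a hypothetical helper, and because the five sample rows contain one row per item type they pass the check trivially, so it has to be run on the full `df` to mean anything.

```python
import pandas as pd

# Rows from the df.head() output above (one per item type); in the notebook, use the full `df`.
sample = pd.DataFrame({
    'Item Type': ['Beverages', 'Clothes', 'Cereal', 'Office Supplies', 'Fruits'],
    'Unit Price': [47.45, 109.28, 205.7, 651.21, 9.33],
    'Unit Cost': [31.79, 35.84, 117.11, 524.96, 6.92],
})


def price_variation(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: number of distinct Unit Price and Unit Cost values per Item Type."""
    return frame.groupby('Item Type')[['Unit Price', 'Unit Cost']].nunique()


variation = price_variation(sample)
# True only if every item type has exactly one price and one cost
fixed_pricing = bool((variation == 1).all().all())
print(variation)
print('fixed pricing per item type:', fixed_pricing)
```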