# Contents
1. ### ABC XYZ analysis description
2. ### Data preparation
3. ### Analysis
4. ### Result

## ***What is ABC XYZ analysis?***
#### This is an ABC XYZ analysis for e-commerce. It combines two inventory analyses, ABC and XYZ. The goal is to shape a sales strategy for each product.
#### ABC analysis classifies products by the amount of profit or gross revenue they generate, i.e. it shows which goods bring in the most money.
* #### Group A: the most important products, which should always be present in the assortment. This group includes the most profitable products.
* #### Group B: products of medium importance.
* #### Group C: the least important products. These are candidates for exclusion from the assortment, as well as new products.
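
A minimal sketch of the ABC step in pandas, assuming the profit of each product has already been aggregated into a Series. The function name `abc_classify` is hypothetical. The 80% / 95% cumulative cut-offs follow the conventional 80/15/5 split and are an assumption; the analysis below picks its own cut-offs.

```python
import pandas as pd

def abc_classify(profit: pd.Series, a_max: float = 80.0, b_max: float = 95.0) -> pd.DataFrame:
    """Assign ABC classes by cumulative share of total profit (Pareto principle).

    profit: positive profit per product, indexed by product name.
    a_max, b_max: cumulative-share cut-offs in percent (assumed 80 and 95 here).
    """
    df = profit.sort_values(ascending=False).to_frame('profit')
    df['share'] = df['profit'] / df['profit'].sum() * 100   # contribution of each product, %
    df['cum_share'] = df['share'].cumsum()                   # running total, %
    # Right-closed, non-overlapping bins: (0, a_max] -> A, (a_max, b_max] -> B, above b_max -> C
    df['ABC'] = pd.cut(df['cum_share'], bins=[0, a_max, b_max, float('inf')],
                       labels=['A', 'B', 'C'])
    return df

# Invented example data
profit = pd.Series({'p1': 600, 'p2': 250, 'p3': 90, 'p4': 40, 'p5': 20})
print(abc_classify(profit)['ABC'].tolist())  # ['A', 'B', 'B', 'C', 'C']
```

Using `pd.cut` with right-closed bins keeps the three ranges from overlapping, so every product gets exactly one class.
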
#### XYZ analysis is a tool that divides products according to the stability of their sales and the level of fluctuation in consumption. For each item, the method calculates the coefficient of variation of its consumption, i.e. the standard deviation divided by the mean. This coefficient shows how much consumption deviates from its average value and is expressed as a percentage. The XYZ analysis groups goods into three categories based on how stable their behavior is:
* #### Category X: goods with stable consumption that can be forecast accurately.
* #### Category Y: goods with seasonal fluctuations and average forecasting capability.
* #### Category Z: goods with irregular consumption and unpredictable fluctuations, so their demand cannot be reliably forecast.
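
A minimal sketch of the XYZ step, assuming one row per product and one column per month. The function name `xyz_classify` is hypothetical, and the cut-offs in the usage line are illustrative assumptions. The sketch uses the population standard deviation (`ddof=0`), which is also the default of `np.std`.

```python
import numpy as np
import pandas as pd

def xyz_classify(sales: pd.DataFrame, x_max: float, y_max: float) -> pd.DataFrame:
    """Assign XYZ classes from the coefficient of variation of periodic sales.

    sales: one row per product, one column per period (e.g. month).
    x_max, y_max: CV cut-offs in percent; CV <= x_max -> X, <= y_max -> Y, else Z.
    """
    # Coefficient of variation, %: population standard deviation / mean * 100
    cv = sales.std(axis=1, ddof=0) / sales.mean(axis=1) * 100
    # Conditions are checked in order; rows matching neither get the default 'Z'
    xyz = np.select([cv <= x_max, cv <= y_max], ['X', 'Y'], default='Z')
    return pd.DataFrame({'cv': cv, 'XYZ': xyz}, index=sales.index)

# Invented monthly sales for three products
sales = pd.DataFrame(
    {'m1': [100, 100, 10], 'm2': [100, 60, 200], 'm3': [100, 140, 30]},
    index=['steady', 'seasonal', 'erratic'],
)
result = xyz_classify(sales, x_max=15, y_max=50)
print(result['XYZ'].tolist())  # ['X', 'Y', 'Z']
```

A product with zero sales in every period gives 0/0 = NaN. NaN fails both comparisons, so in this sketch such a product falls into Z by default.
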
#### The result is a matrix that shows each product's combined ABC/XYZ category. The company then shapes a strategy for each cell.

## Data preparation

### Importing libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import copy
import plotly.graph_objects as go
import cufflinks
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl', offline=True)


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### Reading files

In [None]:
orderdetails = pd.read_csv('../input/ecommerce-data/Order Details.csv')
orderlist = pd.read_csv('../input/ecommerce-data/List of Orders.csv')
target = pd.read_csv('../input/ecommerce-data/Sales target.csv')

### Looking at the dataset with order details

In [None]:
orderdetails.head()

### Checking null and missing values

In [None]:
orderdetails.isnull().sum()

### Looking at the dataset summary

In [None]:
print('Orderdetails info')
orderdetails.info()  # info() prints its report itself and returns None
print('Orderdetails describe')
print(orderdetails.describe())

### Looking at the dataset with orders

In [None]:
orderlist.head()

### Looking at the second dataset summary

In [None]:
print('Orderlist info')
orderlist.info()  # info() prints its report itself and returns None
print('Orderlist describe')
print(orderlist.describe())

### Checking missing values in the second dataset

In [None]:
orderlist.isnull().sum()

### Creating a heatmap of missing values to check whether the 60 missing values are scattered across different rows or form entirely empty rows. As we can see, they are entirely empty rows.

In [None]:
cols = orderlist.columns
df = orderlist[cols].isnull().astype(int)  # 1 = missing, 0 = filled
fig = px.imshow(df,x=df.columns, y=orderlist.index, labels=dict(x="Column", y="Row", color="Missing indicator"))
fig.update_layout(title = 'Missing values map (yellow for missing value cells and blue for filled cells)')
fig.show()
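
The heatmap answers the question visually. The same check can also be done numerically. The sketch below is minimal: the function name `missing_row_summary` is hypothetical, and a small invented frame stands in for `orderlist`.

```python
import numpy as np
import pandas as pd

def missing_row_summary(df: pd.DataFrame) -> dict:
    """Count rows with at least one missing value vs rows that are entirely missing."""
    missing = df.isnull()
    return {
        'rows_with_any_missing': int(missing.any(axis=1).sum()),
        'rows_all_missing': int(missing.all(axis=1).sum()),
    }

# Invented stand-in for orderlist with two fully empty rows
toy = pd.DataFrame({
    'Order ID': ['B-1', np.nan, 'B-2', np.nan],
    'Order Date': ['01-04-2018', np.nan, '02-04-2018', np.nan],
})
print(missing_row_summary(toy))  # {'rows_with_any_missing': 2, 'rows_all_missing': 2}
# When both counts are equal, every incomplete row is completely empty,
# so dropna() and dropna(how='all') remove exactly the same rows.
print(toy.dropna().equals(toy.dropna(how='all')))  # True
```
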

### Dropping empty rows from the second dataset

In [None]:
orderlist = orderlist.dropna()

## Analysis

### Joining 2 datasets for analysis

In [None]:
orders = orderdetails.merge(orderlist[['Order ID','Order Date']], left_on='Order ID',right_on='Order ID', how='inner')
orders.head()
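
The inner join above silently drops any order line whose `Order ID` has no match in the order list. It would also duplicate lines if an `Order ID` appeared twice there. The sketch below shows how to check both with the `validate` and `indicator` options of pandas' `merge`. It runs on toy frames whose column names are taken from the dataset; the values are invented.

```python
import pandas as pd

# Invented stand-ins for orderdetails and orderlist
details = pd.DataFrame({'Order ID': ['B-1', 'B-1', 'B-2', 'B-3'],
                        'Amount': [100, 50, 70, 20]})
orders_list = pd.DataFrame({'Order ID': ['B-1', 'B-2'],
                            'Order Date': ['01-04-2018', '02-04-2018']})

# validate='many_to_one' raises pandas.errors.MergeError if an Order ID
# appears more than once in the right-hand frame (the order list).
# indicator=True adds a '_merge' column showing where each row was found.
check = details.merge(orders_list, on='Order ID', how='left',
                      validate='many_to_one', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', 'Order ID'].unique()
print(unmatched)  # Order IDs that an inner join would silently drop
```
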

### Creating a copy of the merged dataset

In [None]:
orders_copy = orders.copy()  # DataFrame.copy() makes a deep copy by default

### Creating a column with the year and month of each order (YYYY-MM) to track monthly sales of goods

In [None]:
orders_copy['Date'] = orders_copy['Order Date'].apply(lambda x:x[6::]+'-'+x[3:5])
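
The string slicing above assumes every `Order Date` uses the `dd-mm-yyyy` layout. Any other layout silently produces wrong keys. Below is a sketch of a stricter alternative that assumes the same layout. `pd.to_datetime` with an explicit `format` raises an error on malformed dates, and `.dt.strftime('%Y-%m')` produces the same `YYYY-MM` keys. The sample dates are invented.

```python
import pandas as pd

# Assumption: 'Order Date' is day-month-year text such as '01-04-2018'
dates = pd.Series(['01-04-2018', '15-11-2018', '31-03-2019'])

parsed = pd.to_datetime(dates, format='%d-%m-%Y')  # raises on dates that do not match
year_month = parsed.dt.strftime('%Y-%m')           # same 'YYYY-MM' key as the slicing

print(year_month.tolist())  # ['2018-04', '2018-11', '2019-03']
```
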

### Creating a dataset with monthly profit by product sub-category

In [None]:
orderssub = orders_copy.groupby(['Sub-Category', 'Date']).Profit.sum().unstack().reset_index()
orderssub.head()
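
With `groupby(...).sum().unstack()`, a sub-category that has no orders in some month gets `NaN` in that cell. Row totals are unaffected, because pandas skips `NaN` when summing (`skipna=True` by default). However, the later coefficient of variation is then computed over fewer months instead of counting that month as zero.

If a missing month should mean zero profit or sales (an assumption that depends on the data), `pivot_table` with `fill_value=0` builds the same layout with zeros. Selecting month columns by name also avoids relying on positional slices such as `iloc[:, 1:14]`. Here is a sketch on invented data:

```python
import pandas as pd

# Invented stand-in for orders_copy: 'Chairs' has no orders in 2018-05
orders = pd.DataFrame({
    'Sub-Category': ['Chairs', 'Chairs', 'Phones', 'Phones', 'Phones'],
    'Date': ['2018-04', '2018-04', '2018-04', '2018-05', '2018-05'],
    'Profit': [10, -4, 30, 20, 5],
})

# Same layout as groupby(...).sum().unstack(), but months without orders become 0
monthly = orders.pivot_table(index='Sub-Category', columns='Date',
                             values='Profit', aggfunc='sum', fill_value=0)
month_cols = list(monthly.columns)  # select the month columns by name, not by position
monthly['Profit Summary'] = monthly[month_cols].sum(axis=1)
print(monthly.reset_index())
```
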

### Creating a column with the total profit of each sub-category over the whole period

In [None]:
orderssub['Profit Summary'] = orderssub.iloc[:,1:14].sum(axis=1)

In [None]:
orderssub

### Finding out which sub-categories cause losses: these are Electronic games and Tables

In [None]:
orderssub.loc[orderssub['Profit Summary'] < 0]['Sub-Category'].unique()

In [None]:
fig = go.Figure(data=[go.Bar(x=orderssub[orderssub['Profit Summary'] < 0]['Sub-Category'], y=orderssub[orderssub['Profit Summary'] < 0]['Profit Summary'])])
# Customize aspect
fig.update_traces(marker_color='rgb(255,0,17)', marker_line_color='rgb(173,0,0)',
                  marker_line_width=1.5, opacity=0.6)
fig.update_layout(title_text='Non profitable sub-categories: Loss')
fig.show()

In [None]:
orderssub[orderssub['Profit Summary'] < 0]

### Dropping sub-categories that cause losses

In [None]:
orderssub = orderssub.loc[orderssub['Profit Summary'] > 0]
orderssub = orderssub.reset_index(drop=True)

### Creating new columns. Profit share shows each sub-category's contribution to total profit (in %); Cumulative share is its running total

In [None]:
orderssub['Profit share'] = orderssub['Profit Summary']/orderssub['Profit Summary'].values.sum()*100
orderssub = orderssub.sort_values('Profit share', ascending = False)
orderssub['Cumulative share'] = orderssub['Profit share'].cumsum()
orderssub

In [None]:
fig = px.bar(orderssub, y='Profit Summary', x='Sub-Category', title='Profit by sub-category', color='Cumulative share')
fig.show()

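### Defining ABC categories. The classic split gives A to goods that bring 80% of profit, B to the next 15% and C to the last 5%. Here, A covers up to 83% of cumulative profit share, B from 83% to 95% and C above 95%.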

In [None]:
# Non-overlapping ranges of cumulative profit share, %
orderssub['ABC'] = ''
orderssub.loc[orderssub['Cumulative share'] <= 83, 'ABC'] = 'A'
orderssub.loc[(orderssub['Cumulative share'] > 83) & (orderssub['Cumulative share'] <= 95), 'ABC'] = 'B'
orderssub.loc[orderssub['Cumulative share'] > 95, 'ABC'] = 'C'
orderssub

In [None]:
abc_profit = orderssub.groupby('ABC', as_index=False)['Profit Summary'].sum()
fig = px.pie(abc_profit, values='Profit Summary', names='ABC', title='Profit by ABC')
fig.show()

### Creating a dataset with monthly sales (Amount) by sub-category

In [None]:
ordersubb = orders_copy.groupby(['Sub-Category', 'Date']).Amount.sum().unstack().reset_index()
ordersubb.head()

### Dropping the loss-making sub-categories identified above

In [None]:
ordersubb = ordersubb[ordersubb['Sub-Category'].isin(orderssub.loc[orderssub['Profit Summary'] > 0]['Sub-Category'].unique())]

### Creating a new column with the total sales (gross) of each sub-category over the whole period

In [None]:
ordersubb['Summary'] = ordersubb.iloc[:,1:14].sum(axis=1)

In [None]:
ordersubb = ordersubb.sort_values('Summary', ascending = False)
ordersubb

### Creating a column with the coefficient of variation of monthly sales. It shows how stable (or volatile) sales are

In [None]:
# Coefficient of variation, %: population std (np.std default ddof=0) / mean over the monthly columns
ordersubb['variation'] = ordersubb.iloc[:, 1:13].apply(lambda x: np.std(x) / np.mean(x) * 100, axis=1)

In [None]:
ordersubb.sort_values('variation')

### Total sales by sub-category, colored by volatility (coefficient of variation)

In [None]:
fig = px.bar(ordersubb, y='Summary', x='Sub-Category', title='Sales by sub-category', color='variation')
fig.show()

### Monthly sub-category sales visualization

In [None]:
fig = go.Figure()
month_cols = ordersubb.columns[1:13]  # the monthly sales columns
for _, row in ordersubb.iterrows():
    # One line per sub-category
    fig.add_trace(go.Scatter(x=month_cols, y=row[month_cols].astype(float),
                             mode='lines+markers', name=row['Sub-Category']))
fig.update_layout(title='Monthly sub-category sales')
fig.show()

### Defining XYZ categories. In the classic approach, X covers sales with a volatility of 5-15%, Y covers 15-50% and Z covers more than 50%. Here, however, the minimum volatility is 44%, so the categories are assigned relative to that minimum: up to 50% is X, 50-71% is Y and above 71% is Z.

In [None]:
# Non-overlapping ranges of the coefficient of variation, %
ordersubb['XYZ'] = ''
ordersubb.loc[ordersubb['variation'] <= 50, 'XYZ'] = 'X'
ordersubb.loc[(ordersubb['variation'] > 50) & (ordersubb['variation'] <= 71), 'XYZ'] = 'Y'
ordersubb.loc[ordersubb['variation'] > 71, 'XYZ'] = 'Z'
ordersubb

### Joining results of ABC and XYZ

In [None]:
subb = orderssub[['Sub-Category','ABC']].merge(ordersubb[['Sub-Category','XYZ']], right_on = 'Sub-Category', left_on = 'Sub-Category', how = 'inner')
subb

## Result

In [None]:
table = subb.groupby(['ABC','XYZ'])['Sub-Category'].unique().unstack(level=-1)
table
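
The table above lists which sub-categories sit in each cell. To see how many fall into each cell, `pd.crosstab` is a compact option. The sketch below uses a small toy subset whose cell assignments follow the result discussion. The `Cell` column is a hypothetical addition.

```python
import pandas as pd

# Toy stand-in for subb: one row per sub-category with its ABC and XYZ class
subb = pd.DataFrame({
    'Sub-Category': ['Phones', 'Printers', 'Stole', 'Saree', 'Skirt'],
    'ABC': ['A', 'A', 'A', 'C', 'C'],
    'XYZ': ['X', 'Y', 'Y', 'X', 'Y'],
})

# Number of sub-categories in each ABC/XYZ cell; combinations not present show 0
counts = pd.crosstab(subb['ABC'], subb['XYZ'])
# Combined label such as 'AX', handy for mapping each item to a stock policy
subb['Cell'] = subb['ABC'] + subb['XYZ']
print(counts)
print(subb[['Sub-Category', 'Cell']])
```
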

### This is the final matrix, which shows how to build the stock management system.
* ### Phones are in the **AX** cell: they are profitable and fairly stable compared with the other items of the matrix. The company must ensure their constant availability, but it does not need to create an excess safety stock for them.
* ### Printers, Stole and Handkerchief are in the **AY** cell: they are profitable but more volatile. To ensure constant availability, the company needs to increase their safety stock.
* ### Bookcases, Accessories and Trousers are in the **AZ** cell: they are profitable but highly volatile, so no reliable sales forecast can be built for them. Trying to guarantee the availability of every item in this group purely through excess safety stock would significantly increase the company's average stock. Therefore, the ordering system for this group should be revised:
    * #### move part of the goods to an ordering system with a constant order quantity (volume);
    * #### arrange more frequent deliveries;
    * #### choose suppliers located close to the warehouse, thereby reducing the required safety stock.
* ### Shirts and Furnishings are in the **BY** cell: they are less profitable than category A and fairly volatile. Despite high turnover, their demand is not stable enough. To ensure constant availability, the safety stock needs to be increased.
* ### T-shirt is in the **BZ** cell: it is moderately profitable and highly volatile. Like the AZ group, it requires a well-thought-out stock management system.
* ### Saree is in the **CX** cell: it is not very profitable but stable. For goods of this group, the company can use an ordering system with a constant order frequency and reduce the safety stock.
* ### Skirt is in the **CY** cell: it is not very profitable and it is volatile. For goods of this group, the company can switch to a system with a constant order quantity (volume). At the same time, it should form a safety stock that matches its financial capabilities.
* ### Chairs, Leggings and Kurti are in the **CZ** cell: they bring little profit and have no stable demand. The CZ group typically includes new goods, goods of spontaneous demand, goods delivered to order, etc. Some of these can be painlessly removed from the assortment. The rest need regular monitoring, because this group is where illiquid or hard-to-sell stock arises and causes losses. Leftovers of goods taken to order or no longer produced should be removed from the assortment, as they usually turn into dead stock.

### Thank you so much for reading! If you found this notebook useful or interesting, please upvote it.