# AtliQ Data Analysis project
Author: Do Nam Phong (Mason) Phung   
Last update: 2024 Aug 03

AtliQ is a B2B hardware and peripheral manufacturer headquartered in Mumbai, with many regional branches across India. The company supplies computer and network equipment to other businesses.
   
In the previous quarter, the company reported declining sales, and its sales director is having trouble tracking where the business is falling behind in the local Indian market.

We will help the company examine its sales data, identify the problems and provide suggestions by answering these questions:
- Which products have high/low performance (profit margin)?
- Who are the best customers (those who bring the most profit)?
- How does sales performance change across months/years?
- How do different markets perform?

Tools & software used in the project:
- Local database: MySQL 
- Database management: MySQL Workbench or DBeaver (for its compatibility with macOS on ARM).
- IDE: Visual Studio Code
- Python libraries: sqlalchemy, pandas, numpy (for data mining, data manipulation)

Quick overview:
- I. Initialization: Create a SQLAlchemy engine to run MySQL queries from Python and the notebook. Review the ERD and the data description tables.
- II. Data cleaning: Import the tables from the local MySQL database, clean the data and export it back to the MySQL server.
- III. Data analysis: Run exploratory SQL queries on the datasets to gather insights into the problems.
- IV. Conclusion: Analyze the observed problems and provide suggestions/comments.

## I. Initialization

### ER diagram of the database

![ER diagram](img/erd_sales.png "ER diagram of the imported data")

*Note that the `transactions` table has no primary key. There are 4 foreign keys in this database: `product_code`, `market_code`, `customer_code`, `date`*

### Data descriptions
The tables are described below.

**Transactions**

The table includes the details of the transactions such as product code, customer code, market code, order date, sales amount and profit margin.

| **Variable**             | **Description**                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------|
| product_code             | Identification code of the product                                                                     |
| customer_code            | Identification code of the customer                                                                    |
| market_code              | Identification code of the market                                                                      |
| order_date               | Date of the order                                                                                      |
| sales_qty                | Number of units sold in the order                                                                      |
| sales_amount             | Revenue of the order                                                                                   |
| currency                 | The money currency which was used in the order                                                         |
| profit_margin_percentage | The profit margin as a fraction of the sales amount, calculated as profit_margin / sales_amount (e.g. 0.39 means 39%). |
| profit_margin            | The profit from a transaction or group of transactions, calculated as sales amount minus cost price.   |
| cost_price               | The cost of the order                                                                                  |

**Products**

The table contains the type of each product, keyed by product code.

| **Variable**             | **Description**                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------|
| product_code             | Identification code of the product                                                                     |
| product_type             | The type of the product (own brand - company's own products, or distribution - third-party products)   |


**Markets**

The table contains each market's name and the zone it belongs to, keyed by market code.

| **Variable**             | **Description**                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------|
| markets_code             | Identification code of the market                                                                      |
| markets_name             | Geographic name of the market                                                                          |
| zone                     | The geographic zone to which the market belongs (North/Central/South)                                  |

**Customers**

The table contains each customer's name and store type, keyed by customer code.

| **Variable**             | **Description**                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------|
| customer_code            | Identification code of the customer                                                                    |
| customer_name            | Name of the customer (spelled `custmer_name` in the source table)                                      |
| customer_type            | Customer store type (`Brick & Mortar` or `E-commerce`)                                                                                     |

**Date**

A calendar of dates from 2017-01-01 to 2020-06-30, with several time attributes derived from each date.

| **Variable**             | **Description**                                                                                        |
|--------------------------|--------------------------------------------------------------------------------------------------------|
| date                     | Date in YYYY-MM-DD format                                                                              |
| cy_date                  | First day of the month the date belongs to, in YYYY-MM-DD format                                       |
| year                     | The year of the date in YYYY format                                                                    |
| month_name               | Full month name in text (e.g. `June`)                                                                  |
| date_yy_mmm              | Date in YY-MMM format                                                                                  |

### Load packages and import datasets

Import the required packages.

In [2]:
# SQL
import sqlalchemy
from sqlalchemy import create_engine

# Data manipulation
import pandas as pd
import numpy as np

Create a local MySQL database with Homebrew.

In [107]:
## Install MySQL with homebrew
# brew install mysql
# brew services start mysql

## First login to mysql
# mysql -u root -p

## Create a user with a password (replace the placeholders with your own values)
# CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';

Import the data into the database by running the SQL dump file in DBeaver. Name the database `sales`; after the dump file is loaded, it should include 5 tables:
- `customers`: Information of customers such as name and business type.
- `date`: All dates in the period, from the first transaction date to the last, in several formats.
- `markets`: Name and zone information of each business market by its code.
- `products`: All product codes and their types (Own brand or distribution).
- `transactions`: Sales data of each transaction in the period.

![Imported DBeaver database](img/dbeaver_imported_db.png "Imported DBeaver database")

**Since we are using a Jupyter notebook, I will run SQL through the `%sql` magic and Python's SQLAlchemy.**

In [5]:
# Load SQL extension and create a connection to mysql database
%load_ext sql
%sql mysql+mysqlconnector://root:tttn0711@localhost:3306/sales 
# %sql mysql://username:password@host:port/database_name

# Create an engine as a connector between the database and our editor
engine = create_engine("mysql+mysqlconnector://root:tttn0711@localhost:3306/sales")

**Take a quick look at the `markets` table**

In [109]:
%%sql
SELECT *
FROM markets
LIMIT 5

 * mysql+mysqlconnector://root:***@localhost:3306/sales
5 rows affected.


markets_code,markets_name,zone
Mark001,Chennai,South
Mark002,Mumbai,Central
Mark003,Ahmedabad,North
Mark004,Delhi NCR,North
Mark005,Kanpur,North


**Total number of transactions in `transactions` table**

In [110]:
%%sql
SELECT count(*) as total_transaction
FROM transactions

 * mysql+mysqlconnector://root:***@localhost:3306/sales
1 rows affected.


total_transaction
148395


**USD transactions in `transactions` table**

In [111]:
%%sql
SELECT *
FROM transactions
WHERE currency = 'USD'

 * mysql+mysqlconnector://root:***@localhost:3306/sales
2 rows affected.


product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
Prod003,Cus005,Mark004,2017-11-20,59,500.0,USD,0.31,11625.0,25875.0
Prod003,Cus005,Mark004,2017-11-22,36,250.0,USD,0.17,3187.5,15562.5


- *The data was imported successfully!*
- *Next, we need to inspect the data to make sure it is clean.*

## II. Data cleaning

To clean the data, we first import it from the MySQL server using pandas `read_sql_table` with the engine defined above.

In [48]:
# Tables to be imported
tables = ['transactions', 'products', 'markets', 'customers', 'date']

# Import each table with pandas `read_sql_table` and bind it to a variable of the same name
for table in tables:
    try:
        globals()[table] = pd.read_sql_table(table, con=engine)
        print(f'{table} table imported')
    except Exception as e:
        print(f'Failed to import {table}: {e}')

transactions table imported
products table imported
markets table imported
customers table imported
date table imported


### **0. Take a brief look at all of the tables**

Let's take a look at every table and find possible data issues that need cleaning

In [113]:
transactions.head()

Unnamed: 0,product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
0,Prod279,Cus020,Mark011,2017-10-11,1,102.0,INR,0.39,39.78,62.22
1,Prod279,Cus020,Mark011,2017-10-18,1,102.0,INR,-0.12,-12.24,114.24
2,Prod279,Cus020,Mark011,2017-10-19,1,102.0,INR,0.29,29.58,72.42
3,Prod279,Cus020,Mark011,2017-11-08,1,102.0,INR,0.36,36.72,65.28
4,Prod279,Cus020,Mark011,2018-03-09,1,102.0,INR,-0.35,-35.7,137.7


Possible cleaning checks:
- Since we are dealing with money, all currency-related columns should be expressed in a single currency.
- `sales_qty`, `sales_amount` and `cost_price` should all be greater than 0 (in particular, `sales_qty` should be at least 1).

In [114]:
transactions.describe()

Unnamed: 0,order_date,sales_qty,sales_amount,profit_margin_percentage,profit_margin,cost_price
count,148395,148395.0,148395.0,148395.0,148395.0,148395.0
mean,2019-01-09 14:59:53.086020864,16.370376,6636.433,0.024448,166.15835,6470.649
min,2017-10-04 00:00:00,1.0,5.0,-0.35,-369348.5,3.05
25%,2018-05-15 00:00:00,1.0,176.0,-0.16,-67.32,166.5
50%,2018-12-20 00:00:00,1.0,519.0,0.02,5.55,508.26
75%,2019-08-29 00:00:00,7.0,3065.0,0.21,105.6,2907.13
max,2020-06-26 00:00:00,14049.0,1510944.0,0.4,481775.04,1846742.0
std,,115.394269,30086.49,0.218956,6850.373158,29779.92


The dataset meets this requirement: the minimum `sales_qty` is 1, and the minimum `sales_amount` (5.0) and `cost_price` (3.05) are both greater than 0.
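`describe()` only shows the minimum of each column. A minimal sketch of an explicit rule check, which counts the offending rows for each rule, is shown below. The sample rows are hypothetical; on the real data, `sample` would be replaced by the cleaned `transactions` DataFrame.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real `transactions` table
sample = pd.DataFrame({
    "sales_qty": [1, 3, 0],
    "sales_amount": [102.0, 250.0, 80.0],
    "cost_price": [62.22, -5.0, 70.0],
})

# Each rule maps to a boolean mask; True marks a row that breaks the rule
rules = {
    "sales_qty >= 1": sample["sales_qty"] < 1,
    "sales_amount > 0": sample["sales_amount"] <= 0,
    "cost_price > 0": sample["cost_price"] <= 0,
}

# Count violations per rule (0 everywhere means the table passes)
violations = {rule: int(mask.sum()) for rule, mask in rules.items()}
print(violations)
```

Keeping the rules in a dictionary makes it easy to add further checks (for example on `profit_margin_percentage`) without changing the counting logic.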

In [115]:
products.head()

Unnamed: 0,product_code,product_type
0,Prod001,Own Brand\r
1,Prod002,Own Brand\r
2,Prod003,Own Brand\r
3,Prod004,Own Brand\r
4,Prod005,Own Brand\r


- A carriage return `\r` appears at the end of every `product_type` value. It is most likely left over from Windows-style (CRLF) line endings in the SQL dump, since it only shows up in the last column of a table. We will need to remove it.

In [116]:
markets

Unnamed: 0,markets_code,markets_name,zone
0,Mark001,Chennai,South
1,Mark002,Mumbai,Central
2,Mark003,Ahmedabad,North
3,Mark004,Delhi NCR,North
4,Mark005,Kanpur,North
5,Mark006,Bengaluru,South
6,Mark007,Bhopal,Central
7,Mark008,Lucknow,North
8,Mark009,Patna,North
9,Mark010,Kochi,South


The New York and Paris markets are not needed, as we are focusing on the domestic market. We will need to remove any rows related to these two markets from all tables.

In [117]:
customers.head()

Unnamed: 0,customer_code,custmer_name,customer_type
0,Cus001,Surge Stores,Brick & Mortar
1,Cus002,Nomad Stores,Brick & Mortar
2,Cus003,Excel Stores,Brick & Mortar
3,Cus004,Surface Stores,Brick & Mortar
4,Cus005,Premium Stores,Brick & Mortar


In [50]:
date.head()

Unnamed: 0,date,cy_date,year,month_name,date_yy_mmm
0,2017-06-01,2017-06-01,2017,June,17-Jun
1,2017-06-02,2017-06-01,2017,June,17-Jun
2,2017-06-03,2017-06-01,2017,June,17-Jun
3,2017-06-04,2017-06-01,2017,June,17-Jun
4,2017-06-05,2017-06-01,2017,June,17-Jun


- Remove the `\r` characters found in the last column (`date_yy_mmm`).
- Check whether the time span is correct.
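One way to check the time span, sketched here on hypothetical stand-ins for the `date` and `transactions` tables (`calendar` and `orders` are illustrative names), is to compare their date ranges and look for order dates that have no row in the calendar. It assumes both date columns are already `datetime64`, as `read_sql_table` returned for `order_date` above.

```python
import pandas as pd

# Hypothetical stand-ins: `calendar` for the `date` table, `orders` for `transactions`
calendar = pd.DataFrame({"date": pd.date_range("2017-06-01", "2017-06-10", freq="D")})
orders = pd.DataFrame(
    {"order_date": pd.to_datetime(["2017-06-02", "2017-06-09", "2017-06-15"])}
)

# Compare the overall ranges of the two tables
print("calendar:", calendar["date"].min().date(), "to", calendar["date"].max().date())
print("orders:  ", orders["order_date"].min().date(), "to", orders["order_date"].max().date())

# Order dates that have no matching row in the calendar table
missing_dates = orders.loc[~orders["order_date"].isin(calendar["date"]), "order_date"]
print("order dates missing from the calendar:", len(missing_dates))
```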

### **1. Check for missing data**

In [119]:
# Define a function to report missing data
def report_missing(df):
    """
    Create a dataframe, then calculate the number of null and blank value.
    
    Parameter:
    df (DataFrame)
        The dataframe used to check for missing values
    
    Return:
    completed_report (DataFrame)
        The report table including the number of null, blank values and their percentage in total
    """
    # Total observation count
    total_obs = df.shape[0]
    # Create a dataframe
    missing = pd.DataFrame()
    # Total nulls
    missing['null_count'] = df.isnull().sum()
    # Total blank value
    missing['blank_count'] = [df[df[c].astype(str) == ""][c].count() for c in df.columns]
    # Total missing value
    missing['total_missing'] = missing.sum(axis = 1)
    # Report missing percentage
    missing['null_percent'] = round(100* (missing['null_count']/ total_obs), 2)
    missing['blank_percent'] = round(100* (missing['blank_count']/ total_obs), 2)
    missing['total_missing_percent'] = round(100* (missing['total_missing']/ total_obs), 2)
    
    completed_report = missing.sort_values(
        by = 'total_missing_percent',
        ascending = False
    )
    return completed_report

report_missing(transactions)

Unnamed: 0,null_count,blank_count,total_missing,null_percent,blank_percent,total_missing_percent
product_code,0,0,0,0.0,0.0,0.0
customer_code,0,0,0,0.0,0.0,0.0
market_code,0,0,0,0.0,0.0,0.0
order_date,0,0,0,0.0,0.0,0.0
sales_qty,0,0,0,0.0,0.0,0.0
sales_amount,0,0,0,0.0,0.0,0.0
currency,0,0,0,0.0,0.0,0.0
profit_margin_percentage,0,0,0,0.0,0.0,0.0
profit_margin,0,0,0,0.0,0.0,0.0
cost_price,0,0,0,0.0,0.0,0.0


The `transactions` table has no null or blank values.

### **2. Remove carriage returns '\r' found in output datasets**

In the `products` and `date` datasets, '\r' appears in the values of their last columns. We need to remove these extra characters.

In [120]:
# Remove '\r' characters (replacing them with a space would leave a trailing space, e.g. 'Own Brand ')
products['product_type'] = products['product_type'].str.replace('\r', '', regex=False)
date['date_yy_mmm'] = date['date_yy_mmm'].str.replace('\r', '', regex=False)

products.head()

Unnamed: 0,product_code,product_type
0,Prod001,Own Brand
1,Prod002,Own Brand
2,Prod003,Own Brand
3,Prod004,Own Brand
4,Prod005,Own Brand


In [121]:
date.head()

Unnamed: 0,date,cy_date,year,month_name,date_yy_mmm
0,2017-06-01,2017-06-01,2017,June,17-Jun
1,2017-06-02,2017-06-01,2017,June,17-Jun
2,2017-06-03,2017-06-01,2017,June,17-Jun
3,2017-06-04,2017-06-01,2017,June,17-Jun
4,2017-06-05,2017-06-01,2017,June,17-Jun


Completed!
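The fix above targets two known columns. A more general sketch strips a trailing `\r` from every string column of a table, which would also catch any other column with the same stray character. The helper name `strip_carriage_returns` and the sample rows are hypothetical.

```python
import pandas as pd

def strip_carriage_returns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with trailing '\\r' removed from string columns."""
    cleaned = df.copy()
    for col in cleaned.columns:
        # Only touch columns pandas considers string-typed (skips numbers and dates)
        if pd.api.types.is_string_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].str.rstrip("\r")
    return cleaned

# Hypothetical rows shaped like the `products` table
products = pd.DataFrame({
    "product_code": ["Prod001", "Prod002"],
    "product_type": ["Own Brand\r", "Distribution\r"],
})
products = strip_carriage_returns(products)
print(products["product_type"].tolist())
```

`rstrip("\r")` only removes the character at the end of each value, so legitimate text in the middle of a string is left untouched.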

### **3. Multiple currencies**

The dataset contains two currencies: Indian Rupee (INR) and United States Dollar (USD). We'll convert all USD sales to INR, since INR dominates the dataset.

In [122]:
# Check for the currencies in the dataset
transactions['currency'].unique()

array(['INR', 'USD'], dtype=object)

In [123]:
# Print the rows with `currency = USD`
transactions[transactions['currency'] == 'USD']

Unnamed: 0,product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
135937,Prod003,Cus005,Mark004,2017-11-20,59,500.0,USD,0.31,11625.0,25875.0
135938,Prod003,Cus005,Mark004,2017-11-22,36,250.0,USD,0.17,3187.5,15562.5


- There are two observations with 'USD' as the currency. Note that `profit_margin` and `cost_price` are still in INR; only `sales_amount` is in USD.
- Adding `cost_price` and `profit_margin` gives the `sales_amount` in INR for these two observations. Let's replace their `sales_amount` with this recalculated INR value.
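As a sanity check on this reasoning, the sketch below rebuilds the INR amounts for the two USD rows (values copied from the output above) and derives the implied exchange rate and margin. Both rows imply the same rate of 75 INR per USD, and the recomputed margin matches `profit_margin_percentage` (0.31 and 0.17), which supports treating `cost_price + profit_margin` as the INR sales amount.

```python
import pandas as pd

# The two USD rows from the output above
usd = pd.DataFrame({
    "sales_amount": [500.0, 250.0],        # recorded in USD
    "profit_margin": [11625.0, 3187.5],    # recorded in INR
    "cost_price": [25875.0, 15562.5],      # recorded in INR
})

# Rebuild the INR sales amount from the INR cost and profit
usd["sales_amount_inr"] = usd["cost_price"] + usd["profit_margin"]

# Implied INR-per-USD rate and the margin recomputed on the INR amount
usd["implied_rate"] = usd["sales_amount_inr"] / usd["sales_amount"]
usd["margin_check"] = (usd["profit_margin"] / usd["sales_amount_inr"]).round(2)
print(usd[["sales_amount", "sales_amount_inr", "implied_rate", "margin_check"]])
```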

In [124]:
# Replace any `sales_amount` value with `currency = USD` with the sum of `profit_margin` and `cost_price`
transactions.loc[transactions['currency'] == 'USD', 'sales_amount'] = transactions['profit_margin'] + transactions['cost_price']
# Take a look at the result
transactions[transactions['currency'] == 'USD']

Unnamed: 0,product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
135937,Prod003,Cus005,Mark004,2017-11-20,59,37500.0,USD,0.31,11625.0,25875.0
135938,Prod003,Cus005,Mark004,2017-11-22,36,18750.0,USD,0.17,3187.5,15562.5


**Change the 'USD' currency to 'INR'**

In [125]:
# Replace any currency = `USD` with `INR`
transactions['currency'] = transactions['currency'].replace(['USD'], 'INR')
# Check if there is any `currency = USD` left
transactions[transactions['currency'] == 'USD']

Unnamed: 0,product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price


### **4. Only focus on the domestic markets**

In the `markets` dataset, besides the local Indian markets, there are also two overseas markets: Paris and New York. As we are focusing on the domestic market, we'll remove the rows related to these two markets from the `markets` and `transactions` tables (if there are any).

In [126]:
# Use ~ as logical negation to keep the rows that do not match the criteria
markets = markets[~markets['markets_code'].isin(['Mark097', 'Mark999'])]
markets

Unnamed: 0,markets_code,markets_name,zone
0,Mark001,Chennai,South
1,Mark002,Mumbai,Central
2,Mark003,Ahmedabad,North
3,Mark004,Delhi NCR,North
4,Mark005,Kanpur,North
5,Mark006,Bengaluru,South
6,Mark007,Bhopal,Central
7,Mark008,Lucknow,North
8,Mark009,Patna,North
9,Mark010,Kochi,South


In [127]:
transactions[transactions['market_code'].isin(['Mark097', 'Mark999'])]

Unnamed: 0,product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price


There are no transactions for these two markets, so nothing needs to be removed from `transactions`.
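The same idea generalizes to every foreign key listed under the ERD. Below is a minimal sketch of an anti-join helper (`orphan_keys` is a hypothetical name) built on pandas `merge` with `indicator=True`, shown on toy data.

```python
import pandas as pd

def orphan_keys(child: pd.DataFrame, child_col: str,
                parent: pd.DataFrame, parent_col: str) -> list:
    """Hypothetical helper: distinct child key values with no match in the parent table."""
    merged = child[[child_col]].drop_duplicates().merge(
        parent[[parent_col]].drop_duplicates(),
        how="left",
        left_on=child_col,
        right_on=parent_col,
        indicator=True,  # adds a `_merge` column: 'both' or 'left_only'
    )
    return merged.loc[merged["_merge"] == "left_only", child_col].tolist()

# Toy stand-ins for `transactions` and `markets`
transactions = pd.DataFrame({"market_code": ["Mark001", "Mark002", "Mark097"]})
markets = pd.DataFrame({"markets_code": ["Mark001", "Mark002"]})

result = orphan_keys(transactions, "market_code", markets, "markets_code")
print(result)
```

On the real data, the same call could be repeated for `product_code`/`products`, `customer_code`/`customers` and `order_date`/`date`; an empty list means every key has a match.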

### **5. Export the dataset back to SQL server as a table**

We have finished cleaning the data; let's export it back to the MySQL database.

In [131]:
# Write each DataFrame to the SQL server, replacing the table if it exists
dfs = {
    'transactions': transactions,
    'markets': markets,
    'products': products,
    'customers': customers,
    'date': date 
}


for name, df in dfs.items():
    try:
        df.to_sql(name = name, con = engine, if_exists = 'replace', index = False)
        print(f'{name} table exported')
    except Exception as e:
        print(f'Failed to export {name}: {e}')

transactions table exported
markets table exported
products table exported
customers table exported
date table exported


**Recheck the new tables in MySQL database**

Let's check whether any transaction is still recorded in USD.

In [132]:
%%sql
SELECT *
FROM transactions
WHERE currency = 'USD'

 * mysql+mysqlconnector://root:***@localhost:3306/sales
0 rows affected.


product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price


There are no transactions in `USD`, so the original tables have been successfully replaced with the cleaned ones.

## III. Data Analysis

We mainly analyze `transactions`, as this table holds the most important data and the largest number of records. In this project, we define a successful sale as a transaction with a high profit margin, i.e. one that generates the most profit after accounting for its cost.

Let's start with the overall totals: number of orders, quantity, sales, cost and profit.

In [112]:
%%sql

SELECT
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM 
    transactions

 * mysql+mysqlconnector://root:***@localhost:3306/sales
1 rows affected.


total_transactions,total_quantity,sales_amount,total_cost,profit_margin,profit_margin_perc
148395,2429282,984868963.0,960211894.59,24657068.41,0.025


<span style="color:#0077B6">

- There are almost 150k transactions in the table.
- The company has an overall profit margin of 2.5% over the whole period (October 2017 to June 2020).
- The total cost is high relative to total sales: about 960M of 985M rupees, or roughly 97.5% of sales.

</span>

**1. Find the top 10 single orders with the highest/lowest profit margin**

Highest 

In [11]:
%%sql

SELECT *
FROM
    transactions
ORDER BY profit_margin DESC
LIMIT 10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
Prod318,Cus038,Mark013,2018-02-23 00:00:00,1798,1338264.0,INR,0.36,481775.04,856488.96
Prod329,Cus006,Mark004,2018-12-14 00:00:00,280,1160782.0,INR,0.38,441097.16,719684.84
Prod329,Cus006,Mark004,2019-01-18 00:00:00,360,1477458.0,INR,0.28,413688.24,1063769.76
Prod329,Cus006,Mark004,2018-11-23 00:00:00,240,994954.0,INR,0.4,397981.6,596972.4
Prod049,Cus022,Mark002,2018-03-07 00:00:00,747,996102.0,INR,0.32,318752.64,677349.36
Prod329,Cus006,Mark004,2019-01-08 00:00:00,240,984977.0,INR,0.31,305342.87,679634.13
Prod316,Cus020,Mark004,2018-02-28 00:00:00,480,878935.0,INR,0.34,298837.9,580097.1
Prod040,Cus020,Mark004,2018-03-09 00:00:00,400,807301.0,INR,0.37,298701.37,508599.63
Prod304,Cus006,Mark004,2018-08-03 00:00:00,600,809574.0,INR,0.36,291446.64,518127.36
Prod308,Cus006,Mark004,2018-07-05 00:00:00,560,762949.0,INR,0.35,267032.15,495916.85


<span style="color:#0077B6">

- The most profitable transaction comes from customer `Cus038` buying `Prod318` in market `Mark013`.
- 4 of the 10 transactions are `Prod329` purchases by customer `Cus006`, who appears in 6 of the 10 rows overall. This points to a frequent customer who buys repeatedly and generates high profit.
- 8 of the 10 transactions are from `Mark004`, which suggests that this market could be performing well.
- The profit margin percentages of these transactions range from 28% to 40%.
- Since these orders have similar profit margin percentages, no single product clearly outperforms the others: high-profit orders also carry high costs, and lower-profit orders carry lower costs.

</span>
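Several of the observations above count how often a customer or market appears among the top orders. A pandas sketch of the same ranking, using `nlargest`/`nsmallest` on four rows copied from the outputs in this notebook, makes those counts explicit with `value_counts`. It is an illustration, not a replacement for the SQL queries.

```python
import pandas as pd

# Four rows copied from outputs in this notebook; on the real data, use `transactions`
transactions = pd.DataFrame({
    "product_code": ["Prod318", "Prod329", "Prod279", "Prod073"],
    "customer_code": ["Cus038", "Cus006", "Cus020", "Cus006"],
    "profit_margin": [481775.04, 441097.16, 39.78, -369348.5],
})

top = transactions.nlargest(2, "profit_margin")      # like ORDER BY profit_margin DESC LIMIT 2
bottom = transactions.nsmallest(2, "profit_margin")  # like ORDER BY profit_margin LIMIT 2

# How often each customer shows up among the top orders
print(top["customer_code"].value_counts())
print(bottom[["product_code", "profit_margin"]])
```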

Lowest

In [18]:
%%sql

SELECT *
FROM
    transactions
ORDER BY profit_margin
LIMIT 10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


product_code,customer_code,market_code,order_date,sales_qty,sales_amount,currency,profit_margin_percentage,profit_margin,cost_price
Prod073,Cus006,Mark004,2020-04-16 00:00:00,947,1477394.0,INR,-0.25,-369348.5,1846742.5
Prod159,Cus006,Mark004,2018-01-30 00:00:00,1480,1228148.0,INR,-0.29,-356162.92,1584310.92
Prod329,Cus006,Mark004,2018-10-26 00:00:00,200,829130.0,INR,-0.35,-290195.5,1119325.5
Prod316,Cus006,Mark004,2018-08-07 00:00:00,640,1316921.0,INR,-0.21,-276553.41,1593474.41
Prod328,Cus006,Mark004,2018-07-18 00:00:00,303,850509.0,INR,-0.3,-255152.7,1105661.7
Prod332,Cus020,Mark004,2018-01-03 00:00:00,393,778588.0,INR,-0.26,-202432.88,981020.88
Prod084,Cus006,Mark004,2020-02-28 00:00:00,800,666111.0,INR,-0.3,-199833.3,865944.3
Prod329,Cus006,Mark004,2020-03-19 00:00:00,160,629750.0,INR,-0.31,-195222.5,824972.5
Prod320,Cus006,Mark004,2020-05-08 00:00:00,276,873528.0,INR,-0.22,-192176.16,1065704.16
Prod324,Cus006,Mark004,2018-04-13 00:00:00,333,589958.0,INR,-0.32,-188786.56,778744.56


<span style="color:#0077B6">

- No product dominates this list; only `Prod329` appears twice.
- 9 of the 10 orders are from customer `Cus006`, who appears frequently in both the top-10 profit and top-10 loss lists.
- All of these orders are from `Mark004`.
- The loss per transaction ranges from 21% to 35% of the sales amount.

</span>

**2. Take a look at the summary of loss transactions (transactions that have a negative profit margin)**

In [109]:
%%sql

SELECT
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin
FROM
    transactions
WHERE
    profit_margin < 0

 * mysql+mysqlconnector://root:***@localhost:3306/sales
1 rows affected.


total_transactions,total_quantity,sales_amount,total_cost,profit_margin
68501,1130275,459414589.0,540746253.92,-81331664.92


<span style="color:#0077B6">

- About 46% of the orders (68,501 of 148,395) have a negative profit margin, which means the company loses money on almost every other order. This is a huge share.
- As the total profit margin (from the first query) is positive (24,657,068.41), the profitable orders more than cover these losses.
- Adding the total loss (81,331,664.92) back to the total profit margin (24,657,068.41) gives the profit earned by the non-loss orders: 105,988,733.33. This means that between October 2017 and June 2020, the loss-making orders wiped out about 76.7% (81.3M / 106.0M) of the profit generated by the other orders.
- In short, the 46% of orders with a negative profit margin cost the company a HUGE amount of money.

</span>
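The arithmetic behind these observations, using the totals from the two summary queries, can be reproduced directly:

```python
# Totals copied from the two summary queries above (rupees)
net_profit = 24_657_068.41   # SUM(profit_margin) over all orders
total_loss = 81_331_664.92   # -SUM(profit_margin) over loss-making orders
loss_orders, all_orders = 68_501, 148_395

# Profit earned by the non-loss orders
gross_profit = net_profit + total_loss

# Share of orders that lose money, and share of profit they wipe out
loss_share_of_orders = loss_orders / all_orders
loss_share_of_profit = total_loss / gross_profit

print(f"{gross_profit:,.2f}")
print(f"{loss_share_of_orders:.1%}")
print(f"{loss_share_of_profit:.1%}")
```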

**3. Determine 10 products that have the highest/lowest profit margin**

- Sum the sales and profit figures, grouped by `product_code`.
- Note that the `profit margin percentage` must be recalculated as the sum of `profit_margin` divided by the sum of `sales_amount`. Summing (or averaging) the per-order percentages with `SUM`, as we do for the other columns, would give a wrong result.
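A small worked example with two hypothetical orders of one product shows why the per-order percentages cannot simply be summed or averaged:

```python
# Two hypothetical orders of the same product
orders = [
    {"sales_amount": 1000.0, "profit_margin": 300.0},   # 30% margin
    {"sales_amount": 9000.0, "profit_margin": -900.0},  # -10% margin
]

# Misleading: summing or averaging per-order percentages ignores order size
sum_of_pct = sum(o["profit_margin"] / o["sales_amount"] for o in orders)
mean_of_pct = sum_of_pct / len(orders)

# Correct: ratio of the sums, i.e. SUM(profit_margin) / SUM(sales_amount)
ratio_of_sums = (sum(o["profit_margin"] for o in orders)
                 / sum(o["sales_amount"] for o in orders))

print(round(sum_of_pct, 3), round(mean_of_pct, 3), round(ratio_of_sums, 3))
```

Averaging the percentages reports a +10% margin although the product lost 600 rupees overall; the ratio of sums (-6%) weights each order by its sales amount, which is what `SUM(profit_margin) / SUM(sales_amount)` computes.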

In [108]:
%%sql

SELECT
    product_code,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
GROUP BY
    product_code
ORDER BY
    profit_margin DESC
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


product_code,total_quantity,sales_amount,total_cost,profit_margin,profit_margin_perc
Prod329,8485,34381481.0,32433893.95,1947587.05,0.057
Prod318,74195,68967202.0,67100858.63,1866343.37,0.027
Prod316,44477,60883452.0,59712074.21,1171377.79,0.019
Prod040,16116,23581969.0,22556232.29,1025736.71,0.043
Prod324,20878,41455364.0,40445417.9,1009946.1,0.024
Prod334,29221,31468996.0,30604021.73,864974.27,0.027
Prod304,21727,17873777.0,17086752.63,787024.37,0.044
Prod308,11269,8350170.0,7563974.93,786195.07,0.094
Prod090,277959,13418817.0,12714268.74,704548.26,0.053
Prod049,9661,11048968.0,10354255.74,694712.26,0.063


<span style="color:#0077B6">

- `Prod329` has the highest total profit margin, about 1.95 million rupees.
- Because of high costs, none of these products has a high profit margin percentage (all are below 10%).
- `Prod329` and `Prod318` have a noticeably higher profit margin than the other products in the list.
- `Prod308` and `Prod049` have the highest profit margin percentages in this list, 9.4% and 6.3% respectively, which means they turn the largest share of their sales into profit.

</span>

In [107]:
%%sql

SELECT
    product_code,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
GROUP BY
    product_code
ORDER BY
    profit_margin
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


product_code,total_quantity,sales_amount,total_cost,profit_margin,profit_margin_perc
Prod073,947,1477394.0,1846742.5,-369348.5,-0.25
Prod336,1816,3400849.0,3732668.96,-331819.96,-0.098
Prod044,2879,5126501.0,5394350.49,-267849.49,-0.052
Prod084,1180,1029689.0,1219589.71,-189900.71,-0.184
Prod169,1055,1515289.0,1667575.66,-152286.66,-0.101
Prod319,18918,22188881.0,22335771.95,-146890.95,-0.007
Prod016,1931,1997834.0,2143052.99,-145218.99,-0.073
Prod024,4421,9337235.0,9445182.54,-107947.54,-0.012
Prod206,4691,5391375.0,5480118.0,-88743.0,-0.016
Prod030,1196,997609.0,1079390.64,-81781.64,-0.082


<span style="color:#0077B6">

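- `Prod073` has the highest loss, about 369k rupees, and also the largest loss percentage in this list (-25%). This is an exceptional loss compared to the other products. Its totals (947 units, -369,348.5 rupees) match a single order on 2020-04-16 in the top-10 loss list, so the whole loss comes from that one transaction.
- Most of the other products in the list lose between about 1% and 10%.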


</span>

**4. Determine the products that have the highest/lowest profit margin percentage**

- We only consider products with more than 20 units sold, so that products with very few sales do not distort the ranking.
- Because `total_quantity` is an aggregate (`SUM`), we filter it with `HAVING`, which is applied after grouping, rather than `WHERE`, which is applied to individual rows before grouping.
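A runnable sketch of this filter uses Python's built-in `sqlite3` module with an in-memory database and hypothetical rows (SQLite rather than MySQL, so it runs without a server). It repeats `SUM(sales_qty)` in `HAVING` instead of using the alias; MySQL also accepts the alias, as the queries in this notebook do, but repeating the aggregate is more portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (product_code TEXT, sales_qty INTEGER, "
    "sales_amount REAL, profit_margin REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("ProdA", 15, 1000.0, 390.0),  # few units, very high margin
        ("ProdB", 30, 5000.0, 500.0),
        ("ProdB", 10, 2000.0, 100.0),
    ],
)

# HAVING filters groups after aggregation; WHERE cannot reference SUM(sales_qty)
rows = conn.execute("""
    SELECT product_code,
           SUM(sales_qty) AS total_quantity,
           ROUND(SUM(profit_margin) / SUM(sales_amount), 3) AS profit_margin_perc
    FROM transactions
    GROUP BY product_code
    HAVING SUM(sales_qty) > 20
    ORDER BY profit_margin_perc DESC
""").fetchall()
print(rows)
```

`ProdA` has the best margin but only 15 units, so it is excluded from the ranking.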

Highest

In [6]:
%%sql

SELECT
    product_code,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
GROUP BY
    product_code
HAVING
    total_quantity > 20
ORDER BY
    profit_margin_perc DESC
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


product_code,total_quantity,sales_amount,total_cost,profit_margin,profit_margin_perc
Prod001,100,41241.0,25157.01,16083.99,0.39
Prod153,247,75574.0,47611.62,27962.38,0.37
Prod155,126,38500.0,25410.0,13090.0,0.34
Prod201,387,106921.0,70685.97,36235.03,0.339
Prod035,124,47727.0,33246.26,14480.74,0.303
Prod112,51,8444.0,6079.68,2364.32,0.28
Prod192,63,16324.0,11763.8,4560.2,0.279
Prod219,396,127657.0,92925.8,34731.2,0.272
Prod083,1313,789444.0,586901.5,202542.5,0.257
Prod012,33,29648.0,22236.0,7412.0,0.25


<span style="color:#0077B6">

- `Prod001` and `Prod153` have the highest profit margin percentages, 39% and 37% respectively.
- The rest of the list also shows good profit margins, ranging from 25% to 34%.
- These percentages are high. However, compared to the products with the highest total profit margin, these products sell small quantities (mostly in the hundreds of units) and their sales amounts are considerably smaller.
- Still, this suggests that these products have good potential and are worth a deeper look to see whether the company can earn more profit from them.

</span>

Lowest

In [7]:
%%sql

SELECT
    product_code,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
GROUP BY
    product_code
HAVING
    total_quantity > 20
ORDER BY
    profit_margin_perc
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


| product_code | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|
| Prod080 | 52 | 23681.0 | 31969.35 | -8288.35 | -0.35 |
| Prod022 | 33 | 56166.0 | 73048.49 | -16882.49 | -0.301 |
| Prod073 | 947 | 1477394.0 | 1846742.5 | -369348.5 | -0.25 |
| Prod203 | 233 | 35648.0 | 44203.52 | -8555.52 | -0.24 |
| Prod190 | 67 | 16625.0 | 20448.75 | -3823.75 | -0.23 |
| Prod191 | 90 | 17663.0 | 21547.07 | -3884.07 | -0.22 |
| Prod038 | 164 | 91520.0 | 111240.69 | -19720.69 | -0.215 |
| Prod109 | 45 | 3046.0 | 3666.84 | -620.84 | -0.204 |
| Prod107 | 400 | 316611.0 | 376767.09 | -60156.09 | -0.19 |
| Prod084 | 1180 | 1029689.0 | 1219589.71 | -189900.71 | -0.184 |


<span style="color:#0077B6">

- Most of the losses in this list range from about 18% to 25% of the sales amount.
- `Prod073` and `Prod084` have the largest absolute losses (about 369k and 190k rupees) on sales of more than 1M rupees each. These products also sold many units (947 and 1,180). This suggests the losses may have gone unnoticed for a long time, so the company kept selling these products and accumulating further losses.
- `Prod080` and `Prod022` have the highest loss percentages, at 35% and 30% respectively.
</span>
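
The `profit_margin_perc` column in these queries is `SUM(profit_margin) / SUM(sales_amount)`, a ratio of totals, so large orders weigh more than small ones. It is not the same as averaging each transaction's own margin percentage. A minimal sketch with two hypothetical transactions (not AtliQ data):

```python
# Two hypothetical transactions for one product (not AtliQ data).
transactions = [
    {"sales_amount": 1000.0, "profit_margin": 300.0},   # 30% margin on a large order
    {"sales_amount": 100.0, "profit_margin": -50.0},    # -50% margin on a small order
]

# What the SQL queries compute: SUM(profit_margin) / SUM(sales_amount)
total_profit = sum(t["profit_margin"] for t in transactions)
total_sales = sum(t["sales_amount"] for t in transactions)
weighted_margin = total_profit / total_sales

# Alternative: the plain average of each transaction's own margin
per_row = [t["profit_margin"] / t["sales_amount"] for t in transactions]
mean_of_ratios = sum(per_row) / len(per_row)

print(round(weighted_margin, 3))  # 0.227
print(round(mean_of_ratios, 3))   # -0.1
```

The ratio of totals answers "how much of each rupee of sales was kept as profit", which is the question these rankings are meant to answer.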

**4. Determine the performance of each market. Additionally, let's join the markets' zones in `markets` table with `transactions` to see which zone is performing well**

We need the `zone` column, which comes from the `markets` table. We join the two tables `transactions` and `markets` with an `INNER JOIN`, which keeps only the records whose `market_code` in `transactions` matches a `markets_code` in `markets`.
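
One consequence of `INNER JOIN` is that a transaction whose `market_code` has no match in `markets` is silently left out of every zone total. A quick check is to compare against a `LEFT JOIN`, which keeps unmatched rows under a `NULL` zone. The sketch uses `sqlite3` and hypothetical market codes (`MarkA`, `MarkB`, `MarkX`); only the join columns follow the notebook's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the two tables (join columns follow the notebook's query).
conn.execute("CREATE TABLE markets (markets_code TEXT, zone TEXT)")
conn.execute("CREATE TABLE transactions (market_code TEXT, sales_amount REAL)")
conn.executemany("INSERT INTO markets VALUES (?, ?)",
                 [("MarkA", "North"), ("MarkB", "South")])
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("MarkA", 100.0), ("MarkB", 50.0), ("MarkX", 70.0)])  # MarkX has no match

# INNER JOIN: the MarkX transaction disappears from the totals
inner = conn.execute(
    """
    SELECT ma.zone, SUM(tr.sales_amount)
    FROM transactions tr
    INNER JOIN markets ma ON tr.market_code = ma.markets_code
    GROUP BY ma.zone
    ORDER BY ma.zone
    """
).fetchall()

# LEFT JOIN: the unmatched transaction is kept with a NULL (None) zone
left = conn.execute(
    """
    SELECT ma.zone, SUM(tr.sales_amount)
    FROM transactions tr
    LEFT JOIN markets ma ON tr.market_code = ma.markets_code
    GROUP BY ma.zone
    ORDER BY ma.zone
    """
).fetchall()

print(inner)  # [('North', 100.0), ('South', 50.0)]
print(left)   # [(None, 70.0), ('North', 100.0), ('South', 50.0)]
```

If both queries return the same totals on the real data, no transactions are being dropped by the join.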

In [106]:
%%sql

SELECT 
    ma.zone, 
    tr.market_code, 
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM 
    transactions tr
INNER JOIN 
    markets ma ON tr.market_code = ma.markets_code
GROUP BY 
    ma.zone, tr.market_code
ORDER BY 
    profit_margin DESC

 * mysql+mysqlconnector://root:***@localhost:3306/sales
15 rows affected.


| zone | market_code | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|
| North | Mark004 | 988294 | 519569771.0 | 507615972.48 | 11953798.52 | 0.023 |
| Central | Mark002 | 383643 | 150084801.0 | 145212161.23 | 4872639.77 | 0.032 |
| North | Mark003 | 206925 | 132307441.0 | 129459172.49 | 2848268.51 | 0.022 |
| Central | Mark011 | 262094 | 55026321.0 | 53614211.28 | 1412109.72 | 0.026 |
| Central | Mark013 | 25856 | 16525290.0 | 15278890.84 | 1246399.16 | 0.075 |
| Central | Mark007 | 86884 | 42084571.0 | 41043820.47 | 1040750.53 | 0.025 |
| South | Mark010 | 255482 | 18813466.0 | 18110176.01 | 703289.99 | 0.037 |
| South | Mark001 | 50485 | 18042702.0 | 17742129.0 | 300573.0 | 0.017 |
| North | Mark009 | 5505 | 4428393.0 | 4246132.22 | 182260.78 | 0.041 |
| North | Mark012 | 17099 | 2605796.0 | 2479118.95 | 126677.05 | 0.049 |


<span style="color:#0077B6">

- The top 5 markets by profit margin are all in the North and Central zones.
- Most markets have a profit margin percentage between 2% and 4%.
- `Mark013` has the highest profit margin percentage, at 7.5%.
- Some markets have a very thin profit margin, such as `Mark014` and `Mark008` at 0.6% and 1%. Their absolute profits are also small, only around 35-46 thousand rupees, far below the other markets.

- Two markets, `Mark005` and `Mark006`, have a net loss; `Mark006` has a 20% loss.

</span>

Let's take a look at the total sales of each zone

In [105]:
%%sql

SELECT 
    ma.zone,  
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM 
    transactions tr
INNER JOIN 
    markets ma ON tr.market_code = ma.markets_code
GROUP BY 
    ma.zone
ORDER BY 
    profit_margin DESC

 * mysql+mysqlconnector://root:***@localhost:3306/sales
3 rows affected.


| zone | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|
| North | 1271557 | 675588017.0 | 660511729.38 | 15076287.62 | 0.022 |
| Central | 758477 | 263720983.0 | 255149083.82 | 8571899.18 | 0.033 |
| South | 399248 | 45559963.0 | 44551081.39 | 1008881.61 | 0.022 |


<span style="color:#0077B6">

- As observed in the previous query and in this result, the `South` zone has the smallest profit margin.
- The `North` zone has the highest profit margin, about 15M rupees, almost double the `Central` zone and about 15 times the `South` zone.
- However, even though `North` and `Central` bring in more profit, they are not much more efficient than `South` in terms of profit margin percentage: `North` and `South` are both at 2.2%, and `Central` is only about one percentage point higher at 3.3%. The two larger zones have high sales amounts but also high costs, which keeps their margin percentages low.

</span>
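
The comparison above can be checked with a few lines of arithmetic on the zone totals from the query output; no new data is involved.

```python
# Zone totals copied from the query output above (rupees).
zones = {
    "North":   {"sales_amount": 675588017.0, "profit_margin": 15076287.62},
    "Central": {"sales_amount": 263720983.0, "profit_margin": 8571899.18},
    "South":   {"sales_amount": 45559963.0,  "profit_margin": 1008881.61},
}

# Profit margin percentage = profit / sales, as in the SQL query
margin_perc = {z: v["profit_margin"] / v["sales_amount"] for z, v in zones.items()}

# How many times larger North's absolute profit is than the other zones'
north_vs_central = zones["North"]["profit_margin"] / zones["Central"]["profit_margin"]
north_vs_south = zones["North"]["profit_margin"] / zones["South"]["profit_margin"]

for zone, perc in margin_perc.items():
    print(f"{zone}: {perc:.1%}")
print(f"North / Central profit: {north_vs_central:.2f}x")
print(f"North / South profit: {north_vs_south:.1f}x")
```

North's profit is about 1.76 times Central's and about 14.9 times South's, while its margin percentage matches South's.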

Let's look at each zone's performance in each year

In [15]:
%%sql

SELECT 
    ma.zone,
    date.year,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM 
    transactions tr
INNER JOIN 
    markets ma ON tr.market_code = ma.markets_code
LEFT JOIN
    date ON order_date = date.date
GROUP BY 
    ma.zone, date.year
ORDER BY 
    ma.zone DESC

 * mysql+mysqlconnector://root:***@localhost:3306/sales
12 rows affected.


| zone | year | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|
| South | 2017 | 36821 | 4734621.0 | 4676791.97 | 57829.03 | 0.012 |
| South | 2018 | 151831 | 18810117.0 | 18596115.86 | 214001.14 | 0.011 |
| South | 2019 | 152388 | 15454705.0 | 15098973.66 | 355731.34 | 0.023 |
| South | 2020 | 58208 | 6560520.0 | 6179199.9 | 381320.1 | 0.058 |
| North | 2017 | 127822 | 63782899.0 | 62032205.53 | 1750693.47 | 0.027 |
| North | 2018 | 524305 | 287037445.0 | 281686400.29 | 5351044.71 | 0.019 |
| North | 2019 | 426678 | 225201876.0 | 218114632.48 | 7087243.52 | 0.031 |
| North | 2020 | 192752 | 99565797.0 | 98678491.08 | 887305.92 | 0.009 |
| Central | 2017 | 69819 | 24420633.0 | 23456087.47 | 964545.53 | 0.039 |
| Central | 2018 | 321361 | 107839601.0 | 104067350.7 | 3772250.3 | 0.035 |


<span style="color:#0077B6">

- `Central` and `North` contribute most of the total sales, with `North` significantly larger in every year.
- Both zones had a major drop in profit margin in 2020: `North` fell by 87.5% and `Central` by 73.9%.
- On the other hand, the `South` zone, although it contributes less to sales, increased its profit margin every year, gaining about 26k rupees in 2020 (from 356k to 381k) even though its sales amount fell.
- Moreover, in 2020 `South` had a profit margin percentage of 5.8%, the best figure observed in the list. This suggests the markets in this zone stayed efficient even during a bad year.

</span>

**5. Take a look at the sales by year**
- Join the year from the `date` table by matching `date.date` with `transactions.order_date`
- Calculate sale performance with the data grouped by year
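
A minimal sketch of this join using `sqlite3` and hypothetical rows. The table and column names (`date.date`, `date.year`, `order_date`) follow the notebook's queries; the data is illustrative only. Note that `INNER JOIN` drops transactions whose `order_date` has no row in `date`, whereas the `LEFT JOIN` used in the zone-by-year query keeps them under a `NULL` year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature tables; only the columns used by the query are created.
conn.execute("CREATE TABLE date (date TEXT, year INTEGER)")
conn.execute("CREATE TABLE transactions (order_date TEXT, sales_amount REAL, profit_margin REAL)")
conn.executemany("INSERT INTO date VALUES (?, ?)",
                 [("2019-12-30", 2019), ("2020-01-02", 2020)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ("2019-12-30", 1000.0, 30.0),
    ("2020-01-02", 400.0, 4.0),
    ("2020-01-02", 600.0, 6.0),
])

# Each transaction picks up its year through order_date = date.date, then rows are grouped by year.
yearly = conn.execute(
    """
    SELECT date.year,
           SUM(sales_amount) AS sales_amount,
           ROUND(SUM(profit_margin) / SUM(sales_amount), 3) AS profit_margin_perc
    FROM transactions
    INNER JOIN date ON order_date = date.date
    GROUP BY date.year
    ORDER BY date.year DESC
    """
).fetchall()

print(yearly)
```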

In [104]:
%%sql

SELECT
    date.year,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
INNER JOIN
    date ON order_date = date.date
GROUP BY
    date.year
ORDER BY
    year DESC

 * mysql+mysqlconnector://root:***@localhost:3306/sales
4 rows affected.


| year | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|
| 2020 | 350240 | 142224545.0 | 140164384.66 | 2060160.34 | 0.014 |
| 2019 | 847083 | 336019102.0 | 325532558.11 | 10486543.89 | 0.031 |
| 2018 | 997497 | 413687163.0 | 404349866.85 | 9337296.15 | 0.023 |
| 2017 | 234462 | 92938153.0 | 90165084.97 | 2773068.03 | 0.03 |


<span style="color:#0077B6">

Profit margin grew from 2017 to 2019 and peaked in 2019 at about 10.5M rupees, even though 2018 had the highest sales quantity and sales amount. However, both the sales quantity and the profit margin decreased drastically in 2020, probably due to the outbreak of the Covid-19 pandemic.

</span>
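
The year-over-year changes quoted in the conclusion can be reproduced from the totals in this output with a small helper (the name `pct_change` is only for illustration):

```python
# Yearly totals copied from the query output above (rupees).
yearly = {
    2018: {"sales_amount": 413687163.0, "profit_margin": 9337296.15},
    2019: {"sales_amount": 336019102.0, "profit_margin": 10486543.89},
    2020: {"sales_amount": 142224545.0, "profit_margin": 2060160.34},
}

def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

for metric in ("sales_amount", "profit_margin"):
    for prev, curr in ((2018, 2019), (2019, 2020)):
        change = pct_change(yearly[prev][metric], yearly[curr][metric])
        print(f"{metric} {prev}->{curr}: {change:+.1f}%")
```

This gives about +12.3% profit margin from 2018 to 2019 (while sales amount fell about 18.8%), and about -57.7% sales amount and -80.4% profit margin from 2019 to 2020.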

**6. Take a look at the sales by each month of a year**

We will look at the average sales performance in each month in this case.
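
Averaging per transaction does not change the margin percentage: within one month, both averages divide by the same number of transactions $n$, so `avg_profit_margin_perc` equals the ratio of totals used in the other queries. With $p_i$ the `profit_margin` and $s_i$ the `sales_amount` of transaction $i$:

$$
\frac{\operatorname{AVG}(p)}{\operatorname{AVG}(s)}
= \frac{\frac{1}{n}\sum_{i=1}^{n} p_i}{\frac{1}{n}\sum_{i=1}^{n} s_i}
= \frac{\sum_{i=1}^{n} p_i}{\sum_{i=1}^{n} s_i}
$$

The `avg_profit_margin` column, on the other hand, is profit per transaction, so months are compared per order rather than by their total profit.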

In [82]:
%%sql

SELECT
    date.month_name,
    ROUND(AVG(sales_qty), 1) as avg_quantity, 
    ROUND(AVG(sales_amount), 3) as avg_sales,
    ROUND(AVG(cost_price), 3) as avg_cost,
    ROUND(AVG(profit_margin), 3) as avg_profit_margin,
    ROUND(AVG(profit_margin) / AVG(sales_amount), 3) as avg_profit_margin_perc
FROM
    transactions
INNER JOIN
    date ON order_date = date.date
GROUP BY
    month_name
ORDER BY
    avg_profit_margin DESC

 * mysql+mysqlconnector://root:***@localhost:3306/sales
12 rows affected.


| month_name | avg_quantity | avg_sales | avg_cost | avg_profit_margin | avg_profit_margin_perc |
|---|---|---|---|---|---|
| December | 15.6 | 6773.156 | 6560.797 | 212.359 | 0.031 |
| March | 17.4 | 6967.44 | 6759.132 | 208.308 | 0.03 |
| January | 16.1 | 7209.517 | 7001.797 | 207.72 | 0.029 |
| November | 17.0 | 6567.139 | 6361.565 | 205.574 | 0.031 |
| July | 16.7 | 6933.292 | 6733.569 | 199.724 | 0.029 |
| February | 16.3 | 6630.794 | 6436.313 | 194.481 | 0.029 |
| September | 15.9 | 6201.304 | 6034.457 | 166.847 | 0.027 |
| October | 15.6 | 5924.45 | 5768.717 | 155.732 | 0.026 |
| August | 19.4 | 7648.362 | 7505.699 | 142.664 | 0.019 |
| May | 15.7 | 6205.995 | 6079.028 | 126.967 | 0.02 |


<span style="color:#0077B6">

- In general, the company seems to have a better profit margin from the last quarter of one year through the first quarter of the next (Q4 and Q1), while performance is slower in Q2 and Q3.
- In the strongest months (December, March, January and November, about 206-212 rupees per transaction), the average profit margin per transaction is roughly 1.4 to 1.7 times that of the weakest months shown (August, 143 rupees, and May, 127 rupees).

</span>
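
A rough check of the Q4-Q1 versus Q2-Q3 pattern, using the monthly averages from the output above. This is an unweighted mean of monthly averages, and April and June are missing from the exported output, so the result is only indicative.

```python
# Average profit margin per transaction by month, copied from the query output above.
# April and June are not shown in the exported output, so they are left out here.
avg_profit = {
    "December": 212.359, "March": 208.308, "January": 207.72, "November": 205.574,
    "July": 199.724, "February": 194.481, "September": 166.847, "October": 155.732,
    "August": 142.664, "May": 126.967,
}

quarter_of = {
    "January": 1, "February": 1, "March": 1, "April": 2, "May": 2, "June": 2,
    "July": 3, "August": 3, "September": 3, "October": 4, "November": 4, "December": 4,
}

def mean(values):
    return sum(values) / len(values)

# Simple (unweighted) mean of the monthly averages in each half of the year
q4_q1 = mean([p for m, p in avg_profit.items() if quarter_of[m] in (4, 1)])
q2_q3 = mean([p for m, p in avg_profit.items() if quarter_of[m] in (2, 3)])

print(f"Q4-Q1: {q4_q1:.1f}, Q2-Q3: {q2_q3:.1f}, ratio: {q4_q1 / q2_q3:.2f}")
```

On this rough measure the stronger half of the year is about 24% ahead; the gap between individual peak and off-peak months is larger.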

**7. Find how each customer type performs**

In [103]:
%%sql

SELECT
    customers.customer_type,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions tr
RIGHT JOIN
    customers ON customers.customer_code = tr.customer_code
GROUP BY
    customers.customer_type
ORDER BY
    profit_margin DESC
LIMIT
    5

 * mysql+mysqlconnector://root:***@localhost:3306/sales
2 rows affected.


| customer_type | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|
| Brick & Mortar | 96190 | 1854201 | 744525338.0 | 728343186.57 | 16182151.43 | 0.022 |
| E-Commerce | 52205 | 575081 | 240343625.0 | 231868708.02 | 8474916.98 | 0.035 |


<span style="color:#0077B6">

- There are only two types of customers: `Brick & Mortar` (offline stores) and `E-Commerce` (online).
- `Brick & Mortar` outperforms `E-Commerce` in absolute terms, with almost double the number of transactions and almost double the profit margin.
- However, `E-Commerce` has a profit margin percentage roughly 60% higher (3.5% vs 2.2%), meaning it keeps more profit from each rupee of sales than `Brick & Mortar`.


</span>
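
The "60% higher" figure is a relative difference; in absolute terms the gap is a little over one percentage point. Both can be computed from the totals above:

```python
# Customer-type totals copied from the query output above (rupees).
brick_mortar = {"sales_amount": 744525338.0, "profit_margin": 16182151.43}
e_commerce = {"sales_amount": 240343625.0, "profit_margin": 8474916.98}

bm_perc = brick_mortar["profit_margin"] / brick_mortar["sales_amount"]
ec_perc = e_commerce["profit_margin"] / e_commerce["sales_amount"]

# Absolute gap, in percentage points
gap_pp = (ec_perc - bm_perc) * 100
# Relative gap: how much larger E-Commerce's margin is than Brick & Mortar's
relative = ec_perc / bm_perc - 1

print(f"Brick & Mortar: {bm_perc:.2%}, E-Commerce: {ec_perc:.2%}")
print(f"Gap: {gap_pp:.2f} percentage points, {relative:.0%} higher")
```

With the unrounded totals, the relative gap comes out closer to 62%.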

Let's see the sales trend of these two customer types over the 4 years

In [22]:
%%sql

SELECT
    customers.customer_type,
    date.year,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions tr
RIGHT JOIN
    customers ON customers.customer_code = tr.customer_code
INNER JOIN
    date ON order_date = date.date
GROUP BY
    customers.customer_type, date.year
ORDER BY
    customer_type

 * mysql+mysqlconnector://root:***@localhost:3306/sales
8 rows affected.


| customer_type | year | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|---|
| Brick & Mortar | 2017 | 10216 | 181689 | 66076415.0 | 64636017.46 | 1440397.54 | 0.022 |
| Brick & Mortar | 2018 | 40515 | 759078 | 307748862.0 | 301745914.93 | 6002947.07 | 0.02 |
| Brick & Mortar | 2019 | 32766 | 651750 | 257563281.0 | 250227721.54 | 7335559.46 | 0.028 |
| Brick & Mortar | 2020 | 12693 | 261684 | 113136780.0 | 111733532.64 | 1403247.36 | 0.012 |
| E-Commerce | 2017 | 4341 | 52773 | 26861738.0 | 25529067.51 | 1332670.49 | 0.05 |
| E-Commerce | 2018 | 20240 | 238419 | 105938301.0 | 102603951.92 | 3334349.08 | 0.031 |
| E-Commerce | 2019 | 18946 | 195333 | 78455821.0 | 75304836.57 | 3150984.43 | 0.04 |
| E-Commerce | 2020 | 8678 | 88556 | 29087765.0 | 28430852.02 | 656912.98 | 0.023 |


<span style="color:#0077B6">

- `Brick & Mortar` generates a higher profit margin in every year.
- However, `E-Commerce` has a better `profit_margin_perc` in all 4 years (2.3-5.0% vs 1.2-2.8%), which means it keeps a larger share of each sale as profit.

</span>

**8A. Find the top 5 customers that generated the highest profit margin**

In [102]:
%%sql

SELECT
    tr.customer_code,
    cu.customer_type,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions tr
RIGHT JOIN
    customers cu ON cu.customer_code = tr.customer_code
GROUP BY
    cu.customer_type, tr.customer_code
ORDER BY
    profit_margin DESC
LIMIT
    5

 * mysql+mysqlconnector://root:***@localhost:3306/sales
5 rows affected.


| customer_code | customer_type | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|---|
| Cus006 | Brick & Mortar | 13819 | 653823 | 413333588.0 | 404025688.65 | 9307899.35 | 0.023 |
| Cus020 | E-Commerce | 17327 | 123356 | 43893083.0 | 42107271.61 | 1785811.39 | 0.041 |
| Cus022 | E-Commerce | 4686 | 79456 | 49644189.0 | 47955929.0 | 1688260.0 | 0.034 |
| Cus038 | E-Commerce | 130 | 25891 | 16529970.0 | 15283326.74 | 1246643.26 | 0.075 |
| Cus005 | Brick & Mortar | 19938 | 279093 | 44962166.0 | 43908381.91 | 1053784.09 | 0.023 |


<span style="color:#0077B6">

- Customer `Cus006` generated the highest profit margin, about 9.3M rupees.
- `Cus038` has the best profit margin percentage in the list, at 7.5%. However, it made only 130 transactions, far fewer than the other customers in the list, so more data is needed before drawing conclusions about this customer.
</span>

**8B. Let's see which products contributed the most to the profit margin of `Cus006`, the customer with the highest profit margin**

In [113]:
%%sql

SELECT
    product_code,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
WHERE
    customer_code = 'Cus006'
GROUP BY
    product_code
ORDER BY
    profit_margin DESC
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


| product_code | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|
| Prod329 | 107 | 8262 | 33783964.0 | 31850923.51 | 1933040.49 | 0.057 |
| Prod324 | 270 | 17182 | 34414139.0 | 33477610.99 | 936528.01 | 0.027 |
| Prod304 | 298 | 17165 | 14601102.0 | 13690238.32 | 910863.68 | 0.062 |
| Prod308 | 79 | 9295 | 7507749.0 | 6705782.35 | 801966.65 | 0.107 |
| Prod040 | 180 | 7851 | 14365108.0 | 13646145.91 | 718962.09 | 0.05 |
| Prod339 | 166 | 7404 | 13972858.0 | 13364340.01 | 608517.99 | 0.044 |
| Prod102 | 438 | 7716 | 9022343.0 | 8542511.71 | 479831.29 | 0.053 |
| Prod209 | 149 | 17627 | 7314775.0 | 6874051.0 | 440724.0 | 0.06 |
| Prod322 | 21 | 938 | 1584289.0 | 1231658.3 | 352630.7 | 0.223 |
| Prod313 | 56 | 5173 | 6182497.0 | 5854536.63 | 327960.37 | 0.053 |


<span style="color:#0077B6">

- `Prod329` has a significantly higher profit margin than the other products (1.9M rupees, about double the next product) and a good profit margin percentage of 5.7%.
- `Prod322` and `Prod308` have excellent profit margin percentages of 22.3% and 10.7%, respectively, meaning these products return more profit per rupee of sales than the others.

</span>
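
To see how concentrated `Cus006`'s profit is, each product's share of the customer's total profit margin (9,307,899.35 rupees, from query 8A) can be computed from the two outputs:

```python
# Cus006 total profit (query 8A) and its top-10 products by profit margin (query 8B), in rupees.
cus006_total_profit = 9307899.35
top_products = {
    "Prod329": 1933040.49, "Prod324": 936528.01, "Prod304": 910863.68,
    "Prod308": 801966.65, "Prod040": 718962.09, "Prod339": 608517.99,
    "Prod102": 479831.29, "Prod209": 440724.0, "Prod322": 352630.7,
    "Prod313": 327960.37,
}

# Each product's share of the customer's total profit margin
shares = {code: profit / cus006_total_profit for code, profit in top_products.items()}
top10_share = sum(top_products.values()) / cus006_total_profit

print(f"Prod329 share: {shares['Prod329']:.1%}")
print(f"Top-10 share: {top10_share:.1%}")
```

On these numbers, `Prod329` alone accounts for about a fifth of `Cus006`'s profit, and the ten products listed account for roughly 81%.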

**9A. Find the 5 customers with the lowest profit margin**

In [114]:
%%sql

SELECT
    tr.customer_code,
    customers.customer_type,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions tr
RIGHT JOIN
    customers ON customers.customer_code = tr.customer_code
GROUP BY
    customers.customer_type, tr.customer_code
ORDER BY
    profit_margin
LIMIT
    5

 * mysql+mysqlconnector://root:***@localhost:3306/sales
5 rows affected.


| customer_code | customer_type | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|---|
| Cus018 | Brick & Mortar | 3104 | 10470 | 1868461.0 | 1905947.76 | -37486.76 | -0.02 |
| Cus015 | Brick & Mortar | 164 | 500 | 336367.0 | 333619.31 | 2747.69 | 0.008 |
| Cus034 | E-Commerce | 275 | 3244 | 430368.0 | 415281.72 | 15086.28 | 0.035 |
| Cus028 | E-Commerce | 203 | 23433 | 2252506.0 | 2218026.24 | 34479.76 | 0.015 |
| Cus026 | E-Commerce | 1223 | 38340 | 3342051.0 | 3307440.61 | 34610.39 | 0.01 |


<span style="color:#0077B6">

- `Cus018` is the only customer with a negative profit margin.
- We will take a deeper look at this customer to see whether any of its purchases are problematic.

</span>

**9B. Let's see which products contributed the most to the loss of `Cus018`, the only customer with a negative profit margin**

In [115]:
%%sql

SELECT
    product_code,
    COUNT(*) as total_transactions,
    SUM(sales_qty) as total_quantity, 
    SUM(sales_amount) as sales_amount,
    ROUND(SUM(cost_price), 3) as total_cost,
    ROUND(SUM(profit_margin), 3) as profit_margin,
    ROUND(SUM(profit_margin) / SUM(sales_amount), 3) as profit_margin_perc
FROM
    transactions
WHERE
    customer_code = 'Cus018'
GROUP BY
    product_code
ORDER BY
    profit_margin
LIMIT
    10

 * mysql+mysqlconnector://root:***@localhost:3306/sales
10 rows affected.


| product_code | total_transactions | total_quantity | sales_amount | total_cost | profit_margin | profit_margin_perc |
|---|---|---|---|---|---|---|
| Prod163 | 1 | 4000 | 644444.0 | 702443.96 | -57999.96 | -0.09 |
| Prod255 | 254 | 1114 | 135116.0 | 137737.16 | -2621.16 | -0.019 |
| Prod054 | 13 | 34 | 6439.0 | 7119.31 | -680.31 | -0.106 |
| Prod058 | 3 | 3 | 2736.0 | 3182.88 | -446.88 | -0.163 |
| Prod121 | 75 | 119 | 20211.0 | 20485.24 | -274.24 | -0.014 |
| Prod286 | 32 | 32 | 3494.0 | 3679.11 | -185.11 | -0.053 |
| Prod057 | 4 | 4 | 3462.0 | 3629.11 | -167.11 | -0.048 |
| Prod065 | 12 | 22 | 6053.0 | 6181.23 | -128.23 | -0.021 |
| Prod292 | 40 | 40 | 6534.0 | 6634.25 | -100.25 | -0.015 |
| Prod260 | 40 | 56 | 4979.0 | 5076.35 | -97.35 | -0.02 |


<span style="color:#0077B6">

- The single order of `Prod163` has a significantly larger loss than any other loss-making product for this customer.
- This one transaction (a loss of about 58k rupees) is the main reason for the customer's overall net loss of about 37.5k rupees: excluding it, the customer's remaining transactions would net about +20.5k rupees.

</span>

We need more details about this order so that we can follow up with the sales team and understand what happened (this is much easier here, as there is only a single transaction).

In [63]:
%%sql

SELECT
    *
FROM
    transactions
WHERE
    customer_code = 'Cus018' AND product_code = 'Prod163'

 * mysql+mysqlconnector://root:***@localhost:3306/sales
1 rows affected.


| product_code | customer_code | market_code | order_date | sales_qty | sales_amount | currency | profit_margin_percentage | profit_margin | cost_price |
|---|---|---|---|---|---|---|---|---|---|
| Prod163 | Cus018 | Mark004 | 2019-08-16 00:00:00 | 4000 | 644444.0 | INR | -0.09 | -57999.96 | 702443.96 |


<span style="color:#0077B6">

With the full details of the order, we can now use the order date (2019-08-16) and market (`Mark004`) to trace the order and make inquiries about it. We can also look deeper at `Prod163` to see whether similar losses occur with other customers.

</span>
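
The follow-up check on `Prod163` could be a parameterized query that groups one product's transactions by customer. The sketch uses `sqlite3` with hypothetical rows and a hypothetical helper name, `product_losses_by_customer`; with the notebook's MySQL connection the placeholder style differs (for example `%s` for `mysql-connector`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (product_code TEXT, customer_code TEXT, "
    "sales_amount REAL, profit_margin REAL)"
)
# Hypothetical rows for illustration only (not AtliQ data).
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("Prod163", "Cus018", 1000.0, -90.0),
    ("Prod163", "CusA", 500.0, 25.0),
    ("Prod163", "CusB", 800.0, -40.0),
    ("ProdZZZ", "CusA", 300.0, 30.0),
])

def product_losses_by_customer(conn, product_code):
    """Hypothetical helper: per-customer totals for one product, worst profit margin first."""
    return conn.execute(
        """
        SELECT customer_code,
               COUNT(*) AS total_transactions,
               SUM(profit_margin) AS profit_margin,
               SUM(profit_margin < 0) AS loss_transactions  -- comparison counts as 1 or 0
        FROM transactions
        WHERE product_code = ?
        GROUP BY customer_code
        ORDER BY SUM(profit_margin)
        """,
        (product_code,),
    ).fetchall()

print(product_losses_by_customer(conn, "Prod163"))
```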

## IV. Conclusion

**Answer the problem questions using the insights found** *(Note that the thresholds used to judge whether a number is good are based on general observations from the analyses or on comparisons with other items in the same category)*:
- Which products have high/low performances (profit margin)?
    - Top products: `Prod329`, `Prod308`, `Prod040`, `Prod049` have a high profit margin and a profit margin percentage of at least 5%.
    - Products with high potential: `Prod001`, `Prod153`, `Prod155`, `Prod201` (all have at least 100 units sold and a profit margin percentage of 30% or more).
    - Products with a considerable net loss: `Prod073`, `Prod084`, `Prod169`, `Prod336`, `Prod044`, `Prod016`, `Prod030` (most have a total loss of more than 100k rupees and a loss percentage above 5%).
    - Products that have a high loss percentage and can potentially create more loss: `Prod080`, `Prod022` (Both have a loss percentage >= 30%).
    - As many products have a good profit margin and good future potential, the company can focus on them in future marketing and advertising to boost sales. AtliQ can also direct its R&D toward further developing these products and supporting customers who use them.
    - For loss-making products, it is suggested to investigate what is driving up their costs (operation costs, accidents, logistics costs, ...). If the cause cannot be found, the company could discontinue these product lines and focus on better-performing products.
    
- Who are the best customers? (bring the most profit)
    - `Cus006` generated a significantly larger profit margin than any other customer (about 9.3M rupees).
    - On the other hand, `Cus018` is the only customer with a net loss (about -37k rupees).
    - In general, many customers purchase repeatedly, so the company could consider a rewards, referral, or frequent-customer program.
    - Even though only one customer has a net loss, it is worth communicating with them to identify bottlenecks or logistics issues, since the costs of their orders exceed the sales amount.
    - `Brick & Mortar` customers are a reliable and efficient source of profit. However, `E-Commerce` customers have great potential, as they generate more profit per rupee of sales.

- What is the sales performance between months/years?
    - The company's profit margin grew from 2017 to 2019 and peaked in 2019, about 12.3% higher than in 2018.
    - However, the profit margin decreased drastically in 2020, from 10.5M rupees in 2019 to 2.1M rupees (a drop of about 80%). The profit margin percentage also more than halved, from 3.1% to 1.4%.
    - Within a year, the company seems to perform better from Q4 to Q1 than in Q2 and Q3. In the strongest months, the average profit per transaction is roughly 1.4 to 1.7 times that of the weakest months.

- How do different markets perform?
    - By zone:
        - The `North` and `Central` zones have bigger markets and generated the most profit of the 3 zones. Over the 4 years, these two zones consistently contributed the major share of the profit margin.
        - However, in 2020, the year in which we observed a general decline, the `South` zone did the best. This zone:
            - Increased its profit margin, while the other two zones saw large drops in profit.
            - Had the highest profit margin percentage (5.8%), making the most out of each sale.
    - By markets:
        - Six markets generated a net profit of more than 1M rupees: `Mark007`, `Mark013`, `Mark011`, `Mark003`, `Mark002`, `Mark004`. The last two generated 4.87M and 11.95M rupees, respectively.
        - Four markets require closer investigation:
            - `Mark014` and `Mark008` have an extremely small profit margin (~35-46k rupees).
            - `Mark005` and `Mark006` have a net loss, with `Mark006` at a 20% loss.
    
PROBLEMS:

1. High loss: 
    - Individual orders can be very profitable (up to 20-30% on a single order), but the losses are as large as the profits. This helps explain why the net profit margin percentage is considerably smaller (less than 10%, mostly around 2-5%) when transactions are grouped by customer, product, market, or year.
    - Note that loss-making orders are frequent (almost one in every two orders). The company needs to identify the root cause of this issue.

2. An issue in 2020 (likely the Covid-19 pandemic) had a significant impact on sales performance, with total revenue down by 57.7% and profit margin down by 80.4% compared with 2019.
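
The share of loss-making orders and its effect on the net margin can be measured in one query: in both MySQL and SQLite, a comparison such as `profit_margin < 0` evaluates to 1 or 0, so its `AVG` is the fraction of loss orders. A sketch with hypothetical rows in `sqlite3` (not AtliQ data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (sales_amount REAL, profit_margin REAL)")
# Hypothetical rows (not AtliQ data): two of the four orders lose money.
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [
    (1000.0, 250.0),
    (800.0, -200.0),
    (500.0, 120.0),
    (700.0, -160.0),
])

loss_share, net_margin_perc = conn.execute(
    """
    SELECT
        AVG(profit_margin < 0) AS loss_order_share,            -- fraction of loss-making orders
        ROUND(SUM(profit_margin) / SUM(sales_amount), 3) AS net_margin_perc
    FROM transactions
    """
).fetchone()

print(loss_share, net_margin_perc)
```

In this toy example, individual orders make or lose roughly 23-25%, yet the net margin is only about 0.3%, which mirrors Problem 1.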

We will need to review the BI dashboard to fully understand the data and identify other issues. Suggestions and implementation details will be added in the final report presentation.