# Northwind Traders Analysis
The Northwind database contains data on the company's customers, orders, products, suppliers, and other aspects of the business. The goals of this project are:

- Evaluating employee performance to boost productivity
- Understanding product sales and category performance to optimize inventory and marketing strategies
- Analyzing sales growth to identify trends, monitor company progress, and make more accurate forecasts
- Evaluating customer purchase behavior to target high-value customers with promotional incentives

## Database Schema

![](schema_diagram.svg)

#### Before we can begin any analysis, we need to connect to the database and confirm the connection by running a SELECT query.

In [1]:
# Get username and password from the .env file
from dotenv import load_dotenv
import os

load_dotenv()  # Load environment variables from .env file

db_user = os.environ.get('PSQL_USER')
db_password = os.environ.get('PSQL_PASS')
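`os.environ.get` returns `None` when a variable is missing, which would only surface later as a confusing connection error. The following is a minimal sketch of failing early instead. The helper name `build_db_url` and its default host, port, and database arguments are assumptions for illustration, not part of the original notebook.

```python
import os
from urllib.parse import quote


def build_db_url(env=os.environ, host="localhost", port=5432, dbname="northwind"):
    """Build a PostgreSQL URL from PSQL_USER / PSQL_PASS (hypothetical helper)."""
    # Collect every missing or empty variable so the error names all of them.
    missing = [name for name in ("PSQL_USER", "PSQL_PASS") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters such as '@' or ':' do not break the URL.
    user = quote(env["PSQL_USER"], safe="")
    password = quote(env["PSQL_PASS"], safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
```

The returned string has the same shape as the connection URL used in the next cell.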

In [2]:
# Load the sql extension and make a connection to the database
%load_ext sql
%sql postgresql://{db_user}:{db_password}@localhost:5432/northwind

In [3]:
# Run a select query to confirm connection to the database
%sql SELECT * FROM customers LIMIT 5;

 * postgresql://postgres:***@localhost:5432/northwind
5 rows affected.


customer_id,company_name,contact_name,contact_title,address,city,region,postal_code,country,phone,fax
ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constitución 2222,México D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,Mataderos 2312,México D.F.,,05023,Mexico,(5) 555-3932,
AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,Berguvsvägen 8,Luleå,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67


#### To obtain a list of all tables and views in the PostgreSQL database, we can query the information_schema.tables view.

In [4]:
%%sql
SELECT table_name, table_type
  FROM information_schema.tables
 WHERE table_schema = 'public' AND table_type IN ('BASE TABLE', 'VIEW');

 * postgresql://postgres:***@localhost:5432/northwind
17 rows affected.


table_name,table_type
territories,BASE TABLE
order_details,BASE TABLE
employee_territories,BASE TABLE
us_states,BASE TABLE
customers,BASE TABLE
orders,BASE TABLE
employees,BASE TABLE
shippers,BASE TABLE
products,BASE TABLE
categories,BASE TABLE


#### Next, let's check the data types we have in the tables of interest for this analysis.

In [5]:
%%sql
SELECT DISTINCT data_type
  FROM information_schema.columns
 WHERE table_name IN ('categories', 'orders', 'employees', 'customers', 'order_details', 'products')
 ORDER BY data_type

 * postgresql://postgres:***@localhost:5432/northwind
6 rows affected.


data_type
character varying
date
integer
real
smallint
text


#### In the original database, some columns have the 'bytea' data type, which is used for storing raw binary data such as images, multimedia files, or other non-textual data. Let's check which tables and columns have this data type.

In [6]:
%%sql
SELECT table_name, column_name, data_type
  FROM information_schema.columns
 WHERE table_name IN ('categories', 'orders', 'employees', 'customers', 'order_details', 'products') AND data_type = 'bytea'
 ORDER BY data_type

 * postgresql://postgres:***@localhost:5432/northwind
0 rows affected.


table_name,column_name,data_type


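#### In the original database, two image columns use the bytea data type: photo in the employees table and picture in the categories table. This data type is challenging to render in Jupyter Notebooks and is not needed for our analysis goals, so we can drop these columns. The outputs shown here appear to come from a re-run after the columns had already been dropped, which is why the query above returns no rows and the DROP statement below reports that the column does not exist.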

In [7]:
%%sql
-- Drop the photo column from employees table
ALTER TABLE employees
DROP COLUMN photo;
-- Drop the picture column from the categories table
ALTER TABLE categories
DROP COLUMN picture;

 * postgresql://postgres:***@localhost:5432/northwind
(psycopg2.errors.UndefinedColumn) column "photo" of relation "employees" does not exist

[SQL: -- Drop the photo column from employees table
ALTER TABLE employees
DROP COLUMN photo;]
(Background on this error at: https://sqlalche.me/e/20/f405)
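The `UndefinedColumn` error above occurs when the cell runs against a database where `photo` is already gone. PostgreSQL accepts `IF EXISTS` on `DROP COLUMN`, which makes the statement safe to re-run. The sketch below only builds the statement text. The helper name `drop_column_statements` is hypothetical, and the statements are not executed here.

```python
def drop_column_statements(columns):
    """Return re-runnable ALTER TABLE statements for (table, column) pairs."""
    statements = []
    for table, column in columns:
        # Only plain identifiers are accepted, since the names are interpolated into SQL.
        if not (table.isidentifier() and column.isidentifier()):
            raise ValueError(f"Unexpected identifier: {table!r}.{column!r}")
        statements.append(f"ALTER TABLE {table} DROP COLUMN IF EXISTS {column};")
    return statements


for statement in drop_column_statements([("employees", "photo"), ("categories", "picture")]):
    print(statement)
```

With `IF EXISTS`, a second run skips the missing column instead of raising an error.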


#### Confirm the 'bytea' data type is no longer present

In [8]:
%%sql
SELECT DISTINCT data_type
  FROM information_schema.columns
 WHERE table_name IN ('categories', 'orders', 'employees', 'customers', 'order_details', 'products')
 ORDER BY data_type

 * postgresql://postgres:***@localhost:5432/northwind
6 rows affected.


data_type
character varying
date
integer
real
smallint
text


#### Let's create some VIEWs to streamline the rest of the project. We will need the following:  
- Combine orders and customers tables to get more detailed information about each order.
- Combine order_details, products, and orders tables to get detailed order information, including the product name and quantity.
- Combine employees and orders tables to see who is responsible for each order.

In [9]:
%%sql

DROP VIEW IF EXISTS orders_customers;
DROP VIEW IF EXISTS orders_products;
DROP VIEW IF EXISTS employees_orders;

--Combine orders and customers tables to get more detailed information about each order.
    
CREATE VIEW orders_customers AS
SELECT o.order_id, o.order_date,
       c.customer_id, c.company_name, c.contact_name
  FROM orders o
  JOIN customers c
    ON o.customer_id = c.customer_id;

--Combine order_details, products, and orders tables to get detailed order information, including the product name and quantity.
    
CREATE VIEW orders_products AS
SELECT od.order_id, od.unit_price, od.quantity, od.discount,
       o.order_date,o.customer_id,
       p.product_name, p.category_id, p.product_id
  FROM order_details od
  JOIN orders o
    ON od.order_id = o.order_id
  JOIN products p
    ON od.product_id = p.product_id;

--Combine employees and orders tables to see who is responsible for each order.
    
CREATE VIEW employees_orders AS
SELECT o.order_id, o.order_date,
       e.first_name || ' ' || e.last_name AS employee_name, e.employee_id
  FROM orders o
  JOIN employees e
    ON o.employee_id = e.employee_id;


 * postgresql://postgres:***@localhost:5432/northwind
Done.
Done.
Done.
Done.
Done.
Done.


[]

## Task 1: Ranking Employee Sales Performance
The objective is twofold:
- First, the management team wants to recognize and reward top-performing employees, fostering a culture of excellence within the organization.
- Second, they want to identify employees who might be struggling so that they can offer the necessary training or resources to help them improve.

We can achieve this by ranking employees based on their total sales amount.

In [10]:
%%sql
--Calculate total sales for each employee
WITH
employee_sales AS (
SELECT eo.employee_id, eo.employee_name, 
       ROUND(SUM((od.unit_price*od.quantity*(1-od.discount))::numeric),2) AS total_sales
  FROM employees_orders eo
  JOIN order_details od
    ON eo.order_id = od.order_id
 GROUP BY eo.employee_name, eo.employee_id
)
SELECT *,
       RANK() OVER(ORDER BY total_sales DESC)
  FROM employee_sales;

 * postgresql://postgres:***@localhost:5432/northwind
9 rows affected.


employee_id,employee_name,total_sales,rank
4,Margaret Peacock,232890.85,1
3,Janet Leverling,202812.84,2
1,Nancy Davolio,192107.6,3
2,Andrew Fuller,166537.76,4
8,Laura Callahan,126862.28,5
7,Robert King,124568.23,6
9,Anne Dodsworth,77308.07,7
6,Michael Suyama,73913.13,8
5,Steven Buchanan,68792.28,9


#### The table above ranks employees by total sales: Margaret Peacock leads with 232,890.85, and Steven Buchanan is last with 68,792.28.
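The sales figure in this query, and in the later tasks, is the discounted line value `unit_price * quantity * (1 - discount)` summed over order lines. As a minimal sketch of that arithmetic, the Python below uses made-up order lines (not Northwind data) and `Decimal` values to mirror the `::numeric` cast and `ROUND(..., 2)` in the SQL.

```python
from decimal import Decimal, ROUND_HALF_UP


def line_value(unit_price, quantity, discount):
    """Discounted value of one order line."""
    return unit_price * quantity * (1 - discount)


# Hypothetical order lines: (unit_price, quantity, discount)
order_lines = [
    (Decimal("14.00"), 12, Decimal("0.05")),
    (Decimal("9.80"), 10, Decimal("0")),
    (Decimal("34.80"), 5, Decimal("0.15")),
]

# Sum the line values first, then round once, as the query does.
total_sales = sum(line_value(*line) for line in order_lines)
total_sales = total_sales.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(total_sales)
```

Rounding after summing, rather than per line, avoids accumulating rounding differences across many order lines.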

## Task 2: Running Total of Monthly Sales
The objective is to track the company's sales progress over time and identify trends that might shape its future strategies.

To do this, we aggregate the sales data at a monthly level and calculate a running total of sales by month.

In [11]:
%%sql
WITH
monthly AS(
SELECT DATE(DATE_TRUNC('month', order_date)) as order_month,
       ROUND(SUM((unit_price*quantity*(1-discount))::numeric),2) AS monthly_sales
  FROM orders_products
 GROUP BY DATE_TRUNC('month', order_date)
 ORDER BY order_month
)
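-- Order inside the window: row order from a CTE is not guaranteed to carry over
SELECT *,
       SUM(monthly_sales) OVER(ORDER BY order_month
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
  FROM monthly
 ORDER BY order_month;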

 * postgresql://postgres:***@localhost:5432/northwind
23 rows affected.


order_month,monthly_sales,running_total
1996-07-01,27861.9,27861.9
1996-08-01,25485.28,53347.18
1996-09-01,26381.4,79728.58
1996-10-01,37515.72,117244.3
1996-11-01,45600.05,162844.35
1996-12-01,45239.63,208083.98
1997-01-01,61258.07,269342.05
1997-02-01,38483.63,307825.68
1997-03-01,38547.22,346372.9
1997-04-01,53032.95,399405.85
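
As a quick cross-check of the running total, the sketch below recomputes it in Python for the first four months shown in the output above. It sorts by month explicitly before accumulating, because a running total is only meaningful for a defined row order.

```python
from decimal import Decimal
from itertools import accumulate

# First four rows of the monthly output above: (order_month, monthly_sales)
monthly_sales = [
    ("1996-07-01", Decimal("27861.90")),
    ("1996-08-01", Decimal("25485.28")),
    ("1996-09-01", Decimal("26381.40")),
    ("1996-10-01", Decimal("37515.72")),
]

# ISO dates sort correctly as strings.
monthly_sales.sort(key=lambda row: row[0])
running_totals = list(accumulate(amount for _, amount in monthly_sales))

for (month, amount), total in zip(monthly_sales, running_totals):
    print(month, amount, total)
```

The totals agree with the `running_total` column for those months.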


## Task 3: Month-Over-Month Sales Growth
Understanding the rate at which sales are increasing or decreasing from month to month will help identify significant trends.

For this task we will need to calculate the percentage change in sales from one month to the next.

In [12]:
%%sql
WITH
monthly AS(
SELECT DATE(DATE_TRUNC('month', order_date)) as order_month,
       ROUND(SUM((unit_price*quantity*(1-discount))::numeric),2) AS monthly_sales
  FROM orders_products
 GROUP BY DATE_TRUNC('month', order_date)
),
month_over_month AS(
SELECT *,
       LAG(monthly_sales) OVER(ORDER BY order_month) AS previous_month_sales
  FROM monthly
)
SELECT *,
       ROUND((monthly_sales - previous_month_sales)/previous_month_sales * 100,2) AS growth_percentage
  FROM month_over_month
 ORDER BY order_month;

 * postgresql://postgres:***@localhost:5432/northwind
23 rows affected.


order_month,monthly_sales,previous_month_sales,growth_percentage
1996-07-01,27861.9,,
1996-08-01,25485.28,27861.9,-8.53
1996-09-01,26381.4,25485.28,3.52
1996-10-01,37515.72,26381.4,42.21
1996-11-01,45600.05,37515.72,21.55
1996-12-01,45239.63,45600.05,-0.79
1997-01-01,61258.07,45239.63,35.41
1997-02-01,38483.63,61258.07,-37.18
1997-03-01,38547.22,38483.63,0.17
1997-04-01,53032.95,38547.22,37.58
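
The growth formula is `(current - previous) / previous * 100`. The sketch below applies it in Python to the first four monthly figures from the output above. The function name `growth_percentage` is chosen for illustration, and returning `None` when the previous month is zero is an assumption of this sketch (the SQL above would raise a division-by-zero error in that case).

```python
from decimal import Decimal, ROUND_HALF_UP


def growth_percentage(current, previous):
    """Percentage change from previous to current, rounded to 2 decimals."""
    # No previous month (first row) or a zero base: growth is undefined.
    if previous is None or previous == 0:
        return None
    change = (current - previous) / previous * 100
    return change.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


sales = [Decimal("27861.90"), Decimal("25485.28"), Decimal("26381.40"), Decimal("37515.72")]

# Pair each month with the one before it, like LAG() in the query.
previous_values = [None] + sales[:-1]
growth = [growth_percentage(cur, prev) for cur, prev in zip(sales, previous_values)]
print(growth)
```

The first month has no predecessor, so its growth is empty, matching the blank cell in the first output row.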


## Task 4: Identifying High-Value Customers
The objective is to identify high-value customers to whom the company can offer targeted promotions and special offers, which could drive increased sales, improve customer retention, and attract new customers.

We can do this by identifying customers with above-average order values. These customers might be businesses buying in bulk or individuals purchasing high-end products. We will categorize each order as 'Above Average' or 'Average/Below Average' and then count how many orders are 'Above Average' for each customer.

In [13]:
%%sql
WITH
order_value AS(
SELECT oc.order_id, oc.company_name, oc.contact_name,
       ROUND(SUM((od.unit_price*od.quantity*(1-od.discount))::numeric),2) AS order_value
  FROM orders_customers oc
  JOIN order_details od
    ON oc.order_id = od.order_id
 GROUP BY oc.order_id, oc.company_name, oc.contact_name
)
SELECT *,
       --AVG(order_value) OVER() AS avg_order_value,
       CASE
       WHEN order_value > (AVG(order_value) OVER() ) THEN 'Above Average'
       ELSE 'Average/Below Average'
       END AS order_category
  FROM order_value
 ORDER BY order_id
 LIMIT 10;

 * postgresql://postgres:***@localhost:5432/northwind
10 rows affected.


order_id,company_name,contact_name,order_value,order_category
10248,Vins et alcools Chevalier,Paul Henriot,440.0,Average/Below Average
10249,Toms Spezialitäten,Karin Josephs,1863.4,Above Average
10250,Hanari Carnes,Mario Pontes,1552.6,Above Average
10251,Victuailles en stock,Mary Saveley,654.06,Average/Below Average
10252,Suprêmes délices,Pascale Cartrain,3597.9,Above Average
10253,Hanari Carnes,Mario Pontes,1444.8,Average/Below Average
10254,Chop-suey Chinese,Yang Wang,556.62,Average/Below Average
10255,Richter Supermarkt,Michael Holz,2490.5,Above Average
10256,Wellington Importadora,Paula Parente,517.8,Average/Below Average
10257,HILARION-Abastos,Carlos Hernández,1119.9,Average/Below Average


#### We have categorized each order as ‘Above Average’ or ‘Average/Below Average’ in the table above. Now let's find out how many orders are ‘Above Average’ for each customer.

In [14]:
%%sql
WITH
order_value AS(
SELECT oc.order_id, oc.customer_id, oc.company_name, oc.contact_name,
       ROUND(SUM((od.unit_price*od.quantity*(1-od.discount))::numeric),2) AS order_value
  FROM orders_customers oc
  JOIN order_details od
    ON oc.order_id = od.order_id
 GROUP BY oc.order_id, oc.customer_id, oc.company_name, oc.contact_name
),
order_cat AS(
SELECT *,
       --AVG(order_value) OVER() AS avg_order_value,
       CASE
       WHEN order_value > (AVG(order_value) OVER() ) THEN 'Above Average'
       ELSE 'Average/Below Average'
       END AS order_category
  FROM order_value
)
SELECT customer_id, company_name, contact_name, COUNT(*) AS num_above_avg_orders
  FROM order_cat
 WHERE order_category = 'Above Average'
 GROUP BY customer_id, company_name, contact_name
 ORDER BY num_above_avg_orders DESC

 * postgresql://postgres:***@localhost:5432/northwind
64 rows affected.


customer_id,company_name,contact_name,num_above_avg_orders
ERNSH,Ernst Handel,Roland Mendel,26
SAVEA,Save-a-lot Markets,Jose Pavarotti,26
QUICK,QUICK-Stop,Horst Kloss,22
HUNGO,Hungry Owl All-Night Grocers,Patricia McKenna,11
RATTC,Rattlesnake Canyon Grocery,Paula Wilson,10
BONAP,Bon app',Laurence Lebihan,8
FOLKO,Folk och fä HB,Maria Larsson,8
FRANK,Frankenversand,Peter Franken,7
RICSU,Richter Supermarkt,Michael Holz,7
HILAA,HILARION-Abastos,Carlos Hernández,7


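#### The table above lists the high-value customers, ranked by their number of above-average orders. Ernst Handel and Save-a-lot Markets lead with 26 each.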
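The two steps (label each order against the overall average, then count the 'Above Average' orders per customer) can be sketched in plain Python. The customer IDs and order values below are made up for illustration and are not Northwind data. Note that the comparison is strict, so an order exactly equal to the average is not counted.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical orders: (customer_id, order_value)
orders = [
    ("C1", Decimal("900")),
    ("C1", Decimal("800")),
    ("C2", Decimal("600")),
    ("C2", Decimal("650")),
    ("C3", Decimal("50")),
]

# Average over all orders, like AVG(order_value) OVER() in the query.
average = sum(value for _, value in orders) / len(orders)


def categorize(value):
    return "Above Average" if value > average else "Average/Below Average"


# Count only the orders labeled 'Above Average', per customer.
above_average_counts = Counter(
    customer for customer, value in orders if categorize(value) == "Above Average"
)
print(average, above_average_counts.most_common())
```

Customers with no above-average orders (here `C3`) do not appear in the result at all, which matches the effect of the `WHERE order_category = 'Above Average'` filter.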

## Task 5: Percentage of Sales for Each Category
By knowing the percentage of total sales for each product category, the company can gain insights into which categories drive most of its sales. This understanding will help guide decisions about inventory (e.g., which categories should be stocked more heavily) and marketing strategies (e.g., which categories should be promoted more aggressively).


In [15]:
%%sql
WITH
cat_sale AS(
SELECT category_id, 
       ROUND(SUM((unit_price*quantity*(1-discount))::numeric),2) AS sales
  FROM orders_products
 GROUP BY category_id
),
cat_sale_name AS(
SELECT cs.category_id,
       c.category_name,
       cs.sales
  FROM cat_sale cs
  JOIN categories c
    ON cs.category_id = c.category_id
)
SELECT *,
       ROUND(sales/(SUM(sales) OVER()) * 100::numeric,2) AS percent_of_total_sales
  FROM cat_sale_name
 ORDER BY percent_of_total_sales DESC

 * postgresql://postgres:***@localhost:5432/northwind
8 rows affected.


category_id,category_name,sales,percent_of_total_sales
1,Beverages,267868.18,21.16
4,Dairy Products,234507.28,18.53
3,Confections,167357.23,13.22
6,Meat/Poultry,163022.36,12.88
8,Seafood,131261.74,10.37
2,Condiments,106047.08,8.38
7,Produce,99984.58,7.9
5,Grains/Cereals,95744.59,7.56


#### The table above shows that Beverages is the top category (21.16% of sales), followed by Dairy Products (18.53%). Produce and Grains/Cereals have the smallest shares.
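As a cross-check, the sketch below recomputes each category's share in Python from the `sales` column of the output above, dividing by the sum over all eight categories and rounding to two decimals.

```python
from decimal import Decimal, ROUND_HALF_UP

# Category sales taken from the query output above.
category_sales = {
    "Beverages": Decimal("267868.18"),
    "Dairy Products": Decimal("234507.28"),
    "Confections": Decimal("167357.23"),
    "Meat/Poultry": Decimal("163022.36"),
    "Seafood": Decimal("131261.74"),
    "Condiments": Decimal("106047.08"),
    "Produce": Decimal("99984.58"),
    "Grains/Cereals": Decimal("95744.59"),
}

# Grand total, like SUM(sales) OVER() in the query.
total = sum(category_sales.values())

percent_of_total = {
    name: (sales / total * 100).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for name, sales in category_sales.items()
}

for name, share in percent_of_total.items():
    print(f"{name}: {share}%")
```

The shares reproduce the `percent_of_total_sales` column.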

## Task 6: Top Products Per Category
The company needs to know the top three items sold in each product category. This will allow it to identify star performers and ensure that these products are kept in stock and marketed prominently.

In [16]:
%%sql
WITH
cat_sale AS(
SELECT category_id, product_id, product_name,
       ROUND(SUM((unit_price*quantity*(1-discount))::numeric),2) AS sales
  FROM orders_products
 GROUP BY category_id, product_id, product_name
),
cat_sale_name AS(
SELECT cs.category_id,
       c.category_name,
       cs.product_id,
       cs.product_name,
       cs.sales
  FROM cat_sale cs
  JOIN categories c
    ON cs.category_id = c.category_id
 ORDER BY cs.category_id
),
sales_rank AS(
SELECT *,
       RANK() OVER(PARTITION BY category_id
                        ORDER BY SALES DESC) AS sales_ranking
  FROM cat_sale_name
)
SELECT *
  FROM sales_rank
 WHERE sales_ranking <= 3
 ORDER BY category_id, sales_ranking;

 * postgresql://postgres:***@localhost:5432/northwind
24 rows affected.


category_id,category_name,product_id,product_name,sales,sales_ranking
1,Beverages,38,Côte de Blaye,141396.74,1
1,Beverages,43,Ipoh Coffee,23526.7,2
1,Beverages,2,Chang,16355.96,3
2,Condiments,63,Vegie-spread,16701.1,1
2,Condiments,61,Sirop d'érable,14352.6,2
2,Condiments,65,Louisiana Fiery Hot Pepper Sauce,13869.89,3
3,Confections,62,Tarte au sucre,47234.97,1
3,Confections,20,Sir Rodney's Marmalade,22563.36,2
3,Confections,26,Gumbär Gummibärchen,19849.14,3
4,Dairy Products,59,Raclette Courdavault,71155.7,1


#### The table above gives us the top 3 products by sales in each category.
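One property of `RANK()` worth keeping in mind: tied products share a rank, so filtering on `sales_ranking <= 3` can return more than three rows for a category. The sketch below imitates that behavior in plain Python. The function name `top_n_per_category` and the product data are hypothetical and chosen only to show a tie.

```python
from collections import defaultdict


def top_n_per_category(rows, n=3):
    """Keep rows whose RANK()-style rank within their category is <= n."""
    by_category = defaultdict(list)
    for category, product, sales in rows:
        by_category[category].append((product, sales))

    result = {}
    for category, products in by_category.items():
        products.sort(key=lambda item: item[1], reverse=True)
        ranked = []
        previous_sales, previous_rank = None, 0
        for position, (product, sales) in enumerate(products, start=1):
            # Ties reuse the previous rank; otherwise the rank is the row position.
            rank = previous_rank if sales == previous_sales else position
            if rank > n:
                break
            ranked.append((rank, product, sales))
            previous_sales, previous_rank = sales, rank
        result[category] = ranked
    return result


# Hypothetical data: three products tie for second place.
rows = [
    ("Beverages", "A", 500),
    ("Beverages", "B", 300),
    ("Beverages", "C", 300),
    ("Beverages", "D", 300),
    ("Beverages", "E", 100),
    ("Condiments", "F", 50),
]
print(top_n_per_category(rows))
```

If exactly three rows per category are required regardless of ties, `ROW_NUMBER()` with a tie-breaking sort key would be the usual alternative to `RANK()`.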

# Conclusion:
We have completed all 6 tasks:
1. Ranked Employee Sales Performance based on their total sales amount.
2. Calculated Running Total of Monthly Sales.
3. Calculated Month-Over-Month Sales Growth.
4. Identified High-Value Customers using above-average order values.
5. Calculated Percentage of Sales for Each Category.
6. Identified Top 3 Products Per Category.

# Author
### Puneet Pawar