# Analysing Customer Sales and Product Data

## Description of the Project

The database contains sales data for scale model cars. It is arranged in eight tables:
- `customers`
- `employees`
- `offices`
- `orderdetails`
- `orders`
- `payments`
- `productlines`
- `products`

By analysing the product and sales data, we can find insights to improve marketing, inventory management and customer retention.

## Project Goal

The goal of this project is to analyze data from a sales records database for scale model cars and extract information for decision-making.

We want to answer the following questions: 

1) Which products should we order more of or less of?
2) How should we tailor marketing and communication strategies to customer behaviors?
3) How much can we spend on acquiring new customers?

By answering these questions we will be able to:
- Identify the most popular brands, product lines and models
- Produce cost-efficient marketing strategies
- Reduce storage space by ordering stock at appropriate levels

## Overview of the data

### Connecting to the database

Activating ipython-sql and connecting to the database:

In [2]:
%load_ext sql
%sql sqlite:///stores.db

'Connected: @stores.db'

### Specific Attributes

Inspecting the attributes and keys of a table, using `customers` as an example:

In [27]:
%%sql

PRAGMA table_info(customers)

 * sqlite:///stores.db
Done.


cid,name,type,notnull,dflt_value,pk
0,customerNumber,INTEGER,1,,1
1,customerName,nvarchar(50),1,,0
2,contactLastName,nvarchar(50),1,,0
3,contactFirstName,nvarchar(50),1,,0
4,phone,nvarchar(50),1,,0
5,addressLine1,nvarchar(50),1,,0
6,addressLine2,nvarchar(50),0,,0
7,city,nvarchar(50),1,,0
8,state,nvarchar(50),0,,0
9,postalCode,nvarchar(15),0,,0


Counting the columns and rows of each table in the database:

In [3]:
%%sql

SELECT 
    'customers' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('customers')) AS num_columns,
    COUNT(*) AS num_rows
FROM
    customers
    
UNION ALL

SELECT 
    'employees' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('employees')) AS num_columns,
    COUNT(*) AS num_rows
FROM
    employees

UNION ALL

SELECT 
    'offices' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('offices')) AS num_columns,
    COUNT(*) AS num_rows
FROM
    offices
    
UNION ALL

SELECT 
    'orderdetails' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('orderdetails')) AS num_columns,
     COUNT(*) AS num_rows
FROM
    orderdetails

UNION ALL

SELECT 
    'orders' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('orders')) AS num_columns,
     COUNT(*) AS num_rows
FROM
    orders

UNION ALL

SELECT 
    'payments' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('payments')) AS num_columns,
     COUNT(*) AS num_rows
FROM
    payments

UNION ALL

SELECT 
    'productlines' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('productlines')) AS num_columns,
    COUNT(*) AS num_rows
FROM
    productlines

UNION ALL

SELECT 
    'products' AS name,
    (SELECT COUNT(*) FROM pragma_table_info('products')) AS num_columns,
    COUNT(*) AS num_rows
FROM
    products

ORDER BY
    name;

 * sqlite:///stores.db
Done.


name,num_columns,num_rows
customers,13,122
employees,8,23
offices,9,7
orderdetails,5,2996
orders,7,326
payments,4,273
productlines,4,7
products,9,110
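
The two inspection steps above (listing columns with `PRAGMA table_info` and counting rows) can also be run for every table in a loop. The following is a minimal sketch using Python's built-in `sqlite3` module rather than the `%%sql` magic; the helper name `table_overview` and the two-table in-memory database are assumptions made for illustration and are not part of `stores.db`.

```python
import sqlite3


def table_overview(conn):
    """Return (table, num_columns, num_rows, primary_key_columns) per user table."""
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    overview = []
    for table in table_names:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk);
        # pk is 0 for non-key columns, else the column's 1-based position in the key.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        key_columns = [c[1] for c in sorted(columns, key=lambda c: c[5]) if c[5] > 0]
        num_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        overview.append((table, len(columns), num_rows, key_columns))
    return overview


# Hypothetical toy database standing in for stores.db (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE offices (officeCode TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE payments (
        customerNumber INTEGER,
        checkNumber TEXT,
        amount REAL,
        PRIMARY KEY (customerNumber, checkNumber)
    );
    INSERT INTO offices VALUES ('1', 'City A'), ('2', 'City B');
    INSERT INTO payments VALUES (1, 'CHK-1', 100.0);
""")

overview = table_overview(conn)
for row in overview:
    print(row)
```

Pointing `sqlite3.connect` at the real database file instead of `":memory:"` should produce the same kind of summary as the `UNION ALL` query, with the key columns added.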


## Question 1: Which Products Should We Order More of or Less of?

To answer this question, we need to analyse stock levels and product performance so that the best-selling products do not go out of stock.

- The low stock ratio is the total quantity of each product ordered divided by the quantity of that product in stock. The queries below sort this ratio in ascending order and keep the ten lowest values. A higher ratio means that orders are large relative to the remaining stock, so the products closest to selling out are the ones with the highest ratios, which a descending sort would list.

- The product performance represents the sum of sales per product.

- Priority products for restocking are those with high product performance that are on the brink of being out of stock.
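
To make the two metrics above concrete, here is a minimal sketch that computes both in a single query on a made-up dataset. The three products, their stock levels and their order lines are hypothetical, and the sketch sorts the ratio in descending order so that the product with the least stock relative to its orders comes first.

```python
import sqlite3

# Hypothetical toy data: three products with invented stock levels and orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        productCode TEXT PRIMARY KEY,
        productName TEXT,
        quantityInStock INTEGER
    );
    CREATE TABLE orderdetails (
        orderNumber INTEGER,
        productCode TEXT,
        quantityOrdered INTEGER,
        priceEach REAL
    );
    INSERT INTO products VALUES
        ('A', 'Toy car A', 100),
        ('B', 'Toy car B', 1000),
        ('C', 'Toy car C', 50);
    INSERT INTO orderdetails VALUES
        (1, 'A', 60, 10.0),
        (2, 'A', 30, 12.0),
        (1, 'B', 100, 20.0),
        (2, 'C', 10, 5.0);
""")

query = """
SELECT p.productCode,
       -- units ordered relative to units in stock
       ROUND(SUM(od.quantityOrdered) * 1.0 / p.quantityInStock, 2) AS low_stock,
       -- total sales value of the product
       ROUND(SUM(od.quantityOrdered * od.priceEach), 2) AS product_performance
  FROM products p
  JOIN orderdetails od
    ON od.productCode = p.productCode
 GROUP BY p.productCode
 ORDER BY low_stock DESC
"""

rows = conn.execute(query).fetchall()
for product_code, low_stock, performance in rows:
    print(product_code, low_stock, performance)
```

In this toy data, product `A` has sold 90 units against 100 in stock, so it tops the descending list, while product `B` has the highest sales value but plenty of stock left.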



Determining the ten products with the lowest low_stock ratio:

In [29]:
%%sql

SELECT
    p.productCode, 
    p.productName, 
    p.productLine, 
    -- Total units ordered of each product
    SUM(od.quantityOrdered) AS quantity_ordered,
    p.quantityInStock AS quantity_in_stock,
    -- Ratio of units ordered to units in stock
    ROUND(SUM(od.quantityOrdered) * 1.0 / p.quantityInStock, 2) AS low_stock
    
FROM 
    products p
    
JOIN 
    -- Joining the orderdetails table makes all columns available without subqueries.
    orderdetails od
    
ON 
    p.productCode = od.productCode
    
GROUP BY
    p.productCode
    
ORDER BY
    low_stock
    
LIMIT
    10;

 * sqlite:///stores.db
Done.


productCode,productName,productLine,quantity_ordered,quantity_in_stock,low_stock
S18_1984,1995 Honda Civic,Classic Cars,917,9772,0.09
S24_3432,2002 Chevy Corvette,Classic Cars,894,9446,0.09
S12_2823,2002 Suzuki XREO,Motorcycles,1028,9997,0.1
S12_3380,1968 Dodge Charger,Classic Cars,925,9123,0.1
S18_1589,1965 Aston Martin DB5,Classic Cars,914,9042,0.1
S18_2325,1932 Model A Ford J-Coupe,Vintage Cars,957,9354,0.1
S18_2870,1999 Indy 500 Monte Carlo SS,Classic Cars,855,8164,0.1
S18_3482,1976 Ford Gran Torino,Classic Cars,915,9127,0.1
S32_2206,1982 Ducati 996 R,Motorcycles,906,9241,0.1
S700_2466,America West Airlines B757-200,Planes,984,9653,0.1


Determining product performance:

In [35]:
%%sql

SELECT
    -- The product code
    od.productCode,
    -- The product name (from the products table with matching productCode)
    (SELECT 
        productName 
    FROM 
        products p 
    WHERE 
        od.productCode = p.productCode) as product_name,
    -- The total amount ordered
    SUM(quantityOrdered) as total_ordered,
    -- The price per item on one arbitrary order line of the group (priceEach varies between orders)
    priceEach,
    -- The total spent on each product
    ROUND(SUM(quantityOrdered * priceEach),2)  AS product_performance

FROM 
    orderdetails od
    
GROUP BY 
    productCode
    
ORDER BY 
    product_performance DESC
    
LIMIT 
    20;

 * sqlite:///stores.db
Done.


productCode,product_name,total_ordered,priceEach,product_performance
S18_3232,1992 Ferrari 360 Spider red,1808,165.95,276839.98
S12_1108,2001 Ferrari Enzo,1019,205.72,190755.86
S10_1949,1952 Alpine Renault 1300,961,214.3,190017.96
S10_4698,2003 Harley-Davidson Eagle Drag Bike,985,172.36,170686.0
S12_1099,1968 Ford Mustang,933,165.38,161531.48
S12_3891,1969 Ford Falcon,965,141.88,152543.02
S18_1662,1980s Black Hawk Helicopter,1040,134.04,144959.91
S18_2238,1998 Chrysler Plymouth Prowler,986,135.9,142530.63
S18_1749,1917 Grand Touring Sedan,918,136.0,140535.6
S12_2823,2002 Suzuki XREO,1028,122.0,135767.03


Determining which high-performing products are also on the low stock list:

In [39]:
%%sql

WITH low_stock_table AS (
SELECT
    p.productCode, 
    ROUND(SUM(od.quantityOrdered) * 1.0 / p.quantityInStock,2) as low_stock  
FROM 
    products p
JOIN 
    orderdetails od 
ON 
    p.productCode = od.productCode 
GROUP BY
    p.productCode
ORDER BY
    low_stock
LIMIT 
    10
)

-- The product performance query for the ten products with the lowest stock.
SELECT
    od.productCode,
    (SELECT 
        productName 
    FROM 
        products p 
    WHERE 
        od.productCode = p.productCode) as product_name,
    (SELECT 
        productLine 
    FROM 
        products p 
    WHERE 
        od.productCode = p.productCode) as product_line,
    (SELECT low_stock
     FROM low_stock_table
     WHERE
     low_stock_table.productCode = od.productCode) AS low_stock,
    ROUND(SUM(quantityOrdered * priceEach),2)  AS product_performance

FROM 
    orderdetails od
WHERE 
    od.productCode IN (SELECT productCode FROM low_stock_table)
    
GROUP BY 
    productCode
    
ORDER BY 
    product_performance DESC
    
LIMIT 
    10;

 * sqlite:///stores.db
Done.


productCode,product_name,product_line,low_stock,product_performance
S12_2823,2002 Suzuki XREO,Motorcycles,0.1,135767.03
S18_3482,1976 Ford Gran Torino,Classic Cars,0.1,121890.6
S18_1984,1995 Honda Civic,Classic Cars,0.09,119050.95
S18_2325,1932 Model A Ford J-Coupe,Vintage Cars,0.1,109992.01
S18_1589,1965 Aston Martin DB5,Classic Cars,0.1,101778.13
S18_2870,1999 Indy 500 Monte Carlo SS,Classic Cars,0.1,100770.12
S12_3380,1968 Dodge Charger,Classic Cars,0.1,98718.76
S700_2466,America West Airlines B757-200,Planes,0.1,89347.8
S24_3432,2002 Chevy Corvette,Classic Cars,0.09,87404.81
S32_2206,1982 Ducati 996 R,Motorcycles,0.1,33268.76


The table above lists the ten products on the low_stock list, ordered by product performance. The Classic Cars product line appears most frequently (six of the ten products) and needs to be prioritised.

## Question 2: How Should We Match Marketing and Communication Strategies to Customer Behavior?

This involves categorizing customers: finding the VIP (very important person) customers and those who are less engaged.

- VIP customers bring in the most profit for the store.

- Less-engaged customers bring in less profit.
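
As a rough illustration of this categorisation, the sketch below ranks customers by profit, computed as (sale price minus buy price) times quantity, on a small invented dataset. The function name `customers_by_profit` and all rows are hypothetical; the join structure follows the tables described in the overview.

```python
import sqlite3


def customers_by_profit(conn, most_profitable=True, limit=5):
    """Return (customerNumber, customerName, profit) rows ranked by profit."""
    direction = "DESC" if most_profitable else "ASC"
    query = f"""
        SELECT c.customerNumber,
               c.customerName,
               SUM((od.priceEach - p.buyPrice) * od.quantityOrdered) AS profit
          FROM orderdetails od
          JOIN orders o ON o.orderNumber = od.orderNumber
          JOIN products p ON p.productCode = od.productCode
          JOIN customers c ON c.customerNumber = o.customerNumber
         GROUP BY c.customerNumber
         ORDER BY profit {direction}
         LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


# Hypothetical toy data; none of these rows come from stores.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
    CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
    CREATE TABLE products (productCode TEXT PRIMARY KEY, buyPrice REAL);
    CREATE TABLE orderdetails (
        orderNumber INTEGER,
        productCode TEXT,
        quantityOrdered INTEGER,
        priceEach REAL
    );
    INSERT INTO customers VALUES (1, 'Customer one'), (2, 'Customer two'), (3, 'Customer three');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
    INSERT INTO products VALUES ('A', 6.0), ('B', 15.0);
    INSERT INTO orderdetails VALUES
        (10, 'A', 10, 10.0),   -- profit (10 - 6) * 10 = 40
        (11, 'B', 4, 20.0),    -- profit (20 - 15) * 4 = 20
        (12, 'A', 2, 9.0),     -- profit (9 - 6) * 2 = 6
        (13, 'B', 10, 25.0);   -- profit (25 - 15) * 10 = 100
""")

vip_customers = customers_by_profit(conn, most_profitable=True, limit=2)
least_engaged = customers_by_profit(conn, most_profitable=False, limit=1)
print("VIP:", vip_customers)
print("Least engaged:", least_engaged)
```

Using one parameterised function for both ends of the ranking avoids keeping two nearly identical queries in sync.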


Determining which customers generate the most profit:

In [47]:
%%sql

-- FIRST OPTION FOR DETERMINING - SUBQUERY SELECTIONS

SELECT
    -- customer numbers that match the order numbers in 'orderdetails'
    (SELECT 
         customerNumber 
    FROM 
         orders o 
    WHERE od.orderNumber = o.orderNumber) AS customer_number,
    (SELECT 
         contactFirstName
     FROM
         customers c 
    WHERE 
        (SELECT 
            customerNumber
        FROM 
            orders o 
        WHERE od.orderNumber = o.orderNumber) = c.customerNumber) AS first_name,
    (SELECT 
         contactLastName
     FROM
         customers c 
    WHERE 
        (SELECT 
            customerNumber
        FROM 
            orders o 
        WHERE od.orderNumber = o.orderNumber) = c.customerNumber) AS last_name,
    -- profit: sum of (sale price - buy price) * quantity sold (the alias says revenue)
    SUM((priceEach - (SELECT buyPrice 
                        FROM products p 
                       WHERE od.productCode = p.productCode)) * quantityOrdered) AS total_revenue
FROM
    orderdetails od
GROUP BY
    customer_number
ORDER BY
    total_revenue DESC
LIMIT
    10;

 * sqlite:///stores.db
Done.


customer_number,first_name,last_name,total_revenue
141,Diego,Freyre,326519.65999999986
124,Susan,Nelson,236769.39
151,Jeff,Young,72370.09000000001
114,Peter,Ferguson,70311.06999999999
119,Janine,Labrune,60875.30000000001
148,Eric,Natividad,60477.37999999999
187,Rachel,Ashworth,60095.85999999999
323,Mike,Graham,60013.99
131,Kwai,Lee,58669.10000000001
450,Sue,Frick,55931.37


In [41]:
%%sql

-- SECOND OPTION FOR DETERMINING - JOINING TABLES

SELECT 
    o.customerNumber,
    c.contactFirstName,
    c.contactLastName,
    c.city,
    c.country,
    SUM((priceEach - buyPrice) * quantityOrdered) AS total_revenue
FROM 
    orderdetails od
JOIN 
    orders o
ON 
    od.orderNumber = o.orderNumber
JOIN 
    products p
ON 
    od.productCode = p.productCode
JOIN
    customers c
ON
    o.customerNumber = c.customerNumber
GROUP BY
    o.customerNumber
ORDER BY
    total_revenue DESC
LIMIT 
    5;

 * sqlite:///stores.db
Done.


customerNumber,contactFirstName,contactLastName,city,country,total_revenue
141,Diego,Freyre,Madrid,Spain,326519.65999999986
124,Susan,Nelson,San Rafael,USA,236769.39
151,Jeff,Young,NYC,USA,72370.09000000001
114,Peter,Ferguson,Melbourne,Australia,70311.06999999999
119,Janine,Labrune,Nantes,France,60875.30000000001


Determining which customers generate the least profit:

In [4]:
%%sql

SELECT 
    o.customerNumber,
    c.contactFirstName,
    c.contactLastName,
    c.city,
    c.country,
    SUM((priceEach - buyPrice) * quantityOrdered) AS total_revenue
FROM 
    orderdetails od
JOIN 
    orders o
ON 
    od.orderNumber = o.orderNumber
JOIN 
    products p
ON 
    od.productCode = p.productCode
JOIN
    customers c
ON
    o.customerNumber = c.customerNumber
GROUP BY
    o.customerNumber
ORDER BY
    total_revenue
LIMIT 
    5;

 * sqlite:///stores.db
Done.


customerNumber,contactFirstName,contactLastName,city,country,total_revenue
219,Mary,Young,Glendale,USA,2610.870000000001
198,Leslie,Taylor,Brickhaven,USA,6586.02
473,Franco,Ricotti,Milan,Italy,9532.93
103,Carine,Schmitt,Nantes,France,10063.8
489,Thomas,Smith,London,UK,10868.04


## Question 3: How Much Can We Spend on Acquiring New Customers?

Determining how the share of new customers has changed over time:

In [54]:
%%sql

WITH 

payment_with_year_month_table AS (
SELECT *, 
       CAST(SUBSTR(paymentDate, 1, 4) AS INTEGER) * 100 + CAST(SUBSTR(paymentDate, 6, 2) AS INTEGER) AS year_month
  FROM payments p
),

customers_by_month_table AS (
SELECT p1.year_month, COUNT(*) AS number_of_customers, SUM(p1.amount) AS total
  FROM payment_with_year_month_table p1
 GROUP BY p1.year_month
),

new_customers_by_month_table AS (
SELECT p1.year_month, 
       COUNT(*) AS number_of_new_customers,
       SUM(p1.amount) AS new_customer_total,
       (SELECT number_of_customers
          FROM customers_by_month_table c
        WHERE c.year_month = p1.year_month) AS number_of_customers,
       (SELECT total
          FROM customers_by_month_table c
         WHERE c.year_month = p1.year_month) AS total
  FROM payment_with_year_month_table p1
 WHERE p1.customerNumber NOT IN (SELECT customerNumber
                                   FROM payment_with_year_month_table p2
                                  WHERE p2.year_month < p1.year_month)
 GROUP BY p1.year_month
)

SELECT year_month, 
       ROUND(number_of_new_customers*100/number_of_customers,1) AS number_of_new_customers_props,
       ROUND(new_customer_total*100/total,1) AS new_customers_total_props
  FROM new_customers_by_month_table;

 * sqlite:///stores.db
Done.


year_month,number_of_new_customers_props,new_customers_total_props
200301,100.0,100.0
200302,100.0,100.0
200303,100.0,100.0
200304,100.0,100.0
200305,100.0,100.0
200306,100.0,100.0
200307,75.0,68.3
200308,66.0,54.2
200309,80.0,95.9
200310,69.0,69.3


As the results show, the share of new customers has been decreasing since 2003, and it reached its lowest values in 2004. The year 2005 is present in the database but does not appear in the results, which means that the store has not had any new customers since September 2004. It therefore makes sense to spend money on acquiring new customers.
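
The monthly new-customer share can be sketched on toy data as follows. This is not the query used above: it swaps in a different technique (joining each payment to the customer's first payment month), counts distinct customers rather than payments, and multiplies by `100.0` so that SQLite does not truncate the integer division. The six payment rows are hypothetical, and dates are assumed to be stored as `YYYY-MM-DD` text.

```python
import sqlite3

# Hypothetical payments; dates are assumed to be 'YYYY-MM-DD' strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (customerNumber INTEGER, paymentDate TEXT, amount REAL);
    INSERT INTO payments VALUES
        (1, '2003-01-10', 100.0),
        (1, '2003-02-05', 50.0),
        (2, '2003-02-20', 150.0),
        (1, '2003-03-15', 100.0),
        (2, '2003-03-16', 100.0),
        (3, '2003-03-17', 100.0);
""")

query = """
WITH payments_by_month AS (
    SELECT customerNumber,
           amount,
           CAST(SUBSTR(paymentDate, 1, 4) AS INTEGER) * 100
             + CAST(SUBSTR(paymentDate, 6, 2) AS INTEGER) AS year_month
      FROM payments
),
first_month AS (
    -- the month of each customer's first payment
    SELECT customerNumber, MIN(year_month) AS first_year_month
      FROM payments_by_month
     GROUP BY customerNumber
)
SELECT p.year_month,
       ROUND(100.0 * COUNT(DISTINCT CASE WHEN p.year_month = f.first_year_month
                                         THEN p.customerNumber END)
             / COUNT(DISTINCT p.customerNumber), 1) AS new_customer_pct,
       ROUND(100.0 * SUM(CASE WHEN p.year_month = f.first_year_month
                              THEN p.amount ELSE 0 END)
             / SUM(p.amount), 1) AS new_customer_amount_pct
  FROM payments_by_month p
  JOIN first_month f
    ON f.customerNumber = p.customerNumber
 GROUP BY p.year_month
 ORDER BY p.year_month
"""

monthly_shares = conn.execute(query).fetchall()
for year_month, customer_pct, amount_pct in monthly_shares:
    print(year_month, customer_pct, amount_pct)
```

Unlike the query above, this variant keeps months without any new customers in the result and reports them as 0.0, so a missing month would no longer signal the absence of new customers.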

To determine how much money we can spend acquiring new customers, we can compute the Customer Lifetime Value (LTV), which represents the average amount of money a customer generates. We can then determine how much we can spend on marketing.

In [53]:
%%sql

WITH customer_revenue_table AS (
    SELECT
        o.customerNumber,
        SUM((od.priceEach - p.buyPrice) * od.quantityOrdered) AS revenue
    FROM
        orderdetails od
    JOIN
        orders o
    ON
        od.orderNumber = o.orderNumber
    JOIN
        products p
    ON
        od.productCode = p.productCode
    GROUP BY
        o.customerNumber
    ORDER BY
        revenue DESC
)

SELECT
    ROUND(AVG(revenue), 2) AS lifetime_value
FROM
    customer_revenue_table;

 * sqlite:///stores.db
Done.


lifetime_value
39039.59
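
One way to turn the LTV into a spending limit is sketched below. Only the LTV of 39039.59 comes from the query above; the function `acquisition_budget`, the figure of ten expected new customers and the 10% reinvestment share are hypothetical assumptions chosen purely for illustration.

```python
def acquisition_budget(lifetime_value, expected_new_customers, reinvest_share):
    """Spending limit if a fixed share of the expected profit is reinvested.

    expected profit = lifetime_value * expected_new_customers
    budget          = expected profit * reinvest_share
    """
    if expected_new_customers < 0:
        raise ValueError("expected_new_customers must not be negative")
    if not 0.0 <= reinvest_share <= 1.0:
        raise ValueError("reinvest_share must be between 0 and 1")
    expected_profit = lifetime_value * expected_new_customers
    return round(expected_profit * reinvest_share, 2)


LIFETIME_VALUE = 39039.59  # average profit per customer, from the query above

# Assumed inputs: ten new customers, 10% of their expected profit reinvested.
budget = acquisition_budget(LIFETIME_VALUE, expected_new_customers=10, reinvest_share=0.10)
print(f"Acquisition budget: {budget:.2f}")
```

Under these assumptions, any acquisition cost per customer below the reinvested share of the LTV still leaves the expected profit positive; the appropriate share is a business decision that the data alone does not settle.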
