# E-Commerce Logs Analysis

## Project Objective

The objective of this project is to analyze the website logs from an e-commerce store to obtain useful insight into customer behaviour.

## Data Source

The dataset was obtained from Kaggle: https://www.kaggle.com/datasets/kzmontage/e-commerce-website-logs. It is released under the Open Database License (ODbL) v1.0: https://opendatacommons.org/licenses/odbl/1-0/

In [20]:
-- Specifying database
USE EcommerceLogsDb

In [57]:
-- preview of table
SELECT TOP 3 *
FROM logs;

access_date,duration_seconds,Proto,IP,Src_IP_type,Src_Pt,Bytes,Accessed_From,country,membership,Languages,Sales,Returned,Returned_Amount
2016-11-01 09:58:00.0000000,2533,TCP,1.10.195.126,EXT_SERVER,8082,20100,Chrome,Albania,Normal,Albanian,261.96,No,0
2016-11-01 09:59:00.0000000,4034,TCP,1.1.217.211,OPENSTACK_NET,56978,20500,Mozilla Firefox,Albania,Normal,Albanian,731.94,No,0
2016-11-01 09:59:00.0000000,1525,TCP,1.115.198.107,EXT_SERVER,8082,90100,Mozilla Firefox,Albania,Normal,Albanian,14.62,No,0


In [21]:
-- Which country had the most sales (transactions) and sales ($ amount)?
-- showing country rankings for both metrics using window functions
SELECT country, 
    COUNT(*) AS sales_transaction, 
    DENSE_RANK() OVER(ORDER BY COUNT(*) DESC) AS sales_transaction_rank, 
    SUM(Sales) AS sales_$, 
    DENSE_RANK() OVER(ORDER BY SUM(Sales) DESC) AS sales_$_rank
FROM logs
GROUP BY country
ORDER BY sales_$ DESC;

country,sales_transaction,sales_transaction_rank,sales_$,sales_$_rank
Austria,2674,1,1296463.3414999906,1
Belgium,2604,2,1154321.3705999977,2
Argentina,2603,3,1140238.692799999,3
Australia,2520,4,1139576.1649999968,4
United States,2231,12,1132496.38,5
Brazil,2604,2,1129919.2000999977,6
Norway,2160,13,1129793.889999999,7
Netherlands,2292,8,1106623.810000002,8
Spain,2232,11,1102578.460000004,9
Puerto Rico,2232,11,1089656.3800000006,10


From the table above, Austria, Belgium, Argentina and Australia rank 1st to 4th on both sales\_transaction and sales\_$ (Brazil ties Belgium for 2nd in transactions but ranks 6th in sales). The number of sales transactions appears to track sales ($) closely, with some exceptions such as the United States, which ranks 12th in number of transactions but 5th in sales ($).
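To surface exceptions like the United States systematically rather than by eye, one option is to compute the gap between the two ranks for each country. The query below is a minimal sketch against the same `logs` table; the names `country_ranks` and `rank_gap` are illustrative and not part of the original analysis.

```sql
-- Sketch: rank each country on both metrics, then sort by the size of the gap.
-- country_ranks and rank_gap are hypothetical names introduced for illustration.
WITH country_ranks AS (
    SELECT country,
        DENSE_RANK() OVER(ORDER BY COUNT(*) DESC) AS sales_transaction_rank,
        DENSE_RANK() OVER(ORDER BY SUM(Sales) DESC) AS sales_$_rank
    FROM logs
    GROUP BY country
)
SELECT country, sales_transaction_rank, sales_$_rank,
    -- positive gap: the country ranks higher on sales ($) than on transactions
    sales_transaction_rank - sales_$_rank AS rank_gap
FROM country_ranks
ORDER BY ABS(sales_transaction_rank - sales_$_rank) DESC;
```

With the ranks shown above, the United States would have a `rank_gap` of 12 - 5 = 7.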

Let's look at the average sale amount per country and compare it with the worldwide average. The difference between each country's average and the worldwide average is also computed.

In [34]:
-- average sales ($) per country
SELECT  
    country, AVG(Sales) AS avg_sales_$,
    -- subquery in SELECT to determine worldwide avg sales ($)
    (SELECT AVG(Sales) FROM logs) AS world_avg_sales_$,
    -- difference between country avg and worldwide avg with another SELECT subquery
    (AVG(Sales) - (SELECT AVG(Sales) FROM logs)) AS difference
FROM logs
GROUP BY country
ORDER BY difference DESC;

country,avg_sales_$,world_avg_sales_$,difference
Turkey,557.5280158730159,411.3464487919854,146.18156708104408
Norway,523.0527268518516,411.3464487919854,111.70627805987982
Qatar,519.2086891385769,411.3464487919854,107.86224034660512
Nicaragua,516.7759722222227,411.3464487919854,105.42952343025088
Slovenia,516.1550165453341,411.3464487919854,104.8085677533623
Suriname,514.7100248015872,411.3464487919854,103.36357600961544
Russian Federation,508.748739711934,411.3464487919854,97.40229091996218
United States,507.6182787987449,411.3464487919854,96.27183000677309
Sri Lanka,499.8968434343435,411.3464487919854,88.55039464237171
Montenegro,496.54326388888927,411.3464487919854,85.19681509691748


Turkey has the largest positive difference, averaging about $146 more per sale than the worldwide average of about $411. In contrast, Russia has the largest negative difference, at $253 per sale below the worldwide average.
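Averages for countries with very few transactions can be dominated by a handful of sales; the per-country counts later in this notebook list Russia with only 8 transactions. A minimal sketch of one way to account for this is to show the transaction count next to each average and filter out small groups. The cutoff of 100 is an arbitrary assumption chosen for illustration, not a value from the original analysis.

```sql
-- Sketch: country averages with their transaction counts, small groups excluded
SELECT country,
    COUNT(*) AS sales_transaction,
    AVG(Sales) AS avg_sales_$,
    AVG(Sales) - (SELECT AVG(Sales) FROM logs) AS difference
FROM logs
GROUP BY country
-- 100 is an illustrative threshold (assumption); adjust or remove as needed
HAVING COUNT(*) >= 100
-- most negative differences first
ORDER BY difference ASC;
```

Removing the `HAVING` clause keeps every country while still exposing `sales_transaction`, so low-volume rows can be judged case by case.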

We'll now look at returns of purchased items for each country.

In [114]:
-- returns per country: dollar amounts (CTE 1) and order counts (CTE 2)
WITH returned_$ AS (
    SELECT country,
        SUM(Returned_Amount) AS returned_amount_$,
        SUM(Returned_Amount) / SUM(Sales) AS pct_returns_over_sales
    FROM logs
    GROUP BY country),

returned_orders AS (
    SELECT t1.country, 
        -- counting all rows (number of transactions)
        COUNT(*) AS sales_transaction,
        -- counting number of order returns 
        COUNT(t2.Returned) AS orders_returned,
        -- calculating percentage of orders that were returned
        1.0 * COUNT(t2.Returned) / COUNT(*) AS pct_orders_returned
    FROM logs AS t1
        LEFT JOIN (SELECT duration_seconds, ip, country, Returned
            FROM logs 
            WHERE Returned = 'Yes') AS t2
        ON t1.duration_seconds = t2.duration_seconds
        AND t1.ip = t2.ip
    GROUP BY t1.country)

-- each CTE already has one row per country, so join them on country;
-- no outer GROUP BY is needed (it caused error Msg 8120)
SELECT CTE1.country, sales_transaction, orders_returned, pct_orders_returned, returned_amount_$, 
    pct_returns_over_sales 
FROM returned_$ AS CTE1
    INNER JOIN returned_orders AS CTE2
    ON CTE1.country = CTE2.country
ORDER BY pct_returns_over_sales DESC;

In [None]:
-- Returned purchases per country
SELECT country, 
    -- number of transactions
    COUNT(*) AS sales_transaction,
    -- number of returned orders (conditional aggregation instead of a subquery)
    SUM(CASE WHEN Returned = 'Yes' THEN 1 ELSE 0 END) AS orders_returned,
    -- fraction of orders returned; 1.0 * avoids integer division
    1.0 * SUM(CASE WHEN Returned = 'Yes' THEN 1 ELSE 0 END) / COUNT(*) AS pct_orders_returned,
    SUM(Returned_Amount) AS returned_amount_$,
    SUM(Returned_Amount) / SUM(Sales) AS pct_returns_over_sales
FROM logs
GROUP BY country;

In [100]:
SELECT t1.country, 
    -- counting all rows (number of transactions)
    COUNT(*) AS sales_transaction,
    -- counting number of order returns 
    COUNT(t2.Returned) AS orders_returned,
    -- calculating percentage of orders that were returned
    1.0 * COUNT(t2.Returned) / COUNT(*) AS pct_orders_returned
FROM logs AS t1
    -- self-join on (duration_seconds, ip), assumed here to identify a row uniquely
    LEFT JOIN (SELECT duration_seconds, ip, country, Returned
        FROM logs 
        WHERE Returned = 'Yes') AS t2
    ON t1.duration_seconds = t2.duration_seconds
    AND t1.ip = t2.ip
GROUP BY t1.country
ORDER BY pct_orders_returned DESC;

country,sales_transaction,orders_returned,pct_orders_returned
Oman,216,46,0.212962962962
Nicaragua,432,92,0.212962962962
Montenegro,720,137,0.190277777777
Puerto Rico,2232,411,0.184139784946
Poland,1728,316,0.18287037037
Paraguay,1944,347,0.178497942386
Philippines,1079,191,0.177015755329
New Zealand,2088,358,0.171455938697
Qatar,1068,183,0.171348314606
Portugal,1944,332,0.170781893004


In [71]:
-- number of transactions per country
SELECT country, COUNT(*) AS sales_transaction
FROM logs 
GROUP BY country;

country,sales_transaction
Austria,2674
Greece,2232
Kiribati,792
Kuwait,1800
Luxembourg,2232
Oman,216
Panama,1800
Russia,8
Cuba,1728
Estonia,1512


In [88]:
-- preview of returned orders (the subquery used in the LEFT JOIN)
SELECT duration_seconds, ip, country, Returned
FROM logs 
WHERE Returned = 'Yes';

duration_seconds,ip,country,Returned
2090,1.124.48.99,Albania,Yes
2203,1.132.97.60,Albania,Yes
4107,1.144.97.41,Albania,Yes
2979,1.152.97.25,Albania,Yes
3301,1.187.86.94,Albania,Yes
4678,1.20.34.63,Albania,Yes
1623,1.212.121.254,Albania,Yes
4962,1.39.11.87,Albania,Yes
1661,1.39.24.136,Albania,Yes
2620,1.39.24.241,Albania,Yes
