# E-Commerce Logs Analysis

## Project Objective

The objective of this project is to analyze the website logs from an e-commerce store to obtain useful insight into customer behaviour.

## Data Source

The dataset was obtained from Kaggle (https://www.kaggle.com/datasets/kzmontage/e-commerce-website-logs) and is released under the Open Database License (ODbL): https://opendatacommons.org/licenses/odbl/1-0/

In [1]:
-- Specifying database
USE EcommerceLogsDb

In [2]:
-- preview of table
SELECT TOP 3 *
FROM logs;

| access_date | duration_seconds | Proto | IP | Src_IP_type | Src_Pt | Bytes | Accessed_From | country | membership | Languages | Sales | Returned | Returned_Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016-11-01 09:58:00.0000000 | 2533 | TCP | 1.10.195.126 | EXT_SERVER | 8082 | 20100 | Chrome | Albania | Normal | Albanian | 261.96 | No | 0 |
| 2016-11-01 09:59:00.0000000 | 4034 | TCP | 1.1.217.211 | OPENSTACK_NET | 56978 | 20500 | Mozilla Firefox | Albania | Normal | Albanian | 731.94 | No | 0 |
| 2016-11-01 09:59:00.0000000 | 1525 | TCP | 1.115.198.107 | EXT_SERVER | 8082 | 90100 | Mozilla Firefox | Albania | Normal | Albanian | 14.62 | No | 0 |


In [3]:
-- Which country had the most sales (transactions) and sales ($ amount)?
-- showing country rankings for both metrics using window functions
SELECT country, 
    COUNT(*) AS sales_transaction, 
    DENSE_RANK() OVER(ORDER BY COUNT(*) DESC) AS sales_transaction_rank, 
    SUM(Sales) AS sales_$, 
    DENSE_RANK() OVER(ORDER BY SUM(Sales) DESC) AS sales_$_rank
FROM logs
GROUP BY country
ORDER BY sales_$ DESC;

| country | sales_transaction | sales_transaction_rank | sales_$ | sales_$_rank |
|---|---|---|---|---|
| Austria | 2674 | 1 | 1296463.3414999906 | 1 |
| Belgium | 2604 | 2 | 1154321.3705999977 | 2 |
| Argentina | 2603 | 3 | 1140238.692799999 | 3 |
| Australia | 2520 | 4 | 1139576.1649999968 | 4 |
| United States | 2231 | 12 | 1132496.38 | 5 |
| Brazil | 2604 | 2 | 1129919.2000999977 | 6 |
| Norway | 2160 | 13 | 1129793.889999999 | 7 |
| Netherlands | 2292 | 8 | 1106623.810000002 | 8 |
| Spain | 2232 | 11 | 1102578.460000004 | 9 |
| Puerto Rico | 2232 | 11 | 1089656.3800000006 | 10 |


From the table above, Austria, Belgium, Argentina and Australia are the top 4 countries by sales\_$, and they also hold the top four sales\_transaction ranks, although Brazil ties Belgium for 2nd place with 2,604 transactions. The number of sales transactions appears to track sales ($) closely, with some exceptions: the United States ranks 12th in number of transactions but 5th in sales ($), and Norway ranks 13th in transactions but 7th in sales ($).
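In the table above, Belgium and Brazil both have 2,604 transactions and share rank 2, and Argentina is ranked 3rd rather than 4th. That is how `DENSE_RANK()` treats ties: tied rows get the same rank and no rank is skipped afterwards, whereas `RANK()` skips the next rank. The sketch below illustrates the difference with Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions) on the ten rows visible in the output. Because it ranks only those ten rows, its ranks for lower-placed countries such as the United States differ from the full-table ranks shown above.

```python
import sqlite3

# Transaction counts for the ten countries visible in the output above.
# Ranking only these rows is an illustration; it does not reproduce the
# full-table ranks (e.g. 12th for the United States).
ROWS = [
    ("Austria", 2674), ("Belgium", 2604), ("Argentina", 2603),
    ("Australia", 2520), ("United States", 2231), ("Brazil", 2604),
    ("Norway", 2160), ("Netherlands", 2292), ("Spain", 2232),
    ("Puerto Rico", 2232),
]


def rank_transactions(rows):
    """Return {country: (dense_rank, rank)} ordered by transaction count."""
    con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    con.execute("CREATE TABLE counts (country TEXT, sales_transaction INTEGER)")
    con.executemany("INSERT INTO counts VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT country,
               -- ties share a rank and the next rank is NOT skipped
               DENSE_RANK() OVER (ORDER BY sales_transaction DESC) AS dense_rnk,
               -- ties share a rank and the next rank IS skipped
               RANK()       OVER (ORDER BY sales_transaction DESC) AS rnk
        FROM counts
        """
    ).fetchall()
    con.close()
    return {country: (dense, rnk) for country, dense, rnk in result}


if __name__ == "__main__":
    for country, (dense, rnk) in rank_transactions(ROWS).items():
        print(f"{country:15} DENSE_RANK={dense:2} RANK={rnk:2}")
```

Among these ten rows, Argentina gets `DENSE_RANK` 3 but `RANK` 4, because the Belgium/Brazil tie occupies two positions.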

Let's look at the average sale amount per country and compare it to the worldwide average. We will also compute the difference between each country's average and the worldwide average.

In [4]:
-- average sales ($) per country
SELECT  
    country, AVG(Sales) AS avg_sales_$,
    -- subquery in SELECT to determine worldwide avg sales ($)
    (SELECT AVG(Sales) FROM logs) AS world_avg_sales_$,
    -- difference between country avg and worldwide avg with another SELECT subquery
    (AVG(Sales) - (SELECT AVG(Sales) FROM logs)) AS difference
FROM logs
GROUP BY country
ORDER BY difference DESC;

| country | avg_sales_$ | world_avg_sales_$ | difference |
|---|---|---|---|
| Turkey | 557.5280158730159 | 411.3464487919592 | 146.18156708104272 |
| Norway | 523.0527268518517 | 411.3464487919592 | 111.70627805987856 |
| Qatar | 519.2086891385768 | 411.3464487919592 | 107.86224034660364 |
| Nicaragua | 516.7759722222225 | 411.3464487919592 | 105.4295234302494 |
| Slovenia | 516.1550165453345 | 411.3464487919592 | 104.8085677533614 |
| Suriname | 514.7100248015873 | 411.3464487919592 | 103.36357600961418 |
| Russian Federation | 508.7487397119337 | 411.3464487919592 | 97.40229091996054 |
| United States | 507.6182787987448 | 411.3464487919592 | 96.27183000677168 |
| Sri Lanka | 499.8968434343435 | 411.3464487919592 | 88.55039464237035 |
| Montenegro | 496.5432638888892 | 411.3464487919592 | 85.196815096916 |


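Turkey has the largest positive difference, with an average sale about $146 above the worldwide average. At the other end of the ranking (not shown in the truncated output above), Russia has the largest negative difference, at about $253/sale below the world average. Since "Russian Federation" appears in the output above with a positive difference of about $97/sale, it is worth checking whether the country column contains two labels for the same country.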
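The worldwide average in the query above is the mean over all rows, so it weights each country by its number of sales. It is not the mean of the country averages. The sketch below makes the distinction concrete using Python's built-in sqlite3 module and a few made-up rows for hypothetical countries "A" and "B" (not data from this dataset). It also shows an alternative to repeating the scalar subquery: compute the world average once in a CTE and cross-join it. CTEs and `CROSS JOIN` are standard SQL, so the same pattern should carry over to SQL Server, but this is a sketch rather than the query used above.

```python
import sqlite3

# Hypothetical toy rows (country, Sales); NOT taken from the real dataset.
TOY_ROWS = [("A", 100.0), ("A", 200.0), ("B", 600.0)]

QUERY = """
WITH world AS (
    -- mean over every row, i.e. weighted by each country's number of sales
    SELECT AVG(Sales) AS world_avg FROM logs
)
SELECT logs.country,
       AVG(logs.Sales) AS avg_sales,
       world.world_avg,
       AVG(logs.Sales) - world.world_avg AS difference
FROM logs
CROSS JOIN world
GROUP BY logs.country, world.world_avg
ORDER BY difference DESC
"""


def country_vs_world(rows):
    """Return a list of (country, avg_sales, world_avg, difference) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (country TEXT, Sales REAL)")
    con.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in country_vs_world(TOY_ROWS):
        print(row)
```

With these rows the world average is 900 / 3 = 300, whereas the mean of the two country averages (150 and 600) would be 375.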

We'll now look at returns of purchased items for each country.

In [5]:
-- CTE for value of sales returned
WITH returned_$ AS (
    SELECT country,
        -- Sum of total dollars returned
        SUM(Returned_Amount) AS returned_amount_$,
        -- Percentage of dollars returned over dollars sold
        -- Sales column doesn't include amount for returned items
        SUM(Returned_Amount) / (SUM(Sales) + SUM(Returned_Amount)) AS pct_$_returned
    FROM logs
    GROUP BY country),

-- CTE for number of sales returned
returned_orders AS (
    SELECT t1.country, 
        -- counting all rows (number of transactions)
        COUNT(*) AS sales_transaction,
        -- counting number of order returns 
        COUNT(t2.Returned) AS orders_returned,
        -- calculating percentage of orders that were returned
        1.0 * COUNT(t2.Returned) / COUNT(*) AS pct_orders_returned
    FROM logs as t1
    LEFT JOIN (SELECT duration_seconds, ip, country, Returned
    FROM logs 
    WHERE Returned = 'Yes') AS t2
    ON t1.duration_seconds = t2.duration_seconds
    AND t1.ip = t2.ip
    GROUP BY t1.country)

-- final query with percent of orders returned and percent of sales refunded
SELECT TOP 5 CTE1.country, pct_orders_returned, pct_$_returned 
FROM returned_$ AS CTE1
    INNER JOIN returned_orders AS CTE2
    ON CTE1.country = CTE2.country 
ORDER BY pct_orders_returned DESC, pct_$_returned DESC;



| country | pct_orders_returned | pct_$_returned |
|---|---|---|
| Nicaragua | 0.212962962962 | 0.2117912818013144 |
| Oman | 0.212962962962 | 0.1519525476136124 |
| Montenegro | 0.190277777777 | 0.2197341784693778 |
| Puerto Rico | 0.184139784946 | 0.1560551240049551 |
| Poland | 0.18287037037 | 0.1416946334543682 |


Nicaragua and Oman have the highest percentage of orders returned, at 21.3%, while Montenegro has the highest share of dollars returned among these five countries (22.0%). A stricter return policy could be one way to reduce the number of returned orders and increase revenue, but the effect that a stricter policy would have on customer experience should also be taken into account.
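The `returned_orders` CTE finds returned orders through a self-join on `duration_seconds` and `IP`. That gives correct counts only if each (`duration_seconds`, `IP`) pair identifies a single row; if two rows share a pair, the `LEFT JOIN` can mark a non-returned row as returned or duplicate rows and inflate both counts. A simpler alternative is conditional aggregation, which counts returns in a single pass. The sketch below shows it with Python's built-in sqlite3 module on a few made-up rows for hypothetical countries "A" and "B" (not data from this dataset). The `CASE` expression is standard SQL and should also work in SQL Server. Here `pct_$_returned` is renamed `pct_dollars_returned`.

```python
import sqlite3

# Hypothetical toy rows: (country, Sales, Returned, Returned_Amount).
# NOT taken from the real dataset.
TOY_ROWS = [
    ("A", 300.0, "No", 0.0),
    ("A", 0.0, "Yes", 100.0),
    ("A", 100.0, "No", 0.0),
    ("A", 100.0, "No", 0.0),
    ("B", 200.0, "No", 0.0),
    ("B", 200.0, "No", 0.0),
]

QUERY = """
SELECT country,
       COUNT(*) AS sales_transaction,
       -- a row counts as returned only when its own Returned flag is 'Yes'
       SUM(CASE WHEN Returned = 'Yes' THEN 1 ELSE 0 END) AS orders_returned,
       1.0 * SUM(CASE WHEN Returned = 'Yes' THEN 1 ELSE 0 END) / COUNT(*)
           AS pct_orders_returned,
       -- follows the original query's assumption that Sales excludes returns
       SUM(Returned_Amount) / (SUM(Sales) + SUM(Returned_Amount))
           AS pct_dollars_returned
FROM logs
GROUP BY country
ORDER BY pct_orders_returned DESC, pct_dollars_returned DESC
"""


def return_rates(rows):
    """Return (country, transactions, returns, pct_orders, pct_dollars) rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE logs (country TEXT, Sales REAL, Returned TEXT, "
        "Returned_Amount REAL)"
    )
    con.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in return_rates(TOY_ROWS):
        print(row)
```

For country "A", one of four orders is returned (25%) and $100 of $600 is refunded.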

We will next look at the percentage of orders returned for each language. This may reveal language or communication issues that could be causing a higher rate of returns.

In [6]:
SELECT TOP 5 t1.Languages, 
        -- calculating percentage of orders that were returned
        ROUND((1.0 * COUNT(t2.Returned) / COUNT(*)), 2) AS pct_orders_returned
FROM logs as t1
LEFT JOIN (SELECT duration_seconds, ip, country, Returned
FROM logs 
WHERE Returned = 'Yes') AS t2
ON t1.duration_seconds = t2.duration_seconds
AND t1.ip = t2.ip
GROUP BY t1.Languages
ORDER BY pct_orders_returned DESC;

| Languages | pct_orders_returned |
|---|---|
| marathi | 0.22 |
| romanian | 0.18 |
| kinyarwanda | 0.18 |
| Slovak | 0.17 |
| polish | 0.17 |


Marathi, a language spoken in India, has the highest percentage of orders returned, about 4 percentage points above the next-highest languages (Romanian and Kinyarwanda, both at 18%). A statistical test should be performed to determine whether this difference is statistically significant. If it is, one possible explanation is a translation issue with the Marathi version of the site: customers could be misinformed about products, which would lead to more returns. The impact of language on return rates should be investigated further across the website as a whole to identify whether this is a recurring issue.
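One candidate test for comparing two return rates is a pooled two-proportion z-test. The sketch below implements it with only Python's standard library. The function name is illustrative, and the example counts are hypothetical, because the notebook reports only rounded percentages and not the underlying order counts for each language. The test assumes independent orders and enough returns and non-returns in each group for the normal approximation to hold.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided).

    x1, x2: number of returned orders in each group
    n1, n2: total number of orders in each group
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common return rate under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 22 of 100 orders returned vs 18 of 100.
    z, p = two_proportion_z_test(22, 100, 18, 100)
    print(f"z = {z:.3f}, p = {p:.3f}")
```

With these hypothetical counts (100 orders per group), a 4-point gap gives p ≈ 0.48, far from significant at the 5% level, so the real per-language order counts matter a great deal.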

Next, we investigate the membership types associated with online purchases in each country. Note that each count of a membership type represents an online purchase, not a unique individual, because the dataset contains no unique customer identifiers. Premium members may well purchase more often than normal members or visitors who are not logged in, so this analysis compares purchases per membership type, not individuals per membership type.

In [28]:
-- first CTE: total website visits (log entries) per country
WITH totals AS (
    SELECT country, COUNT(*) AS total_visits
    FROM logs
    GROUP BY country),

-- second CTE for website visits per membership type per country
memberships AS (
    SELECT country, membership, COUNT(1) AS website_visits
    FROM logs
    GROUP BY country, membership)

-- join the CTEs and calculate each membership type's share of total visits per country
SELECT memberships.country, membership, website_visits, 1.0 * website_visits/total_visits AS pct_visits
FROM memberships
    LEFT JOIN totals
    ON memberships.country = totals.country
ORDER BY country;

| country | membership | website_visits | pct_visits |
|---|---|---|---|
| Albania | Normal | 646 | 0.349567099567 |
| Albania | Premium | 1185 | 0.641233766233 |
| Albania | Not Logged In | 17 | 0.009199134199 |
| Antigua and Barbuda | Premium | 1527 | 0.67328042328 |
| Antigua and Barbuda | Not Logged In | 20 | 0.008818342151 |
| Antigua and Barbuda | Normal | 721 | 0.317901234567 |
| Argentina | Not Logged In | 24 | 0.009220130618 |
| Argentina | Premium | 1612 | 0.619285439877 |
| Argentina | Normal | 967 | 0.371494429504 |
| Armenia | Normal | 678 | 0.325023969319 |
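The `totals` CTE and its join can also be replaced by a window function that divides each group's count by the sum of counts for the same country. The sketch below illustrates that pattern with Python's built-in sqlite3 module (SQLite 3.25 or later is assumed for window functions), using the Albania and Antigua and Barbuda counts from the output above, expanded back into one row per log entry. `SUM(...) OVER (PARTITION BY ...)` is also supported by SQL Server, but this is an alternative formulation, not the query that produced the table above.

```python
import sqlite3

# Per-membership counts copied from the output above; expanded back into
# one row per log entry for this illustration.
COUNTS = {
    ("Albania", "Normal"): 646,
    ("Albania", "Premium"): 1185,
    ("Albania", "Not Logged In"): 17,
    ("Antigua and Barbuda", "Premium"): 1527,
    ("Antigua and Barbuda", "Not Logged In"): 20,
    ("Antigua and Barbuda", "Normal"): 721,
}

QUERY = """
SELECT country,
       membership,
       website_visits,
       -- divide by the total for the same country via a window function
       1.0 * website_visits / SUM(website_visits) OVER (PARTITION BY country)
           AS pct_visits
FROM (
    SELECT country, membership, COUNT(*) AS website_visits
    FROM logs
    GROUP BY country, membership
) AS memberships
ORDER BY country, membership
"""


def membership_shares(counts):
    """Return {(country, membership): pct_visits}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (country TEXT, membership TEXT)")
    rows = [key for key, n in counts.items() for _ in range(n)]
    con.executemany("INSERT INTO logs VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return {(c, m): pct for c, m, _, pct in result}


if __name__ == "__main__":
    for (country, membership), pct in membership_shares(COUNTS).items():
        print(f"{country:20} {membership:14} {pct:.4f}")
```

The shares it prints match the `pct_visits` column above (for example, 1185 / 1848 ≈ 0.6412 for Albania's Premium purchases).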


In [32]:
SELECT country, Accessed_From, COUNT(1) AS website_visits
FROM logs
GROUP BY country, Accessed_From
ORDER BY country;

| country | Accessed_From | website_visits |
|---|---|---|
| Albania | Safari | 180 |
| Albania | Android App | 392 |
| Albania | Others | 281 |
| Albania | Mozilla Firefox | 292 |
| Albania | Microsoft Edge | 171 |
| Albania | IOS App | 233 |
| Albania | Chrome | 299 |
| Antigua and Barbuda | IOS App | 289 |
| Antigua and Barbuda | Safari | 210 |
| Antigua and Barbuda | Chrome | 401 |
