##  E-Commerce Funnel Analysis - Exploratory Data Analysis (EDA) and ETL


####  Summary: In this analysis, we aim to understand the customer journey on an e-commerce website. Using four datasets representing different stages of the funnel (visits, cart additions, checkouts, and purchases), we perform Exploratory Data Analysis (EDA) to identify bottlenecks and conversion rates. We'll also compute metrics like the percentage of users dropping off at each stage and the average time to purchase. The insights derived can help optimize the user experience and increase conversions.


###### As usual, we import our required libraries

In [1]:
import pandas as pd

###### First, we load the datasets, which come from different CSV files, and store each one in its own variable

In [2]:
visits = pd.read_csv('visits.csv', parse_dates=['visit_time'])
cart = pd.read_csv('cart.csv', parse_dates=['cart_time'])
checkout = pd.read_csv('checkout.csv', parse_dates=['checkout_time'])
purchase = pd.read_csv('purchase.csv', parse_dates=['purchase_time'])

### 1. Preview of the Data

###### Just a quick overview of each dataset. Later we can examine each one in more detail

In [3]:
print("Visits Data:")
display(visits.head())
print("\nCart Data:")
display(cart.head())
print("\nCheckout Data:")
display(checkout.head())
print("\nPurchase Data:")
display(purchase.head())

Visits Data:


Unnamed: 0,user_id,visit_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00



Cart Data:


Unnamed: 0,user_id,cart_time
0,2be90e7c-9cca-44e0-bcc5-124b945ff168,2017-11-07 20:45:00
1,4397f73f-1da3-4ab3-91af-762792e25973,2017-05-27 01:35:00
2,a9db3d4b-0a0a-4398-a55a-ebb2c7adf663,2017-03-04 10:38:00
3,b594862a-36c5-47d5-b818-6e9512b939b3,2017-09-27 08:22:00
4,a68a16e2-94f0-4ce8-8ce3-784af0bbb974,2017-07-26 15:48:00



Checkout Data:


Unnamed: 0,user_id,checkout_time
0,d33bdc47-4afa-45bc-b4e4-dbe948e34c0d,2017-06-25 09:29:00
1,4ac186f0-9954-4fea-8a27-c081e428e34e,2017-04-07 20:11:00
2,3c9c78a7-124a-4b77-8d2e-e1926e011e7d,2017-07-13 11:38:00
3,89fe330a-8966-4756-8f7c-3bdbcd47279a,2017-04-20 16:15:00
4,3ccdaf69-2d30-40de-b083-51372881aedd,2017-01-08 20:52:00



Purchase Data:


Unnamed: 0,user_id,purchase_time
0,4b44ace4-2721-47a0-b24b-15fbfa2abf85,2017-05-11 04:25:00
1,02e684ae-a448-408f-a9ff-dcb4a5c99aac,2017-09-05 08:45:00
2,4b4bc391-749e-4b90-ab8f-4f6e3c84d6dc,2017-11-20 20:49:00
3,a5dbb25f-3c36-4103-9030-9f7c6241cd8d,2017-01-22 15:18:00
4,46a3186d-7f5a-4ab9-87af-84d05bfd4867,2017-06-11 11:32:00


### 2. Merge visits and cart data


In [4]:
visits_cart = pd.merge(visits, cart, how = "left", on= "user_id")
print("\nMerged Visits and Cart Data:")
print(visits_cart.head())


Merged Visits and Cart Data:
                                user_id          visit_time  \
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
4  a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00   

            cart_time  
0                 NaT  
1 2017-01-26 14:44:00  
2 2017-08-20 08:31:00  
3                 NaT  
4                 NaT  
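The left join keeps every row from `visits`, and a `NaT` in `cart_time` marks a visitor with no matching cart record. As a minimal sketch of the same idea, the example below uses small toy frames (hypothetical data, not the CSV files above) and pandas' `indicator=True` option, which adds a `_merge` column stating where each row came from.

```python
import pandas as pd

# Toy stand-ins for the visits and cart tables (hypothetical data)
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(["2017-01-01 10:00", "2017-01-02 11:00", "2017-01-03 12:00"]),
})
toy_cart = pd.DataFrame({
    "user_id": ["b"],
    "cart_time": pd.to_datetime(["2017-01-02 11:20"]),
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
toy_merged = toy_visits.merge(toy_cart, how="left", on="user_id", indicator=True)

# Visitors without a cart record are the "left_only" rows
toy_no_cart = int((toy_merged["_merge"] == "left_only").sum())
print(toy_merged)
print(f"Visitors without a cart record: {toy_no_cart}")
```

Counting `left_only` rows and counting `NaT` values in `cart_time` give the same number here, because the toy cart table has no missing timestamps of its own.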


In [5]:
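print("\nBasic info of our new merged table:")
# `info` is a method: without parentheses, print() shows the bound method's repr
# (the frame itself, as in the output below). Call visits_cart.info() to get the
# column, dtype and non-null summary instead.
print(visits_cart.info)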


Basic info of our new merged table:
<bound method DataFrame.info of                                    user_id          visit_time  \
0     943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00   
1     0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00   
2     6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00   
3     6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00   
4     a84327ff-5daa-4ba1-b789-d5b4caf81e96 2017-02-27 11:25:00   
...                                    ...                 ...   
1995  33913ac2-03da-45ae-8fc3-fea39df827c6 2017-03-25 03:29:00   
1996  4f850132-b99d-4623-80e6-6e61d003577e 2017-01-08 09:57:00   
1997  f0830b9b-1f5c-4e74-b63d-3f847cc6ce70 2017-09-07 12:56:00   
1998  b01bffa7-63ba-4cd3-9d93-eb1477c23831 2017-07-20 04:37:00   
1999  0336ca81-8d68-443f-9248-ac0b8ad147d5 2017-11-15 10:11:00   

               cart_time  
0                    NaT  
1    2017-01-26 14:44:00  
2    2017-08-20 08:31:00  
3                    NaT  
4  

#### To see how many rows the DataFrame has, we can use Python's built-in len function

In [6]:
length_merged_df = len(visits_cart)
print(f"The total number of rows in the DataFrame is {length_merged_df}")

The total number of rows in the DataFrame is 2000


#### How many of the timestamps are null for the column cart_time?

In [7]:
null_cart_times = visits_cart['cart_time'].isnull().sum()
print(f"The number of null timestamps in the cart_time column is {null_cart_times}")


The number of null timestamps in the cart_time column is 1652


### 3. Calculate the drop-off rate from visits to cart


In [8]:
not_in_cart = visits_cart['cart_time'].isnull().sum()
total_visitors = len(visits_cart)
percentage_not_in_cart = (not_in_cart / total_visitors) * 100
print(f"\nPercentage of visitors who did not add items to the cart: {percentage_not_in_cart:.2f}%")


Percentage of visitors who did not add items to the cart: 82.60%
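The calculation above can be wrapped in a small reusable helper. This is a minimal sketch, not the notebook's own method: the function name `drop_off_rate` and the toy frames are hypothetical, and it counts distinct `user_id` values rather than merged rows, which matches the row-based figure only when each user appears once per table.

```python
import pandas as pd

def drop_off_rate(stage_a: pd.DataFrame, stage_b: pd.DataFrame, key: str = "user_id") -> float:
    """Percentage of distinct users in stage_a who never appear in stage_b."""
    users_a = set(stage_a[key].dropna())
    users_b = set(stage_b[key].dropna())
    if not users_a:
        return float("nan")  # no users in the first stage, rate is undefined
    return 100 * len(users_a - users_b) / len(users_a)

# Toy data (hypothetical): four visitors, two of whom added to cart
toy_stage_visits = pd.DataFrame({"user_id": ["a", "b", "c", "d"]})
toy_stage_cart = pd.DataFrame({"user_id": ["b", "d"]})

toy_rate = drop_off_rate(toy_stage_visits, toy_stage_cart)
print(f"Drop-off from visit to cart: {toy_rate:.2f}%")
```

With the real tables, the call would look like `drop_off_rate(visits, cart)`.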


### 4. Merge Cart and Checkout dataframes


In [9]:
cart_checkout = pd.merge(cart, checkout, how='left', on='user_id')
total_in_cart = len(cart_checkout)
not_checked_out = cart_checkout['checkout_time'].isnull().sum()
percentage_not_checked_out = (not_checked_out / total_in_cart) * 100
print(f"\nPercentage of users who added to cart but did not proceed to checkout: {percentage_not_checked_out:.2f}%")


Percentage of users who added to cart but did not proceed to checkout: 35.06%
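One caveat worth checking: `len(cart_checkout)` counts rows, not users. If a `user_id` appears more than once in `checkout`, the left merge repeats the matching cart row, which shifts the percentage. The sketch below uses toy data (hypothetical, chosen to contain a duplicate) to show the effect and one way to guard against it with `drop_duplicates` and the `validate` argument of `merge`. Whether the real files contain duplicates is not established here.

```python
import pandas as pd

# Toy data (hypothetical): user "a" has two checkout records
toy_cart_users = pd.DataFrame({"user_id": ["a", "b", "c"]})
toy_checkout_users = pd.DataFrame({
    "user_id": ["a", "a", "b"],
    "checkout_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 10:05", "2017-01-02 09:00"]
    ),
})

# How many checkout rows repeat an earlier user_id
n_dupes = int(toy_checkout_users["user_id"].duplicated().sum())

# Row-based rate: the duplicate inflates the merged frame to 4 rows
merged_rows = toy_cart_users.merge(toy_checkout_users, how="left", on="user_id")
row_based = merged_rows["checkout_time"].isnull().sum() / len(merged_rows) * 100

# User-based rate: keep one checkout row per user, then merge.
# validate="one_to_one" raises MergeError if keys are not unique on both sides.
dedup_checkout = toy_checkout_users.drop_duplicates(subset="user_id")
merged_users = toy_cart_users.merge(
    dedup_checkout, how="left", on="user_id", validate="one_to_one"
)
user_based = merged_users["checkout_time"].isnull().sum() / len(merged_users) * 100

print(f"Duplicate checkout rows: {n_dupes}")
print(f"Row-based drop-off:  {row_based:.2f}%")
print(f"User-based drop-off: {user_based:.2f}%")
```

In this toy case, one of three cart users never checks out, but the row-based figure reports one of four.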


### 5. Merge all stages of the funnel

In [10]:
all_data = visits.merge(cart, how='left', on='user_id')\
                 .merge(checkout, how='left', on='user_id')\
                 .merge(purchase, how='left', on='user_id')
print("\nFull Funnel Data:")
display(all_data.head())


Full Funnel Data:


Unnamed: 0,user_id,visit_time,cart_time,checkout_time,purchase_time
0,943647ef-3682-4750-a2e1-918ba6f16188,2017-04-07 15:14:00,NaT,NaT,NaT
1,0c3a3dd0-fb64-4eac-bf84-ba069ce409f2,2017-01-26 14:24:00,2017-01-26 14:44:00,2017-01-26 14:54:00,2017-01-26 15:08:00
2,6e0b2d60-4027-4d9a-babd-0e7d40859fb1,2017-08-20 08:23:00,2017-08-20 08:31:00,NaT,NaT
3,6879527e-c5a6-4d14-b2da-50b85212b0ab,2017-11-04 18:15:00,NaT,NaT,NaT
4,a84327ff-5daa-4ba1-b789-d5b4caf81e96,2017-02-27 11:25:00,NaT,NaT,NaT


### 6. Calculate drop-off rate from checkout to purchase


In [11]:
total_checked_out = all_data['checkout_time'].notnull().sum()
not_purchased = all_data['purchase_time'].isnull() & all_data['checkout_time'].notnull()
percentage_not_purchased = (not_purchased.sum() / total_checked_out) * 100
print(f"\nPercentage of users who proceeded to checkout but did not purchase: {percentage_not_purchased:.2f}%")


Percentage of users who proceeded to checkout but did not purchase: 24.55%
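The stage-by-stage rates can also be collected in a single summary table. This is a minimal sketch on a toy frame shaped like `all_data` (hypothetical values); it counts non-null timestamps per stage and assumes one row per user.

```python
import pandas as pd

t = pd.Timestamp("2017-01-01 10:00")

# Toy frame shaped like all_data (hypothetical): one row per user
toy_funnel = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "visit_time": [t, t, t, t],
    "cart_time": [t, t, t, pd.NaT],
    "checkout_time": [t, t, pd.NaT, pd.NaT],
    "purchase_time": [t, pd.NaT, pd.NaT, pd.NaT],
})

stages = ["visit_time", "cart_time", "checkout_time", "purchase_time"]

# Number of rows that reached each stage (non-null timestamp)
stage_counts = toy_funnel[stages].notnull().sum()

funnel_summary = pd.DataFrame({"users": stage_counts})
# Drop-off relative to the previous stage; undefined (NaN) for the first stage
funnel_summary["drop_off_pct"] = (1 - stage_counts / stage_counts.shift(1)) * 100

print(funnel_summary.round(2))
```

Applied to the real data, the same code would take `all_data` in place of `toy_funnel`.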


### 7. Compute average time to purchase


##### Finally, we calculate the average time it takes for buyers to complete the purchase process on our shopping website.

In [12]:
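# Rows without a purchase have NaT here and are skipped by mean(),
# so the average covers purchasers only
all_data['time_to_purchase'] = all_data['purchase_time'] - all_data['visit_time']
mean_time_to_purchase = all_data['time_to_purchase'].mean()
# The mean is a Timedelta (days hh:mm:ss), so no unit suffix is needed
print(f"\nAverage time to purchase: {mean_time_to_purchase}")


Average time to purchase: 0 days 00:43:12.380952380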
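If a plain number of minutes is easier to report, the time differences can be converted with `.dt.total_seconds()`. The sketch below uses toy timestamps (hypothetical, apart from the first row, which mirrors one visit and purchase pair from the preview above) and also shows the median, which is less sensitive to a few very slow purchases.

```python
import pandas as pd

# Toy data: three purchasers and one visitor who never purchased
toy_times = pd.DataFrame({
    "visit_time": pd.to_datetime(
        ["2017-01-26 14:24", "2017-03-01 09:00", "2017-05-10 18:00", "2017-04-07 15:14"]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-26 15:08", "2017-03-01 09:30", "2017-05-10 18:10", None]
    ),
})

# Timedelta per row; NaT where there was no purchase
toy_delta = toy_times["purchase_time"] - toy_times["visit_time"]

# Convert to float minutes; NaT becomes NaN and is skipped by mean() and median()
toy_minutes = toy_delta.dt.total_seconds() / 60

print(f"Mean time to purchase:   {toy_minutes.mean():.1f} minutes")
print(f"Median time to purchase: {toy_minutes.median():.1f} minutes")
```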


### 8. Some conclusions

- 1. About 82.60% of visitors leave without adding an item to their cart, the largest drop-off in the funnel. This may indicate an issue with product visibility or user experience on the site.
- 2. About 35.06% of users who add items to the cart do not proceed to checkout, suggesting a need for improvements in cart-to-checkout conversion mechanisms.
- 3. About 24.55% of users who reach checkout do not complete the purchase, which may point to payment or trust issues.
- 4. Buyers take about 43 minutes on average from first visit to purchase. This window can help refine marketing strategies, for example by timing reminders or offers within it.