### Page Visits Funnel
Cool T-Shirts Inc. has asked you to analyze data on visits to their website. Your job is to build a funnel, which is a description of how many people continue to the next step of a multi-step process.

In this case, our funnel is going to describe the following process:

1. A user visits CoolTShirts.com
2. A user adds a t-shirt to their cart
3. A user clicks "checkout"
4. A user actually purchases a t-shirt
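
To make the idea of a funnel concrete, here is a minimal sketch on made-up data. The five users, the boolean column names, and the `funnel` table are all hypothetical and are not part of the Cool T-Shirts files; the sketch only shows how "reached this step" counts turn into "continued from the previous step" rates.

```python
import pandas as pd

# Hypothetical toy data: which of five users reached each step.
steps = pd.DataFrame({
    "user_id": ["a", "b", "c", "d", "e"],
    "visited": [True, True, True, True, True],
    "added_to_cart": [True, True, True, False, False],
    "checked_out": [True, True, False, False, False],
    "purchased": [True, False, False, False, False],
})

step_cols = ["visited", "added_to_cart", "checked_out", "purchased"]

# Number of users who reached each step (True counts as 1).
reached = steps[step_cols].sum()

# Share of users from the previous step who continued to this one.
# The first step has no previous step, so its value is NaN.
continued = reached / reached.shift(1)

funnel = pd.DataFrame({"reached": reached, "continued": continued})
print(funnel)
```

The step with the lowest `continued` value is where the most users are lost, which is the same comparison the exercise asks for later using null timestamps.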


Inspect the DataFrames using print and head:

- visits lists all of the users who have visited the website
- cart lists all of the users who have added a t-shirt to their cart
- checkout lists all of the users who have started the checkout
- purchase lists all of the users who have purchased a t-shirt

Combine visits and cart using a left merge.

How long is your merged DataFrame?

How many of the timestamps are null for the column cart_time?

What do these null rows mean?
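
As a hint for interpreting the nulls, the sketch below runs the same kind of left merge on two tiny made-up tables. The names `toy_visits`, `toy_cart`, and `toy_merged` and their contents are hypothetical; only the column names mirror the real data.

```python
import pandas as pd

# Hypothetical toy tables; the real ones are loaded from CSV files.
toy_visits = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
})
toy_cart = pd.DataFrame({
    "user_id": ["a", "c"],
    "cart_time": pd.to_datetime(["2017-01-01 10:05", "2017-01-01 12:30"]),
})

# A left merge keeps every row of toy_visits. Users with no matching
# row in toy_cart get a missing value (NaT) in cart_time.
toy_merged = pd.merge(toy_visits, toy_cart, how="left", on="user_id")
no_cart = toy_merged.cart_time.isnull()

print(toy_merged)
print(no_cart.sum(), "of", len(toy_merged), "visitors never added a t-shirt")
```

Here user `b` visited but has no row in `toy_cart`, so the merged row for `b` has a null `cart_time`. In other words, a null timestamp marks a user who stopped at the earlier step.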

What percentage of users who visited Cool T-Shirts Inc. did not place a t-shirt in their cart?

Repeat the left merge for cart and checkout and count null values. What percentage of users put items in their cart, but did not proceed to checkout?

Merge all four steps of the funnel, in order, using a series of left merges. Save the results to the variable all_data.

Examine the result using print and head.

What percentage of users proceeded to checkout, but did not purchase a t-shirt?


Which step of the funnel is weakest (i.e., has the highest percentage of users not completing it)?
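
One possible way to compare the steps side by side is sketched below. The helper `drop_off_percent` and the `toy` DataFrame are hypothetical and made up for illustration; the sketch also assumes one row per user, which may not hold for the real merged data if a user appears more than once in a table.

```python
import pandas as pd

def drop_off_percent(df, reached_col, next_col):
    """Percent of rows with a timestamp in reached_col but none in next_col."""
    reached = df[reached_col].notnull()
    dropped = reached & df[next_col].isnull()
    return 100 * dropped.sum() / reached.sum()

# Hypothetical stand-in for a merged funnel table (five users).
toy = pd.DataFrame({
    "visit_time": pd.to_datetime(["2017-01-01 10:00"] * 5),
    "cart_time": pd.to_datetime(
        ["2017-01-01 10:05", "2017-01-01 10:06", None, None, None]
    ),
    "checkout_time": pd.to_datetime(
        ["2017-01-01 10:10", "2017-01-01 10:12", None, None, None]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-01 10:15", None, None, None, None]
    ),
})

rates = pd.Series({
    "visit -> cart": drop_off_percent(toy, "visit_time", "cart_time"),
    "cart -> checkout": drop_off_percent(toy, "cart_time", "checkout_time"),
    "checkout -> purchase": drop_off_percent(toy, "checkout_time", "purchase_time"),
})

print(rates)
print("Weakest step:", rates.idxmax())
```

Each rate uses the number of users who reached the earlier step as its denominator, so the three percentages are directly comparable.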

How might Cool T-Shirts Inc. change their website to fix this problem?

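Using the giant merged DataFrame all_data that you created, let's calculate the average time from initial visit to final purchase. Start by adding a column time_to_purchase that is the difference between purchase_time and visit_time.

Examine the results.

Calculate the average time to purchase.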
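
The sketch below shows how this works on a small made-up table, and in particular what happens to users who never purchased. The `toy` DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy table: the second user never purchased.
toy = pd.DataFrame({
    "visit_time": pd.to_datetime(
        ["2017-01-01 10:00", "2017-01-01 11:00", "2017-01-01 12:00"]
    ),
    "purchase_time": pd.to_datetime(
        ["2017-01-01 10:30", None, "2017-01-01 13:00"]
    ),
})

# Subtracting two datetime columns gives a timedelta column;
# rows with a missing purchase_time become NaT.
toy["time_to_purchase"] = toy.purchase_time - toy.visit_time

# mean() skips NaT by default, so only purchasers are averaged.
avg = toy.time_to_purchase.mean()
print(toy)
print(avg)
```

Because missing values are skipped, the result is the average over users who actually purchased (here, 30 and 60 minutes), not over all visitors.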

In [4]:
import pandas as pd

visits = pd.read_csv('visits1.csv', parse_dates=[1])
cart = pd.read_csv('cart.csv', parse_dates=[1])
checkout = pd.read_csv('checkout.csv', parse_dates=[1])
purchase = pd.read_csv('purchase.csv', parse_dates=[1])

print(visits.head(4))
print(cart.head(4))
print(checkout.head(4))
print(purchase.head(4))

visits_cart = pd.merge(visits, cart, how='left')
print(len(visits_cart))
print(len(visits_cart[visits_cart.cart_time.isnull()]))

# Percentage of visitors who never added a t-shirt to their cart
num_no_cart = float(len(visits_cart[visits_cart.cart_time.isnull()]))
percent_no_cart = (num_no_cart / len(visits_cart)) * 100
print(percent_no_cart, '%')

cart_checkout = pd.merge(cart, checkout, how='left')
num_no_checkout = float(len(cart_checkout[cart_checkout.checkout_time.isnull()]))
percent_no_checkout = (num_no_checkout / len(cart_checkout)) * 100
print(percent_no_checkout, '%')

all_data = visits.merge(cart, how='left').merge(checkout, how='left').merge(purchase, how='left')

print(all_data.head(10))

# Percentage of users who started checkout but did not purchase.
# The denominator is the users who reached checkout, not all visitors.
did_checkout = all_data[all_data.checkout_time.notnull()]
num_no_purchase = float(len(did_checkout[did_checkout.purchase_time.isnull()]))
percent_no_purchase = (num_no_purchase / len(did_checkout)) * 100
print(percent_no_purchase, '%')

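# Time from first visit to purchase; NaT for users who never purchased
all_data['time_to_purchase'] = all_data.purchase_time - all_data.visit_time
print(all_data.time_to_purchase)
# mean() skips NaT, so this averages over purchasers only
print(all_data.time_to_purchase.mean())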

                                user_id          visit_time
0  943647ef-3682-4750-a2e1-918ba6f16188 2017-04-07 15:14:00
1  0c3a3dd0-fb64-4eac-bf84-ba069ce409f2 2017-01-26 14:24:00
2  6e0b2d60-4027-4d9a-babd-0e7d40859fb1 2017-08-20 08:23:00
3  6879527e-c5a6-4d14-b2da-50b85212b0ab 2017-11-04 18:15:00
                                user_id           cart_time
0  2be90e7c-9cca-44e0-bcc5-124b945ff168 2017-11-07 20:45:00
1  4397f73f-1da3-4ab3-91af-762792e25973 2017-05-27 01:35:00
2  a9db3d4b-0a0a-4398-a55a-ebb2c7adf663 2017-03-04 10:38:00
3  b594862a-36c5-47d5-b818-6e9512b939b3 2017-09-27 08:22:00
                                user_id       checkout_time
0  d33bdc47-4afa-45bc-b4e4-dbe948e34c0d 2017-06-25 09:29:00
1  4ac186f0-9954-4fea-8a27-c081e428e34e 2017-04-07 20:11:00
2  3c9c78a7-124a-4b77-8d2e-e1926e011e7d 2017-07-13 11:38:00
3  89fe330a-8966-4756-8f7c-3bdbcd47279a 2017-04-20 16:15:00
                                user_id       purchase_time
0  4b44ace4-2721-47a0-b24b-15fbfa2abf85 