# Homework 2 - eCommerce analytics

<font size="2">E-commerce, also known as electronic commerce or internet commerce, refers to the buying and selling of goods or services over the internet, and the transfer of money and data to execute these transactions. The first e-commerce implementations date back to the 1990s; since then, millions of people have visited e-commerce sites every day to look for a product or service and, eventually, to purchase it.

You have been hired as a data scientist by a big multi-category online store. You and your team have been asked to analyze customer behavior in the store. Each row in the dataset represents an event, which captures one of several customer interactions with the store (a view, a product added to or removed from the cart, a purchase). All events are related to products and users.

Your goal is to answer some research questions (RQs) that may help us discover and interpret meaningful patterns in data and eventually increase the number of sales.</font>

## Libraries

In [59]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

## Importing datasets

<font size="3">**NOTE!!!** 
    The _**category_code**_  and _**brand**_ columns of the dataset may have missing values.</font>
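A quick way to see how much of these two columns is missing is to count the null values before doing any grouping. The sketch below uses a tiny hand-built frame with the same column names as the dataset; the `sample` frame and the `"unknown"` placeholder are assumptions made for illustration, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same column names as the real dataset.
sample = pd.DataFrame({
    "category_code": [np.nan, "appliances.environment.water_heater", "furniture.living_room.sofa"],
    "brand": ["shiseido", "aqua", np.nan],
    "price": [35.79, 33.20, 543.10],
})

# Share of missing values in the two columns flagged in the note.
missing_share = sample[["category_code", "brand"]].isna().mean()
print(missing_share)

# One possible treatment (an assumption): label missing values explicitly
# so that later group-by operations do not silently drop those rows.
filled = sample.fillna({"category_code": "unknown", "brand": "unknown"})
print(filled)
```

Whether to fill, drop, or keep the missing values depends on the research question; filling with a placeholder simply keeps those rows visible in grouped results.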

In [60]:
# Import the first 1,200,000 rows of 2019-Oct.csv and parse the "event_time" column from string to datetime64
# (parse_dates alone is enough; the date_parser argument is deprecated since pandas 2.0)

dataset_oct = pd.read_csv('./data/2019-Oct.csv',
                          header='infer',
                          parse_dates=['event_time'],
                          nrows=1200000)

In [22]:
# Import the first 1,200,000 rows of 2019-Nov.csv and parse the "event_time" column from string to datetime64
# (parse_dates alone is enough; the date_parser argument is deprecated since pandas 2.0)

dataset_nov = pd.read_csv('./data/2019-Nov.csv',
                          header='infer',
                          parse_dates=['event_time'],
                          nrows=1200000)

In [23]:
# Concatenate the two monthly datasets into a single dataset
dataset = pd.concat([dataset_oct, dataset_nov])

## Let's visualize our dataset

In [26]:
dataset.head()

| | event_time | event_type | product_id | category_id | category_code | brand | price | user_id | user_session |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-10-01 00:00:00+00:00 | view | 44600062 | 2103807459595387724 | | shiseido | 35.79 | 541312140 | 72d76fde-8bb3-4e00-8c23-a032dfed738c |
| 1 | 2019-10-01 00:00:00+00:00 | view | 3900821 | 2053013552326770905 | appliances.environment.water_heater | aqua | 33.2 | 554748717 | 9333dfbd-b87a-4708-9857-6336556b0fcc |
| 2 | 2019-10-01 00:00:01+00:00 | view | 17200506 | 2053013559792632471 | furniture.living_room.sofa | | 543.1 | 519107250 | 566511c2-e2e3-422b-b695-cf8e6e792ca8 |
| 3 | 2019-10-01 00:00:01+00:00 | view | 1307067 | 2053013558920217191 | computers.notebook | lenovo | 251.74 | 550050854 | 7c90fc70-0e80-4590-96f3-13c02c18c713 |
| 4 | 2019-10-01 00:00:04+00:00 | view | 1004237 | 2053013555631882655 | electronics.smartphone | apple | 1081.98 | 535871217 | c6bd7419-2748-4c56-95b4-8cec9ff8b80d |


## Research questions

### [RQ1] 

A marketing funnel describes your customers' journey with your e-commerce store. It may involve several stages: it begins when someone learns about your business, continues when they visit your website for the first time, and reaches the purchasing stage. Marketing funnels map routes to conversion and beyond. Suppose your funnel involves just three simple steps: 1) view, 2) cart, 3) purchase. What is the rate of complete funnels?
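One possible reading of "rate of complete funnels" is the share of sessions containing a view that also contain a cart and a purchase event. The sketch below follows that reading on a toy frame; the `events` data and the `complete_funnel_rate` helper are hypothetical, and the session-level definition is an assumption (a per-user or per-product definition would be equally defensible).

```python
import pandas as pd

# Toy events with the same column names as the dataset (values are made up).
events = pd.DataFrame({
    "user_session": ["s1", "s1", "s1", "s2", "s2", "s3", "s4", "s4"],
    "event_type": ["view", "cart", "purchase", "view", "cart", "view", "view", "purchase"],
})

def complete_funnel_rate(df):
    """Share of viewing sessions that also contain a cart and a purchase event."""
    # One row per session, one boolean column per event type seen in that session.
    seen = pd.crosstab(df["user_session"], df["event_type"]).gt(0)
    # Make sure all three funnel steps exist as columns, even if absent from the data.
    seen = seen.reindex(columns=["view", "cart", "purchase"], fill_value=False)
    viewed = seen["view"]
    completed = seen["view"] & seen["cart"] & seen["purchase"]
    return completed.sum() / viewed.sum()

funnel_rate = complete_funnel_rate(events)
print(f"Complete funnel rate: {funnel_rate:.2%}")
```

This version only checks that all three event types occur in a session; it does not verify that they happen in the order view, cart, purchase, which would require comparing `event_time` values.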

### [RQ1.1]

Which operation do users repeat most often, on average, within a session? Produce a plot that shows the average number of times users perform each operation (view, remove from cart, etc.).

In [127]:
# Group by user_id, user_session and event_type to count the occurrences of each event type in every session.

dataset.groupby(['user_id', 'user_session', 'event_type']).event_type.count()

user_id    user_session                          event_type
244951053  91769fdf-461b-4e43-9c73-88a07481b75c  view           2
260013793  70d27bfa-e05f-4f30-b533-0ffc1587f216  view          23
274969076  705a85e5-d17d-4a33-8fbb-d8c3a6518b73  view           1
           e5b8bbcc-0184-4510-8d80-6a6120fda838  view           2
275256741  48b5b9c0-3d1b-4380-94f8-dcadb9dd7b5c  view           1
                                                               ..
566540240  cfd7e416-a0d6-4b43-8977-1e14c1a6046f  view           1
566540256  071e6ea0-43b2-4523-a659-dd5d163c43b9  view           1
566540261  c4b3a0ac-eb04-4a2c-b973-5fedfe552a42  view           1
566540269  c0f5a036-5192-4290-a6f5-c782a9667677  view           2
566540355  1580400e-8c77-4361-b978-7d093604c44c  view           1
Name: event_type, Length: 591308, dtype: int64
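The per-session counts can then be averaged by event type and plotted. The sketch below does this on a toy frame; the `events` data is made up, and averaging over all sessions (counting a session without a given event as zero) is an assumption. Averaging only over sessions in which the event occurs would give larger values.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy events with the same column names as the dataset (values are made up).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 3],
    "user_session": ["a", "a", "a", "b", "c", "c", "d"],
    "event_type": ["view", "view", "cart", "view", "view", "purchase", "view"],
})

# Number of events of each type per (user, session).
counts = events.groupby(["user_id", "user_session", "event_type"]).size()

# One row per session, one column per event type; absent events count as 0.
per_session = counts.unstack("event_type", fill_value=0)

# Average number of times each operation is performed within a session.
avg_per_session = per_session.mean().sort_values(ascending=False)
print(avg_per_session)

# Bar plot of the averages.
ax = avg_per_session.plot(kind="bar", rot=0)
ax.set_xlabel("event type")
ax.set_ylabel("average count per session")
ax.set_title("Average number of operations per session")
plt.tight_layout()
```

On the real data, `events` would be replaced by `dataset`; the rest of the steps stay the same.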