![Masterclass](logo.png)
# Project

The `.csv` files in this repo contain a subset of data pertaining to
users visiting the Gordon Ramsay MasterClass course marketing page for a
certain period of time. This data is captured by Segment's analytics.js
library and passed to Redshift, Amplitude, and other platforms. Please
find more context on the data provided below.

# Context 
This is data pulled from 11/1/2017 to 11/7/2017 of various
activity by individuals who had visited the Gordon Ramsay course
marketing page within the same period of time.

# Relevant Pages

-   [Homepage](https://www.masterclass.com/) 

-   [Gordon Ramsay course marketing page](https://www.masterclass.com/classes/gordon-ramsay-teaches-cooking)
    (also known as a marketing landing page)

# Files

-   `pages.csv` - major pageviews (homepage and course marketing page). 

-   `homepage_click.csv` - any click on the homepage 

-   `course_marketing_click.csv` - any click on the course marketing page
    (except purchase click) 

-   `purchase_click.csv` - any "take the class/give as a gift" purchase on
    the course marketing page 

-   `purchased_class.csv` - when a user purchases a class or an annual-pass.
    When a user purchases multiple items, there will be one row per item
    purchase.

These tables can give you a story of where the user went after they
landed and viewed one of our pages. Pages will give you an idea of where
they viewed, and then `hompage_click` and `course_marketing_click` are
clicks on those marketing pages. Then, once they begin the checkout
process with purchase_click, they finalize the process with
`purchased_class`.

# Fields

-   `anonymous_id` : an identifier given to unique device session 

-   `received_at`: when the event or page view occurred 

-   `location`: place on the page where the event occurred 

-   `action`: descriptor given to event 

-   `channel_grouping`: marketing bucket given to source of traffic 
    -    `"paid"`: acquired via paid traffic 
    -   `organic-social-pr`: free traffic via referrals, social networks, PR
    stories, etc. 
    -   `null`: equivalent to organics 

-   `traffic_source`: origin of how the user came to the website ●
- `ad_type`: type of ad (e.g. video) 

-   `acquisition_type`: type of user that the marketing ad was intended
    towards 

-   `prospecting`: advertising to users who hadn't visited the website in
    at least 14 days 
-   `remarketing`: advertising to users who had visited the website in 14
    days  
- `lifecycle`: advertising to users who have made a purchase
    and/or enrolled 

We are looking for this Data Analyst to be both reactive and proactive.
In this case, we want you to look at the data and pull insights about
the user behavior. When finished, please compile your response in
a Jupyter Notebook.


# Summary Of Findings

Only 2% of sessions' first click on Gordon's class ended up with their last click being Gordons.
Overall, strong organics, paid acquistion can be explored further.

Although organic has brought in more total revenue and users, paid still brings in higher value per user. This also shows organics are very strong for MasterClass

More total users were purchasing for them selves than for others. 
User's on average spent more when they were purchasing for themselves.

Strongest ad type was video in terms of user value and volume, sitelinks were strong but less users converted. 
No ad type proves again organics are strong.

Traffic type again shows strong organics. 
Google search also seems to drive good revenue and the next highest user acquistion.
Facebook(Instagram included) typically drives high value users so pretty standard to see there.

Remarketing or reengagement seems to drive high value and those coming back are expected to overall spend more. Organics seem strong here too.

You can see all my work and thought proccess below, feel free to ask any additional questions. 
The data set could have a lot more analysis done upon it.
If you would like me to continue this project as a pt.2, things I can/would add are:
- Lifetime value analylsis (LTV) Using the 'lifetimes' or scikit learn (simple linear regression) on idividual users
- user_agent analysis e.g. device type/browser type, video and location
- Max/Min and average session length see how long users sessions typically last

Thought Process
- When first receiving the data set I  wanted to see what each .csv contained. I chose the python library pandas because it makes data manipulation easy with DataFrames. Once I opened each .csv things I looked for are columns, datatypes, null values to fill etc. I renamed any columns that shared the same column name.

- All the csv's have a different amount of rows and columns. Due to this fact I wanted to use a column that all the .csv's shared which was anonymous_id. It was described to be a unique identifier for each session as well so I decided to join the other .csv's on that column. When I join the .csv's I was able to see the total amount of clicks/events each user has made in a session. I am making the assumption that every user goes through major pageviews (homepage and course marketing page) so all users further down the funnel should be in the pages.csv dataset.

I decided to aggregate on acquistion metrics and sessions.
I want to look at conversion rate of users who initially clicked on gordon-ramsay to purchasing the class as well as bucketing what type of genre the other purchases were made.

In [1]:
# Dependancies
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import os
from functools import reduce

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

In [2]:
# Load .csvs into pandas DataFrames.
pages = pd.read_csv('pages.csv')
homepage_click = pd.read_csv('homepage_click.csv')
course_marketing_click = pd.read_csv('course_marketing_click.csv')
purchased_class = pd.read_csv('purchased_class.csv')
purchase_click = pd.read_csv('purchase_click.csv')

# Pages

In [3]:
# Inspecting pages.csv
pages.head()

Unnamed: 0,anonymous_id,received_at,name,class,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent
0,faff1903-357c-44e8-b98e-2d36d8be5832,11/01/2017 00:01:13,Course Marketing,gordon-ramsay-teaches-cooking,organic-social-pr,website,gr_mainpage,prospecting,Mozilla/5.0 (Windows NT 6.1; Win64; x64) Apple...
1,cb41781f-feb6-47ed-abe1-867716a0bc34,11/01/2017 00:01:39,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like...
2,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:02:07,Course Marketing,gordon-ramsay-teaches-cooking,,,,,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...
3,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:01:37,Home,,,,,,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...
4,b8d1d717-f4b1-4d39-9383-f63b32b74fce,11/01/2017 00:04:27,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...


In [4]:
pages.dtypes

anonymous_id        object
received_at         object
name                object
class               object
channel_grouping    object
traffic_source      object
ad_type             object
acquisition_type    object
user_agent          object
dtype: object

In [5]:
pages.describe()

Unnamed: 0,anonymous_id,received_at,name,class,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent
count,146264,146264,146264,114932,71188,71182,71468,71181,146264
unique,66735,123206,2,27,9,31,77,5,23502
top,8dcae0d7-6a70-45ee-a0ad-b835cf4112c3,11/07/2017 20:47:45,Course Marketing,gordon-ramsay-teaches-cooking,paid,instagram,video,prospecting,Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl...
freq,109,8,114932,78542,68454,27815,53886,66292,13495


In [6]:
# channel_group: null: equivalent to organics so I will replace all NaN with organic. Included traffic_source and
# aquisition_type.
pages['channel_grouping'] = pages['channel_grouping'].fillna('organic')
pages['traffic_source'] = pages['traffic_source'].fillna('organic')
pages['acquisition_type'] = pages['acquisition_type'].fillna('organic')

# rename received_at to pages_received at.
pages = pages.rename(columns={'received_at':'pages_received_at'})
pages = pages.rename(columns={'class':'pages_class'})

# replacing all NaN with 'none where applies.
pages['pages_class'] = pages['pages_class'].fillna('none')
pages['ad_type'] = pages['ad_type'].fillna('none')

pages.head()

Unnamed: 0,anonymous_id,pages_received_at,name,pages_class,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent
0,faff1903-357c-44e8-b98e-2d36d8be5832,11/01/2017 00:01:13,Course Marketing,gordon-ramsay-teaches-cooking,organic-social-pr,website,gr_mainpage,prospecting,Mozilla/5.0 (Windows NT 6.1; Win64; x64) Apple...
1,cb41781f-feb6-47ed-abe1-867716a0bc34,11/01/2017 00:01:39,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like...
2,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:02:07,Course Marketing,gordon-ramsay-teaches-cooking,organic,organic,none,organic,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...
3,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:01:37,Home,none,organic,organic,none,organic,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...
4,b8d1d717-f4b1-4d39-9383-f63b32b74fce,11/01/2017 00:04:27,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...


In [7]:
# Context: This is data pulled from 11/1/2017 to 11/7/2017 of various activity by individuals who 
# had visited the Gordon Ramsay course marketing page within the same period of time.

# See all the other pages users who visited 'gordon-ramsay-teaches-cooking' looked at.
pages['pages_class'].value_counts()

gordon-ramsay-teaches-cooking                             78542
none                                                      31332
samuel-l-jackson-teaches-acting                            3747
steve-martin-teaches-comedy                                2537
garry-kasparov-teaches-chess                               2480
martin-scorsese-teaches-filmmaking                         2303
jane-goodall-teaches-conservation                          2204
deadmau5-teaches-electronic-music-production               2161
judy-blume-teaches-writing                                 2042
christina-aguilera-teaches-singing                         1863
frank-gehry-teaches-design-and-architecture                1768
hans-zimmer-teaches-film-scoring                           1733
james-patterson-teaches-writing                            1525
aaron-sorkin-teaches-screenwriting                         1463
serena-williams-teaches-tennis                             1421
herbie-hancock-teaches-jazz             

In [8]:
# Find each unique class.
classes = list(pages['pages_class'].unique())
classes

['gordon-ramsay-teaches-cooking',
 'none',
 'hans-zimmer-teaches-film-scoring',
 'garry-kasparov-teaches-chess',
 'dustin-hoffman-teaches-acting',
 'jane-goodall-teaches-conservation',
 'frank-gehry-teaches-design-and-architecture',
 'martin-scorsese-teaches-filmmaking',
 'werner-herzog-teaches-filmmaking',
 'aaron-sorkin-teaches-screenwriting',
 'serena-williams-teaches-tennis',
 'reba-mcentire-teaches-country-music',
 'herbie-hancock-teaches-jazz',
 'shonda-rhimes-teaches-writing-for-television',
 'steve-martin-teaches-comedy',
 'deadmau5-teaches-electronic-music-production',
 'samuel-l-jackson-teaches-acting',
 'usher-teaches-the-art-of-performance',
 'kevin-spacey-teaches-acting',
 'james-patterson-teaches-writing',
 'christina-aguilera-teaches-singing',
 'judy-blume-teaches-writing',
 'diane-von-furstenberg-teaches-building-a-fashion-brand',
 'annie-leibovitz-teaches-photography',
 'david-mamet-teaches-dramatic-writing',
 'ron-howard-teaches-directing',
 'ron-howard-teaches-filmma

In [9]:
# Parsing last word of each class to create identifier.
bins = list(set([cls.split('-')[-1] for cls in classes]))
bins

['filmmaking',
 'television',
 'writing',
 'journalism',
 'chess',
 'music',
 'brand',
 'architecture',
 'tennis',
 'screenwriting',
 'jazz',
 'performance',
 'conservation',
 'singing',
 'acting',
 'cooking',
 'directing',
 'production',
 'none',
 'comedy',
 'scoring',
 'photography']

In [10]:
# Categorize Bins
bins = {
    'theatre':[
        'filmmaking',
        'scoring',
        'screenwriting',
        'production',
        'television'
        'acting',
        'performance',
        'comedy',
        'directing'
    ],
    'sports':['tennis'],
    'music' :[
        'singing',
        'jazz',
        'music'
    ],
    'journalism' :[
        'journalism',
        'photography',
        'writing'
    ],
    'games' : ['chess'],
    'cooking' : ['cooking'],
    'activism' : ['conservation'],
    'clothing' : ['brand'],
    'architecture' : ['architecture'],
    'annual_pass':['pass']}

In [11]:
# Function will parse through each row on the 'pages_class' column to see if the last item matches with the bin.
def binning(row):
    if row['pages_class'].split('-')[-1] in bins['theatre']:
        return 'theatre'
    if row['pages_class'].split('-')[-1] in bins['sports']:
        return 'sports'
    if row['pages_class'].split('-')[-1] in bins['music']:
        return 'music'
    if row['pages_class'].split('-')[-1] in bins['journalism']:
        return 'journalism' 
    if row['pages_class'].split('-')[-1] in bins['games']:
        return 'games'
    if row['pages_class'].split('-')[-1] in bins['cooking']:
        return 'cooking'
    if row['pages_class'].split('-')[-1] in bins['activism']:
        return 'conservation'
    if row['pages_class'].split('-')[-1] in bins['clothing']:
        return 'brand'
    if row['pages_class'].split('-')[-1] in bins['architecture']:
        return 'architecture'
    if row['pages_class'].split('-')[-1] in bins['annual_pass']:
        return 'annual_pass'
    else:
        return 'none'

In [12]:
# Adding column 'class_genre'
pages['first_click_genre'] = pages.apply(lambda row: binning (row),axis=1)

In [13]:
# Check if the function applied correctly.
pages.head()

Unnamed: 0,anonymous_id,pages_received_at,name,pages_class,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent,first_click_genre
0,faff1903-357c-44e8-b98e-2d36d8be5832,11/01/2017 00:01:13,Course Marketing,gordon-ramsay-teaches-cooking,organic-social-pr,website,gr_mainpage,prospecting,Mozilla/5.0 (Windows NT 6.1; Win64; x64) Apple...,cooking
1,cb41781f-feb6-47ed-abe1-867716a0bc34,11/01/2017 00:01:39,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (iPhone; CPU iPhone OS 11_0_1 like...,cooking
2,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:02:07,Course Marketing,gordon-ramsay-teaches-cooking,organic,organic,none,organic,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...,cooking
3,f48cb91d-4e6c-42ad-b32b-6e532c1b49b0,11/01/2017 00:01:37,Home,none,organic,organic,none,organic,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...,none
4,b8d1d717-f4b1-4d39-9383-f63b32b74fce,11/01/2017 00:04:27,Course Marketing,gordon-ramsay-teaches-cooking,paid,facebook,video,remarketing,Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6...,cooking


# Homepage_click

In [14]:
# Inspecting homepage_click.csv
# Rename columns that share names with other .csv's
homepage_click = homepage_click.rename(columns={'received_at':'homepage_received_at'})
homepage_click = homepage_click.rename(columns={'action':'homepage_action'})
homepage_click = homepage_click.rename(columns={'class':'homepage_class'})
homepage_click = homepage_click.rename(columns={'location':'homepage_location'})
homepage_click.head()

Unnamed: 0,anonymous_id,homepage_received_at,homepage_action,homepage_class,homepage_location
0,e921f531-128f-4e71-922d-28f71d65dc93,11/1/2017 0:15:58,gordon-ramsay-teaches-cooking,,tile
1,e921f531-128f-4e71-922d-28f71d65dc93,11/1/2017 0:14:24,steve-martin-teaches-comedy,,hero
2,e921f531-128f-4e71-922d-28f71d65dc93,11/1/2017 0:14:23,samuel-l-jackson-teaches-acting,,hero
3,e921f531-128f-4e71-922d-28f71d65dc93,11/1/2017 0:14:21,martin-scorsese-teaches-filmmaking,,hero
4,399c0019-7367-43c1-88e3-0d9a74885710,11/1/2017 0:27:39,gordon-ramsay-teaches-cooking,,hero


In [15]:
homepage_click.dtypes

anonymous_id            object
homepage_received_at    object
homepage_action         object
homepage_class          object
homepage_location       object
dtype: object

In [16]:
homepage_click.describe()

Unnamed: 0,anonymous_id,homepage_received_at,homepage_action,homepage_class,homepage_location
count,110258,110258,109178,342,110258
unique,11439,90635,31,1,5
top,0b6f6ec6-c96f-4240-ab76-1f847e1e8760,11/5/2017 21:41:01,gordon-ramsay-teaches-cooking,deadmau5-teaches-electronic-music-production,hero
freq,271,19,13677,342,98183


In [17]:
homepage_click['homepage_class'].value_counts()

deadmau5-teaches-electronic-music-production    342
Name: homepage_class, dtype: int64

In [18]:
homepage_click['homepage_action'].value_counts()

gordon-ramsay-teaches-cooking                             13677
martin-scorsese-teaches-filmmaking                         6808
judy-blume-teaches-writing                                 6762
samuel-l-jackson-teaches-acting                            6619
steve-martin-teaches-comedy                                5128
deadmau5-teaches-electronic-music-production               4865
shonda-rhimes-teaches-writing-for-television               4261
diane-von-furstenberg-teaches-building-a-fashion-brand     4174
aaron-sorkin-teaches-screenwriting                         4142
hans-zimmer-teaches-film-scoring                           4094
dustin-hoffman-teaches-acting                              4068
jane-goodall-teaches-conservation                          3976
garry-kasparov-teaches-chess                               3880
james-patterson-teaches-writing                            3846
christina-aguilera-teaches-singing                         3758
herbie-hancock-teaches-jazz             

# Course_marketing_click

In [19]:
# Inspecting course_marketing_click.csv
# Rename columns that share names with other .csv's
course_marketing_click = course_marketing_click.rename(columns={'received_at':'cmclick_received_at'})
course_marketing_click = course_marketing_click.rename(columns={'class':'cmclick_class'})
course_marketing_click = course_marketing_click.rename(columns={'location':'cmclick_location'})
course_marketing_click = course_marketing_click.rename(columns={'action':'cmclick_action'})
course_marketing_click.head()

Unnamed: 0,anonymous_id,cmclick_received_at,cmclick_class,cmclick_location,cmclick_action,video,video_carousel_number
0,b8d1d717-f4b1-4d39-9383-f63b32b74fce,11/1/2017 0:04:32,gordon-ramsay-teaches-cooking,hero,play-trailer,trailer,
1,074b9167-b7f3-4f0d-8e13-c93dc9d2ba6a,11/1/2017 0:05:19,aaron-sorkin-teaches-screenwriting,hero,play-trailer,trailer,
2,074b9167-b7f3-4f0d-8e13-c93dc9d2ba6a,11/1/2017 0:09:35,gordon-ramsay-teaches-cooking,hero,play-trailer,trailer,
3,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/1/2017 0:10:04,gordon-ramsay-teaches-cooking,hero,play-trailer,trailer,
4,074b9167-b7f3-4f0d-8e13-c93dc9d2ba6a,11/1/2017 0:11:56,frank-gehry-teaches-design-and-architecture,hero,play-trailer,trailer,


In [20]:
course_marketing_click.dtypes

anonymous_id              object
cmclick_received_at       object
cmclick_class             object
cmclick_location          object
cmclick_action            object
video                     object
video_carousel_number    float64
dtype: object

In [21]:
course_marketing_click.describe()

Unnamed: 0,video_carousel_number
count,7145.0
mean,2.017355
std,0.91245
min,1.0
25%,1.0
50%,2.0
75%,3.0
max,6.0


In [22]:
course_marketing_click['cmclick_class'].value_counts()

gordon-ramsay-teaches-cooking                             44327
samuel-l-jackson-teaches-acting                            2749
hans-zimmer-teaches-film-scoring                           2165
steve-martin-teaches-comedy                                1871
garry-kasparov-teaches-chess                               1773
deadmau5-teaches-electronic-music-production               1718
christina-aguilera-teaches-singing                         1646
aaron-sorkin-teaches-screenwriting                         1480
martin-scorsese-teaches-filmmaking                         1348
jane-goodall-teaches-conservation                          1295
judy-blume-teaches-writing                                 1155
frank-gehry-teaches-design-and-architecture                1069
james-patterson-teaches-writing                            1040
dustin-hoffman-teaches-acting                               911
diane-von-furstenberg-teaches-building-a-fashion-brand      894
shonda-rhimes-teaches-writing-for-televi

# Purchase_click

In [23]:
# Inspecting purchase_click.csv
# Rename columns that share names with other .csv's
purchase_click = purchase_click.rename(columns={'received_at':'purchase_received_at'})
purchase_click = purchase_click.rename(columns={'class':'purchase_class'})
purchase_click = purchase_click.rename(columns={'location':'purchase_location'})
purchase_click = purchase_click.rename(columns={'action':'purchase_action'})
purchase_click.head()

Unnamed: 0,anonymous_id,purchase_received_at,purchase_class,purchase_location,purchase_action
0,9be8d642-3000-45db-970f-aedbc9d9ee3c,11/1/2017 0:24:58,gordon-ramsay-teaches-cooking,hero,primary
1,21862340-a8fb-4e6f-bca7-85f5cf1d2f68,11/1/2017 0:36:47,gordon-ramsay-teaches-cooking,video-carousel,primary
2,13d9d32f-a11b-489e-9dda-740442d60961,11/1/2017 0:37:53,gordon-ramsay-teaches-cooking,hero,primary
3,13d9d32f-a11b-489e-9dda-740442d60961,11/1/2017 0:37:19,gordon-ramsay-teaches-cooking,hero,primary
4,abe3e8aa-b323-47d8-b7e0-2507ee081646,11/4/2017 21:53:28,gordon-ramsay-teaches-cooking,hero,primary


In [24]:
purchase_click.dtypes

anonymous_id            object
purchase_received_at    object
purchase_class          object
purchase_location       object
purchase_action         object
dtype: object

In [25]:
purchase_click.describe()

Unnamed: 0,anonymous_id,purchase_received_at,purchase_class,purchase_location,purchase_action
count,8846,8846,8814,8846,8845
unique,6816,8598,24,8,2
top,c609e8ac-19fd-4c2f-9b38-83dc5261ba8e,11/3/2017 13:12:37,gordon-ramsay-teaches-cooking,hero,primary
freq,26,6,7028,5397,7098


In [26]:
purchase_click['purchase_class'].value_counts()

gordon-ramsay-teaches-cooking                             7028
garry-kasparov-teaches-chess                               204
frank-gehry-teaches-design-and-architecture                138
martin-scorsese-teaches-filmmaking                         129
samuel-l-jackson-teaches-acting                            126
deadmau5-teaches-electronic-music-production               124
steve-martin-teaches-comedy                                103
jane-goodall-teaches-conservation                           92
james-patterson-teaches-writing                             91
hans-zimmer-teaches-film-scoring                            89
christina-aguilera-teaches-singing                          88
judy-blume-teaches-writing                                  80
diane-von-furstenberg-teaches-building-a-fashion-brand      74
aaron-sorkin-teaches-screenwriting                          70
herbie-hancock-teaches-jazz                                 61
shonda-rhimes-teaches-writing-for-television           

# Purchased_class

In [27]:
# Inspecting purchased_class.csv
# Rename columns that share names with other .csv's
purchased_class = purchased_class.rename(columns={'received_at':'purchase_class_received_at'})
purchased_class = purchased_class.rename(columns={'product_id':'pages_class'})
purchased_class.head()

Unnamed: 0,anonymous_id,purchase_class_received_at,pages_class,total,revenue,discount,is_gift
0,13d9d32f-a11b-489e-9dda-740442d60961,11/1/2017 0:39,gordon-ramsay-teaches-cooking,90,90,0,f
1,47c79436-b6e8-4009-a5e4-b82a0a32e93b,11/1/2017 1:07,gordon-ramsay-teaches-cooking,90,90,0,f
2,83259ee8-4de6-4748-94a3-1f6646c9fd69,11/1/2017 1:45,shonda-rhimes-teaches-writing-for-television,90,90,0,f
3,c44ec613-e294-42c7-b1cf-26418190fd98,11/1/2017 2:43,gordon-ramsay-teaches-cooking,90,90,0,f
4,5016b713-1269-45bf-b868-e35db22c458a,11/1/2017 3:47,werner-herzog-teaches-filmmaking,90,90,0,f


In [28]:
purchased_class.dtypes

anonymous_id                  object
purchase_class_received_at    object
pages_class                   object
total                          int64
revenue                        int64
discount                       int64
is_gift                       object
dtype: object

In [29]:
# is_gift works as booloean, t and f for True and False.
purchased_class['is_gift'].value_counts()

f    392
t    122
Name: is_gift, dtype: int64

In [30]:
purchased_class.describe()

Unnamed: 0,total,revenue,discount
count,514.0,514.0,514.0
mean,92.11284,92.11284,3.677043
std,36.146158,36.146158,17.83345
min,0.0,0.0,0.0
25%,90.0,90.0,0.0
50%,90.0,90.0,0.0
75%,90.0,90.0,0.0
max,180.0,180.0,90.0


In [31]:
purchased_class['pages_class'].value_counts()

gordon-ramsay-teaches-cooking                             325
annual-pass                                                69
aaron-sorkin-teaches-screenwriting                         11
dustin-hoffman-teaches-acting                              11
martin-scorsese-teaches-filmmaking                          9
deadmau5-teaches-electronic-music-production                9
garry-kasparov-teaches-chess                                8
steve-martin-teaches-comedy                                 7
frank-gehry-teaches-design-and-architecture                 7
jane-goodall-teaches-conservation                           7
werner-herzog-teaches-filmmaking                            6
david-mamet-teaches-dramatic-writing                        6
james-patterson-teaches-writing                             6
usher-teaches-the-art-of-performance                        4
herbie-hancock-teaches-jazz                                 4
samuel-l-jackson-teaches-acting                             4
hans-zim

In [32]:
# Added annual_pass to bins
purchased_classes = list(purchased_class['pages_class'].unique())
purchased_classes

['gordon-ramsay-teaches-cooking',
 'shonda-rhimes-teaches-writing-for-television',
 'werner-herzog-teaches-filmmaking',
 'frank-gehry-teaches-design-and-architecture',
 'david-mamet-teaches-dramatic-writing',
 'aaron-sorkin-teaches-screenwriting',
 'dustin-hoffman-teaches-acting',
 'martin-scorsese-teaches-filmmaking',
 'jane-goodall-teaches-conservation',
 'garry-kasparov-teaches-chess',
 'james-patterson-teaches-writing',
 'samuel-l-jackson-teaches-acting',
 'herbie-hancock-teaches-jazz',
 'deadmau5-teaches-electronic-music-production',
 'hans-zimmer-teaches-film-scoring',
 'diane-von-furstenberg-teaches-building-a-fashion-brand',
 'steve-martin-teaches-comedy',
 'judy-blume-teaches-writing',
 'serena-williams-teaches-tennis',
 'usher-teaches-the-art-of-performance',
 'christina-aguilera-teaches-singing',
 'annual-pass',
 'masterclass',
 'bob-woodward-teaches-investigative-journalism']

In [33]:
purchased_class['last_click_genre'] = purchased_class.apply (lambda row: binning (row),axis=1)

In [34]:
purchased_class.head()

Unnamed: 0,anonymous_id,purchase_class_received_at,pages_class,total,revenue,discount,is_gift,last_click_genre
0,13d9d32f-a11b-489e-9dda-740442d60961,11/1/2017 0:39,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
1,47c79436-b6e8-4009-a5e4-b82a0a32e93b,11/1/2017 1:07,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
2,83259ee8-4de6-4748-94a3-1f6646c9fd69,11/1/2017 1:45,shonda-rhimes-teaches-writing-for-television,90,90,0,f,none
3,c44ec613-e294-42c7-b1cf-26418190fd98,11/1/2017 2:43,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
4,5016b713-1269-45bf-b868-e35db22c458a,11/1/2017 3:47,werner-herzog-teaches-filmmaking,90,90,0,f,theatre


In [35]:
# After function applied, rename to original column name.
purchased_class = purchased_class.rename(columns={'pages_class':'product_id'})

In [36]:
purchased_class.head()

Unnamed: 0,anonymous_id,purchase_class_received_at,product_id,total,revenue,discount,is_gift,last_click_genre
0,13d9d32f-a11b-489e-9dda-740442d60961,11/1/2017 0:39,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
1,47c79436-b6e8-4009-a5e4-b82a0a32e93b,11/1/2017 1:07,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
2,83259ee8-4de6-4748-94a3-1f6646c9fd69,11/1/2017 1:45,shonda-rhimes-teaches-writing-for-television,90,90,0,f,none
3,c44ec613-e294-42c7-b1cf-26418190fd98,11/1/2017 2:43,gordon-ramsay-teaches-cooking,90,90,0,f,cooking
4,5016b713-1269-45bf-b868-e35db22c458a,11/1/2017 3:47,werner-herzog-teaches-filmmaking,90,90,0,f,theatre


# Joining DataFrames

In [37]:
# I'll be splitting these columns into more useful datasets to make analysis on
dfs = [pages, homepage_click, purchased_class, purchase_click, course_marketing_click] 
merged_df = reduce(lambda left,right: pd.merge(left,right,on='anonymous_id'), dfs)

In [38]:
merged_df.head()

Unnamed: 0,anonymous_id,pages_received_at,name,pages_class,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent,first_click_genre,homepage_received_at,homepage_action,homepage_class,homepage_location,purchase_class_received_at,product_id,total,revenue,discount,is_gift,last_click_genre,purchase_received_at,purchase_class,purchase_location,purchase_action,cmclick_received_at,cmclick_class,cmclick_location,cmclick_action,video,video_carousel_number
0,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,Home,none,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,none,11/3/2017 17:19:44,,,enrolled-course-banner,11/2/2017 0:33,gordon-ramsay-teaches-cooking,90,90,0,f,cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,hero,primary,11/1/2017 0:10:04,gordon-ramsay-teaches-cooking,hero,play-trailer,trailer,
1,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,Home,none,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,none,11/3/2017 17:19:44,,,enrolled-course-banner,11/2/2017 0:33,gordon-ramsay-teaches-cooking,90,90,0,f,cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,hero,primary,11/1/2017 0:12:00,gordon-ramsay-teaches-cooking,autoplay,play-gem,Make: Poached Eggs & Mushroom on Toast,
2,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,Home,none,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,none,11/3/2017 17:19:44,,,enrolled-course-banner,11/2/2017 0:33,gordon-ramsay-teaches-cooking,90,90,0,f,cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,hero,primary,11/1/2017 0:12:11,gordon-ramsay-teaches-cooking,video-carousel,play-gem,Make: Poached Eggs & Mushroom on Toast,
3,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,Home,none,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,none,11/3/2017 17:19:44,,,enrolled-course-banner,11/2/2017 0:33,gordon-ramsay-teaches-cooking,90,90,0,f,cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,hero,primary,11/1/2017 0:12:24,gordon-ramsay-teaches-cooking,video-carousel,play-gem,Make: Poached Eggs & Mushroom on Toast,
4,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,Home,none,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,none,11/3/2017 17:19:44,,,enrolled-course-banner,11/2/2017 0:33,gordon-ramsay-teaches-cooking,90,90,0,f,cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,hero,primary,11/1/2017 0:12:25,gordon-ramsay-teaches-cooking,video-carousel,play-gem,Make: Poached Eggs & Mushroom on Toast,


# Session Analysis

In [39]:
# Could find Max and Min for user session average session length
# Find percentage of users who started with gordon and ended with gordon, if not where did they end up?

In [40]:
events_dupes = [
    'anonymous_id',
    'pages_received_at',
    'homepage_received_at',
    'cmclick_received_at',
    'purchase_received_at',
    'purchase_class_received_at',
    'pages_class',
    'homepage_action',
    'cmclick_class',
    'purchase_class',
    'product_id',
    'first_click_genre',
    'last_click_genre'
]

In [41]:
# This DataFrame can be used to follow the time it took between initial clicks on the homepage 
# to the final purchase of a class.

# In addition we can see what clicks were made on the way to the final purchase and if their initial interest matched
# their purchase decision

events = merged_df[
    ['anonymous_id',
     'pages_received_at',
     'pages_class',
     'homepage_received_at',
     'homepage_action',
     'cmclick_received_at',
     'cmclick_class',
     'purchase_received_at',
     'purchase_class',
     'purchase_class_received_at',
     'product_id',
     'first_click_genre',
     'last_click_genre']
].drop_duplicates(events_dupes)

events.head()

Unnamed: 0,anonymous_id,pages_received_at,pages_class,homepage_received_at,homepage_action,cmclick_received_at,cmclick_class,purchase_received_at,purchase_class,purchase_class_received_at,product_id,first_click_genre,last_click_genre
0,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,none,11/3/2017 17:19:44,,11/1/2017 0:10:04,gordon-ramsay-teaches-cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,11/2/2017 0:33,gordon-ramsay-teaches-cooking,none,cooking
1,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,none,11/3/2017 17:19:44,,11/1/2017 0:12:00,gordon-ramsay-teaches-cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,11/2/2017 0:33,gordon-ramsay-teaches-cooking,none,cooking
2,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,none,11/3/2017 17:19:44,,11/1/2017 0:12:11,gordon-ramsay-teaches-cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,11/2/2017 0:33,gordon-ramsay-teaches-cooking,none,cooking
3,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,none,11/3/2017 17:19:44,,11/1/2017 0:12:24,gordon-ramsay-teaches-cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,11/2/2017 0:33,gordon-ramsay-teaches-cooking,none,cooking
4,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,11/03/2017 17:19:39,none,11/3/2017 17:19:44,,11/1/2017 0:12:25,gordon-ramsay-teaches-cooking,11/2/2017 0:30:37,gordon-ramsay-teaches-cooking,11/2/2017 0:33,gordon-ramsay-teaches-cooking,none,cooking


In [42]:
# Drop any records with exactly the same entries in every column.
grouped_events = events.groupby(
    ['anonymous_id',
     'pages_received_at',
     'homepage_received_at',
     'cmclick_received_at',
     'purchase_received_at',
     'purchase_class_received_at',
     'pages_class',
     'homepage_action',
     'cmclick_class',
     'purchase_class',
     'product_id',
     'first_click_genre',
     'last_click_genre']
).count().reset_index()

In [43]:
# Count shows total number of unique sessions with at least one different attribute. 
grouped_events.describe()

Unnamed: 0,anonymous_id,pages_received_at,homepage_received_at,cmclick_received_at,purchase_received_at,purchase_class_received_at,pages_class,homepage_action,cmclick_class,purchase_class,product_id,first_click_genre,last_click_genre
count,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732,25322732
unique,179,2488,2383,1034,386,201,25,31,24,22,23,10,11
top,2346530d-1162-4809-b85f-ef406f1641e2,11/03/2017 16:03:42,11/3/2017 22:19:34,11/4/2017 1:22:04,11/3/2017 16:02:32,11/4/2017 11:36,none,gordon-ramsay-teaches-cooking,judy-blume-teaches-writing,gordon-ramsay-teaches-cooking,gordon-ramsay-teaches-cooking,none,theatre
freq,22394610,546210,376380,439110,2488290,4478922,11304184,2574978,3151542,5730082,5837095,13765273,13850879


In [44]:
# DF can be used to see how many sessions stayed in the genre they first clicked on
grouped_events[['anonymous_id','first_click_genre','last_click_genre']].groupby(['first_click_genre','last_click_genre']).count()

Unnamed: 0_level_0,Unnamed: 1_level_0,anonymous_id
first_click_genre,last_click_genre,Unnamed: 2_level_1
architecture,annual_pass,4055
architecture,architecture,13904
architecture,brand,135
architecture,conservation,64
architecture,cooking,21562
architecture,journalism,1096
architecture,music,156
architecture,none,1056
architecture,theatre,1283
brand,annual_pass,3727


In [45]:
ramsay = grouped_events[(grouped_events['pages_class'] == 'gordon-ramsay-teaches-cooking') & (grouped_events['product_id']=='gordon-ramsay-teaches-cooking')]

In [46]:
non_ramsay = grouped_events[(grouped_events['pages_class'] != 'gordon-ramsay-teaches-cooking') & (grouped_events['product_id'] !='gordon-ramsay-teaches-cooking')]

In [47]:
non_ramsay['anonymous_id'].count()

18323798

In [48]:
# Sessions that started with Ramsay and ended with Ramsay
ramsay['anonymous_id'].count()

403832

In [49]:
# Only 2 percent of sessions initially started with Gordons class and ended with Gordon's class.
ramsay['anonymous_id'].count()/non_ramsay['anonymous_id'].count()

0.022038662508722263

# Acquistion Analysis

In [50]:
# Break down by traffic source/acquistion type
# Is gift of total revenue
# Organic vs non organic analysis

In [51]:
acq_dupes = [
    'anonymous_id',
    'total',
    'revenue',
    'is_gift',
    'name',
    'channel_grouping',
    'traffic_source',
    'ad_type',
    'acquisition_type',
    'user_agent',
    'purchase_location',
    'cmclick_location',
    'cmclick_action',
    'video',
    'video_carousel_number'
]

In [52]:
# Removed duplicates to see how much revenue each individual session produced baseds on anonymous_id.
# I made an assumption that although there may be duplicate anonymous_id's I would only remove any 
# duplicate sessions that were exactly identical.
acquistion = merged_df[
    ['anonymous_id',
     'total',
     'revenue',
     'is_gift',
     'name',
     'channel_grouping',
     'traffic_source',
     'ad_type',
     'acquisition_type',
     'user_agent',
     'purchase_location',
     'cmclick_location',
     'cmclick_action',
     'video',
     'video_carousel_number']
].drop_duplicates(acq_dupes)


In [53]:
acquistion.head()

Unnamed: 0,anonymous_id,total,revenue,is_gift,name,channel_grouping,traffic_source,ad_type,acquisition_type,user_agent,purchase_location,cmclick_location,cmclick_action,video,video_carousel_number
0,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,90,90,f,Home,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,hero,hero,play-trailer,trailer,
1,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,90,90,f,Home,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,hero,autoplay,play-gem,Make: Poached Eggs & Mushroom on Toast,
2,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,90,90,f,Home,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,hero,video-carousel,play-gem,Make: Poached Eggs & Mushroom on Toast,
6,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,90,90,f,Home,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,hero,video-carousel,play-gem,Class Trailer,
7,45d158e5-2ff7-4aab-b6ad-70dcc27ebaa9,90,90,f,Home,organic,organic,none,organic,Mozilla/5.0 (iPad; CPU OS 11_1 like Mac OS X) ...,hero,autoplay,play-gem,Kitchen Layout,


In [54]:
# A function could be made to process these similar DataFrames.
# I found the total revenue for each of the following and unique number of users for the following:
# channel_grouping, traffic_source, ad_type, acquisition_type.
# I then calculated rev/user basis

In [55]:
# Check if revenue and total are the same
print(acquistion['revenue'].sum())
acquistion['total'].sum()

226554


226554

In [56]:
# Channel grouping
channel_rev = acquistion[['channel_grouping','revenue']].groupby('channel_grouping').sum().sort_values(by='revenue', ascending=False)
channel_rev

Unnamed: 0_level_0,revenue
channel_grouping,Unnamed: 1_level_1
organic,155316
paid,51528
organic-social-pr,17550
email,2160


In [57]:
channel_users = acquistion[['anonymous_id','channel_grouping']].groupby('channel_grouping').count().sort_values(by='anonymous_id', ascending=False)
channel_users

Unnamed: 0_level_0,anonymous_id
channel_grouping,Unnamed: 1_level_1
organic,1707
paid,510
organic-social-pr,180
email,30


In [58]:
# Although organic has brought in more total revenue and users, 
# paid still brings in higher value per user. This also shows organics are very strong for MasterClass.

channel_df = pd.merge(channel_rev, channel_users, how = 'left', on = 'channel_grouping').reset_index()
channel_df['rev_per_user'] = channel_df['revenue']/channel_df['anonymous_id']
channel_df

Unnamed: 0,channel_grouping,revenue,anonymous_id,rev_per_user
0,organic,155316,1707,90.987698
1,paid,51528,510,101.035294
2,organic-social-pr,17550,180,97.5
3,email,2160,30,72.0


In [59]:
# Traffic source
traffic_rev = acquistion[['traffic_source','revenue']].groupby('traffic_source').sum().sort_values(by='revenue', ascending=False)
traffic_rev

Unnamed: 0_level_0,revenue
traffic_source,Unnamed: 1_level_1
organic,155316
google_search_network,26451
masterclass,13860
facebook,12600
instagram,5130
youtube_network,5097
affiliate,2610
email,2160
google_display_network,1710
youtube,1170


In [60]:
traffic_users = acquistion[['anonymous_id','traffic_source']].groupby('traffic_source').count().sort_values(by='anonymous_id', ascending=False)
traffic_users

Unnamed: 0_level_0,anonymous_id
traffic_source,Unnamed: 1_level_1
organic,1707
google_search_network,279
masterclass,148
facebook,110
youtube_network,56
instagram,36
email,30
affiliate,29
google_display_network,18
youtube,9


In [61]:
# Traffic type again shows strong organics. 
# Google search also seems to drive good revenue and the next highest user acquistion.
# Facebook(Instagram included) typically drives high value users so pretty standard to see there.
traffic_df = pd.merge(traffic_rev, traffic_users, how = 'left', on = 'traffic_source').reset_index()
traffic_df['rev_per_user'] = traffic_df['revenue']/traffic_df['anonymous_id']
traffic_df

Unnamed: 0,traffic_source,revenue,anonymous_id,rev_per_user
0,organic,155316,1707,90.987698
1,google_search_network,26451,279,94.806452
2,masterclass,13860,148,93.648649
3,facebook,12600,110,114.545455
4,instagram,5130,36,142.5
5,youtube_network,5097,56,91.017857
6,affiliate,2610,29,90.0
7,email,2160,30,72.0
8,google_display_network,1710,18,95.0
9,youtube,1170,9,130.0


In [62]:
# Acquisition type
acq_rev = acquistion[['acquisition_type','revenue']].groupby('acquisition_type').sum().sort_values(by='revenue', ascending=False)
acq_rev

Unnamed: 0_level_0,revenue
acquisition_type,Unnamed: 1_level_1
organic,155316
prospecting,64848
remarketing,6390


In [63]:
acq_users = acquistion[['anonymous_id','acquisition_type']].groupby('acquisition_type').count().sort_values(by='anonymous_id', ascending=False)
acq_users

Unnamed: 0_level_0,anonymous_id
acquisition_type,Unnamed: 1_level_1
organic,1707
prospecting,657
remarketing,63


In [64]:
acq_df = pd.merge(acq_rev, acq_users, how = 'left', on = 'acquisition_type').reset_index()
acq_df['rev_per_user'] = acq_df['revenue']/acq_df['anonymous_id']
acq_df

Unnamed: 0,acquisition_type,revenue,anonymous_id,rev_per_user
0,organic,155316,1707,90.987698
1,prospecting,64848,657,98.703196
2,remarketing,6390,63,101.428571


In [65]:
# Ad type
ad_type_rev = acquistion[['revenue','ad_type']].groupby('ad_type').sum().sort_values(by='revenue', ascending=False)
ad_type_rev

Unnamed: 0_level_0,revenue
ad_type,Unnamed: 1_level_1
none,154596
search,24201
video,17607
vanitylink,17460
1430173,2520
sitelink,2250
cart-abandon-1-original,1800
rhs,1800
display,1530
288359,810


In [66]:
ad_type_users = acquistion[['anonymous_id','ad_type']].groupby('ad_type').count().sort_values(by='anonymous_id', ascending=False)
ad_type_users

Unnamed: 0_level_0,anonymous_id
ad_type,Unnamed: 1_level_1
none,1689
search,262
vanitylink,179
video,163
1430173,28
cart-abandon-1-original,20
display,17
sitelink,17
1333168,10
cart-abandon-1,10


In [67]:
ad_type_df = pd.merge(ad_type_users, ad_type_rev, how = 'left', on = 'ad_type')
ad_type_df

Unnamed: 0_level_0,anonymous_id,revenue
ad_type,Unnamed: 1_level_1,Unnamed: 2_level_1
none,1689,154596
search,262,24201
vanitylink,179,17460
video,163,17607
1430173,28,2520
cart-abandon-1-original,20,1800
display,17,1530
sitelink,17,2250
1333168,10,0
cart-abandon-1,10,360


In [68]:
# Strongest ad type was video in terms of user value and volume, sitelinks were strong but less users converted. 
# No ad type proves again organics are strong.
ad_type_df = pd.merge(ad_type_rev, ad_type_users, how = 'left', on = 'ad_type').reset_index()
ad_type_df['rev_per_user'] = ad_type_df['revenue']/ad_type_df['anonymous_id']
ad_type_df

Unnamed: 0,ad_type,revenue,anonymous_id,rev_per_user
0,none,154596,1689,91.531083
1,search,24201,262,92.370229
2,video,17607,163,108.018405
3,vanitylink,17460,179,97.541899
4,1430173,2520,28,90.0
5,sitelink,2250,17,132.352941
6,cart-abandon-1-original,1800,20,90.0
7,rhs,1800,10,180.0
8,display,1530,17,90.0
9,288359,810,9,90.0


In [69]:
# Is gift?
is_gift_rev = acquistion[['revenue','is_gift']].groupby('is_gift').sum().sort_values(by='revenue', ascending=False)
is_gift_rev

Unnamed: 0_level_0,revenue
is_gift,Unnamed: 1_level_1
f,200307
t,26247


In [70]:
is_gift_users = acquistion[['anonymous_id','is_gift']].groupby('is_gift').count().sort_values(by='anonymous_id', ascending=False)
is_gift_users

Unnamed: 0_level_0,anonymous_id
is_gift,Unnamed: 1_level_1
f,2135
t,292


In [71]:
# More total users were purchasing for them selves than for others. 
# User's on average spent more when they were purchasing for themselves.
is_gift_df = pd.merge(is_gift_rev, is_gift_users, how = 'left', on = 'is_gift').reset_index()
is_gift_df['rev_per_user'] = is_gift_df['revenue']/is_gift_df['anonymous_id']
is_gift_df

Unnamed: 0,is_gift,revenue,anonymous_id,rev_per_user
0,f,200307,2135,93.820609
1,t,26247,292,89.886986
