## Example Take-Home Challenge: Relax Inc.

The data is available as two attached CSV files:

* takehome_user_engagement.csv
* takehome_users.csv

The data has the following two tables:

1. A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years. This table includes:
   * name: the user's name
   * object_id: the user's id
   * email: email address
   * creation_source: how their account was created. This takes on one of 5 values:
     - PERSONAL_PROJECTS: invited to join another user's personal workspace
     - GUEST_INVITE: invited to an organization as a guest (limited permissions)
     - ORG_INVITE: invited to an organization (as a full member)
     - SIGNUP: signed up via the website
     - SIGNUP_GOOGLE_AUTH: signed up using Google Authentication (using a Google email account for their login id)
   * creation_time: when they created their account
   * last_session_creation_time: Unix timestamp of last login
   * opted_in_to_mailing_list: whether they have opted into receiving marketing emails
   * enabled_for_marketing_drip: whether they are on the regular marketing email drip
   * org_id: the organization (group of users) they belong to
   * invited_by_user_id: which user invited them to join (if applicable).


2. A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.

Defining an "adopted user" as a user who has logged into the product on three separate days in at least one seven-day period, identify which factors predict future user adoption.

Provide a brief writeup of your findings (the more concise, the better; no more than one page), along with any summary tables, graphs, code, or queries that can help us understand your approach. Please note any factors you considered or investigation you did, even if they did not pan out. Feel free to identify any further research or data you think would be valuable.

In [1]:
import numpy as np
import pandas as pd

In [2]:
user_df = pd.read_csv('takehome_users.csv', parse_dates=['creation_time'], index_col='object_id', encoding='latin-1')

In [3]:
user_df['last_session_creation_time'] = pd.to_datetime(user_df['last_session_creation_time'], unit='s')

In [4]:
user_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 12000 entries, 1 to 12000
Data columns (total 9 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   creation_time               12000 non-null  datetime64[ns]
 1   name                        12000 non-null  object        
 2   email                       12000 non-null  object        
 3   creation_source             12000 non-null  object        
 4   last_session_creation_time  8823 non-null   datetime64[ns]
 5   opted_in_to_mailing_list    12000 non-null  int64         
 6   enabled_for_marketing_drip  12000 non-null  int64         
 7   org_id                      12000 non-null  int64         
 8   invited_by_user_id          6417 non-null   float64       
dtypes: datetime64[ns](2), float64(1), int64(3), object(3)
memory usage: 937.5+ KB


In [5]:
user_df.head()

object_id,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id
1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,2014-04-22 03:53:30,1,0,11,10803.0
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,2014-03-31 03:45:04,0,0,1,316.0
3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,2013-03-19 23:14:52,0,0,94,1525.0
4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,2013-05-22 08:09:28,0,0,1,5151.0
5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,2013-01-22 10:14:20,0,0,193,5240.0


In [6]:
user_engagement_df = pd.read_csv('takehome_user_engagement.csv', parse_dates=['time_stamp'], index_col='time_stamp')

In [7]:
user_engagement_df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 207917 entries, 2014-04-22 03:53:30 to 2014-01-26 08:57:12
Data columns (total 2 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   user_id  207917 non-null  int64
 1   visited  207917 non-null  int64
dtypes: int64(2)
memory usage: 4.8 MB


In [8]:
user_engagement_df.head()

time_stamp,user_id,visited
2014-04-22 03:53:30,1,1
2013-11-15 03:45:04,2,1
2013-11-29 03:45:04,2,1
2013-12-09 03:45:04,2,1
2013-12-25 03:45:04,2,1


In [9]:
user_engagement_df.describe().transpose()

,count,mean,std,min,25%,50%,75%,max
user_id,207917.0,5913.314197,3394.941674,1.0,3087.0,5682.0,8944.0,12000.0
visited,207917.0,1.0,0.0,1.0,1.0,1.0,1.0,1.0


In [10]:
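# Flag users with at least three logged days overall: a necessary
# (but not sufficient) condition for being an adopted user.
multi_visit_user_df = user_engagement_df.groupby('user_id').count() > 2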

multi_visit_user_df

user_id,visited
1,False
2,True
3,False
4,False
5,False
...,...
11996,False
11997,False
11998,False
11999,False


In [11]:
user2_engagement_df = user_engagement_df[user_engagement_df['user_id'] == 2].sort_index()

user2_engagement_df

time_stamp,user_id,visited
2013-11-15 03:45:04,2,1
2013-11-29 03:45:04,2,1
2013-12-09 03:45:04,2,1
2013-12-25 03:45:04,2,1
2013-12-31 03:45:04,2,1
2014-01-08 03:45:04,2,1
2014-02-03 03:45:04,2,1
2014-02-08 03:45:04,2,1
2014-02-09 03:45:04,2,1
2014-02-13 03:45:04,2,1


In [12]:
# Count user 2's logins in consecutive, non-overlapping 7-day bins.
user2_engagement_df.groupby(pd.Grouper(freq='7D')).count()

time_stamp,user_id,visited
2013-11-15,1,1
2013-11-22,0,0
2013-11-29,1,1
2013-12-06,1,1
2013-12-13,0,0
2013-12-20,1,1
2013-12-27,1,1
2014-01-03,1,1
2014-01-10,0,0
2014-01-17,0,0
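
Fixed 7-day bins depend on where the bin edges happen to fall, so three logins inside one week can be split across two bins and go unnoticed. The sketch below contrasts the fixed-bin count with a trailing 7-day rolling count. The `logins` frame and its dates are hypothetical, chosen only to straddle a bin edge, and are not taken from the dataset. It also assumes one row per login day, as the task description states.

```python
import pandas as pd

# Hypothetical login times for a single user (illustrative, not from the data).
logins = pd.DataFrame(
    {"visited": 1},
    index=pd.to_datetime(
        ["2014-01-01 09:00", "2014-01-06 09:00", "2014-01-08 09:00", "2014-01-10 09:00"]
    ),
)

# Fixed bins: [Jan 1, Jan 8) and [Jan 8, Jan 15) each hold two logins.
fixed_bins = logins.groupby(pd.Grouper(freq="7D"))["visited"].count()

# Rolling window: for each login, count logins in the 7 days ending at it.
rolling_counts = logins["visited"].rolling("7D").count()

print(fixed_bins.max())      # largest count in any fixed bin
print(rolling_counts.max())  # largest count in any trailing 7-day window
```

Here the fixed bins never reach three logins, while the rolling window ending on 2014-01-10 does (Jan 6, Jan 8, Jan 10), so a rolling check is the safer reading of "at least one seven-day period".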


In [13]:
multi_visit_user_index = multi_visit_user_df[multi_visit_user_df['visited']].index

multi_visit_user_index

Int64Index([    2,    10,    20,    33,    42,    43,    50,    53,    59,
               60,
            ...
            11961, 11964, 11965, 11967, 11969, 11975, 11980, 11981, 11988,
            11991],
           dtype='int64', name='user_id', length=2248)

In [14]:
user_df.index

Int64Index([    1,     2,     3,     4,     5,     6,     7,     8,     9,
               10,
            ...
            11991, 11992, 11993, 11994, 11995, 11996, 11997, 11998, 11999,
            12000],
           dtype='int64', name='object_id', length=12000)

In [15]:
user_df.head()

object_id,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id
1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,2014-04-22 03:53:30,1,0,11,10803.0
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,2014-03-31 03:45:04,0,0,1,316.0
3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,2013-03-19 23:14:52,0,0,94,1525.0
4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,2013-05-22 08:09:28,0,0,1,5151.0
5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,2013-01-22 10:14:20,0,0,193,5240.0


In [16]:
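# Take an explicit copy so that adding columns later does not trigger
# pandas' SettingWithCopyWarning or touch user_df.
potential_user_df = user_df.loc[multi_visit_user_index].copy()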

potential_user_df

user_id,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,2014-03-31 03:45:04,0,0,1,316.0
10,2013-01-16 22:08:03,Santos Carla,CarlaFerreiraSantos@gustr.com,ORG_INVITE,2014-06-03 22:08:03,1,1,318,4143.0
20,2014-03-06 11:46:38,Helms Mikayla,lqyvjilf@uhzdq.com,SIGNUP,2014-05-29 11:46:38,0,0,58,
33,2014-03-11 06:29:09,Araujo José,JoseMartinsAraujo@cuvox.de,GUEST_INVITE,2014-05-31 06:29:09,0,0,401,79.0
42,2012-11-11 19:05:07,Pinto Giovanna,GiovannaCunhaPinto@cuvox.de,SIGNUP,2014-05-25 19:05:07,1,0,235,
...,...,...,...,...,...,...,...,...,...
11975,2013-03-23 11:10:11,Daecher Jürgen,JurgenDaecher@gustr.com,GUEST_INVITE,2014-05-22 11:10:11,1,0,31,6410.0
11980,2014-02-02 15:23:18,Gloeckner Franziska,ljnnbqdr@cgbld.com,ORG_INVITE,2014-04-18 15:23:18,0,0,406,3068.0
11981,2013-03-05 01:53:48,Fry Tyler,TylerFry@gmail.com,GUEST_INVITE,2013-04-02 01:53:48,0,0,110,5775.0
11988,2014-03-15 11:04:47,Minick John,JohnFMinick@yahoo.com,PERSONAL_PROJECTS,2014-06-01 11:04:47,0,0,114,


In [17]:
potential_user_df['adopted_user'] = False

In [18]:
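# User 2 is flagged by hand: the logins on 2014-02-03, 2014-02-08 and
# 2014-02-09 fall within a single seven-day period.
potential_user_df.loc[potential_user_df.index == 2, 'adopted_user'] = True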

In [19]:
potential_user_df.head()

user_id,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,adopted_user
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,2014-03-31 03:45:04,0,0,1,316.0,True
10,2013-01-16 22:08:03,Santos Carla,CarlaFerreiraSantos@gustr.com,ORG_INVITE,2014-06-03 22:08:03,1,1,318,4143.0,False
20,2014-03-06 11:46:38,Helms Mikayla,lqyvjilf@uhzdq.com,SIGNUP,2014-05-29 11:46:38,0,0,58,,False
33,2014-03-11 06:29:09,Araujo José,JoseMartinsAraujo@cuvox.de,GUEST_INVITE,2014-05-31 06:29:09,0,0,401,79.0,False
42,2012-11-11 19:05:07,Pinto Giovanna,GiovannaCunhaPinto@cuvox.de,SIGNUP,2014-05-25 19:05:07,1,0,235,,False
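
The flag above is set by hand for user 2 only. One possible way to label every user is sketched below: for each user, compare each login day with the login day two rows later, and treat the user as adopted if those two days are at most six days apart. The function name `find_adopted_users` and the `example` frame are hypothetical. The six-day threshold assumes "seven-day period" means seven calendar days inclusive, which is an interpretation rather than something the task states explicitly.

```python
import pandas as pd


def find_adopted_users(engagement: pd.DataFrame) -> pd.Index:
    """Return ids of users with 3 distinct login days inside some 7-day period.

    Assumes `engagement` has a DatetimeIndex of login times and a `user_id`
    column, like user_engagement_df in this notebook.
    """
    # One row per (user, calendar day), ordered by user and then by day.
    days = (
        pd.DataFrame(
            {
                "user_id": engagement["user_id"].to_numpy(),
                "day": engagement.index.normalize(),
            }
        )
        .drop_duplicates()
        .sort_values(["user_id", "day"])
    )
    # Login day two rows later for the same user (NaT when there is none).
    third_day = days.groupby("user_id")["day"].shift(-2)
    # Days d1 < d2 < d3 fit in seven calendar days when d3 - d1 <= 6 days.
    in_window = (third_day - days["day"]) <= pd.Timedelta(days=6)
    return pd.Index(days.loc[in_window, "user_id"].unique(), name="user_id")


# Small check using the login times shown earlier for users 1 and 2.
example = pd.DataFrame(
    {"user_id": [1] + [2] * 10, "visited": 1},
    index=pd.to_datetime(
        [
            "2014-04-22 03:53:30",
            "2013-11-15 03:45:04", "2013-11-29 03:45:04", "2013-12-09 03:45:04",
            "2013-12-25 03:45:04", "2013-12-31 03:45:04", "2014-01-08 03:45:04",
            "2014-02-03 03:45:04", "2014-02-08 03:45:04", "2014-02-09 03:45:04",
            "2014-02-13 03:45:04",
        ]
    ),
)
print(list(find_adopted_users(example)))
```

With the frames loaded in this notebook, an expression such as `user_df.index.isin(find_adopted_users(user_engagement_df))` would then give a boolean label for all users, including those with fewer than three logins.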


In [20]:
adopted_user_df = potential_user_df.drop(['name', 'email'], axis=1)

In [21]:
adopted_user_df.head()

user_id,creation_time,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,adopted_user
2,2013-11-15 03:45:04,ORG_INVITE,2014-03-31 03:45:04,0,0,1,316.0,True
10,2013-01-16 22:08:03,ORG_INVITE,2014-06-03 22:08:03,1,1,318,4143.0,False
20,2014-03-06 11:46:38,SIGNUP,2014-05-29 11:46:38,0,0,58,,False
33,2014-03-11 06:29:09,GUEST_INVITE,2014-05-31 06:29:09,0,0,401,79.0,False
42,2012-11-11 19:05:07,SIGNUP,2014-05-25 19:05:07,1,0,235,,False
