In [1]:
import numpy as np
import pandas as pd

# Prompt: 

### Defining an "adopted user" as a user who has logged into the product on three separate days in at least one seven-day period, 
### identify which factors predict future user adoption.

Strategy:

1. Get the data
2. Segment adopted users using the business definition
3. Use a correlation matrix to see which features carry the most signal for adoption
4. If needed, engineer new features to strengthen the signal for adoption
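
As a rough illustration of step 3, the sketch below one-hot encodes a categorical column and ranks features by their correlation with the adopted flag. The toy values are made up for illustration and are not taken from the real tables; only the column names mirror the dataset.

```python
import pandas as pd

# Toy frame: column names mirror the users table, values are invented.
toy = pd.DataFrame({
    "creation_source": ["ORG_INVITE", "SIGNUP", "GUEST_INVITE",
                        "SIGNUP", "ORG_INVITE", "GUEST_INVITE"],
    "opted_in_to_mailing_list": [1, 0, 1, 0, 1, 0],
    "enabled_for_marketing_drip": [0, 0, 1, 0, 1, 0],
    "adopted": [1, 0, 1, 0, 0, 1],
})

# One-hot encode the categorical column so it can enter the correlation matrix.
features = pd.get_dummies(toy, columns=["creation_source"], dtype=int)

# Correlation of every feature with the target, strongest positive first.
corr_with_adopted = (
    features.corr()["adopted"].drop("adopted").sort_values(ascending=False)
)
print(corr_with_adopted)
```

Correlation only captures linear, pairwise association, so a weak value here does not rule out a feature that matters in combination with others.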

# Data

## Dictionary:
Users table with data on 12,000 users who signed up for the product in the last two years.  This table includes: 
* name: the user's name
* object_id: the user's id
* email: email address
* creation_source: how their account was created. This takes on one of 5 values:
    * PERSONAL_PROJECTS: invited to join another user's personal workspace
    * GUEST_INVITE: invited to an organization as a guest (limited permissions)
    * ORG_INVITE: invited to an organization (as a full member)
    * SIGNUP: signed up via asana.com
    * SIGNUP_GOOGLE_AUTH: signed up using Google Authentication (using a Google email account for their login id)
* creation_time: when they created their account
* last_session_creation_time: unix timestamp of last login
* opted_in_to_mailing_list: whether they have opted into receiving marketing emails
* enabled_for_marketing_drip: whether they are on the regular marketing email drip
* org_id: the organization (group of users) they belong to
* invited_by_user_id: which user invited them to join (if applicable)

A usage summary table (user_engagement) with one row for each day that a user logged into the product.
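
Because last_session_creation_time is stored as a Unix timestamp while creation_time is a datetime string, it can help to convert both to the same type before comparing them. A minimal sketch, assuming the timestamps are in seconds (the sample values are the rounded ones shown in the users preview):

```python
import pandas as pd

# Two values from the users preview plus a missing one (user never logged in).
last_session = pd.Series([1398139000.0, 1396238000.0, float("nan")])

# unit="s" interprets the numbers as seconds since the Unix epoch (UTC).
last_session_dt = pd.to_datetime(last_session, unit="s")
print(last_session_dt)
```

Missing values come through as NaT, so users without a recorded session remain distinguishable after the conversion.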

In [2]:
users = pd.read_csv('/kaggle/input/relax-datachallenge/takehome_users.csv', encoding='latin_1')
user_engagement = pd.read_csv('/kaggle/input/relax-datachallenge/takehome_user_engagement.csv', encoding='latin_1')

In [3]:
users.head()

Unnamed: 0,object_id,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id
0,1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,1398139000.0,1,0,11,10803.0
1,2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,1396238000.0,0,0,1,316.0
2,3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,1363735000.0,0,0,94,1525.0
3,4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,1369210000.0,0,0,1,5151.0
4,5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,1358850000.0,0,0,193,5240.0


In [4]:
user_engagement.head()

Unnamed: 0,time_stamp,user_id,visited
0,2014-04-22 03:53:30,1,1
1,2013-11-15 03:45:04,2,1
2,2013-11-29 03:45:04,2,1
3,2013-12-09 03:45:04,2,1
4,2013-12-25 03:45:04,2,1


1. Get the week for each timestamp
2. Count logins per week for each user_id
3. Flag users with >= 3 logins in any single week as adopted
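
The three steps above can be sketched in pandas as follows. This is not the query that was actually run (that was done in Postgres and is not shown here); the toy data and the choice of Monday-start calendar weeks are assumptions for illustration.

```python
import pandas as pd

# Toy engagement log with the same columns as user_engagement (values invented).
engagement = pd.DataFrame({
    "time_stamp": pd.to_datetime([
        "2014-01-06", "2014-01-07", "2014-01-09",   # user 1: three days in one week
        "2014-01-06", "2014-01-20", "2014-02-03",   # user 2: one day per week
    ]),
    "user_id": [1, 1, 1, 2, 2, 2],
    "visited": [1, 1, 1, 1, 1, 1],
})

# 1. Calendar day and week bucket (weeks start on Monday) for each timestamp.
engagement["day"] = engagement["time_stamp"].dt.normalize()
engagement["week"] = engagement["time_stamp"].dt.to_period("W").dt.start_time

# 2. Count distinct login days per user per week.
days_per_week = engagement.groupby(["user_id", "week"])["day"].nunique()

# 3. Users with at least three login days in any single week.
mask = days_per_week >= 3
adopted_ids = sorted(
    days_per_week[mask].index.get_level_values("user_id").unique().tolist()
)
print(adopted_ids)
```

Fixed calendar weeks are an approximation of "any seven-day period": three logins on Saturday, Sunday, and Monday fall into two different week buckets and would not be counted.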

Loaded the users and user engagement tables into Postgres, derived the adopted users from the business definition, and joined the result back to users. The resulting table holds the user information plus a flag for whether each user adopted the product.
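
For comparison, a stricter reading of the business definition uses a sliding window rather than calendar weeks. The sketch below is a hypothetical pandas alternative, not the Postgres logic used for this notebook; `find_adopted_users` is an illustrative helper name, and it assumes "seven-day period" means the first and third login days are at most 6 days apart.

```python
import pandas as pd

def find_adopted_users(engagement: pd.DataFrame) -> set:
    """Return user_ids with 3 distinct login days inside some 7-day window."""
    days = (
        engagement.assign(day=pd.to_datetime(engagement["time_stamp"]).dt.normalize())
        .drop_duplicates(["user_id", "day"])
        .sort_values(["user_id", "day"])
        .reset_index(drop=True)
    )
    # For each login day, the login day two visits later for the same user.
    third_day = days.groupby("user_id")["day"].shift(-2)
    # Three days fit in a 7-day window when first-to-third spans at most 6 days.
    in_window = (third_day - days["day"]) <= pd.Timedelta(days=6)
    return set(days.loc[in_window, "user_id"])

# Toy data (invented): user 1 straddles a week boundary, user 2 is too spread out.
toy = pd.DataFrame({
    "time_stamp": ["2014-01-10", "2014-01-12", "2014-01-13",
                   "2014-01-01", "2014-01-05", "2014-01-08"],
    "user_id": [1, 1, 1, 2, 2, 2],
})
print(find_adopted_users(toy))
```

User 1 here logs in on a Friday, Sunday, and Monday, a pattern that calendar-week bucketing would miss but the sliding window catches.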

In [5]:
adopted_users = pd.read_csv('/kaggle/input/adopted-users3/adopted_users.csv')
adopted_users.head()

Unnamed: 0,user_id,creation_time,creation_time_date,creation_time_year,creation_time_month,creation_time_dow,name,email,creation_source,last_session_creation_time,last_session_creation_time_year,last_session_creation_time_month,last_session_creation_time_dow,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,adopted
0,2.0,2013-11-15 03:45:04,2013-11-15,2013,11,5,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,2014-03-30 20:45:04-07,2014.0,3.0,7.0,0,0,1,316.0,1
1,10.0,2013-01-16 22:08:03,2013-01-16,2013,1,3,Santos Carla,CarlaFerreiraSantos@gustr.com,ORG_INVITE,2014-06-03 15:08:03-07,2014.0,6.0,2.0,1,1,318,4143.0,1
2,20.0,2014-03-06 11:46:38,2014-03-06,2014,3,4,Helms Mikayla,lqyvjilf@uhzdq.com,SIGNUP,2014-05-29 04:46:38-07,2014.0,5.0,4.0,0,0,58,,1
3,33.0,2014-03-11 06:29:09,2014-03-11,2014,3,2,Araujo Josй,JoseMartinsAraujo@cuvox.de,GUEST_INVITE,2014-05-30 23:29:09-07,2014.0,5.0,5.0,0,0,401,79.0,1
4,42.0,2012-11-11 19:05:07,2012-11-11,2012,11,7,Pinto Giovanna,GiovannaCunhaPinto@cuvox.de,SIGNUP,2014-05-25 12:05:07-07,2014.0,5.0,7.0,1,0,235,,1


To answer the question of which factors predict future user adoption, we can look at the data provided and also consider what other data we could gather to reveal common customer actions that lead to adoption.

Look at common engagement metrics for a SaaS company.

From this dataset:
1. Stickiness
1. Virality
1. How they got to the platform
1. Do adopted users come from the same org or marketing drip?
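
Two of these signals can be derived directly from the users table. The sketch below uses invented toy values and makes two assumptions that are not from the original analysis: virality is approximated by the number of other users each user invited, and the org question is approximated by the share of adopted users per org_id.

```python
import pandas as pd

# Toy users table: column names mirror the dataset, values are invented.
users_toy = pd.DataFrame({
    "object_id": [1, 2, 3, 4, 5],
    "org_id": [10, 10, 10, 20, 20],
    "invited_by_user_id": [None, 1.0, 1.0, None, 4.0],
    "adopted": [1, 1, 0, 0, 0],
})

# Virality proxy (assumption): how many other users each user invited.
invite_counts = (
    users_toy["invited_by_user_id"].dropna().astype(int).value_counts()
)
users_toy["invites_sent"] = (
    users_toy["object_id"].map(invite_counts).fillna(0).astype(int)
)

# Org-level signal (assumption): share of adopted users in each organization.
org_adoption_rate = users_toy.groupby("org_id")["adopted"].mean()

print(users_toy[["object_id", "invites_sent"]])
print(org_adoption_rate)
```

Both columns can then be added to the feature set and compared against the adopted flag alongside creation_source and the marketing flags.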

Outside of this dataset:
1. What they are doing in the Asana app
1. Lingering time, with a threshold for what counts as good or bad lingering time
1. Which specific features are used more than others
1. Features they use exclusively and features they never use
1. Whether they returned on their own or via a push notification/email


# Exploratory Data Analysis 