# Take-home challenge sample

### One of the take-home challenge samples from Galvanize's interview preparation repository

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Context

#### There are 2 tasks:

- Collect metrics of interest.
- Offer insights for how we could improve CPM.

The first task requires specific metrics collection: find the conversion rate and CPM per campaign within each application. Include all of the code you need to transform and calculate the data.

The second task is more of an open-ended question and involves writing about your methods and reasoning: given the data collected in the first task, what metrics can we calculate to give us insight into how to improve CPM? For this second question, if you don't have enough data or would like additional data, please specify the format of the data (the columns in each file) that you would like to have, and describe the transformations you would use to get the information you need.



## Background

CPM - cost per mille, i.e. the cost per thousand impressions


Campaign - a specific, defined series of activities used in marketing a new or changed product or service

---

## Task 1 - Calculate conversion rate and CPM

In [2]:
engagement = pd.read_csv('https://raw.githubusercontent.com/gSchool/dsi-interview-prep/master/interview_questions/takehomes/takehome1/example_engagements.csv?token=AfcppxqXZaAL5zIr5EjjiNTdiAyin4Odks5cFD4EwA%3D%3D')

In [3]:
offers = pd.read_csv('https://raw.githubusercontent.com/gSchool/dsi-interview-prep/master/interview_questions/takehomes/takehome1/example_offers.csv?token=AfcppzaFmL-AsJKiJoUDUkB7F-lb2FqEks5cFD4pwA%3D%3D')

In [4]:
engagement.head(5)

Unnamed: 0.1,Unnamed: 0,revenue,reward_id,campaign_id,application_id
0,2014-07-26 00:00:29.257095,0.499,53d2ef9d-361c-c0d1-9015-6525c28c8564,18,3
1,2014-07-26 00:00:30.468959,0.149,53d2ef9e-72f3-84bf-a243-78ae58d1626f,4,0
2,2014-07-26 00:00:43.396503,0.149,53d2efab-91fb-ec54-3435-40a502e34e83,4,3
3,2014-07-26 00:01:01.234404,0.149,53d2efbd-8f91-db89-12d3-c373bcde9c30,4,3
4,2014-07-26 00:01:15.100982,0.149,53d2efcb-3e74-a234-f986-938765766950,4,0


In [5]:
engagement.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2441 entries, 0 to 2440
Data columns (total 5 columns):
Unnamed: 0        2441 non-null object
revenue           2441 non-null float64
reward_id         2441 non-null object
campaign_id       2441 non-null int64
application_id    2441 non-null int64
dtypes: float64(1), int64(2), object(2)
memory usage: 95.4+ KB


In [6]:
set(engagement.revenue);

In [7]:
offers.head()

Unnamed: 0.1,Unnamed: 0,reward_id,application_id,campaign_id
0,2014-07-26 00:00:02.995009,53d2ef83-0008-50fd-80b6-022bd353332d,0,0
1,2014-07-26 00:00:03.114537,53d2ef83-1860-7515-2f58-bc73db3b6ce8,1,1
2,2014-07-26 00:00:03.738329,53d2ef83-dc59-4efc-8e6d-1840b994e96d,0,2
3,2014-07-26 00:00:04.333408,53d2ef84-ef12-f2f9-799f-d549f4acf691,1,0
4,2014-07-26 00:00:05.023120,53d2ef85-a900-e839-b0e5-4d07d619fa58,0,0


In [8]:
offer_df = offers.groupby(['application_id', 'campaign_id']).count()

In [9]:
offer_df.head()

application_id,campaign_id,Unnamed: 0,reward_id
0,0,257,257
0,2,269,269
0,4,5140,5140
0,5,161,161
0,7,1898,1898


In [10]:
eng_df = engagement.groupby(['application_id', 'campaign_id']).count()

In [11]:
eng_df.head()

application_id,campaign_id,Unnamed: 0,revenue,reward_id
0,0,4,4,4
0,2,1,1,1
0,4,300,300,300
0,5,4,4,4
0,7,9,9,9


In [12]:
print('So conversion rate is {} for campaign 0, application 0???'.format(4/257))

So conversion rate is 0.01556420233463035 for campaign 0, application 0???
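
The same ratio can be computed for every (application, campaign) pair at once rather than for one hand-picked pair. The following is a minimal sketch, assuming each row of `offers` is one offer shown and each row of `engagement` is one conversion. The small frames `offers_demo` and `engagement_demo` are made-up stand-ins for the real CSVs, and the names `offer_counts`, `eng_counts` and `rates` are illustrative only.

```python
import pandas as pd

# Made-up stand-in data (NOT the real CSVs): one row per offer / per engagement.
offers_demo = pd.DataFrame({
    'application_id': [0, 0, 0, 0, 1, 1],
    'campaign_id':    [0, 0, 0, 0, 2, 2],
    'reward_id':      ['a', 'b', 'c', 'd', 'e', 'f'],
})
engagement_demo = pd.DataFrame({
    'application_id': [0],
    'campaign_id':    [0],
    'reward_id':      ['a'],
    'revenue':        [0.5],
})

keys = ['application_id', 'campaign_id']

# Count rows per (application, campaign) pair in each table.
offer_counts = offers_demo.groupby(keys).size().rename('offers')
eng_counts = engagement_demo.groupby(keys).size().rename('engagements')

# Left join on the offers so pairs with no engagements are kept with a count of 0.
rates = offer_counts.to_frame().join(eng_counts).fillna({'engagements': 0})
rates['conversion_rate'] = rates['engagements'] / rates['offers']
print(rates)
```

Starting from the offer counts and joining the engagement counts onto them keeps campaigns that were shown but never converted, which a plain division of the two grouped frames would turn into missing values.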


Continuing with the same example (campaign 0, application 0), now for **CPM**:

In [13]:
sample_offer = (offers['application_id'] == 0) & (offers['campaign_id'] == 0)
sample_eng = (engagement['application_id'] == 0) & (engagement['campaign_id'] == 0)

In [14]:
cpm_df_eng = engagement[sample_eng]

In [15]:
cpm_df_off = offers[sample_offer]

In [16]:
# Revenue per offer shown; a conventional CPM (per 1,000 impressions) would multiply this by 1000
cpm_sample = cpm_df_eng.revenue.sum() / cpm_df_off.reward_id.count()

In [17]:
cpm_sample

0.01556420233463035
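
The sample calculation divides total revenue by the number of offers for a single pair. Below is a hedged sketch of the same idea applied to every (application, campaign) pair, with the result also scaled by 1000 to match the usual per-thousand definition of CPM. It assumes that `revenue` on an engagement row is the amount to attribute to that campaign and that each offer row counts as one impression; the demo frames are made up and the column names `impressions`, `revenue_per_impression` and `cpm` are illustrative.

```python
import pandas as pd

# Made-up stand-in data (NOT the real CSVs).
offers_demo = pd.DataFrame({
    'application_id': [0, 0, 0, 0, 1, 1],
    'campaign_id':    [0, 0, 0, 0, 2, 2],
    'reward_id':      ['a', 'b', 'c', 'd', 'e', 'f'],
})
engagement_demo = pd.DataFrame({
    'application_id': [0, 0],
    'campaign_id':    [0, 0],
    'reward_id':      ['a', 'b'],
    'revenue':        [0.5, 0.5],
})

keys = ['application_id', 'campaign_id']

# Impressions: number of offers shown per pair. Revenue: summed over engagements.
impressions = offers_demo.groupby(keys).size().rename('impressions')
revenue = engagement_demo.groupby(keys)['revenue'].sum()

# Keep every pair that had offers; pairs with no engagements earn 0 revenue.
cpm = impressions.to_frame().join(revenue).fillna({'revenue': 0.0})
cpm['revenue_per_impression'] = cpm['revenue'] / cpm['impressions']
cpm['cpm'] = 1000 * cpm['revenue_per_impression']  # per thousand impressions
print(cpm)
```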

## Task 2 - Analysis

Having calculated the conversion rate and CPM above, look at which (application, campaign) pairs are the top 10 performers. We can then build graphs to visualize the differences.
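
One way to pick out the top performers is `DataFrame.nlargest`. The sketch below assumes a metrics table indexed by (application, campaign) with `conversion_rate` and `cpm` columns; the table `metrics_demo` and its values are invented for illustration, and it selects the top 2 only because the demo has four rows.

```python
import pandas as pd

# Invented metrics table, indexed by (application_id, campaign_id).
metrics_demo = pd.DataFrame({
    'application_id':  [0, 0, 1, 1],
    'campaign_id':     [0, 4, 0, 7],
    'conversion_rate': [0.02, 0.06, 0.01, 0.04],
    'cpm':             [15.0, 9.0, 30.0, 12.0],
}).set_index(['application_id', 'campaign_id'])

# Rank the pairs separately by each metric; the two rankings need not agree.
top_by_conversion = metrics_demo.nlargest(2, 'conversion_rate')
top_by_cpm = metrics_demo.nlargest(2, 'cpm')

print(top_by_conversion)
print(top_by_cpm)
```

On the real table, passing `10` instead of `2` gives the top 10, and calling `.plot(kind='bar')` on the selected column is one option for the comparison chart.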

In [18]:
#engagement.revenue.plot(kind='hist', bins= 100, figsize = (12,8));

In [19]:
#engagement.groupby('campaign_id').agg({'application_id':'count'}).head();

In [20]:
#offers.groupby('campaign_id').agg({'application_id':'count'}).head()

In [21]:
#offers.head(5)

-----

## pandas join function notes
- `join` matches rows on the index of the other DataFrame
- by default it is a left join
- when `on` is given, it names a column of the left DataFrame, which is matched against the index of the right DataFrame

In [49]:
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']},
                    index=['K0', 'K1', 'K2'])

right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                      'D': ['D0', 'D2', 'D3']},
                     index=['K0', 'K2', 'K3'])

In [50]:
left

Unnamed: 0,A,B
K0,A0,B0
K1,A1,B1
K2,A2,B2


In [51]:
right

Unnamed: 0,C,D
K0,C0,D0
K2,C2,D2
K3,C3,D3


In [52]:
left.join(right)

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2


In [53]:
left.join(right, how = 'outer')

Unnamed: 0,A,B,C,D
K0,A0,B0,C0,D0
K1,A1,B1,,
K2,A2,B2,C2,D2
K3,,,C3,D3


In [54]:
left1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                      'B': ['B0', 'B1', 'B2', 'B3'],
                      'key': ['K0', 'K1', 'K2', 'K1']})

right1 = pd.DataFrame({'C': ['C0', 'C1'],
                       'D': ['D0', 'D1']},
                      index=['K0', 'K1'])

result = left1.join(right1, on='key')

In [55]:
left1

Unnamed: 0,A,B,key
0,A0,B0,K0
1,A1,B1,K1
2,A2,B2,K2
3,A3,B3,K1


In [56]:
right1

Unnamed: 0,C,D
K0,C0,D0
K1,C1,D1


In [57]:
result

Unnamed: 0,A,B,key,C,D
0,A0,B0,K0,C0,D0
1,A1,B1,K1,C1,D1
2,A2,B2,K2,,
3,A3,B3,K1,C1,D1


In [58]:
pd.merge(left1, right1, left_on='key', right_index=True, how='left')

Unnamed: 0,A,B,key,C,D
0,A0,B0,K0,C0,D0
1,A1,B1,K1,C1,D1
2,A2,B2,K2,,
3,A3,B3,K1,C1,D1
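
The last two outputs look identical. As a small check, the sketch below rebuilds the same `left1` and `right1` frames and confirms with `pandas.testing.assert_frame_equal` that `join(..., on='key')` and the equivalent `pd.merge` call agree. The names `joined`, `merged` and `inner` are illustrative; the `how='inner'` variant is added only to show how rows with no match (here `K2`) are dropped.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                      'B': ['B0', 'B1', 'B2', 'B3'],
                      'key': ['K0', 'K1', 'K2', 'K1']})
right1 = pd.DataFrame({'C': ['C0', 'C1'],
                       'D': ['D0', 'D1']},
                      index=['K0', 'K1'])

# join with on='key' matches the left column against the right index ...
joined = left1.join(right1, on='key')
# ... which is the same operation as this explicit merge.
merged = pd.merge(left1, right1, left_on='key', right_index=True, how='left')

# Raises an AssertionError if the two results differ.
assert_frame_equal(joined, merged)

# An inner join keeps only rows whose key exists in right1's index.
inner = left1.join(right1, on='key', how='inner')
print(inner)
```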
