# Introduction

This notebook answers the following questions:  

- For volunteers who have not donated, what is the probability of a gift in the year following their first instance of standard volunteerism?  

- What is the probability of a donation in the first year of volunteerism?

by Fred Etter - December, 2019

In [1]:
# Import modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
from sklearn import linear_model
from sklearn import metrics
from sklearn import preprocessing
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import confusion_matrix
from datetime import datetime

In [2]:
# Read in the data
df = pd.read_csv('file1_12_3.csv', low_memory=False)

In [3]:
# show the first 5 lines of the data
df.head()

Unnamed: 0,ContactId,Year,PledgeTotal,VolType,VolunteerActivityCnt,BirthYear,Gender
0,874ddbce-11cd-e111-941f-00259073dc22,2007,52.0,,,1977.0,female
1,9a4ddbce-11cd-e111-941f-00259073dc22,2007,80.0,,,1968.0,female
2,5e4edbce-11cd-e111-941f-00259073dc22,2007,10000.0,,,1958.0,male
3,c150dbce-11cd-e111-941f-00259073dc22,2007,120.0,,,,male
4,0b53dbce-11cd-e111-941f-00259073dc22,2007,500.16,,,1947.0,female


In [4]:
# show the number of rows and columns of the original data
df.shape

(385722, 7)

In [5]:
# drop Council, Committee, or Board members, then fill missing values with 0
df = df.drop(df[df.VolType == 'Council, Committe or Board'].index).fillna(0)

In [6]:
# create a dataframe that only has volunteers of 'Standard' type:
# VolType == 0 marks rows with no volunteer record (NaN was filled with 0 above);
# Council, Committee, or Board rows were already dropped in the previous cell
df_gender = df.copy()
df_std_only = df_gender.drop(df_gender[df_gender.VolType == 0].index)
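
The same 'Standard only' subset can also be written as a single boolean mask instead of a chain of `drop` calls. The sketch below uses a made-up toy frame (`toy` and its values are illustrative, not taken from the data) and assumes `VolType` only ever holds 'Standard', 'Council, Committe or Board', or a missing value:

```python
import numpy as np
import pandas as pd

# Toy rows with the notebook's column names; the values are made up.
toy = pd.DataFrame({
    "ContactId": ["a", "b", "c", "d"],
    "Year": [2010, 2011, 2011, 2012],
    "PledgeTotal": [0.0, 50.0, 10.0, np.nan],
    "VolType": ["Standard", np.nan, "Council, Committe or Board", "Standard"],
})

# Keep only rows whose VolType is exactly 'Standard', then fill missing values with 0.
toy_std_only = toy[toy["VolType"] == "Standard"].fillna(0)
print(toy_std_only)
```

Selecting on the value to keep, rather than dropping each value to exclude, stays correct if a new `VolType` category ever appears in the data.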

In [7]:
# create a series that captures the earliest year that someone volunteered
a = df_std_only.groupby('ContactId')['Year'].transform('min')

In [8]:
# create a new column and populate it:
df_std_only['min_year'] = a

In [9]:
# display a random sample of 10 rows of the dataframe
df_std_only.sample(10)

Unnamed: 0,ContactId,Year,PledgeTotal,VolType,VolunteerActivityCnt,BirthYear,Gender,min_year
383751,2e4315d2-11cd-e111-941f-00259073dc22,2012,500.0,Standard,1.0,0.0,female,2012
378277,3cada565-0d9e-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012
377541,2c4315d2-11cd-e111-941f-00259073dc22,2011,725.0,Standard,1.0,1950.0,male,2011
380057,97b755d0-11cd-e111-941f-00259073dc22,2010,0.0,Standard,1.0,0.0,female,2010
379152,3540dbce-11cd-e111-941f-00259073dc22,2015,1300.0,Standard,1.0,1948.0,male,2010
385063,b2ea7dc9-5406-e611-828d-26d4160798d6,2016,390.0,Standard,3.0,0.0,male,2016
380352,c2dc8191-a231-e711-b10c-005056975e92,2017,0.0,Standard,1.0,0.0,female,2017
377432,5fbf99cf-11cd-e111-941f-00259073dc22,2012,52.0,Standard,1.0,0.0,female,2012
382691,6f435cd9-7aa7-e211-a0e0-4040184c1c1a,2013,0.0,Standard,1.0,0.0,female,2012
380407,214e9364-55cc-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012


In [10]:
# drop rows where Year does not equal min_year
df_std_only_t = df_std_only.drop(df_std_only[df_std_only.Year != df_std_only.min_year].index)
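
To make the first-year logic concrete, here is a minimal sketch on made-up data (the `toy` frame is illustrative only). `transform('min')` returns one value per row, aligned with the original index, so it can be compared directly against `Year`:

```python
import pandas as pd

# Two contacts with several volunteer years each; the values are made up.
toy = pd.DataFrame({
    "ContactId": ["a", "a", "b", "b", "b"],
    "Year": [2012, 2010, 2015, 2013, 2014],
})

# Broadcast each contact's earliest year back onto every one of their rows.
toy["min_year"] = toy.groupby("ContactId")["Year"].transform("min")

# Keep only the row for each contact's first volunteer year.
first_year_rows = toy[toy["Year"] == toy["min_year"]]
print(first_year_rows)
```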

In [11]:
# drop rows where there was a pledge
df_init = df_std_only_t.drop(df_std_only_t[df_std_only_t.PledgeTotal != 0].index)

In [12]:
# number of rows and columns of the dataframe
df_init.shape

(4089, 8)

In [13]:
# show a random sample of 10 rows of the dataframe
df_init.sample(10)

Unnamed: 0,ContactId,Year,PledgeTotal,VolType,VolunteerActivityCnt,BirthYear,Gender,min_year
379941,443d15d2-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,male,2011
384710,d7db941b-2fec-4148-a344-377cf1e7a10e,2016,0.0,Standard,2.0,0.0,female,2016
377997,5e3d15d2-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011
378635,7f6cf402-8524-e311-95aa-4040184c1c1a,2013,0.0,Standard,1.0,0.0,male,2013
383066,84615108-eb25-e311-a975-4040184c1c1a,2013,0.0,Standard,1.0,0.0,female,2013
376201,f63e15d2-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,male,2011
375947,7565c18a-2d20-e511-9cbe-26d4160798d6,2010,0.0,Standard,1.0,0.0,female,2010
378028,c8cdd312-67d7-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012
381040,e44cb1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011
375987,9937814c-5f24-e311-95aa-4040184c1c1a,2013,0.0,Standard,1.0,0.0,female,2013


The dataframe above shows Standard volunteers who made no pledge in the year they first volunteered. The rest of the notebook treats them as volunteers who have not donated.

In [14]:
# create a new column that adds 1 year to the 'min_year' column
df_init['y2'] = df_init.min_year + 1

In [15]:
# display the first 5 rows of the dataframe
df_init.head()

Unnamed: 0,ContactId,Year,PledgeTotal,VolType,VolunteerActivityCnt,BirthYear,Gender,min_year,y2
375651,4c58f4ca-3545-e511-92f8-26d4160798d6,2015,0.0,Standard,1.0,0.0,0,2015,2016
375652,e34cb1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012
375653,29f9f78a-6681-e411-9b7f-26d4160798d6,2018,0.0,Standard,1.0,1955.0,female,2018,2019
375655,34aaf5d1-d3d1-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012,2013
375658,9350b1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012


In [16]:
# create a new dataframe to capture the pledge amount for the year after they volunteered
new_df = pd.merge(df_init, df, how='left', left_on=['ContactId','y2'], right_on = ['ContactId', 'Year'])

In [17]:
# display the number of rows and columns of dataframe
new_df.shape

(4089, 15)
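
The row count is unchanged because a left merge keeps every row of the left frame, matched or not. A small sketch on made-up data (the frames `first_year` and `history` are hypothetical stand-ins for `df_init` and `df`) shows how unmatched volunteers end up with NaN in the right-hand columns:

```python
import pandas as pd

# Stand-in for df_init: one row per volunteer, with the follow-up year to look up.
first_year = pd.DataFrame({
    "ContactId": ["a", "b", "c"],
    "min_year": [2011, 2012, 2013],
})
first_year["y2"] = first_year["min_year"] + 1

# Stand-in for df: yearly pledge history; contact 'c' has no row for 2014.
history = pd.DataFrame({
    "ContactId": ["a", "b", "b"],
    "Year": [2012, 2012, 2013],
    "PledgeTotal": [52.0, 10.0, 0.0],
})

# Match each volunteer to their own row for the year after they first volunteered.
merged = pd.merge(
    first_year, history, how="left",
    left_on=["ContactId", "y2"], right_on=["ContactId", "Year"],
)
print(merged)
```

This only preserves the row count if the right frame has at most one row per contact per year; duplicates there would duplicate rows in the result.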

In [18]:
# view a random sample of 5 rows
new_df.sample(5)

Unnamed: 0,ContactId,Year_x,PledgeTotal_x,VolType_x,VolunteerActivityCnt_x,BirthYear_x,Gender_x,min_year,y2,Year_y,PledgeTotal_y,VolType_y,VolunteerActivityCnt_y,BirthYear_y,Gender_y
1302,35b062d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012,,,,,,
2596,16b25678-c4ce-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012,2013,2013.0,0.0,Standard,2.0,0.0,female
2726,c25ab1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012,,,,,,
1849,fa2698d2-e3f2-e211-a0e0-4040184c1c1a,2013,0.0,Standard,1.0,1963.0,female,2013,2014,,,,,,
413,cd4c3f6d-fec7-4574-b9e3-62349d523e71,2012,0.0,Standard,1.0,0.0,male,2012,2013,2013.0,52.0,Standard,1.0,0.0,male


In [19]:
# fill NaNs with 0
new_df = new_df.fillna(0)

In [20]:
# drop all rows with 0 in pledge total
new_df = new_df.drop(new_df[new_df.PledgeTotal_y == 0].index)

In [21]:
# new number of rows which is the number of people who donated the year after they volunteered
new_df.shape

(255, 15)

In [22]:
# capture the mean donation
new_df.PledgeTotal_y.mean()

237.19337254901964

In [23]:
# get the probability of those who donated the year after they volunteered
255/4089

0.06236243580337491

So, **6.2%** of volunteers who had not donated went on to donate the following year. In other words, there is a **6.2%** chance that a volunteer who has not donated will make a donation in the year following their first volunteer activity.

The average donation for this group was **237** USD.
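
The rate and the average gift can also be computed from a boolean mask, without dropping rows or typing the counts by hand. This is a sketch on made-up values; the `merged` frame below is a hypothetical stand-in for the merged dataframe before its NaNs are filled:

```python
import numpy as np
import pandas as pd

# NaN means the contact has no row for the following year; the values are made up.
merged = pd.DataFrame({
    "ContactId": ["a", "b", "c", "d", "e"],
    "PledgeTotal_y": [np.nan, 0.0, 52.0, np.nan, 100.0],
})

# True where a nonzero pledge exists in the following year.
donated = merged["PledgeTotal_y"].fillna(0) != 0

rate = donated.mean()  # the mean of a boolean Series is the share of True values
mean_gift = merged.loc[donated, "PledgeTotal_y"].mean()
print(f"{rate:.1%}", mean_gift)  # 40.0% 76.0
```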

### Now check the case 5 years later:

In [24]:
# All of the following steps are the same as above, except that 5 years are added instead of 1
df_init['y5'] = df_init.min_year + 5

In [25]:
new_df_5 = pd.merge(df_init, df, how='left', left_on=['ContactId','y5'], right_on = ['ContactId', 'Year'])

In [26]:
new_df_5.head()

Unnamed: 0,ContactId,Year_x,PledgeTotal_x,VolType_x,VolunteerActivityCnt_x,BirthYear_x,Gender_x,min_year,y2,y5,Year_y,PledgeTotal_y,VolType_y,VolunteerActivityCnt_y,BirthYear_y,Gender_y
0,4c58f4ca-3545-e511-92f8-26d4160798d6,2015,0.0,Standard,1.0,0.0,0,2015,2016,2020,,,,,,
1,e34cb1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012,2016,,,,,,
2,29f9f78a-6681-e411-9b7f-26d4160798d6,2018,0.0,Standard,1.0,1955.0,female,2018,2019,2023,,,,,,
3,34aaf5d1-d3d1-e211-a0e0-4040184c1c1a,2012,0.0,Standard,1.0,0.0,female,2012,2013,2017,,,,,,
4,9350b1d1-11cd-e111-941f-00259073dc22,2011,0.0,Standard,1.0,0.0,female,2011,2012,2016,2016.0,54.6,0.0,0.0,0.0,female


In [27]:
new_df_5.shape

(4089, 16)
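
In the rows shown above, some `y5` values (2020, 2023) are later than this notebook's date of December 2019, so those volunteers cannot have a matching row yet. A possible refinement, which is not part of the original analysis, is to compute the 5-year rate only over volunteers whose fifth year is covered by the data. The sketch below uses made-up rows, and `last_year = 2019` is an assumption about the last year present in the data:

```python
import pandas as pd

# Stand-in for df_init; the values are made up.
first_year = pd.DataFrame({
    "ContactId": ["a", "b", "c"],
    "min_year": [2011, 2015, 2018],
})

k = 5
last_year = 2019  # assumption: the last year covered by the data

# Keep only volunteers whose follow-up year can actually be observed.
observable = first_year[first_year["min_year"] + k <= last_year]
print(len(observable), "of", len(first_year), "volunteers have an observable year", k)
```

Using only the observable volunteers as the denominator would avoid counting people as non-donors simply because their fifth year has not happened yet.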

In [28]:
new_df_5 = new_df_5.fillna(0)

In [29]:
new_df_5 = new_df_5.drop(new_df_5[new_df_5.PledgeTotal_y == 0].index)

In [30]:
new_df_5.shape

(192, 16)

In [31]:
new_df_5.PledgeTotal_y.mean()

267.56671875

In [32]:
# the denominator is the 4089 volunteers in df_init
round(192/4089, 4)

0.047
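
Since the 1-year and 5-year calculations differ only in the offset, they could be wrapped in one helper. `donation_rate_after` below is a hypothetical function, shown on made-up toy data; it assumes the history frame has at most one row per contact per year:

```python
import pandas as pd

def donation_rate_after(first_year, history, k):
    """Return (share of contacts with a nonzero pledge k years after min_year,
    mean of those pledges)."""
    lookup = first_year[["ContactId", "min_year"]].copy()
    lookup["target_year"] = lookup["min_year"] + k
    merged = lookup.merge(
        history[["ContactId", "Year", "PledgeTotal"]],
        how="left",
        left_on=["ContactId", "target_year"],
        right_on=["ContactId", "Year"],
    )
    # A missing row for the target year counts as no pledge.
    pledges = merged["PledgeTotal"].fillna(0)
    donated = pledges != 0
    return float(donated.mean()), float(pledges[donated].mean())

# Made-up stand-ins for df_init and df.
first_year = pd.DataFrame({
    "ContactId": ["a", "b", "c", "d"],
    "min_year": [2011, 2012, 2013, 2010],
})
history = pd.DataFrame({
    "ContactId": ["a", "b", "a", "d"],
    "Year": [2012, 2013, 2016, 2015],
    "PledgeTotal": [52.0, 0.0, 100.0, 20.0],
})

rate_1, mean_1 = donation_rate_after(first_year, history, k=1)
rate_5, mean_5 = donation_rate_after(first_year, history, k=5)
print(rate_1, mean_1)  # 0.25 52.0
print(rate_5, mean_5)  # 0.5 60.0
```

Taking the denominator from the frame itself, rather than typing it, keeps the two rates on the same base.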

# Conclusion:  

Among Standard volunteers who made no gift in their first year of volunteering, **6.2%** gave a gift the following year and **4.7%** gave a gift in the 5th year after they first volunteered.  

The average gift was **237 USD** 1 year later and **268 USD** 5 years later.

# Addendum:

#### Check the case for the first year of volunteerism:

In [43]:
df_std_only_t.shape

(6147, 8)

In [40]:
df_pledge_y0 = df_std_only_t.drop(df_std_only_t[df_std_only_t.PledgeTotal == 0].index)

In [45]:
df_pledge_y0.sample(10)

Unnamed: 0,ContactId,Year,PledgeTotal,VolType,VolunteerActivityCnt,BirthYear,Gender,min_year
377656,d859b7d0-11cd-e111-941f-00259073dc22,2010,72.0,Standard,1.0,0.0,female,2010
383521,a1b862d1-11cd-e111-941f-00259073dc22,2015,752.0,Standard,1.0,1986.0,male,2015
377609,f33ab1d1-11cd-e111-941f-00259073dc22,2010,50.0,Standard,1.0,0.0,male,2010
381220,2f47b1d1-11cd-e111-941f-00259073dc22,2017,26.0,Standard,1.0,1979.0,female,2017
384852,0b33061a-e4ac-e611-8226-26d4160798d6,2017,600.0,Standard,2.0,1965.0,male,2017
377549,3a8703d1-11cd-e111-941f-00259073dc22,2011,1820.0,Standard,1.0,0.0,male,2011
379165,36bb99cf-11cd-e111-941f-00259073dc22,2015,260.0,Standard,1.0,1975.0,female,2015
381630,f0c599cf-11cd-e111-941f-00259073dc22,2013,312.0,Standard,1.0,0.0,female,2013
385564,bcb85611-c533-e611-a946-26d4160798d6,2016,1038.0,Standard,9.0,1954.0,male,2016
381174,6ebe30d0-1481-4b68-a934-22b3f3872b7d,2018,104.0,Standard,1.0,0.0,male,2018


In [42]:
df_pledge_y0.shape

(2058, 8)

In [44]:
2058/6147

0.33479746217667156

So, **33.5%** of the Standard volunteers donate in their first year of volunteerism.
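
As a quick consistency check on the counts reported in this notebook: the first-year rows that are not donors should equal the 4089 non-donor rows used as the base for the 1-year and 5-year rates. The variable names below are illustrative:

```python
first_year_rows = 6147    # shape of df_std_only_t
first_year_donors = 2058  # shape of df_pledge_y0
non_donors = first_year_rows - first_year_donors

# The remainder matches the number of rows in df_init.
assert non_donors == 4089

rate = first_year_donors / first_year_rows
print(f"{rate:.1%}")  # 33.5%
```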