## Task 2 (Unsupervised Learning) - Characterizing Donors and Donation Type

In this task you should **use unsupervised learning algorithms and try to characterize donors (people who really did a donation) and their donation type**. You can use:
* **Association rule mining** to find **associations between the features and the target Donation/DonationTYPE**.
* **Clustering algorithms to find similar groups of donors**. Is it possible to find groups of donors with the same/similar DonationTYPE?
* **Be creative and define your own unsupervised analysis!** What would it be interesting to find out ?

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import OneHotEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
from IPython.display import HTML

### Preprocessing Data for Association Rule Mining

In [2]:
df_clean = pd.read_csv('donors_dataset_clean.csv') 
df_clean.head()

Unnamed: 0,TARGET_B,TARGET_D,MONTHS_SINCE_ORIGIN,DONOR_AGE,IN_HOUSE,SES,CLUSTER_CODE,INCOME_GROUP,MOR_HIT_RATE,MEDIAN_HOME_VALUE,...,LIFETIME_GIFT_RANGE_BIN,LAST_GIFT_AMT_BIN,CARD_PROM_12_BIN,NUMBER_PROM_12_BIN,MONTHS_SINCE_LAST_GIFT_BIN,MONTHS_SINCE_FIRST_GIFT_BIN,FILE_CARD_GIFT_BIN,INCOME_GROUP_BIN,RECENCY_STATUS_96NK_e,DONATION_TYPE
0,1.0,10.0,137.0,79.0,0.0,2.0,45.0,7.0,0.0,334.0,...,"(15.0, 30.0]","(15.0, 30.0]","(3.0, 9.0]","(20.0, 30.0]","(-0.1, 8.0]","(120.0, 260.0]","(10.0, 20.0]","(6.4, 8.337]",4.0,D
1,0.0,0.0,113.0,75.0,0.0,1.0,11.0,5.0,0.0,2388.0,...,"(15.0, 30.0]","(15.0, 30.0]","(9.0, 14.0]","(30.0, 64.0]","(-0.1, 8.0]","(80.0, 120.0]","(10.0, 20.0]","(4.4, 5.4]",4.0,
2,0.0,0.0,92.0,62.695162,0.0,2.0,4.0,6.0,0.0,1688.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(9.0, 14.0]","(30.0, 64.0]","(-0.1, 8.0]","(80.0, 120.0]","(10.0, 20.0]","(5.4, 6.4]",0.0,
3,0.0,0.0,101.0,74.0,0.0,2.0,49.0,2.0,8.0,514.0,...,"(15.0, 30.0]","(15.0, 30.0]","(3.0, 9.0]","(10.0, 20.0]","(16.0, 24.0]","(80.0, 120.0]","(-0.1, 10.0]","(1.4, 2.4]",0.0,
4,0.0,0.0,101.0,63.0,0.0,3.0,8.0,3.0,0.0,452.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(3.0, 9.0]","(10.0, 20.0]","(16.0, 24.0]","(80.0, 120.0]","(-0.1, 10.0]","(2.4, 3.4]",0.0,


We need to find rules for the type of donation. Thus we will use a subset of the dataset with DONORS ONLY. We will not consider the columns: TARGET_B nor TARGET_D.

In [3]:
df_donors = df_clean.drop(df_clean[df_clean['DONATION_TYPE'].isnull()].index)
df_donors.drop(columns='TARGET_B', inplace=True)
df_donors.head()

Unnamed: 0,TARGET_D,MONTHS_SINCE_ORIGIN,DONOR_AGE,IN_HOUSE,SES,CLUSTER_CODE,INCOME_GROUP,MOR_HIT_RATE,MEDIAN_HOME_VALUE,MEDIAN_HOUSEHOLD_INCOME,...,LIFETIME_GIFT_RANGE_BIN,LAST_GIFT_AMT_BIN,CARD_PROM_12_BIN,NUMBER_PROM_12_BIN,MONTHS_SINCE_LAST_GIFT_BIN,MONTHS_SINCE_FIRST_GIFT_BIN,FILE_CARD_GIFT_BIN,INCOME_GROUP_BIN,RECENCY_STATUS_96NK_e,DONATION_TYPE
0,10.0,137.0,79.0,0.0,2.0,45.0,7.0,0.0,334.0,212.0,...,"(15.0, 30.0]","(15.0, 30.0]","(3.0, 9.0]","(20.0, 30.0]","(-0.1, 8.0]","(120.0, 260.0]","(10.0, 20.0]","(6.4, 8.337]",4.0,D
6,5.0,89.0,79.0,0.0,2.0,28.0,1.0,0.0,1004.0,189.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(3.0, 9.0]","(20.0, 30.0]","(8.0, 16.0]","(80.0, 120.0]","(-0.1, 10.0]","(-0.1, 1.4]",0.0,E
8,16.0,101.0,63.0,0.0,2.0,43.0,4.0,0.0,399.0,307.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(9.0, 14.0]","(20.0, 30.0]","(8.0, 16.0]","(80.0, 120.0]","(10.0, 20.0]","(3.4, 4.4]",4.0,C
13,3.0,137.0,62.472055,0.0,2.0,43.0,3.346812,0.0,475.0,227.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(3.0, 9.0]","(20.0, 30.0]","(16.0, 24.0]","(120.0, 260.0]","(10.0, 20.0]","(2.4, 3.4]",4.0,E
14,12.0,77.0,81.0,0.0,2.0,45.0,4.0,24.0,530.0,236.0,...,"(-0.1, 15.0]","(-0.1, 15.0]","(9.0, 14.0]","(20.0, 30.0]","(-0.1, 8.0]","(80.0, 120.0]","(-0.1, 10.0]","(3.4, 4.4]",0.0,D


In [4]:
# Transform these columns into categorical data
df_donation = pd.get_dummies(df_donors, columns = ['IN_HOUSE','SES','CLUSTER_CODE','RECENT_STAR_STATUS','PEP_STAR','FREQUENCY_STATUS_97NK','URBANICITY_e', 'DONOR_GENDER_e', 'OVERLAY_SOURCE_e',
       'RECENCY_STATUS_96NK_e', 'DONATION_TYPE', 'DONOR_AGE_BIN', 'MONTHS_SINCE_ORIGIN_BIN', 'MOR_HIT_RATE_BIN', 'MEDIAN_HOME_VALUE_BIN', 'MEDIAN_HOME_VALUE_BIN', 'MEDIAN_HOUSEHOLD_INCOME_BIN', 
             'PCT_OWNER_OCCUPIED_BIN', 'PER_CAPITA_INCOME_BIN','PCT_ATTRIBUTE1_BIN','PCT_ATTRIBUTE2_BIN', 'PCT_ATTRIBUTE3_BIN', 'PCT_ATTRIBUTE4_BIN', 'INCOME_GROUP_BIN',
                  'RECENT_RESPONSE_PROP_BIN', 'RECENT_AVG_GIFT_AMT_BIN', 'RECENT_CARD_RESPONSE_PROP_BIN', 'RECENT_AVG_CARD_GIFT_AMT_BIN', 'RECENT_RESPONSE_COUNT_BIN','RECENT_CARD_RESPONSE_COUNT_BIN', 'MONTHS_SINCE_LAST_PROM_RESP_BIN', 'LIFETIME_CARD_PROM_BIN',
                        'LIFETIME_PROM_BIN', 'LIFETIME_GIFT_AMOUNT_BIN', 'LIFETIME_GIFT_COUNT_BIN', 'LIFETIME_AVG_GIFT_AMT_BIN', 'LIFETIME_GIFT_RANGE_BIN', 'LAST_GIFT_AMT_BIN', 'CARD_PROM_12_BIN', 'NUMBER_PROM_12_BIN', 'MONTHS_SINCE_LAST_GIFT_BIN', 'MONTHS_SINCE_FIRST_GIFT_BIN', 'FILE_CARD_GIFT_BIN' ])

In [5]:
# Final dataset without non-categorical data
df_enc = df_donation.drop(columns = ['TARGET_D', 'MONTHS_SINCE_ORIGIN', 'DONOR_AGE', 'MOR_HIT_RATE', 'MEDIAN_HOME_VALUE', 'MEDIAN_HOUSEHOLD_INCOME', 'PCT_OWNER_OCCUPIED', 'PER_CAPITA_INCOME', 'INCOME_GROUP',
                            'PCT_ATTRIBUTE1', 'PCT_ATTRIBUTE2', 'PCT_ATTRIBUTE3', 'PCT_ATTRIBUTE4', 'RECENT_RESPONSE_PROP', 'RECENT_AVG_GIFT_AMT', 'RECENT_CARD_RESPONSE_PROP', 'RECENT_AVG_CARD_GIFT_AMT',
                            'RECENT_RESPONSE_COUNT', 'RECENT_CARD_RESPONSE_COUNT', 'MONTHS_SINCE_LAST_PROM_RESP', 'LIFETIME_CARD_PROM', 'LIFETIME_PROM', 'LIFETIME_GIFT_AMOUNT', 'LIFETIME_GIFT_COUNT', 
                            'LIFETIME_AVG_GIFT_AMT', 'LIFETIME_GIFT_RANGE', 'LAST_GIFT_AMT', 'CARD_PROM_12', 'NUMBER_PROM_12', 'MONTHS_SINCE_LAST_GIFT', 'MONTHS_SINCE_FIRST_GIFT', 'FILE_CARD_GIFT',
                            'LIFETIME_CARD_PROM', 'LIFETIME_GIFT_RANGE', 'MONTHS_SINCE_FIRST_GIFT', 'MONTHS_SINCE_LAST_PROM_RESP', 'MONTHS_SINCE_ORIGIN', 'NUMBER_PROM_12'])

In [6]:
df_enc.head()

Unnamed: 0,IN_HOUSE_0.0,IN_HOUSE_1.0,SES_1.0,SES_2.0,SES_3.0,SES_4.0,CLUSTER_CODE_1.0,CLUSTER_CODE_2.0,CLUSTER_CODE_3.0,CLUSTER_CODE_4.0,...,"MONTHS_SINCE_LAST_GIFT_BIN_(16.0, 24.0]","MONTHS_SINCE_LAST_GIFT_BIN_(24.0, 27.0]","MONTHS_SINCE_LAST_GIFT_BIN_(8.0, 16.0]","MONTHS_SINCE_FIRST_GIFT_BIN_(-0.1, 40.0]","MONTHS_SINCE_FIRST_GIFT_BIN_(120.0, 260.0]","MONTHS_SINCE_FIRST_GIFT_BIN_(40.0, 80.0]","MONTHS_SINCE_FIRST_GIFT_BIN_(80.0, 120.0]","FILE_CARD_GIFT_BIN_(-0.1, 10.0]","FILE_CARD_GIFT_BIN_(10.0, 20.0]","FILE_CARD_GIFT_BIN_(20.0, 30.0]"
0,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,1,0
6,1,0,0,1,0,0,0,0,0,0,...,0,0,1,0,0,0,1,1,0,0
8,1,0,0,1,0,0,0,0,0,0,...,0,0,1,0,0,0,1,0,1,0
13,1,0,0,1,0,0,0,0,0,0,...,1,0,0,0,1,0,0,0,1,0
14,1,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,0,0


To find frequent items for each donation type we can create one subset for each. 

In [7]:
# Donation type: A
type_a = df_enc.loc[df_enc['DONATION_TYPE_A'] == 1]
# Donation type: B
type_b = df_enc.loc[df_enc['DONATION_TYPE_B'] == 1]
# Donation type: C
type_c = df_enc.loc[df_enc['DONATION_TYPE_C'] == 1]
# Donation type: D
type_d = df_enc.loc[df_enc['DONATION_TYPE_D'] == 1]
# Donation type: E
type_e = df_enc.loc[df_enc['DONATION_TYPE_E'] == 1]

### Finding Frequent Items

#### Donation Type: A

In [8]:
# Finding frequent items for Donation type A
frequent_itemsets_a = apriori(type_a, min_support=0.6, use_colnames=True)
## We can filter the data by adding a length
frequent_itemsets_a['length'] = frequent_itemsets_a['itemsets'].apply(lambda x: len(x))
frequent_itemsets_a = frequent_itemsets_a[frequent_itemsets_a['length'] >= 3]
frequent_itemsets_a

Unnamed: 0,support,itemsets,length
172,0.677083,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, DONATIO...",3
173,0.677083,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, MOR_HIT...",3
174,0.666667,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, PCT_ATT...",3
175,0.645833,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, RECENT_...",3
176,0.645833,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, RECENT_...",3
...,...,...,...
5402,0.604167,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",10
5403,0.614583,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",10
5404,0.604167,"(IN_HOUSE_0.0, RECENT_AVG_GIFT_AMT_BIN_(18.0, ...",10
5405,0.604167,"(IN_HOUSE_0.0, RECENT_AVG_GIFT_AMT_BIN_(18.0, ...",10


#### Donation Type: B

In [9]:
# Finding frequent items for Donation type B
frequent_itemsets_b = apriori(type_b, min_support=0.6, use_colnames=True)
## We can filter the data by adding a length
frequent_itemsets_b['length'] = frequent_itemsets_b['itemsets'].apply(lambda x: len(x))
frequent_itemsets_b = frequent_itemsets_b[frequent_itemsets_b['length'] >= 3]
frequent_itemsets_b

Unnamed: 0,support,itemsets,length
173,0.758077,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, DONATIO...",3
174,0.752561,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, MOR_HIT...",3
175,0.756501,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, PCT_ATT...",3
176,0.697400,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, RECENT_...",3
177,0.724192,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0, RECENT_...",3
...,...,...,...
3142,0.642238,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",9
3143,0.637510,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",9
3144,0.732861,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",9
3145,0.667455,"(RECENT_RESPONSE_COUNT_BIN_(-0.1, 4.0], DONATI...",9


#### Donation Type: C

In [10]:
# Finding frequent items for Donation type C
frequent_itemsets_c = apriori(type_c, min_support=0.6, use_colnames=True)
## We can filter the data by adding a length
frequent_itemsets_c['length'] = frequent_itemsets_c['itemsets'].apply(lambda x: len(x))
frequent_itemsets_c = frequent_itemsets_c[frequent_itemsets_c['length'] >= 2]
frequent_itemsets_c

Unnamed: 0,support,itemsets,length
24,0.643836,"(RECENT_STAR_STATUS_0.0, IN_HOUSE_0.0)",2
25,0.610731,"(IN_HOUSE_0.0, RECENCY_STATUS_96NK_e_0.0)",2
26,0.916667,"(IN_HOUSE_0.0, DONATION_TYPE_C)",2
27,0.900685,"(IN_HOUSE_0.0, MOR_HIT_RATE_BIN_(-0.1, 30.0])",2
28,0.673516,"(MEDIAN_HOUSEHOLD_INCOME_BIN_(200.0, 500.0], I...",2
...,...,...,...
2210,0.622146,"(RECENT_RESPONSE_COUNT_BIN_(-0.1, 4.0], DONATI...",8
2211,0.644977,"(RECENT_RESPONSE_COUNT_BIN_(-0.1, 4.0], DONATI...",8
2212,0.609589,"(DONATION_TYPE_C, LIFETIME_GIFT_RANGE_BIN_(-0....",8
2213,0.617580,"(RECENT_RESPONSE_COUNT_BIN_(-0.1, 4.0], DONATI...",8


#### Donation Type: D

In [11]:
# Finding frequent items for Donation type D
frequent_itemsets_d = apriori(type_d, min_support=0.6, use_colnames=True)
## We can filter the data by adding a length
frequent_itemsets_d['length'] = frequent_itemsets_d['itemsets'].apply(lambda x: len(x))
frequent_itemsets_d = frequent_itemsets_d[frequent_itemsets_d['length'] >= 2]
frequent_itemsets_d

Unnamed: 0,support,itemsets,length
21,0.614978,"(IN_HOUSE_0.0, PEP_STAR_1.0)",2
22,0.892511,"(IN_HOUSE_0.0, DONATION_TYPE_D)",2
23,0.887225,"(IN_HOUSE_0.0, MOR_HIT_RATE_BIN_(-0.1, 30.0])",2
24,0.637885,"(MEDIAN_HOUSEHOLD_INCOME_BIN_(200.0, 500.0], I...",2
25,0.884581,"(IN_HOUSE_0.0, PCT_ATTRIBUTE1_BIN_(-0.1, 25.0])",2
...,...,...,...
1172,0.608811,"(DONATION_TYPE_D, LIFETIME_GIFT_RANGE_BIN_(-0....",7
1173,0.656388,"(DONATION_TYPE_D, LIFETIME_GIFT_RANGE_BIN_(-0....",7
1174,0.607930,"(LAST_GIFT_AMT_BIN_(-0.1, 15.0], DONATION_TYPE...",7
1175,0.622026,"(IN_HOUSE_0.0, LAST_GIFT_AMT_BIN_(-0.1, 15.0],...",8


#### Donation Type: E

In [12]:
# Finding frequent items for Donation type E
frequent_itemsets_e = apriori(type_e, min_support=0.6, use_colnames=True)
## We can filter the data by adding a length
frequent_itemsets_e['length'] = frequent_itemsets_e['itemsets'].apply(lambda x: len(x))
frequent_itemsets_e = frequent_itemsets_e[frequent_itemsets_e['length'] >= 2]
frequent_itemsets_e

Unnamed: 0,support,itemsets,length
20,0.732615,"(IN_HOUSE_0.0, PEP_STAR_1.0)",2
21,0.929481,"(IN_HOUSE_0.0, DONATION_TYPE_E)",2
22,0.913810,"(IN_HOUSE_0.0, MOR_HIT_RATE_BIN_(-0.1, 30.0])",2
23,0.656219,"(IN_HOUSE_0.0, MEDIAN_HOME_VALUE_BIN_(-0.1, 10...",2
24,0.656219,"(IN_HOUSE_0.0, MEDIAN_HOME_VALUE_BIN_(-0.1, 10...",2
...,...,...,...
2688,0.615083,"(LAST_GIFT_AMT_BIN_(-0.1, 15.0], LIFETIME_GIFT...",8
2689,0.604310,"(IN_HOUSE_0.0, PEP_STAR_1.0, LAST_GIFT_AMT_BIN...",9
2690,0.643487,"(IN_HOUSE_0.0, LAST_GIFT_AMT_BIN_(-0.1, 15.0],...",9
2691,0.628795,"(IN_HOUSE_0.0, LAST_GIFT_AMT_BIN_(-0.1, 15.0],...",9


### Finding Associations

In [13]:
# Finding frequent items for complete subset
frequent_itemsets = apriori(df_enc, min_support=0.3, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets = frequent_itemsets[frequent_itemsets['length'] >= 3]
frequent_itemsets

Unnamed: 0,support,itemsets,length
661,0.303844,"(IN_HOUSE_0.0, SES_1.0, MOR_HIT_RATE_BIN_(-0.1...",3
662,0.307028,"(IN_HOUSE_0.0, PCT_ATTRIBUTE1_BIN_(-0.1, 25.0]...",3
663,0.444621,"(SES_2.0, IN_HOUSE_0.0, MOR_HIT_RATE_BIN_(-0.1...",3
664,0.357744,"(SES_2.0, IN_HOUSE_0.0, MEDIAN_HOME_VALUE_BIN_...",3
665,0.357744,"(SES_2.0, IN_HOUSE_0.0, MEDIAN_HOME_VALUE_BIN_...",3
...,...,...,...
34897,0.315897,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",9
34898,0.306573,"(IN_HOUSE_0.0, LIFETIME_GIFT_RANGE_BIN_(-0.1, ...",9
34899,0.312941,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",10
34900,0.310439,"(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",10


In [14]:
# Generate association rules with confidence >= 80% for the dataset
frequent_itemsets = apriori(df_enc, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)
rules['length']= rules['antecedents'].apply(lambda x: len(x))
rules['Consequents']= rules['consequents'].apply(lambda x: ','.join(list(x))).astype("unicode")
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,length,Consequents
0,(SES_2.0),(IN_HOUSE_0.0),0.491926,0.914715,0.450534,0.915858,1.001250,0.000562,1.013584,1,IN_HOUSE_0.0
1,(RECENT_STAR_STATUS_0.0),(IN_HOUSE_0.0),0.622242,0.914715,0.581760,0.934942,1.022113,0.012586,1.310904,1,IN_HOUSE_0.0
2,(PEP_STAR_1.0),(IN_HOUSE_0.0),0.591767,0.914715,0.526723,0.890085,0.973074,-0.014575,0.775918,1,IN_HOUSE_0.0
3,(DONOR_GENDER_e_0.0),(IN_HOUSE_0.0),0.571071,0.914715,0.526495,0.921943,1.007903,0.004128,1.092611,1,IN_HOUSE_0.0
4,(OVERLAY_SOURCE_e_0.0),(IN_HOUSE_0.0),0.480327,0.914715,0.437116,0.910038,0.994887,-0.002246,0.948015,1,IN_HOUSE_0.0
...,...,...,...,...,...,...,...,...,...,...,...
61735,"(LIFETIME_GIFT_RANGE_BIN_(-0.1, 15.0], FILE_CA...","(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",0.496930,0.597908,0.411189,0.827460,1.383926,0.114071,2.330429,4,"IN_HOUSE_0.0,RECENT_RESPONSE_COUNT_BIN_(-0.1, ..."
61736,"(LIFETIME_GIFT_RANGE_BIN_(-0.1, 15.0], LIFETIM...","(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",0.507846,0.595861,0.411189,0.809673,1.358829,0.108584,2.123395,4,"IN_HOUSE_0.0,RECENT_RESPONSE_COUNT_BIN_(-0.1, ..."
61737,"(FILE_CARD_GIFT_BIN_(-0.1, 10.0], LIFETIME_GIF...","(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",0.500114,0.604048,0.411189,0.822192,1.361136,0.109097,2.226849,4,"IN_HOUSE_0.0,RECENT_RESPONSE_COUNT_BIN_(-0.1, ..."
61738,"(LIFETIME_GIFT_RANGE_BIN_(-0.1, 15.0], FILE_CA...","(IN_HOUSE_0.0, RECENT_RESPONSE_COUNT_BIN_(-0.1...",0.471003,0.642029,0.411189,0.873008,1.359765,0.108792,2.818854,4,"IN_HOUSE_0.0,RECENT_RESPONSE_COUNT_BIN_(-0.1, ..."


In [15]:
rules[(rules['length']>=2) & (rules['Consequents'].str.contains('DONATION_TYPE', regex=False))]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,length,Consequents


### Association Rules - Results and Discussion 

This time we decided to leave the minimum support threshold at 0.4, and minimum threshold at 0.8, to observe the border rules. We can see that not achieveing STAR donor status is related to the individual never donating to the charitable organizations' In House program. But we can also see that achieveing STAR donor status about half of the times is an antecendent to the individual never donating to the charitable organization's In House program. These were only a couple of rules that seemed to give an interesting insight on the data. However we were interested in creating rules involving Donation Type.
But rules with minimum threshold over or equal to 0.4 do not cover any of the donation types. This is most likely due to the small sample size of donation types.
We tried to diminuish the minimum support in order to find rules covering DONATION_TYPE. Rules formed with minimal support under 0.4 cannot be computed by our computers as they do not have enough power. This phenomenon exemplifies the amount of resources machine learning can demande.