<h1>Speed Dating: Who to Date Long Term</h1>

What influences love at first sight? (Or, at least, love in the first four minutes?) This dataset was compiled by Columbia Business School professors Ray Fisman and Sheena Iyengar for their paper Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment.<br>

Data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.<br>

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.<br>

For more analysis from Iyengar and Fisman, read Racial Preferences in Dating.<br>

Data Exploration Ideas<br>

What are the least desirable attributes in a male partner? Does this differ for female partners?<br>
How important do people think attractiveness is in potential mate selection vs. its real impact?<br>
Are shared interests more important than a shared racial background?<br>
Can people accurately predict their own perceived value in the dating market?<br>
In terms of getting a second date, is it better to be someone's first speed date of the night or their last?

In [1]:
import pandas as pd
import numpy as np
import sklearn 
from IPython.display import display
%matplotlib inline
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('pandas version is {}.'.format(pd.__version__))
print('numpy version is {}.'.format(np.__version__))
print('scikit-learn version is {}.'.format(sklearn.__version__))

pandas version is 0.18.0.
numpy version is 1.10.4.
scikit-learn version is 0.17.1.


In [2]:
data = pd.read_csv("Speed Dating Data.csv")
print "This set has {} data points and {} features.".format(*data.shape)

This set has 8378 data points and 195 features.


<h1>Data Exploration</h1>

In [3]:
import features_creator as fc #importing feature names made in file features_creator.py
for i in fc.clean_up_2:
    data.replace(to_replace = 12.0, value = 10.0, inplace = True)

<h3>Import locale to change Income and Tuition to int from string type</h3>

<h2>Unique Profiles</h2>

In [4]:
unique = data.copy()
unique.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
for i in fc.list_of_lists:
    unique[i] = (unique[i] - unique[i].min()) / (unique[i].max() - unique[i].min())

<h3>Stats and Frequency Charts for Females</h3>

In [5]:
fc.dating_attributes_vs_time_describe(data = unique, gender = 0)

Unnamed: 0,attr3_1,attr3_2,attr3_3
count,268.0,238.0,146.0
mean,0.655317,0.657563,0.66524
std,0.166766,0.16478,0.17976
min,0.0,0.0,0.0
25%,0.625,0.625,0.625
50%,0.625,0.625,0.625
75%,0.75,0.75,0.75
max,1.0,1.0,1.0


Unnamed: 0,fun4_1,fun4_2,fun4_3
count,206.0,180.0,106.0
mean,0.334844,0.288778,0.455346
std,0.154951,0.12697,0.223767
min,0.0,0.0,0.0
25%,0.222222,0.18,0.3
50%,0.333333,0.3,0.333333
75%,0.444444,0.4,0.666667
max,1.0,0.8,1.0


Unnamed: 0,attr2_1,attr2_2,attr2_3
count,269.0,180.0,106.0
mean,0.356006,0.387571,0.299874
std,0.168152,0.176518,0.24164
min,0.1,0.117647,0.026667
25%,0.2368,0.244118,0.066667
50%,0.3,0.352941,0.266667
75%,0.45,0.470588,0.466667
max,1.0,1.0,1.0


Unnamed: 0,intel2_1,intel2_2,intel2_3
count,269.0,180.0,106.0
mean,0.311217,0.405649,0.177201
std,0.128207,0.152169,0.090306
min,0.0,0.0,0.0
25%,0.25,0.324992,0.116667
50%,0.25,0.422489,0.166667
75%,0.375,0.494638,0.25
max,0.75,1.0,0.5


Unnamed: 0,sinc4_1,sinc4_2,sinc4_3
count,206.0,180.0,106.0
mean,0.324827,0.363333,0.304009
std,0.182237,0.179899,0.150124
min,0.0,0.0,0.0
25%,0.2,0.228571,0.2
50%,0.285714,0.285714,0.25
75%,0.428571,0.457143,0.375
max,1.0,1.0,1.0


Unnamed: 0,shar2_1,shar2_2,shar2_3
count,269.0,180.0,73.0
mean,0.420149,0.458548,0.301979
std,0.201022,0.181653,0.171087
min,0.0,0.0,0.0
25%,0.333333,0.333333,0.222222
50%,0.378667,0.5,0.333333
75%,0.555667,0.571667,0.444444
max,1.0,1.0,1.0


Unnamed: 0,attr1_1,attr1_2,attr1_3
count,269.0,237.0,146.0
mean,0.179981,0.210471,0.257071
std,0.099457,0.14961,0.14996
min,0.0,0.0,0.0
25%,0.1224,0.125,0.1875
50%,0.1509,0.1875,0.208375
75%,0.2,0.25,0.28125
max,0.9,1.0,0.9375


Unnamed: 0,amb5_1,amb5_2,amb5_3
count,155.0,136.0,73.0
mean,0.751971,0.658088,0.652968
std,0.184722,0.187722,0.212717
min,0.0,0.0,0.0
25%,0.666667,0.5,0.555556
50%,0.777778,0.625,0.666667
75%,0.888889,0.75,0.777778
max,1.0,1.0,1.0


Unnamed: 0,intel3_1,intel3_2,intel3_3
count,268.0,238.0,146.0
mean,0.756397,0.684174,0.745597
std,0.151356,0.193777,0.182633
min,0.0,0.0,0.0
25%,0.714286,0.5,0.714286
50%,0.714286,0.666667,0.714286
75%,0.857143,0.833333,0.857143
max,1.0,1.0,1.0


Unnamed: 0,sinc5_1,sinc5_2,sinc5_3
count,155.0,136.0,73.0
mean,0.793548,0.721507,0.751712
std,0.179715,0.191828,0.181014
min,0.111111,0.125,0.125
25%,0.666667,0.625,0.625
50%,0.777778,0.75,0.75
75%,0.888889,0.875,0.875
max,1.0,1.0,1.0


Unnamed: 0,amb3_1,amb3_2,amb3_3
count,268.0,238.0,146.0
mean,0.703825,0.68645,0.705479
std,0.205044,0.212586,0.207805
min,0.125,0.0,0.0
25%,0.625,0.625,0.666667
50%,0.75,0.75,0.777778
75%,0.875,0.875,0.888889
max,1.0,1.0,1.0


Unnamed: 0,fun5_1,fun5_2,fun5_3
count,155.0,136.0,73.0
mean,0.685484,0.665441,0.694064
std,0.205621,0.205752,0.177412
min,0.0,0.0,0.0
25%,0.625,0.5,0.666667
50%,0.75,0.625,0.666667
75%,0.875,0.75,0.777778
max,1.0,1.0,1.0


Unnamed: 0,fun3_1,fun3_2,fun3_3
count,268.0,238.0,146.0
mean,0.737407,0.753501,0.726884
std,0.178142,0.168437,0.178598
min,0.0,0.0,0.0
25%,0.625,0.666667,0.625
50%,0.75,0.777778,0.75
75%,0.875,0.888889,0.875
max,1.0,1.0,1.0


Unnamed: 0,sinc2_1,sinc2_2,sinc2_3
count,269.0,180.0,106.0
mean,0.224799,0.302364,0.203585
std,0.122803,0.137365,0.1355
min,0.0,0.0,0.0
25%,0.1,0.25,0.12
50%,0.2,0.315875,0.2
75%,0.3,0.389813,0.3
max,0.6,0.625,1.0


Unnamed: 0,intel1_1,intel1_2,intel1_3
count,269.0,238.0,146.0
mean,0.41942,0.477743,0.441038
std,0.136213,0.156051,0.13589
min,0.04,0.0,0.111111
25%,0.36,0.375,0.370444
50%,0.4,0.5,0.444444
75%,0.5,0.5,0.444444
max,1.0,1.0,0.888889


Unnamed: 0,shar1_1,shar1_2,shar1_3
count,268.0,238.0,146.0
mean,0.422515,0.398118,0.256653
std,0.197286,0.189863,0.124604
min,0.0,0.0,0.0
25%,0.333333,0.285714,0.181818
50%,0.440333,0.428571,0.272727
75%,0.533333,0.512214,0.327273
max,1.0,1.0,1.0


Unnamed: 0,attr4_1,attr4_2,attr4_3
count,206.0,180.0,106.0
mean,0.214995,0.193144,0.296816
std,0.164055,0.156907,0.199404
min,0.0,0.0,0.0
25%,0.055556,0.042553,0.125
50%,0.166667,0.148936,0.25
75%,0.277778,0.255319,0.375
max,0.833333,0.840426,1.0


Unnamed: 0,sinc3_1,sinc3_2,sinc3_3
count,268.0,238.0,146.0
mean,0.808769,0.770483,0.781678
std,0.171187,0.189487,0.170453
min,0.0,0.0,0.125
25%,0.75,0.65625,0.75
50%,0.875,0.75,0.75
75%,0.875,0.875,0.875
max,1.0,1.0,1.0


Unnamed: 0,amb1_1,amb1_2,amb1_3
count,269.0,238.0,146.0
mean,0.241577,0.514782,0.409349
std,0.103134,0.236303,0.189474
min,0.0,0.0,0.0
25%,0.188679,0.450045,0.333333
50%,0.283019,0.450045,0.5
75%,0.314528,0.675068,0.530333
max,0.566038,1.0,1.0


Unnamed: 0,intel5_1,intel5_2,intel5_3
count,155.0,136.0,73.0
mean,0.743779,0.71875,0.641553
std,0.18397,0.154223,0.233517
min,0.0,0.375,0.0
25%,0.714286,0.625,0.5
50%,0.714286,0.75,0.666667
75%,0.857143,0.875,0.833333
max,1.0,1.0,1.0


Unnamed: 0,fun1_1,fun1_2,fun1_3
count,269.0,238.0,146.0
mean,0.345536,0.349629,0.52792
std,0.113255,0.122427,0.170765
min,0.0,0.0,0.0
25%,0.3,0.3,0.424667
50%,0.36,0.3532,0.55
75%,0.4,0.4,0.666667
max,0.8,1.0,1.0


Unnamed: 0,intel4_1,intel4_2,intel4_3
count,206.0,180.0,106.0
mean,0.372954,0.326667,0.384591
std,0.179335,0.150061,0.191136
min,0.0,0.05,0.0
25%,0.257143,0.225,0.233333
50%,0.285714,0.25,0.333333
75%,0.457143,0.43125,0.5
max,0.857143,1.0,1.0


Unnamed: 0,attr5_1,attr5_2,attr5_3
count,155.0,136.0,73.0
mean,0.632258,0.608456,0.623288
std,0.176914,0.163022,0.182209
min,0.125,0.0,0.0
25%,0.5,0.5,0.5
50%,0.625,0.625,0.625
75%,0.75,0.75,0.75
max,1.0,1.0,1.0


Unnamed: 0,shar4_1,shar4_2,shar4_3
count,205.0,180.0,106.0
mean,0.27878,0.292778,0.260797
std,0.146928,0.144728,0.156994
min,0.0,0.0,0.0
25%,0.2,0.2,0.155556
50%,0.25,0.25,0.222222
75%,0.375,0.375,0.333333
max,1.0,0.75,1.0


Unnamed: 0,amb2_1,amb2_2,amb2_3
count,269.0,180.0,106.0
mean,0.182139,0.193008,0.148113
std,0.10647,0.093795,0.089646
min,0.0,0.0,0.0
25%,0.1,0.1,0.1
50%,0.2,0.2,0.15
75%,0.2564,0.279,0.2
max,0.6,0.4,0.5


Unnamed: 0,sinc1_1,sinc1_2,sinc1_3
count,269.0,238.0,146.0
mean,0.30358,0.331365,0.266366
std,0.116424,0.125946,0.125639
min,0.0,0.0,0.0
25%,0.25,0.29715,0.230769
50%,0.333333,0.3519,0.267538
75%,0.333333,0.4,0.307692
max,1.0,1.0,1.0


Unnamed: 0,fun2_1,fun2_2,fun2_3
count,269.0,180.0,106.0
mean,0.380884,0.4692,0.39434
std,0.133813,0.138135,0.20697
min,0.0,0.125,0.0
25%,0.3,0.375,0.225
50%,0.4,0.5,0.375
75%,0.439,0.5,0.5
max,1.0,1.0,1.0


Unnamed: 0,amb4_1,amb4_2,amb4_3
count,206.0,180.0,106.0
mean,0.236699,0.304127,0.260377
std,0.147277,0.165435,0.162417
min,0.0,0.0,0.0
25%,0.14,0.2,0.175
50%,0.2,0.285714,0.25
75%,0.3,0.407143,0.3625
max,1.0,1.0,1.0


In [6]:
#fc.dating_attributes_vs_time_hist(data = unique, gender = 0)

<h3>Stats and Frequency Charts for Males</h3>

In [7]:
fc.dating_attributes_vs_time_describe(data = unique, gender = 1)

Unnamed: 0,attr3_1,attr3_2,attr3_3
count,273.0,247.0,117.0
mean,0.617216,0.621964,0.616453
std,0.178385,0.17603,0.184769
min,0.0,0.0,0.0
25%,0.5,0.5,0.5
50%,0.625,0.625,0.625
75%,0.75,0.75,0.75
max,1.0,1.0,1.0


Unnamed: 0,fun4_1,fun4_2,fun4_3
count,211.0,188.0,87.0
mean,0.358715,0.315638,0.48659
std,0.174411,0.164184,0.244973
min,0.0,0.0,0.0
25%,0.211111,0.18,0.266667
50%,0.333333,0.3,0.5
75%,0.444444,0.4,0.666667
max,0.888889,1.0,1.0


Unnamed: 0,attr2_1,attr2_2,attr2_3
count,274.0,188.0,87.0
mean,0.250123,0.297583,0.223755
std,0.131555,0.154482,0.199601
min,0.0,0.0,0.0
25%,0.167475,0.196118,0.04
50%,0.2,0.235294,0.2
75%,0.3,0.352941,0.333333
max,0.95,0.941176,1.0


Unnamed: 0,intel2_1,intel2_2,intel2_3
count,274.0,188.0,87.0
mean,0.407306,0.50706,0.225287
std,0.166075,0.181016,0.135039
min,0.0,0.0,0.0
25%,0.25,0.324992,0.133333
50%,0.4035,0.524537,0.166667
75%,0.5,0.649984,0.333333
max,1.0,0.974976,1.0


Unnamed: 0,sinc4_1,sinc4_2,sinc4_3
count,211.0,188.0,87.0
mean,0.307786,0.313374,0.234483
std,0.194592,0.175776,0.132301
min,0.0,0.0,0.0
25%,0.157143,0.2,0.15
50%,0.285714,0.285714,0.25
75%,0.428571,0.428571,0.25
max,1.0,0.857143,0.55


Unnamed: 0,shar2_1,shar2_2,shar2_3
count,273.0,188.0,58.0
mean,0.373508,0.394723,0.219923
std,0.200398,0.193853,0.122616
min,0.0,0.0,0.0
25%,0.233333,0.333333,0.111111
50%,0.333333,0.352833,0.222222
75%,0.5,0.514167,0.305556
max,1.0,1.0,0.444444


Unnamed: 0,attr1_1,attr1_2,attr1_3
count,274.0,247.0,117.0
mean,0.27122,0.314084,0.364779
std,0.13848,0.191826,0.184088
min,0.0667,0.0625,0.0625
25%,0.1957,0.1875,0.2365
50%,0.23,0.25,0.3125
75%,0.3,0.4375,0.5
max,1.0,1.0,1.0


Unnamed: 0,amb5_1,amb5_2,amb5_3
count,159.0,141.0,58.0
mean,0.715584,0.672872,0.672414
std,0.217456,0.198577,0.191233
min,0.0,0.0,0.111111
25%,0.555556,0.5,0.555556
50%,0.777778,0.625,0.666667
75%,0.888889,0.875,0.777778
max,1.0,1.0,1.0


Unnamed: 0,intel3_1,intel3_2,intel3_3
count,273.0,247.0,117.0
mean,0.781266,0.721997,0.760684
std,0.158773,0.198452,0.169684
min,0.142857,0.0,0.142857
25%,0.714286,0.666667,0.714286
50%,0.857143,0.666667,0.714286
75%,0.857143,0.833333,0.857143
max,1.0,1.0,1.0


Unnamed: 0,sinc5_1,sinc5_2,sinc5_3
count,159.0,141.0,58.0
mean,0.747729,0.637411,0.644397
std,0.183166,0.198378,0.188487
min,0.0,0.0,0.0
25%,0.666667,0.5,0.5
50%,0.777778,0.625,0.625
75%,0.888889,0.75,0.75
max,1.0,1.0,1.0


Unnamed: 0,amb3_1,amb3_2,amb3_3
count,273.0,247.0,117.0
mean,0.690018,0.67915,0.690408
std,0.240172,0.231073,0.207782
min,0.0,0.0,0.111111
25%,0.625,0.5,0.555556
50%,0.75,0.75,0.777778
75%,0.875,0.875,0.888889
max,1.0,1.0,1.0


Unnamed: 0,fun5_1,fun5_2,fun5_3
count,159.0,141.0,58.0
mean,0.661164,0.649823,0.655172
std,0.241692,0.200011,0.192663
min,0.0,0.125,0.111111
25%,0.5,0.5,0.555556
50%,0.625,0.625,0.666667
75%,0.875,0.75,0.777778
max,1.0,1.0,1.0


Unnamed: 0,fun3_1,fun3_2,fun3_3
count,273.0,247.0,117.0
mean,0.687729,0.7139,0.653846
std,0.204815,0.17329,0.216123
min,0.0,0.111111,0.0
25%,0.625,0.666667,0.625
50%,0.75,0.777778,0.75
75%,0.875,0.777778,0.75
max,1.0,1.0,1.0


Unnamed: 0,sinc2_1,sinc2_2,sinc2_3
count,274.0,188.0,87.0
mean,0.302388,0.397437,0.242069
std,0.143536,0.155959,0.116979
min,0.0,0.0,0.0
25%,0.2,0.25,0.16
50%,0.3,0.424375,0.2
75%,0.4,0.5,0.4
max,1.0,1.0,0.5


Unnamed: 0,intel1_1,intel1_2,intel1_3
count,274.0,247.0,117.0
mean,0.389164,0.418786,0.42105
std,0.13558,0.167226,0.141068
min,0.0,0.0,0.0
25%,0.32335,0.366125,0.333333
50%,0.4,0.45,0.444444
75%,0.4433,0.5,0.444444
max,0.8572,1.0,1.0


Unnamed: 0,shar1_1,shar1_2,shar1_3
count,272.0,247.0,117.0
mean,0.366441,0.328915,0.197326
std,0.228924,0.181113,0.103888
min,0.0,0.0,0.0
25%,0.166667,0.264429,0.090909
50%,0.333333,0.285714,0.181818
75%,0.522333,0.428571,0.272727
max,1.0,0.857143,0.363636


Unnamed: 0,attr4_1,attr4_2,attr4_3
count,211.0,188.0,87.0
mean,0.259031,0.246039,0.343103
std,0.193822,0.189157,0.236356
min,0.0,0.010638,0.075
25%,0.055556,0.082447,0.125
50%,0.222222,0.228723,0.3125
75%,0.388889,0.361702,0.5
max,1.0,1.0,1.0


Unnamed: 0,sinc3_1,sinc3_2,sinc3_3
count,273.0,247.0,117.0
mean,0.762821,0.714575,0.714744
std,0.179218,0.181766,0.179683
min,0.0,0.125,0.0
25%,0.625,0.625,0.625
50%,0.75,0.75,0.75
75%,0.875,0.875,0.875
max,1.0,1.0,1.0


Unnamed: 0,amb1_1,amb1_2,amb1_3
count,272.0,247.0,117.0
mean,0.166547,0.383402,0.299199
std,0.118693,0.256528,0.19491
min,0.0,0.0,0.0
25%,0.09434,0.225023,0.166667
50%,0.188679,0.450045,0.333333
75%,0.24684,0.581008,0.486
max,1.0,0.90009,0.666667


Unnamed: 0,intel5_1,intel5_2,intel5_3
count,159.0,141.0,58.0
mean,0.758311,0.740248,0.672414
std,0.196159,0.166416,0.229342
min,0.142857,0.0,0.166667
25%,0.714286,0.625,0.5
50%,0.714286,0.75,0.666667
75%,0.857143,0.875,0.833333
max,1.0,1.0,1.0


Unnamed: 0,fun1_1,fun1_2,fun1_3
count,273.0,247.0,117.0
mean,0.349936,0.354727,0.546231
std,0.131829,0.125739,0.184544
min,0.0,0.0,0.0
25%,0.3,0.3,0.493667
50%,0.36,0.38,0.542667
75%,0.4,0.4,0.666667
max,1.0,0.8,1.0


Unnamed: 0,intel4_1,intel4_2,intel4_3
count,211.0,188.0,87.0
mean,0.343805,0.276729,0.389655
std,0.201286,0.145706,0.213819
min,0.0,0.0,0.0
25%,0.2,0.175,0.2
50%,0.285714,0.25,0.333333
75%,0.442857,0.375,0.583333
max,1.0,0.75,1.0


Unnamed: 0,attr5_1,attr5_2,attr5_3
count,159.0,141.0,58.0
mean,0.600629,0.597518,0.568966
std,0.198131,0.182427,0.197535
min,0.0,0.0,0.0
25%,0.5,0.5,0.5
50%,0.625,0.625,0.625
75%,0.75,0.75,0.625
max,1.0,1.0,0.875


Unnamed: 0,shar4_1,shar4_2,shar4_3
count,211.0,188.0,87.0
mean,0.270616,0.270213,0.229119
std,0.15398,0.167427,0.12704
min,0.0,0.0,0.0
25%,0.15,0.175,0.133333
50%,0.25,0.25,0.222222
75%,0.375,0.375,0.333333
max,0.75,1.0,0.555556


Unnamed: 0,amb2_1,amb2_2,amb2_3
count,273.0,188.0,87.0
mean,0.287103,0.285355,0.23954
std,0.142272,0.131452,0.141946
min,0.0,0.0,0.0
25%,0.2,0.2,0.16
50%,0.3,0.3,0.2
75%,0.4,0.36455,0.3
max,1.0,1.0,1.0


Unnamed: 0,sinc1_1,sinc1_2,sinc1_3
count,274.0,247.0,117.0
mean,0.273868,0.303767,0.245849
std,0.121165,0.137702,0.111039
min,0.0,0.0,0.0
25%,0.166667,0.2,0.153846
50%,0.2835,0.3,0.256462
75%,0.333333,0.4,0.307692
max,0.666667,0.6,0.538462


Unnamed: 0,fun2_1,fun2_2,fun2_3
count,274.0,188.0,87.0
mean,0.360158,0.430086,0.343678
std,0.132188,0.157693,0.182649
min,0.0,0.0,0.0
25%,0.3,0.375,0.2
50%,0.3866,0.43375,0.25
75%,0.4,0.5,0.5
max,0.88,1.0,1.0


Unnamed: 0,amb4_1,amb4_2,amb4_3
count,211.0,188.0,87.0
mean,0.154218,0.232979,0.200862
std,0.115613,0.158302,0.153659
min,0.0,0.0,0.0
25%,0.09,0.142857,0.125
50%,0.12,0.228571,0.175
75%,0.2,0.285714,0.25
max,0.6,0.857143,0.75


In [8]:
#fc.dating_attributes_vs_time_hist(data = unique, gender = 1)

<h2>Create Matched People DataFrame</h2>

In [9]:
#people_matched = data[data['match'] == 1].copy()
#people_matched.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
#display(people_matched)

<h2>Exploring Matches</h2>

In [10]:
#people_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h2>Get Index for 'iid' for non-matches</h2>

In [11]:
#number = [int(i) for i in people_matched['iid']]
#not_ever_matched = [i for i in range(1,553) if i not in number]
#print not_ever_matched

In [12]:
#people_not_matched = data[data['iid'].isin(not_ever_matched)].copy()

<h2>Exploring Non-Matches</h2>

In [13]:
#people_not_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h3>Non-Matched Females: Graphs</h3>

In [14]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 0)

<h3>Non-Matched Males: Graphs</h3>

In [15]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 1)

<h1>Features</h1>

In [16]:
for i, j in fc.data_cleaner.iteritems():
    print i, j, '\n'
for i, j in fc.master_list.items():
    print i, j, '\n'
print 'clean_up_1', '\n', fc.clean_up_1, '\n'
print 'clean_up_2', '\n', fc.clean_up_2, '\n'
print 'clean_up_3', '\n', fc.clean_up_3, '\n'
print 'clean_up_4', '\n', fc.clean_up_4, '\n'
print 'clean_up_5', '\n', fc.clean_up_5, '\n'
print 'features_of_attraction', '\n', fc.features_of_attraction, '\n'
print 'actual_decisions', '\n', fc.actual_decisions, '\n'
print 'preferences_of_attraction', '\n', fc.preferences_of_attraction, '\n'
print 'rating_by_partner_features', '\n', fc.rating_by_partner_features, '\n'
print 'halfway_questions', '\n', fc.halfway_questions, '\n'
print 'interests', '\n', fc.interests, '\n'
print 'list_of_lists', '\n', fc.list_of_lists, '\n'
print 'all columns in dataset', '\n'
for i in data.keys():
    print i,
to_drop = [i for i in data.keys() if i not in fc.list_of_lists]
print '\n'*2, 'to_drop', '\n', to_drop, '\n'

first_round ['attr1_1', 'sinc1_1', 'intel1_1', 'fun1_1', 'amb1_1', 'shar1_1', 'attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1', 'attr3_1', 'sinc3_1', 'intel3_1', 'fun3_1', 'amb3_1', 'attr4_1', 'sinc4_1', 'intel4_1', 'fun4_1', 'amb4_1', 'shar4_1', 'attr5_1', 'sinc5_1', 'intel5_1', 'fun5_1', 'amb5_1'] 

second_round ['attr1_2', 'sinc1_2', 'intel1_2', 'fun1_2', 'amb1_2', 'shar1_2', 'attr2_2', 'sinc2_2', 'intel2_2', 'fun2_2', 'amb2_2', 'shar2_2', 'attr3_2', 'sinc3_2', 'intel3_2', 'fun3_2', 'amb3_2', 'attr4_2', 'sinc4_2', 'intel4_2', 'fun4_2', 'amb4_2', 'shar4_2', 'attr5_2', 'sinc5_2', 'intel5_2', 'fun5_2', 'amb5_2'] 

third_round ['attr1_3', 'sinc1_3', 'intel1_3', 'fun1_3', 'amb1_3', 'shar1_3', 'attr2_3', 'sinc2_3', 'intel2_3', 'fun2_3', 'amb2_3', 'shar2_3', 'attr3_3', 'sinc3_3', 'intel3_3', 'fun3_3', 'amb3_3', 'attr4_3', 'sinc4_3', 'intel4_3', 'fun4_3', 'amb4_3', 'shar4_3', 'attr5_3', 'sinc5_3', 'intel5_3', 'fun5_3', 'amb5_3'] 

how_you_measure_attr ['attr3_1', 'attr3_2', '

<h2>Dating Attributes as a function of Time: Distributing 100pts</h2>

<h3>Female Attributes</h3>

In [17]:
#fc.dating_attributes_vs_time_describe(unique[(unique['wave'] >= 6) & (unique['wave']<= 11)], 0)

<h3>Male Attributes</h3>

In [18]:
#fc.dating_attributes_vs_time_describe(unique[(unique['wave'] >= 6) & (unique['wave']<= 11)], 1)

<h2>Dating Attributes as a function of Time: Likert Scale</h2>

<h3>Female Attributes</h3>

In [19]:
#fc.dating_attributes_vs_time_describe(unique[(unique['wave'] >= 15) & (unique['wave']<= 20)], 0)

<h3>Male Attributes</h3>

In [20]:
#fc.dating_attributes_vs_time_describe(unique[(unique['wave'] >= 15) & (unique['wave']<= 20)], 1)

<h3>Female Subset</h3>

In [21]:
#women = data[data['gender'] == 0].copy()
#women_decision = women['dec'].copy()
#women.drop(['dec', 'dec_o', 'match'], axis = 1, inplace = True)

<h3>Male Subset</h3>

In [22]:
#men = data[data['gender'] == 1].copy()
#men_decision = men['dec'].copy()
#men.drop(['dec', 'dec_o', 'match'], axis = 1, inplace = True)

In [23]:
for i in fc.list_of_lists:
    data[i] = (data[i] - data[i].min()) / (data[i].max() - data[i].min())
entry = data[['iid', 'gender', 'age', 'race'] + fc.list_of_lists + ['age_o', 'race_o']].copy()
target = data['dec'].copy()

In [24]:
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
clf = SVC(kernel = 'linear')
rfe = RFE(clf)

In [25]:
#rfe = rfe.fit(entry, target)

In [26]:
entry[fc.list_of_lists].count()

attr1_1     8299
sinc1_1     8299
intel1_1    8299
fun1_1      8289
amb1_1      8279
shar1_1     8257
attr1_2     7445
sinc1_2     7463
intel1_2    7463
fun1_2      7463
amb1_2      7463
shar1_2     7463
attr1_3     3974
sinc1_3     3974
intel1_3    3974
fun1_3      3974
amb1_3      3974
shar1_3     3974
attr2_1     8299
sinc2_1     8299
intel2_1    8299
fun2_1      8299
amb2_1      8289
shar2_1     8289
attr3_1     8273
sinc3_1     8273
intel3_1    8273
fun3_1      8273
amb3_1      8273
attr3_2     7463
sinc3_2     7463
intel3_2    7463
fun3_2      7463
amb3_2      7463
attr3_3     3974
sinc3_3     3974
intel3_3    3974
fun3_3      3974
amb3_3      3974
attr5_1     4906
sinc5_1     4906
intel5_1    4906
fun5_1      4906
amb5_1      4906
attr5_2     4377
sinc5_2     4377
intel5_2    4377
fun5_2      4377
amb5_2      4377
attr5_3     2016
sinc5_3     2016
intel5_3    2016
fun5_3      2016
amb5_3      2016
attr4_1     6489
sinc4_1     6489
intel4_1    6489
fun4_1      6489
amb4_1      64

In [33]:
data[data['dec'] == 0]['income'].count() + data[data['dec'] == 1]['income'].count()

4279

In [39]:
data[data['match'] == 0]['income'].count()

3543

In [37]:
data.count()

iid         8378
id          8377
gender      8378
idg         8378
condtn      8378
wave        8378
round       8378
position    8378
positin1    6532
order       8378
partner     8378
pid         8368
match       8378
int_corr    8220
samerace    8378
age_o       8274
race_o      8305
pf_o_att    8289
pf_o_sin    8289
pf_o_int    8289
pf_o_fun    8280
pf_o_amb    8271
pf_o_sha    8249
dec_o       8378
attr_o      8166
sinc_o      8091
intel_o     8072
fun_o       8018
amb_o       7656
shar_o      7302
like_o      8128
prob_o      8060
met_o       7993
age         8283
field       8315
field_cd    8296
undergra    4914
mn_sat      3133
tuition     3583
race        8315
imprace     8299
imprelig    8299
from        8299
zipcode     7314
income      4279
goal        8299
date        8281
go_out      8299
career      8289
career_c    8240
sports      8299
tvsports    8299
exercise    8299
dining      8299
museums     8299
art         8299
hiking      8299
gaming      8299
clubbing    82