<h1>Speed Dating: Who to Date Long Term</h1>

What influences love at first sight? (Or, at least, love in the first four minutes?) This dataset was compiled by Columbia Business School professors Ray Fisman and Sheena Iyengar for their paper <i>Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment</i>.<br>

Data was gathered from participants in experimental speed dating events held from 2002 to 2004. During the events, each attendee had a four-minute "first date" with every other participant of the opposite sex. At the end of each four-minute date, participants were asked whether they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.<br>

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include demographics, dating habits, self-perception across key attributes, beliefs about what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document for details.<br>

For more analysis from Iyengar and Fisman, read <i>Racial Preferences in Dating</i>.<br>

Data Exploration Ideas<br>

What are the least desirable attributes in a male partner? Does this differ for female partners?<br>
How important do people think attractiveness is in potential mate selection vs. its real impact?<br>
Are shared interests more important than a shared racial background?<br>
Can people accurately predict their own perceived value in the dating market?<br>
In terms of getting a second date, is it better to be someone's first speed date of the night or their last?

In [1]:
import pandas as pd
import numpy as np
import sklearn
from IPython.display import display
%matplotlib inline
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)

print('pandas version is {}.'.format(pd.__version__))
print('numpy version is {}.'.format(np.__version__))
print('scikit-learn version is {}.'.format(sklearn.__version__))

pandas version is 0.18.0.
numpy version is 1.10.4.
scikit-learn version is 0.17.1.


In [2]:
data = pd.read_csv("Speed Dating Data.csv")
print("This set has {} data points and {} features.".format(*data.shape))

This set has 8378 data points and 195 features.


<h1>Data Exploration</h1>

In [3]:
import features_creator as fc  # column-name lists defined in features_creator.py
# Replace 12.0 with 10.0, column by column, in the clean_up_2 features only.
for col in fc.clean_up_2:
    data[col] = data[col].replace(to_replace = 12.0, value = 10.0)

<h3>Import locale to change Income and Tuition to int from string type</h3>

<h2>Unique Profiles</h2>

In [4]:
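# Keep one row per participant: the first row seen for each 'iid'.
unique = data.copy()
unique.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
# Min-max scale each column named in fc.list_of_lists to the range [0, 1].
for i in fc.list_of_lists:
    unique[i] = (unique[i] - unique[i].min()) / (unique[i].max() - unique[i].min())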

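The loop in the cell above is min-max scaling, which divides by <code>max - min</code> and so breaks down for a column whose values are all equal. A minimal sketch of a guarded variant follows, assuming toy ratings rather than the real dataset; the helper name <code>min_max_scale</code> and the column values are hypothetical.<br>

```python
import pandas as pd


def min_max_scale(frame, columns):
    """Return a copy of `frame` with each listed column rescaled to [0, 1].

    Hypothetical helper for illustration; it is not part of features_creator.py.
    """
    scaled = frame.copy()
    for col in columns:
        lo, hi = scaled[col].min(), scaled[col].max()
        span = hi - lo
        if pd.isnull(span) or span == 0:
            # Constant or all-missing column: leave it alone to avoid dividing by zero.
            continue
        scaled[col] = (scaled[col] - lo) / span
    return scaled


# Toy ratings, not taken from the dataset.
toy = pd.DataFrame({
    'attr': [2.0, 6.0, 10.0],
    'fun': [5.0, 5.0, 5.0],    # constant column
    'amb': [1.0, None, 3.0],   # missing value stays missing
})
toy_scaled = min_max_scale(toy, ['attr', 'fun', 'amb'])
```

Missing ratings are skipped by <code>min()</code> and <code>max()</code> and remain missing after scaling, which matches the blank cells visible in the scaled output further down.<br>
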
<h3>Clustering for Unique Entries</h3>

In [5]:
#unique_clustered = sklearn.cluster.KMeans(random_state = 0).fit(unique_scaled[fc.list_of_lists])

<h3>Stats and Frequency Charts for Females</h3>

In [6]:
#fc.dating_attributes_vs_time(data = unique, gender = 0)

<h3>Stats and Frequency Charts for Males</h3>

In [7]:
#fc.dating_attributes_vs_time(data = unique, gender = 1)

<h2>Create Matched People DataFrame</h2>

In [8]:
people_matched = data[data['match'] == 1].copy()
people_matched.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
#display(people_matched)

<h2>Exploring Matches</h2>

In [9]:
#people_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h2>Get 'iid' Values for Participants Who Never Matched</h2>

In [10]:
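# 'iid' values of everyone who matched at least once.
number = [int(i) for i in people_matched['iid']]
# Every id from 1 to 552 that never appears among the matched participants.
not_ever_matched = [i for i in range(1,553) if i not in number]
print(not_ever_matched)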

[3, 11, 12, 21, 24, 25, 26, 32, 33, 40, 41, 42, 54, 59, 65, 68, 72, 73, 88, 96, 101, 111, 118, 121, 123, 124, 131, 133, 139, 143, 145, 158, 170, 177, 182, 189, 198, 203, 204, 209, 216, 222, 234, 236, 247, 249, 254, 255, 257, 262, 267, 272, 278, 286, 287, 295, 298, 302, 314, 318, 320, 321, 327, 329, 331, 334, 347, 405, 418, 425, 427, 430, 440, 443, 444, 451, 454, 455, 457, 459, 461, 463, 465, 466, 477, 479, 483, 487, 497, 498, 502, 503, 506, 514, 517, 519, 520, 525, 527, 528, 543]


In [11]:
people_not_matched = data[data['iid'].isin(not_ever_matched)].copy()

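The list above is built by scanning a fixed id range of 1 to 552. An alternative that only considers ids actually present in the table is to group by <code>iid</code> and check whether any row has <code>match == 1</code>. The sketch below is one possible way to do this, shown on a toy frame; the toy values and the names <code>ever_matched</code>, <code>never_matched_ids</code>, and <code>toy_not_matched</code> are assumptions for illustration.<br>

```python
import pandas as pd

# Toy stand-in for the speed dating table: one row per (participant, date) pair.
toy = pd.DataFrame({
    'iid':   [1, 1, 2, 2, 3, 3],
    'match': [0, 1, 0, 0, 1, 1],
})

# A participant matched at least once if their largest 'match' value is 1.
ever_matched = toy.groupby('iid')['match'].max()
never_matched_ids = sorted(ever_matched[ever_matched == 0].index)

# All rows that belong to participants who never matched.
toy_not_matched = toy[toy['iid'].isin(never_matched_ids)].copy()
```

On the real data the two approaches can disagree if some id in the fixed range has no rows at all, since the range scan would still list it.<br>
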
<h2>Exploring Non-Matches</h2>

In [12]:
#people_not_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h2>Non-Matched Females: Graphs</h2>

In [13]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 0)

<h2>Non-Matched Males: Graphs</h2>

In [14]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 1)

<h1>Features</h1>

In [15]:
# Print each named feature group, then the clean-up lists, then every column name.
for name, columns in fc.data_cleaner.items():
    print('{} {} \n'.format(name, columns))
for name, columns in fc.master_list.items():
    print('{} {} \n'.format(name, columns))
for name in ['clean_up_1', 'clean_up_2', 'clean_up_3', 'clean_up_4', 'clean_up_5', 'list_of_lists']:
    print('{} \n{} \n'.format(name, getattr(fc, name)))
print('all columns in dataset \n')
print(' '.join(data.keys()))

first_round ['attr1_1', 'sinc1_1', 'intel1_1', 'fun1_1', 'amb1_1', 'shar1_1', 'attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1', 'attr3_1', 'sinc3_1', 'intel3_1', 'fun3_1', 'amb3_1', 'attr4_1', 'sinc4_1', 'intel4_1', 'fun4_1', 'amb4_1', 'shar4_1', 'attr5_1', 'sinc5_1', 'intel5_1', 'fun5_1', 'amb5_1'] 

second_round ['attr1_2', 'sinc1_2', 'intel1_2', 'fun1_2', 'amb1_2', 'shar1_2', 'attr2_2', 'sinc2_2', 'intel2_2', 'fun2_2', 'amb2_2', 'shar2_2', 'attr3_2', 'sinc3_2', 'intel3_2', 'fun3_2', 'amb3_2', 'attr4_2', 'sinc4_2', 'intel4_2', 'fun4_2', 'amb4_2', 'shar4_2', 'attr5_2', 'sinc5_2', 'intel5_2', 'fun5_2', 'amb5_2'] 

third_round ['attr1_3', 'sinc1_3', 'intel1_3', 'fun1_3', 'amb1_3', 'shar1_3', 'attr2_3', 'sinc2_3', 'intel2_3', 'fun2_3', 'amb2_3', 'shar2_3', 'attr3_3', 'sinc3_3', 'intel3_3', 'fun3_3', 'amb3_3', 'attr4_3', 'sinc4_3', 'intel4_3', 'fun4_3', 'amb4_3', 'shar4_3', 'attr5_3', 'sinc5_3', 'intel5_3', 'fun5_3', 'amb5_3'] 

how_you_measure_attr ['attr3_1', 'attr3_2', '

In [16]:
unique[fc.list_of_lists]

Unnamed: 0,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3,attr2_1,sinc2_1,intel2_1,fun2_1,amb2_1,shar2_1,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_1,sinc5_1,intel5_1,fun5_1,amb5_1,attr5_2,sinc5_2,intel5_2,fun5_2,amb5_2,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3,attr4_1,sinc4_1,intel4_1,fun4_1,amb4_1,shar4_1,attr4_2,sinc4_2,intel4_2,fun4_2,amb4_2,shar4_2,attr4_3,sinc4_3,intel4_3,fun4_3,amb4_3,shar4_3,attr2_3,sinc2_3,intel2_3,fun2_3,amb2_3,shar2_3,attr2_2,sinc2_2,intel2_2,fun2_2,amb2_2,shar2_2,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s,attr7_2,sinc7_2,intel7_2,fun7_2,amb7_2,shar7_2,attr7_3,sinc7_3,intel7_3,fun7_3,amb7_3,shar7_3,attr,sinc,intel,fun,amb,shar,attr_o,sinc_o,intel_o,fun_o,amb_o,shar_o,pf_o_att,pf_o_sin,pf_o_int,pf_o_fun,pf_o_amb,pf_o_sha,like_o,prob_o,imprace,imprelig,like,prob,exphappy,expnum,match_es,satis_2,you_call,them_cal,numdat_3,num_in_3,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga
0,0.1500,0.333333,0.4000,0.3000,0.283019,0.500000,0.180500,0.3334,0.34725,0.4444,0.500000,0.476286,0.1875,0.307692,0.444444,0.500000,0.500000,0.272727,0.35,0.40,0.375,0.40,0.10,0.166667,0.500,0.750,0.714286,0.750,0.625,0.500,0.625,0.666667,0.666667,0.500,0.375,0.625,0.571429,0.625,0.666667,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.6,0.888889,0.7,0.7,0.6,0.5,0.6,0.8,0.8,0.727273,0.8,0.6,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.7,0.4,0.2,0.333333,0.70,0.6,0.222222,0.10,0.222222,0.555556,0.047619,0.111111,,,0.888889,0.111111,0.777778,0.888889,0.1,0.1,0.5,0.071429,0.5,0.416667,0.888889,0.1,1.0,1.0,0.888889,0.777778,0.1
10,0.4500,0.083333,0.5000,0.4000,0.000000,0.166667,0.174000,0.3784,0.54050,0.5406,0.243474,0.231714,0.3750,0.076923,0.888889,0.500000,0.000000,0.181818,0.65,0.00,0.250,0.50,0.00,0.000000,0.625,0.375,0.714286,1.000,0.125,0.625,0.500,0.666667,0.888889,0.250,0.625,0.500,0.857143,0.875,0.333333,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.5,0.666667,0.8,0.4,0.6,0.3,0.8,0.7,0.6,0.818182,0.7,0.4,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.7,0.2,0.2,0.444444,0.60,0.4,0.333333,0.25,0.166667,0.444444,0.000000,0.000000,,,0.222222,0.111111,0.666667,1.000000,0.8,0.6,0.3,0.357143,0.8,0.750000,0.000000,0.9,0.8,0.7,0.777778,0.222222,0.1
20,0.3500,0.166667,0.7000,0.2000,0.188679,0.000000,,,,,,,,,,,,,0.50,0.00,0.500,0.60,0.00,0.000000,0.750,0.875,0.857143,0.750,0.750,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.7,0.888889,1.0,0.7,0.8,0.9,0.7,0.8,0.6,0.454545,0.8,0.4,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.5,0.2,0.8,0.333333,0.80,0.7,0.333333,0.10,,,,,,,0.222222,0.777778,0.666667,0.777778,0.5,0.5,0.8,0.285714,0.5,0.500000,0.777778,0.7,0.7,0.7,0.444444,0.777778,0.7
30,0.2000,0.333333,0.4000,0.4000,0.188679,0.333333,0.239250,0.2758,0.51725,0.5518,0.465347,0.098571,0.2500,0.307692,0.444444,0.666667,0.000000,0.363636,0.30,0.20,0.375,0.60,0.10,0.333333,0.625,0.750,0.571429,0.875,0.750,0.500,0.750,0.500000,0.777778,0.500,0.500,0.375,0.428571,0.750,0.444444,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.4,1.000000,0.8,0.5,0.8,0.7,0.6,0.7,0.8,0.636364,0.7,0.5,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.6,0.3,0.1,0.000000,0.60,0.7,0.000000,0.10,0.111111,0.333333,0.000000,0.000000,,,0.000000,0.000000,0.555556,0.666667,0.6,0.7,0.7,0.357143,0.7,0.500000,0.666667,0.9,0.7,0.8,0.666667,0.000000,0.8
40,0.2000,0.083333,0.5000,0.5000,0.188679,0.500000,0.134875,0.2632,0.46050,0.3158,0.710621,0.601429,0.3750,0.153846,0.444444,0.666667,0.333333,0.181818,0.50,0.20,0.250,0.40,0.10,0.166667,0.500,0.125,1.000000,0.500,0.750,0.500,0.500,0.833333,0.888889,0.875,0.250,0.375,1.000000,0.500,1.000000,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.5,0.777778,0.8,0.2,0.2,0.2,0.6,0.8,0.8,0.727273,0.7,0.6,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.6,0.5,0.8,0.000000,0.70,0.5,0.666667,0.50,,0.666667,0.000000,0.000000,,,0.666667,0.333333,0.666667,0.666667,0.6,0.8,0.6,0.428571,0.8,0.416667,0.777778,0.6,0.6,0.3,0.666667,0.777778,0.3
50,0.1000,0.416667,0.4000,0.5000,0.094340,0.500000,0.065750,0.5128,0.51275,0.4616,0.115212,0.512857,,,,,,,0.25,0.20,0.250,0.50,0.20,0.666667,0.375,0.625,0.857143,0.750,0.375,0.375,0.750,0.833333,0.777778,0.125,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.6,0.888889,0.7,0.6,0.7,0.6,0.7,0.9,0.8,0.818182,0.8,0.4,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.7,0.4,0.1,0.000000,0.60,0.6,0.333333,0.15,0.222222,0.222222,,,,,1.000000,0.777778,0.888889,0.666667,0.8,0.7,0.9,0.142857,0.6,0.666667,0.111111,0.5,0.6,0.6,0.333333,0.000000,0.1
60,0.1500,0.250000,0.5000,0.4000,0.283019,0.333333,,,,,,,,,,,,,0.30,0.20,0.325,0.50,0.20,0.333333,0.500,0.500,0.571429,0.375,0.625,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.4,0.777778,0.8,0.5,0.7,0.4,0.7,0.8,0.8,0.727273,0.7,0.7,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.7,0.4,0.2,0.333333,0.70,0.4,0.666667,0.15,0.111111,,,,,,0.444444,0.222222,0.333333,1.000000,1.0,1.0,0.2,0.214286,0.8,0.583333,0.777778,1.0,1.0,1.0,1.000000,1.000000,1.0
70,0.0909,0.303000,0.5454,0.3636,0.343019,0.303000,,,,,,,,,,,,,0.30,0.20,0.500,0.60,0.10,0.166667,0.625,0.250,0.714286,0.750,0.750,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.5,0.777778,0.7,0.4,0.7,0.5,0.8,0.7,0.6,0.545455,0.6,0.5,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.7,0.4,0.1,0.000000,0.40,0.5,0.555556,0.50,,,,,,,0.111111,0.111111,0.000000,1.000000,0.9,0.9,0.3,0.142857,1.0,0.583333,1.000000,0.9,0.9,0.6,0.555556,0.777778,0.6
80,0.2000,0.166667,0.4000,0.6000,0.188679,0.333333,0.104125,0.3556,0.55550,0.4444,0.599910,0.317429,,,,,,,0.30,0.40,0.250,0.40,0.20,0.333333,0.625,0.500,0.571429,1.000,0.625,0.625,1.000,0.666667,1.000000,0.875,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.8,1.000000,0.9,0.8,0.7,0.6,0.6,0.8,0.7,0.727273,0.7,0.5,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.6,0.4,0.1,0.000000,0.50,0.5,0.555556,0.75,0.055556,0.555556,,,,,0.333333,0.222222,0.000000,0.777778,0.6,0.7,0.2,0.142857,1.0,0.583333,0.777778,1.0,1.0,0.9,0.888889,0.777778,0.3
90,0.1500,0.250000,0.3000,0.8000,0.188679,0.166667,0.383875,0.0714,0.26775,0.5714,0.160666,0.510286,,,,,,,0.45,0.20,0.375,0.40,0.10,0.166667,0.500,0.750,0.428571,1.000,0.875,1.000,0.375,1.000000,1.000000,1.000,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.6,1.000000,1.0,0.6,0.6,,0.7,0.8,0.7,0.727273,0.6,0.5,0.583333,0.272727,0.4,0.75,0.000000,0.25,0.6,0.3,0.4,0.333333,0.60,0.3,0.666667,0.50,,0.111111,,,,,0.888889,0.888889,0.888889,0.666667,0.6,0.6,0.7,0.071429,0.7,0.583333,0.444444,0.6,0.7,0.7,0.777778,0.666667,0.7
