<h1>Speed Dating: Who to Date Long Term</h1>

What influences love at first sight? (Or, at least, love in the first four minutes?) This dataset was compiled by Columbia Business School professors Ray Fisman and Sheena Iyengar for their paper Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment.<br>

Data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four-minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.<br>

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.<br>

For more analysis from Iyengar and Fisman, read Racial Preferences in Dating.<br>

Data Exploration Ideas<br>

What are the least desirable attributes in a male partner? Does this differ for female partners?<br>
How important do people think attractiveness is in potential mate selection vs. its real impact?<br>
Are shared interests more important than a shared racial background?<br>
Can people accurately predict their own perceived value in the dating market?<br>
In terms of getting a second date, is it better to be someone's first speed date of the night or their last?
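
The last question above could be approached with a simple group-by. This is only a minimal sketch on made-up data: it assumes the dataset has an `order` column giving the position of each date within the night (an assumption, that column is not used anywhere in this notebook) and that `dec_o` holds the partner's yes/no decision. The function name `yes_rate_by_order` is hypothetical.

```python
import pandas as pd

def yes_rate_by_order(frame, order_column="order", decision_column="dec_o"):
    """Share of partners who said yes, for each position in the night."""
    return frame.groupby(order_column)[decision_column].mean()

# Toy rows standing in for the real data; the values are invented.
toy = pd.DataFrame({
    "order": [1, 1, 2, 2, 3, 3],
    "dec_o": [1, 1, 1, 0, 0, 0],
})
rates = yes_rate_by_order(toy)
print(rates)
```

On the real data, the first and last index values of the result would be the ones to compare.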

In [1]:
import pandas as pd
import numpy as np
import sklearn
from IPython.display import display
%matplotlib inline

print('pandas version is {}.'.format(pd.__version__))
print('numpy version is {}.'.format(np.__version__))
print('scikit-learn version is {}.'.format(sklearn.__version__))

pandas version is 0.18.0.
numpy version is 1.10.4.
scikit-learn version is 0.17.1.


In [2]:
data = pd.read_csv("Speed Dating Data.csv")
print("This set has {} data points and {} features.".format(*data.shape))

This set has 8378 data points and 195 features.
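
The cells further down either fill or drop missing values before calling `describe()`, so it can help to see how much is missing first. A minimal sketch, using a toy frame in place of `data`; the helper name `missing_fraction` is hypothetical.

```python
import pandas as pd

def missing_fraction(frame, columns):
    """Return the share of missing values in each of the given columns."""
    return frame[columns].isnull().mean().sort_values(ascending=False)

# Toy frame standing in for `data`; the values are made up for illustration.
toy = pd.DataFrame({
    "attr1_1": [15.0, 20.0, None, 25.0],
    "attr1_s": [None, None, 18.0, None],
})
missing = missing_fraction(toy, ["attr1_1", "attr1_s"])
print(missing)
```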


<h6>Initialize features to explore what Women and Men look for in a partner.</h6>

In [3]:
# Base column names for the six attributes; the suffix appended below selects the survey time.
features = ["attr1_", "sinc1_", "intel1_", "fun1_", "amb1_", "shar1_"]
# Desired traits at the start of the event (waves 1 - 5 and waves 10 - 21).
of_the_day = [name + "1" for name in features]
# Desired traits halfway through the event (waves 6 - 9).
half_way = [name + "s" for name in features]
# Desired traits the day after the event.
a_day_after = [name + "2" for name in features]
# Desired traits 3 - 4 weeks after the event.
three_weeks_after = [name + "3" for name in features]
# Exploratory features: identifiers first, then every attribute column.
you_looking_for = ["iid", "wave", "gender"]
listing_of_features = [of_the_day, half_way, a_day_after, three_weeks_after]
for group in listing_of_features:
    you_looking_for.extend(group)

In [4]:
stuff = pd.DataFrame(data = data, columns = you_looking_for)
unique_iid = stuff.copy()
unique_iid = unique_iid.drop_duplicates(keep = 'first')
self_perception_of_the_day_starting = pd.DataFrame(data = unique_iid, columns = you_looking_for[0:9])
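
The cell above removes rows that are identical across all selected columns. An alternative sketch keeps exactly one row per participant by deduplicating on `iid` alone. It assumes the questionnaire answers repeat unchanged on every row of the same participant; the helper name `one_row_per_participant` is hypothetical.

```python
import pandas as pd

def one_row_per_participant(frame, id_column="iid"):
    """Keep the first row seen for every participant id."""
    return frame.drop_duplicates(subset=id_column, keep="first").reset_index(drop=True)

# Toy data: participant 1 appears twice, participant 2 three times.
toy = pd.DataFrame({
    "iid": [1, 1, 2, 2, 2],
    "gender": [0, 0, 1, 1, 1],
    "attr1_1": [15.0, 15.0, 35.0, 35.0, 35.0],
})
participants = one_row_per_participant(toy)
print(participants)
```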

<h6>This block looks at what both Women and Men look for in an ideal partner at the beginning of the event. Statistics include both genders.</h6>

In [5]:
waves_men_women = self_perception_of_the_day_starting.copy()
waves_men_women.fillna(value = 0, inplace = True)
new_waves_men_women = waves_men_women.drop(['iid', 'wave', 'gender'], axis = 1)
new_waves_men_women.describe()

statistic,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1
count,551.0,551.0,551.0,551.0,551.0,551.0
mean,22.397278,17.071089,19.914229,17.197985,10.629964,11.617387
std,13.137566,7.416477,7.199345,6.491248,6.328234,6.606905
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,15.0,14.0,17.01,15.0,5.0,7.25
50%,20.0,18.0,20.0,18.0,10.0,10.0
75%,25.0,20.0,22.865,20.0,15.0,16.0
max,100.0,60.0,50.0,50.0,53.0,30.0
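
Because the cell calls `fillna(value = 0)` before `describe()`, any missing answer is counted as zero points, which pulls the mean down. The toy example below (made-up numbers) shows the difference between skipping missing values and zero-filling them.

```python
import pandas as pd

# One of four answers is missing.
toy = pd.DataFrame({"attr1_1": [20.0, 30.0, None, 10.0]})

mean_skipping_missing = toy["attr1_1"].mean()        # NaN is ignored
mean_zero_filled = toy["attr1_1"].fillna(0).mean()   # NaN is counted as 0

print(mean_skipping_missing, mean_zero_filled)
```

Which of the two is appropriate depends on whether a blank answer really means "zero points" in the questionnaire.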


<h6>This block looks at what Women look for in an ideal partner at the beginning of the event.</h6>

In [6]:
#women_at_start = waves_men_women[waves_men_women['gender'] == 0]
women_at_start = waves_men_women.copy()
women_at_start = women_at_start[women_at_start['gender'] == 0]
women_at_start.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
women_at_start.describe()

statistic,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1
count,274.0,274.0,274.0,274.0,274.0,274.0
mean,17.691533,17.889708,20.588321,16.983431,12.584562,12.419781
std,10.130907,7.332601,7.310479,6.048468,5.677342,6.136102
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,11.655,15.0,17.5375,15.0,10.0,10.0
50%,15.045,19.23,20.0,17.585,15.0,12.885
75%,20.0,20.0,25.0,20.0,16.5925,16.0
max,90.0,60.0,50.0,40.0,30.0,30.0


<h6>This block looks at what Men look for in an ideal partner at the beginning of the event.</h6>

In [7]:
men_at_start = waves_men_women.copy()
men_at_start = men_at_start[men_at_start['gender'] == 1]
men_at_start.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
men_at_start.describe()

statistic,attr1_1,sinc1_1,intel1_1,fun1_1,amb1_1,shar1_1
count,277.0,277.0,277.0,277.0,277.0,277.0
mean,27.052058,16.261336,19.247437,17.410217,8.696534,10.823682
std,14.09581,7.423188,7.037473,6.905798,6.354256,6.962196
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,19.44,10.0,15.79,15.0,5.0,5.0
50%,23.0,17.0,20.0,18.0,10.0,10.0
75%,30.0,20.0,22.0,20.0,13.04,15.56
max,100.0,40.0,42.86,50.0,53.0,30.0
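
The women and men tables can also be put side by side with a single group-by instead of separate copy, filter and drop steps. A minimal sketch on toy data, using the same coding as the cells above (`gender == 0` for women, `gender == 1` for men); the helper name `mean_by_gender` is hypothetical.

```python
import pandas as pd

def mean_by_gender(frame, value_columns, gender_column="gender"):
    """Mean of each value column per gender, with readable row labels."""
    labels = {0: "women", 1: "men"}
    means = frame.groupby(gender_column)[value_columns].mean()
    return means.rename(index=labels)

# Toy data with made-up point allocations.
toy = pd.DataFrame({
    "gender": [0, 0, 1, 1],
    "attr1_1": [15.0, 20.0, 25.0, 35.0],
    "sinc1_1": [20.0, 18.0, 15.0, 17.0],
})
means = mean_by_gender(toy, ["attr1_1", "sinc1_1"])
print(means)
```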


<h6>This block looks at what Women and Men look for in an ideal partner halfway through the Speed Dating event.</h6>

In [8]:
waves_6_9 = unique_iid.copy()
waves_6_9 = waves_6_9[~((waves_6_9['wave'] < 6 ) | (waves_6_9['wave'] > 17))]
waves_6_9 = waves_6_9[(waves_6_9['wave'] < 12) | (waves_6_9['wave'] > 14)]
waves_6_9 = waves_6_9[["iid", "wave", "gender", "attr1_s", "sinc1_s", "intel1_s", "fun1_s", "amb1_s", "shar1_s"]]
waves_6_9.fillna(value = 0, inplace = True)
waves_6_9_both_genders = waves_6_9.copy()
waves_6_9_both_genders.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
waves_6_9_both_genders.describe()

statistic,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s
count,237.0,237.0,237.0,237.0,237.0,237.0
mean,22.138776,16.289747,18.03827,15.836498,11.348523,12.673882
std,13.211174,7.123288,6.675128,5.62298,5.88231,6.230171
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,15.22,12.0,16.0,13.64,8.11,10.0
50%,19.23,17.02,19.0,16.67,11.63,13.89
75%,25.0,20.0,20.0,20.0,15.38,16.33
max,95.0,50.0,40.0,40.0,23.81,30.0
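
Although the variable is named `waves_6_9`, the two chained filters in cell 8 keep waves 6 to 11 and 15 to 17. The sketch below expresses the same selection as one mask and checks it against the original two-step filter on toy data; the helper name `select_waves` is hypothetical.

```python
import pandas as pd

def select_waves(frame, wave_column="wave"):
    """Keep rows whose wave is in 6-11 or 15-17 (both ranges inclusive)."""
    wave = frame[wave_column]
    keep = wave.between(6, 11) | wave.between(15, 17)
    return frame[keep]

# Toy frame with one row per wave, 1 through 21.
toy = pd.DataFrame({"wave": list(range(1, 22))})
kept = select_waves(toy)

# Reproduce the original two-step filter to confirm both give the same rows.
step1 = toy[~((toy["wave"] < 6) | (toy["wave"] > 17))]
step2 = step1[(step1["wave"] < 12) | (step1["wave"] > 14)]
print(kept["wave"].tolist())
```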


<h6>This block looks at what Women look for in an ideal partner halfway through the Speed Dating event.</h6>

In [9]:
waves_6_9_women = waves_6_9.copy()
waves_6_9_women = waves_6_9_women[waves_6_9_women['gender'] == 0]
waves_6_9_women.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
waves_6_9_women.describe()

statistic,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s
count,115.0,115.0,115.0,115.0,115.0,115.0
mean,17.965739,17.112696,18.65,15.532522,12.882261,13.866261
std,8.83678,7.349096,6.514154,5.737216,5.495261,5.907598
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,14.58,15.0,16.67,11.605,10.0,10.0
50%,16.0,18.0,19.15,16.67,14.89,15.0
75%,20.0,20.0,20.0,19.59,16.67,16.67
max,60.0,50.0,40.0,40.0,21.95,30.0


<h6>This block looks at what Men look for in an ideal partner halfway through the Speed Dating event.</h6>

In [10]:
waves_6_9_men = waves_6_9.copy()
waves_6_9_men = waves_6_9_men[waves_6_9_men['gender'] == 1]
waves_6_9_men.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
waves_6_9_men.describe()

statistic,attr1_s,sinc1_s,intel1_s,fun1_s,amb1_s,shar1_s
count,122.0,122.0,122.0,122.0,122.0,122.0
mean,26.072377,15.514016,17.461639,16.123033,9.902787,11.549918
std,15.319681,6.843337,6.799631,5.521408,5.890104,6.340318
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,18.0,10.085,15.0,14.185,5.0,7.3825
50%,20.0,15.895,18.6,17.02,10.0,11.37
75%,30.0,20.0,20.0,20.0,14.545,16.0
max,95.0,30.0,35.0,35.0,23.81,25.0


<h6>This block looks at what Women and Men look for in an ideal partner one day after the Speed Dating event.</h6>

In [11]:
waves_men_women_day_after = pd.DataFrame(unique_iid, 
                                           columns = ["iid", "wave", "gender", "attr1_2", "sinc1_2", "intel1_2", "fun1_2", "amb1_2", "shar1_2"])
waves_men_women_day_after.dropna(inplace = True)
both_genders_day_after = waves_men_women_day_after.copy()
both_genders_day_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
both_genders_day_after.describe()

statistic,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2
count,484.0,484.0,484.0,484.0,484.0,484.0
mean,26.080227,15.869339,17.908512,17.569153,9.98469,12.689649
std,14.379509,6.627533,6.575431,6.118782,5.659893,6.594027
min,5.0,0.0,0.0,0.0,0.0,0.0
25%,16.67,10.0,15.0,15.0,5.0,10.0
50%,20.0,16.67,19.0,18.0,10.0,13.02
75%,30.0,20.0,20.0,20.0,15.0,16.67
max,85.0,50.0,40.0,50.0,22.22,35.0


<h6>This block looks at what Women look for in an ideal partner one day after the Speed Dating event.</h6>

In [12]:
women_day_after = waves_men_women_day_after.copy()
women_day_after = women_day_after[women_day_after['gender'] == 0]
women_day_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
women_day_after.describe()

statistic,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2
count,237.0,237.0,237.0,237.0,237.0,237.0
mean,21.862996,16.562194,19.114388,17.394895,11.495148,13.916962
std,11.945715,6.299193,6.243993,5.946725,5.207794,6.643789
min,5.0,0.0,0.0,0.0,0.0,0.0
25%,15.0,14.81,15.0,15.0,10.0,10.0
50%,20.0,17.54,20.0,17.54,10.0,15.0
75%,25.0,20.0,20.0,20.0,15.0,17.86
max,85.0,50.0,40.0,50.0,22.22,35.0


<h6>This block looks at what Men look for in an ideal partner one day after the Speed Dating event.</h6>

In [13]:
men_day_after = waves_men_women_day_after.copy()
men_day_after = men_day_after[men_day_after['gender'] == 1]
men_day_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
men_day_after.describe()

statistic,attr1_2,sinc1_2,intel1_2,fun1_2,amb1_2,shar1_2
count,247.0,247.0,247.0,247.0,247.0,247.0
mean,30.126721,15.204534,16.751457,17.736356,8.535385,11.512024
std,15.346057,6.875195,6.689023,6.286965,5.707111,6.338956
min,10.0,0.0,0.0,0.0,0.0,0.0
25%,20.0,10.0,14.645,15.0,5.0,9.255
50%,25.0,15.0,18.0,19.0,10.0,10.0
75%,40.0,20.0,20.0,20.0,12.91,15.0
max,85.0,30.0,40.0,40.0,20.0,30.0


<h6>This block looks at what Women and Men look for in an ideal partner 3 - 4 weeks after the Speed Dating event.</h6>

In [14]:
waves_men_women_weeks_after = pd.DataFrame(unique_iid, 
                                           columns = ["iid", "wave", "gender", "attr1_3", "sinc1_3", "intel1_3", "fun1_3", "amb1_3", "shar1_3"])
waves_men_women_weeks_after.dropna(inplace = True)
both_genders_weeks_after = waves_men_women_weeks_after.copy()
both_genders_weeks_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
both_genders_weeks_after.describe()

statistic,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3
count,263.0,263.0,263.0,263.0,263.0,263.0
mean,24.398935,16.720494,19.446578,16.089582,10.818023,12.679506
std,13.931388,7.772678,6.223928,5.300232,5.977424,6.5602
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,15.045,10.0,16.5,13.705,5.0,10.0
50%,20.0,17.07,20.0,16.28,10.0,14.0
75%,30.0,20.0,20.0,20.0,15.0,16.67
max,80.0,65.0,45.0,30.0,30.0,55.0


<h6>This block looks at what Women look for in an ideal partner 3 - 4 weeks after the Speed Dating event.</h6>

In [15]:
women_weeks_after = waves_men_women_weeks_after.copy()
women_weeks_after = women_weeks_after[women_weeks_after['gender'] == 0]
women_weeks_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
women_weeks_after.describe()

statistic,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3
count,146.0,146.0,146.0,146.0,146.0,146.0
mean,20.565685,17.313767,19.846712,15.851301,12.294178,14.129589
std,11.996768,8.166509,6.115045,5.109891,5.681105,6.846931
min,0.0,0.0,5.0,0.0,0.0,0.0
25%,15.0,15.0,16.67,12.74,10.0,10.0
50%,16.67,17.39,20.0,16.5,15.0,15.0
75%,22.5,20.0,20.0,20.0,15.91,18.0
max,75.0,65.0,40.0,30.0,30.0,55.0


<h6>This block looks at what Men look for in an ideal partner 3 - 4 weeks after the Speed Dating event.</h6>

In [16]:
men_weeks_after = waves_men_women_weeks_after.copy()
men_weeks_after = men_weeks_after[men_weeks_after['gender'] == 1]
men_weeks_after.drop(["iid", "wave", "gender"], axis = 1, inplace = True)
men_weeks_after.describe()

statistic,attr1_3,sinc1_3,intel1_3,fun1_3,amb1_3,shar1_3
count,117.0,117.0,117.0,117.0,117.0,117.0
mean,29.182308,15.980171,18.947265,16.386923,8.975983,10.87
std,14.727061,7.217537,6.348041,5.536321,5.847305,5.714267
min,5.0,0.0,0.0,0.0,0.0,0.0
25%,18.92,10.0,15.0,14.81,5.0,5.0
50%,25.0,16.67,20.0,16.28,10.0,10.0
75%,40.0,20.0,20.0,20.0,14.58,15.0
max,80.0,35.0,45.0,30.0,20.0,20.0
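
The separate tables for the four survey times can be condensed into one table of means, which makes shifts over time easier to spot. A minimal sketch on toy data; the helper name `mean_by_time_point` and the row labels are hypothetical, and the column naming (`attr1_1`, `attr1_s`, `attr1_2`, `attr1_3`) follows the feature lists defined in cell 3. Each row averages over whoever answered that survey, so the rows are not based on the same set of people.

```python
import pandas as pd

# (row label, column suffix) for each survey time used in this notebook.
TIME_POINTS = [("start", "1"), ("halfway", "s"), ("day after", "2"), ("weeks after", "3")]

def mean_by_time_point(frame, attributes):
    """One row per survey time, one column per attribute, missing answers skipped."""
    rows = {}
    for label, suffix in TIME_POINTS:
        rows[label] = {a: frame[a + "1_" + suffix].mean() for a in attributes}
    labels = [label for label, _ in TIME_POINTS]
    return pd.DataFrame(rows).T.loc[labels, attributes]

# Toy data for two attributes only; the values are invented.
toy = pd.DataFrame({
    "attr1_1": [20.0, 30.0], "sinc1_1": [20.0, 10.0],
    "attr1_s": [25.0, None], "sinc1_s": [15.0, None],
    "attr1_2": [30.0, 40.0], "sinc1_2": [10.0, 10.0],
    "attr1_3": [None, 40.0], "sinc1_3": [None, 20.0],
})
table = mean_by_time_point(toy, attributes=["attr", "sinc"])
print(table)
```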


<h6>This block looks at how Women and Men rate themselves at the start of the Speed Dating event.</h6>

In [17]:
self_perceived = ["iid", "gender", "wave", "attr3_1", "sinc3_1", "fun3_1", "intel3_1", "amb3_1",
                  "attr3_s", "sinc3_s", "intel3_s", "fun3_s", "amb3_s", 
                  "attr3_2", "sinc3_2", "intel3_2", "fun3_2", "amb3_2", 
                  "attr3_3", "sinc3_3", "intel3_3", "fun3_3", "amb3_3"]
self_rating = pd.DataFrame(data = data, columns = self_perceived)
self_rating.drop_duplicates(keep = 'first', inplace = True)

In [18]:
self_rating_on_the_day = self_rating.copy()
self_rating_on_the_day.drop(self_rating_on_the_day.columns[8:], axis = 1, inplace = True)
self_rating_on_the_day.fillna(value = 1, inplace = True)
self_rating_on_the_day_both = self_rating_on_the_day.copy()
self_rating_on_the_day_both.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_on_the_day_both.describe()

statistic,attr3_1,sinc3_1,fun3_1,intel3_1,amb3_1
count,551.0,551.0,551.0,551.0,551.0
mean,6.99274,8.166969,7.591652,8.264973,7.470054
std,1.580547,1.678007,1.754538,1.430393,1.958318
min,1.0,1.0,1.0,1.0,1.0
25%,6.0,7.0,7.0,8.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women rate themselves at the start of the Speed Dating event.</h6>

In [19]:
self_rating_on_the_day_women = self_rating_on_the_day.copy()
self_rating_on_the_day_women = self_rating_on_the_day_women[self_rating_on_the_day_women['gender'] == 0]
self_rating_on_the_day_women.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_on_the_day_women.describe()

statistic,attr3_1,sinc3_1,fun3_1,intel3_1,amb3_1
count,274.0,274.0,274.0,274.0,274.0
mean,7.105839,8.306569,7.748175,8.135036,7.485401
std,1.605771,1.741803,1.734823,1.497255,1.891219
min,1.0,1.0,1.0,1.0,1.0
25%,7.0,8.0,7.0,8.0,7.0
50%,7.0,9.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Men rate themselves at the start of the Speed Dating event.</h6>

In [20]:
self_rating_on_the_day_men = self_rating_on_the_day.copy()
self_rating_on_the_day_men = self_rating_on_the_day_men[self_rating_on_the_day_men['gender'] == 1]
self_rating_on_the_day_men.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_on_the_day_men.describe()

statistic,attr3_1,sinc3_1,fun3_1,intel3_1,amb3_1
count,277.0,277.0,277.0,277.0,277.0
mean,6.880866,8.028881,7.436823,8.393502,7.454874
std,1.549975,1.603629,1.763298,1.351473,2.025817
min,1.0,1.0,1.0,1.0,1.0
25%,6.0,7.0,7.0,8.0,6.0
50%,7.0,8.0,8.0,9.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women and Men rate themselves halfway through the Speed Dating event.</h6>

In [21]:
self_rating_half_way = self_rating.copy()
self_rating_half_way.drop(self_rating_half_way.columns[3:8], axis = 1, inplace = True)
self_rating_half_way.drop(self_rating_half_way.columns[8:], axis = 1, inplace = True)
self_rating_half_way.dropna(inplace = True)
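
Selecting by position (`columns[3:8]`, `columns[8:]`) depends on the exact order of `self_perceived`. A sketch of the same selection by column name follows, checked against the positional version on a toy frame; the helper name `self_rating_columns` is hypothetical.

```python
import pandas as pd

ID_COLUMNS = ["iid", "gender", "wave"]
ATTRIBUTES = ("attr", "sinc", "intel", "fun", "amb")

def self_rating_columns(suffix):
    """Identifier columns plus the five self-rating columns for one survey time."""
    return ID_COLUMNS + [a + "3_" + suffix for a in ATTRIBUTES]

# Toy frame with the 23 columns used for self ratings and a single dummy row.
all_columns = ID_COLUMNS + [a + "3_" + s for s in ("1", "s", "2", "3") for a in ATTRIBUTES]
toy = pd.DataFrame([list(range(len(all_columns)))], columns=all_columns)

half_way = toy[self_rating_columns("s")].dropna()

# Positional version, as in the cell above, for comparison.
positional = toy.drop(toy.columns[3:8], axis=1)
positional = positional.drop(positional.columns[8:], axis=1)
print(list(half_way.columns))
```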

In [22]:
self_rating_half_way_both = self_rating_half_way.copy()
self_rating_half_way_both.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_half_way_both.describe()

statistic,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s
count,276.0,276.0,276.0,276.0,276.0
mean,7.20471,8.103261,8.230072,7.706522,7.568841
std,1.376007,1.460146,1.202483,1.617212,1.810768
min,3.0,1.0,4.0,3.0,2.0
25%,7.0,7.0,8.0,7.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women rate themselves halfway through the Speed Dating event.</h6>

In [23]:
self_rating_half_way_women = self_rating_half_way.copy()
self_rating_half_way_women = self_rating_half_way_women[self_rating_half_way_women['gender'] == 0]
self_rating_half_way_women.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_half_way_women.describe()

statistic,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s
count,131.0,131.0,131.0,131.0,131.0
mean,7.442748,8.278626,8.10687,7.969466,7.541985
std,1.253673,1.373037,1.151934,1.488101,1.785667
min,3.0,3.0,5.0,3.0,3.0
25%,7.0,8.0,7.0,7.0,7.0
50%,8.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Men rate themselves halfway through the Speed Dating event.</h6>

In [24]:
self_rating_half_way_men = self_rating_half_way.copy()
self_rating_half_way_men = self_rating_half_way_men[self_rating_half_way_men['gender'] == 1]
self_rating_half_way_men.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_half_way_men.describe()

statistic,attr3_s,sinc3_s,intel3_s,fun3_s,amb3_s
count,145.0,145.0,145.0,145.0,145.0
mean,6.989655,7.944828,8.341379,7.468966,7.593103
std,1.448741,1.521966,1.239824,1.695808,1.838993
min,3.0,1.0,4.0,3.0,2.0
25%,6.0,7.0,8.0,7.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women and Men rate themselves a day after the Speed Dating event.</h6>

In [25]:
self_rating_day_after = self_rating.copy()
self_rating_day_after.drop(self_rating_day_after.columns[3:13], axis = 1, inplace = True)
self_rating_day_after.drop(self_rating_day_after.columns[8:], axis = 1, inplace = True)
self_rating_day_after.dropna(inplace = True)

In [26]:
self_rating_day_after_both = self_rating_day_after.copy()
self_rating_day_after_both.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_day_after_both.describe()

statistic,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2
count,485.0,485.0,485.0,485.0,485.0
mean,7.115464,7.936082,8.220619,7.6,7.461856
std,1.37084,1.500013,1.181288,1.547058,1.775955
min,2.0,2.0,4.0,1.0,2.0
25%,6.0,7.0,8.0,7.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women rate themselves a day after the Speed Dating event.</h6>

In [27]:
self_rating_day_after_women = self_rating_day_after.copy()
self_rating_day_after_women = self_rating_day_after_women[self_rating_day_after_women['gender'] == 0]
self_rating_day_after_women.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_day_after_women.describe()

statistic,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2
count,238.0,238.0,238.0,238.0,238.0
mean,7.260504,8.163866,8.105042,7.781513,7.491597
std,1.318241,1.515892,1.162665,1.515933,1.700686
min,2.0,2.0,4.0,1.0,2.0
25%,7.0,7.25,7.0,7.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Men rate themselves a day after the Speed Dating event.</h6>

In [28]:
self_rating_day_after_men = self_rating_day_after.copy()
self_rating_day_after_men = self_rating_day_after_men[self_rating_day_after_men['gender'] == 1]
self_rating_day_after_men.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_day_after_men.describe()

statistic,attr3_2,sinc3_2,intel3_2,fun3_2,amb3_2
count,247.0,247.0,247.0,247.0,247.0
mean,6.975709,7.716599,8.331984,7.425101,7.433198
std,1.408243,1.454125,1.190712,1.559607,1.848585
min,2.0,3.0,4.0,2.0,2.0
25%,6.0,7.0,8.0,7.0,6.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,8.0,9.0
max,10.0,10.0,10.0,10.0,10.0


<h6>This block looks at how Women and Men rate themselves 3 - 4 weeks after the Speed Dating event.</h6>

In [29]:
self_rating_weeks_after = self_rating.copy()
self_rating_weeks_after.drop(self_rating_weeks_after.columns[3:18], axis = 1, inplace = True)
self_rating_weeks_after.dropna(inplace = True)

In [30]:
self_rating_weeks_after_both = self_rating_weeks_after.copy()
self_rating_weeks_after_both.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_weeks_after_both.describe()

statistic,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3
count,263.0,263.0,263.0,263.0,263.0
mean,7.193916,8.091255,8.372624,7.623574,7.365019
std,1.581272,1.570007,1.450826,1.73396,2.012327
min,2.0,2.0,3.0,2.0,1.0
25%,6.5,7.0,8.0,7.0,6.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,12.0,12.0,12.0,12.0,12.0
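
The maximum of 12.0 in every column stands out, since the earlier self-rating tables all top out at 10.0. A minimal sketch for counting such values follows. It assumes the intended scale is 1 to 10, which is inferred from the other tables rather than confirmed here; the helper name `count_out_of_range` is hypothetical and the data is made up.

```python
import pandas as pd

def count_out_of_range(frame, low=1, high=10):
    """Count, per column, the non-missing values outside [low, high]."""
    outside = (frame < low) | (frame > high)   # comparisons with NaN are False
    return outside.sum()

# Toy self ratings with a few values above the assumed maximum.
toy = pd.DataFrame({
    "attr3_3": [7.0, 12.0, 8.0, None],
    "fun3_3": [12.0, 12.0, 9.0, 5.0],
})
out_of_range = count_out_of_range(toy)
print(out_of_range)
```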


<h6>This block looks at how Women rate themselves 3 - 4 weeks after the Speed Dating event.</h6>

In [31]:
self_rating_weeks_after_women = self_rating_weeks_after.copy()
self_rating_weeks_after_women = self_rating_weeks_after_women[self_rating_weeks_after_women['gender'] == 0]
self_rating_weeks_after_women.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_weeks_after_women.describe()

statistic,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3
count,146.0,146.0,146.0,146.0,146.0
mean,7.376712,8.321918,8.342466,7.89726,7.458904
std,1.57191,1.494518,1.519702,1.600558,2.07157
min,2.0,3.0,3.0,2.0,1.0
25%,7.0,8.0,8.0,7.0,7.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,9.0,9.0
max,12.0,12.0,12.0,12.0,12.0


<h6>This block looks at how Men rate themselves 3 - 4 weeks after the Speed Dating event.</h6>

In [32]:
self_rating_weeks_after_men = self_rating_weeks_after.copy()
self_rating_weeks_after_men = self_rating_weeks_after_men[self_rating_weeks_after_men['gender'] == 1]
self_rating_weeks_after_men.drop(['iid', 'wave', 'gender'], axis = 1, inplace = True)
self_rating_weeks_after_men.describe()

statistic,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3
count,117.0,117.0,117.0,117.0,117.0
mean,6.965812,7.803419,8.410256,7.282051,7.247863
std,1.569821,1.620128,1.365548,1.837568,1.938298
min,2.0,2.0,4.0,2.0,2.0
25%,6.0,7.0,8.0,7.0,6.0
50%,7.0,8.0,8.0,8.0,8.0
75%,8.0,9.0,9.0,8.0,9.0
max,12.0,12.0,12.0,12.0,12.0


<h1>Data Exploration</h1>

In [33]:
matches = data[data['match'] == 1].copy()

In [34]:
classifier_features = ["iid", "gender",  "wave", "pid", "samerace", "age_o", "race_o", "pf_o_att", "pf_o_sin",
                       "pf_o_int", "pf_o_fun", "pf_o_amb", "pf_o_sha", "attr_o", "sinc_o", "intel_o", "fun_o", 
                       "amb_o", "shar_o", "like_o", "attr_", "sinc_", "intel_", "fun_", 
                       "amb_", "shar_", "like_", "age", "race", "imprace", "imprelig", "income", 
                       "goal", "go_out", "sports", "tvsports", "exercise", "dining", "museums", "art", "hiking", "gaming", 
                       "clubbing", "reading", "tv", "theater", "movies", "concerts", "music", "shopping", "yoga", "dec_o", "dec", "match"]
new_stuff = pd.DataFrame(data = matches, columns = classifier_features)

In [35]:
women_matched = new_stuff.copy()
women_matched = women_matched[women_matched['gender'] == 0]
women_matched.drop(['iid', 'gender', 'wave', 'pid', 'dec_o', 'dec'], axis = 1).describe()
#new_stuff[new_stuff[(new_stuff['gender'] == 0) & (new_stuff['match'] == 1)].columns[4:]].drop(['dec_o', 'dec'], axis = 1)

statistic,samerace,age_o,race_o,pf_o_att,pf_o_sin,pf_o_int,pf_o_fun,pf_o_amb,pf_o_sha,attr_o,...,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,match
count,690.0,682.0,685.0,685.0,685.0,685.0,683.0,680.0,680.0,689.0,...,680.0,680.0,680.0,680.0,680.0,680.0,680.0,680.0,680.0,690.0
mean,0.410145,26.353372,2.627737,28.073971,15.464569,19.669883,18.170878,8.386912,10.348206,7.520319,...,6.275,7.991176,5.779412,7.622059,8.180882,7.35,8.070588,6.505882,5.226471,1.0
std,0.492217,3.21017,1.240209,15.416928,7.299436,6.677274,7.241189,6.012384,6.977522,1.417404,...,2.35432,1.718798,2.439522,2.069439,1.602994,1.975065,1.726776,2.287179,2.576409,0.0
min,0.0,20.0,1.0,6.67,0.0,0.0,0.0,0.0,0.0,2.0,...,1.0,2.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0
25%,0.0,24.0,2.0,20.0,10.0,17.24,15.0,5.0,5.0,7.0,...,5.0,7.0,4.0,7.0,8.0,6.0,7.0,5.0,3.0,1.0
50%,0.0,26.0,2.0,23.0,16.28,20.0,18.75,10.0,10.0,8.0,...,7.0,8.0,6.0,8.0,8.0,8.0,8.0,7.0,6.0,1.0
75%,1.0,28.0,3.0,30.0,20.0,23.81,20.0,12.0,15.1225,8.0,...,8.0,9.0,8.0,9.0,9.0,9.0,10.0,8.0,7.0,1.0
max,1.0,42.0,6.0,100.0,40.0,42.86,50.0,53.0,30.0,10.0,...,10.0,13.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,1.0
