<h1>Speed Dating: Who to Date Long Term</h1>

What influences love at first sight? (Or, at least, love in the first four minutes?) This dataset was compiled by Columbia Business School professors Ray Fisman and Sheena Iyengar for their paper Gender Differences in Mate Selection: Evidence From a Speed Dating Experiment.<br>

Data was gathered from participants in experimental speed dating events from 2002-2004. During the events, the attendees would have a four minute "first date" with every other participant of the opposite sex. At the end of their four minutes, participants were asked if they would like to see their date again. They were also asked to rate their date on six attributes: Attractiveness, Sincerity, Intelligence, Fun, Ambition, and Shared Interests.<br>

The dataset also includes questionnaire data gathered from participants at different points in the process. These fields include: demographics, dating habits, self-perception across key attributes, beliefs on what others find valuable in a mate, and lifestyle information. See the Speed Dating Data Key document below for details.<br>

For more analysis from Iyengar and Fisman, read Racial Preferences in Dating.<br>

Data Exploration Ideas<br>

What are the least desirable attributes in a male partner? Does this differ for female partners?<br>
How important do people think attractiveness is in potential mate selection vs. its real impact?<br>
Are shared interests more important than a shared racial background?<br>
Can people accurately predict their own perceived value in the dating market?<br>
In terms of getting a second date, is it better to be someone's first speed date of the night or their last?

In [1]:
import pandas as pd
import numpy as np
import sklearn 
from IPython.display import display
%matplotlib inline
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

print('pandas version is {}.'.format(pd.__version__))
print('numpy version is {}.'.format(np.__version__))
print('scikit-learn version is {}.'.format(sklearn.__version__))

pandas version is 0.18.0.
numpy version is 1.10.4.
scikit-learn version is 0.17.1.


In [2]:
data = pd.read_csv("Speed Dating Data.csv")
print "This set has {} data points and {} features.".format(*data.shape)

This set has 8378 data points and 195 features.


<h1>Data Exploration</h1>

In [3]:
import features_creator as fc #importing feature names made in file features_creator.py
for i in fc.clean_up_2:
    data.replace(to_replace = 12.0, value = 10.0, inplace = True)

<h3>Import locale to change Income and Tuition to int from string type</h3>

<h2>Unique Profiles</h2>

In [4]:
unique = data.copy()
unique.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
for i in fc.list_of_lists:
    unique[i] = (unique[i] - unique[i].min()) / (unique[i].max() - unique[i].min())

<h3>Set up Clustering DataFrame</h3>

In [5]:
to_cluster = unique[['gender'] + fc.clean_up_2[:5] + fc.interests]

In [6]:
to_cluster.dropna(axis = 0, inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


In [7]:
women_to_cluster = to_cluster[to_cluster['gender'] == 0].copy()
men_to_cluster = to_cluster[to_cluster['gender'] == 1].copy()

<h3>Clustering for Unique Entries</h3>

In [8]:
from sklearn.cluster import KMeans
unique_clustered_women = KMeans(random_state = 0).fit(women_to_cluster)
unique_clustered_men = KMeans(random_state = 0).fit(men_to_cluster)

In [9]:
women_to_cluster['cluster'] = unique_clustered_women.labels_
men_to_cluster['cluster'] = unique_clustered_men.labels_

In [10]:
for i in range(0, 8):
    print 'Cluster ', str(i)
    display(women_to_cluster[women_to_cluster['cluster'] == i].describe())

Cluster  0


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0,35.0
mean,0.0,0.678571,0.907143,0.763265,0.767857,0.710714,0.679365,0.24127,0.730159,0.831746,0.834286,0.822857,0.76,0.179592,0.514286,0.6,0.304762,0.805714,0.862857,0.854286,0.904762,0.479365,0.68,0.0
std,0.0,0.164064,0.092667,0.103613,0.184766,0.209541,0.1939,0.237216,0.191098,0.162522,0.130481,0.130802,0.214476,0.13007,0.270232,0.13528,0.25472,0.174799,0.135225,0.135783,0.108274,0.210079,0.257591,0.0
min,0.0,0.125,0.75,0.571429,0.125,0.25,0.333333,0.0,0.333333,0.444444,0.6,0.5,0.2,0.071429,0.1,0.333333,0.0,0.4,0.3,0.3,0.666667,0.0,0.1,0.0
25%,0.0,0.625,0.875,0.714286,0.625,0.625,0.5,0.0,0.555556,0.722222,0.7,0.7,0.7,0.071429,0.3,0.583333,0.055556,0.75,0.8,0.8,0.888889,0.333333,0.55,0.0
50%,0.0,0.75,0.875,0.714286,0.75,0.75,0.666667,0.222222,0.777778,0.888889,0.9,0.8,0.8,0.142857,0.5,0.666667,0.333333,0.8,0.9,0.9,0.888889,0.444444,0.7,0.0
75%,0.0,0.75,1.0,0.857143,0.875,0.875,0.777778,0.388889,0.888889,1.0,0.9,0.9,0.9,0.25,0.7,0.666667,0.555556,0.9,0.95,0.9,1.0,0.611111,0.9,0.0
max,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.777778,1.0,1.0,1.0,1.0,1.0,0.571429,0.9,0.75,0.888889,1.0,1.0,1.0,1.0,1.0,1.0,0.0


Cluster  1


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0,43.0
mean,0.0,0.665698,0.819767,0.79402,0.787791,0.758721,0.361757,0.268734,0.366925,0.873385,0.837209,0.830233,0.574419,0.214286,0.681395,0.633721,0.648579,0.85814,0.874419,0.813953,0.863049,0.824289,0.634884,1.0
std,0.0,0.176188,0.179496,0.129672,0.169327,0.150385,0.216993,0.168391,0.209328,0.129625,0.129142,0.135462,0.226876,0.142857,0.209598,0.112304,0.243589,0.148376,0.138173,0.130167,0.123245,0.14392,0.225604,0.0
min,0.0,0.0,0.25,0.571429,0.375,0.375,0.0,0.0,0.0,0.444444,0.6,0.6,0.1,0.071429,0.1,0.333333,0.0,0.4,0.4,0.5,0.555556,0.444444,0.1,1.0
25%,0.0,0.625,0.75,0.714286,0.6875,0.625,0.222222,0.111111,0.222222,0.777778,0.7,0.7,0.45,0.071429,0.6,0.583333,0.555556,0.8,0.8,0.7,0.777778,0.722222,0.5,1.0
50%,0.0,0.75,0.875,0.857143,0.875,0.75,0.333333,0.222222,0.444444,0.888889,0.8,0.8,0.6,0.142857,0.7,0.666667,0.666667,0.9,0.9,0.8,0.888889,0.888889,0.7,1.0
75%,0.0,0.75,0.875,0.857143,0.875,0.875,0.444444,0.444444,0.5,1.0,0.95,1.0,0.7,0.321429,0.85,0.75,0.777778,1.0,1.0,0.9,1.0,0.888889,0.8,1.0
max,0.0,1.0,1.0,1.0,1.0,1.0,0.777778,0.777778,0.777778,1.0,1.0,1.0,1.0,0.571429,1.0,0.75,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


Cluster  2


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0
mean,0.0,0.628676,0.790441,0.705882,0.621324,0.613971,0.251634,0.137255,0.686275,0.784314,0.723529,0.717647,0.55,0.176471,0.564706,0.553922,0.411765,0.770588,0.761765,0.617647,0.686275,0.457516,0.505882,2.0
std,0.0,0.128678,0.156066,0.110887,0.185878,0.214016,0.175807,0.16875,0.198141,0.14459,0.149866,0.156613,0.217771,0.135338,0.243562,0.162683,0.211048,0.174997,0.130302,0.174895,0.167441,0.223506,0.266221,0.0
min,0.0,0.375,0.375,0.428571,0.25,0.125,0.0,0.0,0.333333,0.444444,0.5,0.5,0.1,0.0,0.1,0.166667,0.0,0.3,0.3,0.2,0.333333,0.0,0.1,2.0
25%,0.0,0.53125,0.75,0.607143,0.5,0.53125,0.111111,0.0,0.555556,0.666667,0.6,0.6,0.4,0.071429,0.425,0.5,0.333333,0.7,0.7,0.5,0.555556,0.25,0.3,2.0
50%,0.0,0.625,0.8125,0.714286,0.625,0.625,0.222222,0.111111,0.666667,0.777778,0.75,0.7,0.6,0.142857,0.65,0.583333,0.444444,0.8,0.8,0.6,0.666667,0.555556,0.5,2.0
75%,0.0,0.75,0.875,0.714286,0.75,0.75,0.333333,0.222222,0.861111,0.888889,0.8,0.8,0.7,0.267857,0.775,0.666667,0.555556,0.9,0.8,0.7,0.777778,0.638889,0.7,2.0
max,0.0,0.875,1.0,0.857143,1.0,0.875,0.555556,0.666667,1.0,1.0,1.0,1.0,0.9,0.571429,0.9,0.75,0.777778,1.0,1.0,1.0,1.0,0.777778,1.0,2.0


Cluster  3


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0
mean,0.0,0.621875,0.7875,0.725,0.734375,0.75625,0.677778,0.608333,0.686111,0.683333,0.525,0.5,0.43,0.2375,0.57,0.485417,0.688889,0.655,0.82,0.69,0.758333,0.647222,0.3725,3.0
std,0.0,0.161343,0.174954,0.178333,0.139216,0.162488,0.24108,0.262681,0.193086,0.143554,0.161325,0.176867,0.175704,0.186114,0.211466,0.175906,0.169165,0.203747,0.13625,0.175119,0.150734,0.235887,0.234234,0.0
min,0.0,0.125,0.0,0.285714,0.375,0.5,0.111111,0.0,0.111111,0.333333,0.1,0.1,0.1,0.071429,0.1,0.083333,0.333333,0.1,0.6,0.3,0.333333,0.111111,0.1,3.0
25%,0.0,0.5,0.75,0.571429,0.625,0.625,0.527778,0.527778,0.555556,0.555556,0.475,0.4,0.3,0.071429,0.475,0.395833,0.555556,0.6,0.7,0.6,0.666667,0.444444,0.2,3.0
50%,0.0,0.625,0.8125,0.714286,0.75,0.75,0.666667,0.666667,0.666667,0.666667,0.5,0.5,0.4,0.214286,0.6,0.5,0.666667,0.7,0.8,0.7,0.777778,0.666667,0.35,3.0
75%,0.0,0.75,0.875,0.857143,0.78125,0.875,0.888889,0.777778,0.777778,0.777778,0.6,0.6,0.6,0.285714,0.7,0.583333,0.805556,0.725,0.925,0.8,0.888889,0.888889,0.5,3.0
max,0.0,0.875,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8,0.9,0.8,1.0,0.9,0.75,1.0,1.0,1.0,1.0,1.0,1.0,0.9,3.0


Cluster  4


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0
mean,0.0,0.62931,0.784483,0.778325,0.715517,0.612069,0.758621,0.367816,0.601533,0.662835,0.658621,0.613793,0.748276,0.231527,0.52069,0.577586,0.252874,0.493103,0.668966,0.541379,0.597701,0.348659,0.303448,4.0
std,0.0,0.165301,0.173258,0.12996,0.159857,0.227429,0.139503,0.300064,0.204701,0.20027,0.168008,0.204807,0.163927,0.166046,0.249828,0.180842,0.205516,0.240433,0.173418,0.161505,0.195662,0.223654,0.236768,0.0
min,0.0,0.375,0.25,0.571429,0.375,0.125,0.444444,0.0,0.222222,0.111111,0.2,0.2,0.3,0.071429,0.1,0.083333,0.0,0.1,0.3,0.2,0.222222,0.0,0.1,4.0
25%,0.0,0.5,0.75,0.714286,0.625,0.375,0.777778,0.0,0.444444,0.555556,0.6,0.5,0.7,0.071429,0.3,0.5,0.111111,0.3,0.5,0.5,0.444444,0.111111,0.1,4.0
50%,0.0,0.625,0.75,0.857143,0.75,0.625,0.777778,0.333333,0.555556,0.666667,0.7,0.7,0.8,0.214286,0.6,0.666667,0.222222,0.5,0.7,0.6,0.555556,0.333333,0.2,4.0
75%,0.0,0.75,0.875,0.857143,0.75,0.75,0.888889,0.555556,0.777778,0.777778,0.8,0.7,0.9,0.357143,0.7,0.666667,0.444444,0.7,0.8,0.7,0.666667,0.555556,0.4,4.0
max,0.0,0.875,1.0,1.0,1.0,1.0,1.0,0.888889,0.888889,1.0,0.9,0.9,1.0,0.642857,0.9,1.0,0.666667,0.9,1.0,0.8,1.0,0.888889,1.0,4.0


Cluster  5


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0,20.0
mean,0.0,0.7625,0.6875,0.814286,0.8,0.75,0.472222,0.216667,0.677778,0.844444,0.69,0.65,0.3,0.146429,0.7,0.554167,0.588889,0.605,0.69,0.41,0.583333,0.805556,0.4,5.0
std,0.0,0.139901,0.227616,0.154419,0.130787,0.225803,0.285848,0.217717,0.233347,0.17807,0.257314,0.260566,0.202614,0.104863,0.174718,0.199185,0.239395,0.223548,0.197084,0.144732,0.215919,0.186883,0.242791,0.0
min,0.0,0.5,0.125,0.428571,0.5,0.25,0.0,0.0,0.111111,0.444444,0.3,0.2,0.1,0.071429,0.3,0.083333,0.111111,0.2,0.2,0.1,0.0,0.444444,0.1,5.0
25%,0.0,0.625,0.625,0.714286,0.75,0.625,0.222222,0.0,0.555556,0.75,0.475,0.5,0.175,0.071429,0.6,0.479167,0.444444,0.5,0.575,0.3,0.444444,0.666667,0.275,5.0
50%,0.0,0.75,0.6875,0.785714,0.75,0.75,0.444444,0.166667,0.666667,0.888889,0.75,0.65,0.3,0.107143,0.7,0.583333,0.555556,0.6,0.75,0.4,0.666667,0.777778,0.35,5.0
75%,0.0,0.875,0.875,1.0,0.875,0.90625,0.694444,0.333333,0.805556,1.0,0.9,0.825,0.325,0.160714,0.8,0.6875,0.777778,0.8,0.8,0.5,0.694444,1.0,0.45,5.0
max,0.0,1.0,1.0,1.0,1.0,1.0,1.0,0.777778,1.0,1.0,1.0,1.0,0.9,0.428571,0.9,0.75,1.0,1.0,1.0,0.6,0.888889,1.0,0.9,5.0


Cluster  6


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0,26.0
mean,0.0,0.5625,0.745192,0.681319,0.668269,0.629808,0.145299,0.098291,0.145299,0.773504,0.826923,0.792308,0.492308,0.181319,0.538462,0.583333,0.405983,0.85,0.857692,0.757692,0.833333,0.495726,0.438462,6.0
std,0.0,0.194454,0.181937,0.186516,0.19982,0.233133,0.1626,0.118954,0.14663,0.250394,0.175631,0.193748,0.311028,0.178692,0.260886,0.169967,0.260943,0.127279,0.14744,0.157919,0.130526,0.328454,0.265446,0.0
min,0.0,0.125,0.375,0.285714,0.0,0.125,0.0,0.0,0.0,0.0,0.2,0.2,0.1,0.0,0.1,0.25,0.0,0.5,0.3,0.5,0.555556,0.0,0.1,6.0
25%,0.0,0.40625,0.625,0.571429,0.625,0.5,0.0,0.0,0.0,0.777778,0.8,0.7,0.2,0.071429,0.325,0.5,0.25,0.8,0.8,0.625,0.777778,0.25,0.2,6.0
50%,0.0,0.625,0.75,0.714286,0.75,0.625,0.111111,0.055556,0.111111,0.833333,0.9,0.8,0.45,0.071429,0.6,0.625,0.388889,0.9,0.9,0.8,0.777778,0.444444,0.5,6.0
75%,0.0,0.625,0.875,0.821429,0.75,0.75,0.222222,0.194444,0.222222,0.972222,0.9,0.9,0.7,0.321429,0.7,0.75,0.555556,0.9,0.975,0.875,0.972222,0.777778,0.6,6.0
max,0.0,0.875,1.0,1.0,0.875,1.0,0.555556,0.333333,0.444444,1.0,1.0,1.0,1.0,0.571429,1.0,0.75,1.0,1.0,1.0,1.0,1.0,1.0,1.0,6.0


Cluster  7


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0,41.0
mean,0.0,0.704268,0.865854,0.787456,0.786585,0.753049,0.745257,0.647696,0.804878,0.878049,0.841463,0.814634,0.768293,0.407666,0.631707,0.605691,0.753388,0.865854,0.890244,0.84878,0.888889,0.753388,0.607317,7.0
std,0.0,0.149886,0.12308,0.169399,0.177315,0.188514,0.208232,0.187417,0.173526,0.104836,0.109489,0.127595,0.188996,0.184366,0.262144,0.125034,0.165761,0.106324,0.111366,0.130618,0.140546,0.17658,0.225156,0.0
min,0.0,0.125,0.625,0.0,0.375,0.125,0.222222,0.111111,0.444444,0.666667,0.5,0.5,0.1,0.071429,0.1,0.25,0.333333,0.7,0.5,0.6,0.555556,0.333333,0.1,7.0
25%,0.0,0.625,0.75,0.714286,0.625,0.625,0.666667,0.444444,0.666667,0.777778,0.8,0.7,0.7,0.357143,0.5,0.5,0.666667,0.8,0.8,0.8,0.777778,0.666667,0.5,7.0
50%,0.0,0.75,0.875,0.857143,0.875,0.75,0.777778,0.666667,0.777778,0.888889,0.8,0.8,0.8,0.428571,0.7,0.666667,0.777778,0.9,0.9,0.9,1.0,0.777778,0.6,7.0
75%,0.0,0.75,1.0,0.857143,0.875,0.875,0.888889,0.777778,1.0,1.0,0.9,0.9,0.9,0.5,0.8,0.666667,0.888889,1.0,1.0,1.0,1.0,0.888889,0.8,7.0
max,0.0,0.875,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.75,1.0,1.0,1.0,1.0,1.0,1.0,1.0,7.0


In [11]:
for i in range(0, 8):
    print 'Cluster ', str(i)
    display(men_to_cluster[men_to_cluster['cluster'] == i].describe())

Cluster  0


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0
mean,1.0,0.689103,0.737179,0.791209,0.807692,0.785256,0.863248,0.749288,0.709402,0.666667,0.44359,0.394872,0.410256,0.324176,0.641026,0.429487,0.51567,0.417949,0.746154,0.648718,0.783476,0.492877,0.346154,0.0
std,0.0,0.143017,0.145646,0.142278,0.114155,0.162091,0.142978,0.229016,0.1918,0.21327,0.129361,0.139451,0.222184,0.157046,0.242485,0.188812,0.238426,0.194492,0.187569,0.195841,0.165097,0.25208,0.257361,0.0
min,1.0,0.375,0.375,0.571429,0.625,0.375,0.555556,0.0,0.222222,0.111111,0.2,0.2,0.0,0.071429,0.1,0.0,0.0,0.1,0.2,0.2,0.444444,0.0,0.1,0.0
25%,1.0,0.625,0.625,0.714286,0.75,0.625,0.777778,0.666667,0.555556,0.555556,0.3,0.3,0.25,0.214286,0.55,0.333333,0.333333,0.3,0.65,0.55,0.666667,0.333333,0.1,0.0
50%,1.0,0.625,0.75,0.857143,0.75,0.75,0.888889,0.777778,0.666667,0.777778,0.4,0.4,0.4,0.357143,0.7,0.5,0.555556,0.4,0.7,0.7,0.777778,0.444444,0.3,0.0
75%,1.0,0.75,0.875,0.857143,0.875,0.875,1.0,0.888889,0.833333,0.777778,0.5,0.45,0.6,0.428571,0.8,0.583333,0.666667,0.5,0.9,0.8,0.944444,0.666667,0.5,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8,0.8,1.0,0.571429,1.0,0.666667,1.0,0.8,1.0,1.0,1.0,1.0,1.0,0.0


Cluster  1


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0,34.0
mean,1.0,0.5,0.724265,0.710084,0.569853,0.544118,0.287582,0.169935,0.352941,0.676471,0.608824,0.558824,0.435294,0.32563,0.411765,0.580882,0.660131,0.647059,0.841176,0.626471,0.696078,0.519608,0.302941,1.0
std,0.0,0.171336,0.184071,0.177544,0.211397,0.273312,0.181686,0.183973,0.21271,0.212529,0.179844,0.234995,0.242315,0.187084,0.25198,0.166016,0.14459,0.163735,0.104787,0.197421,0.197835,0.214897,0.211037,0.0
min,1.0,0.0,0.25,0.142857,0.125,0.0,0.0,0.0,0.111111,0.0,0.2,0.1,0.1,0.071429,0.1,0.166667,0.444444,0.4,0.6,0.2,0.333333,0.111111,0.0,1.0
25%,1.0,0.375,0.625,0.571429,0.5,0.375,0.111111,0.0,0.138889,0.555556,0.5,0.325,0.3,0.142857,0.2,0.4375,0.555556,0.5,0.8,0.5,0.555556,0.444444,0.125,1.0
50%,1.0,0.5625,0.75,0.714286,0.625,0.5,0.222222,0.111111,0.333333,0.666667,0.6,0.5,0.35,0.357143,0.4,0.583333,0.666667,0.65,0.85,0.6,0.777778,0.555556,0.2,1.0
75%,1.0,0.625,0.875,0.857143,0.75,0.75,0.444444,0.305556,0.444444,0.777778,0.7,0.7,0.6,0.428571,0.6,0.666667,0.777778,0.775,0.9,0.8,0.861111,0.666667,0.475,1.0
max,1.0,0.75,1.0,1.0,1.0,1.0,0.666667,0.666667,0.888889,1.0,1.0,1.0,1.0,0.714286,0.9,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.8,1.0


Cluster  2


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0,31.0
mean,1.0,0.701613,0.705645,0.797235,0.717742,0.826613,0.677419,0.150538,0.544803,0.802867,0.841935,0.874194,0.703226,0.322581,0.63871,0.577957,0.27957,0.73871,0.822581,0.819355,0.860215,0.594982,0.493548,2.0
std,0.0,0.146762,0.2202,0.176026,0.185314,0.169796,0.244031,0.139217,0.28161,0.183046,0.108855,0.106357,0.224327,0.206914,0.233349,0.128984,0.218244,0.178283,0.164742,0.157944,0.206813,0.20397,0.267002,0.0
min,1.0,0.375,0.0,0.285714,0.125,0.375,0.111111,0.0,0.0,0.333333,0.6,0.6,0.1,0.071429,0.1,0.333333,0.0,0.3,0.3,0.5,0.0,0.222222,0.1,2.0
25%,1.0,0.625,0.625,0.714286,0.625,0.75,0.555556,0.0,0.333333,0.666667,0.8,0.8,0.6,0.142857,0.5,0.5,0.111111,0.6,0.7,0.7,0.833333,0.444444,0.3,2.0
50%,1.0,0.75,0.75,0.857143,0.75,0.875,0.777778,0.111111,0.444444,0.888889,0.9,0.9,0.7,0.285714,0.7,0.583333,0.333333,0.7,0.9,0.8,0.888889,0.666667,0.5,2.0
75%,1.0,0.75,0.875,0.857143,0.875,1.0,0.888889,0.222222,0.777778,0.888889,0.9,1.0,0.85,0.428571,0.8,0.666667,0.444444,0.9,0.9,0.95,1.0,0.666667,0.7,2.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.555556,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.75,0.777778,1.0,1.0,1.0,1.0,1.0,1.0,2.0


Cluster  3


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0,17.0
mean,1.0,0.522059,0.713235,0.739496,0.477941,0.617647,0.705882,0.431373,0.542484,0.522876,0.305882,0.252941,0.570588,0.273109,0.411765,0.446078,0.339869,0.294118,0.582353,0.317647,0.48366,0.156863,0.223529,3.0
std,0.0,0.154587,0.228637,0.215682,0.213029,0.288011,0.211335,0.241786,0.248086,0.234931,0.124853,0.117886,0.225734,0.147762,0.214716,0.134826,0.249909,0.139062,0.203824,0.138,0.23551,0.136416,0.185504,0.0
min,1.0,0.25,0.25,0.285714,0.0,0.0,0.444444,0.0,0.111111,0.0,0.1,0.1,0.1,0.071429,0.1,0.166667,0.0,0.1,0.2,0.1,0.111111,0.0,0.1,3.0
25%,1.0,0.375,0.625,0.714286,0.375,0.5,0.555556,0.333333,0.444444,0.444444,0.3,0.2,0.5,0.214286,0.3,0.333333,0.111111,0.2,0.4,0.2,0.333333,0.111111,0.1,3.0
50%,1.0,0.5,0.75,0.714286,0.5,0.625,0.666667,0.444444,0.555556,0.444444,0.3,0.3,0.6,0.214286,0.4,0.416667,0.222222,0.3,0.6,0.3,0.444444,0.111111,0.1,3.0
75%,1.0,0.625,0.875,0.857143,0.625,0.875,0.888889,0.555556,0.666667,0.666667,0.4,0.3,0.7,0.357143,0.5,0.583333,0.555556,0.4,0.8,0.4,0.666667,0.222222,0.3,3.0
max,1.0,0.75,1.0,1.0,0.75,1.0,1.0,0.777778,1.0,0.888889,0.5,0.5,1.0,0.714286,0.9,0.666667,0.888889,0.5,0.9,0.6,0.888889,0.555556,0.6,3.0


Cluster  4


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0
mean,1.0,0.708333,0.858974,0.85348,0.798077,0.766026,0.74359,0.754986,0.692308,0.871795,0.841026,0.802564,0.579487,0.434066,0.646154,0.566239,0.689459,0.784615,0.882051,0.810256,0.843305,0.669516,0.533333,4.0
std,0.0,0.147159,0.125589,0.124743,0.153403,0.184024,0.223903,0.201004,0.20624,0.143153,0.114059,0.138578,0.2515,0.218249,0.245856,0.171188,0.213543,0.156505,0.11209,0.160255,0.141279,0.245144,0.262912,0.0
min,1.0,0.375,0.625,0.571429,0.375,0.125,0.222222,0.333333,0.222222,0.444444,0.5,0.5,0.1,0.071429,0.1,0.083333,0.222222,0.3,0.5,0.2,0.444444,0.0,0.1,4.0
25%,1.0,0.625,0.75,0.714286,0.75,0.625,0.611111,0.611111,0.555556,0.777778,0.8,0.7,0.4,0.321429,0.5,0.5,0.555556,0.7,0.8,0.75,0.777778,0.555556,0.35,4.0
50%,1.0,0.625,0.875,0.857143,0.75,0.75,0.777778,0.777778,0.666667,0.888889,0.8,0.8,0.6,0.428571,0.7,0.583333,0.666667,0.8,0.9,0.8,0.888889,0.666667,0.5,4.0
75%,1.0,0.75,1.0,1.0,0.875,0.875,0.944444,0.944444,0.777778,1.0,0.9,0.9,0.8,0.571429,0.8,0.666667,0.777778,0.9,1.0,0.9,1.0,0.833333,0.7,4.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.75,1.0,1.0,1.0,1.0,1.0,1.0,1.0,4.0


Cluster  5


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0,28.0
mean,1.0,0.477679,0.763393,0.709184,0.602679,0.433036,0.757937,0.678571,0.373016,0.611111,0.703571,0.671429,0.585714,0.329082,0.489286,0.53869,0.424603,0.560714,0.717857,0.710714,0.718254,0.234127,0.25,5.0
std,0.0,0.217981,0.15343,0.152986,0.204276,0.251114,0.17386,0.189672,0.267444,0.208419,0.157485,0.127242,0.27041,0.173161,0.209655,0.136991,0.23629,0.202465,0.178582,0.170705,0.182655,0.205857,0.189541,0.0
min,1.0,0.125,0.5,0.428571,0.125,0.0,0.333333,0.222222,0.0,0.111111,0.3,0.4,0.1,0.071429,0.1,0.166667,0.0,0.1,0.2,0.4,0.444444,0.0,0.1,5.0
25%,1.0,0.25,0.625,0.571429,0.5,0.34375,0.666667,0.555556,0.194444,0.555556,0.6,0.6,0.4,0.196429,0.375,0.479167,0.222222,0.475,0.7,0.6,0.555556,0.111111,0.1,5.0
50%,1.0,0.5,0.75,0.714286,0.625,0.375,0.777778,0.666667,0.388889,0.611111,0.7,0.7,0.6,0.357143,0.45,0.541667,0.444444,0.6,0.75,0.7,0.666667,0.111111,0.2,5.0
75%,1.0,0.625,0.875,0.857143,0.75,0.625,0.888889,0.888889,0.555556,0.777778,0.8,0.8,0.8,0.5,0.625,0.666667,0.583333,0.7,0.8,0.8,0.888889,0.333333,0.3,5.0
max,1.0,0.75,1.0,1.0,1.0,0.875,1.0,0.888889,1.0,1.0,1.0,0.9,1.0,0.642857,0.9,0.75,0.777778,1.0,1.0,1.0,1.0,0.777778,0.8,5.0


Cluster  6


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0,39.0
mean,1.0,0.573718,0.75641,0.772894,0.586538,0.63141,0.498575,0.148148,0.498575,0.678063,0.697436,0.65641,0.558974,0.159341,0.489744,0.570513,0.108262,0.646154,0.761538,0.682051,0.754986,0.139601,0.392308,6.0
std,0.0,0.13966,0.20867,0.141793,0.201054,0.210632,0.244327,0.15365,0.275572,0.181673,0.144162,0.202348,0.251017,0.112893,0.267341,0.171571,0.13842,0.163588,0.153238,0.216274,0.187628,0.172355,0.320677,0.0
min,1.0,0.25,0.125,0.571429,0.0,0.125,0.0,0.0,0.111111,0.333333,0.4,0.2,0.1,0.071429,0.1,0.083333,0.0,0.3,0.5,0.2,0.333333,0.0,0.1,6.0
25%,1.0,0.5,0.625,0.714286,0.5,0.5,0.333333,0.0,0.333333,0.555556,0.6,0.5,0.4,0.071429,0.3,0.5,0.0,0.5,0.65,0.6,0.666667,0.0,0.1,6.0
50%,1.0,0.625,0.75,0.714286,0.625,0.625,0.555556,0.111111,0.444444,0.666667,0.7,0.7,0.6,0.142857,0.5,0.583333,0.111111,0.7,0.8,0.7,0.777778,0.111111,0.2,6.0
75%,1.0,0.625,0.875,0.857143,0.75,0.75,0.666667,0.222222,0.666667,0.777778,0.8,0.8,0.8,0.214286,0.7,0.666667,0.166667,0.75,0.9,0.85,0.888889,0.222222,0.7,6.0
max,1.0,0.75,1.0,1.0,0.875,1.0,1.0,0.555556,1.0,1.0,1.0,1.0,1.0,0.571429,0.9,0.75,0.444444,1.0,1.0,1.0,1.0,0.666667,1.0,6.0


Cluster  7


Unnamed: 0,gender,attr3_1,sinc3_1,intel3_1,fun3_1,amb3_1,sports,tvsports,exercise,dining,museums,art,hiking,gaming,clubbing,reading,tv,theater,movies,concerts,music,shopping,yoga,cluster
count,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0,46.0
mean,1.0,0.665761,0.793478,0.819876,0.774457,0.793478,0.804348,0.415459,0.748792,0.746377,0.634783,0.597826,0.641304,0.343168,0.641304,0.523551,0.408213,0.602174,0.706522,0.515217,0.707729,0.384058,0.371739,7.0
std,0.0,0.144468,0.149577,0.129309,0.140978,0.142442,0.15919,0.261299,0.154378,0.151196,0.140186,0.16798,0.225661,0.164696,0.196159,0.180588,0.221052,0.135793,0.165196,0.161888,0.181985,0.199734,0.227706,0.0
min,1.0,0.375,0.5,0.571429,0.5,0.375,0.333333,0.0,0.444444,0.333333,0.4,0.1,0.1,0.071429,0.1,0.083333,0.0,0.3,0.3,0.2,0.333333,0.0,0.1,7.0
25%,1.0,0.625,0.65625,0.714286,0.625,0.75,0.666667,0.111111,0.666667,0.666667,0.5,0.5,0.5,0.214286,0.6,0.416667,0.222222,0.5,0.6,0.4,0.555556,0.222222,0.125,7.0
50%,1.0,0.625,0.75,0.857143,0.75,0.8125,0.888889,0.444444,0.777778,0.777778,0.6,0.6,0.7,0.357143,0.65,0.583333,0.444444,0.6,0.7,0.5,0.777778,0.333333,0.4,7.0
75%,1.0,0.75,0.875,0.964286,0.875,0.875,0.888889,0.666667,0.888889,0.888889,0.7,0.7,0.8,0.5,0.8,0.666667,0.555556,0.7,0.8,0.6,0.777778,0.527778,0.5,7.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.888889,1.0,1.0,0.9,1.0,1.0,0.642857,0.9,1.0,0.888889,0.9,1.0,0.8,1.0,0.777778,0.9,7.0


<h3>Stats and Frequency Charts for Females</h3>

In [12]:
#fc.dating_attributes_vs_time(data = unique, gender = 0)

<h3>Stats and Frequency Charts for Males</h3>

In [13]:
#fc.dating_attributes_vs_time(data = unique, gender = 1)

<h2>Create Matched People DataFrame</h2>

In [14]:
people_matched = data[data['match'] == 1].copy()
people_matched.drop_duplicates(subset = 'iid', keep = 'first', inplace = True)
#display(people_matched)

<h2>Exploring Matches</h2>

In [15]:
#people_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h2>Get Index for 'iid' for non-matches</h2>

In [16]:
number = [int(i) for i in people_matched['iid']]
not_ever_matched = [i for i in range(1,553) if i not in number]
print not_ever_matched

[3, 11, 12, 21, 24, 25, 26, 32, 33, 40, 41, 42, 54, 59, 65, 68, 72, 73, 88, 96, 101, 111, 118, 121, 123, 124, 131, 133, 139, 143, 145, 158, 170, 177, 182, 189, 198, 203, 204, 209, 216, 222, 234, 236, 247, 249, 254, 255, 257, 262, 267, 272, 278, 286, 287, 295, 298, 302, 314, 318, 320, 321, 327, 329, 331, 334, 347, 405, 418, 425, 427, 430, 440, 443, 444, 451, 454, 455, 457, 459, 461, 463, 465, 466, 477, 479, 483, 487, 497, 498, 502, 503, 506, 514, 517, 519, 520, 525, 527, 528, 543]


In [17]:
people_not_matched = data[data['iid'].isin(not_ever_matched)].copy()

<h2>Exploring Non-Matches</h2>

In [18]:
#people_not_matched[['iid', 'gender', 'dec'] + fc.features_of_attraction + fc.preferences_of_attraction + ['dec_o', 'pid', 'goal', 'int_corr', 'match']]

<h2>Non-Matched Females: Graphs</h2>

In [19]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 0)

<h2>Non-Matched Males: Graphs</h2>

In [20]:
#fc.dating_attributes_vs_time(data = people_not_matched, gender = 1)

<h1>Features</h1>

In [21]:
for i, j in fc.data_cleaner.iteritems():
    print i, j, '\n'
for i, j in fc.master_list.items():
    print i, j, '\n'
print 'clean_up_1', '\n', fc.clean_up_1, '\n'
print 'clean_up_2', '\n', fc.clean_up_2, '\n'
print 'clean_up_3', '\n', fc.clean_up_3, '\n'
print 'clean_up_4', '\n', fc.clean_up_4, '\n'
print 'clean_up_5', '\n', fc.clean_up_5, '\n'
print 'list_of_lists', '\n', fc.list_of_lists, '\n'
print 'all columns in dataset', '\n'
for i in data.keys():
    print i,

first_round ['attr1_1', 'sinc1_1', 'intel1_1', 'fun1_1', 'amb1_1', 'shar1_1', 'attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1', 'attr3_1', 'sinc3_1', 'intel3_1', 'fun3_1', 'amb3_1', 'attr4_1', 'sinc4_1', 'intel4_1', 'fun4_1', 'amb4_1', 'shar4_1', 'attr5_1', 'sinc5_1', 'intel5_1', 'fun5_1', 'amb5_1'] 

second_round ['attr1_2', 'sinc1_2', 'intel1_2', 'fun1_2', 'amb1_2', 'shar1_2', 'attr2_2', 'sinc2_2', 'intel2_2', 'fun2_2', 'amb2_2', 'shar2_2', 'attr3_2', 'sinc3_2', 'intel3_2', 'fun3_2', 'amb3_2', 'attr4_2', 'sinc4_2', 'intel4_2', 'fun4_2', 'amb4_2', 'shar4_2', 'attr5_2', 'sinc5_2', 'intel5_2', 'fun5_2', 'amb5_2'] 

third_round ['attr1_3', 'sinc1_3', 'intel1_3', 'fun1_3', 'amb1_3', 'shar1_3', 'attr2_3', 'sinc2_3', 'intel2_3', 'fun2_3', 'amb2_3', 'shar2_3', 'attr3_3', 'sinc3_3', 'intel3_3', 'fun3_3', 'amb3_3', 'attr4_3', 'sinc4_3', 'intel4_3', 'fun4_3', 'amb4_3', 'shar4_3', 'attr5_3', 'sinc5_3', 'intel5_3', 'fun5_3', 'amb5_3'] 

how_you_measure_attr ['attr3_1', 'attr3_2', '