# Problem Set 1: Titanic Survivor Data

## Predict whether or not the passengers survived or perished.

You can read more about the Titanic and specifics about this dataset at:

http://en.wikipedia.org/wiki/RMS_Titanic

http://www.kaggle.com/c/titanic-gettingStarted
        
More information about the data can be seen at:

http://www.kaggle.com/c/titanic-gettingStarted/data

https://s3.amazonaws.com/content.udacity-data.com/courses/ud359/titanic_data.csv

### Predictions in 'predictions' dictionary
The key of the dictionary is the passenger's ID (which can be accessed via passenger["PassengerId"]), and the associated value should be 1 if the passenger survived or 0 otherwise.

For example, if a passenger is predicted to have survived:

    passenger_id = passenger['PassengerId']
    predictions[passenger_id] = 1

And if a passenger is predicted to have perished in the disaster:

    passenger_id = passenger['PassengerId']
    predictions[passenger_id] = 0
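
To check how well a `predictions` dictionary performs, it can be compared against known outcomes. The sketch below assumes a labelled DataFrame with a `Survived` column (as in the Kaggle training data); the helper name `accuracy` and the four-row sample are made up for illustration and are not real passenger data.

```python
import pandas as pd

def accuracy(predictions, labelled_df):
    """Fraction of passengers whose prediction matches the 'Survived' column."""
    # Look up each passenger's prediction by PassengerId,
    # then compare it with the known outcome.
    predicted = labelled_df['PassengerId'].map(predictions)
    return (predicted == labelled_df['Survived']).mean()

# Hypothetical labelled sample (assumption: 'Survived' holds 0 or 1).
sample = pd.DataFrame({
    'PassengerId': [1, 2, 3, 4],
    'Survived': [0, 1, 1, 0],
})
sample_predictions = {1: 0, 2: 1, 3: 0, 4: 0}

print(accuracy(sample_predictions, sample))  # 3 of 4 predictions are correct
```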

In [20]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

import os

### Simple heuristic for predicting whether a passenger survived the Titanic disaster (78.68% accurate)
1) If the passenger is female, your heuristic should assume that the passenger survived.

2) If the passenger is male, your heuristic should assume that the passenger did not survive.

In [6]:
def simple_heuristic(file_path):

    predictions = {}
    df = pd.read_csv(file_path)
    print (df)
    
    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']
        
        if passenger['Sex'] == 'female':
            predictions[passenger_id] = 1
        else:
            # male, died 
            predictions[passenger_id] = 0
        
    return predictions
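
The same rule can also be written without an explicit loop. This is a sketch of one possible alternative, not the method the problem set asks for; the function name `simple_heuristic_vectorized` and the sample rows are hypothetical, and it takes a DataFrame rather than a file path.

```python
import pandas as pd

def simple_heuristic_vectorized(df):
    """Return {PassengerId: 1 if female else 0} using column operations."""
    # Boolean column (True for female) converted to 1/0.
    survived = (df['Sex'] == 'female').astype(int)
    return dict(zip(df['PassengerId'].tolist(), survived.tolist()))

# Hypothetical sample rows for illustration.
sample = pd.DataFrame({
    'PassengerId': [892, 893, 894],
    'Sex': ['male', 'female', 'male'],
})
print(simple_heuristic_vectorized(sample))
```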

In [7]:
file = os.path.join(os.getcwd(), 'test.csv')

pred_results_simple = simple_heuristic(file)

     PassengerId  Pclass                                               Name  \
0            892       3                                   Kelly, Mr. James   
1            893       3                   Wilkes, Mrs. James (Ellen Needs)   
2            894       2                          Myles, Mr. Thomas Francis   
3            895       3                                   Wirz, Mr. Albert   
4            896       3       Hirvonen, Mrs. Alexander (Helga E Lindqvist)   
5            897       3                         Svensson, Mr. Johan Cervin   
6            898       3                               Connolly, Miss. Kate   
7            899       2                       Caldwell, Mr. Albert Francis   
8            900       3          Abrahim, Mrs. Joseph (Sophie Halaut Easu)   
9            901       3                            Davies, Mr. John Samuel   
10           902       3                                   Ilieff, Mr. Ylio   
11           903       1                         Jon

In [8]:
print (pred_results_simple)

{892: 0, 893: 1, 894: 0, 895: 0, 896: 1, 897: 0, 898: 1, 899: 0, 900: 1, 901: 0, 902: 0, 903: 0, 904: 1, 905: 0, 906: 1, 907: 1, 908: 0, 909: 0, 910: 1, 911: 1, 912: 0, 913: 0, 914: 1, 915: 0, 916: 1, 917: 0, 918: 1, 919: 0, 920: 0, 921: 0, 922: 0, 923: 0, 924: 1, 925: 1, 926: 0, 927: 0, 928: 1, 929: 1, 930: 0, 931: 0, 932: 0, 933: 0, 934: 0, 935: 1, 936: 1, 937: 0, 938: 0, 939: 0, 940: 1, 941: 1, 942: 0, 943: 0, 944: 1, 945: 1, 946: 0, 947: 0, 948: 0, 949: 0, 950: 0, 951: 1, 952: 0, 953: 0, 954: 0, 955: 1, 956: 0, 957: 1, 958: 1, 959: 0, 960: 0, 961: 1, 962: 1, 963: 0, 964: 1, 965: 0, 966: 1, 967: 0, 968: 0, 969: 1, 970: 0, 971: 1, 972: 0, 973: 0, 974: 0, 975: 0, 976: 0, 977: 0, 978: 1, 979: 1, 980: 1, 981: 0, 982: 1, 983: 0, 984: 1, 985: 0, 986: 0, 987: 0, 988: 1, 989: 0, 990: 1, 991: 0, 992: 1, 993: 0, 994: 0, 995: 0, 996: 1, 997: 0, 998: 0, 999: 0, 1000: 0, 1001: 0, 1002: 0, 1003: 1, 1004: 1, 1005: 1, 1006: 1, 1007: 0, 1008: 0, 1009: 1, 1010: 0, 1011: 1, 1012: 1, 1013: 0, 1014: 1, 

### A more complex heuristic (79.12% accurate)
Predict that the passenger survived if:

1) the passenger is female, OR

2) the passenger's socioeconomic status is high (first class, Pclass == 1) AND the passenger is under 18.

Otherwise, predict that the passenger died.

In [9]:
def complex_heuristic(file_path):
    
    predictions = {}
    df = pd.read_csv(file_path)
    
    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']
        
        if passenger['Sex'] == 'female' or (passenger['Pclass'] == 1 and passenger['Age'] < 18):
            predictions[passenger_id] = 1
        else:
            predictions[passenger_id] = 0
        
    return predictions
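
One detail of the age test is worth noting: `Age` is missing for some passengers, and pandas reads those cells as `NaN`. Any ordering comparison with `NaN` evaluates to `False`, so a first-class passenger of unknown age is never counted as under 18. A minimal sketch with made-up rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a child, an adult, and a passenger of unknown age.
sample = pd.DataFrame({
    'Pclass': [1, 1, 1],
    'Age': [12.0, 40.0, np.nan],
})

# NaN < 18 is False, so the third row does not satisfy the rule.
under_18_first_class = (sample['Pclass'] == 1) & (sample['Age'] < 18)
print(under_18_first_class.tolist())  # [True, False, False]
```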

In [10]:
pred_results_complex = complex_heuristic(file)
print (pred_results_complex)

{892: 0, 893: 1, 894: 0, 895: 0, 896: 1, 897: 0, 898: 1, 899: 0, 900: 1, 901: 0, 902: 0, 903: 0, 904: 1, 905: 0, 906: 1, 907: 1, 908: 0, 909: 0, 910: 1, 911: 1, 912: 0, 913: 0, 914: 1, 915: 0, 916: 1, 917: 0, 918: 1, 919: 0, 920: 0, 921: 0, 922: 0, 923: 0, 924: 1, 925: 1, 926: 0, 927: 0, 928: 1, 929: 1, 930: 0, 931: 0, 932: 0, 933: 0, 934: 0, 935: 1, 936: 1, 937: 0, 938: 0, 939: 0, 940: 1, 941: 1, 942: 0, 943: 0, 944: 1, 945: 1, 946: 0, 947: 0, 948: 0, 949: 0, 950: 0, 951: 1, 952: 0, 953: 0, 954: 0, 955: 1, 956: 1, 957: 1, 958: 1, 959: 0, 960: 0, 961: 1, 962: 1, 963: 0, 964: 1, 965: 0, 966: 1, 967: 0, 968: 0, 969: 1, 970: 0, 971: 1, 972: 0, 973: 0, 974: 0, 975: 0, 976: 0, 977: 0, 978: 1, 979: 1, 980: 1, 981: 0, 982: 1, 983: 0, 984: 1, 985: 0, 986: 0, 987: 0, 988: 1, 989: 0, 990: 1, 991: 0, 992: 1, 993: 0, 994: 0, 995: 0, 996: 1, 997: 0, 998: 0, 999: 0, 1000: 0, 1001: 0, 1002: 0, 1003: 1, 1004: 1, 1005: 1, 1006: 1, 1007: 0, 1008: 0, 1009: 1, 1010: 0, 1011: 1, 1012: 1, 1013: 0, 1014: 1, 

In [11]:
def dicts_get_diff(dict1, dict2):
    diffkeys = [k for k in dict1 if dict1[k] != dict2[k]]
    for k in diffkeys:
        print ('Key', k, '- Dict1:', dict1[k], '-> Dict2:', dict2[k])
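
`dicts_get_diff` prints the differences and raises a `KeyError` if a key of `dict1` is missing from `dict2`. A variant that returns the differences instead could look like the sketch below; the name `dicts_diff` is hypothetical, and a missing key is reported as `None`.

```python
def dicts_diff(dict1, dict2):
    """Return {key: (value_in_dict1, value_in_dict2)} for keys whose values differ."""
    return {
        k: (dict1[k], dict2.get(k))
        for k in dict1
        if dict1[k] != dict2.get(k)
    }

print(dicts_diff({892: 0, 893: 1, 956: 0}, {892: 0, 893: 1, 956: 1}))  # {956: (0, 1)}
```

Returning a dictionary makes it easy to count the changed predictions with `len(...)`.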

In [21]:
dicts_get_diff(pred_results_simple,pred_results_complex)

Key 956 - Dict1: 0 -> Dict2: 1
Key 1088 - Dict1: 0 -> Dict2: 1
Key 1295 - Dict1: 0 -> Dict2: 1


In [22]:
def custom_heuristic(file_path):
    
    predictions = {}
    df = pd.read_csv(file_path)
    
    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']
        
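        # Survival is predicted for three groups:
        # women in first or second class,
        is_upper_class_woman = passenger['Sex'] == 'female' and passenger['Pclass'] < 3
        # first-class passengers under 18,
        is_first_class_minor = passenger['Pclass'] == 1 and passenger['Age'] < 18
        # and first- or second-class children under 10.
        is_upper_class_child = passenger['Pclass'] < 3 and passenger['Age'] < 10

        if is_upper_class_woman or is_first_class_minor or is_upper_class_child: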
            predictions[passenger_id] = 1
        else:
            predictions[passenger_id] = 0
        
    return predictions

In [23]:
df = pd.read_csv(file)

for passenger_index, passenger in df.iterrows():
    print (pd.isnull(passenger['Age']))

False
False
False
False
False
False
False
False
False
False
True
False
False
False
False
False
False
False
False
False
False
False
True
False
False
False
False
False
False
True
False
False
False
True
False
False
True
False
False
True
False
True
False
False
False
False
False
True
False
False
False
False
False
False
True
False
False
False
True
False
False
False
False
False
False
True
False
False
False
False
False
False
False
False
False
False
True
False
False
False
False
False
False
True
True
True
False
False
True
False
False
True
False
True
False
False
False
False
False
False
False
False
True
False
False
False
False
True
True
False
False
True
False
False
False
False
True
False
False
False
False
True
False
False
True
False
False
True
False
False
False
False
True
True
False
False
False
False
False
False
False
False
False
False
False
False
True
False
True
False
False
True
False
False
False
False
False
False
False
False
True
False
False
True
False
False
False
False
True
False
True
False
Fal
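
Printing one value per row makes the missing ages hard to count. As a sketch, the same check can be summarised with column operations; the four-row sample below is hypothetical and stands in for the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in the notebook this would be the DataFrame read from the CSV.
sample = pd.DataFrame({'Age': [34.5, np.nan, 62.0, np.nan]})

# isnull() marks missing ages; summing the booleans counts them.
missing = sample['Age'].isnull()
print(missing.sum(), 'of', len(sample), 'ages are missing')
```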

In [24]:
pred_results_custom = custom_heuristic(file)
print (pred_results_custom)

{892: 0, 893: 0, 894: 0, 895: 0, 896: 0, 897: 0, 898: 0, 899: 0, 900: 0, 901: 0, 902: 0, 903: 0, 904: 1, 905: 0, 906: 1, 907: 1, 908: 0, 909: 0, 910: 0, 911: 0, 912: 0, 913: 0, 914: 1, 915: 0, 916: 1, 917: 0, 918: 1, 919: 0, 920: 0, 921: 0, 922: 0, 923: 0, 924: 0, 925: 0, 926: 0, 927: 0, 928: 0, 929: 0, 930: 0, 931: 0, 932: 0, 933: 0, 934: 0, 935: 1, 936: 1, 937: 0, 938: 0, 939: 0, 940: 1, 941: 0, 942: 0, 943: 0, 944: 1, 945: 1, 946: 0, 947: 0, 948: 0, 949: 0, 950: 0, 951: 1, 952: 0, 953: 0, 954: 0, 955: 0, 956: 1, 957: 1, 958: 0, 959: 0, 960: 0, 961: 1, 962: 0, 963: 0, 964: 0, 965: 0, 966: 1, 967: 0, 968: 0, 969: 1, 970: 0, 971: 0, 972: 0, 973: 0, 974: 0, 975: 0, 976: 0, 977: 0, 978: 0, 979: 0, 980: 0, 981: 1, 982: 0, 983: 0, 984: 1, 985: 0, 986: 0, 987: 0, 988: 1, 989: 0, 990: 0, 991: 0, 992: 1, 993: 0, 994: 0, 995: 0, 996: 0, 997: 0, 998: 0, 999: 0, 1000: 0, 1001: 0, 1002: 0, 1003: 0, 1004: 1, 1005: 0, 1006: 1, 1007: 0, 1008: 0, 1009: 0, 1010: 0, 1011: 1, 1012: 1, 1013: 0, 1014: 1, 

In [25]:
dicts_get_diff(pred_results_complex,pred_results_custom)

Key 893 - Dict1: 1 -> Dict2: 0
Key 896 - Dict1: 1 -> Dict2: 0
Key 898 - Dict1: 1 -> Dict2: 0
Key 900 - Dict1: 1 -> Dict2: 0
Key 910 - Dict1: 1 -> Dict2: 0
Key 911 - Dict1: 1 -> Dict2: 0
Key 924 - Dict1: 1 -> Dict2: 0
Key 925 - Dict1: 1 -> Dict2: 0
Key 928 - Dict1: 1 -> Dict2: 0
Key 929 - Dict1: 1 -> Dict2: 0
Key 941 - Dict1: 1 -> Dict2: 0
Key 955 - Dict1: 1 -> Dict2: 0
Key 958 - Dict1: 1 -> Dict2: 0
Key 962 - Dict1: 1 -> Dict2: 0
Key 964 - Dict1: 1 -> Dict2: 0
Key 971 - Dict1: 1 -> Dict2: 0
Key 978 - Dict1: 1 -> Dict2: 0
Key 979 - Dict1: 1 -> Dict2: 0
Key 980 - Dict1: 1 -> Dict2: 0
Key 981 - Dict1: 0 -> Dict2: 1
Key 982 - Dict1: 1 -> Dict2: 0
Key 990 - Dict1: 1 -> Dict2: 0
Key 996 - Dict1: 1 -> Dict2: 0
Key 1003 - Dict1: 1 -> Dict2: 0
Key 1005 - Dict1: 1 -> Dict2: 0
Key 1009 - Dict1: 1 -> Dict2: 0
Key 1017 - Dict1: 1 -> Dict2: 0
Key 1019 - Dict1: 1 -> Dict2: 0
Key 1024 - Dict1: 1 -> Dict2: 0
Key 1030 - Dict1: 1 -> Dict2: 0
Key 1032 - Dict1: 1 -> Dict2: 0
Key 1045 - Dict1: 1 -> Dict2: 0