### About the _Titanic_ problem
    http://www.kaggle.com/c/titanic-gettingStarted
    http://www.kaggle.com/c/titanic-gettingStarted/data

### Dataset
    https://s3.amazonaws.com/content.udacity-data.com/courses/ud359/titanic_data.csv

# Quiz 1

Write a simple heuristic that uses each passenger's gender to predict whether that passenger survived the Titanic disaster.
    
Your prediction should be *78%* accurate or higher.
        
    1) If the passenger is female, your heuristic should assume that the passenger survived.
    2) If the passenger is male, your heuristic should assume that the passenger did not survive.

## Import the correct packages

In [1]:
import numpy
import pandas
import statsmodels.api as sm

## Write a simple decision tree predicting survival given gender

In [3]:
def simple_heuristic(file_path):
    """Predict survival from gender alone: females survive, everyone else does not."""
    predictions = {}
    df = pandas.read_csv(file_path)

    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']
        passengerIsFemale = passenger['Sex'] == 'female'

        # Use if/else so `survived` is always assigned for this passenger,
        # even if 'Sex' holds an unexpected or missing value.
        if passengerIsFemale:
            survived = 1
        else:
            survived = 0

        predictions[passenger_id] = survived

    return predictions

Your heuristic is **78.68%** accurate.
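
The grader that produces this figure is not shown here. A minimal sketch of how such a score could be computed, assuming the CSV has a `Survived` column of 0/1 labels; `accuracy` is a hypothetical helper name and the small DataFrame is made-up illustration data, not rows from the real dataset:

```python
import pandas

def accuracy(predictions, df):
    """Fraction of passengers whose predicted label matches df['Survived'].

    `predictions` maps PassengerId -> 0/1, as returned by simple_heuristic.
    Hypothetical helper; the course grader may compute the score differently.
    """
    predicted = df['PassengerId'].map(predictions)
    return (predicted == df['Survived']).mean()

# Made-up rows, for illustration only.
toy = pandas.DataFrame({
    'PassengerId': [1, 2, 3, 4],
    'Sex': ['male', 'female', 'female', 'male'],
    'Survived': [0, 1, 0, 0],
})
toy_predictions = {1: 0, 2: 1, 3: 1, 4: 0}  # the "females survive" rule
toy_accuracy = accuracy(toy_predictions, toy)  # 3 of 4 correct
```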

# Quiz 2

You need to write a more sophisticated algorithm that uses the passengers' gender, socioeconomic class, and age to predict whether they survived the Titanic disaster.
    
Your prediction should be 79% accurate or higher.
    
Here's the algorithm. Predict that the passenger survived if:
    1) the passenger is female, or
    2) the passenger's socioeconomic status is high AND the passenger is under 18.
    
Otherwise, your algorithm should predict that the passenger perished in the disaster.
    
You can access the socioeconomic status of a passenger via passenger['Pclass']:
* High socioeconomic status -- passenger['Pclass'] is 1
* Medium socioeconomic status -- passenger['Pclass'] is 2
* Low socioeconomic status -- passenger['Pclass'] is 3

You can access the age of a passenger via passenger['Age'].
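
Some rows may have no recorded age, in which case pandas reads the value as NaN. A small sketch of how such a value behaves in the "under 18" comparison; the passenger row below is made up for illustration:

```python
import pandas

# Made-up passenger row with a missing age (illustration only).
passenger = pandas.Series(
    {'PassengerId': 1, 'Sex': 'male', 'Pclass': 1, 'Age': float('nan')}
)

# Any ordering comparison against NaN is False, so a passenger with an
# unknown age is simply not treated as "under 18".
is_young = passenger['Age'] < 18

# pandas.isnull can be used to detect the missing value explicitly.
age_is_missing = pandas.isnull(passenger['Age'])
```

With this behaviour, a rule of the form "first class AND under 18" quietly falls through to the "did not survive" branch for passengers whose age is unknown.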

In [5]:
def complex_heuristic(file_path):
    """Predict survival for females and for first-class passengers under 18."""
    predictions = {}
    df = pandas.read_csv(file_path)
    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']

        passengerIsFemale = passenger['Sex'] == 'female'
        hasHighSocioEconomicStatus = passenger['Pclass'] == 1
        # A missing age (NaN) compares as False, so it never counts as young.
        isYoung = passenger['Age'] < 18

        # Use if/else so `survived` is always assigned for this passenger.
        if passengerIsFemale or (hasHighSocioEconomicStatus and isYoung):
            survived = 1
        else:
            survived = 0

        predictions[passenger_id] = survived
    return predictions

Your heuristic is **79.12%** accurate.
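
The same rule can also be written without an explicit loop. A sketch using pandas boolean masks, assuming the same column names; `complex_heuristic_vectorized` is a hypothetical name, and it takes a DataFrame rather than a file path so it can be tried on the made-up rows below:

```python
import pandas

def complex_heuristic_vectorized(df):
    """Return {PassengerId: 0/1} using the Quiz 2 rule, applied column-wise."""
    is_female = df['Sex'] == 'female'
    is_first_class = df['Pclass'] == 1
    is_under_18 = df['Age'] < 18  # rows with a NaN age compare as False
    survived = (is_female | (is_first_class & is_under_18)).astype(int)
    return dict(zip(df['PassengerId'], survived))

# Made-up rows, for illustration only.
toy = pandas.DataFrame({
    'PassengerId': [1, 2, 3, 4],
    'Sex': ['male', 'female', 'male', 'male'],
    'Pclass': [1, 3, 1, 3],
    'Age': [10.0, 30.0, 40.0, None],
})
toy_predictions = complex_heuristic_vectorized(toy)
```

Only passengers 1 (first class, under 18) and 2 (female) are predicted to survive in this toy data.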

# Quiz 3

For this exercise, you need to write a custom heuristic that takes in some combination of the passenger's attributes and predicts whether the passenger survived the Titanic disaster.

Can your custom heuristic beat 80% accuracy?
    
The available attributes are:
    Pclass          Passenger Class
                    (1 = 1st; 2 = 2nd; 3 = 3rd)
    Name            Name
    Sex             Sex
    Age             Age
    SibSp           Number of Siblings/Spouses Aboard
    Parch           Number of Parents/Children Aboard
    Ticket          Ticket Number
    Fare            Passenger Fare
    Cabin           Cabin
    Embarked        Port of Embarkation
                    (C = Cherbourg; Q = Queenstown; S = Southampton)
                    
SPECIAL NOTES:
* Pclass is a proxy for socioeconomic status (SES)
    1st ~ Upper; 2nd ~ Middle; 3rd ~ Lower

* Age is in years; fractional if age less than one
    If the age is estimated, it is in the form xx.5

* The family relation variables (i.e. SibSp and Parch) ignore some relations. The following are the definitions used for SibSp and Parch.

  * __Sibling__:  brother, sister, stepbrother, or stepsister of passenger aboard Titanic
  * __Spouse__:   husband or wife of passenger aboard Titanic (mistresses and fiancees ignored)
  * __Parent__:   mother or father of passenger aboard Titanic
  * __Child__:    son, daughter, stepson, or stepdaughter of passenger aboard Titanic
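
When choosing attributes for a custom rule, it can help to look at how the survival rate varies across the values of a single attribute. A sketch, assuming the data has a `Survived` column of 0/1 labels; `survival_rate_by` is a hypothetical helper and the rows are made up for illustration:

```python
import pandas

def survival_rate_by(df, column):
    """Mean of the 0/1 `Survived` column within each value of `column`."""
    return df.groupby(column)['Survived'].mean()

# Made-up rows, for illustration only (not the real dataset).
toy = pandas.DataFrame({
    'Pclass': [1, 1, 2, 3, 3, 3],
    'Survived': [1, 1, 1, 0, 0, 1],
})
rates = survival_rate_by(toy, 'Pclass')  # Series indexed by Pclass value
```

Attributes whose groups show very different rates are natural candidates for a condition in the heuristic.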

In [7]:
def custom_heuristic(file_path):
    """Predict survival from gender, fare, passenger class, and age."""
    predictions = {}
    df = pandas.read_csv(file_path)
    for passenger_index, passenger in df.iterrows():
        passenger_id = passenger['PassengerId']

        passengerIsFemale = passenger['Sex'] == 'female'

        # Pclass < 3 means 1st or 2nd class (high or medium socioeconomic status).
        isUpperOrMiddleClass = passenger['Pclass'] < 3
        # A missing age (NaN) compares as False, so it never counts as young.
        isYoung = passenger['Age'] < 18
        hadExpensiveTicket = passenger['Fare'] > 300

        # Use if/else so `survived` is always assigned for this passenger.
        if hadExpensiveTicket or passengerIsFemale or (isUpperOrMiddleClass and isYoung):
            survived = 1
        else:
            survived = 0
        predictions[passenger_id] = survived
    return predictions

Your heuristic is **80.13%** accurate.