## Domain

This is an introductory data set often described as the "hello world" of data science. It is the subject of an ongoing Kaggle competition that lets students of data science build a model and make a submission while they are still learning the subject.

## Problem

This is a binary classification problem: the challenge is to predict whether a passenger survived the sinking of the Titanic from data about that passenger, such as sex and age. Here, the task $T$ is binary classification, the experience $E$ is the list of passengers and their survival outcomes, and the performance measure $P$ is classification accuracy.
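
Classification accuracy, the measure computed by the `accuracy` helper in the benchmark section, is the fraction of passengers whose predicted outcome $\hat{y}_i$ matches the actual outcome $y_i$:

$$\text{accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\hat{y}_i = y_i\right]$$

where $n$ is the number of passengers and $\mathbf{1}[\cdot]$ is 1 when its argument is true and 0 otherwise.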

Note that `read.table` and `read.csv` are equivalent except for their default arguments. `read.table` defaults to `sep = ""` (split on any white space) and `header = FALSE`, while `read.csv` defaults to `sep = ","` (split on commas) and `header = TRUE`.
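
A small sketch of that equivalence, using a hypothetical three-line file written to a temporary path rather than `train.csv`. `read.csv` is documented as `read.table` with `header = TRUE`, `sep = ","`, `quote = "\""`, `dec = "."`, `fill = TRUE` and `comment.char = ""` preset, so passing those arguments explicitly should reproduce its result:

```r
# Write a tiny comma-separated file (toy data, not the Kaggle file).
path <- tempfile(fileext = ".csv")
writeLines(c("PassengerId,Sex,Survived",
             "1,male,0",
             "2,female,1"), path)

# read.csv with its defaults ...
via_csv <- read.csv(path)

# ... matches read.table called with read.csv's default arguments spelled out.
via_table <- read.table(path, header = TRUE, sep = ",", quote = "\"",
                        dec = ".", fill = TRUE, comment.char = "")
identical(via_csv, via_table)   # TRUE

# With read.table's own default (white-space separator), each line stays one field.
ncol(read.table(path, header = TRUE))   # 1
```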

## Solution

To solve this problem, we will build prediction vectors (1 = survived, 0 = died) by filtering the passenger data with logical conditions and using those conditions as masks.
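
A minimal sketch of filtering and masking on hypothetical toy vectors (the names `sex`, `age` and `prediction` are illustrative and not taken from `train.csv`):

```r
# Toy passenger attributes (hypothetical values).
sex <- c("male", "female", "female", "male")
age <- c(22, 38, 4, 7)

# Filtering: a comparison yields a logical mask with one element per passenger.
is_female <- sex == "female"          # FALSE TRUE TRUE FALSE

# Convert the mask to a 0/1 integer prediction vector (1 = survived).
prediction <- as.integer(is_female)   # 0 1 1 0

# Masking: overwrite only the elements selected by a second condition.
prediction[age < 10] <- 1L            # 0 1 1 1
prediction
```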

In [1]:
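# Load the Kaggle training data. stringsAsFactors = TRUE keeps Sex as a factor,
# because R >= 4.0.0 reads text columns as character vectors by default.
titanic <- read.table('train.csv', sep = ",", header = TRUE, stringsAsFactors = TRUE)
# Index rows by passenger ID, then drop columns that are not used as features.
rownames(titanic) <- titanic$PassengerId
titanic$PassengerId <- NULL
titanic$Name <- NULL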

## Data Exploration

In [2]:
summary(titanic$Sex)
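
`summary()` reports counts per level only for a factor. Since R 4.0.0, `read.table` and `read.csv` default to `stringsAsFactors = FALSE`, so a text column such as `Sex` may arrive as a character vector, for which `summary()` shows only its length and type. A toy comparison (hypothetical values); `table()` gives the counts for either type:

```r
sex_chr <- c("male", "female", "male")   # hypothetical values

summary(sex_chr)           # Length / Class / Mode only, no counts
summary(factor(sex_chr))   # female 1, male 2
table(sex_chr)             # same counts, works on character or factor input
```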

### Use a Proportion Table to Look at Survival by Gender

Each cell is the fraction of all passengers who fall into that sex and outcome combination, so the four cells sum to 1.

In [3]:
prop.table(table(titanic$Sex, titanic$Survived))

        
                  0          1
  female 0.09090909 0.26150393
  male   0.52525253 0.12233446
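
The underlying counts can be recovered by multiplying these proportions by the 891 passengers in `train.csv` (for example, 0.09090909 × 891 = 81). A sketch with those recovered counts, showing that `prop.table()` without a margin divides every cell by the grand total:

```r
# Counts recovered from the proportions above (891 passengers in total).
counts <- as.table(matrix(c(81, 468, 233, 109), nrow = 2,
                          dimnames = list(Sex = c("female", "male"),
                                          Survived = c("0", "1"))))

p <- prop.table(counts)
# No margin: each cell is divided by sum(counts), so all four cells sum to 1.
all.equal(as.vector(p), as.vector(counts / sum(counts)))   # TRUE
sum(p)                                                     # 1
p
```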

Passing `margin = 1` normalizes each row, giving the survival rate within each sex: about 74% of women survived, compared with about 19% of men.

In [4]:
prop.table(table(titanic$Sex, titanic$Survived), 1)

        
                 0         1
  female 0.2579618 0.7420382
  male   0.8110919 0.1889081
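
With the same counts recovered from the first table (81, 468, 233 and 109 passengers), `margin = 1` is equivalent to dividing each row by its own row total, so the female survival rate is 233 / 314. A sketch:

```r
counts <- as.table(matrix(c(81, 468, 233, 109), nrow = 2,
                          dimnames = list(Sex = c("female", "male"),
                                          Survived = c("0", "1"))))

by_sex <- prop.table(counts, 1)
# margin = 1: each row is divided by its row total, so every row sums to 1.
all.equal(as.vector(by_sex), as.vector(counts / rowSums(counts)))   # TRUE
rowSums(by_sex)                                                     # 1 1
by_sex["female", "1"]                                               # 233 / 314
```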

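Passing `margin = 2` normalizes each column instead, giving the sex breakdown within each outcome (for example, the fraction of survivors who were female).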

In [None]:
prop.table(table(titanic$Sex, titanic$Survived), 2)
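
Using the counts recovered from the first table, `margin = 2` divides each column by its column total. This sketch uses `sweep()` to make that explicit; with these counts, 233 of the 342 survivors were female:

```r
counts <- as.table(matrix(c(81, 468, 233, 109), nrow = 2,
                          dimnames = list(Sex = c("female", "male"),
                                          Survived = c("0", "1"))))

by_outcome <- prop.table(counts, 2)
# margin = 2: each column is divided by its column total, so every column sums to 1.
all.equal(as.vector(by_outcome),
          as.vector(sweep(counts, 2, colSums(counts), "/")))   # TRUE
colSums(by_outcome)                                            # 1 1
by_outcome["female", "1"]                                      # 233 / 342, about 0.68
```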

### Use a Proportion Table to Look at Survival of Children

Here, a child is any passenger whose `Age` is less than 10.

In [None]:
prop.table(table(titanic$Age < 10, titanic$Survived), 1)

In [None]:
prop.table(table(titanic$Age < 10, titanic$Survived), 2)
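
`Age` is missing for some passengers in `train.csv`, and a comparison such as `titanic$Age < 10` returns `NA` for them. `table()` drops `NA` by default, so the two tables above cover only passengers with a recorded age. A toy illustration with hypothetical values:

```r
age      <- c(4, 35, NA, 8, 61)   # hypothetical ages; NA marks a missing value
survived <- c(1, 0, 1, 0, 0)

age < 10                                       # TRUE FALSE NA TRUE FALSE
table(age < 10, survived)                      # the NA passenger is silently dropped
table(age < 10, survived, useNA = "ifany")     # keeps an explicit <NA> row
```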

## Benchmark Model

In [None]:
# Stop with an error if two vectors differ in length.
verify_length <- function(v1, v2) {
    if (length(v1) != length(v2)) {
        stop('lengths of vectors do not match')
    }
}

# Fraction of predictions that match the actual outcomes.
accuracy <- function(actual, predicted) {
    verify_length(actual, predicted)
    return(sum(actual == predicted) / length(actual))
}
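
A more compact variant, sketched here under the hypothetical name `accuracy_mean`, relies on the mean of a logical vector being the fraction of `TRUE` values. The explicit length check is still worth keeping: `==` silently recycles the shorter vector, and when one length is a multiple of the other (say 4 and 2) R does not even warn.

```r
# Hypothetical compact variant of the accuracy() helper.
accuracy_mean <- function(actual, predicted) {
    if (length(actual) != length(predicted)) {
        stop("lengths of vectors do not match")
    }
    # The comparison gives a logical vector; its mean is the share of matches.
    mean(actual == predicted)
}

accuracy_mean(c(0, 1, 1, 0), c(0, 1, 0, 0))   # 0.75

# Mismatched lengths raise an error instead of recycling.
msg <- tryCatch(accuracy_mean(c(0, 1, 1, 0), c(0, 1)),
                error = function(e) conditionMessage(e))
msg   # "lengths of vectors do not match"
```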

In [None]:
# Benchmark: predict that no passenger survived.
no_survivors <- rep(0, length(titanic$Survived))
accuracy(titanic$Survived, no_survivors)
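
Because this benchmark predicts 0 (died) for every passenger, its accuracy is the fraction of passengers who died, which can be read off the first proportion table by adding the two cells in column `0`:

$$\text{accuracy} = 0.09090909 + 0.52525253 = 0.61616162 \approx 61.6\%$$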

## Women Survived

In [None]:
# Predict survival (1) for every female passenger and death (0) for every male.
women_survived <- as.integer(titanic$Sex == 'female')
accuracy(titanic$Survived, women_survived)
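
A sketch of where this model's accuracy comes from, rebuilding 891 passengers from the counts recovered from the first proportion table (81 women and 468 men died; 233 women and 109 men survived). In a confusion matrix the correct predictions sit on the diagonal:

```r
# Rebuild the passengers from the recovered counts.
sex      <- rep(c("female", "male", "female", "male"), times = c(81, 468, 233, 109))
survived <- rep(c(0, 0, 1, 1),                         times = c(81, 468, 233, 109))

# The "women survived" rule: predict 1 for every female passenger.
predicted <- as.integer(sex == "female")

confusion <- table(actual = survived, predicted = predicted)
confusion
# Diagonal: 468 men correctly predicted dead + 233 women correctly predicted alive.
sum(diag(confusion)) / sum(confusion)   # 701 / 891, about 0.787
```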

## Women and Children Survived

In [None]:
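# Start from the women-only predictions and also predict survival for children under 10.
# Passengers with a missing Age give NA in the mask and keep their existing prediction.
women_and_children_survived <- women_survived
women_and_children_survived[titanic$Age < 10] <- 1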
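
Two R details make this masked assignment work even though `Age` has missing values. An `NA` in a logical index selects nothing on assignment, which R allows because the right-hand side has length one. And assigning the double `1` into a logical or integer vector promotes the whole vector to double. A toy sketch with hypothetical values:

```r
women <- c(TRUE, FALSE, FALSE, FALSE)   # hypothetical "women survived" predictions
age   <- c(30, 5, NA, 40)               # hypothetical ages; the third is missing

pred <- women
# age < 10 is FALSE TRUE NA FALSE; the NA position is skipped, not an error.
pred[age < 10] <- 1
pred          # 1 1 0 0
class(pred)   # "numeric": the logical vector was coerced to double
```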

In [None]:
accuracy(titanic$Survived, women_and_children_survived)