# Spouses, baggage

We found, in the Titanic dataset, that Third Class passengers were less likely to survive the disaster.

Why?

Was it because they were locked behind gates while the higher-class passengers were being boarded onto lifeboats?  Or some other reason?

In [1]:
# Run this cell to start.
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

The official report into the disaster was the [British Wreck Commissioner's
Inquiry report](https://www.titanicinquiry.org/BOTInq/BOTReport/botRep01.php)
by [Lord
Mersey](https://en.wikipedia.org/wiki/John_Bigham,_1st_Viscount_Mersey).

There is a short section of the report entitled [Third Class
Passengers](https://www.titanicinquiry.org/BOTInq/BOTReport/botRep3rdClass.php).
It includes:

> It had been suggested before the Enquiry that the third class passengers had
> been unfairly treated; that their access to the Boat deck had been impeded,
> and that when at last they reached that deck the first and second class
> passengers were given precedence in getting places in the boats. There
> appears to have been no truth in these suggestions. It is no doubt true that
> the proportion of third class passengers saved falls far short of the
> proportion of the first and second class, but this is accounted for by the
> greater reluctance of the third class passengers to leave the ship, by their
> unwillingness to part with their baggage, by the difficulty in getting them
> up from their quarters, which were at the extreme ends of the ship, and by
> other similar causes.

Your job in this notebook it is to explore the evidence in the data for the
"greater reluctance of the third class passengers to leave the ship".

For example, we see [figures in Lord Mersey's
report](https://www.titanicinquiry.org/BOTInq/BOTReport/botRepSaved.php), using
slightly different data from the data you have here, that show:

* 16% of adult male Third Class passengers survived, compared to 8% of Second
  Class males, and 33% of First Class males;
* The corresponding figures for women are 46% (Third) 86% (Second) 97% (First).

Why were Third Class women about half as likely to be saved as Second Class
women, when Third Class men were, if anything, more likely to be saved than
Second Class men?

One possible explanation is that Third Class passengers were more likely to be
young couples, maybe with children.   It may well have been true the young
wives, maybe with children, would be more reluctant to leave their husbands
behind on the ship.  See [Rhoda Abbott's
story](https://en.wikipedia.org/wiki/Rhoda_Abbott) for an example.

One way of getting at this effect could be to use the `sibsp` and `parch`
columns of the dataset:

In [2]:
# Read the dataset as a data frame.
titanic = pd.read_csv("titanic_stlearn.csv")
# Boolean with True for passengers with not-NA sibsp values, False otherwise.
have_sibsp = titanic['sibsp'].notna()
# Select rows with value (not-NA) sibsp values.
with_sibsp = titanic[have_sibsp]
len(with_sibsp)

1307

Here we have dropped all cases where the `sibsp` value is missing, but you might want to:

1. Investigate why the `sibsp` values might be missing, and
2. Consider restoring some of the passengers where the value is missing, or
   removing more passengers that do not correspond to your questions.

You will find more information about the `sibsp` and `parch` variables in the
[Vanderbilt site info
file](http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt).
Quoting from that file:

> sibsp           Number of Siblings/Spouses Aboard
>
> parch           Number of Parents/Children Aboard
>
> ...
>
> With respect to the family relation variables (i.e. sibsp and parch) some
> relations were ignored.  The following are the definitions used for sibsp and
> parch.
>
> Sibling:  Brother, Sister, Stepbrother, or Stepsister of Passenger Aboard
>           Titanic
>
> Spouse:   Husband or Wife of Passenger Aboard Titanic (Mistresses and
>           Fiancées Ignored)
>
> Parent:   Mother or Father of Passenger Aboard Titanic
>
> Child:    Son, Daughter, Stepson, or Stepdaughter of Passenger Aboard Titanic
>
> Other family relatives excluded from this study include cousins,
> nephews/nieces, aunts/uncles, and in-laws.  Some children travelled only with
> a nanny, therefore parch=0 for them.  As well, some travelled with very close
> friends or neighbors in a village, however, the definitions do not support
> such relations.

Of course, you also have the passengers' names to go on, including the names of
the children, and any research you do into the passengers and their families.

Use the variables in the data file, and any other methods you can come up with,
to test the following ideas:

1. One explanation for passengers being lost or saved was reluctance to leave a
   spouse, children or other family behind and
2. This goes some way to explaining the relatively low proportion of Third
   Class female passengers that were saved.

Give your assessment of both of these ideas, along with the analyses that
support your conclusions.


## Marking scheme

* Depth of analysis: 25% of marks.
* Analysis appropriate to questions: 25% of marks.
* Quality, clarity and organization of analysis code: 25% of marks.
* Answers based in analysis: 25% of marks.


## Your analysis

Fill out the notebook with your analysis and answers from here.

-------------------
First, the function created in the `titanic` exercise will be used to recode the values from the `class` column and correctly classify the people in the dataset. This will be done to separate the crew members from the actual passengers (the only group we are interested in for our current purposes).

----------------------

In [3]:
def classify_role(row):
    if row.loc['class'] == 'deck crew':
        return 'deck'
    if row.loc['class'] == 'engineering crew':
        return 'engineering'
    if row.loc['class'] == 'victualling crew':
        return 'catering'
    if row.loc['class'] == 'restaurant staff':
        return 'catering'
    musician = 'Brailey','Bricoux', 'Hartley', 'Hume', 'Krins', 'Woodward', 'Clarke, Mr. John Frederick Preston', 'Taylor, Mr. Percy Cornelius' 
    for m in musician:
        if row.loc['name'].startswith(m):
            return 'musician'
    guarantee = 'Chisholm', 'Frost', 'Parkes', 'Parr, Mr. William Henry Marsh', 'Andrews, Mr. Thomas', 'Campbell, Mr. William Henry', 'Cunningham, Mr. Alfred Fleming', 'Knight, Mr. Robert', 'Watson, Mr. Ennis Hastings' 
    for g in guarantee:
        if row.loc['name'].startswith(g):
            return 'guarantee'
    if row.loc['class'] == '1st':
        return '1st'
    if row.loc['class'] == '2nd':
        return '2nd'
    if row.loc['class'] == '3rd':
        return '3rd'

In [4]:
#Read the titanic_stlearn.csv file
titanic = pd.read_csv('titanic_stlearn.csv')

# Apply the recoding function to the titanic data frame
roles = titanic.apply(classify_role, axis = 'columns')

# Create a new column 'roles' that has the actual class/role of the people aboard.
titanic['roles'] = roles 
titanic.head()

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles
0,"Abbing, Mr. Anthony",male,42.0,3rd,Southampton,United States,5547.0,7.11,0.0,0.0,no,3rd
1,"Abbott, Mr. Eugene Joseph",male,13.0,3rd,Southampton,United States,2673.0,20.05,0.0,2.0,no,3rd
2,"Abbott, Mr. Rossmore Edward",male,16.0,3rd,Southampton,United States,2673.0,20.05,1.0,1.0,no,3rd
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.05,1.0,1.0,yes,3rd
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.13,0.0,0.0,yes,3rd


---------------------------------------------
A new variable `is_passenger` which is a Boolean series with 'True' for 1st, 2nd and 3rd class passengers and 'False' otherwise (crew members) will be created. `is_passenger` will be used to create a new data frame `passengers` with 1300 rows corresponding only to the passengers in the Titanic.  

Note: According to the [Encyclopedia Titanica](http://www.encyclopedia-titanica.org/) there were 1317 passengers and 890 crew members. We found out in the `titanic` exercise that the 8 musicians and 9 members of the guarantee group were classified in the [titanic_stlearn.csv](https://github.com/matthew-brett/datasets/tree/master/titanic/processed) dataset as 1st or 2nd class passengers when they were crew members instead. Therefore, the total number of passengers is 1300 and the total number of crew members is 907.  

------------------------

In [5]:
is_passenger = titanic['roles'].str.contains('1st') | titanic['roles'].str.contains('2nd') | titanic['roles'].str.contains('3rd')

print('passengers:', np.count_nonzero(is_passenger)) #Check the number of passengers
print('crew members:', np.count_nonzero(~is_passenger)) #Check the number of crew members

#Create a new data frame
passengers = pd.DataFrame(titanic[is_passenger])
passengers.head()

passengers: 1300
crew members: 907


Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles
0,"Abbing, Mr. Anthony",male,42.0,3rd,Southampton,United States,5547.0,7.11,0.0,0.0,no,3rd
1,"Abbott, Mr. Eugene Joseph",male,13.0,3rd,Southampton,United States,2673.0,20.05,0.0,2.0,no,3rd
2,"Abbott, Mr. Rossmore Edward",male,16.0,3rd,Southampton,United States,2673.0,20.05,1.0,1.0,no,3rd
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.05,1.0,1.0,yes,3rd
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.13,0.0,0.0,yes,3rd


---------------------------------------
Now it would be important to know how many values are missing in the `sibsp` and `parch` columns of our `passengers` data frame. 

------------------------------

In [6]:
sibsp = passengers['sibsp'] # Pandas series with the 'sibsp' column 
print('missing sibsp values:', np.count_nonzero(pd.isna(sibsp))) # Number of missing values from the 'sibsp' column
print(sibsp[sibsp.isnull()]) # Rows with missing values

missing sibsp values: 2
615    NaN
1094   NaN
Name: sibsp, dtype: float64


In [7]:
parch = passengers['parch'] # Pandas series with the 'parch' column 
print('missing parch values:', np.count_nonzero(pd.isna(parch))) # Number of missing values from the 'parch' column
print(parch[parch.isnull()]) # Rows with missing values

missing parch values: 2
615    NaN
1094   NaN
Name: parch, dtype: float64


In [8]:
# Now we want to know which passengers correspond to those rows with missing values
print(passengers.loc[615]) 
print(passengers.loc[1094])

name        Johnson, Mr. August
gender                     male
age                          49
class                       3rd
embarked            Southampton
country           United States
ticketno                 370160
fare                        NaN
sibsp                       NaN
parch                       NaN
survived                     no
roles                       3rd
Name: 615, dtype: object
name        Shannon, Mr. Andrew John
gender                          male
age                               35
class                            3rd
embarked                 Southampton
country                      Ireland
ticketno                      370160
fare                             NaN
sibsp                            NaN
parch                            NaN
survived                          no
roles                            3rd
Name: 1094, dtype: object


---------------------------------
According to the biography of [Mr. August Johnson](https://www.encyclopedia-titanica.org/titanic-victim/alfred-johnson.html) and [Mr. Andrew John Shannon](https://www.encyclopedia-titanica.org/titanic-victim/lionel-leonard.html) they were both part of the American Line (that is why they have the same ticket number). The `sibsp` and `parch` values of these two passengers might be missing because they embarked the Titanic with different names. Mr. Johnson is listed in some datasets as Mr. Alfred Johnson and Mr. Shannon assumed name was Mr. Lionel Leonard. 

The missing values can be restored now that we know both passengers were traveling without family.

--------------------------------------

In [9]:
#Replace the NaN values with 0.
passengers['sibsp'].fillna(0, inplace = True) 
passengers['parch'].fillna(0, inplace = True)

#Check that it was done right
print(passengers.loc[615])
print(passengers.loc[1094])


name        Johnson, Mr. August
gender                     male
age                          49
class                       3rd
embarked            Southampton
country           United States
ticketno                 370160
fare                        NaN
sibsp                         0
parch                         0
survived                     no
roles                       3rd
Name: 615, dtype: object
name        Shannon, Mr. Andrew John
gender                          male
age                               35
class                            3rd
embarked                 Southampton
country                      Ireland
ticketno                      370160
fare                             NaN
sibsp                              0
parch                              0
survived                          no
roles                            3rd
Name: 1094, dtype: object


-----------------------------------
Considering the findings on unusual passenger cases suggested on the [Vanderbilt site](http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt), I decided to look for cases of children (under or equal to 15 years old) that have a "0" in both the `sibsp` and `parch` columns. Then, I checked their autobiography on the [Encyclopedia Titanica site](https://www.encyclopedia-titanica.org/) to find out who they thraveled with and to replace the values in the `sibsp` or `parch` columns.   

-------------------------

In [10]:
is_child = passengers['age'] <= 15
children = passengers[is_child]
children_zero = (children['sibsp'] == 0) & (children['parch'] == 0)
weird_cases_children = children[children_zero]
weird_cases_children

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles
62,"Asplund, Mr. Johan Charles",male,13.0,3rd,Southampton,Sweden,350054.0,7.1511,0.0,0.0,yes,3rd
72,"Ayoub Daher, Miss. Banoura",female,15.0,3rd,Cherbourg,Lebanon,2687.0,7.0407,0.0,0.0,yes,3rd
376,"Emanuel, Miss. Virginia Ethel",female,5.0,3rd,Southampton,United States,364516.0,12.0906,0.0,0.0,yes,3rd
579,"Ibrāhīm, Mr. Husayn Mahmūd Husayn",male,11.0,3rd,Cherbourg,,2699.0,18.1509,0.0,0.0,no,3rd
854,"Najib Kiamie, Miss. Adele",female,15.0,3rd,Cherbourg,Lebanon,2667.0,7.0406,0.0,0.0,yes,3rd
1091,"Seman, Master. Betros",male,10.0,3rd,Cherbourg,Lebanon,2622.0,4.0,0.0,0.0,no,3rd
1177,"Svensson, Mr. Johan Cervin",male,14.0,3rd,Southampton,Sweden,7538.0,9.0406,0.0,0.0,yes,3rd
1180,"Sweet, Mr. George Frederick",male,14.0,2nd,Southampton,England,220845.0,65.0,0.0,0.0,no,2nd
1243,"Veström, Miss. Hulda Amanda Adolfina",female,14.0,3rd,Southampton,Sweden,350406.0,7.1701,0.0,0.0,no,3rd
1257,"Watt, Miss. Robertha Josephine",female,12.0,2nd,Southampton,Scotland,33595.0,15.15,0.0,0.0,yes,2nd


I found the passengers that traveled with these children in the `titanic_stlearn.csv` file using their ticket number, available in their autobiography on the [Encyclopedia Titanica site](https://www.encyclopedia-titanica.org/): 

Asplund, Mr. Johan Charles traveled with a friend:
    - Karlsson, Mr. Einar Gervasius

Ayoub Daher, Miss. Banoura traveled with 5 cousins (all of them with '0' in both `sibsp` and `parch` columns): 
    - Youssiff (Sam'aan), Mr. Gerios
    - Thomas/Tannous, Mr. John and his teenage son
    - Thomas/Tannous, Mr. Tannous
    - George/Joseph, Mrs. Shawneene
    - Daher, Mr. Tannous

Emanuel, Miss. Virginia Ethel traveled with her nanny:
    - Dowdell, Miss. Elizabeth.

Ibrāhīm, Mr. Husayn Mahmūd Husayn traveled with a family friend:
    - listed in the .csv file as AbÄ«-Al-MunÃ , Mr. NÄsÄ«f QÄsim 

Najib Kiamie, Miss. Adele traveled with a family friend:
    - Baclini, Mrs. Latifa and her three daughters
    - Baclini, Miss. Eugenie
    - Baclini, Miss. Helene Barbara
    - Baclini, Miss. Marie Catherine

Seman, Master. Betros traveled with a family friend:
    - listed as Butrus-Ka'wÄ«, Mr. TannÅ«s

Svensson, Mr. Johan Cervin apparently traveled alone to be reunited with his family

Sweet, Mr. George Frederick traveled with family friends:
    - Herman, Mr. Samuel
    - Herman, Mrs. Jane and their two daughters
    - Herman, Miss. Alice
    - Herman, Miss. Kate

Veström, Miss. Hulda Amanda Adolfina traveled with her maternal aunt:
    - listed as KlasÃ©n, Mrs. Hulda Kristina Eugenia

Watt, Miss. Robertha Josephine traveled with her mother:
    - Watt, Mrs. Elizabeth

-----------------------
Values of children with unusual cases as well as those of passengers who traveled with them will be replaced by new corresponding values using the function `classify_family`. 

------------------

In [11]:
def classify_family(row):
    children_1 = 'Asplund, Mr. Johan Charles', 'Emanuel, Miss. Virginia Ethel','Ibrāhīm, Mr. Husayn Mahmūd Husayn', 'Seman, Master. Betros', 'Veström, Miss. Hulda Amanda Adolfina', 'Watt, Miss. Robertha Josephine'
    children_4 = 'Najib Kiamie, Miss. Adele', 'Sweet, Mr. George Frederick'
    if row.loc['name'] == 'Ayoub Daher, Miss. Banoura':
            return 5.0
    for c in children_1:
        if row.loc['name'].startswith(c):
            return 1.0
    for i in children_4:
        if row.loc['name'].startswith(i):
            return 4.0
        
    family_1 = 'Karlsson, Mr. Einar Gervasius', 'Dowdell, Miss. Elizabeth', 'AbÄ«-Al-MunÃ , Mr. NÄsÄ«f QÄsim', 'Butrus', 'KlasÃ©n, Mrs. Hulda Kristina Eugenia', 'Watt, Mrs. Elizabeth', 'Baclini, Mrs. Latifa', 'Baclini, Miss. Eugenie', 'Baclini, Miss. Helene Barbara', 'Baclini, Miss. Marie Catherine','Herman, Mr. Samuel', 'Herman, Mrs. Jane', 'Herman, Miss. Alice', 'Herman, Miss. Kate'  
    family_5 = 'Youssiff', 'Thomas/Tannous, Mr. John','Thomas/Tannous, Mr. Tannous', 'George/Joseph, Mrs. Shawneene', 'Daher, Mr. Tannous'
    
    for f in family_1:
        if row.loc['name'].startswith(f):
            return 1.0
    for n in family_5:
        if row.loc['name'].startswith(n):
            return 5.0
    else:
        return 0.0

In [12]:
#Apply the function to the passengers data frame 
companions = passengers.apply(classify_family, axis = 'columns') 

Since the first idea we want to test is:

1. One explanation for passengers being lost or saved was reluctance to leave a spouse, children or other family behind 

I have decided to combine the values of `sibps`, `parch` and `companions`, as a new column `family` because, for our current purposes, we do not need to differentiate the types of relationship that exist between the passengers. `family` includes parents, children, siblings, aunts, uncles, nieces, nephews, family friends, etc.     

In [13]:
#Create a new column 'family' with the total number of family members that each passenger traveled with
family = passengers['sibsp'] + passengers['parch'] + companions
passengers['family'] = family 
passengers.head()

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles,family
0,"Abbing, Mr. Anthony",male,42.0,3rd,Southampton,United States,5547.0,7.11,0.0,0.0,no,3rd,0.0
1,"Abbott, Mr. Eugene Joseph",male,13.0,3rd,Southampton,United States,2673.0,20.05,0.0,2.0,no,3rd,2.0
2,"Abbott, Mr. Rossmore Edward",male,16.0,3rd,Southampton,United States,2673.0,20.05,1.0,1.0,no,3rd,2.0
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.05,1.0,1.0,yes,3rd,2.0
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.13,0.0,0.0,yes,3rd,0.0


In [14]:
#Check that the 'family' column has the correct values for the children with unusual cases
passengers.loc[weird_cases_children.index]

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles,family
62,"Asplund, Mr. Johan Charles",male,13.0,3rd,Southampton,Sweden,350054.0,7.1511,0.0,0.0,yes,3rd,1.0
72,"Ayoub Daher, Miss. Banoura",female,15.0,3rd,Cherbourg,Lebanon,2687.0,7.0407,0.0,0.0,yes,3rd,5.0
376,"Emanuel, Miss. Virginia Ethel",female,5.0,3rd,Southampton,United States,364516.0,12.0906,0.0,0.0,yes,3rd,1.0
579,"Ibrāhīm, Mr. Husayn Mahmūd Husayn",male,11.0,3rd,Cherbourg,,2699.0,18.1509,0.0,0.0,no,3rd,1.0
854,"Najib Kiamie, Miss. Adele",female,15.0,3rd,Cherbourg,Lebanon,2667.0,7.0406,0.0,0.0,yes,3rd,4.0
1091,"Seman, Master. Betros",male,10.0,3rd,Cherbourg,Lebanon,2622.0,4.0,0.0,0.0,no,3rd,1.0
1177,"Svensson, Mr. Johan Cervin",male,14.0,3rd,Southampton,Sweden,7538.0,9.0406,0.0,0.0,yes,3rd,0.0
1180,"Sweet, Mr. George Frederick",male,14.0,2nd,Southampton,England,220845.0,65.0,0.0,0.0,no,2nd,4.0
1243,"Veström, Miss. Hulda Amanda Adolfina",female,14.0,3rd,Southampton,Sweden,350406.0,7.1701,0.0,0.0,no,3rd,1.0
1257,"Watt, Miss. Robertha Josephine",female,12.0,2nd,Southampton,Scotland,33595.0,15.15,0.0,0.0,yes,2nd,1.0


--------------
A new column `accompanied` will be created in the `passengers` data frame. It has 1 for passengers who traveled with at least one family member and 0 for passengers who traveled alone.

--------------------

In [15]:
accompanied = family > 0
passengers['accomp'] = accompanied
accompanied = passengers['accomp'].astype(int)
passengers['accomp'] = accompanied
passengers.head()

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles,family,accomp
0,"Abbing, Mr. Anthony",male,42.0,3rd,Southampton,United States,5547.0,7.11,0.0,0.0,no,3rd,0.0,0
1,"Abbott, Mr. Eugene Joseph",male,13.0,3rd,Southampton,United States,2673.0,20.05,0.0,2.0,no,3rd,2.0,1
2,"Abbott, Mr. Rossmore Edward",male,16.0,3rd,Southampton,United States,2673.0,20.05,1.0,1.0,no,3rd,2.0,1
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.05,1.0,1.0,yes,3rd,2.0,1
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.13,0.0,0.0,yes,3rd,0.0,0


---------------------------
If the first idea was true:
1. One explanation for passengers being lost or saved was reluctance to leave a spouse, children or other family behind

we could expect a higher proportion out of the total number of passengers who survived to travel alone. We could also expect a higher proportion out of the total number of passengers who traveled alone to survive. First, let's test these hypothesis.  

-------------------------------

In [16]:
pd.crosstab(passengers['accomp'], passengers['survived'], normalize = 'columns')

survived,no,yes
accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
0,0.66625,0.46
1,0.33375,0.54


In [17]:
pd.crosstab(passengers['accomp'], passengers['survived'], normalize = 'index')

survived,no,yes
accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
0,0.698558,0.301442
1,0.497207,0.502793


-------------------
As we can see from the table, both hypothesis were denied. Apparently, traveling with a family member does not imply a higher probability of death and traveling alone does not imply a higher probability of survival. At least not when we take into account all the passengers together as a big group.

We can take this analysis a little further and divide the passengers by class and `accompanied` (yes or no). 

In [18]:
def sort_class_accomp (row):
    if (row.loc['accomp'] == 1) & (row.loc['roles'] == '3rd'):
        return '3rd accompanied'
    if (row.loc['accomp'] == 1) & (row.loc['roles'] == '2nd'):
        return '2nd accompanied'
    if (row.loc['accomp'] == 1) & (row.loc['roles'] == '1st'):
        return '1st accompanied'
    if (row.loc['accomp'] == 0) & (row.loc['roles'] == '3rd'):
        return '3rd alone'
    if (row.loc['accomp'] == 0) & (row.loc['roles'] == '2nd'):
        return '2nd alone'
    if (row.loc['accomp'] == 0) & (row.loc['roles'] == '1st'):
        return '1st alone'

In [19]:
by_class_accomp = passengers.apply(sort_class_accomp, axis = 'columns')
passengers['by_class_accomp'] = by_class_accomp
passengers.head()

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles,family,accomp,by_class_accomp
0,"Abbing, Mr. Anthony",male,42.0,3rd,Southampton,United States,5547.0,7.11,0.0,0.0,no,3rd,0.0,0,3rd alone
1,"Abbott, Mr. Eugene Joseph",male,13.0,3rd,Southampton,United States,2673.0,20.05,0.0,2.0,no,3rd,2.0,1,3rd accompanied
2,"Abbott, Mr. Rossmore Edward",male,16.0,3rd,Southampton,United States,2673.0,20.05,1.0,1.0,no,3rd,2.0,1,3rd accompanied
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.05,1.0,1.0,yes,3rd,2.0,1,3rd accompanied
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.13,0.0,0.0,yes,3rd,0.0,0,3rd alone


------------
If the first idea was true, it should be true for every passenger, regardless of class...

-----------

In [20]:
#Cross tabulation between our new by_class_accomp column and the survived column
pd.crosstab(passengers['by_class_accomp'], passengers['survived'], normalize = 'columns')

survived,no,yes
by_class_accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
1st accompanied,0.05625,0.236
1st alone,0.09375,0.166
2nd accompanied,0.06125,0.146
2nd alone,0.12875,0.09
3rd accompanied,0.21625,0.158
3rd alone,0.44375,0.204


Most of the passengers who died were 3rd class passengers travelling alone (44%). Even more evidence to reject the first idea as well as to doubt the truth about the third class passengers deaths proposed in the [British Wreck Commissioner's
Inquiry report](https://www.titanicinquiry.org/BOTInq/BOTReport/botRep01.php).  

We can also look at the proportions out of the total number of 1st, 2nd and 3rd class that were accompanied or alone. We could expect for the passengers that were alone to have higher proportions of survival than proportions of deaths, regardless of their class (i.e. 1st alone-yes should be higher than 1st alone-no; 2nd alone-yes should be higher than 2nd alone-no; and 3rd alone-yes should be higher than 3rd alone-no). And we could expect for the passengers that were accompanied to have lower proportions of survival than proportions of deaths, regardless of their class (i.e. 1st accompanied-yes should be lower than 1st accompanied-no; 2nd accompanied-yes should be lower than 2nd accompanied-no; and 3rd accompanied-yes should be lower than 3rd alone-no).

In [21]:
pd.crosstab(passengers['by_class_accomp'], passengers['survived'], normalize = 'index')

survived,no,yes
by_class_accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
1st accompanied,0.276074,0.723926
1st alone,0.474684,0.525316
2nd accompanied,0.401639,0.598361
2nd alone,0.695946,0.304054
3rd accompanied,0.686508,0.313492
3rd alone,0.776805,0.223195


According to the [British Wreck Commissioner's
Inquiry report](https://www.titanicinquiry.org/BOTInq/BOTReport/botRep01.php), the third class passengers were reluctant to leave their families behind. We could then expect most of the 3rd class passengers who were alone to survive...but they did not. 

In [22]:
#As additional information we can do a cross tabulation between number of family members traveling with each passenger
#and survival 
pd.crosstab(passengers['family'], passengers['survived'], normalize = 'index')

survived,no,yes
family,Unnamed: 1_level_1,Unnamed: 2_level_1
0.0,0.698558,0.301442
1.0,0.461224,0.538776
2.0,0.433962,0.566038
3.0,0.342857,0.657143
4.0,0.5625,0.4375
5.0,0.774194,0.225806
6.0,0.75,0.25
7.0,1.0,0.0
10.0,1.0,0.0


-------------
To test the second idea:
2. 'Passengers being lost or saved was reluctance to leave a spouse, children or other family behind' goes some way to explaining the relatively low proportion of Third Class female passengers that were saved.

First, a new data frame `females` will be created, with only the female passengers (adult and children). 

In [23]:
is_female = passengers['gender'] == 'female'
females = pd.DataFrame(passengers[is_female])
females

Unnamed: 0,name,gender,age,class,embarked,country,ticketno,fare,sibsp,parch,survived,roles,family,accomp,by_class_accomp
3,"Abbott, Mrs. Rhoda Mary 'Rosa'",female,39.0,3rd,Southampton,England,2673.0,20.0500,1.0,1.0,yes,3rd,2.0,1,3rd accompanied
4,"Abelseth, Miss. Karen Marie",female,16.0,3rd,Southampton,Norway,348125.0,7.1300,0.0,0.0,yes,3rd,0.0,0,3rd alone
7,"Abelson, Mrs. Hannah",female,28.0,2nd,Cherbourg,France,3381.0,24.0000,1.0,0.0,yes,2nd,1.0,1,2nd accompanied
12,"Ahlin, Mrs. Johanna Persdotter",female,40.0,3rd,Southampton,Sweden,7546.0,9.0906,1.0,0.0,no,3rd,1.0,1,3rd accompanied
14,"Aks, Mrs. Leah",female,18.0,3rd,Southampton,England,392091.0,9.0700,0.0,1.0,yes,3rd,1.0,1,3rd accompanied
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1306,"Wright, Miss. Marion",female,26.0,2nd,Southampton,England,220844.0,13.1000,0.0,0.0,yes,2nd,0.0,0,2nd alone
1309,"Yazbeck, Mrs. Selini",female,15.0,3rd,Cherbourg,Lebanon,2659.0,14.0901,1.0,0.0,yes,3rd,1.0,1,3rd accompanied
1310,"Young, Miss. Marie Grice",female,36.0,1st,Cherbourg,United States,17760.0,135.1208,0.0,0.0,yes,1st,0.0,0,1st alone
1313,"Yūsuf, Mrs. Kātrīn",female,23.0,3rd,Cherbourg,Lebanon,2668.0,22.0702,0.0,2.0,yes,3rd,2.0,1,3rd accompanied


In [24]:
#Proportion of females, who were accompanied or alone, that survived. 
pd.crosstab(females['accomp'], females['survived'], normalize = 'columns')

survived,no,yes
accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
0,0.401575,0.39823
1,0.598425,0.60177


In [25]:
females['accomp'].value_counts()

1    280
0    186
Name: accomp, dtype: int64

In [26]:
females['roles'].value_counts()

3rd    216
1st    144
2nd    106
Name: roles, dtype: int64

In [27]:
females['by_class_accomp'].value_counts()

3rd accompanied    122
3rd alone           94
1st accompanied     92
2nd accompanied     66
1st alone           52
2nd alone           40
Name: by_class_accomp, dtype: int64

In [28]:
pd.crosstab(females['by_class_accomp'], females['survived'], normalize = 'columns') 
#Most of he woman that died were third class traveling acompanied. 

survived,no,yes
by_class_accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
1st accompanied,0.023622,0.262537
1st alone,0.015748,0.147493
2nd accompanied,0.047244,0.176991
2nd alone,0.047244,0.100295
3rd accompanied,0.527559,0.162242
3rd alone,0.338583,0.150442


In [29]:
pd.crosstab(females['by_class_accomp'], females['survived'], normalize = 'index')

survived,no,yes
by_class_accomp,Unnamed: 1_level_1,Unnamed: 2_level_1
1st accompanied,0.032609,0.967391
1st alone,0.038462,0.961538
2nd accompanied,0.090909,0.909091
2nd alone,0.15,0.85
3rd accompanied,0.54918,0.45082
3rd alone,0.457447,0.542553


-------------------
Only for females in the third class being accompanied implies higher probability of dying ('3rd accompanied-no'). If the third class females were reluctant to leave their family behind and that is why they died more, a high proportion of the third class women traveling alone should have survived (only 54% did).      