<a href="https://www.kaggle.com/code/mahtaba/titanic-using-name-only-in-python-0-79904?scriptVersionId=93743640" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

I had a hard time following Chris Deotte's (@cdeotte) [notebook](https://www.kaggle.com/code/cdeotte/titanic-using-name-only-0-), which is written in R, so I reimplemented it ('Titanic using Name only') in Python 3.
Chris predicts survival almost entirely from a single column, 'Name', and uses the 'Ticket' column only to improve his predictions by a small margin. I rely solely on the 'Name' column, which gives an accuracy of ~0.8.

What I like about Chris' approach is his model's simplicity. Many kernels have used complex Machine Learning models to give comparatively poor predictions. Chris does not use Machine Learning at all. He uses some common sense to make relatively accurate guesses. I hope you find this notebook helpful.

In [1]:
import pandas as pd

In [2]:
train = pd.read_csv('/kaggle/input/titanic/train.csv')[['PassengerId', 'Survived', 'Name']]
test = pd.read_csv('/kaggle/input/titanic/test.csv')[['PassengerId', 'Name']]

In [3]:

train['Surname'], test['Surname'] = (df['Name'].apply(lambda x: x.split(',')[0]) for df in [train, test])
train['Title'], test['Title'] = (df['Name'].apply(lambda x: x.split(',')[1]).apply(lambda x: x.split()[0]) for df in [train, test])

In [4]:
train.groupby('Title')['PassengerId'].count()

Title
Capt.          1
Col.           2
Don.           1
Dr.            7
Jonkheer.      1
Lady.          1
Major.         2
Master.       40
Miss.        182
Mlle.          2
Mme.           1
Mr.          517
Mrs.         125
Ms.            1
Rev.           6
Sir.           1
the            1
Name: PassengerId, dtype: int64

In [5]:
#We incorrectly extracted 'the' as a Title
train[train.Title=='the']

Unnamed: 0,PassengerId,Survived,Name,Surname,Title
759,760,1,"Rothes, the Countess. of (Lucy Noel Martha Dye...",Rothes,the


In [6]:
#Correcting the Title for passenger 760
train.loc[(train.PassengerId==760), 'Title'] = 'Countess.'
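As a possible alternative to patching passenger 760 by hand, a regular expression can capture the first word after the comma that ends in a period. This is only a sketch on made-up rows; the `names` Series and the pattern are my own assumptions, not something taken from Chris' notebook.

```python
import pandas as pd

# Toy names in the same "Surname, Title. Given names" layout as the Titanic data
# (given names shortened for illustration).
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Rothes, the Countess. of",
    "Palsson, Master. Gosta Leonard",
])

# Surname: everything before the first comma.
surname = names.str.split(",").str[0]

# Title: the first alphabetic word after the comma that is followed by a period,
# so a leading "the" is skipped.
title = names.str.extract(r",.*?([A-Za-z]+\.)", expand=False)

print(surname.tolist())
print(title.tolist())
```

On these three rows the pattern yields 'Countess.' rather than 'the'. I have not checked it against every name in the real files.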

In [7]:
#Creating a dictionary to map Title to relevant Category
TitleDict = {'Capt.': 'man', 'Don.': 'man', 'Major.': 'man', 'Col.': 'man', 'Rev.': 'man', 'Dr.': 'man', 'Sir.': 'man',
                'Mr.': 'man', 'Jonkheer.': 'man', 'Dona.': 'woman', 'Countess.': 'woman', 'Mme.': 'woman', 'Mlle.': 'woman',
                'Ms.': 'woman', 'Miss.': 'woman', 'Lady.': 'woman', 'Mrs.': 'woman', 'Master.': 'boy'}

In [8]:
train['Category'], test['Category'] = (df.Title.map(TitleDict) for df in [train, test])
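`Series.map` returns NaN for any title that is missing from the dictionary, so it is worth checking that nothing slipped through. A minimal sketch, using a shortened, hypothetical `title_map` and toy titles rather than the real data:

```python
import pandas as pd

# Hypothetical, shortened title map and toy titles for illustration only.
title_map = {"Mr.": "man", "Mrs.": "woman", "Miss.": "woman", "Master.": "boy"}
titles = pd.Series(["Mr.", "Miss.", "Master.", "the"])

category = titles.map(title_map)

# Titles absent from the dictionary map to NaN; list them so they can be fixed.
unmapped = sorted(titles[category.isna()].unique())
print(unmapped)
```

An empty list would mean every title was assigned a category.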

In [9]:
#The gender model, which predicts that all women survive and all men perish, scores pretty well, about .76. We know that women and
#children were given preference while saving passengers on the Titanic. So, we'll identify woman-child-groups. For that,
#we'll create a temporary dataset which excludes men and then identify woman-child-groups by their surnames.
temp = train[(train.Category!='man')]
SurnameDict = temp.groupby('Surname')['PassengerId'].count().to_dict()
print(SurnameDict)

{'Abbott': 1, 'Abelson': 1, 'Ahlin': 1, 'Aks': 1, 'Allen': 1, 'Allison': 3, 'Andersen-Jensen': 1, 'Andersson': 7, 'Andrews': 1, 'Angle': 1, 'Appleton': 1, 'Arnold-Franchi': 1, 'Asplund': 4, 'Astor': 1, 'Attalah': 1, 'Aubart': 1, 'Ayoub': 1, 'Backstrom': 1, 'Baclini': 4, 'Ball': 1, 'Barbara': 2, 'Barber': 1, 'Baxter': 1, 'Bazzani': 1, 'Beane': 1, 'Becker': 2, 'Beckwith': 1, 'Bidois': 1, 'Bishop': 1, 'Bissette': 1, 'Bonnell': 1, 'Boulos': 2, 'Bourke': 2, 'Bowerman': 1, 'Brown': 3, 'Burns': 1, 'Buss': 1, 'Bystrom': 1, 'Cacic': 1, 'Caldwell': 2, 'Cameron': 1, 'Canavan': 1, 'Caram': 1, 'Carr': 1, 'Carter': 4, 'Chambers': 1, 'Cherry': 1, 'Chibnall': 1, 'Christy': 1, 'Clarke': 1, 'Cleaver': 1, 'Collyer': 2, 'Compton': 1, 'Connolly': 1, 'Coutts': 2, 'Crosby': 1, 'Cumings': 1, 'Dahlberg': 1, 'Danbom': 1, 'Davies': 1, 'Davis': 1, 'Davison': 1, 'Dean': 1, 'Devaney': 1, 'Dick': 1, 'Dodge': 1, 'Doling': 2, 'Dowdell': 1, 'Drew': 1, 'Duff Gordon': 1, 'Duran y More': 1, 'Emanuel': 1, 'Endres': 1, 'Eus

In [10]:
train.loc[(train.Category!='man'), 'SurnameFreq']=train.Surname.map(SurnameDict) #mapping Surname to the dictionary
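The dictionary-plus-map step can also be written as a single `groupby(...).transform(...)`. The sketch below runs on a toy frame (the `toy` DataFrame is made up) and is meant to be equivalent in spirit, not a drop-in replacement:

```python
import pandas as pd

toy = pd.DataFrame({
    "Surname": ["Rice", "Rice", "Rice", "Allen", "Allen"],
    "Category": ["boy", "woman", "man", "woman", "man"],
})

# Count women and boys per surname; rows for men are left as NaN,
# mirroring what the cell above does.
not_man = toy["Category"] != "man"
toy.loc[not_man, "SurnameFreq"] = (
    toy[not_man].groupby("Surname")["Surname"].transform("count")
)
print(toy)
```

`transform` returns one value per row, already aligned to the original index, so no intermediate dictionary is needed.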

In [11]:
train.head()

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq
0,1,0,"Braund, Mr. Owen Harris",Braund,Mr.,man,
1,2,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Cumings,Mrs.,woman,1.0
2,3,1,"Heikkinen, Miss. Laina",Heikkinen,Miss.,woman,1.0
3,4,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",Futrelle,Mrs.,woman,1.0
4,5,0,"Allen, Mr. William Henry",Allen,Mr.,man,


In [12]:
#142 passengers belonged to a woman-child-group
train[train.SurnameFreq>1]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq
7,8,0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,4.0
8,9,1,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0
10,11,1,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,2.0
16,17,0,"Rice, Master. Eugene",Rice,Master.,boy,5.0
18,19,0,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",Vander Planke,Mrs.,woman,2.0
...,...,...,...,...,...,...,...
858,859,1,"Baclini, Mrs. Solomon (Latifa Qurban)",Baclini,Mrs.,woman,4.0
863,864,0,"Sage, Miss. Dorothy Edith ""Dolly""",Sage,Miss.,woman,4.0
869,870,1,"Johnson, Master. Harold Theodor",Johnson,Master.,boy,3.0
885,886,0,"Rice, Mrs. William (Margaret Norton)",Rice,Mrs.,woman,5.0


In [13]:
train.loc[(train.SurnameFreq>1), 'Group'] = 1 #identifying these 142 passengers as Group=1

In [14]:
train[(train.Group==1)]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group
7,8,0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,4.0,1.0
8,9,1,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0,1.0
10,11,1,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,2.0,1.0
16,17,0,"Rice, Master. Eugene",Rice,Master.,boy,5.0,1.0
18,19,0,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",Vander Planke,Mrs.,woman,2.0,1.0
...,...,...,...,...,...,...,...,...
858,859,1,"Baclini, Mrs. Solomon (Latifa Qurban)",Baclini,Mrs.,woman,4.0,1.0
863,864,0,"Sage, Miss. Dorothy Edith ""Dolly""",Sage,Miss.,woman,4.0,1.0
869,870,1,"Johnson, Master. Harold Theodor",Johnson,Master.,boy,3.0,1.0
885,886,0,"Rice, Mrs. William (Margaret Norton)",Rice,Mrs.,woman,5.0,1.0


In [15]:
#Creating a dictionary which stores the average survival rate of each woman-child-group
dic = train[(train.Group==1)].groupby('Surname')['Survived'].mean().to_dict()
print(type(dic))
print(dic)

<class 'dict'>
{'Allison': 0.3333333333333333, 'Andersson': 0.14285714285714285, 'Asplund': 0.75, 'Baclini': 1.0, 'Barbara': 0.0, 'Becker': 1.0, 'Boulos': 0.0, 'Bourke': 0.0, 'Brown': 1.0, 'Caldwell': 1.0, 'Carter': 0.75, 'Collyer': 1.0, 'Coutts': 1.0, 'Doling': 1.0, 'Ford': 0.0, 'Fortune': 1.0, 'Goldsmith': 1.0, 'Goodwin': 0.0, 'Graham': 1.0, 'Hamalainen': 1.0, 'Harper': 1.0, 'Hart': 1.0, 'Hays': 1.0, 'Herman': 1.0, 'Hippach': 1.0, 'Johnson': 1.0, 'Jussila': 0.0, 'Kelly': 1.0, 'Laroche': 1.0, 'Lefebre': 0.0, 'Mellinger': 1.0, 'Moor': 1.0, 'Moubarek': 1.0, 'Murphy': 1.0, 'Navratil': 1.0, 'Newell': 1.0, 'Nicola-Yarred': 1.0, 'Palsson': 0.0, 'Panula': 0.0, 'Peter': 1.0, 'Quick': 1.0, 'Rice': 0.0, 'Richards': 1.0, 'Ryerson': 1.0, 'Sage': 0.0, 'Sandstrom': 1.0, 'Skoog': 0.0, 'Strom': 0.0, 'Taussig': 1.0, 'Van Impe': 0.0, 'Vander Planke': 0.0, 'West': 1.0, 'Wick': 1.0, 'Zabour': 0.0}


In [16]:
train.loc[(train.Group==1), 'SurnameSurvival']=train.Surname.map(dic)

In [17]:
train[train.SurnameSurvival.notnull()]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
7,8,0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,4.0,1.0,0.0
8,9,1,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0,1.0,1.0
10,11,1,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,2.0,1.0,1.0
16,17,0,"Rice, Master. Eugene",Rice,Master.,boy,5.0,1.0,0.0
18,19,0,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",Vander Planke,Mrs.,woman,2.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...
858,859,1,"Baclini, Mrs. Solomon (Latifa Qurban)",Baclini,Mrs.,woman,4.0,1.0,1.0
863,864,0,"Sage, Miss. Dorothy Edith ""Dolly""",Sage,Miss.,woman,4.0,1.0,0.0
869,870,1,"Johnson, Master. Harold Theodor",Johnson,Master.,boy,3.0,1.0,1.0
885,886,0,"Rice, Mrs. William (Margaret Norton)",Rice,Mrs.,woman,5.0,1.0,0.0


In [18]:
#These 74 woman-child-group passengers all survived
train[train.SurnameSurvival==1]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
8,9,1,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0,1.0,1.0
10,11,1,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,2.0,1.0,1.0
39,40,1,"Nicola-Yarred, Miss. Jamila",Nicola-Yarred,Miss.,woman,2.0,1.0,1.0
43,44,1,"Laroche, Miss. Simonne Marie Anne Andree",Laroche,Miss.,woman,2.0,1.0,1.0
52,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",Harper,Mrs.,woman,2.0,1.0,1.0
...,...,...,...,...,...,...,...,...,...
831,832,1,"Richards, Master. George Sibley",Richards,Master.,boy,3.0,1.0,1.0
856,857,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",Wick,Mrs.,woman,2.0,1.0,1.0
858,859,1,"Baclini, Mrs. Solomon (Latifa Qurban)",Baclini,Mrs.,woman,4.0,1.0,1.0
869,870,1,"Johnson, Master. Harold Theodor",Johnson,Master.,boy,3.0,1.0,1.0


In [19]:
#These 50 woman-child-group passengers all perished
print(len(train[train.SurnameSurvival==0]))
train[train.SurnameSurvival==0]

50


Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
7,8,0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,4.0,1.0,0.0
16,17,0,"Rice, Master. Eugene",Rice,Master.,boy,5.0,1.0,0.0
18,19,0,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",Vander Planke,Mrs.,woman,2.0,1.0,0.0
24,25,0,"Palsson, Miss. Torborg Danira",Palsson,Miss.,woman,4.0,1.0,0.0
38,39,0,"Vander Planke, Miss. Augusta Maria",Vander Planke,Miss.,woman,2.0,1.0,0.0
50,51,0,"Panula, Master. Juha Niilo",Panula,Master.,boy,4.0,1.0,0.0
59,60,0,"Goodwin, Master. William Frederick",Goodwin,Master.,boy,5.0,1.0,0.0
63,64,0,"Skoog, Master. Harald",Skoog,Master.,boy,5.0,1.0,0.0
71,72,0,"Goodwin, Miss. Lillian Amy",Goodwin,Miss.,woman,5.0,1.0,0.0
111,112,0,"Zabour, Miss. Hileni",Zabour,Miss.,woman,2.0,1.0,0.0


In [20]:
#These 18 passengers had mixed survival
train[(train.SurnameSurvival>0) & (train.SurnameSurvival<1)]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
25,26,1,"Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...",Asplund,Mrs.,woman,4.0,1.0,0.75
68,69,1,"Andersson, Miss. Erna Alexandra",Andersson,Miss.,woman,7.0,1.0,0.142857
119,120,0,"Andersson, Miss. Ellis Anna Maria",Andersson,Miss.,woman,7.0,1.0,0.142857
182,183,0,"Asplund, Master. Clarence Gustaf Hugo",Asplund,Master.,boy,4.0,1.0,0.75
233,234,1,"Asplund, Miss. Lillian Gertrud",Asplund,Miss.,woman,4.0,1.0,0.75
261,262,1,"Asplund, Master. Edvin Rojj Felix",Asplund,Master.,boy,4.0,1.0,0.75
297,298,0,"Allison, Miss. Helen Loraine",Allison,Miss.,woman,3.0,1.0,0.333333
305,306,1,"Allison, Master. Hudson Trevor",Allison,Master.,boy,3.0,1.0,0.333333
435,436,1,"Carter, Miss. Lucile Polk",Carter,Miss.,woman,4.0,1.0,0.75
498,499,0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",Allison,Mrs.,woman,3.0,1.0,0.333333


In [21]:
#Apply gender model plus new predictions
train['Predict'] = 0.0 #create the column explicitly; attribute assignment (train.Predict = 0) does not add a column
train.loc[(train.Category=='woman'), 'Predict'] = 1
train.loc[(train.Category!='woman'), 'Predict'] = 0
train.loc[(train.Category=='boy') & (train.SurnameSurvival>0.5), 'Predict'] = 1
train.loc[(train.Category=='woman') & (train.SurnameSurvival<0.5), 'Predict'] = 0
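The rule can be summarised as a small function: start from the gender model, then apply two overrides based on the group's survival rate. This is a sketch on toy data; `predict_survival` is a hypothetical helper name, not part of the original notebook.

```python
import numpy as np
import pandas as pd

def predict_survival(category: pd.Series, group_rate: pd.Series) -> pd.Series:
    """Gender model plus the two woman-child-group overrides."""
    # Baseline: women survive, everyone else (men and boys) does not.
    pred = (category == "woman").astype(int)
    # Override 1: a boy survives if most of his group survived.
    pred[(category == "boy") & (group_rate > 0.5)] = 1
    # Override 2: a woman perishes if most of her group perished.
    pred[(category == "woman") & (group_rate < 0.5)] = 0
    return pred

toy = pd.DataFrame({
    "Category": ["man", "woman", "woman", "boy", "boy"],
    "SurnameSurvival": [np.nan, np.nan, 0.0, 1.0, np.nan],
})
toy["Predict"] = predict_survival(toy["Category"], toy["SurnameSurvival"])
print(toy["Predict"].tolist())
```

Comparisons against NaN are False, so passengers without a group rate simply keep the gender-model prediction. A rate of exactly 0.5 triggers neither override.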

In [22]:
#How many predictions changed from the gender model? How many of them are correct? 42 out of 43
print(len(train[(train.Category=='woman') & (train['Predict']==0)]))
train[(train.Category=='woman') & (train['Predict']==0)]

43


Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival,Predict
18,19,0,"Vander Planke, Mrs. Julius (Emelia Maria Vande...",Vander Planke,Mrs.,woman,2.0,1.0,0.0,0.0
24,25,0,"Palsson, Miss. Torborg Danira",Palsson,Miss.,woman,4.0,1.0,0.0,0.0
38,39,0,"Vander Planke, Miss. Augusta Maria",Vander Planke,Miss.,woman,2.0,1.0,0.0,0.0
68,69,1,"Andersson, Miss. Erna Alexandra",Andersson,Miss.,woman,7.0,1.0,0.142857,0.0
71,72,0,"Goodwin, Miss. Lillian Amy",Goodwin,Miss.,woman,5.0,1.0,0.0,0.0
111,112,0,"Zabour, Miss. Hileni",Zabour,Miss.,woman,2.0,1.0,0.0,0.0
113,114,0,"Jussila, Miss. Katriina",Jussila,Miss.,woman,2.0,1.0,0.0,0.0
119,120,0,"Andersson, Miss. Ellis Anna Maria",Andersson,Miss.,woman,7.0,1.0,0.142857,0.0
140,141,0,"Boulos, Mrs. Joseph (Sultana)",Boulos,Mrs.,woman,2.0,1.0,0.0,0.0
147,148,0,"Ford, Miss. Robina Maggie ""Ruby""",Ford,Miss.,woman,3.0,1.0,0.0,0.0


In [23]:
#How many predictions changed from the gender model? How many of them are correct? 17 out of 18
print(len(train[(train.Category=='boy') & (train['Predict']==1)]))
train[(train.Category=='boy') & (train['Predict']==1)]

18


Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival,Predict
65,66,1,"Moubarek, Master. Gerios",Moubarek,Master.,boy,2.0,1.0,1.0,1.0
78,79,1,"Caldwell, Master. Alden Gates",Caldwell,Master.,boy,2.0,1.0,1.0,1.0
125,126,1,"Nicola-Yarred, Master. Elias",Nicola-Yarred,Master.,boy,2.0,1.0,1.0,1.0
165,166,1,"Goldsmith, Master. Frank John William ""Frankie""",Goldsmith,Master.,boy,2.0,1.0,1.0,1.0
182,183,0,"Asplund, Master. Clarence Gustaf Hugo",Asplund,Master.,boy,4.0,1.0,0.75,1.0
183,184,1,"Becker, Master. Richard F",Becker,Master.,boy,2.0,1.0,1.0,1.0
193,194,1,"Navratil, Master. Michel M",Navratil,Master.,boy,2.0,1.0,1.0,1.0
261,262,1,"Asplund, Master. Edvin Rojj Felix",Asplund,Master.,boy,4.0,1.0,0.75,1.0
340,341,1,"Navratil, Master. Edmond Roger",Navratil,Master.,boy,2.0,1.0,1.0,1.0
348,349,1,"Coutts, Master. William Loch ""William""",Coutts,Master.,boy,2.0,1.0,1.0,1.0


So Chris Deotte's model performs very well on the training data. Let's see if it performs well enough on the test dataset.
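Instead of counting corrected rows by hand, the two models can be compared by accuracy. The sketch below uses a made-up six-row frame, so the numbers it prints say nothing about the real data. Keep in mind that training accuracy is optimistic here, because each passenger's own outcome is part of their group's survival rate.

```python
import pandas as pd

# Toy labels and predictions, for illustration only.
toy = pd.DataFrame({
    "Survived": [0, 1, 0, 1, 0, 1],
    "Category": ["man", "woman", "woman", "boy", "boy", "man"],
    "Predict":  [0, 1, 0, 1, 0, 0],
})

# Gender model: women survive, everyone else does not.
gender_pred = (toy["Category"] == "woman").astype(int)

gender_acc = (gender_pred == toy["Survived"]).mean()
rule_acc = (toy["Predict"] == toy["Survived"]).mean()

print(f"gender model: {gender_acc:.3f}")
print(f"name-only rule: {rule_acc:.3f}")
```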

In [24]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PassengerId      891 non-null    int64  
 1   Survived         891 non-null    int64  
 2   Name             891 non-null    object 
 3   Surname          891 non-null    object 
 4   Title            891 non-null    object 
 5   Category         891 non-null    object 
 6   SurnameFreq      353 non-null    float64
 7   Group            142 non-null    float64
 8   SurnameSurvival  142 non-null    float64
 9   Predict          891 non-null    float64
dtypes: float64(4), int64(2), object(4)
memory usage: 69.7+ KB


In [25]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 0 to 417
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   PassengerId  418 non-null    int64 
 1   Name         418 non-null    object
 2   Surname      418 non-null    object
 3   Title        418 non-null    object
 4   Category     418 non-null    object
dtypes: int64(1), object(4)
memory usage: 16.5+ KB


In [26]:
combined = pd.concat([train, test], ignore_index=True)

In [27]:
combined.head()

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival,Predict
0,1,0.0,"Braund, Mr. Owen Harris",Braund,Mr.,man,,,,0.0
1,2,1.0,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Cumings,Mrs.,woman,1.0,,,1.0
2,3,1.0,"Heikkinen, Miss. Laina",Heikkinen,Miss.,woman,1.0,,,1.0
3,4,1.0,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",Futrelle,Mrs.,woman,1.0,,,1.0
4,5,0.0,"Allen, Mr. William Henry",Allen,Mr.,man,,,,0.0


In [28]:
combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PassengerId      1309 non-null   int64  
 1   Survived         891 non-null    float64
 2   Name             1309 non-null   object 
 3   Surname          1309 non-null   object 
 4   Title            1309 non-null   object 
 5   Category         1309 non-null   object 
 6   SurnameFreq      353 non-null    float64
 7   Group            142 non-null    float64
 8   SurnameSurvival  142 non-null    float64
 9   Predict          891 non-null    float64
dtypes: float64(5), int64(1), object(4)
memory usage: 102.4+ KB


In [29]:
#We need to recalculate SurnameFreq, Group and SurnameSurvival
combined = combined.drop(['SurnameFreq', 'Group', 'SurnameSurvival', 'Predict'], axis=1)

In [30]:
combined.groupby('Title')['PassengerId'].count()

Title
Capt.          1
Col.           4
Countess.      1
Don.           1
Dona.          1
Dr.            8
Jonkheer.      1
Lady.          1
Major.         2
Master.       61
Miss.        260
Mlle.          2
Mme.           1
Mr.          757
Mrs.         197
Ms.            2
Rev.           8
Sir.           1
Name: PassengerId, dtype: int64

In [31]:
temp = combined[(combined.Category!='man')]
SurnameDict = temp.groupby('Surname')['PassengerId'].count().to_dict()
print(SurnameDict)

{'Abbott': 2, 'Abelseth': 1, 'Abelson': 1, 'Abrahim': 1, 'Ahlin': 1, 'Aks': 2, 'Allen': 1, 'Allison': 3, 'Andersen-Jensen': 1, 'Andersson': 8, 'Andrews': 1, 'Angle': 1, 'Appleton': 1, 'Arnold-Franchi': 1, 'Asplund': 6, 'Assaf Khalil': 1, 'Astor': 1, 'Attalah': 1, 'Aubart': 1, 'Ayoub': 1, 'Backstrom': 1, 'Baclini': 4, 'Badman': 1, 'Ball': 1, 'Barbara': 2, 'Barber': 1, 'Barry': 1, 'Baxter': 1, 'Bazzani': 1, 'Beane': 1, 'Becker': 4, 'Beckwith': 1, 'Bentham': 1, 'Betros': 1, 'Bidois': 1, 'Bird': 1, 'Bishop': 1, 'Bissette': 1, 'Bonnell': 2, 'Boulos': 3, 'Bourke': 2, 'Bowen': 1, 'Bowerman': 1, 'Bradley': 1, 'Braf': 1, 'Brown': 5, 'Bryhl': 1, 'Buckley': 1, 'Bucknell': 1, 'Burns': 2, 'Buss': 1, 'Bystrom': 1, 'Cacic': 2, 'Caldwell': 2, 'Cameron': 1, 'Canavan': 1, 'Candee': 1, 'Caram': 1, 'Cardeza': 1, 'Carr': 2, 'Carter': 4, 'Cassebeer': 1, 'Cavendish': 1, 'Chaffee': 1, 'Chambers': 1, 'Chapman': 1, 'Chaudanson': 1, 'Cherry': 1, 'Chibnall': 1, 'Christy': 2, 'Clark': 1, 'Clarke': 1, 'Cleaver': 1,

In [32]:
combined.loc[(combined.Category!='man'), 'SurnameFreq']=combined.Surname.map(SurnameDict) #mapping Surname to the dictionary

In [33]:
combined.head()

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq
0,1,0.0,"Braund, Mr. Owen Harris",Braund,Mr.,man,
1,2,1.0,"Cumings, Mrs. John Bradley (Florence Briggs Th...",Cumings,Mrs.,woman,1.0
2,3,1.0,"Heikkinen, Miss. Laina",Heikkinen,Miss.,woman,1.0
3,4,1.0,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",Futrelle,Mrs.,woman,1.0
4,5,0.0,"Allen, Mr. William Henry",Allen,Mr.,man,


In [34]:
combined[(combined.SurnameFreq>1)]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq
7,8,0.0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,5.0
8,9,1.0,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0
10,11,1.0,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,3.0
11,12,1.0,"Bonnell, Miss. Elizabeth",Bonnell,Miss.,woman,2.0
16,17,0.0,"Rice, Master. Eugene",Rice,Master.,boy,6.0
...,...,...,...,...,...,...,...
1291,1292,,"Bonnell, Miss. Caroline",Bonnell,Miss.,woman,2.0
1293,1294,,"Gibson, Miss. Dorothy Winifred",Gibson,Miss.,woman,2.0
1300,1301,,"Peacock, Miss. Treasteall",Peacock,Miss.,woman,3.0
1302,1303,,"Minahan, Mrs. William Edward (Lillian E Thorpe)",Minahan,Mrs.,woman,2.0


In [35]:
combined.loc[(combined.SurnameFreq>1), 'Group'] = 1 #identifying these 263 passengers as part of a woman-child-group

In [36]:
tmp = combined[0:891] #creating a temporary dataframe

In [37]:
tmp[(tmp.Group==1)] #180 training passengers belong to a woman-child-group

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group
7,8,0.0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,5.0,1.0
8,9,1.0,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0,1.0
10,11,1.0,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,3.0,1.0
11,12,1.0,"Bonnell, Miss. Elizabeth",Bonnell,Miss.,woman,2.0,1.0
16,17,0.0,"Rice, Master. Eugene",Rice,Master.,boy,6.0,1.0
...,...,...,...,...,...,...,...,...
866,867,1.0,"Duran y More, Miss. Asuncion",Duran y More,Miss.,woman,2.0,1.0
869,870,1.0,"Johnson, Master. Harold Theodor",Johnson,Master.,boy,3.0,1.0
885,886,0.0,"Rice, Mrs. William (Margaret Norton)",Rice,Mrs.,woman,6.0,1.0
887,888,1.0,"Graham, Miss. Margaret Edith",Graham,Miss.,woman,2.0,1.0


In [38]:
#We'll use this dictionary to map SurnameSurvival
dct = tmp[(tmp.Group==1)].groupby('Surname')['Survived'].mean().to_dict()
print(type(dct))
print(dct)

<class 'dict'>
{'Abbott': 1.0, 'Aks': 1.0, 'Allison': 0.3333333333333333, 'Andersson': 0.14285714285714285, 'Asplund': 0.75, 'Baclini': 1.0, 'Barbara': 0.0, 'Becker': 1.0, 'Bonnell': 1.0, 'Boulos': 0.0, 'Bourke': 0.0, 'Brown': 1.0, 'Burns': 1.0, 'Cacic': 0.0, 'Caldwell': 1.0, 'Carr': 1.0, 'Carter': 0.75, 'Christy': 1.0, 'Collyer': 1.0, 'Compton': 1.0, 'Connolly': 1.0, 'Coutts': 1.0, 'Crosby': 1.0, 'Danbom': 0.0, 'Davies': 1.0, 'Dean': 1.0, 'Dodge': 1.0, 'Doling': 1.0, 'Drew': 1.0, 'Duran y More': 1.0, 'Fleming': 1.0, 'Ford': 0.0, 'Fortune': 1.0, 'Goldsmith': 1.0, 'Goodwin': 0.0, 'Graham': 1.0, 'Hamalainen': 1.0, 'Harper': 1.0, 'Hart': 1.0, 'Hays': 1.0, 'Herman': 1.0, 'Hippach': 1.0, 'Hirvonen': 1.0, 'Hocking': 1.0, 'Ilmakangas': 0.0, 'Johnson': 1.0, 'Johnston': 0.0, 'Jussila': 0.0, 'Kelly': 1.0, 'Kink-Heilmann': 1.0, 'Laroche': 1.0, 'Lefebre': 0.0, 'Lines': 1.0, 'Mallet': 1.0, 'McCoy': 1.0, 'McGowan': 1.0, 'Mellinger': 1.0, 'Minahan': 1.0, 'Moor': 1.0, 'Moubarek': 1.0, 'Murphy': 1.0, '

In [39]:
combined.loc[(combined.Group==1), 'SurnameSurvival']=combined.Surname.map(dct)
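The key point of this step is that group membership is counted over train and test together, while survival rates can only come from the labelled training rows. A toy sketch of that idea (the `combined_toy` frame, `n_train` and the surname 'Unseen' are made up):

```python
import numpy as np
import pandas as pd

# Toy combined frame: the first four rows play the training set,
# the last three play the test set (no labels).
combined_toy = pd.DataFrame({
    "Surname":  ["Rice", "Rice", "Hart", "Hart", "Rice", "Hart", "Unseen"],
    "Survived": [0, 0, 1, 1, np.nan, np.nan, np.nan],
})
n_train = 4

# Survival rates come from labelled (training) rows only.
rates = combined_toy.iloc[:n_train].groupby("Surname")["Survived"].mean()

# Map the rates onto every row; surnames with no training member stay NaN.
combined_toy["SurnameSurvival"] = combined_toy["Surname"].map(rates)

test_part = combined_toy.iloc[n_train:]
print(test_part)
print(test_part["SurnameSurvival"].isna().sum())
```

Test rows that end up with NaN have nothing to learn from, so they fall back to the plain gender model.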

In [40]:
#SurnameSurvival could not be calculated for 13 test passengers: their groups have no member in the training set
combined[combined.SurnameSurvival.notnull()]

Unnamed: 0,PassengerId,Survived,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
7,8,0.0,"Palsson, Master. Gosta Leonard",Palsson,Master.,boy,5.0,1.0,0.0
8,9,1.0,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",Johnson,Mrs.,woman,3.0,1.0,1.0
10,11,1.0,"Sandstrom, Miss. Marguerite Rut",Sandstrom,Miss.,woman,3.0,1.0,1.0
11,12,1.0,"Bonnell, Miss. Elizabeth",Bonnell,Miss.,woman,2.0,1.0,1.0
16,17,0.0,"Rice, Master. Eugene",Rice,Master.,boy,6.0,1.0,0.0
...,...,...,...,...,...,...,...,...,...
1283,1284,,"Abbott, Master. Eugene Joseph",Abbott,Master.,boy,2.0,1.0,1.0
1286,1287,,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",Smith,Mrs.,woman,2.0,1.0,1.0
1291,1292,,"Bonnell, Miss. Caroline",Bonnell,Miss.,woman,2.0,1.0,1.0
1302,1303,,"Minahan, Mrs. William Edward (Lillian E Thorpe)",Minahan,Mrs.,woman,2.0,1.0,1.0


In [41]:
export = combined[891:1309].copy() #copy, so that later column assignments don't act on a slice of combined

In [42]:
export.info() #We don't have surname survival for 13 passengers

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 418 entries, 891 to 1308
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   PassengerId      418 non-null    int64  
 1   Survived         0 non-null      float64
 2   Name             418 non-null    object 
 3   Surname          418 non-null    object 
 4   Title            418 non-null    object 
 5   Category         418 non-null    object 
 6   SurnameFreq      173 non-null    float64
 7   Group            83 non-null     float64
 8   SurnameSurvival  70 non-null     float64
dtypes: float64(4), int64(1), object(4)
memory usage: 29.5+ KB


In [43]:
export = export.drop('Survived', axis=1) #drop Survived col. for now

In [44]:
export[(export.SurnameSurvival==1)] #These woman-child-group passengers all survived

Unnamed: 0,PassengerId,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
895,896,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",Hirvonen,Mrs.,woman,2.0,1.0,1.0
897,898,"Connolly, Miss. Kate",Connolly,Miss.,woman,2.0,1.0,1.0
915,916,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",Ryerson,Mrs.,woman,4.0,1.0,1.0
923,924,"Dean, Mrs. Bertram (Eva Georgetta Light)",Dean,Mrs.,woman,3.0,1.0,1.0
940,941,"Coutts, Mrs. William (Winnie Minnie"" Treanor)""",Coutts,Mrs.,woman,3.0,1.0,1.0
943,944,"Hocking, Miss. Ellen Nellie""""",Hocking,Miss.,woman,2.0,1.0,1.0
944,945,"Fortune, Miss. Ethel Flora",Fortune,Miss.,woman,4.0,1.0,1.0
955,956,"Ryerson, Master. John Borie",Ryerson,Master.,boy,4.0,1.0,1.0
957,958,"Burns, Miss. Mary Delia",Burns,Miss.,woman,2.0,1.0,1.0
960,961,"Fortune, Mrs. Mark (Mary McDougald)",Fortune,Mrs.,woman,4.0,1.0,1.0


In [45]:
export[(export.SurnameSurvival==0)] #These woman-child-group passengers all perished

Unnamed: 0,PassengerId,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
909,910,"Ilmakangas, Miss. Ida Livija",Ilmakangas,Miss.,woman,2.0,1.0,0.0
924,925,"Johnston, Mrs. Andrew G (Elizabeth Lily"" Watson)""",Johnston,Mrs.,woman,3.0,1.0,0.0
928,929,"Cacic, Miss. Manda",Cacic,Miss.,woman,2.0,1.0,0.0
946,947,"Rice, Master. Albert",Rice,Master.,boy,6.0,1.0,0.0
971,972,"Boulos, Master. Akar",Boulos,Master.,boy,3.0,1.0,0.0
1023,1024,"Lefebre, Mrs. Frank (Frances)",Lefebre,Mrs.,woman,5.0,1.0,0.0
1031,1032,"Goodwin, Miss. Jessie Allis",Goodwin,Miss.,woman,6.0,1.0,0.0
1079,1080,"Sage, Miss. Ada",Sage,Miss.,woman,7.0,1.0,0.0
1092,1093,"Danbom, Master. Gilbert Sigvard Emanuel",Danbom,Master.,boy,2.0,1.0,0.0
1135,1136,"Johnston, Master. William Arthur Willie""""",Johnston,Master.,boy,3.0,1.0,0.0


In [46]:
export[(export.SurnameSurvival>0) & (export.SurnameSurvival<1)] #These groups had mixed survival

Unnamed: 0,PassengerId,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival
1045,1046,"Asplund, Master. Filip Oscar",Asplund,Master.,boy,6.0,1.0,0.75
1105,1106,"Andersson, Miss. Ida Augusta Margareta",Andersson,Miss.,woman,8.0,1.0,0.142857
1270,1271,"Asplund, Master. Carl Edgar",Asplund,Master.,boy,6.0,1.0,0.75


In [47]:
#Apply gender model plus new predictions to test dataset
export['Survived'] = 0 #'Survived' was dropped above, so create the column explicitly
export.loc[(export.Category=='woman'), 'Survived'] = 1
export.loc[(export.Category!='woman'), 'Survived'] = 0
export.loc[(export.Category=='boy') & (export.SurnameSurvival>0.5), 'Survived'] = 1
export.loc[(export.Category=='woman') & (export.SurnameSurvival<0.5), 'Survived'] = 0

In [48]:
export['Survived'] = export['Survived'].astype(int)

In [49]:
export.head()

Unnamed: 0,PassengerId,Name,Surname,Title,Category,SurnameFreq,Group,SurnameSurvival,Survived
891,892,"Kelly, Mr. James",Kelly,Mr.,man,,,,0
892,893,"Wilkes, Mrs. James (Ellen Needs)",Wilkes,Mrs.,woman,1.0,,,1
893,894,"Myles, Mr. Thomas Francis",Myles,Mr.,man,,,,0
894,895,"Wirz, Mr. Albert",Wirz,Mr.,man,,,,0
895,896,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",Hirvonen,Mrs.,woman,2.0,1.0,1.0,1


In [50]:
#For some reason, my submission scores slightly below Chris', at .799
submission = export[['PassengerId', 'Survived']]
submission.to_csv('titanic_name_only.csv', index=False)
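Before uploading, a few quick checks on the submission frame can catch silly mistakes. A minimal sketch: `check_submission` is a hypothetical helper and `submission_toy` stands in for the real 418-row frame.

```python
import pandas as pd

def check_submission(df: pd.DataFrame, expected_rows: int) -> None:
    """Raise AssertionError if the frame does not look like a valid submission."""
    assert list(df.columns) == ["PassengerId", "Survived"]
    assert len(df) == expected_rows
    assert df["PassengerId"].is_unique
    assert df["Survived"].isin([0, 1]).all()

# Toy stand-in for the real submission, which has one row per test passenger.
submission_toy = pd.DataFrame({"PassengerId": [892, 893, 894], "Survived": [0, 1, 0]})
check_submission(submission_toy, expected_rows=3)
print("submission looks fine")
```

For the real file the call would be `check_submission(submission, expected_rows=418)`, matching the 418 test rows shown by `test.info()`.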