<h1>
<center>
Module 5: First look at Bias-Variance Tradeoff
</center>
</h1>
<div class=w19>
<p>
There is a classic problem with machine learning models built from labeled data (i.e., supervised learning). Typically you are given training data and you build your model from that. Once you have your model, you can release it into the wild, where it will start working with new data coming from the world. Your model can then suffer from one of two problems: (1) it is too weak and underfits the training data, or (2) it is too strong and overfits the training data. The former is called high *bias*. High bias can cause a model to miss the relevant relations between columns/features and the target output. Think of our prediction tree that was a stump using only `sex_female` as a column. It ignored all other columns. Think of this predictor as highly biased toward one column. If it were less biased (more egalitarian), it would have included most, if not all, of the columns.
<p>
  But we saw the strange case of adding more columns causing our scores to go down. What's up with that? It is likely a case of overfitting.
The problem is that the model pays too much attention to the nuances of the training data. It can end up modeling the random noise in the training data. This is called high *variance*. Take the Titanic data. What if a model used the `Name` column for splitting? What if we one-hot encoded it? Since every name is unique, we would get 891 new columns. If I built a tree using those 891 columns as splitters and a max-depth of 891, I would get 100% accuracy on the training data, right? Each passenger gets a private path through the tree, so the tree simply memorizes who survived. Convince yourself of that. Such a tree tells us nothing about a passenger whose name it has never seen.
<p>
Caveat 1: I am claiming that the *raw* `Name` column has extreme variance. That said, I think it could be useful to wrangle the `Name` column a bit to pull out useful info from the raw values. For instance, I see salutations (titles) like Master, Reverend, Miss, Honorable, etc. These indeed might carry information. Maybe they identify passengers of "high class" who were let onto the lifeboats. We could wrangle a new binary column `upper_class` that is formed by looking for salutations in the `Name` column.
<p>
Caveat 2: the drawback of using the Titanic data is that it is hard to see beyond the passengers given to us. Our Titanic models will not be released for future use. There will not be another Titanic built. So it may be better to think of the Loan Table. The models you build for it could definitely be released for future use. If good enough, I suppose they could replace human loan agents in a bank. You can certainly see where bias might creep in here, e.g., only using the `Married` column to make decisions.
<p>
We will study a variety of methods for handling the Bias-Variance tradeoff in the coming weeks. We are looking for a sweet spot where there is not too much underfitting and not too much overfitting. This week we will try a technique called k-fold cross-validation.
<p>
  Go ahead and load things in.
</div>
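To make Caveat 1 a bit more concrete, here is a minimal sketch of pulling a salutation out of a raw name and turning it into an `upper_class` flag. It uses a handful of names that appear in the Titanic table, in plain Python so it runs on its own. The helper name `extract_title` and the `UPPER_TITLES` set are assumptions made up for illustration; which titles should count as "upper class" is a judgment call, not something the data settles.

```python
# A few raw Name values that appear in the Titanic table.
names = [
    'Braund, Mr. Owen Harris',
    'Bateman, Rev. Robert James',
    'Allison, Master. Hudson Trevor',
    'Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan")',
    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
]

# Assumption: a hand-picked set of titles treated as "upper class".
UPPER_TITLES = {'Rev', 'Lady', 'Sir', 'Countess', 'Col', 'Major', 'Capt', 'Dr', 'Don', 'Jonkheer'}

def extract_title(name):
    """Return the word right before the first period after the comma, e.g. 'Mr'.

    Assumes the 'Last, Title. First' layout; a name without a comma would fail.
    """
    after_comma = name.split(',', 1)[1]
    before_period = after_comma.split('.', 1)[0]
    return before_period.split()[-1]

titles = [extract_title(n) for n in names]
upper_class = [int(t in UPPER_TITLES) for t in titles]

for n, t, u in zip(names, titles, upper_class):
    print(f'{t:10} upper_class={u}  ({n})')
```

On the real table, the same helper could presumably be applied with `titanic_table['Name'].apply(extract_title)`, after checking that every name follows the layout the helper assumes.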

In [1]:
import pandas as pd

from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [2]:
with open('/content/gdrive/My Drive/class_tables/titanic_wrangled_week2.csv', 'r') as f:
  titanic_table = pd.read_csv(f)

#Don't need results table. We will build a new one.

titanic_table.head(2)  #make sure it looks ok - we see the results of our week 2 wrangling

Unnamed: 0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,...,age_bin,age_Child,age_Adult,age_Senior,sex_female,sex_male,ok_child,pclass_1,pclass_2,pclass_3
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,...,Child,1,0,0,0,1,0,0,0,1
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,...,Adult,0,1,0,1,0,0,1,0,0


In [0]:
pd.set_option('display.max_columns', None)

In [4]:
!rm -f library_w19_week4b.py


In [5]:
from google.colab import files
files.upload()

Saving library_w19_week4b.py to library_w19_week4b.py




In [6]:
from library_w19_week4b import *

%who function

accuracy	 build_pred	 build_tree_iter	 compute_prediction	 f1	 find_best_splitter	 generate_table	 gig	 gini	 
informedness	 path_id	 predictor_case	 probabilities	 reorder_paths	 tree_predictor	 


<h1>Extreme example of overfitting</h1>

Let's look at my idea of using the `Name` column from the Titanic table as a splitter. First I will one-hot encode the `Name` column.

In [0]:
one_hot_name = pd.get_dummies(titanic_table['Name'], prefix='z', dummy_na=False)  # dummy_na=False because Name should not have empties
big_table = titanic_table.join(one_hot_name)
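Before looking at the full table, it may help to see why one column per name leads to memorization. The sketch below is a plain-Python stand-in, not the course library: it one-hot encodes two names by hand (the two rows shown by `head(2)` earlier), checks that each new column contains exactly one 1, and then treats "one split per name column" as the lookup table it effectively is. The names `memorized` and `predict`, and the default of 0 for unseen names, are assumptions for illustration.

```python
# Toy training rows: (Name, Survived) for the first two rows of the table.
train = [
    ('Braund, Mr. Owen Harris', 0),
    ('Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 1),
]

# One-hot encoding by hand: one column per distinct name.
columns = sorted({name for name, _ in train})
one_hot = [[int(name == col) for col in columns] for name, _ in train]

# Every name is unique, so each column holds exactly one 1.
column_sums = [sum(row[j] for row in one_hot) for j in range(len(columns))]

# A tree with one split per name column amounts to this lookup table.
memorized = dict(train)

def predict(name, default=0):
    """Return the memorized label, or a blind default for an unseen name."""
    return memorized.get(name, default)

train_accuracy = sum(predict(n) == y for n, y in train) / len(train)

print(column_sums)
print(train_accuracy)
print(predict('Heikkinen, Miss. Laina'))  # unseen name: only the default is available
```

Training accuracy is perfect by construction, yet for any name outside the table the "model" has nothing to go on except the default. That is the high-variance failure in miniature.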

In [8]:
big_table.head()  #Yikes - look at all those new columns!

Unnamed: 0,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,no_age,filled_age,emb_C,emb_Q,emb_S,emb_nan,age_bin,age_Child,age_Adult,age_Senior,sex_female,sex_male,ok_child,pclass_1,pclass_2,pclass_3,"z_Abbing, Mr. Anthony","z_Abbott, Mr. Rossmore Edward","z_Abbott, Mrs. Stanton (Rosa Hunt)","z_Abelson, Mr. Samuel","z_Abelson, Mrs. Samuel (Hannah Wizosky)","z_Adahl, Mr. Mauritz Nils Martin","z_Adams, Mr. John","z_Ahlin, Mrs. Johan (Johanna Persdotter Larsson)","z_Aks, Mrs. Sam (Leah Rosen)","z_Albimona, Mr. Nassef Cassem","z_Alexander, Mr. William","z_Alhomaki, Mr. Ilmari Rudolf","z_Ali, Mr. Ahmed","z_Ali, Mr. William","z_Allen, Miss. Elisabeth Walton","z_Allen, Mr. William Henry","z_Allison, Master. Hudson Trevor","z_Allison, Miss. Helen Loraine","z_Allison, Mrs. Hudson J C (Bessie Waldo Daniels)","z_Allum, Mr. Owen George","z_Andersen-Jensen, Miss. Carla Christine Nielsine","z_Anderson, Mr. Harry","z_Andersson, Master. Sigvard Harald Elias","z_Andersson, Miss. Ebba Iris Alfrida","z_Andersson, Miss. Ellis Anna Maria","z_Andersson, Miss. Erna Alexandra","z_Andersson, Miss. Ingeborg Constanzia","z_Andersson, Miss. Sigrid Elisabeth","z_Andersson, Mr. Anders Johan","z_Andersson, Mr. August Edvard (""Wennerstrom"")","z_Andersson, Mrs. Anders Johan (Alfrida Konstantia Brogren)","z_Andreasson, Mr. Paul Edvin","z_Andrew, Mr. Edgardo Samuel","z_Andrews, Miss. Kornelia Theodosia","z_Andrews, Mr. Thomas Jr","z_Angle, Mrs. William A (Florence ""Mary"" Agnes Hughes)","z_Appleton, Mrs. Edward Dale (Charlotte Lamson)","z_Arnold-Franchi, Mr. Josef","z_Arnold-Franchi, Mrs. Josef (Josefine Franchi)","z_Artagaveytia, Mr. Ramon","z_Asim, Mr. Adola","z_Asplund, Master. Clarence Gustaf Hugo","z_Asplund, Master. Edvin Rojj Felix","z_Asplund, Miss. Lillian Gertrud","z_Asplund, Mrs. Carl Oscar (Selma Augusta Emilia Johansson)","z_Astor, Mrs. John Jacob (Madeleine Talmadge Force)","z_Attalah, Miss. Malake","z_Attalah, Mr. Sleiman","z_Aubart, Mme. 
Leontine Pauline","z_Augustsson, Mr. Albert","z_Ayoub, Miss. Banoura","z_Backstrom, Mr. Karl Alfred","z_Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)","z_Baclini, Miss. Eugenie","z_Baclini, Miss. Helene Barbara","z_Baclini, Miss. Marie Catherine","z_Baclini, Mrs. Solomon (Latifa Qurban)","z_Badt, Mr. Mohamed","z_Bailey, Mr. Percy Andrew","z_Balkic, Mr. Cerin","z_Ball, Mrs. (Ada E Hall)","z_Banfield, Mr. Frederick James","z_Barah, Mr. Hanna Assi","z_Barbara, Miss. Saiide","z_Barbara, Mrs. (Catherine David)","z_Barber, Miss. Ellen ""Nellie""","z_Barkworth, Mr. Algernon Henry Wilson","z_Barton, Mr. David John","z_Bateman, Rev. Robert James","z_Baumann, Mr. John D","z_Baxter, Mr. Quigg Edmond","z_Baxter, Mrs. James (Helene DeLaudeniere Chaput)","z_Bazzani, Miss. Albina","z_Beane, Mr. Edward","z_Beane, Mrs. Edward (Ethel Clarke)","z_Beavan, Mr. William Thomas","z_Becker, Master. Richard F","z_Becker, Miss. Marion Louise","z_Beckwith, Mr. Richard Leonard","z_Beckwith, Mrs. Richard Leonard (Sallie Monypeny)","z_Beesley, Mr. Lawrence","z_Behr, Mr. Karl Howell","z_Bengtsson, Mr. John Viktor","z_Berglund, Mr. Karl Ivar Sven","z_Berriman, Mr. William John","z_Betros, Mr. Tannous","z_Bidois, Miss. Rosalie","z_Bing, Mr. Lee","z_Birkeland, Mr. Hans Martin Monsen","z_Bishop, Mr. Dickinson H","z_Bishop, Mrs. Dickinson H (Helen Walton)","z_Bissette, Miss. Amelia","z_Bjornstrom-Steffansson, Mr. Mauritz Hakan","z_Blackwell, Mr. Stephen Weart","z_Blank, Mr. Henry","z_Bonnell, Miss. Elizabeth","z_Bostandyeff, Mr. Guentcho","z_Boulos, Miss. Nourelain","z_Boulos, Mr. Hanna","z_Boulos, Mrs. Joseph (Sultana)","z_Bourke, Miss. Mary","z_Bourke, Mr. John","z_Bourke, Mrs. John (Catherine)","z_Bowen, Mr. David John ""Dai""","z_Bowerman, Miss. Elsie Edith","z_Bracken, Mr. James H","z_Bradley, Mr. George (""George Arthur Brayton"")","z_Braund, Mr. Lewis Richard","z_Braund, Mr. Owen Harris","z_Brewe, Dr. Arthur Jackson","z_Brocklebank, Mr. William Alfred","z_Brown, Miss. 
Amelia ""Mildred""","z_Brown, Mr. Thomas William Solomon","z_Brown, Mrs. James Joseph (Margaret Tobin)","z_Brown, Mrs. Thomas William Solomon (Elizabeth Catherine Ford)","z_Bryhl, Mr. Kurt Arnold Gottfrid","z_Burke, Mr. Jeremiah","z_Burns, Miss. Elizabeth Margaret","z_Buss, Miss. Kate","z_Butler, Mr. Reginald Fenton","z_Butt, Major. Archibald Willingham","z_Byles, Rev. Thomas Roussel Davids","z_Bystrom, Mrs. (Karolina)","z_Cacic, Miss. Marija","z_Cacic, Mr. Luka","z_Cairns, Mr. Alexander","z_Calderhead, Mr. Edward Pennington","z_Caldwell, Master. Alden Gates","z_Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh)","z_Calic, Mr. Jovo","z_Calic, Mr. Petar","z_Cameron, Miss. Clear Annie","z_Campbell, Mr. William","z_Canavan, Miss. Mary","z_Cann, Mr. Ernest Charles","z_Caram, Mrs. Joseph (Maria Elias)","z_Carbines, Mr. William","z_Cardeza, Mr. Thomas Drake Martinez","z_Carlsson, Mr. August Sigfrid","z_Carlsson, Mr. Frans Olof","z_Carr, Miss. Helen ""Ellen""","z_Carrau, Mr. Francisco M","z_Carter, Master. William Thornton II","z_Carter, Miss. Lucile Polk","z_Carter, Mr. William Ernest","z_Carter, Mrs. Ernest Courtenay (Lilian Hughes)","z_Carter, Mrs. William Ernest (Lucile Polk)","z_Carter, Rev. Ernest Courtenay","z_Cavendish, Mr. Tyrell William","z_Celotti, Mr. Francesco","z_Chaffee, Mr. Herbert Fuller","z_Chambers, Mr. Norman Campbell","z_Chambers, Mrs. Norman Campbell (Bertha Griggs)","z_Chapman, Mr. Charles Henry","z_Chapman, Mr. John Henry","z_Charters, Mr. David","z_Cherry, Miss. Gladys","z_Chibnall, Mrs. (Edith Martha Bowerman)","z_Chip, Mr. Chang","z_Christmann, Mr. Emil","z_Christy, Miss. Julie Rachel","z_Chronopoulos, Mr. Apostolos","z_Clarke, Mrs. Charles V (Ada Maria Winfield)","z_Cleaver, Miss. Alice","z_Clifford, Mr. George Quincy","z_Coelho, Mr. Domingos Fernandeo","z_Cohen, Mr. Gurshon ""Gus""","z_Coleff, Mr. Peju","z_Coleff, Mr. Satio","z_Coleridge, Mr. Reginald Charles","z_Collander, Mr. Erik Gustaf","z_Colley, Mr. Edward Pomeroy","z_Collyer, Miss. 
Marjorie ""Lottie""","z_Collyer, Mr. Harvey","z_Collyer, Mrs. Harvey (Charlotte Annie Tate)","z_Compton, Miss. Sara Rebecca","z_Connaghton, Mr. Michael","z_Connolly, Miss. Kate","z_Connors, Mr. Patrick","z_Cook, Mr. Jacob","z_Cor, Mr. Liudevit","z_Corn, Mr. Harry","z_Coutts, Master. Eden Leslie ""Neville""","z_Coutts, Master. William Loch ""William""","z_Coxon, Mr. Daniel","z_Crease, Mr. Ernest James","z_Cribb, Mr. John Hatfield","z_Crosby, Capt. Edward Gifford","z_Crosby, Miss. Harriet R","z_Culumovic, Mr. Jeso","z_Cumings, Mrs. John Bradley (Florence Briggs Thayer)","z_Cunningham, Mr. Alfred Fleming","z_Dahl, Mr. Karl Edwart","z_Dahlberg, Miss. Gerda Ulrika","z_Dakic, Mr. Branko","z_Daly, Mr. Eugene Patrick","z_Daly, Mr. Peter Denis","z_Danbom, Mr. Ernst Gilbert","z_Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren)","z_Daniel, Mr. Robert Williams","z_Danoff, Mr. Yoto","z_Dantcheff, Mr. Ristiu","z_Davidson, Mr. Thornton","z_Davies, Master. John Morgan Jr","z_Davies, Mr. Alfred J","z_Davies, Mr. Charles Henry","z_Davis, Miss. Mary","z_Davison, Mrs. Thomas Henry (Mary E Finck)","z_Dean, Master. Bertram Vere","z_Dean, Mr. Bertram Frank","z_Denkoff, Mr. Mitto","z_Dennis, Mr. Samuel","z_Devaney, Miss. Margaret Delia","z_Dick, Mr. Albert Adrian","z_Dick, Mrs. Albert Adrian (Vera Gillespie)","z_Dimic, Mr. Jovan","z_Dodge, Master. Washington","z_Doharr, Mr. Tannous","z_Doling, Miss. Elsie","z_Doling, Mrs. John T (Ada Julia Bone)","z_Dooley, Mr. Patrick","z_Dorking, Mr. Edward Arthur","z_Douglas, Mr. Walter Donald","z_Dowdell, Miss. Elizabeth","z_Downton, Mr. William James","z_Drazenoic, Mr. Jozef","z_Drew, Mrs. James Vivian (Lulu Thorne Christian)","z_Duane, Mr. Frank","z_Duff Gordon, Lady. (Lucille Christiana Sutherland) (""Mrs Morgan"")","z_Duff Gordon, Sir. Cosmo Edmund (""Mr Morgan"")","z_Duran y More, Miss. Asuncion","z_Edvardsson, Mr. Gustaf Hjalmar","z_Eitemiller, Mr. George Floyd","z_Eklund, Mr. Hans Linus","z_Ekstrom, Mr. Johan","z_Elias, Mr. 
Dibo","z_Elias, Mr. Joseph Jr","z_Elias, Mr. Tannous","z_Elsbury, Mr. William James","z_Emanuel, Miss. Virginia Ethel","z_Emir, Mr. Farred Chehab","z_Endres, Miss. Caroline Louise","z_Eustis, Miss. Elizabeth Mussey","z_Fahlstrom, Mr. Arne Jonas","z_Farrell, Mr. James","z_Farthing, Mr. John","z_Faunthorpe, Mrs. Lizzie (Elizabeth Anne Wilkinson)","z_Fischer, Mr. Eberhard Thelander","z_Fleming, Miss. Margaret","z_Flynn, Mr. James","z_Flynn, Mr. John","z_Flynn, Mr. John Irwin (""Irving"")","z_Foo, Mr. Choong","z_Ford, Miss. Doolina Margaret ""Daisy""","z_Ford, Miss. Robina Maggie ""Ruby""","z_Ford, Mr. William Neal","z_Ford, Mrs. Edward (Margaret Ann Watson)","z_Foreman, Mr. Benjamin Laventall","z_Fortune, Miss. Alice Elizabeth","z_Fortune, Miss. Mabel Helen","z_Fortune, Mr. Charles Alexander","z_Fortune, Mr. Mark","z_Fox, Mr. Stanley Hubert","z_Francatelli, Miss. Laura Mabel","z_Frauenthal, Dr. Henry William","z_Frauenthal, Mrs. Henry William (Clara Heinsheimer)","z_Frolicher, Miss. Hedwig Margaritha","z_Frolicher-Stehli, Mr. Maxmillian","z_Frost, Mr. Anthony Wood ""Archie""","z_Fry, Mr. Richard","z_Funk, Miss. Annie Clemmer","z_Futrelle, Mr. Jacques Heath","z_Futrelle, Mrs. Jacques Heath (Lily May Peel)","z_Fynney, Mr. Joseph J","z_Gale, Mr. Shadrach","z_Gallagher, Mr. Martin","z_Garfirth, Mr. John","z_Garside, Miss. Ethel","z_Gaskell, Mr. Alfred","z_Gavey, Mr. Lawrence","z_Gee, Mr. Arthur H","z_Gheorgheff, Mr. Stanio","z_Giglio, Mr. Victor","z_Giles, Mr. Frederick Edward","z_Gilinski, Mr. Eliezer","z_Gill, Mr. John William","z_Gillespie, Mr. William Henry","z_Gilnagh, Miss. Katherine ""Katie""","z_Givard, Mr. Hans Kristensen","z_Glynn, Miss. Mary Agatha","z_Goldenberg, Mr. Samuel L","z_Goldenberg, Mrs. Samuel L (Edwiga Grabowska)","z_Goldschmidt, Mr. George B","z_Goldsmith, Master. Frank John William ""Frankie""","z_Goldsmith, Mr. Frank John","z_Goldsmith, Mrs. Frank John (Emily Alice Brown)","z_Goncalves, Mr. Manuel Estanslas","z_Goodwin, Master. 
Harold Victor","z_Goodwin, Master. Sidney Leonard","z_Goodwin, Master. William Frederick","z_Goodwin, Miss. Lillian Amy","z_Goodwin, Mr. Charles Edward","z_Goodwin, Mrs. Frederick (Augusta Tyler)","z_Graham, Miss. Margaret Edith","z_Graham, Mr. George Edward","z_Graham, Mrs. William Thompson (Edith Junkins)","z_Green, Mr. George Henry","z_Greenberg, Mr. Samuel","z_Greenfield, Mr. William Bertram","z_Gronnestad, Mr. Daniel Danielsen","z_Guggenheim, Mr. Benjamin","z_Gustafsson, Mr. Alfred Ossian","z_Gustafsson, Mr. Anders Vilhelm","z_Gustafsson, Mr. Johan Birger","z_Gustafsson, Mr. Karl Gideon","z_Haas, Miss. Aloisia","z_Hagland, Mr. Ingvald Olai Olsen","z_Hagland, Mr. Konrad Mathias Reiersen","z_Hakkarainen, Mr. Pekka Pietari","z_Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)","z_Hale, Mr. Reginald","z_Hamalainen, Master. Viljo","z_Hamalainen, Mrs. William (Anna)","z_Hampe, Mr. Leon","z_Hanna, Mr. Mansour","z_Hansen, Mr. Claus Peter","z_Hansen, Mr. Henrik Juul","z_Hansen, Mr. Henry Damsgaard","z_Harder, Mr. George Achilles","z_Harknett, Miss. Alice Phoebe","z_Harmer, Mr. Abraham (David Lishin)","z_Harper, Miss. Annie Jessie ""Nina""","z_Harper, Mr. Henry Sleeper","z_Harper, Mrs. Henry Sleeper (Myna Haxtun)","z_Harper, Rev. John","z_Harrington, Mr. Charles H","z_Harris, Mr. George","z_Harris, Mr. Henry Birkhardt","z_Harris, Mr. Walter","z_Harris, Mrs. Henry Birkhardt (Irene Wallach)","z_Harrison, Mr. William","z_Hart, Miss. Eva Miriam","z_Hart, Mr. Benjamin","z_Hart, Mr. Henry","z_Hart, Mrs. Benjamin (Esther Ada Bloomfield)","z_Hassab, Mr. Hammad","z_Hassan, Mr. Houssein G N","z_Hawksford, Mr. Walter James","z_Hays, Miss. Margaret Bechstein","z_Hays, Mrs. Charles Melville (Clara Jennings Gregg)","z_Healy, Miss. Hanora ""Nora""","z_Hedman, Mr. Oskar Arvid","z_Hegarty, Miss. Hanora ""Nora""","z_Heikkinen, Miss. Laina","z_Heininen, Miss. Wendla Maria","z_Hendekovic, Mr. Ignjac","z_Henry, Miss. Delia","z_Herman, Miss. Alice","z_Herman, Mrs. 
Samuel (Jane Laver)","z_Hewlett, Mrs. (Mary D Kingcome)","z_Hickman, Mr. Leonard Mark","z_Hickman, Mr. Lewis","z_Hickman, Mr. Stanley George","z_Hippach, Miss. Jean Gertrude","z_Hippach, Mrs. Louis Albert (Ida Sophia Fischer)","z_Hirvonen, Miss. Hildur E","z_Hocking, Mr. Richard George","z_Hocking, Mrs. Elizabeth (Eliza Needs)","z_Hodges, Mr. Henry Price","z_Hogeboom, Mrs. John C (Anna Andrews)","z_Hold, Mr. Stephen","z_Holm, Mr. John Fredrik Alexander","z_Holverson, Mr. Alexander Oskar","z_Holverson, Mrs. Alexander Oskar (Mary Aline Towner)","z_Homer, Mr. Harry (""Mr E Haven"")","z_Honkanen, Miss. Eliina","z_Hood, Mr. Ambrose Jr","z_Horgan, Mr. John","z_Hosono, Mr. Masabumi","z_Hoyt, Mr. Frederick Maxfield","z_Hoyt, Mr. William Fisher","z_Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby)","z_Humblen, Mr. Adolf Mathias Nicolai Olsen","z_Hunt, Mr. George Henry","z_Ibrahim Shawah, Mr. Yousseff","z_Icard, Miss. Amelie","z_Ilett, Miss. Bertha","z_Ilmakangas, Miss. Pieta Sofia","z_Isham, Miss. Ann Elizabeth","z_Ivanoff, Mr. Kanio","z_Jacobsohn, Mr. Sidney Samuel","z_Jacobsohn, Mrs. Sidney Samuel (Amy Frances Christy)","z_Jalsevac, Mr. Ivan","z_Jansson, Mr. Carl Olof","z_Jardin, Mr. Jose Neto","z_Jarvis, Mr. John Denzil","z_Jenkin, Mr. Stephen Curnow","z_Jensen, Mr. Hans Peder","z_Jensen, Mr. Niels Peder","z_Jensen, Mr. Svend Lauritz","z_Jermyn, Miss. Annie","z_Jerwan, Mrs. Amin S (Marie Marthe Thuillard)","z_Johannesen-Bratthammer, Mr. Bernt","z_Johanson, Mr. Jakob Alfred","z_Johansson, Mr. Erik","z_Johansson, Mr. Gustaf Joel","z_Johansson, Mr. Karl Johan","z_Johnson, Master. Harold Theodor","z_Johnson, Miss. Eleanor Ileen","z_Johnson, Mr. Alfred","z_Johnson, Mr. Malkolm Joackim","z_Johnson, Mr. William Cahoone Jr","z_Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)","z_Johnston, Miss. Catherine Helen ""Carrie""","z_Johnston, Mr. Andrew G","z_Jonkoff, Mr. Lalio","z_Jonsson, Mr. Carl","z_Jussila, Miss. Katriina","z_Jussila, Miss. Mari Aina","z_Jussila, Mr. 
Eiriik","z_Kallio, Mr. Nikolai Erland","z_Kalvik, Mr. Johannes Halvorsen","z_Kantor, Mr. Sinai","z_Kantor, Mrs. Sinai (Miriam Sternin)","z_Karaic, Mr. Milan","z_Karlsson, Mr. Nils August","z_Karun, Miss. Manca","z_Kassem, Mr. Fared","z_Keane, Miss. Nora A","z_Keane, Mr. Andrew ""Andy""","z_Keefe, Mr. Arthur","z_Kelly, Miss. Anna Katherine ""Annie Kate""","z_Kelly, Miss. Mary","z_Kelly, Mr. James","z_Kelly, Mrs. Florence ""Fannie""","z_Kent, Mr. Edward Austin","z_Kenyon, Mrs. Frederick R (Marion)","z_Kiernan, Mr. Philip","z_Kilgannon, Mr. Thomas J","z_Kimball, Mr. Edwin Nelson Jr","z_Kink, Mr. Vincenz","z_Kink-Heilmann, Miss. Luise Gretchen","z_Kirkland, Rev. Charles Leonard","z_Klaber, Mr. Herman","z_Klasen, Mr. Klas Albin","z_Knight, Mr. Robert J","z_Kraeff, Mr. Theodor","z_Kvillner, Mr. Johan Henrik Johannesson","z_Lahoud, Mr. Sarkis","z_Lahtinen, Mrs. William (Anna Sylfven)","z_Laitinen, Miss. Kristina Sofia","z_Laleff, Mr. Kristo","z_Lam, Mr. Ali","z_Lam, Mr. Len","z_Landergren, Miss. Aurora Adelia","z_Lang, Mr. Fang","z_Laroche, Miss. Simonne Marie Anne Andree","z_Laroche, Mr. Joseph Philippe Lemercier","z_Laroche, Mrs. Joseph (Juliette Marie Louise Lafargue)","z_Larsson, Mr. August Viktor","z_Larsson, Mr. Bengt Edvin","z_LeRoy, Miss. Bertha","z_Leader, Dr. Alice (Farnham)","z_Leeni, Mr. Fahim (""Philip Zenni"")","z_Lefebre, Master. Henry Forbes","z_Lefebre, Miss. Ida","z_Lefebre, Miss. Jeannie","z_Lefebre, Miss. Mathilde","z_Lehmann, Miss. Bertha","z_Leinonen, Mr. Antti Gustaf","z_Leitch, Miss. Jessie Wills","z_Lemberopolous, Mr. Peter L","z_Lemore, Mrs. (Amelia Milley)","z_Lennon, Mr. Denis","z_Leonard, Mr. Lionel","z_Lester, Mr. James","z_Lesurer, Mr. Gustave J","z_Levy, Mr. Rene Jacques","z_Lewy, Mr. Ervin G","z_Leyson, Mr. Robert William Norman","z_Lievens, Mr. Rene Aime","z_Lindahl, Miss. Agda Thorilda Viktoria","z_Lindblom, Miss. Augusta Charlotta","z_Lindell, Mr. Edvard Bengtsson","z_Lindqvist, Mr. Eino William","z_Lines, Miss. 
Mary Conover","z_Ling, Mr. Lee","z_Lobb, Mr. William Arthur","z_Lobb, Mrs. William Arthur (Cordelia K Stanlick)","z_Long, Mr. Milton Clyde","z_Longley, Miss. Gretchen Fiske","z_Louch, Mrs. Charles Alexander (Alice Adelaide Slow)","z_Lovell, Mr. John Hall (""Henry"")","z_Lulic, Mr. Nikola","z_Lundahl, Mr. Johan Svensson","z_Lurette, Miss. Elise","z_Mack, Mrs. (Mary)","z_Madigan, Miss. Margaret ""Maggie""","z_Madill, Miss. Georgette Alexandra","z_Madsen, Mr. Fridtjof Arne","z_Maenpaa, Mr. Matti Alexanteri","z_Maioni, Miss. Roberta","z_Maisner, Mr. Simon","z_Mallet, Master. Andre","z_Mallet, Mr. Albert","z_Mamee, Mr. Hanna","z_Mangan, Miss. Mary","z_Mannion, Miss. Margareth","z_Marechal, Mr. Pierre","z_Markoff, Mr. Marin","z_Markun, Mr. Johann","z_Marvin, Mr. Daniel Warner","z_Masselmani, Mrs. Fatima","z_Matthews, Mr. William John","z_Mayne, Mlle. Berthe Antonine (""Mrs de Villiers"")","z_McCarthy, Mr. Timothy J","z_McCormack, Mr. Thomas Joseph","z_McCoy, Miss. Agnes","z_McCoy, Mr. Bernard","z_McDermott, Miss. Brigdet Delia","z_McEvoy, Mr. Michael","z_McGough, Mr. James Robert","z_McGovern, Miss. Mary","z_McGowan, Miss. Anna ""Annie""","z_McKane, Mr. Peter David","z_McMahon, Mr. Martin","z_McNamee, Mr. Neal","z_Meanwell, Miss. (Marion Ogden)","z_Meek, Mrs. Thomas (Annie Louise Rowley)","z_Mellinger, Miss. Madeleine Violet","z_Mellinger, Mrs. (Elizabeth Anne Maidment)","z_Mellors, Mr. William John","z_Meo, Mr. Alfonzo","z_Mernagh, Mr. Robert","z_Meyer, Mr. August","z_Meyer, Mr. Edgar Joseph","z_Meyer, Mrs. Edgar Joseph (Leila Saks)","z_Millet, Mr. Francis Davis","z_Milling, Mr. Jacob Christian","z_Minahan, Dr. William Edward","z_Minahan, Miss. Daisy E","z_Mineff, Mr. Ivan","z_Mionoff, Mr. Stoytcho","z_Mitchell, Mr. Henry Michael","z_Mitkoff, Mr. Mito","z_Mockler, Miss. Helen Mary ""Ellie""","z_Moen, Mr. Sigurd Hansen","z_Molson, Mr. Harry Markland","z_Montvila, Rev. Juozas","z_Moor, Master. Meier","z_Moor, Mrs. (Beila)","z_Moore, Mr. Leonard Charles","z_Moran, Miss. 
Bertha","z_Moran, Mr. Daniel J","z_Moran, Mr. James","z_Moraweck, Dr. Ernest","z_Morley, Mr. Henry Samuel (""Mr Henry Marshall"")","z_Morley, Mr. William","z_Morrow, Mr. Thomas Rowan","z_Moss, Mr. Albert Johan","z_Moubarek, Master. Gerios","z_Moubarek, Master. Halim Gonios (""William George"")","z_Moussa, Mrs. (Mantoura Boulos)","z_Moutal, Mr. Rahamin Haim","z_Mudd, Mr. Thomas Charles","z_Mullens, Miss. Katherine ""Katie""","z_Murdlin, Mr. Joseph","z_Murphy, Miss. Katherine ""Kate""","z_Murphy, Miss. Margaret Jane","z_Myhrman, Mr. Pehr Fabian Oliver Malkolm","z_Naidenoff, Mr. Penko","z_Najib, Miss. Adele Kiamie ""Jane""","z_Nakid, Miss. Maria (""Mary"")","z_Nakid, Mr. Sahid","z_Nankoff, Mr. Minko","z_Nasser, Mr. Nicholas","z_Nasser, Mrs. Nicholas (Adele Achem)","z_Natsch, Mr. Charles H","z_Navratil, Master. Edmond Roger","z_Navratil, Master. Michel M","z_Navratil, Mr. Michel (""Louis M Hoffman"")","z_Nenkoff, Mr. Christo","z_Newell, Miss. Madeleine","z_Newell, Miss. Marjorie","z_Newell, Mr. Arthur Webster","z_Newsom, Miss. Helen Monypeny","z_Nicholls, Mr. Joseph Charles","z_Nicholson, Mr. Arthur Ernest","z_Nicola-Yarred, Master. Elias","z_Nicola-Yarred, Miss. Jamila","z_Nilsson, Miss. Helmina Josefina","z_Nirva, Mr. Iisakki Antino Aijo","z_Niskanen, Mr. Juha","z_Norman, Mr. Robert Douglas","z_Nosworthy, Mr. Richard Cater","z_Novel, Mr. Mansouer","z_Nye, Mrs. (Elizabeth Ramell)","z_Nysten, Miss. Anna Sofia","z_Nysveen, Mr. Johan Hansen","z_O'Brien, Mr. Thomas","z_O'Brien, Mr. Timothy","z_O'Brien, Mrs. Thomas (Johanna ""Hannah"" Godfrey)","z_O'Connell, Mr. Patrick D","z_O'Connor, Mr. Maurice","z_O'Driscoll, Miss. Bridget","z_O'Dwyer, Miss. Ellen ""Nellie""","z_O'Leary, Miss. Hanora ""Norah""","z_O'Sullivan, Miss. Bridget Mary","z_Odahl, Mr. Nils Martin","z_Ohman, Miss. Velin","z_Olsen, Mr. Henry Margido","z_Olsen, Mr. Karl Siegwart Andreas","z_Olsen, Mr. Ole Martin","z_Olsson, Miss. Elina","z_Olsson, Mr. Nils Johan Goransson","z_Olsvigen, Mr. 
Thor Anderson","z_Oreskovic, Miss. Marija","z_Oreskovic, Mr. Luka","z_Osen, Mr. Olaf Elon","z_Osman, Mrs. Mara","z_Ostby, Mr. Engelhart Cornelius","z_Otter, Mr. Richard","z_Padro y Manent, Mr. Julian","z_Pain, Dr. Alfred","z_Palsson, Master. Gosta Leonard","z_Palsson, Miss. Stina Viola","z_Palsson, Miss. Torborg Danira","z_Palsson, Mrs. Nils (Alma Cornelia Berglund)","z_Panula, Master. Eino Viljami","z_Panula, Master. Juha Niilo","z_Panula, Master. Urho Abraham","z_Panula, Mr. Ernesti Arvid","z_Panula, Mr. Jaako Arnold","z_Panula, Mrs. Juha (Maria Emilia Ojala)","z_Parkes, Mr. Francis ""Frank""","z_Parr, Mr. William Henry Marsh","z_Parrish, Mrs. (Lutie Davis)","z_Partner, Mr. Austen","z_Pasic, Mr. Jakob","z_Patchett, Mr. George","z_Paulner, Mr. Uscher","z_Pavlovic, Mr. Stefo","z_Pears, Mr. Thomas Clinton","z_Pears, Mrs. Thomas (Edith Wearne)","z_Peduzzi, Mr. Joseph","z_Pekoniemi, Mr. Edvard","z_Penasco y Castellana, Mr. Victor de Satode","z_Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de Soto y Vallejo)","z_Pengelly, Mr. Frederick William","z_Perkin, Mr. John Henry","z_Pernot, Mr. Rene","z_Perreault, Miss. Anne","z_Persson, Mr. Ernst Ulrik","z_Peter, Miss. Anna","z_Peter, Mrs. Catherine (Catherine Rizk)","z_Peters, Miss. Katie","z_Petranec, Miss. Matilda","z_Petroff, Mr. Nedelio","z_Petroff, Mr. Pastcho (""Pentcho"")","z_Petterson, Mr. Johan Emil","z_Pettersson, Miss. Ellen Natalia","z_Peuchen, Major. Arthur Godfrey","z_Phillips, Miss. Kate Florence (""Mrs Kate Louise Phillips Marshall"")","z_Pickard, Mr. Berk (Berk Trembisky)","z_Pinsky, Mrs. (Rosa)","z_Plotcharsky, Mr. Vasil","z_Ponesell, Mr. Martin","z_Porter, Mr. Walter Chamberlain","z_Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)","z_Quick, Miss. Phyllis May","z_Quick, Mrs. Frederick Charles (Jane Richards)","z_Radeff, Mr. Alexander","z_Razi, Mr. Raihed","z_Reed, Mr. James George","z_Reeves, Mr. David","z_Rekic, Mr. Tido","z_Renouf, Mr. Peter Henry","z_Renouf, Mrs. 
Peter Henry (Lillian Jefferys)","z_Reuchlin, Jonkheer. John George","z_Reynaldo, Ms. Encarnacion","z_Rice, Master. Arthur","z_Rice, Master. Eric","z_Rice, Master. Eugene","z_Rice, Master. George Hugh","z_Rice, Mrs. William (Margaret Norton)","z_Richard, Mr. Emile","z_Richards, Master. George Sibley","z_Richards, Master. William Rowe","z_Richards, Mrs. Sidney (Emily Hocking)","z_Ridsdale, Miss. Lucy","z_Ringhini, Mr. Sante","z_Rintamaki, Mr. Matti","z_Risien, Mr. Samuel Beard","z_Robbins, Mr. Victor","z_Robert, Mrs. Edward Scott (Elisabeth Walton McMillan)","z_Robins, Mrs. Alexander A (Grace Charity Laury)","z_Roebling, Mr. Washington Augustus II","z_Rogers, Mr. William John","z_Romaine, Mr. Charles Hallace (""Mr C Rolmane"")","z_Rommetvedt, Mr. Knud Paust","z_Rood, Mr. Hugh Roscoe","z_Rosblom, Mr. Viktor Richard","z_Rosblom, Mrs. Viktor (Helena Wilhelmina)","z_Ross, Mr. John Hugo","z_Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)","z_Rothschild, Mrs. Martin (Elizabeth L. Barrett)","z_Rouse, Mr. Richard Henry","z_Rugg, Miss. Emily","z_Rush, Mr. Alfred George John","z_Ryan, Mr. Patrick","z_Ryerson, Miss. Emily Borie","z_Ryerson, Miss. Susan Parker ""Suzette""","z_Saad, Mr. Amin","z_Saad, Mr. Khalil","z_Saalfeld, Mr. Adolphe","z_Sadlier, Mr. Matthew","z_Sage, Master. Thomas Henry","z_Sage, Miss. Constance Gladys","z_Sage, Miss. Dorothy Edith ""Dolly""","z_Sage, Miss. Stella Anna","z_Sage, Mr. Douglas Bullen","z_Sage, Mr. Frederick","z_Sage, Mr. George John Jr","z_Sagesser, Mlle. Emma","z_Salkjelsvik, Miss. Anna Kristine","z_Salonen, Mr. Johan Werner","z_Samaan, Mr. Youssef","z_Sandstrom, Miss. Marguerite Rut","z_Sandstrom, Mrs. Hjalmar (Agnes Charlotta Bengtsson)","z_Saundercock, Mr. William Henry","z_Sawyer, Mr. Frederick Charles","z_Scanlan, Mr. James","z_Sdycoff, Mr. Todor","z_Sedgwick, Mr. Charles Frederick Waddington","z_Serepeca, Miss. Augusta","z_Seward, Mr. Frederic Kimber","z_Sharp, Mr. Percival James R","z_Sheerlinck, Mr. 
Jan Baptist","z_Shellard, Mr. Frederick William","z_Shelley, Mrs. William (Imanita Parrish Hall)","z_Shorney, Mr. Charles Joseph","z_Shutes, Miss. Elizabeth W","z_Silven, Miss. Lyyli Karoliina","z_Silverthorne, Mr. Spencer Victor","z_Silvey, Mr. William Baird","z_Silvey, Mrs. William Baird (Alice Munger)","z_Simmons, Mr. John","z_Simonius-Blumer, Col. Oberst Alfons","z_Sinkkonen, Miss. Anna","z_Sirayanian, Mr. Orsen","z_Sirota, Mr. Maurice","z_Sivic, Mr. Husein","z_Sivola, Mr. Antti Wilhelm","z_Sjoblom, Miss. Anna Sofia","z_Sjostedt, Mr. Ernst Adolf","z_Skoog, Master. Harald","z_Skoog, Master. Karl Thorsten","z_Skoog, Miss. Mabel","z_Skoog, Miss. Margit Elizabeth","z_Skoog, Mr. Wilhelm","z_Skoog, Mrs. William (Anna Bernhardina Karlsson)","z_Slabenoff, Mr. Petco","z_Slayter, Miss. Hilda Mary","z_Slemen, Mr. Richard James","z_Slocovski, Mr. Selman Francis","z_Sloper, Mr. William Thompson","z_Smart, Mr. John Montgomery","z_Smiljanic, Mr. Mile","z_Smith, Miss. Marion Elsie","z_Smith, Mr. James Clinch","z_Smith, Mr. Richard William","z_Smith, Mr. Thomas","z_Sobey, Mr. Samuel James Hayden","z_Soholt, Mr. Peter Andreas Lauritz Andersen","z_Somerton, Mr. Francis William","z_Spedden, Mrs. Frederic Oakley (Margaretta Corning Stone)","z_Spencer, Mrs. William Augustus (Marie Eugenie)","z_Stahelin-Maeglin, Dr. Max","z_Staneff, Mr. Ivan","z_Stankovic, Mr. Ivan","z_Stanley, Miss. Amy Zillah Elsie","z_Stanley, Mr. Edward Roland","z_Stead, Mr. William Thomas","z_Stephenson, Mrs. Walter Bertram (Martha Eustis)","z_Stewart, Mr. Albert A","z_Stone, Mrs. George Nelson (Martha Evelyn)","z_Stoytcheff, Mr. Ilia","z_Strandberg, Miss. Ida Sofia","z_Stranden, Mr. Juho","z_Strom, Miss. Telma Matilda","z_Strom, Mrs. Wilhelm (Elna Matilda Persson)","z_Sunderland, Mr. Victor Francis","z_Sundman, Mr. Johan Julian","z_Sutehall, Mr. Henry Jr","z_Sutton, Mr. Frederick","z_Svensson, Mr. Johan","z_Svensson, Mr. Olof","z_Swift, Mrs. Frederick Joel (Margaret Welles Barron)","z_Taussig, Miss. 
Ruth","z_Taussig, Mr. Emil","z_Taussig, Mrs. Emil (Tillie Mandelbaum)","z_Taylor, Mr. Elmer Zebley","z_Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright)","z_Thayer, Mr. John Borland","z_Thayer, Mr. John Borland Jr","z_Thayer, Mrs. John Borland (Marian Longstreth Morris)","z_Theobald, Mr. Thomas Leonard","z_Thomas, Master. Assad Alexander","z_Thorne, Mrs. Gertrude Maybelle","z_Thorneycroft, Mr. Percival","z_Thorneycroft, Mrs. Percival (Florence Kate White)","z_Tikkanen, Mr. Juho","z_Tobin, Mr. Roger","z_Todoroff, Mr. Lalio","z_Tomlin, Mr. Ernest Portage","z_Toomey, Miss. Ellen","z_Torber, Mr. Ernst William","z_Tornquist, Mr. William Henry","z_Toufik, Mr. Nakli","z_Touma, Mrs. Darwis (Hanne Youssef Razi)","z_Troupiansky, Mr. Moses Aaron","z_Trout, Mrs. William H (Jessie L)","z_Troutt, Miss. Edwina Celia ""Winnie""","z_Turcin, Mr. Stjepan","z_Turja, Miss. Anna Sofia","z_Turkula, Mrs. (Hedwig)","z_Turpin, Mr. William John Robert","z_Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott)","z_Uruchurtu, Don. Manuel E","z_Van Impe, Miss. Catharina","z_Van Impe, Mr. Jean Baptiste","z_Van Impe, Mrs. Jean Baptiste (Rosalie Paula Govaert)","z_Van der hoef, Mr. Wyckoff","z_Vande Velde, Mr. Johannes Joseph","z_Vande Walle, Mr. Nestor Cyriel","z_Vanden Steen, Mr. Leo Peter","z_Vander Cruyssen, Mr. Victor","z_Vander Planke, Miss. Augusta Maria","z_Vander Planke, Mr. Leo Edmondus","z_Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)","z_Vestrom, Miss. Hulda Amanda Adolfina","z_Vovk, Mr. Janko","z_Waelens, Mr. Achille","z_Walker, Mr. William Anderson","z_Ward, Miss. Anna","z_Warren, Mrs. Frank Manley (Anna Sophia Atkinson)","z_Watson, Mr. Ennis Hastings","z_Watt, Mrs. James (Elizabeth ""Bessie"" Inglis Milne)","z_Webber, Miss. Susan","z_Webber, Mr. James","z_Weir, Col. John","z_Weisz, Mrs. Leopold (Mathilde Francoise Pede)","z_Wells, Miss. Joan","z_West, Miss. Constance Mirium","z_West, Mr. Edwy Arthur","z_West, Mrs. Edwy Arthur (Ada Mary Worth)","z_Wheadon, Mr. 
Edward H","z_White, Mr. Percival Wayland","z_White, Mr. Richard Frasar","z_Wick, Miss. Mary Natalie","z_Wick, Mrs. George Dennick (Mary Hitchcock)","z_Widegren, Mr. Carl/Charles Peter","z_Widener, Mr. Harry Elkins","z_Wiklund, Mr. Jakob Alfred","z_Wilhelms, Mr. Charles","z_Willey, Mr. Edward","z_Williams, Mr. Charles Duane","z_Williams, Mr. Charles Eugene","z_Williams, Mr. Howard Hugh ""Harry""","z_Williams, Mr. Leslie","z_Williams-Lambert, Mr. Fletcher Fellows","z_Windelov, Mr. Einar","z_Wiseman, Mr. Phillippe","z_Woolner, Mr. Hugh","z_Wright, Mr. George","z_Yasbeck, Mr. Antoni","z_Yasbeck, Mrs. Antoni (Selini Alexander)","z_Young, Miss. Marie Grice","z_Youseff, Mr. Gerious","z_Yousif, Mr. Wazli","z_Yousseff, Mr. Gerious","z_Yrois, Miss. Henriette (""Mrs Harbeck"")","z_Zabour, Miss. Hileni","z_Zabour, Miss. Thamine","z_Zimmerman, Mr. Leo","z_de Messemaeker, Mrs. Guillaume Joseph (Emma)","z_de Mulder, Mr. Theodore","z_de Pelsmaeker, Mr. Alfons","z_del Carlo, Mr. Sebastiano","z_van Billiard, Mr. Austin Blyler","z_van Melkebeke, Mr. Philemon"
0,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,0,22.0,0,0,1,0,Child,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,0,38.0,1,0,0,0,Adult,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S,0,26.0,0,0,1,0,Child,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S,0,35.0,0,0,1,0,Adult,0,1,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S,0,35.0,0,0,1,0,Adult,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<h2>We just added 891 columns!</h2>

In essence, we have a column for every row. If we build our tree correctly, we should get 100% accuracy. Let's build the tree. I only have to use the 891 columns as splitters. I added a `z` prefix to make them easy to find.

In [9]:
splitter_columns = []
for col in big_table.columns.values:
  if 'z_' in col:
    splitter_columns.append(col)
    
len(splitter_columns)

891

In [10]:
splitter_columns[:5]  #looks like pandas alphabetized them for me

['z_Abbing, Mr. Anthony',
 'z_Abbott, Mr. Rossmore Edward',
 'z_Abbott, Mrs. Stanton (Rosa Hunt)',
 'z_Abelson, Mr. Samuel',
 'z_Abelson, Mrs. Samuel (Hannah Wizosky)']

I am going to bail out a bit here. Normally I would build a tree with the 891 splitter columns and a depth of 891. But that takes a while to run. Instead I'll build a tree with the first 5 columns and then take a look at the paths that are produced.

In [0]:
tree_name_5 = build_tree_iter(big_table, splitter_columns[:5], 'Survived', {'max-depth': 5})


In [12]:
tree_name_5['paths'][0]  #Look at path 0.

{'conjunction': [('z_Abbott, Mrs. Stanton (Rosa Hunt)_1',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>)],
 'gig_score': 0.0008531576117549178,
 'prediction': 1}

You can see that this path has one predicate. It asks if the `Name` value is equal to 'Abbott, Mrs. Stanton (Rosa Hunt)'. If it is true then prediction is 1. Is this prediction always correct? Yes! When I was building the tree, I just looked at the `Survived` column for Mrs. Abbott and used that. And there is only 1 row that matches the predicate.
<p>
  Let's look at the next path.

In [13]:
tree_name_5['paths'][1]

{'conjunction': [('z_Abbott, Mrs. Stanton (Rosa Hunt)_0',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>),
  ('z_Abelson, Mrs. Samuel (Hannah Wizosky)_1',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>)],
 'gig_score': 0.000856037737108517,
 'prediction': 1}

This path has 2 predicates: (1) Name is not 'Abbott, Mrs. Stanton (Rosa Hunt)', and (2) Name is 'Abelson, Mrs. Samuel (Hannah Wizosky)'. If both are true, predict 1. Again, this is always a perfect prediction.
<p>
  I'll do one more.

In [14]:
tree_name_5['paths'][2]

{'conjunction': [('z_Abbott, Mrs. Stanton (Rosa Hunt)_0',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>),
  ('z_Abelson, Mrs. Samuel (Hannah Wizosky)_0',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>),
  ('z_Abbing, Mr. Anthony_1',
   <function library_w19_week4b.build_pred.<locals>.<lambda>>)],
 'gig_score': 0.0003294362168793641,
 'prediction': 0}

This path has 3 predicates: rule out the first 2 names and check the 3rd name. I hope you can see that we are building the tree below. Given a row we are trying to predict, we will eventually come to the `z_` column that matches and always choose the right prediction.

<img src='https://www.dropbox.com/s/eycivebhtbu06vx/Screenshot%202019-01-28%2015.31.54.png?raw=1'>

So what's wrong with that? We have 100% accuracy. The problem is that when you see new passengers, their names will not match any of the nodes in the tree. What will happen then? We will end up on the False branch of the last node in the tree. And we know from the way we do tree building, we will have a problem. I'll let you figure it out for extra credit.
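To make the memorization point concrete, here is a minimal sketch. It assumes a plain dictionary as a stand-in for the name tree (it is not the course's `build_tree_iter`), and the new passenger name is made up. The five training rows are the first five rows of the table above.

```python
# Toy stand-in for the name tree: one memorized label per training name.
# The rows are the first five passengers shown above as (Name, Survived).
training_rows = [
    ("Braund, Mr. Owen Harris", 0),
    ("Cumings, Mrs. John Bradley (Florence Briggs Th...", 1),
    ("Heikkinen, Miss. Laina", 1),
    ("Futrelle, Mrs. Jacques Heath (Lily May Peel)", 1),
    ("Allen, Mr. William Henry", 0),
]

def memorize(rows):
    """Build a name -> label lookup, mimicking one tree node per name."""
    return {name: label for name, label in rows}

def predict(model, name):
    """Return the memorized label, or None when the name was never seen."""
    return model.get(name)

model = memorize(training_rows)

# Testing on the same rows we memorized: every prediction is right.
hits = sum(predict(model, name) == label for name, label in training_rows)
train_accuracy = hits / len(training_rows)

# A hypothetical new passenger matches nothing we memorized.
unseen = predict(model, "Doe, Mr. John")

print(train_accuracy, unseen)
```

Note that this sketch just reports `None` for a name it has never seen. What the real tree does in that situation is the extra-credit question above.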

<h2>How can I test overfitting?</h2>

The key issue is that we use the same data to both build the tree and test the tree, which hides overfitting. One way to check for overfitting is to modify how we build and test. How about dividing up the full table into a building sub-table (typically called a training table) and a testing sub-table? We will try that next.

<h1>First let's build a new results table</h1>

Before trying the split idea, let's do it our old-school way: build and test from full table. It will give us a benchmark to compare against.
<p>

In [0]:
hypers_1 = {'max-depth': 3, 'gig-cutoff': 0.0}

Here are splitter columns I will use.

In [0]:
splitter_columns = [
 'emb_C',
 'emb_Q',
 'emb_S',
 'emb_nan',
 'age_Child',
 'age_Adult',
 'age_Senior',
 'no_age',
 'ok_child',
 'sex_female',
 'sex_male', 
 'pclass_1',
 'pclass_2',
 'pclass_3'
]

In [17]:
tree3 = build_tree_iter(titanic_table, splitter_columns, 'Survived', hypers_1)
print(len(tree3['paths']))

8


What I want to keep track of are the scores for the 3 measures: accuracy, f1, informedness. I'll build a new table for this. I will also include a name column to give some description of what we are doing.

In [18]:
cols = ['name', 'accuracy', 'f1', 'informedness']
kfold_table_1 = pd.DataFrame(columns=cols)  # empty for now
kfold_table_1

Unnamed: 0,name,accuracy,f1,informedness


I am going to do one more thing. I am going to add a type of comment to the kfold_table_1 dataframe. This will help me remember what settings I was using to get the table results.
<p>
  How did I come up with this code? Stackoverflow as usual: https://stackoverflow.com/a/54137536/4996152. Note that this is still a bit of a kludge. When I write the table out to a csv file, I'll lose this meta information. And we will see it is even worse than that in a minute.

In [19]:
from types import SimpleNamespace

kfold_table_1.meta = SimpleNamespace()
kfold_table_1.meta.hypers = hypers_1
kfold_table_1.meta

namespace(hypers={'max-depth': 3, 'gig-cutoff': 0.0})

Here is where we would start computing predictions, types and scores. I only care about scores so I think we can automate a big portion of what we need.

In [0]:
def produce_scores(table, tree, target):
    scratch_table = pd.DataFrame(columns=['prediction', 'actual'])
    scratch_table['prediction'] = table.apply(lambda row: tree_predictor(row, tree), axis=1)
    scratch_table['actual'] = table[target]  # just copy the target column
    cases = scratch_table.apply(lambda row: predictor_case(row, pred='prediction', target='actual'), axis=1)
    vc = cases.value_counts()
    return [accuracy(vc), f1(vc), informedness(vc)]


Let's try it out on the full table and tree3.


In [21]:
scores_3 = produce_scores(titanic_table, tree3, 'Survived')
scores_3


[0.8237934904601572, 0.7288428324697754, 0.5696002300834051]
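As a refresher on what the three numbers mean, here is a sketch that computes them from raw confusion counts. I am assuming the standard textbook definitions. The function names and the toy counts are mine, chosen so they do not clash with the library versions, which take the value counts `vc` instead.

```python
# Standard definitions, written against explicit confusion counts.
# These are illustrative helpers, not the course library's functions.

def accuracy_from_counts(tp, fp, tn, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)

def f1_from_counts(tp, fp, tn, fn):
    """Harmonic mean of precision and recall (0.0 if undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def informedness_from_counts(tp, fp, tn, fn):
    """Recall plus specificity minus 1; 0 means no better than guessing."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall + specificity - 1

# Made-up counts, purely to exercise the formulas.
toy = dict(tp=40, fp=10, tn=45, fn=5)
toy_scores = [accuracy_from_counts(**toy),
              f1_from_counts(**toy),
              informedness_from_counts(**toy)]
print(toy_scores)
```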

Append to our results table.

In [22]:
kfold_table_1 = kfold_table_1.append({'name':'full_table',
                                      'accuracy': scores_3[0],
                                      'f1': scores_3[1],
                                      'informedness': scores_3[2]}, ignore_index=True)
kfold_table_1

Unnamed: 0,name,accuracy,f1,informedness
0,full_table,0.823793,0.728843,0.5696


In [0]:
#You will get an error if you execute this code. Why? Because pandas does not do what we want.
#When pandas builds a new table it does not copy over the meta data. It is an issue that has
#been raised with pandas' developers.

kfold_table_1.meta  #should produce error of missing meta attribute

<h2>
Using both a training set and a testing set
</h2>
<div class=h1_cell>
<p>
Now let's try a new approach. We will split the full table into a training/building set and a testing set.

A common approach is to take 2/3 of the data as training and hold out 1/3 as testing.
</div>

In [24]:
total_len = len(titanic_table)
split_boundary = int(total_len*(2/3))
split_boundary

594


<div class=h1_cell>
<p>
You can use a slice operator on a table just like you can on a list. Cool.
</div>

In [25]:
training_table = titanic_table[0:split_boundary]  # 0-593
test_table = titanic_table[split_boundary:]       # 594 to 890
print(len(training_table))
print(len(test_table))

594
297
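A sequential slice like this quietly assumes the rows are in no particular order. If you are not sure about that, one alternative (not what I use in this module) is to shuffle the row positions first. Here is a minimal sketch using only the standard library. The function name and the seed are my own choices.

```python
import random

def shuffled_split(n_rows, train_fraction=2/3, seed=0):
    """Return (train_positions, test_positions) after a seeded shuffle."""
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded so the split is repeatable
    boundary = int(n_rows * train_fraction)
    return positions[:boundary], positions[boundary:]

train_idx, test_idx = shuffled_split(891)
print(len(train_idx), len(test_idx))
```

With a DataFrame, the two position lists could then be passed to `.iloc`, e.g., `titanic_table.iloc[train_idx]`.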


<div class=h1_cell>
Now let's again build a tree like we did above with same splitter columns and same hypers. But now we will use the training table to build the tree.
</div>

In [26]:
#Notice using training_table not titanic_table

tree_train = build_tree_iter(training_table, splitter_columns, 'Survived', hypers_1)
print(len(tree_train['paths']))
#tree_train['paths']

8


Now to testing. Let's check out scores.

In [27]:
#Notice test_table and not titanic_table

produce_scores(test_table, tree_train, 'Survived')

[0.8047138047138047, 0.6627906976744187, 0.4907407407407407]

In [28]:
kfold_table_1  #Check against results for full table which we have stored away


Unnamed: 0,name,accuracy,f1,informedness
0,full_table,0.823793,0.728843,0.5696


We dropped on all scores. This is very typical: when you test with data different from the training data, scores drop. Why? The tree can no longer take credit for fitting the quirks of the very rows it was built from, so we get a more realistic score.

<h2>
Ok, let's generalize
</h2>
<p>
<div class=h1_cell>
<p>
What we are doing is called cross-validation: breaking our data up into training and testing sets. I'd like to push a bit harder on the cross-validation idea. I suggest we try more than just one split for training/testing. Try a bunch then average their results. There are many ways we can consider generating a separate set of splits. I am going to focus on a standard approach called K-Folding. The general idea is that we divide the table into K partitions or folds. We then build K trees from various combinations of the folds and do K tests, one for each tree. Where does K come from? You get to choose. I will use K=5 below but K=10 is more common.
<p>
Even with this standard algorithm there are variations on how you select the K folds. I am going to use a sequential approach, splitting into folds along the row indices. So my first fold will be from 0 to i, my next fold from i+1 to j, etc.
<p>
I'll also refer to the folds as slices.
</div>

In [29]:
k = 5  # more often 10

total_len = len(titanic_table.index)
slice_size = int(1.0/k*total_len)
slice_size

178

In [0]:
slice_1 = titanic_table[0:slice_size]
slice_2 = titanic_table[1*slice_size:2*slice_size]
slice_3 = titanic_table[2*slice_size:3*slice_size]
slice_4 = titanic_table[3*slice_size:4*slice_size]
slice_5 = titanic_table[4*slice_size:]
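To double-check the slicing arithmetic, here is a small sketch that computes the fold boundaries without touching the table. The helper name `fold_boundaries` is mine. It follows the same rule as the cell above: equal-size slices, with the last slice picking up whatever is left.

```python
def fold_boundaries(total_len, k):
    """Return a list of (start, stop) row positions, one pair per fold."""
    slice_size = int(total_len / k)
    bounds = [(i * slice_size, (i + 1) * slice_size) for i in range(k - 1)]
    bounds.append(((k - 1) * slice_size, total_len))  # last fold takes the remainder
    return bounds

bounds = fold_boundaries(891, 5)
test_sizes = [stop - start for start, stop in bounds]
train_sizes = [891 - size for size in test_sizes]

print(bounds)       # [(0, 178), (178, 356), (356, 534), (534, 712), (712, 891)]
print(test_sizes)   # [178, 178, 178, 178, 179]
print(train_sizes)  # [713, 713, 713, 713, 712]
```

Because 891 is not divisible by 5, the last slice has one extra row, so the training set that leaves it out is one row shorter.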

<div class=h1_cell>
Now that I have my 5 folds/slices, I'll take the first step: train on 4 of the slices and test on the remaining slice. 
</div>

In [31]:
fold1_test_table = slice_1
fold1_train_table = pd.concat([slice_2, slice_3, slice_4, slice_5])
len(fold1_train_table)

713

<div class=h1_cell>
Keep plodding along until I have 5 training sets and 5 test sets. 
</div>

In [32]:
fold2_test_table = slice_2
fold2_train_table = pd.concat([slice_1, slice_3, slice_4, slice_5])
len(fold2_train_table)

713

In [33]:
fold3_test_table = slice_3
fold3_train_table = pd.concat([slice_1, slice_2, slice_4, slice_5])
len(fold3_train_table)

713

In [34]:
fold4_test_table = slice_4
fold4_train_table = pd.concat([slice_1, slice_2, slice_3, slice_5])
len(fold4_train_table)

713

In [35]:
fold5_test_table = slice_5
fold5_train_table = pd.concat([slice_1, slice_2, slice_3, slice_4])
len(fold5_train_table)

712

<h2>
Try fold-set number 1
</h2>
<p>
<div class=h1_cell>
Whew. We now have 5 pairs of training and test. I'll try the first fold and see how it goes. First I'll build a tree from the training slice then test it with the test slice.

</div>

In [0]:
fold1_tree = build_tree_iter(fold1_train_table, splitter_columns, 'Survived', hypers_1)  # train

In [37]:
fold1_scores = produce_scores(fold1_test_table, fold1_tree, 'Survived')  # test
fold1_scores

[0.8089887640449438, 0.6666666666666666, 0.5006409343398377]

<h2>
Do the remaining 4
</h2>
<p>
<div class=h1_cell>


</div>

In [38]:
fold2_tree = build_tree_iter(fold2_train_table, splitter_columns, 'Survived', hypers_1)  # train
fold2_scores = produce_scores(fold2_test_table, fold2_tree, 'Survived')  # test
fold2_scores

[0.8033707865168539, 0.7328244274809159, 0.5653846153846154]

In [39]:
fold3_tree = build_tree_iter(fold3_train_table, splitter_columns, 'Survived', hypers_1)  # train
fold3_scores = produce_scores(fold3_test_table, fold3_tree, 'Survived')  # test
fold3_scores

[0.8370786516853933, 0.7603305785123966, 0.6108465608465607]

In [40]:
fold4_tree = build_tree_iter(fold4_train_table, splitter_columns, 'Survived', hypers_1)  # train
fold4_scores = produce_scores(fold4_test_table, fold4_tree, 'Survived')  # test
fold4_scores

[0.7808988764044944, 0.6608695652173913, 0.47913650125049356]

In [41]:
fold5_tree = build_tree_iter(fold5_train_table, splitter_columns, 'Survived', hypers_1)  # train
fold5_scores = produce_scores(fold5_test_table, fold5_tree, 'Survived')  # test
fold5_scores

[0.8268156424581006, 0.7102803738317757, 0.5502717391304348]

<h2>Take the average</h2>

Finally, let's take the average of the 5 folds. You can see I am using `np.add` and `reduce` in my averaging code. It is probably the case that a list-comprehension using `zip` would be clearer. I like both `numpy` and `reduce` because of potential for parallelism if I think I might need a performance boost in the future.

In [45]:
import numpy as np
from functools import reduce

average_1 = tuple(reduce(np.add, (fold2_scores, fold3_scores, fold4_scores, fold5_scores), fold1_scores)/5)  #take average of 5 folds
average_1

(0.8114305442219572, 0.7061943223418292, 0.5412560701903883)
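For comparison, here is the `zip` version I mentioned. It is a sketch that hard-codes the five score lists from above so it can run on its own. In the notebook you would pass the `fold1_scores` through `fold5_scores` variables instead.

```python
# The five [accuracy, f1, informedness] triples copied from the outputs above.
fold_scores = [
    [0.8089887640449438, 0.6666666666666666, 0.5006409343398377],
    [0.8033707865168539, 0.7328244274809159, 0.5653846153846154],
    [0.8370786516853933, 0.7603305785123966, 0.6108465608465607],
    [0.7808988764044944, 0.6608695652173913, 0.47913650125049356],
    [0.8268156424581006, 0.7102803738317757, 0.5502717391304348],
]

# zip(*fold_scores) regroups the rows into one column per measure.
average_zip = tuple(sum(column) / len(fold_scores) for column in zip(*fold_scores))
print(average_zip)
```

Dividing by `len(fold_scores)` rather than a literal 5 means the same line keeps working if the number of folds changes.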

<h2>
Kind of tedious
</h2>
<p>
<div class=h1_cell>
It was kind of ok building the 5 separate folds by hand, but when I move to K=10, it's a bit much to repeat all this 10 times for the 10 folds. So let's build a new function to do it for us. First, I'll define a helper function that, given a list of slices and an index, concatenates every slice except the one at that index into a single table. Why? Because we need something to compute the training table, and this function gives us exactly that.
</div>

In [0]:
def compute_training(slices, left_out):
    training_slices = []
    for i, a_slice in enumerate(slices):  # a_slice avoids shadowing the builtin slice
        if i == left_out:
            continue
        training_slices.append(a_slice)
    return pd.concat(training_slices)  # note we are returning a table (DataFrame)


In [49]:
#test it

a_training_table = compute_training((slice_1,slice_2,slice_3,slice_4,slice_5), 4)  #leave out index 4 (i.e., the 5th and last slice)
len(a_training_table)  #expect 891 minus the last slice = 712


712

<h2>
Ready to automate
</h2>
<p>
<div class=h1_cell>
I'll define a function that takes as arguments (a) the big table, (b) the value for K, (c) the target column, (d) the hyper-parameters to be used in building a model, and (e) the candidate columns to build the splitters from.
<p>
The function's output will be a results table. Notice I also added a comment onto the table to help me remember what hyper-parameters I was using.
</div>

In [0]:
def k_fold(table, k, target, hypers, candidate_columns):
  
    #set up the table where we will record fold results
    result_columns = ['name',  'accuracy', 'f1', 'informedness']
    k_fold_results_table = pd.DataFrame(columns=result_columns)
    
    #generate the slices
    total_len = len(table.index)
    slice_size = int(total_len/(1.0*k))
    slices = []
    for i in range(k-1):
        a_slice =  table[i*slice_size:(i+1)*slice_size]
        slices.append( a_slice )
    slices.append( table[(k-1)*slice_size:] )  # whatever is left
    
    #generate test results
    all_scores = []  #keep track of all k results
    for i in range(k):
        test_table = slices[i]
        train_table = compute_training(slices, i)
        fold_tree = build_tree_iter(train_table, candidate_columns, target, hypers)  # train
        scores = produce_scores(test_table, fold_tree, target)  # test
        results_row = {'name': 'fold_'+str(i), 'accuracy': scores[0], 'f1': scores[1], 'informedness': scores[2]}
        k_fold_results_table = k_fold_results_table.append(results_row,ignore_index=True)
        all_scores.append(scores)
    
    #compute average of all folds
    avg_scores = tuple(reduce(lambda total, triple: np.add(triple, total), all_scores)/k)  # divide by k, not a hard-coded 5
    results_row = {'name': 'average', 'accuracy': avg_scores[0], 'f1': avg_scores[1], 'informedness': avg_scores[2]}
    k_fold_results_table = k_fold_results_table.append(results_row,ignore_index=True)
    
    #note that I add the meta comment as last step to avoid it being wiped out
    k_fold_results_table.meta = SimpleNamespace()
    k_fold_results_table.meta.hypers  = hypers # adds comment to remind me of hyper params used
    
    return k_fold_results_table


<h2>
Let's try it out with default hyper params
</h2>
<p>
<div class=h1_cell>
I'll use K=5 and default values for hyper-parameters, i.e., max-depth of 4, gig-cutoff of 0.
</div>

In [0]:
default5_table = k_fold(titanic_table, 5, 'Survived', {}, splitter_columns)  # max-depth=4


In [52]:

default5_table


Unnamed: 0,name,accuracy,f1,informedness
0,fold_0,0.808989,0.666667,0.500641
1,fold_1,0.820225,0.761194,0.603846
2,fold_2,0.837079,0.760331,0.610847
3,fold_3,0.780899,0.66087,0.479137
4,fold_4,0.843575,0.745455,0.597147
5,average,0.818153,0.718903,0.558323


In [53]:
default5_table.meta  #using defaults

namespace(hypers={})

<h2>
Tuning hyper-parameters
</h2>
<p>
<div class=h1_cell>
We have 2 hyper-parameters. I'll concentrate on max-depth. I have the results for the default max-depth (i.e., 4) above. The means from the 5 folds are (0.818153, 0.718903, 0.558323).
<p>
I'll now try changing the max-depth to 3 and see how we do. As a reminder, we did this in the last module and got a result that was the same as with max-depth 4. But that was using the same data for training and testing. Now let's see if K-folding gives us a different answer.
</div>

In [55]:
max3_table = k_fold(titanic_table, 5, 'Survived', {'max-depth':3}, splitter_columns)
max3_table


Unnamed: 0,name,accuracy,f1,informedness
0,fold_0,0.808989,0.666667,0.500641
1,fold_1,0.803371,0.732824,0.565385
2,fold_2,0.837079,0.760331,0.610847
3,fold_3,0.780899,0.66087,0.479137
4,fold_4,0.826816,0.71028,0.550272
5,average,0.811431,0.706194,0.541256


<div class=h1_cell>
<p>
Here are our means from depth 4: (0.818153, 0.718903, 0.558323).
<p>
Here are our means from depth 3: (0.811431, 0.706194, 0.541256).
<p>
Using 5-folds, we lost a little ground when going from 4 to 3.
<p>
Let's try depth 2. Note that by decreasing the depth, we are moving the needle away from high variance but towards high bias.
</div>

In [58]:
max2_table = k_fold(titanic_table, 5, 'Survived', {'max-depth':2}, splitter_columns)
max2_table

Unnamed: 0,name,accuracy,f1,informedness
0,fold_0,0.752809,0.511111,0.322604
1,fold_1,0.764045,0.65,0.47
2,fold_2,0.792135,0.733813,0.561905
3,fold_3,0.741573,0.646154,0.432671
4,fold_4,0.798883,0.704918,0.54144
5,average,0.769889,0.649199,0.465724


<div class=h1_cell>
We lost quite a bit of ground with depth 2.
</div>
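To compare the three depths side by side, here is a small sketch that stores the `average` rows from the tables above and picks the depth with the best mean for a chosen measure. The dictionary layout and the helper name `best_depth` are my own; the numbers are copied from the tables.

```python
# (accuracy, f1, informedness) means from the 5-fold runs above, keyed by max-depth.
averages_by_depth = {
    4: (0.818153, 0.718903, 0.558323),
    3: (0.811431, 0.706194, 0.541256),
    2: (0.769889, 0.649199, 0.465724),
}

def best_depth(results, measure_index=0):
    """Return the depth whose mean is highest for the chosen measure.

    measure_index: 0 = accuracy, 1 = f1, 2 = informedness.
    """
    return max(results, key=lambda depth: results[depth][measure_index])

winners = [best_depth(averages_by_depth, i) for i in range(3)]
print(winners)
```

For these three runs, all three measures agree on the same depth, which will not always be the case.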

<h2>
Where now?
</h2>
<p>
<div class=h1_cell>
<p>
We could try a few more values of depth. But we also could start playing with the other knob, the gig cutoff. By the end of our exploration, we should have a good idea on what values to set our hyper-parameters to. At that point, we will generate the final tree using all of the data. Something like this:
<p>
<pre>
<code>
optimal_depth = ...  # what we discovered in our K-folding
optimal_gig_cutoff = ...  # ditto
hypers = {'max-depth': optimal_depth, 'gig-cutoff': optimal_gig_cutoff}
final_tree = build_tree_iter(titanic_table, candidate_columns, 'Survived', hypers )
</code>
</pre>
</div>
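If you want to turn both knobs at once, one option is a simple grid search. Here is a minimal sketch. The `grid_search` helper and the `toy_score` function are hypothetical, and `toy_score` is only a stand-in so the code runs on its own. It does not produce real Titanic results.

```python
from itertools import product

def grid_search(score_fn, depths, cutoffs):
    """Try every depth/cutoff pair and return (best_hypers, best_score)."""
    best_hypers, best_score = None, float("-inf")
    for depth, cutoff in product(depths, cutoffs):
        hypers = {'max-depth': depth, 'gig-cutoff': cutoff}
        score = score_fn(hypers)
        if score > best_score:  # strict > keeps the first pair seen on ties
            best_hypers, best_score = hypers, score
    return best_hypers, best_score

# Stand-in scorer (NOT real results): it peaks at depth 4 with no cutoff.
def toy_score(hypers):
    return -abs(hypers['max-depth'] - 4) - hypers['gig-cutoff']

best_hypers, best_score = grid_search(toy_score, [2, 3, 4, 5], [0.0, 0.01])
print(best_hypers, best_score)
```

In the notebook, I would expect `score_fn` to call `k_fold` with the given hypers and return one value from the `average` row, e.g., its accuracy.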

<hr>
<h1>Write it out</h1>
<div class=h1_cell>

We did not change the tree so no need to write it out.
  <p>
    We did add functions so add them to your library and store under name `library_w19_week5.py`. Reminder: if one of your functions in library_w19_week5.py imports a library, that import has to happen in library_w19_week5.py. The functions in your library cannot see the imports you do in the notebook. Separate namespace and all that.
</div>


<h2>
Next up
</h2>
<p>
<div class=h1_cell>
Think about this. With K-folding, we are building k separate trees but then throwing them away. The final tree is produced from all the data. What if we decided not to throw those k trees away? What if we chose to keep all the trees as the "final tree"? We would have an ensemble of trees (AKA a forest). How would they agree among themselves on the correct prediction? How about simply letting them vote? That's what is coming up next.
</div>
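As a small preview of the voting idea, here is a sketch of a majority vote over the predictions of several trees. It assumes each tree has already produced a 0 or 1 for the same row. The function name and the example votes are made up.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among the trees' votes."""
    return Counter(predictions).most_common(1)[0][0]

# Five hypothetical trees voting on one passenger.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))
```

With an odd number of trees and a binary target there can be no tie, which is one reason to prefer an odd K if you plan to vote.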