You are given data from the Titanic in data.csv.
For this challenge, predict whether each individual in the dataset survived. Use the provided features, and modify, delete, or add new features derived from the existing ones. This kind of feature engineering is a core part of being a data scientist.
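As a rough illustration of deriving features from existing ones, here is a minimal sketch. The column names (`Sex`, `Age`, `SibSp`, `Parch`) are assumptions borrowed from the common public Titanic dataset, so check data-dictionary.txt for the real field names. The tiny `sample` frame is made up so the snippet runs on its own; with the real file you would start from `pd.read_csv("data.csv")` instead.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a few derived columns (column names are assumed)."""
    out = df.copy()
    # Encode sex as 0/1 so the tree can split on it.
    out["is_female"] = (out["Sex"] == "female").astype(int)
    # Family size: siblings/spouses + parents/children + the passenger.
    out["family_size"] = out["SibSp"] + out["Parch"] + 1
    # Fill missing ages with the median age of the known values.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Drop the raw text column now that it is encoded.
    return out.drop(columns=["Sex"])


# Made-up rows standing in for data.csv (not real passenger data).
sample = pd.DataFrame(
    {
        "Sex": ["male", "female", "female"],
        "Age": [22.0, None, 30.0],
        "SibSp": [1, 0, 0],
        "Parch": [0, 2, 0],
    }
)
features = add_features(sample)
print(features)
```

Which features actually help is for you to find out; these three are only examples of the modify, delete, and add operations.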
Once you have massaged the data into a form you are happy with, train a DecisionTreeClassifier from sklearn and aim for the highest accuracy you can. Try adjusting the depth of the tree to see how the accuracy changes.
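One possible way to explore tree depth is sketched below. It uses synthetic data from `make_classification` as a stand-in for your prepared Titanic features, so the numbers it prints say nothing about the real dataset. The depth range of 1 to 8 and the 25% hold-out split are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix X and survival labels y.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit one tree per depth and record its accuracy on the held-out split.
depth_scores = {}
for depth in range(1, 9):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    depth_scores[depth] = clf.score(X_test, y_test)

best_depth = max(depth_scores, key=depth_scores.get)
print(f"best depth: {best_depth}, accuracy: {depth_scores[best_depth]:.3f}")
```

Deeper trees tend to fit the training data more closely, so compare against held-out data rather than training accuracy when picking a depth.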
Finally, try performing cross-validation.
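A minimal sketch of cross-validation with sklearn's `cross_val_score` follows. Again the data is synthetic, and the choice of 5 folds and `max_depth=3` is an assumption for illustration, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix X and survival labels y.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
# Train and score the tree on 5 different train/validation splits.
cv_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
mean_accuracy = cv_scores.mean()
print(f"fold accuracies: {cv_scores}, mean: {mean_accuracy:.3f}")
```

The mean over folds is usually a steadier estimate than the score from a single train/test split.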
Note that data-dictionary.txt provides information on the fields in the CSV file.
Send me a pull request once your analysis is complete.