Titanic Dataset EDA and Modeling

Exploratory data analysis and modeling for the Kaggle Titanic challenge.

This repo demonstrates exploratory data analysis and modeling in R for the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic). You'll find an R notebook showing you step-by-step how to preprocess the data set, visualize the data with ggplot2, train and evaluate several common modeling algorithms, and perform test set prediction and submission. The final submission produced by this R notebook will land you a score of 0.78947 on Kaggle's public leaderboard, the top 22% of participants.

The code takes several minutes to run, so instead you can view the knitted HTML document containing the complete code and output, all nicely formatted. Just download the HTML file and view it locally on your browser.

This is meant to be educational and to serve as a starting point for others. Anyone is welcome to open issues and to suggest improvements.

Requirements

You'll need to install the following packages in your R environment.

install.packages("tidyverse", "caTools", "pROC", "class", "randomForest", "gbm", "e1071", "MASS")

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
data		data
README.md		README.md
submission1.csv		submission1.csv
submission2.csv		submission2.csv
titanic_dataset_EDA_and_modeling.Rmd		titanic_dataset_EDA_and_modeling.Rmd
titanic_dataset_EDA_and_modeling.html		titanic_dataset_EDA_and_modeling.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Titanic Dataset EDA and Modeling

Requirements

About

Releases

Packages

Languages

rmislam/titanic-dataset-EDA-and-modeling

Folders and files

Latest commit

History

Repository files navigation

Titanic Dataset EDA and Modeling

Requirements

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages