```r
# Load the training data into a data frame
df <- read.csv("../input/titanic/train.csv")
```

```r
# Import libraries
library(tidyverse)  # attaches dplyr and ggplot2, used in every cell below
library(dslabs)
library(ggthemes)
library(ggrepel)
```

```r
# Check the structure and the first rows of the data
str(df)
head(df)
```

```r
# Missing values per column (NA or ""); zeros are not counted, since 0 is valid for Survived, SibSp and Parch
colSums(is.na(df) | df == "")
```

# Survival Rates
**1st** class females were *most likely* to survive.

**3rd** class males were *least likely* to survive.

```r
# Stacked bar counts of survivors (1) and non-survivors (0) per class, split by sex
df %>% mutate(Survived = factor(Survived)) %>%
  ggplot(aes(x = Pclass, fill = Survived)) +
  geom_bar() +
  facet_grid(Sex ~ .) +
  ggtitle("Survival by class and sex")
```
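
The bar chart shows counts, while the claims above are about who was *most* and *least likely* to survive, i.e. rates. A minimal base-R sketch of that rate, assuming `Survived` is coded 0/1 as in `train.csv`: the mean of a 0/1 column within a group is the share of survivors. The data frame `toy` is invented purely to show the mechanics and says nothing about the real passengers.

```r
# Survival rate (share of survivors) per class and sex.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 3, 3, 3, 1, 3),
  Sex      = c("female", "female", "male", "male", "male", "female", "male", "female"),
  Survived = c(1, 1, 0, 0, 0, 1, 1, 0)
)

# Survived is 0/1, so its mean within each group is the survival rate
rates <- aggregate(Survived ~ Pclass + Sex, data = toy, FUN = mean)
names(rates)[names(rates) == "Survived"] <- "SurvivalRate"

# Highest rates first
rates[order(-rates$SurvivalRate), ]
```

On the notebook's own data the same call would be `aggregate(Survived ~ Pclass + Sex, data = df, FUN = mean)`.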

# Having Siblings/spouses aboard
Having *0-2* siblings/spouses aboard did not, in *any class*, determine whether a passenger would *survive*.
Having *3 or more* siblings/spouses aboard appears to *lead to death*, but this is probably because *3rd class passengers mostly died* regardless, and higher-class passengers did not have more than 2 siblings/spouses with them.

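```r
# Passenger counts per class, coloured by SibSp (siblings/spouses aboard), split by outcome
df %>% mutate(SibSp = factor(SibSp)) %>%
  ggplot(aes(x = Pclass, fill = SibSp)) +
  geom_bar() +
  facet_grid(Survived ~ .) +
  ggtitle("Passengers who died (0) or survived (1) vs. number of siblings/spouses aboard")
```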
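
The explanation for the *3 or more* group (it overlaps with 3rd class) can be checked by tabulating SibSp bands within each class. A minimal sketch, assuming the two bands `0-2` and `3+` used in the text; `toy`, `SibSpBand`, `counts` and `rates` are illustrative names, and `toy` is invented data.

```r
# Is the low survival of the "3+" band just a class effect?
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Pclass   = c(1, 1, 2, 3, 3, 3, 3, 3),
  SibSp    = c(0, 1, 2, 0, 1, 3, 4, 5),
  Survived = c(1, 1, 0, 1, 0, 0, 0, 0)
)

# Collapse SibSp into the two bands discussed in the text: (-Inf, 2] and (2, Inf)
toy$SibSpBand <- cut(toy$SibSp, breaks = c(-Inf, 2, Inf), labels = c("0-2", "3+"))

# Passengers per band and class: if "3+" occurs only in 3rd class,
# its survival rate cannot be separated from the class effect
counts <- table(toy$SibSpBand, toy$Pclass)

# Survival rate per band within each class (NA where a cell is empty)
rates <- tapply(toy$Survived, list(toy$SibSpBand, toy$Pclass), mean)

counts
rates
```

Empty cells in `rates` (NA) are exactly the band/class combinations with no passengers, which is what limits the comparison.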

# Having Parents/children aboard
The **same** conclusion as for siblings/spouses applies to the number of parents/children aboard.

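```r
# Passenger counts per class, coloured by Parch (parents/children aboard), split by outcome
df %>% mutate(Parch = factor(Parch)) %>%
  ggplot(aes(x = Pclass, fill = Parch)) +
  geom_bar() +
  facet_grid(Survived ~ .) +
  ggtitle("Passengers who died (0) or survived (1) vs. number of parents/children aboard")
```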
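
A row-proportion table puts numbers on the same comparison: for each Parch value, the share of passengers who died (0) and survived (1). A minimal sketch; `toy` is invented data and `tab`/`props` are illustrative names.

```r
# Share of non-survivors/survivors for each Parch value.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Parch    = c(0, 0, 0, 1, 1, 2, 4, 5),
  Survived = c(0, 1, 0, 1, 1, 0, 0, 0)
)

# Cross-tabulate, then divide each row by its own total (margin = 1)
tab   <- table(Parch = toy$Parch, Survived = toy$Survived)
props <- prop.table(tab, margin = 1)

round(props, 2)
```

Rows with very few passengers (the larger Parch values in the real data) give unstable proportions, so they are worth reading alongside the counts in `tab`.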

# Ship fare as a ticket to heaven
Paying a *higher* fare went with a *higher* chance to **survive** in the **1st and 2nd** classes. In the 3rd class, the median fare of survivors is higher than that of passengers who died, but the fare distribution suggests that *a distinct group of passengers* who paid *approximately 6-11* British pounds were **in the place of the greatest danger**.

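```r
# Fare == 0 is treated as missing; log10(0) would also be -Inf on the log scale
df %>% filter(Fare > 0) %>%
  mutate(Pclass = factor(Pclass), Survived = factor(Survived)) %>%
  ggplot(aes(y = Fare, x = Survived, color = Survived)) +
  geom_boxplot(outlier.shape = NA) +  # hide outliers: geom_jitter draws every point
  facet_grid(. ~ Pclass) +
  scale_y_log10() +
  geom_jitter(width = 0.1) +
  ggtitle("Fare distribution by class and survival")
```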
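
Boxplots mark the median, so the numeric counterpart compares median fares; the "6-11 pounds" group can be checked by comparing the survival rate of 3rd-class passengers inside and outside that band. A minimal sketch, assuming an inclusive band of 6 to 11 taken from the text; `toy`, `paid`, `third`, `InBand`, `medians` and `band_rates` are illustrative names, and `toy` is invented data.

```r
# Median fare per class/outcome, and 3rd-class survival inside vs. outside the fare band.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 3, 3, 3, 3, 3, 3),
  Fare     = c(30, 80, 150, 7.25, 7.9, 8.05, 15.5, 20, 0),
  Survived = c(0, 1, 1, 0, 0, 1, 1, 0, 0)
)

# Same filter as the plotting cell: Fare == 0 is treated as missing
paid <- toy[toy$Fare > 0, ]

# Medians, matching what the boxplot shows
medians <- aggregate(Fare ~ Pclass + Survived, data = paid, FUN = median)

# Flag 3rd-class passengers whose fare falls in the assumed 6-11 band
third <- paid[paid$Pclass == 3, ]
third$InBand <- third$Fare >= 6 & third$Fare <= 11

# Survival rate for FALSE (outside the band) and TRUE (inside it)
band_rates <- tapply(third$Survived, third$InBand, mean)

medians
band_rates
```

Replacing `toy` with `df` applies the same check to the notebook's data; the band edges are the approximate values read off the plot, not exact thresholds.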

# Age to survival chances
*Females younger than 26* were **more likely to die**. The same holds for *males between 13 and 30* years old. Interestingly, *males younger than 13* tended to **stay alive**.
Also, *females between 41 and 48.5* years old were **closer to death**, while *males* of that age had roughly **50/50** chances. However, *after 49*, *females* tended to **survive**. In contrast, *after 51*, *males* **died** at a significant rate. The *oldest survivor* was an **80-year-old male**.

```r
# Age densities of survivors vs. non-survivors, one panel per sex
df %>% filter(!is.na(Age)) %>% mutate(Survived = factor(Survived)) %>%
  ggplot(aes(x = Age, fill = Survived)) +
  geom_density(alpha = 0.35) +
  facet_grid(. ~ Sex) +
  ggtitle("Age density to survival chances")
```
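
`geom_density()` scales each outcome's curve to unit area, so where the "died" curve lies above the "survived" curve, deaths are more concentrated at that age; that is not the same as a survival rate below 50%, because the two groups differ in size. A binned rate table complements the plot. A minimal sketch, assuming left-closed bands whose edges roughly follow the ages named above; `toy`, `aged`, `AgeBand`, `rates` and `sizes` are illustrative names, and `toy` is invented data.

```r
# Survival rate and passenger count per sex and age band.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Sex      = c("female", "female", "female", "male", "male", "male", "male", "male"),
  Age      = c(10, 22, 60, 5, 20, 28, 55, NA),
  Survived = c(0, 1, 1, 1, 0, 0, 0, 1)
)

# Drop missing ages, as the plotting cell does
aged <- toy[!is.na(toy$Age), ]

# Bands [0,13), [13,26), [26,31), [31,41), [41,49), [49,51), [51,Inf)
aged$AgeBand <- cut(aged$Age,
                    breaks = c(0, 13, 26, 31, 41, 49, 51, Inf),
                    right = FALSE)

# Mean of the 0/1 outcome = survival rate; length = passengers in the cell.
# Both calls group identically, so their rows line up.
rates <- aggregate(Survived ~ Sex + AgeBand, data = aged, FUN = mean)
sizes <- aggregate(Survived ~ Sex + AgeBand, data = aged, FUN = length)
rates$n <- sizes$Survived

rates
```

Keeping `n` next to each rate shows which bands rest on only a handful of passengers.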

# Age vs. survival chances by classes
Across *all classes*, passengers *older than 42* were more likely to **die**. *2nd class* passengers *older than 16* were also **doomed**, though to a lesser degree. The same applies to *3rd class* passengers, with a **notable recovery** *after 25* years old, followed by **the greatest fall** in survival chances *after 32*.

```r
# Age densities of survivors vs. non-survivors, one panel per class
df %>% filter(!is.na(Age)) %>%
  mutate(Pclass = factor(Pclass), Survived = factor(Survived)) %>%
  ggplot(aes(x = Age, fill = Survived)) +
  geom_density(alpha = 0.35) +
  facet_wrap(vars(Pclass)) +
  ggtitle("Age vs. survival chances by classes")
```
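
The "after 42" observation can be checked class by class with a two-way rate table. A minimal sketch, assuming a single cut at 42 years as in the text; `toy`, `aged`, `Over42` and `rates` are illustrative names, and `toy` is invented data.

```r
# Survival rate per class for passengers aged up to 42 vs. older than 42.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Pclass   = c(1, 1, 2, 2, 3, 3, 3, 1),
  Age      = c(30, 50, 20, 45, 25, 35, 60, NA),
  Survived = c(1, 0, 1, 0, 1, 0, 0, 1)
)

# Drop missing ages, as the plotting cell does
aged <- toy[!is.na(toy$Age), ]
aged$Over42 <- aged$Age > 42

# Rows: class; columns: FALSE (<= 42) / TRUE (> 42)
rates <- with(aged, tapply(Survived, list(Pclass = Pclass, Over42 = Over42), mean))
rates
```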

# Port of Embarkation vs. Survival chances
Overall, within each class, the port of embarkation did not affect the chances of survival.

```r
# Survivor counts per port (C, Q, S), one panel per class; empty ports are dropped
df %>% filter(Embarked != "") %>%
  mutate(Embarked = factor(Embarked), Survived = factor(Survived)) %>%
  ggplot(aes(x = Embarked, fill = Survived)) +
  geom_bar() +
  facet_wrap(vars(Pclass)) +
  ggtitle("Port of Embarkation vs. Survival chances")
```
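
Since the plot is faceted by class, the matching numeric view is a survival rate per port within each class: if the port made no difference, the values in each class column should be similar. A minimal sketch; `toy`, `known` and `rates` are illustrative names, and `toy` is invented data.

```r
# Survival rate per port of embarkation within each class.
# `toy` is made-up data that only reuses the train.csv column names.
toy <- data.frame(
  Pclass   = c(1, 1, 1, 3, 3, 3, 3, 1),
  Embarked = c("C", "S", "S", "Q", "S", "S", "Q", ""),
  Survived = c(1, 1, 0, 0, 0, 1, 1, 1)
)

# Same filter as the plotting cell: empty Embarked values are dropped
known <- toy[toy$Embarked != "", ]

# Rows: port; columns: class; NA where no passenger boarded there in that class
rates <- with(known, tapply(Survived, list(Embarked = Embarked, Pclass = Pclass), mean))
rates
```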

# Conclusion
Properties with a **SIGNIFICANT** effect on survival chances:
* Pclass - Passenger's class
* Sex - Passenger's sex
* Age - Passenger's age
* Fare - Passenger's ship fare

Properties with **ALMOST NO** effect on survival chances:
* SibSp - Number of siblings/spouses aboard
* Parch - Number of parents/children aboard
* Embarked - Passenger's port of embarkation

Properties with **NO** effect on survival chances:
* Name - Passenger's name
* Ticket - Passenger's ticket number

Properties that should have an effect, but have too much missing data:
* Cabin - Passenger's cabin number
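
The conclusions above are read off plots. One possible way (not the method used in this notebook) to put a number on the "significant" features together is a logistic regression with `glm(..., family = binomial)`. The sketch below fits it on *simulated* data, where the "true" effects are assumptions chosen only to generate a fake outcome; `sim`, `eta` and `fit` are illustrative names.

```r
# Logistic regression of survival on class, sex, age and fare.
# `sim` is simulated data; the effect sizes in `eta` are assumptions, not Titanic estimates.
set.seed(42)
n <- 300
sim <- data.frame(
  Pclass = sample(1:3, n, replace = TRUE),
  Sex    = sample(c("female", "male"), n, replace = TRUE),
  Age    = round(runif(n, 1, 70)),
  Fare   = round(rexp(n, rate = 1 / 30), 2)
)

# Linear predictor on the log-odds scale, then draw 0/1 outcomes
eta <- 1.5 - 0.8 * (sim$Pclass - 1) - 2 * (sim$Sex == "male") -
  0.02 * sim$Age + 0.01 * sim$Fare
sim$Survived <- rbinom(n, size = 1, prob = plogis(eta))

# Class as a factor, so each class gets its own coefficient relative to 1st class
fit <- glm(Survived ~ factor(Pclass) + Sex + Age + Fare,
           data = sim, family = binomial)

# Estimates, standard errors, z values and p-values
round(summary(fit)$coefficients, 3)
```

Fitting the same formula with `data = df` would drop the rows with missing `Age` by default, which is worth keeping in mind given the missing-value counts checked earlier.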