# Python Data Science Libraries with Titanic Dataset

Welcome to this tutorial! We will explore five essential Python libraries for data science: NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn. Using the famous Titanic dataset, we will perform numerical operations, clean data, create visualizations, and build a machine learning model to predict passenger survival.

## Table of Contents
1. [NumPy: Numerical Computations](#numpy)
2. [Pandas: Data Manipulation](#pandas)
3. [Matplotlib: Basic Plotting](#matplotlib)
4. [Seaborn: Advanced Statistical Visualizations](#seaborn)
5. [Scikit-learn: Machine Learning](#scikit-learn)

## NumPy: Numerical Computations <a name="numpy"></a>

First up is NumPy, the foundation of scientific computing in Python. NumPy gives us multi-dimensional arrays and a vast collection of mathematical functions. We'll use it for some basic numerical operations.


In [None]:
# Import the NumPy library and alias it as np


In [None]:
# Create a 2D array


In [None]:
# Print the created 2D array

**Perform operations**

In [None]:
# Calculate and print the sum of all elements in the array


In [None]:
# Calculate and print the mean of the array


In [None]:
# Transpose the array (swap rows and columns) and print it
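
One possible way to fill in the NumPy cells above is sketched below. The array values are arbitrary and chosen only for illustration; any 2D array would work the same way.

```python
import numpy as np

# A small 2x3 array; the values are arbitrary, picked only for illustration
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print(arr)

total = arr.sum()      # sum of all elements
average = arr.mean()   # mean of all elements
transposed = arr.T     # rows and columns swapped, shape (3, 2)

print("Sum:", total)       # Sum: 21
print("Mean:", average)    # Mean: 3.5
print(transposed)
```

`arr.T` returns a view rather than a copy, so it is cheap even for large arrays.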


## Pandas: Data Manipulation <a name="pandas"></a>
Next, let's look at Pandas, your best friend for data manipulation and analysis. For our tutorial, we'll need a dataset to work with. We'll use the Titanic dataset, which contains information about the passengers on the Titanic.

## Introduction to the Titanic Dataset <a name="introduction"></a>

The Titanic dataset is a well-known dataset in the data science community. It contains information about the passengers who were on board the Titanic when it sank in 1912. The dataset includes details such as passenger age, class, fare, and survival status, making it a rich source for various data analysis and machine learning tasks.

### Link to the Dataset
You can download the dataset from the following link:
[Titanic Dataset](https://raw.githubusercontent.com/eduhubai/python-data-science-libraries/main/titanic.csv)

### Dataset Description
- **PassengerId**: Unique ID for each passenger.
- **Survived**: Survival status (0 = No, 1 = Yes).
- **Pclass**: Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd).
- **Name**: Name of the passenger.
- **Sex**: Gender of the passenger.
- **Age**: Age of the passenger.
- **SibSp**: Number of siblings/spouses aboard the Titanic.
- **Parch**: Number of parents/children aboard the Titanic.
- **Ticket**: Ticket number.
- **Fare**: Passenger fare.
- **Cabin**: Cabin number.
- **Embarked**: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).


### Loading and Exploring the Dataset

In [None]:
# Import the Pandas library and alias it as pd


In [None]:
# Load the Titanic dataset


In [None]:
# Print the first 5 rows of the DataFrame


In [None]:
# Print basic statistics (count, mean, std, min, 25%, 50%, 75%, max) for each numeric column in the DataFrame


In [None]:
# Group the data by 'Pclass' and calculate the mean of each numeric column per group, then print the result
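
The loading and exploration steps above might look like the sketch below. So that it runs without a network connection, it reads a few stand-in rows with made-up values (hypothetical passengers, not real records) that use the same column names as the dataset description. To work with the real data, pass the CSV link from this section to `pd.read_csv` instead. `numeric_only=True` is used because recent pandas versions raise an error when asked to average text columns such as `Name`.

```python
import io

import pandas as pd

# Stand-in rows with MADE-UP values (not real passengers); same columns as the real file.
# For the real dataset, replace `sample_csv` with the CSV link given above.
sample_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked\n"
    "1,0,3,Passenger A,male,22,1,0,T1,7.25,,S\n"
    "2,1,1,Passenger B,female,38,1,0,T2,71.0,C85,C\n"
    "3,1,3,Passenger C,female,26,0,0,T3,8.0,,S\n"
    "4,1,1,Passenger D,female,35,1,0,T4,53.0,C123,S\n"
    "5,0,3,Passenger E,male,,0,0,T5,8.05,,Q\n"
    "6,0,2,Passenger F,male,54,0,0,T6,13.0,,S\n"
)
df = pd.read_csv(sample_csv)

print(df.head())       # first 5 rows
print(df.describe())   # summary statistics for the numeric columns

# Mean of every numeric column within each passenger class;
# missing values (such as the empty Age above) are skipped
class_means = df.groupby("Pclass").mean(numeric_only=True)
print(class_means)
```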


## Matplotlib: Basic Plotting <a name="matplotlib"></a>
Now, let's visualize our data with Matplotlib, the most widely used plotting library in Python. We'll create a bar plot to show the number of passengers by class and a histogram to show the distribution of passenger ages.

In [None]:
# Import the Matplotlib library's pyplot module and alias it as plt



**Creating a Bar Plot**

In [None]:
# 1. Create a new figure with a specific size

# 2. Create a bar plot of the counts of each passenger class

# 3. Set the title of the plot

# 4. Set the label for the x-axis

# 5. Set the label for the y-axis

# 6. Display the plot
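
A minimal sketch of the six steps above, assuming a DataFrame `df` with a `Pclass` column. The tiny DataFrame here holds made-up values so the example runs on its own; the figure size and label texts are also just illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data with MADE-UP class values; use the loaded Titanic DataFrame in practice
df = pd.DataFrame({"Pclass": [3, 1, 3, 1, 3, 2]})

# Count passengers per class and order the bars as 1, 2, 3
class_counts = df["Pclass"].value_counts().sort_index()

fig = plt.figure(figsize=(8, 5))                                 # 1. figure size
plt.bar(class_counts.index.astype(str), class_counts.values)     # 2. bar plot
plt.title("Number of Passengers by Class")                       # 3. title
plt.xlabel("Passenger Class")                                    # 4. x-axis label
plt.ylabel("Number of Passengers")                               # 5. y-axis label
plt.show()                                                       # 6. display
```

Converting the class labels to strings makes Matplotlib treat them as categories, so the x-axis shows exactly three ticks instead of a continuous numeric scale.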


Explain the graph:

**Creating a Histogram**

In [None]:
# 1. Create a new figure with a specific size

# 2. Create a histogram of the ages of the passengers, with 30 bins

# 3. Set the title of the plot

# 4. Set the label for the x-axis

# 5. Set the label for the y-axis

# 6. Display the plot
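
The histogram steps could be sketched as follows. The ages here are randomly generated stand-ins (not real passenger ages), with some values blanked out to mimic the missing entries in the `Age` column. Missing values are dropped before plotting, since `plt.hist` cannot determine a bin range from data that contains NaN.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in ages (NOT real data), with every 10th value missing
rng = np.random.default_rng(0)
df = pd.DataFrame({"Age": rng.uniform(1, 80, size=200).round()})
df.loc[::10, "Age"] = np.nan

fig = plt.figure(figsize=(8, 5))                                  # 1. figure size
counts, bin_edges, _ = plt.hist(df["Age"].dropna(), bins=30)      # 2. histogram, 30 bins
plt.title("Distribution of Passenger Ages")                       # 3. title
plt.xlabel("Age")                                                 # 4. x-axis label
plt.ylabel("Number of Passengers")                                # 5. y-axis label
plt.show()                                                        # 6. display
```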


Explain the graph:

## Seaborn: Advanced Statistical Visualizations <a name="seaborn"></a>
Now, let's explore Seaborn, a statistical data visualization library built on top of Matplotlib. We'll create a pair plot to explore relationships between different features in the Titanic dataset.

In [None]:
# Import the Seaborn library and alias it as sns


In [None]:
# 1. Set the style of the plots to 'whitegrid'
# 2. Create a pair plot of the DataFrame, color-coded by passenger class and showing 'Age' and 'Fare'
# 3. Display the plot



Explain the graph:

## Scikit-learn: Machine Learning <a name="scikit-learn"></a>
Lastly, let's look at Scikit-learn, your go-to library for predictive data analysis. We'll use it to build a machine learning model to predict whether a passenger survived the Titanic disaster based on their age, fare, and class.

In [None]:
from sklearn.model_selection import train_test_split  # Import the train_test_split function from Scikit-learn
from sklearn.ensemble import RandomForestClassifier  # Import the RandomForestClassifier class from Scikit-learn
from sklearn.metrics import accuracy_score  # Import the accuracy_score function from Scikit-learn

## Prepare the data
# 1. Drop rows with missing values in 'Age', 'Fare', 'Pclass', and 'Survived'
# 2. Use 'Age', 'Fare', and 'Pclass' as features (X)
# 3. Use 'Survived' as the target (y)

## Split the data
# Split the data into training and test sets

## Train a model
# 1. Create a RandomForestClassifier with 100 trees
# 2. Train the model on the training data

## Make predictions
# Predict the survival status for the test data

## Evaluate the model
# 1. Calculate the accuracy of the model
# 2. Print the accuracy
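
The outline above could be completed roughly as follows. The DataFrame is filled with random stand-in values, including random `Survived` labels, so the printed accuracy will hover around chance and says nothing about the real dataset. The 80/20 split and `random_state=42` are assumptions made here for reproducibility; the outline does not specify them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Random stand-in values (NOT real data); swap in the loaded Titanic DataFrame
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.uniform(1, 80, size=n),
    "Fare": rng.uniform(5, 100, size=n),
    "Pclass": rng.integers(1, 4, size=n),
    "Survived": rng.integers(0, 2, size=n),
})
df.loc[::10, "Age"] = np.nan  # mimic missing ages

# Prepare the data: drop incomplete rows, then pick features and target
data = df.dropna(subset=["Age", "Fare", "Pclass", "Survived"])
X = data[["Age", "Fare", "Pclass"]]
y = data["Survived"]

# Split the data (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a random forest with 100 trees
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
```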


**Explanation:**

## Conclusion: Mastering Python Libraries for Data Science with the Titanic Dataset
And there you have it! We've covered five essential Python libraries for data science: NumPy for numerical computing, Pandas for data manipulation, Matplotlib for basic plotting, Seaborn for statistical visualizations, and Scikit-learn for machine learning. Using the Titanic dataset, we've performed numerical operations, cleaned data, created visualizations, and built a machine learning model to predict passenger survival. Each of these libraries has much more to offer, so I encourage you to explore them further. Thanks for watching, and happy coding!