#### Name: Pratik Pawar 
#### Roll No.: 43
#### Batch: A2
#### PRN: 22211617

1. Using the [Titanic dataset](https://www.kaggle.com/c/titanic/data), create a Python class
to perform basic exploratory data analysis (EDA). Implement methods to visualize the
distribution of survival rates based on different features such as ‘Pclass’, ‘Sex’, and
‘Age’. Use Matplotlib for visualization.

   Requirements:
   - Create a class `TitanicEDA` with methods to load data, generate summary
     statistics, and create visualizations.
   - Use Pandas for data manipulation.
   - Visualize the distribution of survival rates and other features.
   - Save visualizations as image files.

In [11]:
%pip install pandas matplotlib seaborn


Note: you may need to restart the kernel to use updated packages.


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [3]:
class TitanicEDA:
    def __init__(self, file_path):
        """
        Initialize the TitanicEDA class with the path to the dataset.
        """
        self.file_path = file_path
        self.df = None

    def load_data(self):
        """
        Load the Titanic dataset from the specified file path.
        """
        self.df = pd.read_csv(self.file_path)
        print("Data loaded successfully.")

    def generate_summary_statistics(self):
        """
        Generate and print summary statistics of the dataset.
        """
        if self.df is not None:
            print("Summary Statistics:")
            print(self.df.describe(include='all'))
        else:
            print("Data not loaded. Please load data first.")

    def plot_survival_rates(self, feature):
        """
        Plot survival rates based on the specified feature and save the plot as an image file.
        """
        if self.df is not None:
            plt.figure(figsize=(10, 6))
            # barplot's default estimator is the mean; for a 0/1 column that is the survival rate
            sns.barplot(x=feature, y='Survived', data=self.df)
            plt.title(f'Survival Rate by {feature}')
            plt.xlabel(feature)
            plt.ylabel('Survival Rate')
            plt.savefig(f'survival_rate_by_{feature}.png')
            plt.close()
            print(f'Plot saved as survival_rate_by_{feature}.png')
        else:
            print("Data not loaded. Please load data first.")

    def plot_age_distribution(self):
        """
        Plot the distribution of ages and save the plot as an image file.
        """
        if self.df is not None:
            plt.figure(figsize=(10, 6))
            sns.histplot(self.df['Age'].dropna(), kde=True, bins=30)
            plt.title('Age Distribution')
            plt.xlabel('Age')
            plt.ylabel('Frequency')
            plt.savefig('age_distribution.png')
            plt.close()
            print('Plot saved as age_distribution.png')
        else:
            print("Data not loaded. Please load data first.")

    def plot_class_distribution(self):
        """
        Plot the distribution of passengers based on their Pclass and save the plot as an image file.
        """
        if self.df is not None:
            plt.figure(figsize=(10, 6))
            sns.countplot(x='Pclass', data=self.df)
            plt.title('Passenger Class Distribution')
            plt.xlabel('Pclass')
            plt.ylabel('Count')
            plt.savefig('class_distribution.png')
            plt.close()
            print('Plot saved as class_distribution.png')
        else:
            print("Data not loaded. Please load data first.")

    def plot_sex_distribution(self):
        """
        Plot the distribution of survival rates based on Sex and save the plot as an image file.
        """
        if self.df is not None:
            plt.figure(figsize=(10, 6))
            # barplot's default estimator is the mean; for a 0/1 column that is the survival rate
            sns.barplot(x='Sex', y='Survived', data=self.df)
            plt.title('Survival Rate by Sex')
            plt.xlabel('Sex')
            plt.ylabel('Survival Rate')
            plt.savefig('survival_rate_by_sex.png')
            plt.close()
            print('Plot saved as survival_rate_by_sex.png')
        else:
            print("Data not loaded. Please load data first.")


In [7]:
# Create an instance of TitanicEDA
eda = TitanicEDA("Downloads/Titanic-Dataset.csv")

# Load the Titanic dataset
eda.load_data()


Data loaded successfully.


In [9]:
# Generate summary statistics
eda.generate_summary_statistics()


Summary Statistics:
        PassengerId    Survived      Pclass                     Name   Sex  \
count    891.000000  891.000000  891.000000                      891   891   
unique          NaN         NaN         NaN                      891     2   
top             NaN         NaN         NaN  Braund, Mr. Owen Harris  male   
freq            NaN         NaN         NaN                        1   577   
mean     446.000000    0.383838    2.308642                      NaN   NaN   
std      257.353842    0.486592    0.836071                      NaN   NaN   
min        1.000000    0.000000    1.000000                      NaN   NaN   
25%      223.500000    0.000000    2.000000                      NaN   NaN   
50%      446.000000    0.000000    3.000000                      NaN   NaN   
75%      668.500000    1.000000    3.000000                      NaN   NaN   
max      891.000000    1.000000    3.000000                      NaN   NaN   

               Age       SibSp       Parch 
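
The `count` row of `describe()` only hints at missing values, so a direct per-column check can be a useful companion to the summary statistics. The following is a minimal sketch, not part of the original notebook: the helper name `missing_value_report` is hypothetical, and the rows are made-up toy data used only to keep the example self-contained.

```python
import pandas as pd


def missing_value_report(df):
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Hypothetical toy rows for illustration only; not taken from the Titanic dataset.
toy = pd.DataFrame({
    "Age": [22.0, None, 35.0, 54.0],
    "Sex": ["male", "female", "female", "male"],
})
report = missing_value_report(toy)
print(report)
```

With the real data, the same helper could be called as `missing_value_report(eda.df)` after `eda.load_data()`.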

In [11]:
# Plot survival rates by Pclass
eda.plot_survival_rates('Pclass')


Plot saved as survival_rate_by_Pclass.png
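
The bar heights in the saved plot are per-group means of the 0/1 `Survived` column. To see the same numbers as a table, a pandas `groupby` is one option. This is a minimal sketch under assumptions: the function name `survival_rate_by` is hypothetical, and the rows are made-up toy data rather than the actual dataset.

```python
import pandas as pd


def survival_rate_by(df, feature):
    """Return the mean of 'Survived' for each value of `feature`."""
    return df.groupby(feature)["Survived"].mean()


# Hypothetical toy rows for illustration only; not taken from the Titanic dataset.
toy = pd.DataFrame({
    "Pclass": [1, 1, 3, 3],
    "Survived": [1, 0, 0, 0],
})
rates = survival_rate_by(toy, "Pclass")
print(rates)
```

On the loaded dataset the call would be `survival_rate_by(eda.df, 'Pclass')`.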


In [13]:
# Plot survival rates by Sex
eda.plot_sex_distribution()


Plot saved as survival_rate_by_sex.png


In [15]:
# Plot age distribution
eda.plot_age_distribution()


Plot saved as age_distribution.png
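
The histogram shows how ages are distributed but not how survival varies with age. Passing `'Age'` straight to `plot_survival_rates` would draw one bar per distinct age, so grouping ages into bins first may give a more readable result. The sketch below is one possible approach, not part of the original class: the function name, the bin edges, and the labels are all assumptions, and the rows are made-up toy data.

```python
import pandas as pd


def survival_rate_by_age_group(df, bins, labels):
    """Bin 'Age' with pd.cut and return the mean of 'Survived' per bin."""
    known_age = df.dropna(subset=["Age"])  # rows without an age cannot be binned
    groups = pd.cut(known_age["Age"], bins=bins, labels=labels)
    return known_age.groupby(groups, observed=False)["Survived"].mean()


# Hypothetical toy rows for illustration only; not taken from the Titanic dataset.
toy = pd.DataFrame({
    "Age": [5, 30, 40, None],
    "Survived": [1, 0, 1, 1],
})
# Assumed bin edges: (0, 18] is labeled "child", (18, 100] is labeled "adult".
age_rates = survival_rate_by_age_group(toy, bins=[0, 18, 100], labels=["child", "adult"])
print(age_rates)
```

The resulting Series could then be plotted as a bar chart and saved in the same way as the other figures.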


In [17]:
# Plot passenger class distribution
eda.plot_class_distribution()


Plot saved as class_distribution.png
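
Each plotting method in `TitanicEDA` repeats the same steps: create a figure, set the title and axis labels, save, and close. A small helper built on Matplotlib's object-oriented API is one way to factor that out. The following is only a sketch: `save_bar_chart` is a hypothetical name, the counts are made up, and the file is written to the system temporary directory so the example leaves the working directory untouched.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def save_bar_chart(values, title, xlabel, ylabel, filename):
    """Draw a bar chart from a {label: height} dict and save it to `filename`."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.bar([str(label) for label in values], list(values.values()))
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.savefig(filename)
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return filename


# Hypothetical counts for illustration only; not taken from the Titanic dataset.
toy_counts = {1: 2, 2: 1, 3: 4}
out_path = save_bar_chart(
    toy_counts, "Passenger Class Distribution", "Pclass", "Count",
    os.path.join(tempfile.gettempdir(), "toy_class_distribution.png"),
)
print(f"Plot saved as {out_path}")
```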
