# Exploratory Data Analysis (EDA) on Animal Shelter Outcomes

This notebook explores the training dataset of animal shelter outcomes. The goal is to understand how the outcome types are distributed and how they relate to other features, such as age upon outcome and breed.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set the aesthetic style of the plots
sns.set_style('whitegrid')

In [None]:
# Load the training dataset
train_data = pd.read_csv('../data/train.csv')

# Display the first few rows of the dataset
train_data.head()

In [None]:
# Summary statistics of the dataset
train_data.describe(include='all')

In [None]:
# Check for missing values
missing_values = train_data.isnull().sum()
missing_values[missing_values > 0]
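Raw counts of missing values are easier to judge next to the share of rows they represent. The following is a minimal sketch of such a summary; the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing_count": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    has_missing = summary["missing_count"] > 0
    return summary[has_missing].sort_values("missing_count", ascending=False)


# Toy frame standing in for train_data (values are made up)
toy = pd.DataFrame({
    "Name": ["Rex", None, None, "Milo"],
    "OutcomeType": ["Adoption", "Transfer", "Adoption", "Euthanasia"],
    "AgeuponOutcome": ["1 year", None, "2 months", "3 weeks"],
})
print(missing_summary(toy))
```

On the real data this could be called as `missing_summary(train_data)`.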

In [None]:
# Visualize the distribution of OutcomeType
plt.figure(figsize=(10, 6))
sns.countplot(data=train_data, x='OutcomeType', palette='Set2')
plt.title('Distribution of Outcome Types')
plt.xlabel('Outcome Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()
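A count plot shows absolute frequencies; class imbalance is often easier to read as percentages. A minimal sketch, assuming the outcome labels sit in a single column; the helper name `outcome_shares` and the toy labels are hypothetical.

```python
import pandas as pd


def outcome_shares(outcomes: pd.Series) -> pd.Series:
    """Percentage of rows per outcome label, most frequent first."""
    return outcomes.value_counts(normalize=True).mul(100).round(1)


# Toy labels standing in for train_data['OutcomeType'] (values are made up)
toy_outcomes = pd.Series(["Adoption", "Adoption", "Transfer", "Return_to_owner"])
shares = outcome_shares(toy_outcomes)
print(shares)
```

On the real data this could be called as `outcome_shares(train_data['OutcomeType'])`.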

In [None]:
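# Visualize the relationship between AgeuponOutcome and OutcomeType.
# Note: a box plot needs a numeric y variable. If AgeuponOutcome is stored as
# text (e.g. '2 years'), convert it to a numeric column before plotting.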
plt.figure(figsize=(12, 6))
sns.boxplot(data=train_data, x='OutcomeType', y='AgeuponOutcome', palette='Set2')
plt.title('Age upon Outcome by Outcome Type')
plt.xlabel('Outcome Type')
plt.ylabel('Age upon Outcome')
plt.xticks(rotation=45)
plt.show()
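The box plot above is only meaningful when age is numeric. The following is a minimal sketch of one way to get there, assuming `AgeuponOutcome` holds strings of the form `<number> <unit>` such as `2 years` or `3 weeks`. That format, the helper name `age_to_days`, and the approximate conversion factors (30 days per month, 365 per year) are all assumptions to check against the actual data.

```python
import numpy as np
import pandas as pd

# Approximate conversion factors (assumption: a month is 30 days, a year 365)
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age):
    """Convert a string like '2 years' to approximate days; NaN if unparseable."""
    if pd.isna(age):
        return np.nan
    parts = str(age).split()
    if len(parts) != 2:
        return np.nan
    number, unit = parts
    unit = unit.lower().rstrip("s")  # 'years' -> 'year'
    if unit not in DAYS_PER_UNIT:
        return np.nan
    try:
        return float(number) * DAYS_PER_UNIT[unit]
    except ValueError:
        return np.nan


# Toy values standing in for train_data['AgeuponOutcome'] (values are made up)
toy_ages = pd.Series(["1 year", "2 weeks", "3 months", "5 days", None, "unknown"])
age_days = toy_ages.map(age_to_days)
print(age_days)
```

The resulting numeric series could be stored in a new column (for example a hypothetical `AgeInDays`) and passed as `y` to `sns.boxplot`. Values that do not match the assumed format become `NaN` rather than raising, so they can be counted and inspected separately.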

In [None]:
# Visualize the distribution of breeds
plt.figure(figsize=(15, 8))
top_breeds = train_data['Breed'].value_counts().nlargest(10)
sns.barplot(x=top_breeds.index, y=top_breeds.values, palette='Set2')
plt.title('Top 10 Breeds in the Dataset')
plt.xlabel('Breed')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # right-align so long breed names sit under their bars
plt.show()
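A top-10 chart does not show how much of the dataset those breeds account for. A minimal sketch of a coverage check, which indicates how long the tail of rare breeds is; the helper name `top_n_coverage` and the toy values are hypothetical.

```python
import pandas as pd


def top_n_coverage(values: pd.Series, n: int = 10) -> float:
    """Fraction of non-missing rows that fall in the n most frequent categories."""
    counts = values.value_counts()  # missing values are excluded by default
    return counts.nlargest(n).sum() / counts.sum()


# Toy values standing in for train_data['Breed'] (values are made up)
toy_breeds = pd.Series(["A"] * 5 + ["B"] * 3 + ["C"] * 2)
print(top_n_coverage(toy_breeds, n=2))
```

On the real data this could be called as `top_n_coverage(train_data['Breed'], n=10)`. A low value would suggest that most animals belong to breeds outside the top 10.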

## Conclusion

This exploratory analysis examined the distribution of outcome types, the relationship between age upon outcome and outcome type, and the ten most common breeds in the dataset. Further analysis and feature engineering will be needed to prepare the data for modeling.