# Exploratory Data Analysis on Password Dataset

This notebook performs exploratory data analysis (EDA) on the password dataset. The goal is to understand the characteristics of the passwords and derive insights that help in evaluating password strength.

In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')

In [None]:
# Load the dataset
data = pd.read_csv('../data/processed/password_data.csv')

# Display the first few rows of the dataset
data.head()

In [None]:
# Summary statistics of the dataset
data.describe()

In [None]:
# Visualize the distribution of password lengths
plt.figure(figsize=(10, 6))
sns.histplot(data['password_length'], bins=30, kde=True)
plt.title('Distribution of Password Lengths')
plt.xlabel('Password Length')
plt.ylabel('Frequency')
plt.show()

In [None]:
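# Visualize the correlation between numeric features
plt.figure(figsize=(12, 8))
# numeric_only=True skips text columns, which would otherwise raise in pandas >= 2.0
correlation_matrix = data.corr(numeric_only=True)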
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

## Insights

1. Most passwords are between X and Y characters long, as shown by the distribution of password lengths.
2. Password length correlates strongly and positively with the strength score: longer passwords tend to score higher. Correlation alone does not show that length causes the higher score.
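
The two insights above can be backed by numbers rather than read off the plots. The following is a minimal sketch, not the notebook's confirmed method: it assumes the `password_length` column used earlier and a hypothetical `strength_score` column, whose name the notebook does not confirm. The toy frame is made up purely for illustration, and the 5%/95% quantile bounds are an arbitrary choice.

```python
import pandas as pd


def length_range(df, column="password_length", lower=0.05, upper=0.95):
    """Return the (lower, upper) quantiles of a numeric column as floats."""
    q = df[column].quantile([lower, upper])
    return float(q.loc[lower]), float(q.loc[upper])


def length_strength_corr(df, length_col="password_length",
                         strength_col="strength_score"):
    """Pearson correlation between two numeric columns.

    `strength_score` is an assumed column name; adjust it to the dataset.
    """
    return float(df[length_col].corr(df[strength_col]))


# Toy data for illustration only; replace `toy` with the `data` frame loaded above.
toy = pd.DataFrame({
    "password_length": [6, 8, 8, 10, 12, 16],
    "strength_score": [1, 2, 2, 3, 4, 5],
})

low, high = length_range(toy)
r = length_strength_corr(toy)
print(f"Central 90% of lengths: {low:.1f} to {high:.1f} characters; r = {r:.2f}")
```

Running the same two functions on `data` would give concrete values for X and Y and a correlation coefficient to quote in place of "strong".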

Further analysis could explore how other features affect password strength.
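
As one possible starting point for that further analysis, character-class counts can be derived from the raw passwords and then correlated with strength. This is only a sketch under assumptions: the `password` column name and the helper `add_char_class_features` are hypothetical, and the patterns treat only ASCII letters and digits as alphanumeric.

```python
import pandas as pd


def add_char_class_features(df, password_col="password"):
    """Return a copy of df with per-password character-class counts.

    `password_col` is an assumed column name, not confirmed by the dataset.
    """
    out = df.copy()
    pw = out[password_col].astype(str)
    out["n_digits"] = pw.str.count(r"[0-9]")
    out["n_upper"] = pw.str.count(r"[A-Z]")
    out["n_lower"] = pw.str.count(r"[a-z]")
    # Anything that is not an ASCII letter or digit counts as special.
    out["n_special"] = pw.str.count(r"[^A-Za-z0-9]")
    return out


# Toy data for illustration only.
toy_passwords = pd.DataFrame({"password": ["abc123", "Passw0rd!", "qwerty"]})
features = add_char_class_features(toy_passwords)
print(features)
```

The new columns are numeric, so they would appear in the correlation heatmap alongside the existing features once added to `data`.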