# Exploratory Data Analysis for Sign Language Recognition

In this notebook, we perform exploratory data analysis (EDA) on the sign language dataset. The goal is to inspect the dataset's structure, check how the labels are distributed, and look at sample images before training the model.

In [None]:
# Import necessary libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image

# Set visualization style (sns.set is an alias; set_theme is the preferred interface)
sns.set_theme(style='whitegrid')

In [None]:
# Load the dataset
data_dir = '../data/raw/'  # Adjust the path as necessary
data = pd.read_csv(os.path.join(data_dir, 'sign_language_data.csv'))  # Replace with actual dataset file

# Display the first few rows of the dataset
data.head()

In [None]:
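# Summary of the dataset: column dtypes and non-null counts
data.info()
# include='all' also summarizes non-numeric columns such as string labels or paths
data.describe(include='all')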
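
Beyond the summary above, it can help to count missing values, distinct values, and duplicate rows before plotting anything. The following is a minimal sketch, not part of the original notebook: `summarize_quality` is a hypothetical helper, and the toy frame stands in for `data`, with column names (`image_path`, `label`) assumed from the cells in this notebook.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column dtype, missing-value count, and distinct-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),  # NaN values are not counted as a distinct value
    })


# Toy frame standing in for `data`; the column names are assumptions.
toy = pd.DataFrame({
    "image_path": ["a.png", "b.png", "b.png", None],
    "label": ["A", "B", "B", "C"],
})

quality = summarize_quality(toy)
n_duplicate_rows = int(toy.duplicated().sum())  # rows identical to an earlier row

print(quality)
print("duplicate rows:", n_duplicate_rows)
```

In the notebook itself, the same helper could be called as `summarize_quality(data)`.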

In [None]:
# Visualize the distribution of labels
plt.figure(figsize=(12, 6))
sns.countplot(x='label', data=data)
plt.title('Distribution of Sign Language Labels')
plt.xticks(rotation=45)
plt.show()
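
A count plot shows imbalance visually; putting numbers on it makes it easier to compare across dataset versions. The sketch below is one possible way to do that, under the assumption that labels live in a single column. `label_balance` is a hypothetical helper and the toy series stands in for `data['label']`.

```python
import pandas as pd


def label_balance(labels: pd.Series) -> pd.DataFrame:
    """Return per-class counts and their share of all samples, largest class first."""
    counts = labels.value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy labels standing in for data['label'].
toy_labels = pd.Series(["A", "A", "A", "B", "B", "C"], name="label")

balance = label_balance(toy_labels)
# Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
imbalance_ratio = balance["count"].max() / balance["count"].min()

print(balance)
print(f"imbalance ratio (largest / smallest class): {imbalance_ratio:.1f}")
```

A large ratio may suggest stratified splitting or class weighting later on, though the right threshold depends on the dataset.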

In [None]:
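# Display sample images from the dataset
# Fixed seed so the same samples appear on each run; min() guards against tiny datasets
sample_images = data.sample(min(5, len(data)), random_state=0)
plt.figure(figsize=(15, 4))
for i, (_, row) in enumerate(sample_images.iterrows()):
    plt.subplot(1, len(sample_images), i + 1)
    img_path = os.path.join(data_dir, row['image_path'])  # Adjust based on actual column name
    with Image.open(img_path) as img:  # Close the file handle once the image is drawn
        plt.imshow(img)
    plt.title(row['label'])
    plt.axis('off')
plt.tight_layout()
plt.show()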
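
Looking at a handful of images does not reveal whether all files share the same size and color mode, which usually matters for model input. The following sketch tallies `(width, height, mode)` combinations with Pillow. `image_shapes` is a hypothetical helper, and the temporary PNG files are generated here only so the example runs on its own; in the notebook, the paths would come from the (assumed) `image_path` column joined with `data_dir`.

```python
import os
import tempfile
from collections import Counter

from PIL import Image


def image_shapes(paths):
    """Count (width, height, mode) combinations across the given image files."""
    shapes = Counter()
    for path in paths:
        # The context manager closes the file handle after reading size and mode.
        with Image.open(path) as img:
            width, height = img.size
            shapes[(width, height, img.mode)] += 1
    return shapes


# Write three small placeholder images to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    specs = [((28, 28), "L"), ((28, 28), "L"), ((64, 48), "RGB")]
    paths = []
    for i, (size, mode) in enumerate(specs):
        path = os.path.join(tmp, f"img_{i}.png")
        Image.new(mode, size).save(path)
        paths.append(path)
    shapes = image_shapes(paths)

print(shapes.most_common())
```

More than one entry in the result would indicate that the images need resizing or mode conversion before training.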

## Conclusion

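In this notebook, we performed exploratory data analysis on the sign language dataset: we inspected its structure, visualized the distribution of labels, and displayed sample images. The label distribution shows whether any classes are under-represented, and the sample images offer a quick check that image paths and labels line up. Both observations should guide preprocessing and model training.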