# Exploratory Data Analysis and Preprocessing

This notebook performs exploratory data analysis (EDA) on the fashion product dataset in `data/styles.csv`, drops rows with a missing target value, one-hot encodes the categorical variables, and saves the result for model training.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

# Set display options
pd.set_option('display.max_columns', None)
sns.set_theme(style='whitegrid')  # sns.set is an older alias of set_theme

In [None]:
# Load the dataset
data_path = 'data/styles.csv'
df = pd.read_csv(data_path)

# Display the first few rows of the dataframe
df.head()

In [None]:
# Summary of the dataset
df.info()
df.describe(include='all')
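Beyond `info()` and `describe()`, it often helps to look at how many distinct values each column holds and how the categories are distributed, since that informs the encoding step later. The following is a minimal sketch on a toy DataFrame; the column names (`gender`, `season`) are hypothetical stand-ins, not taken from the actual dataset.

```python
import pandas as pd

# Toy stand-in for the real dataset; these column names are hypothetical.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "gender": ["Men", "Women", "Men", "Men"],
    "season": ["Summer", "Winter", "Summer", None],
})

# Number of distinct non-null values per column.
cardinality = toy.nunique()
print(cardinality)

# Relative frequency of each category in one column, missing values included.
gender_share = toy["gender"].value_counts(normalize=True, dropna=False)
print(gender_share)
```

Columns with only a handful of distinct values are natural candidates for one-hot encoding, while columns with very many distinct values may need a different treatment.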

In [None]:
# Check for missing values
missing_values = df.isnull().sum()
missing_values[missing_values > 0]
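Raw counts are easier to judge alongside the share of rows they represent. The sketch below builds a small summary table with both, again on a toy DataFrame with hypothetical column names (`color`, `season`); the same expressions should apply to `df` unchanged.

```python
import pandas as pd

# Toy stand-in for the real dataset; these column names are hypothetical.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "color": ["Blue", None, "Red", None],
    "season": ["Summer", "Winter", None, "Fall"],
})

# Count and percentage of missing values per column.
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
})

# Keep only columns with at least one missing value, worst first.
missing_summary = (
    missing_summary[missing_summary["missing_count"] > 0]
    .sort_values("missing_pct", ascending=False)
)
print(missing_summary)
```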

In [None]:
# Data cleaning
# Drop rows where the target variable is missing.
# NOTE: 'target_column' is a placeholder; replace it with the actual target column name,
# otherwise dropna raises a KeyError.
df = df.dropna(subset=['target_column'])

# Reset the index so it is contiguous again after dropping rows
df = df.reset_index(drop=True)
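To make the effect of this step visible, it can be useful to report how many rows were removed. The sketch below does so on a toy DataFrame; `label` is a hypothetical target column used only for illustration.

```python
import pandas as pd

# Toy stand-in for the real dataset; 'label' is a hypothetical target column.
toy = pd.DataFrame({
    "id": [10, 11, 12, 13],
    "label": ["Shirt", None, "Shoes", None],
})

rows_before = len(toy)

# Drop rows with a missing target and renumber the index from 0.
cleaned = toy.dropna(subset=["label"]).reset_index(drop=True)

rows_dropped = rows_before - len(cleaned)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```

If the number of dropped rows is a large fraction of the dataset, that is usually worth investigating before moving on to encoding.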

In [None]:
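# One-hot encode the categorical variables
# NOTE: 'categorical_column1' and 'categorical_column2' are placeholders;
# replace them with the actual column names, otherwise get_dummies raises a KeyError.
df_encoded = pd.get_dummies(df, columns=['categorical_column1', 'categorical_column2'])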

# Display the first few rows of the encoded dataframe
df_encoded.head()
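As an illustration of what `pd.get_dummies` produces, the sketch below encodes two columns of a toy DataFrame. The column names (`gender`, `season`) are hypothetical, and `dtype=int` is an optional choice made here so the indicator columns hold 0/1 rather than booleans.

```python
import pandas as pd

# Toy stand-in for the real dataset; these column names are hypothetical.
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "gender": ["Men", "Women", "Men"],
    "season": ["Summer", "Winter", "Summer"],
})

# Each listed column is replaced by one indicator column per category,
# named '<column>_<category>'. Columns not listed are left untouched.
encoded = pd.get_dummies(toy, columns=["gender", "season"], dtype=int)
print(encoded)
```

Each encoded column adds one new column per distinct category, so the width of the resulting frame grows with the cardinality of the columns chosen.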

In [None]:
# Save the cleaned and encoded data for model training
cleaned_data_path = 'data/cleaned_styles.csv'
df_encoded.to_csv(cleaned_data_path, index=False)

## Conclusion

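In this notebook, we inspected the dataset's structure and missing values, dropped rows with a missing target, and one-hot encoded the categorical variables. The cleaned, encoded data is saved to `data/cleaned_styles.csv` and is ready for model training.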