# Data Audit and Exploratory Data Analysis

This notebook profiles each data source, assesses data quality, and carries out exploratory analysis for the HPE Resource Assignment System.

## Objectives
1. **Dataset Cards**: Overview statistics for each data source
2. **Join Viability**: Key overlap and cardinality between datasets, to check whether they can be joined reliably
3. **Text Quality**: Language detection, length analysis, n-gram analysis
4. **Label Distribution**: Class imbalance and long-tail analysis
5. **Outlier Detection**: IQR- and z-score-based outlier identification
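
A dataset card (objective 1) can be as simple as one row per column with its dtype, missingness, and cardinality. The sketch below uses plain pandas rather than the project's `DataProfiler`, whose interface is not shown here; the `dataset_card` helper, the `resources` name, and the toy columns are all hypothetical.

```python
import pandas as pd


def dataset_card(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Per-column overview: dtype, non-null count, missing %, distinct values.

    Hypothetical helper for illustration; not part of src.eda.
    """
    card = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "n_unique": df.nunique(dropna=True),
    })
    # Record the source name and row count in the index label
    card.index.name = f"{name} ({len(df)} rows)"
    return card


# Toy data standing in for one of the real sources (hypothetical columns)
demo = pd.DataFrame({
    "resource_id": [1, 2, 3, 4],
    "skill": ["python", None, "sql", "python"],
})
print(dataset_card(demo, "resources"))
```

Here `skill` shows 25% missing and two distinct values, the kind of signal that decides whether a column is usable as a feature.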
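
Join viability (objective 2) mostly comes down to two questions: how many keys match across sources, and whether the key is unique on each side. A minimal sketch using `DataFrame.merge(..., indicator=True)` is shown below; the `join_viability` helper and the `project_id` key are hypothetical, not names taken from the project's data.

```python
import pandas as pd


def join_viability(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Summarise key overlap and uniqueness for a join on `key` (hypothetical helper)."""
    # Outer-merge the distinct keys; `_merge` records where each key was found
    merged = left[[key]].drop_duplicates().merge(
        right[[key]].drop_duplicates(), on=key, how="outer", indicator=True
    )
    counts = merged["_merge"].astype(str).value_counts()
    return {
        "matched_keys": int(counts.get("both", 0)),
        "left_only_keys": int(counts.get("left_only", 0)),
        "right_only_keys": int(counts.get("right_only", 0)),
        # A non-unique key on either side makes the join one-to-many or many-to-many
        "left_key_unique": bool(left[key].is_unique),
        "right_key_unique": bool(right[key].is_unique),
    }


# Toy example: requests reference projects by a hypothetical `project_id`
requests = pd.DataFrame({"project_id": [10, 11, 12, 12]})
projects = pd.DataFrame({"project_id": [10, 12, 13]})
print(join_viability(requests, projects, "project_id"))
```

Unmatched keys on the left indicate records that an inner join would silently drop; a non-unique key signals that row counts will grow after the join.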
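
For text quality (objective 3), length statistics and frequent n-grams can be computed with pandas and the standard library alone. Language detection needs a third-party detector and is left out of this sketch. The helper names and the sample descriptions are hypothetical, and tokenisation here is plain whitespace splitting, which is an assumption rather than the project's tokenizer.

```python
from collections import Counter

import pandas as pd


def text_length_stats(texts: pd.Series) -> pd.Series:
    """Share of empty texts plus median character and token lengths (hypothetical helper)."""
    clean = texts.fillna("")
    return pd.Series({
        "empty_pct": round(float((clean.str.strip() == "").mean() * 100), 1),
        "median_chars": float(clean.str.len().median()),
        "median_tokens": float(clean.str.split().str.len().median()),
    })


def top_ngrams(texts: pd.Series, n: int = 2, k: int = 5) -> list[tuple[tuple[str, ...], int]]:
    """Return the k most frequent lower-cased word n-grams across all texts."""
    counts: Counter = Counter()
    for text in texts.dropna():
        tokens = text.lower().split()
        # Zipping n staggered copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


# Toy free-text descriptions, including a missing and an empty entry
descs = pd.Series(["Need Python developer", "python developer for data pipeline", None, ""])
print(text_length_stats(descs))
print(top_ngrams(descs, n=2, k=3))
```

Counting missing and empty strings together in `empty_pct` is a deliberate choice: both carry no usable signal for text features.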
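
Class imbalance and the long tail (objective 4) can be summarised from `value_counts()`: the ratio between the largest and smallest class, the number of singleton classes, and how many classes cover most of the rows. In this sketch the "head" is assumed to be the smallest set of most frequent classes covering 80% of rows; that cut-off, the `label_distribution` helper, and the label values are all illustrative assumptions.

```python
import pandas as pd


def label_distribution(labels: pd.Series, tail_share: float = 0.2) -> dict:
    """Imbalance and long-tail summary for a label column (hypothetical helper)."""
    counts = labels.value_counts()  # sorted from most to least frequent
    cumulative = counts.cumsum() / counts.sum()
    # Head = fewest top classes whose cumulative share reaches (1 - tail_share)
    head_size = int((cumulative < 1 - tail_share).sum()) + 1
    return {
        "n_classes": int(counts.size),
        "imbalance_ratio": float(counts.iloc[0] / counts.iloc[-1]),
        "singleton_classes": int((counts == 1).sum()),
        "head_classes": head_size,
        "tail_classes": int(counts.size - head_size),
    }


# Toy labels with a dominant class and two singletons (hypothetical values)
labels = pd.Series(["storage"] * 6 + ["network"] * 3 + ["gpu", "firmware"])
print(label_distribution(labels))
```

Singleton classes are worth flagging separately, since a stratified train/test split cannot place them on both sides.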
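
The two outlier rules named in objective 5 can be written as boolean masks over a numeric column: Tukey's fences flag points outside [Q1 − k·IQR, Q3 + k·IQR], and the z-score rule flags points whose |z| exceeds a threshold. The helper names and the weekly-hours sample below are hypothetical; `k = 1.5` and a threshold of 3 are common defaults, not values taken from the project.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Mask of points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Mask of points whose absolute z-score (population std) exceeds `threshold`."""
    z = (values - values.mean()) / values.std(ddof=0)
    return z.abs() > threshold


# Toy weekly-hours column with one extreme value (hypothetical data)
hours = pd.Series([8, 9, 8, 10, 9, 8, 9, 40])
print(iqr_outliers(hours).tolist())
print(zscore_outliers(hours).tolist())
```

On this small sample the IQR rule flags the value 40, while the z-score rule at threshold 3 flags nothing. With the population standard deviation no |z| can exceed √(n − 1), about 2.65 for n = 8, and the outlier itself inflates the standard deviation. The quantile-based rule is therefore the more robust of the two on small or skewed columns.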


In [None]:
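# Setup
import sys
import warnings
from pathlib import Path

# Make the project root (the parent directory) importable so `src.*` resolves;
# the guard avoids adding duplicate entries when the cell is re-run
PROJECT_ROOT = Path('..').resolve()
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Suppress all warnings to keep cell output readable (this also hides pandas data warnings)
warnings.filterwarnings('ignore')

# Import our modules
from src.io_loader import ExcelLoader, load_processed_data
from src.eda import DataProfiler, FeatureDiscovery
from src.utils import config, logger

# Configure plotting and pandas display
plt.style.use('default')
sns.set_palette("husl")
pd.set_option('display.max_columns', None)  # show every column
pd.set_option('display.max_rows', 100)

print("✅ Setup complete")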
