# Data Preparation & Exploratory Data Analysis

## Table of Contents
1. [Setup & Imports](#setup)
2. [Data Loading](#loading)
3. [Data Quality Assessment](#quality)
4. [Data Cleaning](#cleaning)
5. [Feature Engineering](#features)
6. [Univariate Analysis](#univariate)
7. [Temporal Analysis](#temporal)
8. [Trader Segmentation](#segmentation)
9. [Save Processed Data](#save)

## Environment Information
- Python version: 3.8+
- Key packages: pandas, numpy, matplotlib, seaborn, plotly, scikit-learn
- Last updated: October 26, 2025

In [None]:
# Setup and imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
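import warnings
# Note: this hides all warnings, including deprecation notices from pandas/matplotlib;
# narrow it (e.g. category=FutureWarning) if library upgrades need to be tracked
warnings.filterwarnings('ignore')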

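# Seed NumPy's global RNG for reproducibility. scikit-learn estimators fall back to
# this global state only when random_state=None; pass random_state=RANDOM_STATE
# explicitly so results do not depend on cell execution order.
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)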

# Set visualization styles
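# matplotlib 3.6 renamed the bundled seaborn styles to 'seaborn-v0_8*' and
# 3.8 removed the old names, so try the new name first
try:
    plt.style.use('seaborn-v0_8')
except OSError:
    plt.style.use('seaborn')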
sns.set_palette("husl")

# Define custom color scheme
COLOR_SCHEME = {
    'primary': '#1f77b4',
    'secondary': '#ff7f0e',
    'positive': '#2ca02c',
    'negative': '#d62728',
    'neutral': '#7f7f7f'
}
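`np.random.seed(42)` seeds only NumPy's legacy global generator. When `random_state=None`, scikit-learn estimators such as `KMeans` draw from that same global state, so clustering results stay reproducible only if every earlier random call runs in the same order. A minimal sketch of the more robust approach is to pass the seed explicitly; the `RANDOM_STATE` name, the helper `cluster_labels`, and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

RANDOM_STATE = 42  # illustrative seed constant

# Synthetic 2-D points standing in for real trader features
rng = np.random.default_rng(RANDOM_STATE)
X = rng.normal(size=(300, 2))

def cluster_labels(X, seed):
    """Fit KMeans with an explicit seed and return the cluster labels."""
    model = KMeans(n_clusters=3, n_init=10, random_state=seed)
    return model.fit_predict(X)

labels_first = cluster_labels(X, RANDOM_STATE)
# Consume some global randomness between fits; the explicit seed makes this irrelevant
np.random.rand(1000)
labels_second = cluster_labels(X, RANDOM_STATE)

print((labels_first == labels_second).all())
```

Because each fit receives its own seed, the two label arrays match even though the global NumPy state changed between the calls.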

## Data Loading
Load the Fear & Greed Index data and the historical trading data. Both files are read through a small helper that catches read errors and runs basic validation checks before the data is used.

In [None]:
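# Load datasets
def load_data(file_path, validate=True):
    """
    Load and validate a dataset from a CSV file
    
    Parameters:
    -----------
    file_path : str
        Path to the CSV file
    validate : bool
        Whether to perform validation checks
        
    Returns:
    --------
    pd.DataFrame or None
        Loaded (and optionally validated) dataframe, or None if
        loading or validation failed
    """
    try:
        df = pd.read_csv(file_path)
        
        if validate:
            # Raise explicitly rather than assert, which is skipped under `python -O`
            if df.empty:
                raise ValueError("DataFrame is empty")
            # read_csv assigns a fresh RangeIndex, so an index-uniqueness check
            # always passes; report fully duplicated rows instead
            n_dupes = int(df.duplicated().sum())
            if n_dupes > 0:
                print(f"Warning: {file_path} contains {n_dupes} duplicated rows")
            
        return df
    
    except (FileNotFoundError, pd.errors.EmptyDataError,
            pd.errors.ParserError, ValueError) as e:
        print(f"Error loading {file_path}: {e}")
        return None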
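The validation in `load_data` is generic: it does not know which columns a given file should contain. A minimal sketch of a stricter, per-file check is shown below. The helper `check_schema` and the column names `timestamp`, `value`, and `classification` are hypothetical, chosen for illustration; they should be replaced with the actual headers of `fear_greed_index.csv` and `historical_data.csv`.

```python
import io
import pandas as pd

def check_schema(df, required_columns, datetime_columns=()):
    """Raise if required columns are missing; parse datetime columns in place.

    Hypothetical helper: column names are supplied by the caller.
    """
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    for col in datetime_columns:
        # errors='coerce' turns unparseable entries into NaT so they can be counted
        df[col] = pd.to_datetime(df[col], errors="coerce")
        n_bad = int(df[col].isna().sum())
        if n_bad:
            print(f"Warning: {n_bad} unparseable values in '{col}'")
    return df

# In-memory CSV with hypothetical column names, standing in for a real file
csv_text = "timestamp,value\n2024-01-01,45\nnot-a-date,52\n"
sample = pd.read_csv(io.StringIO(csv_text))
sample = check_schema(sample, ["timestamp", "value"], datetime_columns=["timestamp"])
print(sample.dtypes)

# A missing column is reported with its name
missing_error = None
try:
    check_schema(sample, ["timestamp", "classification"])
except ValueError as err:
    missing_error = str(err)
print(missing_error)
```

Coercing bad dates to `NaT` rather than raising keeps the load step going while still surfacing how many rows need attention.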

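# Load datasets
fear_greed_df = load_data('fear_greed_index.csv')
historical_df = load_data('historical_data.csv')

# Display basic information. DataFrame.info() prints directly and returns None,
# so it is called without print(); skip any dataset that failed to load.
for name, df in [("Fear & Greed Index Data", fear_greed_df),
                 ("Historical Trading Data", historical_df)]:
    print(f"\n{name}:")
    if df is not None:
        df.info()
    else:
        print("Not loaded (see error above)")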
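`DataFrame.info()` reports dtypes and non-null counts, but not missing-value percentages or duplicate rows. A compact per-column summary can be assembled from standard pandas methods. In the sketch below, `summarize_quality` is a hypothetical helper and the small frame (with columns `account`, `size_usd`, `side`) is synthetic; in the notebook it would be called on `fear_greed_df` or `historical_df`.

```python
import numpy as np
import pandas as pd

def summarize_quality(df):
    """Return one row per column: dtype, missing count, missing %, unique count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(2),
        "n_unique": df.nunique(dropna=True),
    })
    # Columns with the most missing data first
    return summary.sort_values("pct_missing", ascending=False)

# Synthetic example frame with hypothetical column names
demo = pd.DataFrame({
    "account": ["a", "b", "b", None],
    "size_usd": [100.0, np.nan, np.nan, 250.0],
    "side": ["BUY", "SELL", "SELL", "BUY"],
})
quality = summarize_quality(demo)
# duplicated() treats NaN values as equal, so rows 1 and 2 count as duplicates
n_duplicate_rows = int(demo.duplicated().sum())

print(quality)
print(f"Fully duplicated rows: {n_duplicate_rows}")
```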