# 🧹 clean

## Clean Eye-Tracking Data

The `clean()` function cleans eye-tracking samples and events data in two steps:

1. Removing rows whose `trial` value is NaN
2. Removing columns that are entirely NaN or entirely zero

This prepares eye-tracking data for analysis by dropping rows that have no trial assigned and columns that carry no information, both of which add size without contributing to the results.
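The two rules can be illustrated with plain pandas. This is a minimal sketch, not etformat's actual implementation: the helper name `clean_frame` is hypothetical, and it assumes that columns are checked after the NaN-trial rows have been removed, which the documentation does not specify.

```python
import numpy as np
import pandas as pd


def clean_frame(df: pd.DataFrame, trial_col: str = "trial") -> pd.DataFrame:
    """Hypothetical re-creation of the two cleaning rules (not etformat code)."""
    # Rule 1: drop rows whose trial value is missing.
    out = df.dropna(subset=[trial_col])
    # Rule 2: drop columns that are entirely NaN or entirely zero.
    # Assumption: the check runs on the rows that survived rule 1.
    all_nan = out.isna().all(axis=0)
    all_zero = out.eq(0).all(axis=0)
    return out.loc[:, ~(all_nan | all_zero)]


# Tiny made-up frame; the column names are borrowed from the log output.
demo = pd.DataFrame({
    "trial": [1.0, np.nan, 2.0],
    "gx": [512.3, 498.1, 530.0],
    "pxL": [np.nan, np.nan, np.nan],
    "errors": [0, 0, 0],
})
cleaned = clean_frame(demo)
print(cleaned.shape)  # (2, 2): one row and two columns removed
```

A column that mixes zeros and NaN values is kept by this sketch, since it is neither all NaN nor all zero; whether `clean()` treats that case the same way is not stated here.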

## Parameters

- `samples` (str or DataFrame): Path to samples CSV file or pandas DataFrame with samples data
- `events` (str or DataFrame): Path to events CSV file or pandas DataFrame with events data
- `verbose` (bool): Print cleaning statistics (default: True)
- `copy` (bool): When file paths are provided, save the cleaned data to new files instead of overwriting the originals (default: False)

## Returns

- When file paths are provided and `copy=False`: (samples_path, events_path) - Paths to the cleaned files
- When file paths are provided and `copy=True`: (new_samples_path, new_events_path) - Paths to the new cleaned files
- When DataFrames are provided: (samples_df, events_df) - Cleaned DataFrames
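Because the return type follows the input type, downstream code may receive either paths or DataFrames. The sketch below shows one way to normalize both cases to DataFrames. The helper `as_frames` is hypothetical and not part of etformat, and the demo uses a temporary CSV file rather than real `clean()` output.

```python
import tempfile
from pathlib import Path

import pandas as pd


def as_frames(samples_out, events_out):
    """Hypothetical helper: accept DataFrames or CSV paths, return DataFrames."""
    def load(obj):
        # DataFrames pass through unchanged; anything else is read as a CSV path.
        if isinstance(obj, pd.DataFrame):
            return obj
        return pd.read_csv(obj)

    return load(samples_out), load(events_out)


# Demo with one DataFrame and one path, standing in for either return shape.
with tempfile.TemporaryDirectory() as tmp:
    events_path = Path(tmp) / "events.csv"
    pd.DataFrame({"trial": [1, 2]}).to_csv(events_path, index=False)
    s, e = as_frames(pd.DataFrame({"trial": [1]}), str(events_path))

print(s.shape, e.shape)
```

In real use, the tuple returned by `clean()` would be unpacked into this helper, for example `as_frames(*et.clean(samples, events))`.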

```python
import etformat as et
import pandas as pd

# Load sample data
samples = pd.read_csv(r"D:\Github_web_page_website\test_samples.csv")
events = pd.read_csv(r"D:\Github_web_page_website\test_events.csv")

# Clean the data using DataFrames
cleaned_samples, cleaned_events = et.clean(samples, events)
```

📖 etformat 1.1.1 - For Documentation, visit: https://ahsankhodami.github.io/etformat/intro.html
🧹 Starting data cleaning for eye-tracking data
   📊 Samples: DataFrame with shape (1520309, 56)
   📊 Events: DataFrame with shape (18261, 38)

🔍 Processing SAMPLES data...
   Original samples shape: (1520309, 56)
   Columns: 56
   ❌ Removed 2522 rows with NaN trials
   ❌ Removed 15 columns:
      • time_rel (all_nan)
      • pxL (all_nan)
      • pyL (all_nan)
      • hxL (all_nan)
      • hyL (all_nan)
      • paL (all_nan)
      • gxL (all_nan)
      • gyL (all_nan)
      • hdata1 (all_zero)
      • hdata6 (all_zero)
      • hdata7 (all_zero)
      • input (all_zero)
      • buttons (all_zero)
      • htype (all_nan)
      • errors (all_zero)
   ✅ Samples cleaned: (1520309, 56) → (1517787, 41)

🔍 Processing EVENTS data...
   Original events shape: (18261, 38)
   Columns: 38
   ❌ Removed 83 rows with NaN trials
   ❌ Removed 7 columns:
      • time (all_zero)
      • sttime_rel (all_nan)
   