# Feature Catalog and Selection Tutorial

This tutorial shows how to configure and select features from **mne-features**, **librosa**, and the custom extractors defined in `features/extractors.py`.
The `config.yaml` file enables or disables entire groups of features.

## Loading the Configuration
The helper `config.load_config` reads `config.yaml` and validates the structure via pydantic models.
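
To make the kind of structural validation concrete, here is a minimal sketch that uses only the standard library. It is a stand-in for the project's pydantic models, not a copy of them: the class `FeatureGroup`, the helper `parse_group`, and the specific checks are all hypothetical, and only the field names (`enabled`, `selected_features`, `freq_bands`) come from the printed configuration in this tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureGroup:
    """One block of the features config (hypothetical stand-in for a pydantic model)."""

    enabled: bool = True
    selected_features: list = field(default_factory=list)
    freq_bands: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values whose type or shape does not match the expected structure.
        if not isinstance(self.enabled, bool):
            raise TypeError("'enabled' must be a boolean")
        if not all(isinstance(name, str) for name in self.selected_features):
            raise TypeError("'selected_features' must contain only strings")
        for band, edges in self.freq_bands.items():
            if len(edges) != 2 or not edges[0] < edges[1]:
                raise ValueError(f"band {band!r} must be [low, high] with low < high")


def parse_group(raw: dict) -> FeatureGroup:
    """Build a validated group from a plain dict, e.g. one parsed from YAML."""
    return FeatureGroup(**raw)  # unknown keys raise TypeError


group = parse_group({
    "enabled": True,
    "selected_features": ["line_length", "rms"],
    "freq_bands": {"delta": [0.5, 4.5], "theta": [4.5, 8.5]},
})
print(group.selected_features)

try:
    parse_group({"freq_bands": {"alpha": [11.5, 8.5]}})  # reversed band edges
except ValueError as err:
    print("rejected:", err)
```

Failing early at load time, as in this sketch, keeps malformed band definitions or misspelled keys from surfacing later as obscure errors inside an extractor.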

In [1]:
import pprint
import config
from features import load_feature_catalog

cfg = config.load_config()
pprint.pp(cfg.features.model_dump())

{'enable_all': True,
 'mne_features': {'enabled': True,
                  'selected_features': ['line_length',
                                        'zero_crossings',
                                        'kurtosis',
                                        'rms'],
                  'freq_bands': {'delta': [0.5, 4.5],
                                 'theta': [4.5, 8.5],
                                 'alpha': [8.5, 11.5],
                                 'sigma': [11.5, 15.5],
                                 'beta': [15.5, 30.0]}},
 'librosa': {'enabled': True,
             'selected_features': ['spectral_centroid',
                                   'spectral_bandwidth',
                                   'mfcc']},
 'custom': {'enabled': True,
            'selected_features': ['time_skewness',
                                  'time_kurtosis',
                                  'time_rms',
                                  'time_variance',
                                  'peak

## Building the Catalog
The function `load_feature_catalog` uses the configuration to return a dictionary mapping feature names to extractor callables.

In [2]:
catalog = load_feature_catalog()
list(catalog.keys())[:10]  # preview

['time_skewness',
 'time_kurtosis',
 'time_rms',
 'time_variance',
 'peak_to_peak',
 'zero_cross_rate',
 'spectral_entropy',
 'dominant_frequency',
 'wavelet_energy',
 'wavelet_entropy']
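
The name-to-callable mapping can be pictured with a small sketch. This is not the implementation of `load_feature_catalog`; the `REGISTRY` dictionary, the `build_catalog` function, and the two toy extractors are hypothetical and only illustrate how a configuration might filter a registry of callables.

```python
import math
from typing import Callable, Dict, Sequence

Extractor = Callable[[Sequence[float]], float]


def peak_to_peak(window: Sequence[float]) -> float:
    """Difference between the largest and smallest sample."""
    return max(window) - min(window)


def time_rms(window: Sequence[float]) -> float:
    """Root mean square of the samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


# Hypothetical registry: group name -> {feature name -> extractor}.
REGISTRY: Dict[str, Dict[str, Extractor]] = {
    "custom": {"peak_to_peak": peak_to_peak, "time_rms": time_rms},
}


def build_catalog(features_cfg: dict) -> Dict[str, Extractor]:
    """Keep only the selected features of groups marked as enabled."""
    catalog: Dict[str, Extractor] = {}
    for group, extractors in REGISTRY.items():
        group_cfg = features_cfg.get(group, {})
        if not group_cfg.get("enabled", False):
            continue  # skip disabled groups entirely
        for name in group_cfg.get("selected_features", []):
            if name in extractors:
                catalog[name] = extractors[name]
    return catalog


demo_cfg = {"custom": {"enabled": True, "selected_features": ["time_rms"]}}
demo_catalog = build_catalog(demo_cfg)
print(list(demo_catalog))  # ['time_rms']
print(demo_catalog["time_rms"]([3.0, 4.0]))
```

Because the values are plain callables, downstream code can loop over the catalog and apply every extractor to a window without knowing which library it came from.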

## Selecting Specific Libraries
You can disable an entire block (e.g. `librosa`) by setting `enabled: false` in its section of `config.yaml`. When `enable_all` is `false`, only the libraries that are explicitly enabled are added to the catalog.

In [3]:
cfg.features.librosa.enabled = False
catalog = load_feature_catalog()
sorted(catalog)

['dominant_frequency',
 'kurtosis',
 'line_length',
 'mfcc',
 'multiscale_entropy',
 'peak_to_peak',
 'rms',
 'spectral_bandwidth',
 'spectral_centroid',
 'spectral_entropy',
 'time_kurtosis',
 'time_rms',
 'time_skewness',
 'time_variance',
 'wavelet_energy',
 'wavelet_entropy',
 'wavelet_symlets_energy',
 'zero_cross_rate',
 'zero_crossings']
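
Note that the listing above still contains the librosa features `mfcc`, `spectral_bandwidth`, and `spectral_centroid`. The source does not say why. One possible reading, stated here purely as an assumption, is that `enable_all: true` takes precedence over the per-library `enabled` flags. Another is that `load_feature_catalog()` re-reads `config.yaml` and never sees the in-memory change to `cfg`. The sketch below illustrates only the first reading, and the function `active_groups` is hypothetical.

```python
def active_groups(features_cfg: dict) -> list:
    """Return the group names that would contribute features.

    Assumed rule (not confirmed by the project): enable_all=True activates
    every group; otherwise only groups with enabled=True are active.
    """
    groups = [name for name, value in features_cfg.items() if isinstance(value, dict)]
    if features_cfg.get("enable_all", False):
        return sorted(groups)
    return sorted(g for g in groups if features_cfg[g].get("enabled", False))


cfg_dict = {
    "enable_all": True,
    "mne_features": {"enabled": True},
    "librosa": {"enabled": False},
    "custom": {"enabled": True},
}
print(active_groups(cfg_dict))  # ['custom', 'librosa', 'mne_features']

cfg_dict["enable_all"] = False
print(active_groups(cfg_dict))  # ['custom', 'mne_features']
```

Under this assumed rule, disabling a single library has an effect only after `enable_all` is set to `false`.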

## Computing Features
Each extractor takes a 1-D NumPy array (one window) and, where required, the sampling rate `fs`.

In [4]:
import numpy as np
from unitest.fixtures.synthetic_pd import generate_synthetic_partial_discharge

sample = generate_synthetic_partial_discharge(num_good=1, num_fault=0, length=256)
window = sample.iloc[0, :-1].to_numpy(float)
line_length = catalog['line_length'](window, fs=256.0)
line_length

0.056226819264988084
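
For intuition about what this number measures, line length is commonly defined from the absolute differences between consecutive samples, either summed or averaged. The pure-Python sketch below uses the mean. Whether `catalog['line_length']` normalizes the same way, and how it uses `fs`, is not stated in this tutorial, so treat the function as illustrative only.

```python
from typing import Sequence


def line_length(window: Sequence[float]) -> float:
    """Mean absolute difference between consecutive samples."""
    if len(window) < 2:
        raise ValueError("need at least two samples")
    total = sum(abs(b - a) for a, b in zip(window, window[1:]))
    return total / (len(window) - 1)


# Consecutive differences are 1, 1 and 2, so the result is (1 + 1 + 2) / 3.
print(line_length([0.0, 1.0, 0.0, 2.0]))

# A constant signal has no sample-to-sample variation.
print(line_length([5.0, 5.0, 5.0]))  # 0.0
```

Larger values indicate a signal that changes more from sample to sample, which is why the measure is often used as a cheap proxy for signal complexity.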

## Batch Extraction from Cleaned Data
The helper `features.extract_from_clean()` scans `root_dir` for cleaned windows and saves the extracted features under `2_feature_engineering/`, grouped by theme and cleaning method.

In [5]:
from features import extract_from_clean

# This will write Parquet feature files into outputs/features/*
extract_from_clean(fs=256.0)

FileNotFoundError: [WinError 3] The system cannot find the path specified: 'partial_discharge_project'
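
The error above means the directory `partial_discharge_project` could not be found, which typically happens when a relative path is resolved from an unexpected working directory. A scan step can check for this up front and report the resolved path. The sketch below is hypothetical: `find_clean_windows` and the `*.parquet` pattern are illustrative names and are not part of the `features` module.

```python
import tempfile
from pathlib import Path


def find_clean_windows(root_dir, pattern="*.parquet"):
    """Recursively list files under root_dir, failing early with a clear message."""
    root = Path(root_dir)
    if not root.is_dir():
        raise FileNotFoundError(
            f"root_dir not found: {root.resolve()} (cwd: {Path.cwd()})"
        )
    return sorted(root.rglob(pattern))


# Demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "theme_a").mkdir()
    (Path(tmp) / "theme_a" / "win_000.parquet").touch()
    found = [p.name for p in find_clean_windows(tmp)]
    print(found)  # ['win_000.parquet']

try:
    find_clean_windows("partial_discharge_project_missing")
except FileNotFoundError as err:
    print(err)
```

Printing both the resolved path and the current working directory makes it easier to see whether the notebook was launched from the wrong folder or the data directory is actually missing.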