## Import matrix

In [14]:
import h5py
from data.edf_data_import import read_mat_string

mat_file = "./data/EDF_RawData.mat"

with h5py.File(mat_file, 'r') as f:
    # List the top-level keys in the MAT file
    print("Keys:", list(f.keys()))
    
    # Access the 'allData' group
    allData = f['allData']
    
    # Access the 'fileName' field: it is stored as an array of references.
    file_names_ds = allData['fileName']
    file_names_list = []
    for i in range(file_names_ds.shape[0]):
        ref = file_names_ds[i, 0]
        name_str = read_mat_string(ref, f)
        file_names_list.append(name_str)
    
    print("EDF File Names:")
    for name in file_names_list:
        print("  ", name)
    
    # Access header and record fields similarly:
    hdr = allData['hdr']
    record = allData['record']
    # You can add further parsing as needed.

Keys: ['#refs#', 'allData']
EDF File Names:
   [[ 82]
 [ 49]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 49]
 [ 48]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 50]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 51]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 52]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 53]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 54]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 55]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 56]
 [ 46]
 [101]
 [100]
 [102]]
   [[ 82]
 [ 57]
 [ 46]
 [101]
 [100]
 [102]]
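
The names above print as column vectors of character codes (82, 49, 46, 101, 100, 102) rather than text, which suggests the value coming back from `read_mat_string` has not been decoded yet. A minimal sketch of the decoding step, assuming each entry holds 16-bit character codes as MATLAB v7.3 files usually store `char` data. `codes_to_str` is a hypothetical helper, not part of `data.edf_data_import`.

```python
import numpy as np

def codes_to_str(codes):
    """Turn an array of MATLAB character codes into a Python string."""
    flat = np.asarray(codes).ravel()
    # Skip null padding, then map each code to its character.
    return "".join(chr(int(c)) for c in flat if c != 0)

# First entry of the output above.
codes = np.array([[82], [49], [46], [101], [100], [102]], dtype=np.uint16)
print(codes_to_str(codes))  # R1.edf
```

Applied to the second entry, the same helper gives `R10.edf`, which matches the alphabetical ordering of the list.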


## Contents Overview

The MAT file (saved in MATLAB's v7.3 format, which is HDF5-based) contains two top-level groups:

- **`#refs#`**:  
  Internal references used by MATLAB/HDF5.

- **`allData`**:  
  A MATLAB structure array that holds the raw data and header information for each EDF file. Because 10 EDF files were processed, `allData` has 10 elements, one per EDF file, each with the following fields:
  
  1. **`fileName`**  
     - **Type:** MATLAB string  
     - **Shape:** (10, 1)  
     - **Description:** Contains the name of each EDF file.  
     - **Note:** When decoded in Python, the strings may include extra null characters (e.g., `R\x001\x00.\x00e\x00d\x00f\x00`). MATLAB stores each character as a 16-bit value, so reading the data byte by byte leaves a null after every ASCII character.
  
  2. **`hdr`**  
     - **Type:** MATLAB structure  
     - **Description:** Contains header information as returned by `edfread`. This may include metadata such as:
       - Sampling rates
       - Number of channels
       - Channel labels
       - Other EDF-specific information
     - **Example Access in MATLAB:**  
       ```matlab
       firstHeader = allData(1).hdr;
       ```
  
  3. **`record`**  
     - **Type:** Numeric matrix  
     - **Description:** Contains the raw signal data from the EDF file. Each matrix is typically organized as `[channels x samples]`.
     - **Example Access in MATLAB:**  
       ```matlab
       firstRecord = allData(1).record;
       ```
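
On the Python side the same fields are reached through h5py object references. A minimal sketch, assuming each entry of `allData['record']` is a reference to a 2-D numeric dataset, in the same way the `fileName` entries are references. `load_record` is a hypothetical helper. MATLAB writes arrays in column-major order, so h5py reports the dimensions reversed; the transpose restores the `[channels x samples]` layout.

```python
import h5py
import numpy as np

def load_record(f, index):
    """Return the signal matrix of one EDF file as [channels x samples]."""
    ref = f["allData"]["record"][index, 0]   # object reference (assumed layout)
    data = np.asarray(f[ref])                # h5py sees [samples x channels]
    return data.T

# Usage, with the path from the notebook cell above:
# with h5py.File("./data/EDF_RawData.mat", "r") as f:
#     first_record = load_record(f, 0)
```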

### In MATLAB

1. **Loading the MAT File:**
   ```matlab
   load('EDF_RawData.mat', 'allData');
   ```

1. Basic features (time-domain) per 30 s epoch:
From EEG (Marius), EMG, EOG, ECG, etc.
	•	Mean, variance, RMS, peak-to-peak, zero crossings
	•	Hjorth parameters: Activity, Mobility, Complexity
	•	EMG tone (muscle tension), ECG R-R intervals

6 features × 6 signals = 36 features
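
The generic time-domain features listed above can be sketched for a single epoch as follows. This is an illustration under assumptions, not the notebook's implementation: `time_domain_features` is a hypothetical name, zero crossings are counted on the mean-removed signal, and the signal-specific items (EMG tone, R-R intervals) are left out because they need their own detectors.

```python
import numpy as np

def time_domain_features(x):
    """Basic time-domain features for one epoch of a single signal."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)        # first difference
    ddx = np.diff(dx)      # second difference
    activity = np.var(x)                                        # Hjorth activity
    mobility = np.sqrt(np.var(dx) / activity)                   # Hjorth mobility
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility   # Hjorth complexity
    centered = x - x.mean()
    # Count sign changes between consecutive samples of the mean-removed signal.
    zero_crossings = int(np.sum(np.signbit(centered[:-1]) != np.signbit(centered[1:])))
    return {
        "mean": x.mean(),
        "variance": activity,
        "rms": np.sqrt(np.mean(x ** 2)),
        "peak_to_peak": np.ptp(x),
        "zero_crossings": zero_crossings,
        "hjorth_mobility": mobility,
        "hjorth_complexity": complexity,
    }
```

Calling it once per epoch and per signal, then concatenating the dictionaries, gives one row of the feature matrix.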
	

- EEG is done
- EMG is half-done (but not that useful)
- EOG is done
- Position: let's not do it


2. Frequency-domain features (FFT, Welch PSD):

For each signal:
	•	Relative power in delta, theta, alpha, beta, gamma bands
	•	Bandpower ratios (e.g., delta/alpha, theta/beta)
	•	Spectral entropy, median frequency

10 features × 6 signals = 60 features
- Take from Marius's notebook
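
A rough sketch of the spectral features for one epoch. To stay NumPy-only it uses a single Hann-windowed periodogram as a stand-in for Welch PSD; `scipy.signal.welch` would be the usual choice for an averaged estimate. The band edges and the 0.5 to 45 Hz analysis range are conventional values assumed here, not taken from the notebook, and `spectral_features` is a hypothetical name.

```python
import numpy as np

# Band edges in Hz (assumed, conventional EEG definitions).
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def spectral_features(x, fs):
    """Relative band power, spectral entropy and median frequency for one epoch."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2   # unnormalised periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= 0.5) & (freqs < 45)                     # analysis range
    freqs, psd = freqs[keep], psd[keep]
    total = psd.sum()
    feats = {}
    for name, (lo, hi) in BANDS.items():
        in_band = (freqs >= lo) & (freqs < hi)
        feats[f"rel_{name}"] = psd[in_band].sum() / total
    p = psd / total                                          # normalised spectrum
    feats["spectral_entropy"] = -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p))
    # Frequency below which half of the power lies.
    feats["median_freq"] = freqs[np.searchsorted(np.cumsum(p), 0.5)]
    feats["delta_alpha_ratio"] = feats["rel_delta"] / (feats["rel_alpha"] + 1e-12)
    feats["theta_beta_ratio"] = feats["rel_theta"] / (feats["rel_beta"] + 1e-12)
    return feats
```

Because the bands tile the analysis range, the five relative powers sum to one, so one of them is redundant for a linear model.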

3. Time-frequency features (Wavelet or STFT):

Wavelet energy in bands, scalogram statistics
	•	Morlet or Daubechies wavelets
	•	Transient spike detection (esp. for N2 spindles)

5 features × 6 signals = 30 features
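
A sketch of the STFT variant, NumPy-only. It tracks the energy in one band over short frames and summarises it, so a brief burst (for example a spindle) stands out against the rest of the epoch. The 2 s window, 1 s hop and the function names are assumptions for illustration; the wavelet route (Morlet or Daubechies) would need a wavelet library and is not shown.

```python
import numpy as np

def stft_band_energy(x, fs, band, win_sec=2.0, hop_sec=1.0):
    """Energy inside `band` (Hz) for each STFT frame of one epoch."""
    x = np.asarray(x, dtype=float)
    win = int(win_sec * fs)
    hop = int(hop_sec * fs)
    window = np.hanning(win)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    energies = []
    for start in range(0, len(x) - win + 1, hop):
        frame = x[start:start + win] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        energies.append(spectrum[in_band].sum())
    return np.array(energies)

def transient_summary(energies):
    """Frame-energy statistics; a large peak-to-median ratio hints at a burst."""
    return {
        "mean_energy": energies.mean(),
        "max_energy": energies.max(),
        "peak_to_median": energies.max() / (np.median(energies) + 1e-12),
    }
```

For spindle-like activity, a band of roughly 11 to 16 Hz is a common choice, though the exact limits vary between studies.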

4. Cross-signal features:

	•	EEG–EOG coherence (eye movement and brain activity)
	•	EEG–EMG correlation (REM: low EMG + EEG theta)
	•	ECG–Respiration coupling (cardio-respiratory phase sync)
	•	EMG × SpO2 drop rate (apnea signatures) - @Martin

15 interaction features
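
Two of the interaction measures above, coherence and correlation, can be sketched with NumPy alone. The coherence estimate averages cross- and auto-spectra over non-overlapping Hann-windowed segments, a simplified Welch-style scheme. The 4 s segment length and the function names are assumptions; the cardio-respiratory and SpO2 features need event detection and are not covered here.

```python
import numpy as np

def mean_coherence(x, y, fs, band, seg_sec=4.0):
    """Magnitude-squared coherence between x and y, averaged over `band` (Hz)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    seg = int(seg_sec * fs)
    window = np.hanning(seg)
    pxx = pyy = pxy = 0.0
    for k in range(len(x) // seg):
        sl = slice(k * seg, (k + 1) * seg)
        fx = np.fft.rfft((x[sl] - x[sl].mean()) * window)
        fy = np.fft.rfft((y[sl] - y[sl].mean()) * window)
        pxx = pxx + np.abs(fx) ** 2       # auto-spectrum of x
        pyy = pyy + np.abs(fy) ** 2       # auto-spectrum of y
        pxy = pxy + fx * np.conj(fy)      # cross-spectrum
    coh = np.abs(pxy) ** 2 / (pxx * pyy + 1e-12)
    freqs = np.fft.rfftfreq(seg, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs < band[1])
    return float(coh[in_band].mean())

def pearson_corr(x, y):
    """Pearson correlation between two equally long epochs."""
    return float(np.corrcoef(x, y)[0, 1])
```

With a single segment the coherence is identically one, so the epoch has to be split into several segments for the value to carry information.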

5. Morphological / statistical shape descriptors:
	•	Skewness, kurtosis of the signal window
	•	Spike frequency, duration, slope (esp. in EEG)
	•	Cross-correlation peak lag between L-EOG and R-EOG

10–20 from this group
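
A sketch of the shape descriptors and of the lag feature between the two EOG channels. Skewness and kurtosis use the plain moment definitions (excess kurtosis, so a Gaussian gives about 0). The ±1 s lag search range and the function names are assumptions for illustration.

```python
import numpy as np

def skewness(x):
    """Third standardised moment of the epoch."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Fourth standardised moment minus 3."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def xcorr_peak_lag(a, b, fs, max_lag_sec=1.0):
    """Lag in seconds at which the cross-correlation peaks; positive means b trails a."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    max_lag = int(max_lag_sec * fs)
    lags = np.arange(-max_lag, max_lag + 1)
    # For each lag k, correlate a[n] with b[n + k] over the overlapping samples.
    scores = [np.dot(a[max(0, -k):len(a) - max(0, k)],
                     b[max(0, k):len(b) - max(0, -k)]) for k in lags]
    return lags[int(np.argmax(scores))] / fs
```

For conjugate eye movements the L-EOG and R-EOG deflections have opposite polarity, so it may be worth taking the peak of the absolute cross-correlation instead; that variant is not shown.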

6. Event-driven features (from annotations or custom detectors):
	•	Number of desaturation events in epoch
	•	Apnea duration stats
	•	Microarousal presence

10–15 from XML annotations
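
Once the annotations are parsed, the per-epoch event features reduce to counting overlaps. A minimal sketch, assuming the events have already been extracted from the XML into `(onset_sec, duration_sec)` tuples; that layout, the example values, and `events_per_epoch` are hypothetical.

```python
EPOCH_SEC = 30  # epoch length used throughout this plan

def events_per_epoch(events, n_epochs, epoch_sec=EPOCH_SEC):
    """Count events and sum their duration inside each epoch.

    `events` is a list of (onset_sec, duration_sec) tuples (assumed layout).
    An event spanning two epochs is counted in both.
    """
    counts = [0] * n_epochs
    seconds = [0.0] * n_epochs
    for onset, duration in events:
        end = onset + duration
        first = int(onset // epoch_sec)
        last = int(end // epoch_sec)
        for e in range(max(first, 0), min(last, n_epochs - 1) + 1):
            overlap = min(end, (e + 1) * epoch_sec) - max(onset, e * epoch_sec)
            if overlap > 0:
                counts[e] += 1
                seconds[e] += overlap
    return counts, seconds

desaturations = [(12.0, 8.0), (55.0, 10.0)]   # made-up example events
counts, seconds = events_per_epoch(desaturations, n_epochs=3)
print(counts)   # [1, 1, 1]
print(seconds)  # [8.0, 5.0, 5.0]
```

The same function can be reused for apneas or microarousals by passing a different event list; a presence flag is then simply `count > 0`.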

7. Dynamics between epochs (contextual features):
	•	Change in EEG delta from previous epoch
	•	Time since last REM
	•	Variance trend over past 3 epochs

about 10 more
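
The three contextual features above can be sketched on top of per-epoch values computed earlier. Assumptions: `delta_power` and `epoch_var` are one value per epoch, stage labels use the string `"REM"`, and the variance trend is the slope of a straight line through the previous three epochs. At prediction time the stage labels would have to come from earlier predictions rather than from the ground truth.

```python
import numpy as np

def contextual_features(delta_power, epoch_var, stages, epoch_sec=30):
    """Context features per epoch from sequences of per-epoch values."""
    delta_power = np.asarray(delta_power, dtype=float)
    epoch_var = np.asarray(epoch_var, dtype=float)
    n = len(delta_power)

    # Change in delta power relative to the previous epoch (0 for the first).
    delta_change = np.diff(delta_power, prepend=delta_power[0])

    # Seconds since the most recent REM epoch (NaN before the first one).
    since_rem = np.full(n, np.nan)
    last_rem = None
    for i, stage in enumerate(stages):
        if stage == "REM":
            last_rem = i
        if last_rem is not None:
            since_rem[i] = (i - last_rem) * epoch_sec

    # Slope of a line fitted to the variance of the past 3 epochs.
    var_trend = np.full(n, np.nan)
    for i in range(3, n):
        var_trend[i] = np.polyfit(np.arange(3), epoch_var[i - 3:i], 1)[0]
    return delta_change, since_rem, var_trend
```

The NaN entries mark epochs without enough history; they need imputing or masking before the rows go into the feature matrix.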

Split the notebook in two: one part for the basic, required steps and one for the more advanced features.

Feature matrix

