## EEG to Sound Project


This work-in-progress project aims to extract information from EEG files and use it as modifiers for sound-emulating software such as synthesizers.

Let's run a test on an unknown EEG file that was delivered as a raw binary with almost no supplementary information.
First, make sure you are running Python 3 and install all necessary packages via pip.
This Jupyter Book is optimized for Python 3.9.2.
I recommend creating a virtual environment for this.
Need help? See:
https://packaging.python.org/en/latest/tutorials/installing-packages/
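If you want to verify the setup from inside the notebook, the following sketch prints the interpreter version and lists which of the packages imported further down (numpy, mne, matplotlib) cannot be found. The helper name `missing_packages` is hypothetical; note that the plotting cell additionally needs a Qt binding such as PyQt5 for the `Qt5Agg` backend.

```python
import importlib.util
import sys

def missing_packages(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

print("Python", sys.version.split()[0])
if sys.version_info < (3, 9):
    print("Warning: this notebook was written for Python 3.9.2")

missing = missing_packages(["numpy", "mne", "matplotlib"])
print("Missing packages:", missing or "none")
```

Any missing name can then be installed with pip, for example `pip install mne`.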

Now you are ready to import the necessary packages.

In [None]:
import sys, os
from pathlib import Path
import numpy as np

We will do some ASCII sniffing to obtain information from the demo file's header. Let's set the path to the file:

In [None]:
path = "../demo_EEG/demofile.EEG"

Let's check the demo file's size; we will need it later.

In [None]:
with open(path, "rb") as f:
    data = f.read()
    print(f"{path} has the size of {len(data)} bytes")
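The file size becomes useful once the data layout is guessed: if the binary payload (file size minus header) consists of interleaved 16-bit samples, its length must be a multiple of 2 × number of channels, and dividing the number of complete frames by the sampling rate gives the recording duration. This sketch assumes int16 samples and a known header length; the function names and the example numbers are hypothetical.

```python
def candidate_channel_counts(payload_bytes, bytes_per_sample=2, max_channels=64):
    """Channel counts for which the payload splits into complete sample frames."""
    return [n for n in range(1, max_channels + 1)
            if payload_bytes % (bytes_per_sample * n) == 0]

def duration_seconds(payload_bytes, n_channels, sfreq, bytes_per_sample=2):
    """Recording length implied by payload size, channel count and sampling rate."""
    n_frames = payload_bytes // (bytes_per_sample * n_channels)
    return n_frames / sfreq

# Hypothetical example: 19 channels, 256 Hz, 60 s of int16 data
payload = 19 * 2 * 256 * 60
print(19 in candidate_channel_counts(payload))  # True
print(duration_seconds(payload, 19, 256.0))     # 60.0
```

Treat the result as a hint only: trailing bytes, such as a marker block at the end of the file (which the code further down assumes), break the divisibility.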

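The following code defines the function seek_ascii() with two arguments: path (str), the file to inspect, and n (int), the number of bytes to scan for readable ASCII text.
It returns every run of at least four printable characters together with the raw bytes that were read. We will call it below.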

In [None]:
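def seek_ascii(path, n):
    """Return runs of >= 4 printable ASCII chars found in the first n bytes of path, plus those bytes."""
    with open(path, "rb") as f:
        data = f.read(n)
    printable = []
    cur = bytearray()
    for b in data:
        if 32 <= b <= 126:  # printable ASCII: space (32) to tilde (126)
            cur.append(b)
        else:
            if len(cur) >= 4:
                # only bytes 32..126 are collected, so ASCII decoding cannot fail
                printable.append(cur.decode("ascii"))
            cur = bytearray()
    # flush a run that reaches the end of the scanned block
    if len(cur) >= 4:
        printable.append(cur.decode("ascii"))
    return printable, data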
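The same idea can be written more compactly with a regular expression, similar to the Unix `strings` tool. This variant (the name `find_ascii_runs` is hypothetical) also returns the byte offset of each run, which can hint at where the text header ends.

```python
import re

def find_ascii_runs(data, min_len=4):
    """Return (offset, text) for every run of >= min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)  # space (32) to tilde (126)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

sample = b"\x00\x01PATIENT\x00ab\x00EEG-DATA"
print(find_ascii_runs(sample))  # [(2, 'PATIENT'), (13, 'EEG-DATA')]
```

Applied to the header bytes returned by seek_ascii (`find_ascii_runs(raw_head)`), it lists the same text blocks together with their positions.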

Now we check only the header by reading the first n bytes. You can change that amount by assigning a different value to n.

In [None]:
n = 8192

In [None]:
ascii_blocks, raw_head = seek_ascii(path, n)
print(f"\nReadable ASCII blocks found (first {n} bytes):")
if not ascii_blocks:
    print("  (no readable ASCII blocks found)")
else:
    for s in ascii_blocks:
        print(" -", s)

So far, we can see some readable information in the demo file, such as names and data, but not the key parameters we need to read the file correctly.
The missing information includes, among other things:

- Sampling rate
- Number and order of channels
- Data type (continuous, event-based)
- Byte order (endianness)

In the following, we take steps to recover as much of this information as accurately as possible, so that we can read, visualize, and continue working with our EEG file.
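For the byte order, a simple heuristic can help: EEG signals change gradually from sample to sample, whereas decoding 16-bit values with the wrong byte order swaps high and low bytes and produces large, erratic jumps. This sketch assumes int16 samples (as the code further down does); the function name is hypothetical, and the result is a plausibility check rather than proof, since interleaved channels with different offsets also add jumps.

```python
import numpy as np

def guess_int16_byte_order(raw_bytes):
    """Return '<' (little-endian) or '>' (big-endian), whichever gives the smoother signal."""
    usable = len(raw_bytes) - len(raw_bytes) % 2  # int16 needs an even number of bytes
    scores = {}
    for order in ("<", ">"):
        values = np.frombuffer(raw_bytes[:usable], dtype=np.dtype(order + "i2"))
        # mean absolute step between consecutive values (float avoids int16 overflow)
        scores[order] = np.mean(np.abs(np.diff(values.astype(np.float64))))
    return min(scores, key=scores.get)

# Synthetic test signal: a slow sine stored once little-endian, once big-endian
t = np.arange(2048)
signal = (1000 * np.sin(2 * np.pi * t / 256)).astype("<i2")
print(guess_int16_byte_order(signal.tobytes()))               # prints: <
print(guess_int16_byte_order(signal.astype(">i2").tobytes()))  # prints: >
```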

In [None]:
import os
import re
import struct
import numpy as np
import mne
import matplotlib
matplotlib.use("Qt5Agg")  # interactive Qt window; requires a Qt binding such as PyQt5
import matplotlib.pyplot as plt
import csv

# ------------------------------------------------
# Parameters
# ------------------------------------------------
home = os.path.expanduser("~/Documents/CODE/playground/python-Neuro/myEEG/write_markers-events")
markers_csv = os.path.join(home, "markers.csv")
events_txt = os.path.join(home, "events.txt")
valid_markers_txt = os.path.join(home, "valid_markers.txt")
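EEG_FILE = "../demo_EEG/demofile.EEG"
# Assumed channel order (standard 10-20 names); the real order is still unknown
CHANNEL_NAMES = [
    "Fp1","Fp2","F3","F4","C3","C4","P3","P4","O1","O2",
    "F7","F8","T3","T4","T5","T6","Fz","Cz","Pz"
]
SFREQ = 256.0           # assumed sampling rate in Hz
N_BYTES_MARKERS = 8192  # bytes at the end of the file searched for markers
EPOCH_TMIN, EPOCH_TMAX = -0.1, 0.5  # epoch window in seconds (not used yet)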
# ------------------------------------------------
# Functions
# ------------------------------------------------
def detect_binary_offset(filename, min_offset=200, max_scan=20000):
    """Estimate where the ASCII header ends and the binary data begins."""
    with open(filename, "rb") as f:
        data = f.read(max_scan)

    # True for printable ASCII plus tab, LF and CR
    printable = np.array([(32 <= b <= 126 or b in (9, 10, 13)) for b in data])
    byte_values = np.frombuffer(data, dtype=np.uint8)
    window = 1024
    threshold = 0.2
    offset = None

    for i in range(min_offset, len(data) - window, 32):  # step in 32-byte increments
        frac_printable = np.mean(printable[i:i + window])
        stddev = np.std(byte_values[i:i + window])
        # Typically: text = high share of ASCII, low variance;
        # binary data = low share of ASCII, high variance
        if frac_printable < threshold and stddev > 20:
            offset = i
            break

    if offset is None:
        print("⚠️ No clear transition detected, falling back to 1024 bytes")
        offset = 1024

    print(f"➡️  Binary data probably starts at byte {offset}")
    with open(filename, "rb") as f:  # read the file passed in, not a global
        header = f.read(offset)  # header bytes only
    print("header:", header.decode(errors="ignore"))  # show the ASCII content

    return offset

def extract_markers(path, n_bytes=N_BYTES_MARKERS, sfreq=SFREQ):
    """Extract markers from the last n_bytes of the EEG file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(-min(n_bytes, size), os.SEEK_END)  # never seek before the file start
        tail = f.read()

    # marker_re = re.compile(rb"(IMPEDANZ-WERTE|Augen auf|Augen zu|HV Anfang|HV Ende)")
    # German marker texts: "Augen auf/zu" = eyes open/closed,
    # "HV Anfang/Ende" = hyperventilation start/end
    marker_re = re.compile(rb"(Augen auf|Augen zu|HV Anfang|HV Ende|IGNORED)")
    events = []
    for m in marker_re.finditer(tail):
        text = m.group(0).decode("latin1")
        start = m.start()
        if start < 8:
            continue  # not enough bytes before the text for a sample index
        # Assumption: a little-endian uint32 sample index starts 8 bytes before the text
        idx = struct.unpack("<I", tail[start - 8:start - 4])[0]
        events.append((idx / sfreq, text))
    return events

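def markers_to_events(markers, sfreq):
    """Build an MNE-compatible events array with rows [sample index, 0, event id]."""
    event_id = {}
    events = []
    for onset, desc in markers:
        if desc not in event_id:
            event_id[desc] = len(event_id) + 1
        sample_idx = int(round(onset * sfreq))  # round so float error cannot shift by one sample
        events.append([sample_idx, 0, event_id[desc]])
    # reshape keeps the (n_events, 3) shape even when no markers were found
    return np.array(events, dtype=int).reshape(-1, 3), event_id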

def check_for_nans(evoked_dict):
    """Check ERP (evoked) data for NaNs and keep only the clean conditions."""
    valid_evokeds = {}
    for cond, evoked in evoked_dict.items():
        if np.isnan(evoked.data).any():
            print(f"⚠️  {cond}: contains NaNs, skipping")
        else:
            valid_evokeds[cond] = evoked
    return valid_evokeds

# ------------------------------------------------
# Main
# ------------------------------------------------

def main():
    # --- Load raw data ---
    print("loading file:", EEG_FILE)
    offset = detect_binary_offset(EEG_FILE)
    # Samples are assumed to be int16; reading them as int64 would quarter the
    # duration and the markers would fall outside the data range
    data = np.fromfile(EEG_FILE, dtype=np.int16, offset=offset)
    print("number of values:", data.size)
    n_channels = len(CHANNEL_NAMES)
    print("data.size / n_channels:", data.size / n_channels)
    n_samples = data.size // n_channels
    print("rounded down:", n_samples)
    data = data[: n_samples * n_channels]  # drop trailing values that do not fill a complete frame
    print(len(data))
    print("index 0:", data[0])
    data = data.reshape(n_samples, n_channels).T  # 2D array: channels x samples

    # Optional: convert to physical units. 0.195 µV per bit is an example factor
    # (device-dependent); MNE expects EEG data in volts, hence the 1e-6
    data = data.astype(np.float64) * 0.195 * 1e-6
    info = mne.create_info(CHANNEL_NAMES, SFREQ, ch_types="eeg")
    montage = mne.channels.make_standard_montage("standard_1020")
    info.set_montage(montage)
    raw = mne.io.RawArray(data, info)

    # --- Extract markers ---
    markers = extract_markers(EEG_FILE, sfreq=SFREQ)
    print(markers)
    # markers = [(0.9, "Bla"), (1, "Blo"), (10, "here I am")]
    print(f"\nMarkers found: {len(markers)}")
    for onset, text in markers:
        print(f"{onset:.1f} s → {text}")

    # --- Discard markers outside the valid data range ---
    valid_markers = [(on, tx) for on, tx in markers if 0 <= on <= raw.times[-1]]
    print(f"\nMarkers within data range: {len(valid_markers)}")
    for onset, text in valid_markers:
        print(f"{onset:.3f} s → {text}")
    print(valid_markers)

    # --- Annotations ---
    annotations = mne.Annotations(
        onset=[on for on, _ in valid_markers],
        duration=[1.0] * len(valid_markers),
        description=[tx for _, tx in valid_markers]
    )
    raw.set_annotations(annotations)

    raw.plot(n_channels=len(CHANNEL_NAMES), duration=10.0, scalings="auto", block=True)


if __name__ == "__main__":
    main()
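extract_markers() relies on an assumption about the marker records at the end of the file: a little-endian 32-bit sample index starting 8 bytes before each marker text. This layout was inferred from the demo file, not taken from a format specification. The sketch below builds a hypothetical record with that layout and decodes it the same way, which makes the assumption easy to test in isolation; `parse_marker` is a hypothetical helper.

```python
import struct

SFREQ = 256.0  # assumed sampling rate, as in the cell above

def parse_marker(record, text_start, sfreq=SFREQ):
    """Onset in seconds from the uint32 index assumed to start 8 bytes before the text."""
    if text_start < 8:
        return None  # not enough bytes before the text
    (idx,) = struct.unpack("<I", record[text_start - 8:text_start - 4])
    return idx / sfreq

# Hypothetical record: 4-byte sample index, 4 unknown bytes, then the marker text
record = struct.pack("<I", 15360) + b"\x00" * 4 + b"Augen zu"
print(parse_marker(record, text_start=8))  # 60.0 (15360 / 256)
```

If the onsets found in the real file fall outside the recording, this assumed layout (or the sampling rate) is the first thing to revisit.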


This code runs end to end and plots the data, but it is for demonstration only: we are still missing an important piece of information, namely the correct order of the channels. So far we have assumed standard values (channel order, sampling rate, scaling) that do not necessarily apply to this file.
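One way to check an assumed channel order is to look at correlations: neighbouring electrodes tend to record similar signals, and frontal channels such as Fp1 and Fp2 share eye-blink artifacts. The following sketch (hypothetical helper `most_correlated_partner`, synthetic data) reports each channel's most correlated partner; with the real recording you would pass the channels × samples array built in main() together with CHANNEL_NAMES. It is a plausibility check, not a way to recover the true order on its own.

```python
import numpy as np

def most_correlated_partner(data, names):
    """Map each channel name to the channel it correlates with most strongly."""
    corr = np.corrcoef(data)         # channels x channels Pearson correlation matrix
    np.fill_diagonal(corr, -np.inf)  # a channel must not be its own partner
    return {name: names[int(np.argmax(corr[i]))] for i, name in enumerate(names)}

# Synthetic demo: "Fp1" and "Fp2" share a common component, "O1" is independent noise
rng = np.random.default_rng(0)
common = rng.standard_normal(1000)
data = np.vstack([
    common + 0.1 * rng.standard_normal(1000),
    rng.standard_normal(1000),
    common + 0.1 * rng.standard_normal(1000),
])
partners = most_correlated_partner(data, ["Fp1", "O1", "Fp2"])
print(partners["Fp1"], partners["Fp2"])  # Fp2 Fp1
```

A flat (constant) channel yields NaN correlations, so such channels should be removed before running this check.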
In another tutorial we will look for typical neurological responses to trigger events and assign them to the corresponding stimulated brain areas. Before that, we take a detour to analyze how the frequency content changes over time.
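As a preview of that detour: the change in frequency over time can be computed with a short-time Fourier transform, which cuts the signal into overlapping, windowed segments and Fourier-transforms each one. The sketch below uses only NumPy; the window length, hop size and the synthetic 10 Hz → 20 Hz test signal are assumptions for illustration (`scipy.signal.spectrogram` offers a ready-made implementation).

```python
import numpy as np

def simple_spectrogram(x, sfreq, win_len=256, hop=128):
    """STFT magnitude: rows = frequencies in Hz, columns = time frames (needs len(x) >= win_len)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)], axis=1)
    spec = np.abs(np.fft.rfft(frames, axis=0))
    freqs = np.fft.rfftfreq(win_len, d=1.0 / sfreq)
    times = (np.arange(n_frames) * hop + win_len / 2) / sfreq  # frame centres in seconds
    return freqs, times, spec

# Synthetic test: 4 s at 10 Hz followed by 4 s at 20 Hz, sampled at 256 Hz
sfreq = 256.0
t = np.arange(int(4 * sfreq)) / sfreq
x = np.concatenate([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)])
freqs, times, spec = simple_spectrogram(x, sfreq)
print(freqs[np.argmax(spec[:, 0])], freqs[np.argmax(spec[:, -1])])  # 10.0 20.0
```

With win_len equal to the sampling rate, each frequency bin is 1 Hz wide; longer windows sharpen frequency resolution at the cost of time resolution.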

To be continued
(Last updated: November 2025)