## EEG to Sound Project

![](https://rickerevolte.de/favicon.png)

This work-in-progress project aims to extract information from EEG files and use it as a modifier for sound, e.g. for emulated synthesizers.

Let's run a test on an unknown EEG file that was delivered as a raw binary with almost no supplementary information.
First, make sure you are running Python 3 and install all necessary packages via pip.
This Jupyter book is optimized for Python 3.9.2.
I recommend creating a virtual environment for this.
Need help?
https://packaging.python.org/en/latest/tutorials/installing-packages/

Now you are ready to import the necessary packages:

In [1]:
import sys, os
from pathlib import Path
import numpy as np

We will do some ASCII sniffing to obtain information from our demo file's header. Let's set the path to the file:

In [2]:
path = "../demo_EEG/demofile.EEG"

Let's check our demo file's size. We will need that size later.

In [3]:
with open(path, "rb") as f:
    data = f.read()
    print(f"{path} has the size of {len(data)} bytes")

../demo_EEG/demofile.EEG has the size of 11250464 bytes


The following code defines the function seek_ascii() with two arguments: path (string) and n (int), the number of bytes to scan for readable ASCII text.
We will call that function later.

In [4]:
def seek_ascii(path, n):
    # read only the first n bytes of the file
    with open(path, "rb") as f:
        data = f.read(n)
    printable = []
    cur = bytearray()
    for b in data:
        if 32 <= b <= 126:  # printable ASCII range (space to tilde)
            cur.append(b)
        else:
            # a non-printable byte ends the current run; keep runs of 4+ characters
            if len(cur) >= 4:
                printable.append(cur.decode("ascii"))
            cur = bytearray()
    # keep a run that extends to the end of the data
    if len(cur) >= 4:
        printable.append(cur.decode("ascii"))
    return printable, data
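
The same kind of scan can also be written more compactly with a regular expression over bytes. This is only a minimal sketch, not a replacement for seek_ascii(): the function name `find_ascii_runs`, the `min_len` parameter and the sample bytes are made up for illustration.

```python
import re

def find_ascii_runs(data: bytes, min_len: int = 4) -> list:
    """Return all runs of at least min_len printable ASCII bytes (0x20 to 0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.decode("ascii") for m in pattern.findall(data)]

# Made-up bytes: readable runs separated by non-printable bytes.
# "abc" is shorter than min_len and is therefore skipped.
sample = b"\x00\x01COHERENCE\x00\xffabc\x00Routine EEG\x02"
runs = find_ascii_runs(sample)
print(runs)
```

Like the loop version, this treats every byte outside the range 32 to 126 as a separator.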

Now we will only check the header. For this we are reading the first few bytes. You can change that amount by giving n another value.

In [5]:
n = 8192

In [6]:
# seek_ascii() opens the file itself, so no extra open() is needed here
ascii_blocks, raw_head = seek_ascii(path, n=n)
print(f"\nReadable ASCII blocks found (first {n} Bytes):")
if not ascii_blocks:
    print("  (No readable ASCII blocks found)")
else:
    for s in ascii_blocks:
        print(" -", s)


Readable ASCII blocks found (first 8192 Bytes):
 - COHERENCE
 - 27101553CB
 - Cooper
 - Dale
 - 22.02.1959M
 - Routine EEG
 - PRAXIS
 - !5 6!8!:!=!? 
 - !a b!p5f!/6N%I$J%32Q"V#U"<]'\ ]#\z,n-i,j-
 - _i+y*~+}*
 - ,.,/
 - $.$/
 - <.</
 - 4.4/3.0/
 - /o.l/k.h/g.d/c.`/
 - .|/{.x/w.t/s.p/O.L/K.H/G.D/C.@/_.\/[.X/W.T/S.P/
 - //.,/+.(/'.$/#. /?.</;.8/7.4/3.0/
 - /o.l/k.h/g.d/c.`/
 - .|/{.x/w.t/s.p/O.L/K.H/G.D/C.@/_.\/[.X/W.T/S.P/
 - }.(/
 - u. /
 - m.8/


At the moment, we can see some readable information in our demo file, such as names and dates, but not the important information we need to read our demo file correctly.
The missing information is:
- Sampling rate
- Number and order of channels
- Data type (continuous, event-based)
- Byte order (little-endian/big-endian)
- and more.
In the following, we will take steps to obtain as much accurate information as possible in order to read, visualize, and continue working with our EEG file.
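
To illustrate why the byte order matters, here is a small sketch with two made-up bytes (they are not taken from the demo file). The same two bytes give different numbers depending on which byte is treated as the most significant one.

```python
# Two raw bytes as they might appear in a file (made-up example values)
raw = bytes([0x01, 0x02])

# Interpret them as one signed 16-bit integer in both byte orders
little = int.from_bytes(raw, byteorder="little", signed=True)  # read as 0x0201
big = int.from_bytes(raw, byteorder="big", signed=True)        # read as 0x0102

print(f"little-endian: {little}, big-endian: {big}")
```

With real sample data, the wrong choice does not raise an error. It just produces values that look like noise, which is why a statistical check is useful.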

We try to draw conclusions about the number of channels from the file size, the approximate examination time, and common standard values, such as 16 or 32 bits per sample.

Our numerator is the size of our demo file.
Our denominator is then: number of channels * number of bytes per sample * sampling rate. Dividing the two gives the recording duration in seconds, ignoring the size of the header.

In [7]:
def guess_duration(filesize, n_channels=(), dtype_bytes_options=(), sfreq=256.0):
    # sfreq: assumed sampling rate in Hz (256 Hz is used as an example)
    print("\nDuration estimate (for different bytes per sample):")
    for nch in n_channels:
        for bps in dtype_bytes_options:
            seconds = filesize / (nch * bps * sfreq)
            mm = int(seconds // 60)
            ss = int(seconds % 60)
            print(f" - number of channels={nch} / bytes/sample={bps}: {seconds:.2f} s  → {mm} min {ss} s")

We assume different numbers of channels:

In [8]:
n_channels=(19,21,23)

...and different byte options:

In [9]:
dtype_bytes_options=(2,4)

Now we determine the file size and run the function guess_duration:

In [10]:
filesize = os.path.getsize(path)  # no need to read the whole file again
print(filesize)

11250464


In [11]:
guess_duration(filesize, n_channels, dtype_bytes_options)


Duration estimate (for different bytes per sample):
 - number of channels=19 / bytes/sample=2: 1156.50 s  → 19 min 16 s
 - number of channels=19 / bytes/sample=4: 578.25 s  → 9 min 38 s
 - number of channels=21 / bytes/sample=2: 1046.36 s  → 17 min 26 s
 - number of channels=21 / bytes/sample=4: 523.18 s  → 8 min 43 s
 - number of channels=23 / bytes/sample=2: 955.37 s  → 15 min 55 s
 - number of channels=23 / bytes/sample=4: 477.69 s  → 7 min 57 s
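
The estimate can also be turned around: if the approximate recording length is known, candidate layouts can be ranked by how close their implied duration comes to it. The following is a minimal sketch under stated assumptions. The function name `rank_layouts`, the target of 20 minutes and the candidate sampling rates are illustrative choices, and the header size is ignored.

```python
from itertools import product

def rank_layouts(filesize, target_seconds, channels, bytes_per_sample, sampling_rates):
    """Rank (channels, bytes/sample, rate) combinations by how close the
    implied duration is to target_seconds. The header size is ignored."""
    rows = []
    for nch, bps, fs in product(channels, bytes_per_sample, sampling_rates):
        seconds = filesize / (nch * bps * fs)
        rows.append((abs(seconds - target_seconds), nch, bps, fs, seconds))
    return sorted(rows)  # smallest deviation first

ranked = rank_layouts(
    filesize=11250464,               # file size printed earlier
    target_seconds=20 * 60,          # assumption: a recording of roughly 20 minutes
    channels=(19, 21, 23),
    bytes_per_sample=(2, 4),
    sampling_rates=(128, 256, 512),  # assumed candidate rates in Hz
)
for err, nch, bps, fs, seconds in ranked[:3]:
    print(f"channels={nch} bytes/sample={bps} rate={fs} Hz: {seconds:.1f} s (off by {err:.1f} s)")
```

Combinations with the same product of bytes per sample and sampling rate (for example 2 bytes at 256 Hz and 4 bytes at 128 Hz) give identical durations, so the file size alone cannot tell them apart.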


Fortunately, our patient remembers that the examination took roughly 20 minutes and took a selfie during it, in which we can see an EEG device with 23 channels.

The function endian_test runs some statistics on our demo file to find the most plausible byte order, little-endian or big-endian.

In [12]:
def endian_test(path, n_channels=23, dtype='int16', n_samples_to_read=5000):
    dt = np.dtype(dtype)
    dt_size = dt.itemsize
    bytes_needed = n_channels * n_samples_to_read * dt_size
    filesize = os.path.getsize(path)
    if filesize < bytes_needed:
        # file is shorter than requested: read as many whole frames as fit
        n_samples_to_read = max(1, filesize // (n_channels * dt_size))
        bytes_needed = n_channels * n_samples_to_read * dt_size
    print(f"\nendianness/statistical test: read first {n_samples_to_read} samples per channel (total {bytes_needed} Bytes).")
    with open(path, "rb") as f:
        raw = f.read(bytes_needed)  # starts at byte 0, so the header bytes are included
    # interpret the same bytes in both byte orders, using the requested dtype
    arr_le = np.frombuffer(raw, dtype=dt.newbyteorder('<'))  # little-endian
    arr_be = np.frombuffer(raw, dtype=dt.newbyteorder('>'))  # big-endian
    # try to reshape assuming interleaved samples (time major)
    for name, arr in (("little-endian", arr_le), ("big-endian", arr_be)):
        if arr.size % n_channels != 0:
            print(f"  {name}: not divisible by {n_channels} (len={arr.size})")
            continue
        arr2 = arr.reshape((-1, n_channels)).T  # shape (n_channels, n_times)
        mins = arr2.min(axis=1)
        maxs = arr2.max(axis=1)
        means = arr2.mean(axis=1)
        stds = arr2.std(axis=1)
        print(f"  {name}: samples total {arr.size}, per channel {arr2.shape[1]}")
        print(f"    channel-min (first 5): {mins[:5].tolist()}")
        print(f"    channel-max (first 5): {maxs[:5].tolist()}")
        print(f"    channel-mean (first 5): {np.round(means[:5], 2).tolist()}")
        print(f"    channel-std  (first 5): {np.round(stds[:5], 2).tolist()}")

In [13]:
endian_test(path, n_channels=23, dtype='int16', n_samples_to_read=5000)


endianness/statistical test: read first 5000 samples per channel (total 230000 Bytes).
  little-endian: samples total 115000, per channel 5000
    channel-min (first 5): [-32768, -32768, -32768, -32765, -32768]
    channel-max (first 5): [32524, 32765, 32766, 32767, 32524]
    channel-mean (first 5): [86.78, -93.25, -577.88, -295.54, -190.04]
    channel-std  (first 5): [18893.33, 19013.95, 18868.42, 18810.12, 18634.51]
  big-endian: samples total 115000, per channel 5000
    channel-min (first 5): [-32512, -32255, -32051, -32512, -32255]
    channel-max (first 5): [32716, 32000, 32257, 30068, 32558]
    channel-mean (first 5): [1331.17, 1327.24, 1326.68, 1291.23, 1296.6]
    channel-std  (first 5): [2736.18, 2756.21, 2791.56, 2794.53, 2786.64]
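
One possible way to read these statistics: values spread uniformly over the whole int16 range have a standard deviation of 65536/sqrt(12), about 18918. The little-endian values printed above are close to that number, which is what noise would look like, while the big-endian values are far smaller. This would point towards big-endian, although the header bytes are part of the data that was read. The sketch below tries this heuristic on synthetic data only, not on the demo file. The function name `guess_byte_order` and the test signal are assumptions made for illustration.

```python
import numpy as np

UNIFORM_INT16_STD = 65536 / np.sqrt(12)  # std of a uniform spread over int16, about 18918

def guess_byte_order(raw: bytes) -> str:
    """Return '<' or '>' for the int16 interpretation whose standard deviation
    is smaller relative to uniform noise (a heuristic, not a proof)."""
    scores = {}
    for order in ("<", ">"):
        arr = np.frombuffer(raw, dtype=f"{order}i2").astype(np.float64)
        scores[order] = arr.std() / UNIFORM_INT16_STD  # near 1.0 looks like noise
    return min(scores, key=scores.get)

# Synthetic single-channel signal: a 10 Hz sine wave plus a little noise,
# stored as big-endian int16 (chosen for this demo only)
rng = np.random.default_rng(0)
t = np.arange(5000) / 256.0
signal = 2000 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 50, t.size)
raw_big = signal.astype(">i2").tobytes()

print(guess_byte_order(raw_big))
```

The heuristic only works when the real signal uses a small part of the int16 range. A recording that clips often, or one with a large constant offset, could mislead it.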


We are pretty sure that the total number of channels is 19, because of the label list ["FP1","F3","C3","P3","O2","Fz","Cz","Pz","O1","F4","C4","P4","FP2","T3","T4","T5","T6","NE","A1","A2","EX1","EX2","MISC"]. NE, A1, A2, EX1, EX2 and MISC are used for referencing (mastoids) and are usually not recorded. So I'm pretty sure that this is a standard 10-20 EEG with 19 channels.
I will continue with this in the next book 02_readAndplotEEG.

Just for demonstration: the code in this section is fully working, but we are still missing an important piece of information, namely the correct order of the channels. Up to this point, we have applied standards that do not necessarily apply.
Before we look for typical neurological responses to trigger events in another tutorial, in order to assign those responses to the corresponding stimulated brain areas, we will take a detour and analyze the change in frequency over time.

To be continued
(Last updated: November 2025)