## Imports
To visualize the UCI Epileptic Seizure dataset, we will use Bokeh, NumPy, and pandas. Because we want the plots rendered inline in this notebook, we call output_notebook() rather than output_file(), which would write them to a standalone HTML file.

In [4]:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_notebook, show
import bokeh.colors
output_notebook()

## Data Processing
With the imports in place, we load the CSV, drop the leftover index column, and split the features (X) from the class labels (Y) so the data is ready to plot.

In [2]:
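dataset = pd.read_csv('seizure_data.csv')  # load the seizure dataset
dataset = dataset.drop(columns=['Unnamed: 0'])  # drop the leftover index column (axis is implied by columns=)
X = dataset.drop(columns=['y'])  # EEG readings, one recording per row
Y = dataset['y']  # class label (1-5) for each row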
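
As a quick sanity check on the cleaning step, the sketch below applies the same two drop calls to a tiny synthetic frame. The feature columns `f1` to `f3` and all values are made up for illustration; only the `Unnamed: 0` and `y` column names come from the cell above.

```python
import pandas as pd

# Hypothetical stand-in for seizure_data.csv: an index-like column,
# three invented feature columns, and the class label y.
toy = pd.DataFrame({
    'Unnamed: 0': ['a', 'b', 'c', 'd'],
    'f1': [10, -3, 7, 0],
    'f2': [12, -5, 9, 1],
    'f3': [11, -4, 8, 2],
    'y': [1, 2, 1, 5],
})

toy = toy.drop(columns=['Unnamed: 0'])  # remove the leftover index column
toy_X = toy.drop(columns=['y'])         # features only
toy_Y = toy['y']                        # labels only

print(toy_X.shape)                        # (rows, feature columns)
print(toy_Y.value_counts().sort_index())  # number of rows per class
```

Running the same two print statements on X and Y is a cheap way to confirm that no stray columns survived and to see how many rows each class contributes.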

## Visualizing
For a quick first look, we can plot the first 20 rows of the dataset, drawing each row as a line colored by its class:

In [32]:
colormap = {1: 'magenta', 2: 'mediumpurple', 3: 'blueviolet', 4: 'darkmagenta', 5: 'darkslateblue'}
colors = [colormap[x] for x in Y]

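plot = figure(title="EEG Values over Time (first 20 rows)",
              x_axis_label='Timestamps (178 total)', y_axis_label='EEG Value',
              x_range=(0, 1), y_range=(-1800, 1000))
timestamps = np.linspace(0, 1, 178)  # normalized time axis shared by every row
for index, row in X.head(20).iterrows():
    # legend_label replaces the legend keyword, which was deprecated in Bokeh 1.4
    plot.line(timestamps, row, line_color=colors[index], line_width=2, legend_label=str(Y[index]))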

show(plot)

As the color legend shows, only class 1 is visually distinguishable. Class 1 corresponds to having a seizure, so its EEG values are expected to fluctuate more than those of the other classes. The remaining classes cannot be told apart at a glance.
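
The "fluctuates more" observation can also be put into numbers. The following is a minimal sketch on synthetic data, assuming that per-row standard deviation is an acceptable proxy for fluctuation; the amplitudes (400 for class 1, 40 for the rest) are invented and are not measurements from the dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for X and Y: class 1 rows get a much larger amplitude
# than the others. These numbers are invented, not taken from the dataset.
labels = pd.Series([1, 1, 2, 3, 4, 5])
scales = labels.map({1: 400.0, 2: 40.0, 3: 40.0, 4: 40.0, 5: 40.0})
signals = pd.DataFrame(rng.normal(size=(6, 178))).mul(scales, axis=0)

# Standard deviation along each row measures how much that recording fluctuates;
# averaging those values by class gives one spread figure per class.
row_std = signals.std(axis=1)
spread_by_class = row_std.groupby(labels).mean()
print(spread_by_class)
```

On the real data, the equivalent expression would be `X.std(axis=1).groupby(Y).mean()`.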

To investigate further, we can draw the average line for each class, which should reveal any obvious differences between them.
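
One compact way to sketch this per-class averaging, separate from the loop-based cell that follows, is a pandas groupby. The frame below is a toy stand-in with three timestamps instead of 178; the names `demo_X`, `demo_Y`, and the values are hypothetical.

```python
import pandas as pd

# Toy stand-in for X (three timestamps instead of 178) and Y; values are invented.
demo_X = pd.DataFrame(
    [[100, -200, 300],
     [300, -400, 100],
     [10, 20, 30],
     [30, 40, 50]],
    columns=['t0', 't1', 't2'],
)
demo_Y = pd.Series([1, 1, 2, 2])

# Group the rows by class label and average each timestamp column,
# giving one "average line" per class.
class_means = demo_X.groupby(demo_Y).mean()
print(class_means)
```

Each row of `class_means` is the average line for one class, so it could be passed to `plot.line` in the same way as a single recording.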

In [44]:
# one running sum per class; an explicit dtype avoids relying on the default for an empty Series
sum_series = {label: pd.Series(dtype=float) for label in range(1, 6)}
for index, row in X.head(100).iterrows():
    sum_series[Y[index]] = sum_series[Y[index]].add(row, fill_value=0)