# Distribution of values for channels and sensors

An interesting question is how the values of the different channels and sensors are distributed if we ignore the temporal relationship. If these distributions are very different, this hints at a general structural difference that a classifier can likely exploit. The reverse is not necessarily true: two distributions that look very similar when time is ignored can still be very different once we put the timing back in.
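The last point can be made concrete with synthetic data (not glove data): shuffling a signal keeps its value distribution exactly, while destroying its temporal structure. This sketch uses the lag-1 autocorrelation as a simple stand-in for "timing"; the helper name `lag1_autocorrelation` is illustrative only.

```python
import numpy as np


def lag1_autocorrelation(x: np.ndarray) -> float:
    """Pearson correlation between consecutive samples, a simple temporal measure."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


rng = np.random.default_rng(0)
t = np.arange(1000)
smooth = np.sin(2 * np.pi * t / 100)   # slowly varying, strongly ordered signal
shuffled = rng.permutation(smooth)     # same values, temporal order destroyed

# Ignoring time, both signals have exactly the same value distribution ...
same_distribution = np.array_equal(np.sort(smooth), np.sort(shuffled))
# ... but consecutive samples are highly correlated only in the ordered signal.
ac_smooth = lag1_autocorrelation(smooth)
ac_shuffled = lag1_autocorrelation(shuffled)

print(same_distribution)  # True
print(round(ac_smooth, 3), round(ac_shuffled, 3))
```

A histogram of either signal would look identical, yet a classifier that sees the sequence could tell them apart immediately.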

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pickle
import numpy as np
import pandas as pd  # used by describe_value_range
import matplotlib.pyplot as plt  # used by visualise_outliers
import gestureanalysis.specific_utils as sutils
from gestureanalysis.constants import Constants
from typing import List, Callable
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest

In [3]:
base_path = "/home/jsimon/Documents/thesis/gesture-analysis/data/"
time_groups_path_corrected_pickl = base_path+"transformed/time_added/all/time-and-groups-corrected-all.pkl"
stats_added_path_pickl = base_path+"transformed/stats_added/all/raw_stats-added-all.pkl"

In [4]:
# check the working directory and adapt it if needed
import os
os.getcwd()

'/home/jsimon/Documents/thesis/gesture-analysis/scripts'

In [5]:
# in case you need to reload, and know it exists:
with open( time_groups_path_corrected_pickl, "rb" ) as users_pickle_file:
    users = pickle.load(users_pickle_file)

In [6]:
usernames = users.keys()
gestures = users['AB73']['label'][0]['data']['gesture'].unique()

## Enumerate all the channels

Let's see which channels we have. There are 63 channels with sensor data. The last columns of the dataset are an erroneous magnetometer (x, y, z) and the labels. The other channels are somewhat mixed up. The first channels are generally flex sensors, with the exception of 1_Thumb_pressure, which is a pressure sensor. The remaining pressure sensors are 12_Finger_1_pressure to 15_Finger_4_pressure, followed by two more flex sensors on the wrist. After that, each IMU on the glove contributes an accelerometer x, y, z triplet followed by a gyroscope x, y, z triplet. The list finishes with the magnetometer x, y, z.

In [7]:
cols = users['AB73']['glove_merged'].columns
cols

Index(['0_Thumb_base', '1_Thumb_pressure', '2_Angle_between_thumb_and_hand',
       '3_Finger_1_base', '4_Finger_1_tip', '5_Finger_2_base',
       '6_Finger_2_tip', '7_Finger_3_base', '8_Finger_3_tip',
       '9_Finger_4_base', '10_Finger_4_tip', '11_Thumb_tip',
       '12_Finger_1_pressure', '13_Finger_2_pressure', '14_Finger_3_pressure',
       '15_Finger_4_pressure', '16_Wrist_extension', '17_Wrist_flexion',
       '18_Finger_1_Accel_X', '19_Finger_1_Accel_Y', '20_Finger_1_Accel_Z',
       '21_Finger_1_Gyro_X', '22_Finger_1_Gyro_Y', '23_Finger_1_Gyro_Z',
       '24_Finger_2_Accel_X', '25_Finger_2_Accel_Y', '26_Finger_2_Accel_Z',
       '27_Finger_2_Gyro_X', '28_Finger_2_Gyro_Y', '29_Finger_2_Gyro_Z',
       '30_Finger_3_Accel_X', '31_Finger_3_Accel_Y', '32_Finger_3_Accel_Z',
       '33_Finger_3_Gyro_X', '34_Finger_3_Gyro_Y', '35_Finger_3_Gyro_Z',
       '36_Finger_4_Accel_X', '37_Finger_4_Accel_Y', '38_Finger_4_Accel_Z',
       '39_Finger_4_Gyro_X', '40_Finger_4_Gyro_Y', '41_Finger_

Since the channels are not well structured, I created a lookup dictionary that groups the channel indices around several concepts. It contains the indices of each individual sensor type as well as anatomical groupings, such as all the data of finger 1 or of the thumb.

In [8]:
idx_keys = Constants().raw_indices.keys()
print(idx_keys)

dict_keys(['flex', 'pressure', 'accel', 'gyro', 'magnetometer', 'lin_accel', 'thumb', 'finger_1', 'finger_2', 'finger_3', 'finger_4', 'wrist', 'palm'])
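The actual dictionary lives in `gestureanalysis.constants.Constants` and is not shown here. As a hypothetical sketch of how part of such a lookup could be derived from the column names alone, the function below assigns each channel a sensor type from its name and adds it to anatomical groups by substring. It only covers the naming seen for indices 0 to 41 (flex, pressure, accel, gyro); magnetometer, `lin_accel` and `palm` channels would need their own rules, and anything unmatched is assumed to be a flex sensor.

```python
from collections import defaultdict
from typing import Dict, List

BODY_PARTS = ("thumb", "finger_1", "finger_2", "finger_3", "finger_4", "wrist")


def build_channel_groups(columns: List[str]) -> Dict[str, List[int]]:
    """Group channel indices by sensor type and by anatomical location (sketch)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for i, name in enumerate(columns):
        lower = name.lower()
        # sensor type, derived from the name
        if "_accel_" in lower:
            groups["accel"].append(i)
        elif "_gyro_" in lower:
            groups["gyro"].append(i)
        elif lower.endswith("_pressure"):
            groups["pressure"].append(i)
        else:
            groups["flex"].append(i)  # assumption: everything else is a flex sensor
        # anatomical location, derived from substrings of the name
        for part in BODY_PARTS:
            if part in lower:
                groups[part].append(i)
    return dict(groups)
```

On the first 18 columns the `flex` group reproduces the indices listed for `raw_indices['flex']['all']` further down (0, 2 to 11, 16, 17).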


## Explore the distribution for outliers

A classic step is to explore the distribution for outliers. In our case we assume that outliers come from faulty sensors. If that is true, outliers are dangerous precisely because they are rare and easy to detect: a classifier may pick them up as features for gestures whose recordings just happen, by chance, to contain them.
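To see why a single sensor glitch can become a "feature", compare summary statistics of a recording with and without one spurious spike. The readings below are synthetic (normally distributed around a made-up value of 500) and the spike value is arbitrary; the point is only that non-robust statistics such as the standard deviation and maximum jump, while the median and interquartile range barely move.

```python
import numpy as np


def summary(x: np.ndarray) -> dict:
    """A few per-recording statistics a feature extractor might compute."""
    q1, q3 = np.percentile(x, [25, 75])
    return {"std": float(x.std()), "max": float(x.max()),
            "median": float(np.median(x)), "iqr": float(q3 - q1)}


rng = np.random.default_rng(1)
clean = rng.normal(loc=500.0, scale=10.0, size=1000)  # synthetic readings
glitched = clean.copy()
glitched[500] = 4095.0  # one spurious spike (arbitrary value)

s_clean, s_glitched = summary(clean), summary(glitched)
for key in s_clean:
    print(f"{key:>6}: clean={s_clean[key]:8.2f}  glitched={s_glitched[key]:8.2f}")
```

A gesture whose recordings happen to contain such spikes would be separable by `std` or `max` for reasons unrelated to the gesture itself.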

In [9]:
higher_percentile = 98.5
lower_percentile = 1.5
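`higher_percentile` and `lower_percentile` are passed to `sutils.collect_values` when `remove_outliers` is set; how that helper applies them is not shown in this notebook. Assuming it discards values outside the two percentiles, the trimming would look like this minimal sketch (`trim_to_percentiles` is a hypothetical name):

```python
import numpy as np


def trim_to_percentiles(values, lower: float = 1.5, higher: float = 98.5) -> np.ndarray:
    """Keep only the values between the lower and higher percentile (inclusive)."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, higher])
    mask = (values >= lo) & (values <= hi)
    return values[mask]


trimmed = trim_to_percentiles(np.arange(1001))
print(len(trimmed), trimmed.min(), trimmed.max())
```

With 1.5 and 98.5, roughly 3% of the values (1.5% per tail) are removed for continuous data. Note that this removes values even when the data contain no faulty readings at all, so it is a blunt tool compared to a dedicated outlier detector.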

In [10]:
def describe_value_range(columns, remove_outliers, show_overall):
    """Print descriptive statistics of the given columns per user and, optionally, over all users."""
    def describe_values(line, username):
        # callback handed to sutils.collect_values; receives values and username
        print('user: ', username)
        print(pd.DataFrame(data=line).describe())
        print("")
    all_vals = sutils.collect_values(usernames, users, columns, remove_outliers,
                                     higher_percentile, lower_percentile,
                                     True, describe_values, use_tqtm=True)
    if show_overall:
        print(pd.DataFrame(data=all_vals).describe())

In [11]:
def get_all_values(columns):
    # collect the raw values of the given columns for all users, without outlier removal
    all_vals = sutils.collect_values(usernames, users, columns, False,
                                     None, None, False, None, use_tqtm=True)
    return all_vals
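`sutils.collect_values` itself is not shown. As a hypothetical stand-in that makes the pooling explicit, the sketch below flattens the selected columns of each user's `glove_merged` frame (the structure used in cell 7) into one array. The skip rule (users without a `glove_merged` frame) is an assumption; why `AE30` is skipped in the output further down is not visible here.

```python
from typing import Dict, Iterable

import numpy as np
import pandas as pd


def collect_channel_values(users: Dict[str, dict], columns: Iterable[str]) -> np.ndarray:
    """Pool all values of the given columns over all users into one flat array."""
    columns = list(columns)
    chunks = []
    for username, data in users.items():
        df = data.get("glove_merged")
        if not isinstance(df, pd.DataFrame):
            print(f"skipping {username}")  # assumption: no merged glove data
            continue
        # row-major flattening: channels of one time step end up next to each other
        chunks.append(df[columns].to_numpy(dtype=float).ravel())
    return np.concatenate(chunks) if chunks else np.empty(0)
```

Pooling all channels into one flat array discards which channel a value came from, which matters when the channels have very different value ranges.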

In [12]:
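def find_outliers_(X):
    """Label each row of X as inlier (1) or outlier (-1) using an Isolation Forest."""
    print('.... start finding outliers ....')
    # alternative detector, left commented out:
    #clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1, n_jobs=3)
    clf = IsolationForest(max_samples=100, n_jobs=3)
    print('.... fit ....')
    clf.fit(X)
    print('.... predict ....')
    y_pred = clf.predict(X)
    print('.... done ....')
    # LOF exposes scores as clf.negative_outlier_factor_;
    # for IsolationForest the equivalent is clf.score_samples(X) (lower = more abnormal)
    return y_pred

def visualise_outliers(X, y_pred, X_scores):
    """Plot the values in X against their sample index and circle them by outlier score.

    X_scores follows the scikit-learn convention: lower scores are more abnormal.
    """
    values = np.asarray(X).ravel()
    X_scores = np.asarray(X_scores).ravel()
    idx = np.arange(len(values))
    n_outliers = int(np.sum(np.asarray(y_pred) == -1))
    plt.title("Outlier scores")
    plt.scatter(idx, values, color='k', s=3., label='Data points')
    # circle radius grows as the score becomes more abnormal (normalised to [0, 1])
    radius = (X_scores.max() - X_scores) / (X_scores.max() - X_scores.min())
    plt.scatter(idx, values, s=1000 * radius, edgecolors='r',
                facecolors='none', label='Outlier scores')
    plt.axis('tight')
    plt.xlabel("sample index (flagged as outliers: %d)" % n_outliers)
    plt.ylabel("sensor value")
    legend = plt.legend(loc='upper left')
    # matplotlib >= 3.7 names this attribute legend_handles (formerly legendHandles)
    handles = getattr(legend, 'legend_handles', None) or legend.legendHandles
    handles[0].set_sizes([10])
    handles[1].set_sizes([20])
    plt.show()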

In [13]:
all_flex = Constants().raw_indices['flex']['all']
print(cols[all_flex])

Index(['0_Thumb_base', '2_Angle_between_thumb_and_hand', '3_Finger_1_base',
       '4_Finger_1_tip', '5_Finger_2_base', '6_Finger_2_tip',
       '7_Finger_3_base', '8_Finger_3_tip', '9_Finger_4_base',
       '10_Finger_4_tip', '11_Thumb_tip', '16_Wrist_extension',
       '17_Wrist_flexion'],
      dtype='object')


In [14]:
X = get_all_values(cols[all_flex])


skipping userAE30



In [15]:
X = np.array(X).reshape(-1, 1)

In [16]:
y_pred = find_outliers_(X)

.... start finding outliers ....
.... fit ....




MemoryError: 
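The `MemoryError` is raised after `.... fit ....` is printed, i.e. while fitting on the full pooled array. The exact cause is not visible from this output. One option, sketched here under the assumption that memory use grows with the number of rows processed at once, is to fit on a random subsample (each tree of the forest only draws `max_samples=100` rows anyway) and to predict in chunks. `find_outliers_chunked`, `fit_size` and `chunk_size` are hypothetical names; the scikit-learn calls are the standard `IsolationForest` API.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def find_outliers_chunked(X, fit_size=100_000, chunk_size=1_000_000, random_state=0):
    """Fit an Isolation Forest on a random subsample of X, then label X chunk by chunk."""
    X = np.asarray(X, dtype=np.float32).reshape(len(X), -1)  # 1-D input -> (n, 1)
    rng = np.random.default_rng(random_state)
    n_fit = min(fit_size, len(X))
    fit_rows = rng.choice(len(X), size=n_fit, replace=False)
    clf = IsolationForest(max_samples=100, random_state=random_state)
    clf.fit(X[fit_rows])
    labels = np.empty(len(X), dtype=int)
    # predict is row-independent, so chunking does not change the result
    for start in range(0, len(X), chunk_size):
        stop = start + chunk_size
        labels[start:stop] = clf.predict(X[start:stop])
    return labels
```

Fixing `random_state` also makes the labels reproducible between runs, which the original `find_outliers_` does not guarantee.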

In [None]:
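# find_outliers_ returns only labels, so fit one model here to get labels and scores together
clf = IsolationForest(max_samples=100, n_jobs=3).fit(X)
y_pred, X_scores = clf.predict(X), clf.score_samples(X)
visualise_outliers(X, y_pred, X_scores)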

In [None]:
test = None