<a href="https://colab.research.google.com/github/jmoranrun/HAR_Dist_ML/blob/main/HAR_Data_Class.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Notebook containing the Har_Data Class
This class is used to manipulate the UCI HAR dataset
(Human Activity Recognition Using Smartphones data set, https://archive.ics.uci.edu/ml/machine-learning-databases/00240).

Different subjects (users) can be assigned to the training and test datasets.
The class constructor takes: <br>
 (1) **har_dataset** <br>

The har_dataset is a 3-D NumPy array of shape [num_of_samples, num_of_timesamples, num_of_feature], where: <br>
num_of_samples is the total number of recorded activities, <br>
num_of_timesamples is the number of time samples in each recorded activity, <br>
num_of_feature is the number of features recorded by the smartphones. <br>
This dataset drops the original assignment of subjects to the training and test sets, so that the split can be redefined for each experiment. <br>

 (2) **y**: a vector containing the classification of each sample <br>

 (3) **sub_map**: a vector mapping each sample to a particular subject <br>

 (4) **test_users**: a list containing the subjects assigned to be test users <br>

 (5) **train_user**: a list containing the subjects assigned to be training users <br>
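
As a rough illustration of the inputs listed above, the following sketch builds small synthetic arrays with the expected shapes and picks out the samples of one subject. The sizes, subject ids and random values are made up for the example and are not taken from the UCI HAR data.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the example small.
num_of_samples, num_of_timesamples, num_of_feature = 12, 128, 9

rng = np.random.default_rng(0)
har_dataset = rng.normal(size=(num_of_samples, num_of_timesamples, num_of_feature))
y = rng.integers(0, 6, size=num_of_samples)   # one activity label per sample
sub_map = np.repeat([1, 2, 3], 4)             # subject id for each sample

test_users = [3]       # subjects held out for testing
train_user = [1, 2]    # subjects used for training

# All samples recorded for subject 3
subject_samples = har_dataset[sub_map == 3]
print(subject_samples.shape)  # (4, 128, 9)
```

The first axis of `har_dataset`, `y` and `sub_map` must line up, since all three are indexed by sample.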



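The method **select_user_val** moves a fraction of each test user's samples into the training dataset. The fraction is controlled by the parameter *percent_mix*, given as a value between 0 and 1 (for example, 0.2 moves 20%). Samples are drawn separately from each test subject, so every test subject has some samples moved, provided that subject's sample count multiplied by *percent_mix* rounds down to at least one.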
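
The per-subject sampling described above can be sketched in isolation. The function below is a simplified stand-in written for this note, not the class method itself: `pick_indices_to_move` and its `seed` argument are hypothetical names, and it only reports which sample indices would move rather than rebuilding the arrays.

```python
import random
import numpy as np

def pick_indices_to_move(sub_map, test_users, percent_mix, seed=None):
    """Return the indices of test-user samples to move into training.

    percent_mix is a fraction in [0, 1], applied to each test user separately.
    """
    rng = random.Random(seed)
    sub_map = np.asarray(sub_map)
    moved = []
    for user in test_users:
        user_idx = np.flatnonzero(sub_map == user)      # this user's samples
        num_take = int(len(user_idx) * percent_mix)     # rounds down
        moved.extend(rng.sample(user_idx.tolist(), num_take))
    return sorted(moved)

sub_map = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3])
moved = pick_indices_to_move(sub_map, test_users=[2, 3], percent_mix=0.5, seed=0)
print(moved)
```

Here subject 2 has five samples and subject 3 has three, so two indices and one index are selected respectively, while subject 1 is untouched.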

In [None]:
import numpy as np
import random 

class Har_Data:
  """Holds the UCI HAR samples, labels, subject map and the train/test subject lists."""
  def __init__(self, har_dataset, y, sub_map, test_users, train_user):
    self.har_dataset = har_dataset
    self.y = y
    self.sub_map = sub_map
    self.test_users = test_users
    self.train_user = train_user

#######################################################################
# Function to move selected test samples to the training dataset.
# It takes many parameters, but it is driven by a higher-level
# function (select_user_val), which has a more user-friendly parameter list.
####################################################################### 
  def move_test_to_train(self, move_samples, move_sub_map, move_ys, move_idx, har_dataset_user_test, har_submap_user_test, har_dataset_user_train, har_submap_user_train, har_y_user_test, har_y_user_train):
    har_dataset_user_test = np.asarray(har_dataset_user_test)
    har_submap_user_test = np.asarray(har_submap_user_test)
    har_dataset_user_train = np.asarray(har_dataset_user_train)
    har_submap_user_train = np.asarray(har_submap_user_train)
    har_y_user_test = np.asarray(har_y_user_test)
    har_y_user_train = np.asarray(har_y_user_train)
    move_samples = np.asarray(move_samples)
    move_sub_map = np.asarray(move_sub_map)
    move_ys = np.asarray(move_ys)

    ## Delete the samples to be moved from the test dataset
    har_dataset_user_test = np.delete(har_dataset_user_test, move_idx, axis=0)
    har_submap_user_test = np.delete(har_submap_user_test, move_idx, axis=0)
    har_y_user_test = np.delete(har_y_user_test, move_idx, axis=0)

    ## Add the deleted samples to the training dataset
    har_dataset_user_train = np.concatenate((har_dataset_user_train, move_samples))
    har_submap_user_train = np.concatenate((har_submap_user_train, move_sub_map))
    har_y_user_train = np.concatenate((har_y_user_train, move_ys))
    return  har_dataset_user_test, har_submap_user_test, har_y_user_test, har_dataset_user_train, har_submap_user_train, har_y_user_train

#############################################################################################
# Function to create a user dataset
# har_dataset is a 3-D np array of form: [num_of_samples, num_of_timesamples, num_of_feature]
#             Where num_of_samples is the total number of recorded activities
#                   num_of_timesamples is the number of time samples in each recorded activity
#                   num_of_feature is the number of features recorded using the smartphones.
# sub_map maps subjects (people) to each har_dataset sample
# test_users lists which subjects are assigned to the test dataset
# train_user lists which subjects are assigned to the training dataset
# percent_mix is the fraction (0 to 1) of each test user's samples to move into the training dataset
#############################################################################################
  def select_user_val(self, percent_mix):
    # Generate the default test dataset
    har_dataset_user_test=[]
    har_submap_user_test=[]
    har_y_user_test=[]
    for user in self.test_users:
      har_dataset_user_test.extend(self.har_dataset[tuple(np.where(self.sub_map==user))].tolist()) 
      har_submap_user_test.extend(self.sub_map[np.where(self.sub_map==user)].tolist())
      har_y_user_test.extend(self.y[tuple(np.where(self.sub_map==user))].tolist()) 
    # Generate the default training dataset 
    har_dataset_user_train=[]
    har_submap_user_train=[]
    har_y_user_train=[]
    for user in self.train_user:
      har_dataset_user_train.extend(self.har_dataset[tuple(np.where(self.sub_map==user))].tolist())
      har_submap_user_train.extend(self.sub_map[np.where(self.sub_map==user)].tolist())
      har_y_user_train.extend(self.y[tuple(np.where(self.sub_map==user))].tolist()) 
 
    ## Now allow a fraction of each test user's samples to enter the training dataset
    ## Make sure that the fraction is taken from each test user
    for user in self.test_users:
      har_user_sub_map=np.where(np.asarray(har_submap_user_test)==user)[0]
      har_user_sub_map_cnt=np.count_nonzero(np.asarray(har_submap_user_test)==user)
      num_take_sub_map = int(har_user_sub_map_cnt*percent_mix)
      if num_take_sub_map > 0:
        move_idx=random.sample(list(har_user_sub_map), num_take_sub_map)
        move_samples=[har_dataset_user_test[i] for i in move_idx]  
        move_sub_map=[har_submap_user_test[i] for i in move_idx]   
        move_ys=[har_y_user_test[i] for i in move_idx]   
        har_dataset_user_test, har_submap_user_test, har_y_user_test, har_dataset_user_train, har_submap_user_train, har_y_user_train \
          = self.move_test_to_train(move_samples, move_sub_map, move_ys, move_idx, har_dataset_user_test, har_submap_user_test, har_dataset_user_train, har_submap_user_train, har_y_user_test, har_y_user_train)
    return  np.asarray(har_dataset_user_test), np.asarray(har_submap_user_test), np.asarray(har_y_user_test), np.asarray(har_dataset_user_train), np.asarray(har_submap_user_train), np.asarray(har_y_user_train)
 