In [2]:
import os
# Uncomment these lines to clear the HTTP/HTTPS proxy settings if your network setup requires it
#os.environ['http_proxy'] = ''
#os.environ['https_proxy'] = ''

In [None]:
import pandas as pd
from sensiml import SensiML

# Connect to SensiML and select the project and pipeline to work with
client = SensiML()
client.project = 'Model Building Demo'
client.pipeline = 'feature_vector_to_model'

### Prepare the feature vectors and upload the data to SensiML server

This data set consists of many reps. Each row is a feature vector of 128 features, accompanied by a label column containing the ground truth and a column holding the rep number. We drop the rep number column because it should not be used as a feature. Note: all feature values must be integers scaled between 0 and 254.

The model generator requires the ground truth column to be named "label", so we rename our ground truth column accordingly.
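The notebook does not say how `scaled_reducedFeatures_128.csv` was scaled. One common way to meet the 0 to 254 integer requirement is per-feature min-max scaling; the sketch below assumes that approach, and `scale_feature_columns` is a hypothetical helper, not part of the SensiML API.

```python
def scale_feature_columns(rows, max_value=254):
    """Min-max scale each feature column to integers in [0, max_value].

    Hypothetical helper: `rows` is a list of equal-length lists of numbers,
    one list per feature vector. Constant columns are mapped to 0.
    """
    n_features = len(rows[0])
    mins = [min(row[j] for row in rows) for j in range(n_features)]
    maxs = [max(row[j] for row in rows) for j in range(n_features)]

    scaled = []
    for row in rows:
        new_row = []
        for j, value in enumerate(row):
            span = maxs[j] - mins[j]
            if span == 0:
                new_row.append(0)  # no variation in this feature
            else:
                # Map [min, max] linearly onto [0, max_value] and round to an int
                new_row.append(round((value - mins[j]) / span * max_value))
        scaled.append(new_row)
    return scaled


raw = [[0.5, 10.0, 3.0],
       [1.5, 20.0, 3.0],
       [1.0, 30.0, 3.0]]
print(scale_feature_columns(raw))  # [[0, 0, 0], [254, 127, 0], [127, 254, 0]]
```

If you scale this way, keep the per-feature minimum and maximum computed on the training data and apply the same transform to new data at inference time; otherwise the features the model sees later will not match the ones it was trained on.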

In [4]:
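df = pd.read_csv('Support/scaled_reducedFeatures_128.csv')
# Rename column '0' (the ground truth) to 'label' and drop columns '1' and '130', which are not features
df = df.rename(columns={'0': 'label'}).drop(['1', '130'], axis=1)

# Upload the cleaned data frame to the SensiML server as a feature file
client.upload_dataframe('feature_vectors_128.csv', df)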

Uploading file "feature_vectors_128.csv" to KB Cloud.
Upload of file "feature_vectors_128.csv"  to KB Cloud completed.


<sensiml.client.SensiML at 0x253670453c8>

### Reset the pipeline and set the input data to be the file we just uploaded

In [5]:
client.pipeline.reset()
client.pipeline.set_input_data('feature_vectors_128.csv', label_column='label')

Group columns must be a list of strings.


### Model TVO description

* train_validate_optimize (TVO): This step defines the model validation methodology, the classification method, and the training algorithm used to train the classifier.

A TVO step is composed of:
* Classifier
* Training algorithm
* Validation method

For this pipeline we use the "Stratified K-Fold Cross-Validation" validation method, which splits the data into 5 folds, each with roughly the same class proportions as the full data set. It trains on 4 folds and tests on the held-out fold, repeating until every fold has served once as the test set. The training algorithm, Hierarchical Clustering with Neuron Optimization, uses a clustering algorithm to optimize neuron placement in feature space. After the model has been trained, the neurons are loaded into the classifier (set as `PME` in the code below and reported as PVP in the execution log), and the model is validated using the selected validation method.
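To make the validation scheme concrete, here is a minimal pure-Python sketch of stratified k-fold splitting. It illustrates the technique and is not SensiML's implementation: the round-robin dealing of indices and the `seed` parameter (loosely analogous to the `validation_seed` passed to `set_tvo`) are assumptions.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs with class-balanced folds."""
    rng = random.Random(seed)

    # Group sample indices by class label and shuffle within each class.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    for indices in by_class.values():
        rng.shuffle(indices)

    # Deal each class's indices round-robin across the folds so every fold
    # keeps roughly the same class proportions as the full data set.
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1

    # Each fold serves once as the test set; the remaining k - 1 folds train.
    splits = []
    for test_fold in range(k):
        test = sorted(folds[test_fold])
        train = sorted(i for f, fold in enumerate(folds) if f != test_fold for i in fold)
        splits.append((train, test))
    return splits


labels = ["a"] * 10 + ["b"] * 5
for fold, (train, test) in enumerate(stratified_k_fold(labels, k=5)):
    print(fold, len(train), [labels[i] for i in test])
```

With 10 samples of class `a` and 5 of class `b`, every test fold gets two `a` samples and one `b`, mirroring the 2:1 class ratio of the whole set, and each model trains on the other 12 samples.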

In [9]:
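# Validation: 5-fold stratified cross-validation
client.pipeline.set_validation_method('Stratified K-Fold Cross-Validation', params={'number_of_folds': 5})

# Classifier: PME in RBF mode using L1 (Manhattan) distance
client.pipeline.set_classifier('PME', params={'classification_mode': 'RBF', 'distance_mode': 'L1'})

# Training algorithm: hierarchical clustering (DLHC) with 7 neurons
client.pipeline.set_training_algorithm('Hierarchical Clustering with Neuron Optimization',
                                       params={'number_of_neurons': 7,
                                               'cluster_method': 'DLHC'})

# Use a fixed validation seed (0) for reproducible validation runs
client.pipeline.set_tvo({'validation_seed': 0})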
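The cell above configures a neuron-based classifier (`classification_mode` RBF, `distance_mode` L1) trained by hierarchical clustering with `number_of_neurons` 7. As a rough mental model only, the toy sketch below clusters each class with centroid-linkage agglomerative clustering under L1 distance, turns each centroid into a "neuron" with an influence field, and classifies by the closest neuron whose field contains the input. It is not SensiML's PME or DLHC implementation: the per-class neuron budget (`neurons_per_class`), the influence-field rule (`aif`), and all function names are hypothetical, and the notebook does not say whether `number_of_neurons` is a total or per-class count.

```python
def l1(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def cluster_class(points, n_clusters):
    """Centroid-linkage agglomerative clustering using L1 distance."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        centers = [centroid(c) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = l1(centers[i], centers[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge the two closest clusters
        del clusters[j]
    return [centroid(c) for c in clusters]


def train_neurons(vectors, labels, neurons_per_class):
    by_class = {}
    for v, lab in zip(vectors, labels):
        by_class.setdefault(lab, []).append(v)
    neurons = []
    for lab, points in by_class.items():
        for center in cluster_class(points, min(neurons_per_class, len(points))):
            neurons.append({"center": center, "label": lab})
    # Hypothetical influence field: L1 distance to the nearest neuron of another class.
    for n in neurons:
        others = [l1(n["center"], m["center"]) for m in neurons if m["label"] != n["label"]]
        n["aif"] = min(others) if others else float("inf")
    return neurons


def classify(neurons, vector):
    """Return the label of the closest neuron whose field contains the vector."""
    firing = [(l1(vector, n["center"]), n["label"]) for n in neurons
              if l1(vector, n["center"]) < n["aif"]]
    if not firing:
        return None  # no neuron fired: the input is treated as unknown
    return min(firing)[1]


X = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
y = ["idle", "idle", "idle", "wave", "wave", "wave"]
neurons = train_neurons(X, y, neurons_per_class=1)
print(classify(neurons, [1, 1]))      # idle
print(classify(neurons, [9, 9]))      # wave
print(classify(neurons, [100, 100]))  # None
```

The `None` result for a far-away input shows one property of RBF-style classifiers: an input outside every neuron's field can be reported as unknown instead of being forced into a class.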

Execute the pipeline on KB Cloud.

In [10]:
results, stats = client.pipeline.execute()

Executing Pipeline with Steps:

------------------------------------------------------------------------
 0.     Name: feature_vectors_128.csv   		Type: featurefile              
------------------------------------------------------------------------
------------------------------------------------------------------------
 1.     Name: tvo                       		Type: tvo                      
------------------------------------------------------------------------
	Classifier: PVP
		Param: classification_mode: RBF
		Param: distance_mode: L1
		Param: reinforcement_learning: False
		Param: reserved_patterns: 0

	Training Algo: Hierarchical Clustering with Neuron Optimization
		Param: cluster_method: DLHC
		Param: number_of_neurons: 7

	Validation Method: Stratified K-Fold Cross-Validation
		Param: number_of_folds: 5

------------------------------------------------------------------------


Checking Pipeline Status:


Status: Running  Time:   5 sec.  Step:  1  Name: tvo              T

In [11]:
results.summarize()

TRAINING ALGORITHM: Hierarchical Clustering with Neuron Optimization
VALIDATION METHOD:  Stratified K-Fold Cross-Validation
CLASSIFIER:         PVP

AVERAGE METRICS:
F1-SCORE:    70.3   sigma 3.18
SENSITIVITY: 70.6   sigma 3.51
PRECISION:   87.1   sigma 2.08

--------------------------------------

STRATIFIED K-FOLD CROSS-VALIDATION MODEL RESULTS

MODEL INDEX: Fold 4
ACCURACY: 54.55
NEURONS: 7

MODEL INDEX: Fold 2
ACCURACY: 49.09
NEURONS: 7

MODEL INDEX: Fold 3
ACCURACY: 40.00
NEURONS: 7

MODEL INDEX: Fold 0
ACCURACY: 40.00
NEURONS: 7

MODEL INDEX: Fold 1
ACCURACY: 38.18
NEURONS: 7
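The summary above averages F1-score, sensitivity, and precision across folds, while the per-fold section lists only accuracy. The short sketch below aggregates those per-fold accuracies (copied from the output) into a mean and spread, which the report does not print directly. The mean is 44.364%. The report does not state whether its sigma is a population or a sample standard deviation, so both are computed here.

```python
import statistics

# Per-fold accuracies copied from the results.summarize() output above.
fold_accuracy = {0: 40.00, 1: 38.18, 2: 49.09, 3: 40.00, 4: 54.55}

values = list(fold_accuracy.values())
mean_acc = statistics.mean(values)
pop_sigma = statistics.pstdev(values)    # divides by n
sample_sigma = statistics.stdev(values)  # divides by n - 1

print(f"mean accuracy: {mean_acc:.2f}")
print(f"sigma (population): {pop_sigma:.2f}, sigma (sample): {sample_sigma:.2f}")

best_fold = max(fold_accuracy, key=fold_accuracy.get)
print(f"best fold: {best_fold} ({fold_accuracy[best_fold]:.2f})")
```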

