<a href="https://colab.research.google.com/github/paulovictorcorreia/anomaly-detection-sax/blob/main/Gridsearch_Optimization_in_Unknown_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1 Introdução

Neste notebook, realizaremos a programação de uma classe que realiza a busca por parâmetros ótimos com dados de uma série temporal comprimidas, com o objetivo de ajustar os *k-folds* de acordo com os parâmetros de compressão, alterando o tamanho das *features* e do *target* para realizar a validação cruzada adequadamente para diferentes tamanhos de validação cruzada, permitindo que nós tenhamos uma capacidade melhor de identificar o desempenho real do modelo, através da análise das estatística de cada validação.

Inicialmente, neste notebook, importaremos os pacotes e classes necessárias para a implementação do modelo de detecção de anomalias proposto, para em seguida importar os dados. Logo depois, realizaremos a implementação da classe customizada de gridsearch, seguida de um rápido teste para verificar se está ou não funcionando corretamente. Finalmente, aplicaremos o gridsearch para diversos parâmetros da mesma forma que foi feita no CBA 2020, com o objetivo de identificar com maior confiança qual modelo desempenhou melhor.

O detector de anomalias utilizado é uma rede neural com LSTM utilizando o *pytorch*.

Ao final, pretendemos plotar gráficos de barras paralelas para plotar o desempenho de diferentes parâmetros de modelos.

## 1.1 Imporanto as classes e os pacotes necessários 

In [1]:
!pip install pyts skorch kaleido psutil
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(style="whitegrid")
import pandas as pd
import plotly.io as pio

from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, ClassifierMixin
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, recall_score, jaccard_score, roc_auc_score, precision_score, f1_score, make_scorer, roc_auc_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_validate
from sklearn.model_selection import ParameterGrid
from sklearn.model_selection import StratifiedKFold
from pyts.approximation import PiecewiseAggregateApproximation, SymbolicAggregateApproximation
from scipy.ndimage.interpolation import shift

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as Data
import torchvision.transforms as transforms
from skorch import NeuralNetClassifier

Collecting pyts
[?25l  Downloading https://files.pythonhosted.org/packages/b6/2b/1a62c0d32b40ee85daa8f6a6160828537b3d846c9fe93253b38846c6ec1f/pyts-0.11.0-py3-none-any.whl (2.5MB)
[K     |████████████████████████████████| 2.5MB 10.1MB/s 
[?25hCollecting skorch
[?25l  Downloading https://files.pythonhosted.org/packages/18/c7/2f6434f9360c91a4bf14ae85f634758e5dacd3539cca4266a60be9f881ae/skorch-0.9.0-py3-none-any.whl (125kB)
[K     |████████████████████████████████| 133kB 37.8MB/s 
[?25hCollecting kaleido
[?25l  Downloading https://files.pythonhosted.org/packages/83/39/1c960a971f8d27e6aa4f1462dc048275c98bbb46f16871269ba941bb4a04/kaleido-0.1.0-py2.py3-none-manylinux1_x86_64.whl (74.6MB)
[K     |████████████████████████████████| 74.6MB 41kB/s 
Installing collected packages: pyts, skorch, kaleido
Successfully installed kaleido-0.1.0 pyts-0.11.0 skorch-0.9.0


In [None]:
class FeatureSelector( BaseEstimator, TransformerMixin ):
    #Class Constructor 
    def __init__( self, feature_names ):
        self.feature_names = feature_names 
    
    #Return self nothing else to do here    
    def fit( self, X, y = None ):
        return self 
    
    #Method that describes what we need this transformer to do
    def transform( self, X, y = None ):
        return X[ self.feature_names ] 

class SaxTransformer(BaseEstimator, TransformerMixin):
    # Class constructor
    def __init__(self, alphabet_size=8, window_size=100):
        self.alphabet_size = alphabet_size
        self.window_size = window_size
        self.alphabet = "abcdefghijklmnopqrstuvwxyz"
        self.alphabet = self.alphabet[:alphabet_size]
    # Return Self, nothing else to do here
    def fit(self, X, y=None):
        return self 
    # Method that describes what this method needs to do
    def transform(self, X, y=None):
        words = []
        num_cols = X.shape[1]
        for col in range(num_cols):
            paa_transformer = PiecewiseAggregateApproximation(
                window_size=self.window_size)
            data_paa = paa_transformer.transform(X[:, col].reshape(1, -1))
            sax_transformer = SymbolicAggregateApproximation(
                n_bins=self.alphabet_size, 
                strategy="normal")
            word = sax_transformer.transform(data_paa)
            words.append(word)
        words_att = []
        for i in range(num_cols):
            words_att.append(words[i][0])
        output = pd.DataFrame(words_att).transpose()
        output = self.symbol2num(output)
        return output
    
    
    def symbol2num(self, X):
        '''
        Convert SAX symbols to ordered numbers from 0 up to 1.
        '''
        num_rows = X.shape[0]
        num_cols = X.shape[1]
        values_consult_table = {
            k: v for v, k in enumerate(sorted(set(self.alphabet)))
            } 
        numbers_from_char = np.empty((num_rows, num_cols))
        for i in range(num_cols):
            for j in range(num_rows):
                numbers_from_char[j, i] = values_consult_table[X.iloc[j, i]]
        numbers_from_char = np.array(numbers_from_char)
        numbers_from_char = np.array(
            list(map(lambda x: x/(self.alphabet_size - 1), numbers_from_char))
            )
        return numbers_from_char

class GeradorAtrasos(BaseEstimator, TransformerMixin):
    #Class Constructor 
    def __init__( self, delays=1):
        self.delays = delays
    
    #Return self nothing else to do here    
    def fit( self, X, y = None ):
        return self 
    
    #Method that describes what we need this transformer to do
    def transform( self, X, y = None ):
        num_cols = X.shape[1]
        num_rows = X.shape[0]
        num_delays = self.delays + 1
        # delays_array = np.empty((num_rows, num_cols))
        delays = []
        for i in range(num_cols):
            for j in range(num_delays):
                delays.append(shift(X[:, i], 1, cval=np.nan))
        delays = np.array(delays).transpose()
        delays = pd.DataFrame(delays)
        delays.fillna(method="bfill", inplace=True)
        return delays

In [None]:
# PyTorch Class
# torch.manual_seed(42)
class TorchLSTM(nn.Module):
    def __init__(self, num_inputs=10, num_units=15, num_layers=1):
        super(TorchLSTM, self).__init__()
        # LSTM parameters
        self.num_inputs = num_inputs
        self.num_units = num_units
        self.num_layers = num_layers

        # Layers of neurons
        self.feature = nn.LSTM(num_inputs, num_units, num_layers)
        self.linear1 = nn.Linear(num_units, num_units)     
        self.output = nn.Linear(num_units, 1)

        # Activation Functions and Dropout
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()
        self.dropout = nn.Dropout(p=0.1)
        self.softmax = nn.LogSoftmax(dim=1)
        
    
    def forward(self, x):
        x = x.transpose(1, 2)
        x, hs = self.feature(x)
        x = self.relu(self.linear1(x))
        x = self.dropout(x)
        x = self.output(x)
        x = self.sigmoid(x)
        x = x.squeeze()
        
        return x

In [None]:
# Class to use on Pipeline
class AnomalyLSTMDetector(BaseEstimator, ClassifierMixin):
    def __init__(self, start, end, num_units=15, 
                num_layers=1, num_inputs=3, batch_size=10, random_state=None):

        # Neural Network Parameters and models
        self.model = None
        self.num_inputs = num_inputs
        self.num_units = num_units
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.random_state = random_state
        
        

        
        # Indice adjustment for target array
        self.start = start
        self.end = end

    def fit(self, X, y):
        torch.manual_seed(self.random_state)
        self.num_inputs = X.shape[1]
        # Modelo criado no Fit por causa do tamanho da entrada que deve
        # ser especificado pelo dataset utilizado.
        self.model = NeuralNetClassifier(TorchLSTM(self.num_inputs, 15, 1, ), 
                                        max_epochs=200, lr=0.01,
                                        train_split=None, criterion=nn.BCELoss,
                                        batch_size=32, iterator_train__shuffle=False,
                                        optimizer=torch.optim.Adam, verbose=0)
        
        word_size = X.shape[0]

        # Transformando target 'y' em tensor do tipo Float para
        # o modelo funcionar corretamente
        y_transformed = y
        # y_transformed = self.transforma_saida(self.start, self.end, 
        #                                       y, word_size)
        # y_transformed = torch.tensor(y_transformed)
        # y_transformed =  y_transformed.type(torch.FloatTensor)

        # Transformando as features em tensores no formato correto para
        # o modelo funcionar corretamente.
        num_rows = X.shape[0]
        num_cols = X.shape[1]
        X_transformed = X.reshape(num_rows, num_cols, 1)
        X_transformed = X_transformed.astype(np.float32)
        print(X_transformed.shape, y_transformed.shape)
        self.model.fit(X_transformed, y_transformed)
        return self


    def predict(self, X, y=None):
        X = X.values
        num_rows = X.shape[0]
        num_cols = X.shape[1]
        X = X.reshape(num_rows, num_cols, 1)
        X = X.astype(np.float32)
        y_pred = self.model.forward(X)        
        y_pred = (y_pred > 0.5)
        y_pred = self.filtrar_media_movel(y_pred)
        return y_pred

    def filtrar_media_movel(self, y):
        '''
        Aplica o filtro da média móvel no vetor de saída do modelo.
        '''
        y_filttered = pd.Series(y.flatten())
        y_filttered = y_filttered.rolling(window=5).mean()
        y_filttered = y_filttered.round().fillna(0).astype(int)
        return y_filttered

    def transforma_saida(self, start, end, ts, word_size):
        '''
        Converte a saída do modelo do domínio da amostragem original para o domínio
        da transformação SAX.
        '''
        size = len(ts)
        compression_rate = size / word_size
        output = np.zeros(word_size)
        for i in range(len(start)):
            low = int(start[i] / compression_rate)
            high = int(end[i] / compression_rate)
            output[low:high] = 1
        return output

## 1.2 Importando os dados a serem utilizados

In [None]:
!unzip DadosCBA.zip
falha1 = pd.read_csv("DadosCBA/falha1_100_110.csv")
falha1["estado"] = 0
falha1.loc[10000:11000, "estado"] = 1

falha2 = pd.read_csv("DadosCBA/falha2_100_110.csv")
falha2["estado"] = 0
falha2.loc[10000:11000, "estado"] = 1

falha8 = pd.read_csv("DadosCBA/falha8_120_130.csv")
falha8["estado"] = 0
falha8.loc[12000:13000, "estado"] = 1

falha13 = pd.read_csv("DadosCBA/falha13_120_130.csv")
falha13["estado"] = 0
falha13.loc[12000:13000, "estado"] = 1

falha20 = pd.read_csv("DadosCBA/falha20_120_130.csv")
falha20["estado"] = 0
falha20.loc[12000:13000, "estado"] = 1

Archive:  DadosCBA.zip
   creating: DadosCBA/.ipynb_checkpoints/
  inflating: DadosCBA/.ipynb_checkpoints/falha1_100_110-checkpoint.csv  
  inflating: DadosCBA/estrutura_do_nome.txt  
  inflating: DadosCBA/falha1_100_110.csv  
  inflating: DadosCBA/falha12_100_110_158_168.csv  
  inflating: DadosCBA/falha13_100h_11_20_61_70.csv  
  inflating: DadosCBA/falha13_120_130.csv  
  inflating: DadosCBA/falha1320_90_100_148_158.csv  
  inflating: DadosCBA/falha2_100_110.csv  
  inflating: DadosCBA/falha20_120_130.csv  
  inflating: DadosCBA/falha8_120_130.csv  
  inflating: DadosCBA/falha813_90_100_148_158.csv  
  inflating: DadosCBA/falha820_90_100_148_158.csv  


## 1.3 Criando funções de medição de desempenho do modelo


In [None]:
scoring = {
    'recall': make_scorer(recall_score), 'Precision': make_scorer(precision_score), 'f1_score': make_scorer(f1_score),
    'Accuracy': make_scorer(accuracy_score),
}

# 2 Implementação do modelo

Aqui será a implementação do modelo, que será uma classe python SEM polimorfismo, inicialmente.

Nessa abordagem inicial, buscariamos apenas os valores dos hiperparâmetros da etapa de compressão e geração de atrasos nos dados, e, ao final, pretendemos implementar uma busca também por modelos distintos de redes neurais.

## 2.1 Busca por parâmetros de compressão

O modelo aqui descrito

In [None]:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
class GridSearchComprimido(TransformerMixin, BaseEstimator):
    def __init__(self, *, modelo=None, param_grid=None,
                 random_state=42, scoring=None,
                 cv=4, start, end,
                 verbose=True, x_novo, y_novo, start_novo, end_novo):
        self._modelo = modelo 
        self._estimator = AnomalyLSTMDetector(
            start=[10000], end=[11000], 
            random_state=random_state)
        if param_grid:
            self._param_grid = ParameterGrid(param_grid)
        else:
            self._param_grid = param_grid
        self._random_state = random_state
        self._scoring = scoring
        self._cv = cv
        self.cv_results_ = pd.DataFrame()

        self.x_novo = x_novo
        self.y_novo = y_novo
        self.start_novo = start_novo
        self.end_novo = end_novo

        self._start = start 
        self._end = end
        self._verbose = verbose

    def fit(self, X, y=None):
        time_series_len = X.shape[0]
        modelo = self._modelo
        best_model = modelo
        best_score = 0
        for i, parameters in enumerate(self._param_grid):
            modelo.set_params(**parameters)
            # Define parâmetros para a transformação das features e do target
            window_size =self._modelo.get_params()["sax__window_size"]
            word_size = time_series_len/window_size + 1 
            word_size = int(word_size)

            # Converte features e targets para o domínio do SAX
            X_transformed, y_transformed = self.converter_dados_sax(X, word_size, modelo)
            X__transformed = X_transformed.values.reshape(X_transformed.shape[0], X_transformed.shape[1], 1)
            self._estimator.fit(X__transformed, y_transformed)
            

            time_series_len_novo = self.x_novo.shape[0]
            word_size = time_series_len_novo/window_size + 1 
            word_size = int(word_size)

            # Converte features e targets para o domínio do SAX
            X_transformed_novo, y_transformed_novo = self.converter_dados_sax_novo(self.x_novo, word_size, modelo)

            y_pred_novo = self._estimator.predict(X_transformed_novo)
            accuracy = accuracy_score(y_transformed_novo, y_pred_novo)
            precision = precision_score(y_transformed_novo, y_pred_novo)
            f1 = f1_score(y_transformed_novo, y_pred_novo)
            recall = recall_score(y_transformed_novo, y_pred_novo)

            scores = {
                "Accuracy": np.asarray(accuracy),
                "Precision": np.asarray(precision),
                "Recall": np.asarray(recall),
                "f1": np.asarray(f1),
            }
            print(scores)
            

            
            # Estapa de criação do dataset de saída com os dados do treinamento
            self.criar_cv_results(parameters, scores, i)
        return self

    def converter_dados_sax(self, X, word_size, modelo):
        '''
        Reamostra as features e o target da amostragem original para o domínio
        dos dados comprimidos pelo SAX.
        '''
        y_transformed = self.transforma_saida(
            self._start, self._end,
            X, word_size)
        y_transformed = torch.tensor(y_transformed)
        y_transformed =  y_transformed.type(torch.FloatTensor)
        X_transformed = modelo.fit_transform(X)
        return X_transformed, y_transformed
    
    def converter_dados_sax_novo(self, X, word_size, modelo):
        '''
        Reamostra as features e o target da amostragem original para o domínio
        dos dados comprimidos pelo SAX.
        '''
        y_transformed = self.transforma_saida(
            self.start_novo, self.end_novo,
            X, word_size)
        y_transformed = torch.tensor(y_transformed)
        y_transformed =  y_transformed.type(torch.FloatTensor)
        X_transformed = modelo.fit_transform(X)
        return X_transformed, y_transformed

    def criar_cv_results(self, parameters, desempenho_modelo, i):
        '''
        Cria um dataset com a média e o desvio padrão das métricas de desempenho
        escolhidas pelo usuário para cada conjunto de parâmetros de compressão
        passados pelo usuário.
        '''
        desempenho_modelo = pd.DataFrame(desempenho_modelo, index=[i])
        params = str(parameters)
        if self._verbose:
            print(params)
        params = [params] * desempenho_modelo.shape[0]
        desempenho_modelo["parameters"] = params
        desempenho_modelo = desempenho_modelo.groupby(by="parameters", ).agg(["mean", "std"])
        parameters = pd.DataFrame(parameters, index=[i])
        colunas_desempenhos = desempenho_modelo.columns.tolist()
        
        new_columns = list()
        for name, statistic in colunas_desempenhos:
            full_name = name + "_" + statistic
            new_columns.append(full_name)            
        desempenho_modelo.columns = new_columns
        desempenho_modelo.reset_index(inplace=True)
        desempenho_modelo.drop("parameters", axis=1, inplace=True)
        if i == 0:
            self.cv_results_ = pd.concat([parameters, desempenho_modelo], 
                                         axis=1)
        elif i > 0:
            current_search_parameters = pd.concat(
                [parameters.reset_index(drop=True), 
                 desempenho_modelo.reset_index(drop=True)], 
                axis=1,)
            
            # assert current_search_parameters.columns.to_list() == self.cv_results_.columns.to_list()
            self.cv_results_ = pd.concat(
                [self.cv_results_.reset_index(drop=True), 
                 current_search_parameters.reset_index(drop=True)], 
                axis=0,)
        return self

    def treina_modelo(self, modelo, X, y):
        print(y.shape)
        modelo_treinado = modelo.fit(X, y)
        return modelo_treinado    

    def transforma_saida(self, start, end, ts, word_size):
        '''
        Converte a saída do modelo do domínio da amostragem original para o 
        domínio da transformação SAX.
        '''
        size = len(ts)
        compression_rate = size / word_size
        output = np.zeros(word_size)
        for i in range(len(start)):
            low = int(start[i] / compression_rate)
            high = int(end[i] / compression_rate)
            output[low:high] = 1
        return output

In [None]:
feature_names = ["XMEAS01", "XMEAS10", "XMEAS21"]
alphabet_size = 8
window_size = 20
delays = 4
random_state = 42
teste_pipeline = Pipeline(steps=[
    ("ftr_sel", FeatureSelector(feature_names=feature_names)),
    ("std_scl", StandardScaler()),
    ("sax", SaxTransformer(alphabet_size=alphabet_size, 
                           window_size=window_size)),
    ("delays", GeradorAtrasos(delays=delays)),
    # ("clf", AnomalyLSTMDetector(start=[10000], end=[11000], 
    #                             random_state=random_state)),
])

search_space = {
    "sax__alphabet_size": np.arange(3, 9),
    "sax__window_size": np.arange(10, 41, 5),
    "delays__delays": np.arange(0, 5),}

In [None]:
grid_search = GridSearchComprimido(modelo=teste_pipeline, 
                                   param_grid=search_space, 
                                   start=[12000], end=[13001],
                                   scoring=scoring, verbose=True,
                                   start_novo=[10000], end_novo=[11001],
                                   x_novo=falha2.drop("estado", axis=1), y_novo=falha2["estado"])
grid_search.fit(X=falha8.drop("estado", axis=1), y=falha8["estado"])

(1781, 3, 1) torch.Size([1781])
{'Accuracy': array(0.93674889), 'Precision': array(0.5), 'Recall': array(0.76), 'f1': array(0.6031746)}
{'delays__delays': 0, 'sax__alphabet_size': 3, 'sax__window_size': 10}
(1187, 3, 1) torch.Size([1187])
{'Accuracy': array(0.97343454), 'Precision': array(0.85185185), 'Recall': array(0.6969697), 'f1': array(0.76666667)}
{'delays__delays': 0, 'sax__alphabet_size': 3, 'sax__window_size': 15}
(891, 3, 1) torch.Size([891])
{'Accuracy': array(0.95069532), 'Precision': array(0.59016393), 'Recall': array(0.72), 'f1': array(0.64864865)}
{'delays__delays': 0, 'sax__alphabet_size': 3, 'sax__window_size': 20}
(713, 3, 1) torch.Size([713])
{'Accuracy': array(0.93680885), 'Precision': array(0.5), 'Recall': array(0.7), 'f1': array(0.58333333)}
{'delays__delays': 0, 'sax__alphabet_size': 3, 'sax__window_size': 25}
(594, 3, 1) torch.Size([594])
{'Accuracy': array(0.9373814), 'Precision': array(0.5), 'Recall': array(0.6969697), 'f1': array(0.58227848)}
{'delays__delays

  _warn_prf(average, modifier, msg_start, len(result))


{'Accuracy': array(0.94469027), 'Precision': array(0.54545455), 'Recall': array(0.64285714), 'f1': array(0.59016393)}
{'delays__delays': 0, 'sax__alphabet_size': 5, 'sax__window_size': 35}
(446, 3, 1) torch.Size([446])
{'Accuracy': array(0.94444444), 'Precision': array(0.55172414), 'Recall': array(0.64), 'f1': array(0.59259259)}
{'delays__delays': 0, 'sax__alphabet_size': 5, 'sax__window_size': 40}
(1781, 3, 1) torch.Size([1781])
{'Accuracy': array(0.95382669), 'Precision': array(0.61157025), 'Recall': array(0.74), 'f1': array(0.66968326)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 'sax__window_size': 10}
(1187, 3, 1) torch.Size([1187])


  _warn_prf(average, modifier, msg_start, len(result))


{'Accuracy': array(0.9373814), 'Precision': array(0.), 'Recall': array(0.), 'f1': array(0.)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 'sax__window_size': 15}
(891, 3, 1) torch.Size([891])
{'Accuracy': array(0.9494311), 'Precision': array(0.58064516), 'Recall': array(0.72), 'f1': array(0.64285714)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 'sax__window_size': 20}
(713, 3, 1) torch.Size([713])
{'Accuracy': array(0.92259084), 'Precision': array(0.35483871), 'Recall': array(0.275), 'f1': array(0.30985915)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 'sax__window_size': 25}
(594, 3, 1) torch.Size([594])
{'Accuracy': array(0.943074), 'Precision': array(0.53488372), 'Recall': array(0.6969697), 'f1': array(0.60526316)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 'sax__window_size': 30}
(509, 3, 1) torch.Size([509])
{'Accuracy': array(0.94690265), 'Precision': array(0.56666667), 'Recall': array(0.60714286), 'f1': array(0.5862069)}
{'delays__delays': 0, 'sax__alphabet_size': 6, 

  _warn_prf(average, modifier, msg_start, len(result))


{'Accuracy': array(0.94911504), 'Precision': array(0.57575758), 'Recall': array(0.67857143), 'f1': array(0.62295082)}
{'delays__delays': 1, 'sax__alphabet_size': 4, 'sax__window_size': 35}
(446, 6, 1) torch.Size([446])


  _warn_prf(average, modifier, msg_start, len(result))


{'Accuracy': array(0.93686869), 'Precision': array(0.), 'Recall': array(0.), 'f1': array(0.)}
{'delays__delays': 1, 'sax__alphabet_size': 4, 'sax__window_size': 40}
(1781, 6, 1) torch.Size([1781])
{'Accuracy': array(0.96521189), 'Precision': array(0.7184466), 'Recall': array(0.74), 'f1': array(0.72906404)}
{'delays__delays': 1, 'sax__alphabet_size': 5, 'sax__window_size': 10}
(1187, 6, 1) torch.Size([1187])
{'Accuracy': array(0.95825427), 'Precision': array(0.66176471), 'Recall': array(0.68181818), 'f1': array(0.67164179)}
{'delays__delays': 1, 'sax__alphabet_size': 5, 'sax__window_size': 15}
(891, 6, 1) torch.Size([891])
{'Accuracy': array(0.94816688), 'Precision': array(0.57142857), 'Recall': array(0.72), 'f1': array(0.63716814)}
{'delays__delays': 1, 'sax__alphabet_size': 5, 'sax__window_size': 20}
(713, 6, 1) torch.Size([713])
{'Accuracy': array(0.95102686), 'Precision': array(0.6), 'Recall': array(0.675), 'f1': array(0.63529412)}
{'delays__delays': 1, 'sax__alphabet_size': 5, 'sax

  _warn_prf(average, modifier, msg_start, len(result))


{'Accuracy': array(0.93678887), 'Precision': array(0.), 'Recall': array(0.), 'f1': array(0.)}
{'delays__delays': 1, 'sax__alphabet_size': 7, 'sax__window_size': 20}
(713, 6, 1) torch.Size([713])
{'Accuracy': array(0.94944708), 'Precision': array(0.58333333), 'Recall': array(0.7), 'f1': array(0.63636364)}
{'delays__delays': 1, 'sax__alphabet_size': 7, 'sax__window_size': 25}
(594, 6, 1) torch.Size([594])
{'Accuracy': array(0.94117647), 'Precision': array(0.52380952), 'Recall': array(0.66666667), 'f1': array(0.58666667)}
{'delays__delays': 1, 'sax__alphabet_size': 7, 'sax__window_size': 30}
(509, 6, 1) torch.Size([509])
{'Accuracy': array(0.94690265), 'Precision': array(0.55882353), 'Recall': array(0.67857143), 'f1': array(0.61290323)}
{'delays__delays': 1, 'sax__alphabet_size': 7, 'sax__window_size': 35}
(446, 6, 1) torch.Size([446])
{'Accuracy': array(0.94191919), 'Precision': array(0.53846154), 'Recall': array(0.56), 'f1': array(0.54901961)}
{'delays__delays': 1, 'sax__alphabet_size':

In [None]:
results_test = grid_search.cv_results_
results_test.to_csv("./resultsGRIDSEARCH_falha2.csv")

In [None]:
results_test.head()

## 2.2 Busca por parâmetros de otimização e por modelos de redes neurais

# 3  Visualização do Modelo

Aqui, aplicaremos as técnicas de visualização do modelo por meio de coordenadas paralelas do plotly.

## Visualização de Coordenadas Paralelas de Validação
Conjunto de validação.

In [None]:
results_test = pd.read_csv("results200epochs_new.csv")
labels = {
    "delays__delays": "Delays",
    "sax__alphabet_size": "Compression Alphabet Size",
    "sax__window_size": "Compression Window Size",
    "test_Accuracy_mean": "Mean Accuracy",
    "test_Precision_mean": "Mean Precision",
    "test_recall_mean": "Mean Recall",
    "test_f1_score_mean": "Mean F1-Score"
}
df_cols = ["test_Accuracy_mean", 
    "test_Precision_mean",
    "test_recall_mean",
    "test_f1_score_mean"
]



In [None]:
import plotly.express as px
df = px.data.iris()

df_cols = ["sax__window_size",
    "test_Accuracy_mean", 
    "test_Precision_mean",
    "test_recall_mean",
    "test_f1_score_mean"
]
fig = px.parallel_coordinates(
    results_test, color="sax__window_size",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

In [None]:
df_cols = ["sax__alphabet_size",
    "test_Accuracy_mean", 
    "test_Precision_mean",
    "test_recall_mean",
    "test_f1_score_mean"
]
fig = px.parallel_coordinates(
    results_test, color="sax__alphabet_size",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

In [None]:
df_cols = ["delays__delays",
    "test_Accuracy_mean", 
    "test_Precision_mean",
    "test_recall_mean",
    "test_f1_score_mean"
]
fig = px.parallel_coordinates(
    results_test, color="delays__delays",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

## Visualização de Coordenadas Paralelas de Treinamento
Conjunto de treinamento.

In [2]:
results_test = pd.read_csv("resultsGRIDSEARCH_falha1.csv")
labels = {
    "delays__delays": "Delays",
    "sax__alphabet_size": "Compression Alphabet Size",
    "sax__window_size": "Compression Window Size",
    "train_Accuracy_mean": "Mean Accuracy",
    "train_Precision_mean": "Mean Precision",
    "train_recall_mean": "Mean Recall",
    "train_f1_score_mean": "Mean F1-Score"
}
df_cols = ["train_Accuracy_mean", 
    "train_Precision_mean",
    "train_recall_mean",
    "train_f1_score_mean"
]



In [3]:
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-4
!apt-get install poppler-utils 

--2021-02-12 19:43:17--  https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage
Resolving github.com (github.com)... 192.30.255.112
Connecting to github.com (github.com)|192.30.255.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-releases.githubusercontent.com/99037241/9dc3a580-286a-11e9-8a21-4312b7c8a512?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20210212%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20210212T194147Z&X-Amz-Expires=300&X-Amz-Signature=cfee23ee1537297e1143566aee78282903f0a376ace482bfb59bbe7d0f3c8918&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=99037241&response-content-disposition=attachment%3B%20filename%3Dorca-1.2.1-x86_64.AppImage&response-content-type=application%2Foctet-stream [following]
--2021-02-12 19:43:17--  https://github-releases.githubusercontent.com/99037241/9dc3a580-286a-11e9-8a21-4312b7c8a512?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AK

In [None]:
results_test.columns

Index(['Unnamed: 0', 'delays__delays', 'sax__alphabet_size',
       'sax__window_size', 'Accuracy_mean', 'Accuracy_std', 'Precision_mean',
       'Precision_std', 'Recall_mean', 'Recall_std', 'f1_mean', 'f1_std'],
      dtype='object')

In [5]:
import plotly.express as px
df = px.data.iris()

df_cols = [
    'delays__delays', 'sax__alphabet_size',
    'sax__window_size', 'Accuracy_mean', 'Precision_mean',
    'Recall_mean', 'f1_mean',
]
fig = px.parallel_coordinates(
    results_test, color="sax__window_size",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

In [6]:
import plotly.express as px
df = px.data.iris()

df_cols = [
    'delays__delays', 'sax__alphabet_size',
    'sax__window_size', 'Accuracy_mean', 'Precision_mean',
    'Recall_mean', 'f1_mean',
]
fig = px.parallel_coordinates(
    results_test, color="sax__window_size",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

In [7]:
fig = px.parallel_coordinates(
    results_test, color="sax__alphabet_size",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

In [8]:
fig = px.parallel_coordinates(
    results_test, color="delays__delays",
    dimensions=df_cols, labels=labels,
    color_continuous_scale=px.colors.diverging.Tealrose,)
fig.show()

## Visualização de Categorias Paralelas de Validação Falha 1
Conjunto de Teste da Falha 1.

In [10]:
labels = {
    "delays__delays": "\u03B3",
    "sax__alphabet_size": "\u03B7",
    "sax__window_size": "\u03C1",
    "train_Accuracy_mean": "Mean Accuracy",
    "train_Precision_mean": "Mean Precision",
    "train_recall_mean": "Mean Recall",
    "train_f1_score_mean": "Mean F1-Score",
    "fit_time_mean": "Training Time",
    "fit_time_mean_resampled": "Training Time",
    "test_Accuracy_mean": "Accuracy Score",
    "test_accuracy_mean_resampled": "Accuracy Score",
    "test_f1_mean_resampled": "F1 Score",
    "test_f1_score_mean": "F1 Score",
    "test_recall_mean_resampled": "Recall Score",
    "test_recall_mean": "Recall Score",
    "test_precision_mean_resampled": "Precision Score",
    "test_Precision_mean": "Precision Score",
}

df_cols = [
    'delays__delays', 'sax__alphabet_size',
    'sax__window_size', 'Accuracy_mean', 'Precision_mean',
    'Recall_mean', 'f1_mean',
]
results_test["test_recall_mean_resampled"] = np.around(results_test["Recall_mean"].values, 1)
fig = px.parallel_categories(
    results_test, color="test_recall_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=860, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
fig.write_image("test1_recall_score.pdf")

In [11]:
results_test["test_precision_mean_resampled"] = np.around(results_test["Precision_mean"].values, 1)
fig = px.parallel_categories(
    results_test, color="test_precision_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=900, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
fig.write_image("test1_precision_score.pdf")

In [12]:
results_test["test_f1_mean_resampled"] = np.around(results_test["f1_mean"].values, 1)
fig = px.parallel_categories(
    results_test, color="test_f1_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=800, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
fig.write_image("test1_f1_score.pdf")

## Visualização de Categorias Paralellas de Validação Falha 2
Conjunto de Teste 2.

In [13]:
results_test2 = pd.read_csv("resultsGRIDSEARCH_falha2.csv")
df_cols = [
    'delays__delays', 'sax__alphabet_size',
    'sax__window_size', 'Accuracy_mean', 'Precision_mean',
    'Recall_mean', 'f1_mean',
]
results_test2["test_recall_mean_resampled"] = np.around(results_test2["Recall_mean"].values, 1)
fig = px.parallel_categories(
    results_test2, color="test_recall_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=860, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
pio.write_image(fig, "test2_recall_score.pdf")

In [14]:
results_test2["test_precision_mean_resampled"] = np.around(results_test2["Precision_mean"].values, 1)
fig = px.parallel_categories(
    results_test2, color="test_precision_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=900, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
pio.write_image(fig, "test2_precision_score.pdf")

In [15]:
results_test2["test_f1_mean_resampled"] = np.around(results_test2["f1_mean"].values, 1)
fig = px.parallel_categories(
    results_test2, color="test_f1_mean_resampled",
    dimensions=df_cols, labels=labels,
    # color_continuous_scale=px.colors.diverging.Tealrose,
    width=800, height=400
)
fig.update_layout(
    font_size=32,
    font_family="Arial",
    margin=dict(l=20, r=30, t=30, b=10),
)


fig.show()
pio.write_image(fig, "test2_f1_score.pdf")