## Clone and install the repository

Prerequisites:

- [Python 3](https://www.python.org)
- [HTK](https://htk.eng.cam.ac.uk/)

Clone the repository:

```console
$ git clone https://github.com/gfogwill/npf-hmm
$ cd npf-hmm
```

Before installing the package, it is recommended to create a virtual environment with a tool such as virtualenv:

```console
$ virtualenv -p python3 .venv
$ source .venv/bin/activate
```

Install the requirements:

```console
$ make requirements
```

NOTE: Every time you start a new session, activate the virtual environment created earlier:

```console
$ source .venv/bin/activate
```
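To confirm from inside Python (for example, in the first cell of the notebook) that the environment is active, a minimal check such as the following can help. It relies only on the standard `sys` module; the function name `in_virtualenv` is hypothetical and not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # Recent virtualenv releases and the stdlib venv module make sys.prefix point
    # at the environment while sys.base_prefix keeps the base interpreter;
    # legacy virtualenv releases (before 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```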

Test the installation:

```console
$ npf-hmm info
```

If everything is working, the program's logo should appear in the console.

## Download and extract the data

Use the directory reserved for data from external sources:

```python
from src.paths import external_data_path
```

Download and extract the data (the `!` prefix runs a line as a shell command from Jupyter):

```python
! wget -q https://zenodo.org/record/5842290/files/mbi-cle.tar -P $external_data_path
! tar -xvf $external_data_path/mbi-cle.tar --directory $external_data_path/
```

```
./mbi-cle/
./mbi-cle/LICENSE.md
./mbi-cle/README.md
./mbi-cle/mbi-cle.csv
```
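If `wget` or `tar` is not available (for example, on Windows), the same two steps can be approximated with the standard library alone. This is a sketch, not part of the repository: `download` and `extract` are hypothetical helper names, and the `filter="data"` argument is only passed on Python versions that support tar extraction filters.

```python
import tarfile
import urllib.request
from pathlib import Path

DATA_URL = "https://zenodo.org/record/5842290/files/mbi-cle.tar"


def download(url, dest_dir):
    """Download `url` into `dest_dir` (like `wget -P`) and return the file path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def extract(archive, dest_dir):
    """Extract a tar archive into `dest_dir` (like `tar -xf ... --directory`)
    and return the names of its members."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Extraction filters available: reject unsafe members
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names


# Usage in the notebook (external_data_path comes from src.paths):
# archive = download(DATA_URL, external_data_path)
# extract(archive, external_data_path)
```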


## Prepare the data

Read the data into a pandas DataFrame:

```python
import pandas as pd

data = pd.read_csv(external_data_path / 'mbi-cle' / 'mbi-cle.csv', index_col='datetime')
data.head()
```

| datetime | size_bin_01 | size_bin_02 | ... | size_bin_25 | flag |
|---|---|---|---|---|---|
| 2013-02-03 00:00:00 | 81.140642 | 18.417794 | ... | 11.762303 | 0 |
| 2013-02-03 00:10:00 | 41.465832 | 31.207359 | ... | 12.029446 | 0 |
| 2013-02-03 00:20:00 | 77.069016 | 11.423469 | ... | 12.58175 | 0 |
| 2013-02-03 00:30:00 | 35.298611 | 9.823707 | ... | 13.005796 | 0 |
| 2013-02-03 00:40:00 | 70.062666 | 6.633204 | ... | 17.372463 | 0 |
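Before building the datasets, it can be worth checking that the data match what the preview suggests: a `datetime` index at 10-minute steps, the `size_bin_*` feature columns, and a `flag` column. The sketch below assumes `flag` holds the per-sample label; `summarize` is a hypothetical helper, not part of the project. Passing `parse_dates=True` to `pd.read_csv` would also parse the index at read time.

```python
import pandas as pd


def summarize(df: pd.DataFrame, label_col: str = "flag"):
    """Return the time steps between samples, the feature columns, and the label counts."""
    # read_csv(index_col='datetime') leaves the index as strings; parse it first
    index = pd.to_datetime(df.index)
    steps = index.to_series().diff().dropna().value_counts()
    features = [col for col in df.columns if col != label_col]
    labels = df[label_col].value_counts().sort_index()
    return steps, features, labels


# Usage with the DataFrame loaded above:
# steps, features, labels = summarize(data)
```

A single entry in `steps` indicates a regular sampling interval with no gaps.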


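Build the labeled training and test sets used to train and evaluate the models:

```python
# Import the module under its own name so it does not shadow the `data` DataFrame
from src.data import dataset

# Convert the DataFrame to the model's input format and split it into
# training and test sets (test_size=0.2, seed=37)
X_train, X_test, y_train, y_test = dataset.make_dataset(data, test_size=0.2, seed=37)
```

```
INFO:root:Converting data to final format...
INFO:numexpr.utils:NumExpr defaulting to 4 threads.
INFO:root:Generating Master Label File (Train)...
INFO:root:Generating Master Label File (Test)...
INFO:root:Data OK!
```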
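The log mentions HTK Master Label Files (MLFs). `make_dataset` itself is not shown here, so the following is only a hypothetical sketch of how a sequence of per-sample flags could be written as an MLF: runs of equal flags are merged into one segment and times are expressed in HTK's 100 ns units. It assumes that flag `1` maps to the `e` model and `0` to `ne` (the model names that appear in the `model.edit` commands further down) and one sample every 10 minutes; `flags_to_segments` and `to_mlf` are illustrative names.

```python
from itertools import groupby

# 600 s expressed in HTK's 100 ns time units (assumed 10-minute sampling)
TEN_MINUTES_HTK = 600 * 10_000_000


def flags_to_segments(flags, period=TEN_MINUTES_HTK):
    """Collapse a 0/1 flag sequence into (start, end, label) segments."""
    segments = []
    start = 0
    for flag, run in groupby(flags):
        length = sum(1 for _ in run)
        end = start + length * period
        segments.append((start, end, "e" if flag else "ne"))  # assumed mapping
        start = end
    return segments


def to_mlf(entries, period=TEN_MINUTES_HTK):
    """Build MLF text from a mapping of label-file name -> flag sequence."""
    lines = ["#!MLF!#"]
    for name, flags in entries.items():
        lines.append(f'"*/{name}.lab"')
        for start, end, label in flags_to_segments(flags, period):
            lines.append(f"{start} {end} {label}")
        lines.append(".")  # each label file ends with a single period
    return "\n".join(lines) + "\n"


print(to_mlf({"day1": [0, 0, 1, 1, 0]}))
```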


## Training and evaluation loop

- Initialize the model and run a first training pass

```python
import time

from src.models.base import HiddenMarkovModel

# Create the model, initialize it from the training data and train it once
model = HiddenMarkovModel()
model.initialize(X_train)
model.train(X_train, y_train)
```

```
/home/gfogwil/Documentos/Facultad/Tesis/models/bdb/notebooks/Thesis_GPF


3
```

- Add transitions between states 2 and 4 of the HMMs.
- Evaluate.
- Save the results for later analysis.
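HTK's HHEd `AT i j prob itemList` command adds a transition from state `i` to state `j` with probability `prob` and rescales the other transitions out of state `i` so that the row still sums to one. The NumPy sketch below mimics that on a single matrix. It assumes proportional rescaling, 1-based HTK state numbers, and a 5-state left-to-right model (non-emitting states 1 and 5, emitting states 2-4, consistent with the `*.state[2-4]` pattern used later); `add_transition` is a hypothetical helper and the probabilities are made up for illustration.

```python
import numpy as np


def add_transition(trans, i, j, prob):
    """Return a copy of `trans` with a_ij = prob and the rest of row i rescaled.

    `i` and `j` are 1-based state numbers, as in HTK.
    """
    a = np.array(trans, dtype=float)
    row, col = i - 1, j - 1
    others = [k for k in range(a.shape[1]) if k != col]
    remaining = a[row, others].sum()
    a[row, col] = prob
    if remaining > 0:
        # Scale the other outgoing transitions so the row sums to one again
        a[row, others] *= (1.0 - prob) / remaining
    return a


# Illustrative left-to-right transition matrix (made-up probabilities)
trans = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0],
])

edited = add_transition(add_transition(trans, 2, 4, 0.2), 4, 2, 0.2)
print(edited.round(2))
```

In this example, state 2 gains a 0.2 jump to state 4 while its self-loop and its 2 -> 3 transition shrink from 0.6 and 0.4 to 0.48 and 0.32.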

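```python
# HHEd-style "AT i j prob" commands: add transitions 2 -> 4 and 4 -> 2 with
# probability 0.2 to the transition matrices of both the `e` and `ne` models
model.edit(['AT 2 4 0.2 {e.transP}',
            'AT 4 2 0.2 {e.transP}',
            'AT 2 4 0.2 {ne.transP}',
            'AT 4 2 0.2 {ne.transP}'])

# Retrain with the new topology and measure the training time
start = time.time()
model.train(X_train, y_train)
end = time.time()

results = []

# Evaluate on the test set and store the metrics of the single-Gaussian models
result = model.test(X_test, y_test)
result['n_gauss'] = 1
result['training_time'] = end - start

results.append(result)
```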

- Double the number of Gaussians, then train, evaluate and save the results.
- Repeat until reaching 1024 Gaussians.
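The HTK Book describes HHEd's `MU m itemList` as increasing the number of mixture components by repeatedly splitting the heaviest component: it is copied, both copies get half its weight, and their means are moved by ±0.2 standard deviations. The sketch below reproduces that idea for a diagonal-covariance mixture in NumPy; it is an illustration under those assumptions (with a hypothetical `split_heaviest` helper), not HTK's code. With the schedule used here, `n_gauss = 2**i` for `i = 1, ..., 10`, so the last step reaches 1024 components.

```python
import numpy as np


def split_heaviest(weights, means, variances, target):
    """Grow a diagonal-covariance Gaussian mixture to `target` components by
    repeatedly splitting the heaviest component."""
    w = [float(x) for x in weights]
    mu = [np.asarray(m, dtype=float) for m in means]
    var = [np.asarray(v, dtype=float) for v in variances]
    while len(w) < target:
        k = int(np.argmax(w))           # heaviest component
        offset = 0.2 * np.sqrt(var[k])  # 0.2 standard deviations per dimension
        w[k] /= 2.0                     # both copies share the original weight
        w.append(w[k])
        mu.append(mu[k] + offset)       # new copy shifted up
        mu[k] = mu[k] - offset          # original shifted down
        var.append(var[k].copy())       # variances are copied unchanged
    return np.array(w), np.array(mu), np.array(var)


# Doubling schedule from the loop below: 2, 4, ..., 1024 components
schedule = [2**i for i in range(1, 11)]

w, mu, var = split_heaviest([1.0], [[0.0]], [[1.0]], target=2)
print(w, mu.ravel())
```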

```python
gaussian_duplication_times = 10

for i in range(1, gaussian_duplication_times + 1):
    n_gauss = 2**i

    # HHEd-style "MU" command: raise the number of mixture components in
    # emitting states 2-4 of every model to n_gauss
    model.edit([f'MU {n_gauss} {{*.state[2-4].mix}}'])

    print(f'Training models with {n_gauss} gaussians...')
    start = time.time()
    model.train(X_train, y_train)
    end = time.time()

    # Evaluate and record the metrics for this number of Gaussians
    result = model.test(X_test, y_test)
    result['n_gauss'] = n_gauss
    result['training_time'] = end - start

    results.append(result)
```

```
Training models with 2 gaussians...
Training models with 4 gaussians...
Training models with 8 gaussians...
Training models with 16 gaussians...
Training models with 32 gaussians...
Training models with 64 gaussians...
Training models with 128 gaussians...
Training models with 256 gaussians...
Training models with 512 gaussians...
Training models with 1024 gaussians...
```


Display the results:

```python
# One row per number of Gaussians
results = pd.DataFrame(results)
results = results.set_index('n_gauss')
results
```

| n_gauss | FNR | TP | TN | FP | FN | F1 | MMC | TPR | N | training_time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.188857 | 859 | 18346 | 5477 | 200 | 0.232319 | 0.26932 | 0.811143 | 24882 | 1.271201 |
| 2 | 0.152975 | 897 | 18612 | 5211 | 162 | 0.250314 | 0.294698 | 0.847025 | 24882 | 1.635681 |
| 4 | 0.196412 | 851 | 18978 | 4845 | 208 | 0.251962 | 0.288386 | 0.803588 | 24882 | 2.142504 |
| 8 | 0.229462 | 816 | 19561 | 4262 | 243 | 0.265928 | 0.296331 | 0.770538 | 24882 | 3.131197 |
| 16 | 0.272899 | 770 | 19816 | 4007 | 289 | 0.263879 | 0.286452 | 0.727101 | 24882 | 4.90731 |
| 32 | 0.296506 | 745 | 20675 | 3148 | 314 | 0.300889 | 0.317477 | 0.703494 | 24882 | 8.510332 |
| 64 | 0.418319 | 616 | 21014 | 2809 | 443 | 0.274755 | 0.271728 | 0.581681 | 24882 | 15.064733 |
| 128 | 0.616619 | 406 | 21429 | 2394 | 653 | 0.210417 | 0.180703 | 0.383381 | 24882 | 28.36079 |
| 256 | 0.668555 | 351 | 21773 | 2050 | 708 | 0.20289 | 0.167767 | 0.331445 | 24882 | 54.429302 |
| 512 | 0.785647 | 227 | 21903 | 1920 | 832 | 0.141609 | 0.096162 | 0.214353 | 24882 | 105.987945 |
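The rate columns can be recomputed from the confusion counts, which is a quick way to check how they are defined. The sketch below uses the standard formulas; the values of the `MMC` column match the Matthews correlation coefficient (MCC), as the `n_gauss = 1` row shows. `confusion_metrics` is a hypothetical helper, not the project's `model.test`.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute detection rates from confusion-matrix counts."""
    positives = tp + fn
    tpr = tp / positives                 # true positive rate (recall)
    fnr = fn / positives                 # false negative rate = 1 - TPR
    f1 = 2 * tp / (2 * tp + fp + fn)     # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # Matthews correlation
    return {"TPR": tpr, "FNR": fnr, "F1": f1, "MCC": mcc, "N": tp + tn + fp + fn}


# Counts from the n_gauss = 1 row of the table
row = confusion_metrics(tp=859, tn=18346, fp=5477, fn=200)
print({k: round(v, 6) for k, v in row.items()})
```

Among the rows shown, both `F1` and `MMC` peak at `n_gauss = 32`; with the `results` DataFrame built above, `results['F1'].idxmax()` returns the `n_gauss` with the highest F1.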
