In [1]:
%load_ext autoreload
%autoreload 2

## Clone and install the repository

Prerequisites: `git`, Python 3, and `virtualenv`, which the commands below use.


Clone the repository:

```console
    $ git clone https://github.com/gfogwill/dmps
    $ cd dmps
```

Before installing the package, create a virtual environment with a tool such as virtualenv:

```console
    $ virtualenv -p python3 .venv
    $ source .venv/bin/activate
```

Install the requirements:

```console
    $ pip install -r requirements.txt
```

NOTE: Each time you start a new session, activate the virtual environment created earlier:

```console
    $ source .venv/bin/activate
```
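
To confirm from Python that the environment is active, a minimal sketch follows. The helper name `in_virtualenv` is hypothetical and not part of this repository. It relies only on the standard `sys.prefix` and `sys.base_prefix` attributes, plus `sys.real_prefix`, which older virtualenv releases set.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix instead of sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints `False` after a new shell session, the `source .venv/bin/activate` step was probably skipped.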

Test the installation:

```console 
    $ dmps info
```

If everything works, the program's logo appears in the console.

## Download and extract the data

Use the directory reserved for data from external sources

In [2]:
from src.paths import external_data_path

/home/gfogwil/Documentos/Facultad/Tesis/models/bdb/data/external


Download the archive from Zenodo and extract it

In [140]:
! wget -q https://zenodo.org/record/5842290/files/mbi-cle.tar -P $external_data_path
! tar -xvf $external_data_path/mbi-cle.tar --directory $external_data_path/

./mbi-cle/
./mbi-cle/LICENSE.md
./mbi-cle/README.md
./mbi-cle/mbi-cle.csv
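
The same two steps can also be done without shell commands. The sketch below uses only the standard library; the function names `download` and `extract` are hypothetical, and the path check is an extra precaution that the `tar` command above does not perform.

```python
import tarfile
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: Path) -> Path:
    """Download `url` into `dest_dir` unless the file is already there."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urllib.request.urlretrieve(url, target)
    return target


def extract(archive: Path, dest_dir: Path) -> list:
    """Extract a tar archive into `dest_dir` and return the member names."""
    dest_root = Path(dest_dir).resolve()
    dest_root.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        # Refuse members that would be written outside the destination.
        for member in tar.getmembers():
            member_path = (dest_root / member.name).resolve()
            if member_path != dest_root and dest_root not in member_path.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_root)
        return tar.getnames()
```

With the notebook's variables, usage would look like `extract(download("https://zenodo.org/record/5842290/files/mbi-cle.tar", external_data_path), external_data_path)`.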


## Prepare the data

Read the data into a pandas DataFrame

In [146]:
import pandas as pd

# Load the CSV, using the 'datetime' column as the index
data = pd.read_csv(external_data_path / 'mbi-cle' / 'mbi-cle.csv', index_col='datetime')

data.head()

| datetime | size_bin_01 | size_bin_02 | size_bin_03 | size_bin_04 | size_bin_05 | size_bin_06 | size_bin_07 | size_bin_08 | size_bin_09 | size_bin_10 | ... | size_bin_17 | size_bin_18 | size_bin_19 | size_bin_20 | size_bin_21 | size_bin_22 | size_bin_23 | size_bin_24 | size_bin_25 | flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-02-03 00:00:00 | 81.140642 | 18.417794 | 28.371392 | 59.40683 | 108.41863 | 163.004065 | 153.70262 | 196.51737 | 220.486645 | 252.95875 | ... | 356.7758 | 224.54542 | 154.20561 | 73.177251 | 52.163222 | 32.273702 | 25.39898 | 17.345749 | 11.762303 | 0 |
| 2013-02-03 00:10:00 | 41.465832 | 31.207359 | 34.202565 | 84.020424 | 104.83561 | 120.2564 | 133.74343 | 143.6846 | 167.64205 | 185.59063 | ... | 273.17078 | 203.54888 | 129.14382 | 68.765848 | 54.275322 | 21.656138 | 26.468294 | 8.913986 | 12.029446 | 0 |
| 2013-02-03 00:20:00 | 77.069016 | 11.423469 | 18.793716 | 84.133146 | 92.781701 | 104.68872 | 138.23468 | 143.23347 | 136.68605 | 121.754155 | ... | 255.92975 | 191.947495 | 117.400375 | 51.663441 | 43.134553 | 30.368934 | 30.20301 | 18.540576 | 12.58175 | 0 |
| 2013-02-03 00:30:00 | 35.298611 | 9.823707 | 26.709677 | 69.927578 | 87.781746 | 139.009415 | 147.33816 | 113.18949 | 107.75998 | 119.161056 | ... | 228.331145 | 164.570045 | 96.171857 | 51.047822 | 48.68943 | 29.346862 | 25.659436 | 12.03994 | 13.005796 | 0 |
| 2013-02-03 00:40:00 | 70.062666 | 6.633204 | 10.857456 | 55.186167 | 128.44194 | 98.475001 | 125.38476 | 113.40204 | 107.42661 | 99.626404 | ... | 204.05272 | 153.73028 | 88.606794 | 53.501579 | 41.076571 | 26.961216 | 19.954262 | 14.197626 | 17.372463 | 0 |
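
The preview suggests one row every 10 minutes, with size-bin columns followed by a `flag` column. A small sketch of how that could be checked is shown here on an inline sample copied from the first rows and columns above. Passing `parse_dates=True` is an assumption added for this check; the cell above reads the index as plain strings.

```python
import io

import pandas as pd

# Three rows and two size bins copied from the preview; not the full file.
sample = io.StringIO(
    "datetime,size_bin_01,size_bin_02,flag\n"
    "2013-02-03 00:00:00,81.140642,18.417794,0\n"
    "2013-02-03 00:10:00,41.465832,31.207359,0\n"
    "2013-02-03 00:20:00,77.069016,11.423469,0\n"
)

# parse_dates=True converts the index column into a DatetimeIndex.
df = pd.read_csv(sample, index_col="datetime", parse_dates=True)

# Separate the measurement columns from the flag column.
bin_cols = [c for c in df.columns if c.startswith("size_bin_")]

# Time difference between consecutive rows.
steps = df.index.to_series().diff().dropna()
regular = bool((steps == pd.Timedelta(minutes=10)).all())
print(bin_cols, regular)
```

A `DatetimeIndex` also makes time-based slicing and resampling available later on, which string indices do not.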


Prepare the data and their labels for training and evaluating the models

In [147]:
# Import the submodule directly so the `data` DataFrame loaded above is not shadowed
from src.data import dataset

X_train, X_test, y_train, y_test = dataset.make_dataset(data, test_size=0.2, seed=37)

print(f"Total number of files: {X_train['count'] + X_test['count']}\n\
Number of training files: {X_train['count']}\n\
Number of validation files: {X_test['count']}")

INFO:root:Converting data to final format...
INFO:root:Generating Master Label File (Train)...
INFO:root:Generating Master Label File (Test)...
INFO:root:Data OK!


Total number of files: 878
Number of training files: 703
Number of validation files: 175
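
For intuition about the `test_size` and `seed` arguments, here is a minimal sketch of a seeded split over item indices. It is not the implementation of `make_dataset`, which is not shown in this notebook; the function name `split_indices` and the floor rounding are assumptions. With 878 items and `test_size=0.2`, floor rounding yields the same 703/175 counts reported above.

```python
import random


def split_indices(n_items, test_size=0.2, seed=37):
    """Return (train_indices, test_indices) for a reproducible random split."""
    indices = list(range(n_items))
    # A dedicated Random instance keeps the shuffle independent of global state.
    random.Random(seed).shuffle(indices)
    # Assumption: the test set size is rounded down.
    n_test = int(n_items * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(878, test_size=0.2, seed=37)
print(len(train_idx), len(test_idx))
```

Calling the function again with the same seed returns the same partition, which is what makes later model comparisons repeatable.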


## Training and evaluation loop

- Initialize the model

In [151]:
from src.models.base import HiddenMarkovModel

import time

model = HiddenMarkovModel()
    
model.initialize(X_train)

model.train(X_train, y_train)

/home/gfogwil/Documentos/Facultad/Tesis/models/bdb/notebooks/Thesis_GPF


3

- Add the transitions between states 2 and 4 of the HMMs, then retrain.
- Evaluate
- Save the results for later analysis

In [152]:
# Add transitions 2 -> 4 and 4 -> 2 (probability 0.2) to both models, `e` and `ne`
model.edit(['AT 2 4 0.2 {e.transP}',
            'AT 4 2 0.2 {e.transP}',
            'AT 2 4 0.2 {ne.transP}',
            'AT 4 2 0.2 {ne.transP}'])

start = time.time()
model.train(X_train, y_train)
end = time.time()

results = []

result = model.test(X_test, y_test)

result['n_gauss'] = 1
result['training_time'] = end - start

results.append(result)
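
The start/end timing pattern in this cell repeats in the loop that follows, so it could be factored into a helper. The sketch below is one way to do that; the name `timed` is hypothetical. It uses `time.perf_counter`, a monotonic clock intended for measuring intervals, in place of `time.time`.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload, used here only to demonstrate the helper.
total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f} s")
```

In this notebook the call would read `_, training_time = timed(model.train, X_train, y_train)`.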

- Double the number of Gaussians, then train, evaluate, and save the results
- Repeat until reaching 1024 Gaussians

In [153]:
gaussian_duplication_times = 10

for i in range(1, gaussian_duplication_times+1):
    n_gauss = 2**i

    model.edit([f'MU {n_gauss} {{*.state[2-4].mix}}'])

    print(f'Training models with {n_gauss} gaussians...')
    start = time.time()
    model.train(X_train, y_train)
    end = time.time()

    result = model.test(X_test, y_test)

    result['n_gauss'] = n_gauss
    result['training_time'] = end - start

    results.append(result)

Using 2 gaussians distributions
Training models with 2...
Using 4 gaussians distributions
Training models with 4...
Using 8 gaussians distributions
Training models with 8...
Using 16 gaussians distributions
Training models with 16...
Using 32 gaussians distributions
Training models with 32...
Using 64 gaussians distributions
Training models with 64...
Using 128 gaussians distributions
Training models with 128...
Using 256 gaussians distributions
Training models with 256...
Using 512 gaussians distributions
Training models with 512...
Using 1024 gaussians distributions
Training models with 1024...
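
The doubling schedule and the edit commands it produces can be checked without training anything. The sketch below only builds the strings; the helper names are hypothetical. The `MU` command and the `{*.state[2-4].mix}` item list are copied from the loop above, and reading them as HTK-style edit commands is an assumption.

```python
def mixture_schedule(n_doublings=10):
    """Number of Gaussians after each doubling: 2, 4, ..., 2**n_doublings."""
    return [2 ** i for i in range(1, n_doublings + 1)]


def mu_command(n_gauss, item_list="{*.state[2-4].mix}"):
    """Build the edit command that sets the mixture count to n_gauss."""
    return f"MU {n_gauss} {item_list}"


schedule = mixture_schedule(10)
commands = [mu_command(n) for n in schedule]
print(schedule[0], schedule[-1], commands[0])
```

Ten doublings from a single Gaussian end at 1024, which matches the last line of the output above.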


In [None]:
results