# Sentence Classifier Using a Bag of GloVe Embeddings - Deployment

This component is a multiclass sentence classifier based on the GloVe embedding packages provided by [Stanford](https://nlp.stanford.edu/projects/glove/) for English and by [NILC-São Carlos](http://nilc.icmc.usp.br/nilc/index.php/repositorio-de-word-embeddings-do-nilc) for Portuguese.

### **If you have any questions, see the [PlatIAgro tutorials](https://platiagro.github.io/tutorials/).**

## Class Declaration for Real-Time Predictions

The deployment task creates a REST service for real-time predictions.<br>
To do so, you must create a `Model` class that implements the `predict` method.
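As a rough illustration of that interface, the sketch below defines a toy class with the same `predict` signature used later in this notebook. The class name `ToyModel`, the `labels` attribute, and the keyword rule are hypothetical stand-ins for illustration only; they are not the GloVe classifier, and a real deployment would name the class `Model`.

```python
from typing import Dict, Iterable, List, Optional, Union

import numpy as np


class ToyModel:
    """Hypothetical stand-in that exposes only the `predict` method."""

    def __init__(self) -> None:
        # Assumption: a real component would load its trained artifacts here.
        self.labels = ["negative", "positive"]

    def predict(
        self,
        X: np.ndarray,
        feature_names: Optional[Iterable[str]] = None,
        meta: Optional[Dict] = None,
    ) -> Union[np.ndarray, List, str, bytes]:
        # Toy rule, not the real classifier: texts containing "good" are positive.
        return [self.labels[int("good" in str(row[0]).lower())] for row in X]


model = ToyModel()
print(model.predict(np.array([["A good movie"], ["A dull movie"]]), None))
# prints ['positive', 'negative']
```

The input is a 2-D array with one row per sentence, matching the shape of `X_test` built at the end of this notebook.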

In [1]:
!wget https://raw.githubusercontent.com/platiagro/tasks/main/tasks/nlp-glove-embeddings-sentence-classification/dataset.py
!wget https://raw.githubusercontent.com/platiagro/tasks/main/tasks/nlp-glove-embeddings-sentence-classification/glove_embeddings.py
!wget https://raw.githubusercontent.com/platiagro/tasks/main/tasks/nlp-glove-embeddings-sentence-classification/model_lightning.py

--2021-01-27 12:20:14--  https://raw.githubusercontent.com/platiagro/tasks/main/tasks/nlp-glove-embeddings-sentence-classification/dataset.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.64.133, 151.101.192.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.64.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 744 [text/plain]
Saving to: ‘dataset.py.17’


2021-01-27 12:20:14 (49.1 MB/s) - ‘dataset.py.17’ saved [744/744]

--2021-01-27 12:20:15--  https://raw.githubusercontent.com/platiagro/tasks/main/tasks/nlp-glove-embeddings-sentence-classification/glove_embeddings.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.64.133, 151.101.192.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.64.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3103 (3.0K) [text/plain]
Saving to: ‘g

In [5]:
%%writefile Model.py
import logging
import os
import pickle
from typing import Dict, Iterable, List, Union

import numpy as np
import pandas as pd
import pytorch_lightning as pl
import torch
from dataset import MyDataset
from model_lightning import GloveFinetuner
from glove_embeddings import GloveEmbeddings
from platiagro import load_model
from pytorch_lightning.callbacks import ModelCheckpoint

logger = logging.getLogger(__name__)


class Model(object):
    def __init__(self, dataset: str = None, target: str = None):
        
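        # Load artifacts: estimator, etc.
        artifacts_file_name = "artifacts.p"
        # Use a context manager so the file handle is closed after loading
        with open(f"/tmp/data/{artifacts_file_name}", "rb") as artifacts_file:
            artifacts = pickle.load(artifacts_file)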
        
        self.hyperparams = artifacts["hyperparams"]
        self.model_parameters = artifacts["model_parameters"]
        self.dataset_infos = artifacts["dataset_infos"]
        self.extra_infos = artifacts["extra_infos"]
        self.deployment_infos = artifacts["deployment_infos"]
        self.lightning_configs = artifacts["lightning_configs"]  
        
        # Load the GloveEmbeddings class
        self.glove_embeddings = self.load_class_glove_embeddings()

        # Load the model weights
        self.model = self.load_model()

        # Put the model in evaluation mode to make predictions
        self.model.eval()

    def load_model(self):
        
        dataset_infos = {
            "all_data": self.dataset_infos["all_data"],
            "CustomDataset": MyDataset,
        }
        model = GloveFinetuner.load_from_checkpoint(
            checkpoint_path=self.deployment_infos["checkpoint_path"],
            hyperparams=self.hyperparams,
            model_parameters=self.model_parameters,
            dataset_infos=dataset_infos,
            extra_infos=self.extra_infos,
        )
        
        return model
    
    def load_class_glove_embeddings(self):
        
        glove_embeddings = GloveEmbeddings(
            glove_dim=self.deployment_infos["glove_dim"],
            glove_weights_file_name=self.deployment_infos["glove_weights_file_name"],
            device=self.deployment_infos["device"],
        )
        
        return glove_embeddings

    def predict(
        self, X: np.ndarray, feature_names: Iterable[str], meta: Dict = None
    ) -> Union[np.ndarray, List, str, bytes]:
        if feature_names:
            # Before passing X to the model, reorder its features
            # to match the column order used during training
            df = pd.DataFrame(X, columns=feature_names)
            X = df[self.deployment_infos["columns"]].to_numpy()
        
        X_inference_glove_ids, X_inference_glove_words = self.glove_embeddings.build_glove_matrix(X)
        result = self.model.predict(X_inference_glove_ids, X_inference_glove_words)
        return result

Overwriting Model.py
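The feature-reordering step inside `predict` can be tried in isolation. The sketch below uses made-up data; `training_columns` is a hypothetical stand-in for `deployment_infos["columns"]`, and the `id` column is an assumed extra field that the request happens to carry.

```python
import numpy as np
import pandas as pd

# Hypothetical column order saved at training time
# (stands in for deployment_infos["columns"]).
training_columns = ["text"]

# Incoming request: the training column plus an extra one, in a different order.
feature_names = ["id", "text"]
X = np.array([[1, "great film"], [2, "boring plot"]], dtype=object)

# Label the raw array with the request's column names, then select
# the training columns in their original order.
df = pd.DataFrame(X, columns=feature_names)
X_ordered = df[training_columns].to_numpy()

print(X_ordered.shape)           # (2, 1)
print(X_ordered[:, 0].tolist())  # ['great film', 'boring plot']
```

Selecting by name drops any column not seen in training and keeps the remaining ones in the order the model expects, regardless of how the request ordered them.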


In [6]:
%run Model.py

In [None]:
from Model import Model

inferenceModel = Model()
X_test = inferenceModel.deployment_infos["X_test"]

INFO:gensim.models.keyedvectors:loading projection weights from ./glove_s300_ingles.txt
INFO:gensim.models.keyedvectors:loading projection weights from ./glove_s300_ingles.txt


In [None]:
X_test = np.array(
    [["Un-bleeping-believable! Meg Ryan doesn't even look her usual ,pert lovable self in this, which normally makes me forgive her shallow ticky acting schtick. Hard to believe she was the producer on this dog. Plus Kevin Kline: what kind of suicide trip has his career been on? Whoosh... Banzai!!! Finally this was directed by the guy who did Big Chill? Must be a replay of Jonestown - hollywood style. Wooofff!"]]
)

In [None]:
resultado = inferenceModel.predict(X_test, None)
resultado