# Computational Systems Project A-2018: Dataset preprocessing
---

The data collected through the Google Forms [form](https://goo.gl/forms/1p8IwDXPxwXGKYq92) need some preprocessing, which goes through the following stages:



1.   **Extraction of the data of interest**: the survey also collected data for a different project, so only the relevant data must be extracted, namely the answers to questions 4, 5, and 6.
2.   **Cleaning and tokenization**: sentences are split into words (tokens), and words with no specific semantic meaning for the sentiments under consideration (stopwords) are removed.
3.   **Vector representation of the vocabulary**: sentences must be represented as *n*-dimensional vectors, so the vector representation of each word must be found first. The [Word2Vec](https://en.wikipedia.org/wiki/Word2vec) model is used for this.
4.   **Vector representation of the sentences**: once the word vectors are available, a sentence vector is obtained by combining the vectors of the words it contains, for example with a *coordinate-by-coordinate* average. Here a weighted average is used, with weights given by [tf-idf](https://es.wikipedia.org/wiki/Tf-idf).


## Installing some required modules
---



In [1]:
!pip install gensim  # For the Word2Vec model
!pip install tqdm    # Just for using a progress bar
!pip install bokeh   # For graphs
!pip install unidecode  # For removing accents


## Importing all required modules
---


In [1]:
import pandas as pd
import numpy as np
from copy import deepcopy
from string import punctuation
from random import shuffle
import io
import csv

import gensim
from gensim.models.word2vec import Word2Vec
from gensim.utils import simple_preprocess

from tqdm import tqdm

import nltk

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE
from sklearn.preprocessing import scale
from sklearn import svm

from google.colab import files

# importing bokeh library for interactive dataviz
import bokeh.plotting as bp
from bokeh.models import HoverTool, BoxSelectTool
from bokeh.plotting import figure, show, output_notebook

from unidecode import unidecode

import time

pd.options.mode.chained_assignment = None
tqdm.pandas(desc="progress-bar")
nltk.download('stopwords')


[nltk_data] Downloading package stopwords to
[nltk_data]     /home/jhonathanabreu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

## Uploading the dataset to the server
---

This notebook was originally created to run on Google Colaboratory, so the dataset file must first be uploaded to its servers. The dataset can be found [here](https://drive.google.com/open?id=1ib9bswNfSrqHCiyQyFDhEDPNKwuSjV4y).

In [3]:
!ls
#!rm *.csv
!ls

datalab  nltk_data
datalab  nltk_data


In [3]:
uploaded = files.upload()

Saving labeled_training_data.csv to labeled_training_data.csv


## Loading the dataset
---

The next step is to load the dataset, in this case into a pandas DataFrame. The data are cleaned at the same time as they are ingested.

### Dataset cleaning and tokenization

This is done by the `clean_sentence` function, which:

*   Uses the Spanish **stopwords** corpus provided by **nltk** to remove words without significant semantic meaning.
*   At the same time, tokenizes the sentences into words with **gensim**'s `simple_preprocess` function, which also performs some basic cleanup: it lowercases the text and removes punctuation.
*   Removes accents from the remaining words with `unidecode` (for example, *cariño* becomes *carino*).



In [2]:
stopWords = nltk.corpus.stopwords.words('spanish')

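def clean_sentence(sentence):
    # simple_preprocess lowercases, tokenizes, and drops punctuation; stopwords
    # are then filtered out and the remaining words stripped of accents
    tokens = [unidecode(word) for word in simple_preprocess(sentence)
              if word not in stopWords]
    return tokens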
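Outside the notebook (without gensim, nltk, or unidecode installed), the same cleaning steps can be approximated with the standard library alone. This is a sketch under assumptions: the regular expression only approximates `simple_preprocess` (which, by default, also drops tokens shorter than 2 or longer than 15 characters), `unicodedata` stands in for `unidecode`, and `STOPWORDS` is a small hypothetical list rather than nltk's Spanish corpus.

```python
import re
import unicodedata

# Hypothetical, tiny stopword list for illustration only; the notebook uses
# nltk.corpus.stopwords.words('spanish') instead.
STOPWORDS = {"me", "muy", "de", "la", "el", "y", "en", "que", "por"}

def strip_accents(word):
    # Decompose accented characters (e.g. "ñ" -> "n" + combining tilde) and
    # drop the combining marks, roughly what unidecode does for Spanish text.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def clean_sentence_sketch(sentence, min_len=2, max_len=15):
    # Lowercase and keep runs of letters only, which drops punctuation and
    # digits (an approximation of gensim's simple_preprocess).
    words = re.findall(r"[^\W\d_]+", sentence.lower())
    # Filter stopwords before stripping accents, as clean_sentence does.
    return [strip_accents(w) for w in words
            if min_len <= len(w) <= max_len and w not in STOPWORDS]

print(clean_sentence_sketch("Me siento muy feliz, ¡gracias por el cariño!"))
```

Filtering before removing accents matters because nltk's Spanish stopwords keep their accents (e.g. *más*), so the comparison has to happen on the original spelling.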
        

### Loading (ingesting) the dataset into the DataFrame

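This task is performed by the `ingest` function, which loads, cleans, and tokenizes the CSV dataset into a pandas DataFrame. The CSV file is expected to have no header and two columns, the sentence and its label; the labels `positivo`, `neutral`, and `negativo` are mapped to 1, 0, and -1: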

In [3]:
def ingest(datasetFileName):
    data = pd.read_csv(datasetFileName, header = None)
    data.columns = ['sentence', 'sentiment']
    data['sentiment'] = data['sentiment'].map({
                                                'positivo': 1,
                                                'neutral': 0,
                                                'negativo': -1
                                              })
    data['tokens'] = data['sentence'].progress_map(clean_sentence)
    data.reset_index(inplace = True)
    data.drop('index', axis = 1, inplace = True)
    print('dataset loaded with shape', data.shape)
    return data

data = ingest('labeled_training_data.csv')
data.head(5)

FileNotFoundError: File b'labeled_training_data.csv' does not exist
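The error above only means that the CSV had not been uploaded in that run. The format `ingest` expects can be checked with an in-memory file instead. In this sketch the three rows are hypothetical; it reproduces the label mapping of `ingest` and shows that any label outside the mapping, such as a capitalized `Neutral`, silently becomes `NaN`. Normalizing the labels first with `str.strip().str.lower()` is one possible safeguard, not something the original notebook does.

```python
import io

import pandas as pd

# Hypothetical CSV with the layout ingest expects: no header,
# first column the sentence, second column the label.
csvText = ('Hoy me siento muy bien,positivo\n'
           '"No me gusta, para nada",negativo\n'
           'Es un dia normal,Neutral\n')

data = pd.read_csv(io.StringIO(csvText), header = None)
data.columns = ['sentence', 'sentiment']
labels = {'positivo': 1, 'neutral': 0, 'negativo': -1}

# Same mapping as ingest: unmatched labels become NaN
strict = data['sentiment'].map(labels)
print(strict.isna().sum(), 'label(s) could not be mapped')

# Possible safeguard: normalize whitespace and case before mapping
normalized = data['sentiment'].str.strip().str.lower().map(labels)
print(normalized.tolist())
```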

## Creating the Word2Vec model
---

The **gensim** implementation will be used. First, however, the dataset must be split into training and test sets in an 80:20 ratio (`test_size = 0.2`):

In [None]:
trainSentences, testSentences, trainLabels, testLabels = \
    train_test_split(np.array(data.tokens),
                     np.array(data.sentiment), test_size = 0.2)
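With the 219 labeled sentences of this run (175 for training plus 44 for testing, as the outputs further down show), `test_size = 0.2` gives the sizes computed below. The helper `split_sizes` is hypothetical; it assumes scikit-learn's rounding rule for a fractional `test_size`, namely that the test size is rounded up and the remaining samples go to training.

```python
import math

def split_sizes(n_samples, test_size):
    # Round the test set size up and give the remaining samples to training.
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test

print(split_sizes(219, 0.2))  # 80:20 split, as in the cell above
print(split_sizes(219, 0.3))  # a 70:30 split, for comparison
```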

Internally, the Word2Vec model uses a two-layer neural network, but knowing this is not necessary to use it (more information [here](https://en.wikipedia.org/wiki/Word2vec) and [here](https://www.tensorflow.org/tutorials/word2vec)). What matters is to give the model the dimension of the vectors that will represent the words, along with other important parameters:

### Vector dimension

The quality of the word vectors generally improves as their dimension grows, but the marginal gain diminishes beyond a certain point. Typical dimensions lie in the interval [100, 1000]. A dimension of 200 was initially chosen, while the cell below sets `vectorDimension = 900`; this and the other parameters should be evaluated to optimize the model.

### Context window

This sets how many words to the left and to the right of a word are taken as its context. The average sentence length still has to be studied, but since the survey asked for answers of at least 8 words, fewer once stopwords are removed, a window of 10 is used for now, so that every word of a sentence of up to 11 tokens falls within the context of every other word.
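To make the window size concrete, the hypothetical helper below lists the context of one word in a tokenized sentence (the tokens are made up). It is a simplified sketch: gensim, like the original word2vec tool, by default samples a smaller effective window for each target word, so `window` acts as an upper bound rather than a fixed size.

```python
def context_words(tokens, center, window):
    # Up to `window` tokens on each side of tokens[center], excluding it.
    left = tokens[max(0, center - window):center]
    right = tokens[center + 1:center + 1 + window]
    return left + right

tokens = ["hoy", "siento", "feliz", "gracias", "apoyo", "familia", "amigos", "semana"]
print(context_words(tokens, 2, 2))
# With window = 10, an 8-token sentence is entirely inside the context
print(len(context_words(tokens, 0, 10)))
```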

In [52]:
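vectorDimension = 900
# Create the model without a corpus so that training happens only once, in the
# explicit train() call below (passing the sentences here would already train
# it for the default 5 epochs).
wordsModel = Word2Vec(vector_size = vectorDimension,  # `size` in gensim < 4.0
                      min_count = 3, window = 10)
wordsModel.build_vocab(trainSentences)
wordsModel.train([train for train in tqdm(trainSentences)],
                 total_examples = len(trainSentences),
                 epochs = 15)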

100%|██████████| 175/175 [00:00<00:00, 522794.30it/s]


(838, 11370)

### Getting the vector representation of a word

In [10]:
wordsModel.wv['bien']


array([-0.04942068, -0.17572436, -0.04004598, -0.01037453,  0.08782685,
       -0.31138816,  0.11382303, -0.23199888, -0.04570371, -0.00610333,
        0.06404767, -0.03375966, -0.03987723, -0.04294056, -0.08080643,
        0.05017672, -0.14911306, -0.18980563, -0.16492441, -0.25816724,
        0.14209741, -0.03908285, -0.33065113, -0.02307731, -0.10348456,
       -0.02836516,  0.0405107 ,  0.00372367,  0.01908221,  0.10175536,
       -0.11912344, -0.06483073,  0.01958265, -0.15317723,  0.21781647,
       -0.08104958,  0.1236623 ,  0.12536101, -0.07601602, -0.00719716,
       -0.21463662,  0.02859756,  0.11575279, -0.13655387,  0.02341248,
       -0.03828403, -0.08307426,  0.01724051, -0.04645931,  0.17824952,
        0.13580579, -0.00327414, -0.296602  ,  0.0008081 ,  0.17424375,
        0.22677292, -0.17102948, -0.3057054 , -0.14658427, -0.16912842,
        0.01301437, -0.15982015,  0.33561146, -0.00581877, -0.23739915,
        0.18927808, -0.21967238,  0.01925349, -0.03716213,  0.41

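As can be seen, the result is a dense vector whose length equals `vectorDimension` (200 in the original plan; the training cell above sets it to 900).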

### Finding the most similar words

In [8]:
wordsModel.wv.most_similar('carino')

KeyError: ignored

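Clearly, a much larger corpus is needed for this kind of model: the query word *carino* (*cariño* after accent removal) is not in the model's vocabulary, most likely because it appears fewer than `min_count = 3` times in the training sentences, so `most_similar` raises a `KeyError`.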
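`most_similar` ranks words by the cosine similarity between their vectors. The sketch below reproduces that ranking with NumPy on hypothetical 3-dimensional vectors and returns an empty list instead of raising `KeyError` for an unknown word. With gensim, the equivalent guard is to check `'carino' in wordsModel.wv` before calling `most_similar`.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings; a trained model has vectorDimension components.
embeddings = {
    "feliz":    np.array([0.9, 0.1, 0.0]),
    "contento": np.array([0.8, 0.2, 0.1]),
    "triste":   np.array([-0.7, 0.1, 0.2]),
    "mesa":     np.array([0.0, 0.1, 0.9]),
}

def most_similar(word, vectors, topn = 2):
    if word not in vectors:
        # gensim raises KeyError here; this sketch returns an empty list instead
        return []
    query = vectors[word] / np.linalg.norm(vectors[word])
    scores = [(other, float(query @ vec / np.linalg.norm(vec)))
              for other, vec in vectors.items() if other != word]
    # Highest cosine similarity first
    return sorted(scores, key = lambda pair: pair[1], reverse = True)[:topn]

print(most_similar("feliz", embeddings))
print(most_similar("carino", embeddings))
```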

## Graphical validation of the Word2Vec model
---

To do this, the word vectors are reduced to 2 dimensions with t-SNE so that they can be shown in a 2D plot.

In [53]:
# defining the chart
output_notebook()
plot_tfidf = bp.figure(width = 700, height = 600,
                       title = 'Map of word vectors',
                       tools = 'pan, wheel_zoom, box_zoom, reset, hover, save',
                       x_axis_type = None, y_axis_type = None, min_border = 1)

# getting a list of word vectors (at most 10000), each with vectorDimension components
vocabulary = list(wordsModel.wv.index_to_key)[:10000]  # wv.vocab in gensim < 4.0
word_vectors = np.array([wordsModel.wv[w] for w in vocabulary])

# dimensionality reduction: converting the vectors to 2D vectors
tsne_model = TSNE(n_components = 2, verbose = 1, random_state = 0)
tsne_w2v = tsne_model.fit_transform(word_vectors)

# putting everything in a dataframe
tsne_df = pd.DataFrame(tsne_w2v, columns = ['x', 'y'])
tsne_df['words'] = vocabulary

# plotting: the corresponding word appears when hovering over a data point
plot_tfidf.scatter(x = 'x', y = 'y', source = tsne_df)
hover = plot_tfidf.select(dict(type = HoverTool))
hover.tooltips = [("word", "@words")]
show(plot_tfidf)



[t-SNE] Computing 44 nearest neighbors...
[t-SNE] Indexed 45 samples in 0.000s...
[t-SNE] Computed neighbors for 45 samples in 0.001s...
[t-SNE] Computed conditional probabilities for sample 45 / 45
[t-SNE] Mean sigma: 0.002646
[t-SNE] KL divergence after 250 iterations with early exaggeration: 53.117191
[t-SNE] Error after 1000 iterations: 0.570402


The resulting point cloud can be inspected for clusters of similar words. However, as already mentioned, a larger vocabulary and longer sentences are needed: only 45 words reach the `min_count` threshold, as the t-SNE output shows.

## Vector representation of sentences
---

It is now time to obtain the vector representation of the sentences, which completes the preparation of the data to be fed to the classifier. As mentioned earlier, a weighted average of the vectors of the words in each sentence will be used. The weights are obtained with the [Tf-idf](https://es.wikipedia.org/wiki/Tf-idf) metric, which, roughly speaking, measures how important a word is in a sentence relative to the whole corpus.
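The difference between a plain coordinate-by-coordinate average and a weighted average can be seen on a toy example. The vectors and weights below are hypothetical; `np.average` divides the weighted sum by the sum of the weights.

```python
import numpy as np

# Hypothetical 2-dimensional vectors for the three words of a sentence
wordVectors = np.array([[1.0, 0.0],
                        [0.0, 1.0],
                        [1.0, 1.0]])
# Hypothetical weights: the second word matters most, the third not at all
weights = np.array([1.0, 3.0, 0.0])

plainMean = wordVectors.mean(axis = 0)                          # both coordinates equal 2/3
weightedMean = np.average(wordVectors, axis = 0, weights = weights)  # (1*[1,0] + 3*[0,1]) / 4

print(plainMean)
print(weightedMean)
```

The weighted version is pulled toward the vector of the word with the largest weight, which is the intended effect of weighting by tf-idf.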

### Building the tf-idf weight matrix

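This is done with **sklearn**'s `TfidfVectorizer`. Since the sentences are already tokenized, the identity function is passed as `analyzer`. Note that only the inverse document frequency (`vectorizer.idf_`) is kept as each word's weight; a word repeated in a sentence still counts once per occurrence when the word vectors are summed: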

In [54]:
print('building tf-idf matrix ...')
vectorizer = TfidfVectorizer(analyzer = lambda x: x)
matrix = vectorizer.fit_transform([sentence
                                   for sentence in tqdm(trainSentences)])
tfidf = dict(zip(vectorizer.get_feature_names_out(),  # get_feature_names() before scikit-learn 1.0
                 vectorizer.idf_))
print('vocab size :', len(tfidf))

100%|██████████| 175/175 [00:00<00:00, 644427.74it/s]

building tf-idf matrix ...
vocab size : 518
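For reference, with its default settings (`smooth_idf=True`), `TfidfVectorizer` computes the idf of a term t as ln((1 + n) / (1 + df(t))) + 1, where n is the number of sentences and df(t) is the number of sentences containing t. The standard-library sketch below (the helper `smooth_idf` and the toy corpus are hypothetical) reproduces that formula, so a word that appears in every sentence gets the minimum weight of 1.

```python
import math

def smooth_idf(documents):
    # documents: list of token lists. Returns {term: idf} using
    # idf = ln((1 + n) / (1 + df)) + 1, scikit-learn's smoothed idf.
    n = len(documents)
    df = {}
    for tokens in documents:
        for term in set(tokens):  # count each term once per document
            df[term] = df.get(term, 0) + 1
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

docs = [["feliz", "dia"], ["triste", "dia"], ["dia", "normal"]]
idf = smooth_idf(docs)
print(idf["dia"])    # appears in every sentence: minimum weight
print(idf["feliz"])  # appears in one sentence out of three
```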





### Sentence-to-vector conversion

The following function, `buildSentenceVector`, builds the vector that represents a sentence: it adds up the vectors of the sentence's words, each multiplied by its idf weight, and divides the sum by the number of words found.

> **Note:** words that are not in the Word2Vec vocabulary (or have no idf weight) are silently skipped, and a sentence with none of its words in the vocabulary becomes the zero vector. Whether this is the right way to handle out-of-vocabulary words still needs to be studied.
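Written out, for a sentence with tokens $t_1, \dots, t_n$, let $K$ be the set of positions $i$ whose token has both a Word2Vec vector $\mathbf{v}_{t_i}$ and an idf weight $\mathrm{idf}(t_i)$. The function below then computes:

$$
\mathbf{s} =
\begin{cases}
\dfrac{1}{|K|} \displaystyle\sum_{i \in K} \mathrm{idf}(t_i)\, \mathbf{v}_{t_i} & \text{if } K \neq \emptyset, \\[1ex]
\mathbf{0} & \text{otherwise.}
\end{cases}
$$

Because it divides by $|K|$ rather than by $\sum_{i \in K} \mathrm{idf}(t_i)$, the result is an idf-scaled mean rather than a normalized weighted average. Dividing by the sum of the weights would be the alternative if a proper weighted mean is intended.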



In [None]:
def buildSentenceVector(tokens, size):
    vec = np.zeros(size).reshape((1, size))
    count = 0.
    for word in tokens:
        try:
            vec += wordsModel.wv[word].reshape((1, size)) * tfidf[word]
            count += 1.
        except KeyError: # the token has no Word2Vec vector (e.g. an unseen
                         # test word) or no idf weight: skip it
            continue
    if count != 0:
        vec /= count
    return vec

Finally, every sentence in the training and test corpora is converted into a vector, and the resulting vectors are standardized:

In [56]:
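trainSentencesVectors = np.concatenate([buildSentenceVector(w, vectorDimension)
                                        for w in tqdm(trainSentences)])
testSentencesVectors = np.concatenate([buildSentenceVector(w, vectorDimension)
                                       for w in tqdm(testSentences)])

# Standardize with the mean and standard deviation of the training set only,
# so the test vectors are transformed exactly like the training vectors.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler().fit(trainSentencesVectors)
trainSentencesVectors = scaler.transform(trainSentencesVectors)
testSentencesVectors = scaler.transform(testSentencesVectors)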

  
100%|██████████| 175/175 [00:00<00:00, 11896.52it/s]
100%|██████████| 44/44 [00:00<00:00, 11996.19it/s]
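`sklearn.preprocessing.scale` standardizes each column with the mean and standard deviation of the array it is given, so calling it separately on the training and test vectors transforms them differently. A minimal NumPy sketch with hypothetical one-feature data shows the effect: test vectors scaled with the training statistics keep their offset relative to the training data, while scaling them with their own statistics erases it. Fitting `sklearn.preprocessing.StandardScaler` on the training vectors and reusing it for the test vectors is the usual way to get the former behavior.

```python
import numpy as np

# Hypothetical one-feature "sentence vectors"
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[3.0], [5.0]])

# Statistics of the training set only (population std, like StandardScaler)
mean, std = train.mean(axis = 0), train.std(axis = 0)
trainScaled = (train - mean) / std
testScaled = (test - mean) / std  # same transformation as the training data

# Scaling the test set with its own statistics hides that it lies above the training data
testSelfScaled = (test - test.mean(axis = 0)) / test.std(axis = 0)

print(testScaled.ravel())
print(testSelfScaled.ravel())
```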


A small and very simple check: the number of sentences in each corpus must equal the number of vectors obtained for it:

In [57]:
print('Longitud de la lista de oraciones de entrenamiento:',
      len(trainSentences))
print('Longitud de la lista de vectores de entrenamiento:',
      len(trainSentencesVectors))
print('Longitud de la lista de oraciones de prueba:',
      len(testSentences))
print('Longitud de la lista de vectores de prueba:',
      len(testSentencesVectors))

Longitud de la lista de oraciones de entrenamiento: 175
Longitud de la lista de vectores de entrenamiento: 175
Longitud de la lista de oraciones de prueba: 44
Longitud de la lista de vectores de prueba: 44


## Future work
---

*   Ways to **validate** this type of model need to be studied since, at the end of the day, Word2Vec is an unsupervised machine learning method, and it must be validated and potentially **optimized**.
*   Important issues must be studied and resolved, such as the handling of sentences containing words that do not appear in the training vocabulary.





## Selecting the best parameters for the SVM
---

In [None]:
def compareSVMClassifiers(classifiers, xTrain, yTrain, xTest, yTest,
                          verbose = True):
    """Fit each classifier and tabulate train/test accuracy and fit time."""
    # Column labels: classifier, training accuracy, test accuracy, training time (s)
    columnNames = ['Clasificador', 'Exactitud Entrenamiento',
                   'Exactitud Prueba', 'Tiempo Entrenamiento']
    rows = []
    for key, classifier in classifiers.items():
        tic = time.perf_counter()  # time.clock() was removed in Python 3.8
        classifier.fit(xTrain, yTrain)
        elapsedTime = time.perf_counter() - tic
        trainAccuracy = classifier.score(xTrain, yTrain)
        testAccuracy = classifier.score(xTest, yTest)
        rows.append([key, trainAccuracy, testAccuracy, elapsedTime])
        if verbose:
            print("{c} trained in {f:.2f} s".format(c = key, f = elapsedTime))

    return pd.DataFrame(rows, columns = columnNames)

In [59]:
# Linear kernel
parameters = [{'kernel': ['linear'], 'C': [0.01, 0.1, 1, 10, 100]}]
optimalLinearSVM = GridSearchCV(svm.SVC(decision_function_shape = 'ovr'),
                                parameters, cv = 5, n_jobs = 8, verbose = 10)
optimalLinearSVM.fit(trainSentencesVectors, trainLabels);

Fitting 5 folds for each of 5 candidates, totalling 25 fits
[CV] C=0.1, kernel=linear ............................................
[CV] C=0.01, kernel=linear ...........................................
[CV] C=0.01, kernel=linear ...........................................
[CV] C=0.01, kernel=linear ...........................................
[CV] C=0.01, kernel=linear ...........................................
[CV] C=0.1, kernel=linear ............................................
[CV] C=0.1, kernel=linear ............................................
[CV] C=0.01, kernel=linear ...........................................
[CV] .. C=0.01, kernel=linear, score=0.4857142857142857, total=   0.1s
[CV] ... C=0.1, kernel=linear, score=0.5277777777777778, total=   0.2s
[CV] .. C=0.01, kernel=linear, score=0.5714285714285714, total=   0.2s
[CV] C=0.1, kernel=linear ............................................
[CV] C=1, kernel=linear ..............................................
[CV] C=0.1, kerne

[Parallel(n_jobs=8)]: Done   2 tasks      | elapsed:    0.3s
[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.5s


[CV] C=10, kernel=linear .............................................
[CV] .................. C=0.1, kernel=linear, score=0.5, total=   0.2s
[CV] ... C=0.1, kernel=linear, score=0.5142857142857142, total=   0.2s
[CV] ..... C=1, kernel=linear, score=0.5428571428571428, total=   0.2s
[CV] C=10, kernel=linear .............................................
[CV] ..... C=1, kernel=linear, score=0.5277777777777778, total=   0.2s
[CV] C=10, kernel=linear .............................................
[CV] C=100, kernel=linear ............................................
[CV] C=10, kernel=linear .............................................
[CV] ..... C=1, kernel=linear, score=0.4857142857142857, total=   0.2s
[CV] .... C=10, kernel=linear, score=0.5277777777777778, total=   0.2s
[CV] ..... C=1, kernel=linear, score=0.5714285714285714, total=   0.2s
[CV] C=100, kernel=linear ............................................
[CV] C=100, kernel=linear ............................................
[CV] C

[Parallel(n_jobs=8)]: Done  13 out of  25 | elapsed:    0.6s remaining:    0.5s
[Parallel(n_jobs=8)]: Done  16 out of  25 | elapsed:    0.6s remaining:    0.4s


[CV] .... C=10, kernel=linear, score=0.5428571428571428, total=   0.2s
[CV] ... C=10, kernel=linear, score=0.47058823529411764, total=   0.2s
[CV] .... C=10, kernel=linear, score=0.5428571428571428, total=   0.3s
[CV] ... C=100, kernel=linear, score=0.4857142857142857, total=   0.3s


[Parallel(n_jobs=8)]: Done  19 out of  25 | elapsed:    0.9s remaining:    0.3s


[CV] .. C=100, kernel=linear, score=0.47058823529411764, total=   0.7s


[Parallel(n_jobs=8)]: Done  22 out of  25 | elapsed:    1.5s remaining:    0.2s


[CV] ... C=100, kernel=linear, score=0.5714285714285714, total=  12.2s
[CV] ... C=100, kernel=linear, score=0.5428571428571428, total=  14.5s
[CV] ... C=100, kernel=linear, score=0.5277777777777778, total=  20.6s


[Parallel(n_jobs=8)]: Done  25 out of  25 | elapsed:   21.2s remaining:    0.0s
[Parallel(n_jobs=8)]: Done  25 out of  25 | elapsed:   21.2s finished


In [60]:
# Radial basis function (RBF) kernel
parameters = [{'kernel': ['rbf'], 'gamma': [1e-4, 1e-3, 1e-2, 1e-1],
                'C': [0.01, 0.1, 1, 10, 100]}]
optimalRadialSVM = GridSearchCV(svm.SVC(decision_function_shape = 'ovr'),
                                parameters, cv = 5, n_jobs = 8, verbose = 10)
optimalRadialSVM.fit(trainSentencesVectors, trainLabels);

Fitting 5 folds for each of 20 candidates, totalling 100 fits
[CV] C=0.01, gamma=0.001, kernel=rbf .................................
[CV] C=0.01, gamma=0.0001, kernel=rbf ................................
[CV] C=0.01, gamma=0.0001, kernel=rbf ................................
[CV] C=0.01, gamma=0.0001, kernel=rbf ................................
[CV] C=0.01, gamma=0.001, kernel=rbf .................................
[CV] C=0.01, gamma=0.001, kernel=rbf .................................
[CV] C=0.01, gamma=0.0001, kernel=rbf ................................
[CV] C=0.01, gamma=0.0001, kernel=rbf ................................
[CV]  C=0.01, gamma=0.001, kernel=rbf, score=0.5277777777777778, total=   0.2s
[CV]  C=0.01, gamma=0.001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.01, gamma=0.001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV] C=0.01, gamma=0.001, kernel=rbf .................................
[CV]  C=0.01, gamma=0.0001, kernel=rbf, score=0.342857142857

[Parallel(n_jobs=8)]: Done   2 tasks      | elapsed:    0.4s


[CV]  C=0.01, gamma=0.01, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.01, gamma=0.001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.01, gamma=0.01, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.01, gamma=0.01, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV] C=0.01, gamma=0.1, kernel=rbf ...................................
[CV]  C=0.01, gamma=0.01, kernel=rbf, score=0.35294117647058826, total=   0.2s
[CV] C=0.01, gamma=0.1, kernel=rbf ...................................
[CV] C=0.01, gamma=0.1, kernel=rbf ...................................
[CV] C=0.01, gamma=0.1, kernel=rbf ...................................
[CV] C=0.1, gamma=0.0001, kernel=rbf .................................
[CV] ........ C=0.01, gamma=0.01, kernel=rbf, score=0.5, total=   0.2s
[CV] C=0.1, gamma=0.0001, kernel=rbf .................................
[CV]  C=0.01, gamma=0.001, kernel=rbf, score=0.35294117647058826, total=   0.2s
[CV] C=0.1, gamma=0.0001, k

[Parallel(n_jobs=8)]: Done   9 tasks      | elapsed:    0.7s
[Parallel(n_jobs=8)]: Done  16 tasks      | elapsed:    0.9s


[CV]  C=0.01, gamma=0.1, kernel=rbf, score=0.35294117647058826, total=   0.2s
[CV] C=0.1, gamma=0.0001, kernel=rbf .................................
[CV]  C=0.01, gamma=0.1, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.1, gamma=0.0001, kernel=rbf, score=0.3333333333333333, total=   0.2s
[CV] C=0.1, gamma=0.001, kernel=rbf ..................................
[CV] C=0.1, gamma=0.001, kernel=rbf ..................................
[CV]  C=0.01, gamma=0.1, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.1, gamma=0.0001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.01, gamma=0.1, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV] C=0.1, gamma=0.001, kernel=rbf ..................................
[CV] C=0.1, gamma=0.001, kernel=rbf ..................................
[CV] C=0.1, gamma=0.001, kernel=rbf ..................................
[CV]  C=0.1, gamma=0.0001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV] C=0.1, gamma=0.01,

[Parallel(n_jobs=8)]: Done  25 tasks      | elapsed:    1.4s


[CV]  C=0.1, gamma=0.01, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.1, gamma=0.01, kernel=rbf, score=0.3142857142857143, total=   0.2s
[CV] C=1, gamma=0.0001, kernel=rbf ...................................
[CV] C=1, gamma=0.0001, kernel=rbf ...................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=0.34285714285714286, total=   0.1s
[CV] .......... C=0.1, gamma=0.1, kernel=rbf, score=0.5, total=   0.2s
[CV] C=1, gamma=0.0001, kernel=rbf ...................................
[CV]  C=0.1, gamma=0.01, kernel=rbf, score=0.23529411764705882, total=   0.2s
[CV] C=1, gamma=0.0001, kernel=rbf ...................................
[CV] C=1, gamma=0.0001, kernel=rbf ...................................
[CV] C=1, gamma=0.001, kernel=rbf ....................................
[CV]  C=0.1, gamma=0.1, kernel=rbf, score=0.23529411764705882, total=   0.1s
[CV] C=1, gamma=0.001, kernel=rbf .....

[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    1.8s


[CV]  C=1, gamma=0.001, kernel=rbf, score=0.5277777777777778, total=   0.2s
[CV]  C=1, gamma=0.0001, kernel=rbf, score=0.3235294117647059, total=   0.2s
[CV]  C=1, gamma=0.0001, kernel=rbf, score=0.3333333333333333, total=   0.2s
[CV] C=1, gamma=0.001, kernel=rbf ....................................
[CV]  C=1, gamma=0.0001, kernel=rbf, score=0.42857142857142855, total=   0.2s
[CV] C=1, gamma=0.001, kernel=rbf ....................................
[CV] C=1, gamma=0.01, kernel=rbf .....................................
[CV] C=1, gamma=0.01, kernel=rbf .....................................
[CV]  C=1, gamma=0.0001, kernel=rbf, score=0.2857142857142857, total=   0.2s
[CV] C=1, gamma=0.01, kernel=rbf .....................................
[CV]  C=1, gamma=0.0001, kernel=rbf, score=0.34285714285714286, total=   0.2s
[CV] C=1, gamma=0.01, kernel=rbf .....................................
[CV]  C=1, gamma=0.001, kernel=rbf, score=0.7142857142857143, total=   0.1s
[CV] C=1, gamma=0.01, kernel=rbf ..

[Parallel(n_jobs=8)]: Done  45 tasks      | elapsed:    2.2s


[CV]  C=1, gamma=0.001, kernel=rbf, score=0.4117647058823529, total=   0.2s
[CV] ........... C=1, gamma=0.01, kernel=rbf, score=0.5, total=   0.2s
[CV]  C=1, gamma=0.01, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV] .......... C=1, gamma=0.001, kernel=rbf, score=0.6, total=   0.2s
[CV] C=1, gamma=0.1, kernel=rbf ......................................
[CV] C=1, gamma=0.1, kernel=rbf ......................................
[CV] C=1, gamma=0.1, kernel=rbf ......................................
[CV]  C=1, gamma=0.01, kernel=rbf, score=0.5142857142857142, total=   0.2s
[CV] C=1, gamma=0.1, kernel=rbf ......................................
[CV]  C=1, gamma=0.01, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV] C=10, gamma=0.0001, kernel=rbf ..................................
[CV] C=10, gamma=0.0001, kernel=rbf ..................................
[CV]  C=1, gamma=0.01, kernel=rbf, score=0.38235294117647056, total=   0.2s
[CV] C=10, gamma=0.0001, kernel=rbf .................

[Parallel(n_jobs=8)]: Done  56 tasks      | elapsed:    2.7s


[CV] C=10, gamma=0.0001, kernel=rbf ..................................
[CV] C=10, gamma=0.001, kernel=rbf ...................................
[CV]  C=10, gamma=0.0001, kernel=rbf, score=0.42857142857142855, total=   0.2s
[CV]  C=10, gamma=0.0001, kernel=rbf, score=0.5277777777777778, total=   0.2s
[CV] C=10, gamma=0.001, kernel=rbf ...................................
[CV] C=10, gamma=0.001, kernel=rbf ...................................
[CV] C=10, gamma=0.001, kernel=rbf ...................................
[CV]  C=1, gamma=0.1, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV] C=10, gamma=0.001, kernel=rbf ...................................
[CV]  C=10, gamma=0.0001, kernel=rbf, score=0.6285714285714286, total=   0.2s
[CV] C=10, gamma=0.01, kernel=rbf ....................................
[CV]  C=10, gamma=0.0001, kernel=rbf, score=0.6285714285714286, total=   0.2s
[CV] C=10, gamma=0.01, kernel=rbf ....................................
[CV]  C=10, gamma=0.0001, kernel=rbf, score=

[Parallel(n_jobs=8)]: Done  69 tasks      | elapsed:    3.3s


[CV]  C=10, gamma=0.01, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV] C=100, gamma=0.0001, kernel=rbf .................................
[CV]  C=10, gamma=0.01, kernel=rbf, score=0.38235294117647056, total=   0.2s
[CV]  C=10, gamma=0.1, kernel=rbf, score=0.4722222222222222, total=   0.2s
[CV]  C=10, gamma=0.1, kernel=rbf, score=0.42857142857142855, total=   0.2s
[CV] C=100, gamma=0.0001, kernel=rbf .................................
[CV] C=100, gamma=0.0001, kernel=rbf .................................
[CV] C=100, gamma=0.0001, kernel=rbf .................................
[CV]  C=10, gamma=0.1, kernel=rbf, score=0.45714285714285713, total=   0.1s
[CV] C=100, gamma=0.0001, kernel=rbf .................................
[CV]  C=10, gamma=0.01, kernel=rbf, score=0.5142857142857142, total=   0.2s
[CV]  C=10, gamma=0.1, kernel=rbf, score=0.5142857142857142, total=   0.1s
[CV] C=100, gamma=0.001, kernel=rbf ..................................
[CV] C=100, gamma=0.001, kernel=rbf ......

[Parallel(n_jobs=8)]: Done  82 tasks      | elapsed:    3.9s


[CV]  C=100, gamma=0.001, kernel=rbf, score=0.6285714285714286, total=   0.1s
[CV] C=100, gamma=0.1, kernel=rbf ....................................
[CV]  C=100, gamma=0.001, kernel=rbf, score=0.47058823529411764, total=   0.2s
[CV] C=100, gamma=0.1, kernel=rbf ....................................
[CV]  C=100, gamma=0.01, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV] ......... C=100, gamma=0.01, kernel=rbf, score=0.5, total=   0.2s
[CV] C=100, gamma=0.1, kernel=rbf ....................................
[CV] C=100, gamma=0.1, kernel=rbf ....................................
[CV]  C=100, gamma=0.01, kernel=rbf, score=0.45714285714285713, total=   0.2s
[CV]  C=100, gamma=0.1, kernel=rbf, score=0.4722222222222222, total=   0.1s
[CV]  C=100, gamma=0.01, kernel=rbf, score=0.5142857142857142, total=   0.2s
[CV]  C=100, gamma=0.01, kernel=rbf, score=0.38235294117647056, total=   0.1s
[CV]  C=100, gamma=0.1, kernel=rbf, score=0.45714285714285713, total=   0.1s
[CV]  C=100, gamma=0.1, 

[Parallel(n_jobs=8)]: Done  96 out of 100 | elapsed:    4.4s remaining:    0.2s
[Parallel(n_jobs=8)]: Done 100 out of 100 | elapsed:    4.4s finished


In [61]:
# Polynomial kernel
parameters = [{'kernel': ['poly'], 'gamma': [1e-4, 1e-3, 1e-2, 1e-1],
                'C': [0.01, 0.1, 1, 10, 100], 'degree': [2, 3, 4]}]
optimalPolySVM = GridSearchCV(svm.SVC(decision_function_shape = 'ovr'),
                              parameters, cv = 5, n_jobs = 8, verbose = 10)
optimalPolySVM.fit(trainSentencesVectors, trainLabels);

Fitting 5 folds for each of 60 candidates, totalling 300 fits
[Parallel(n_jobs=8)]: Done 300 out of 300 | elapsed:   11.4s finished
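The candidate count at the start of the log follows from the grid size. `GridSearchCV` tries every combination of the listed values and fits each combination once per fold. The sketch below enumerates a polynomial-kernel grid with `itertools.product`, which is essentially how scikit-learn's `ParameterGrid` expands a grid. The grid itself is an assumption: C ∈ {0.01, 0.1, 1, 10, 100}, degree ∈ {2, 3, 4}, gamma ∈ {0.0001, 0.001, 0.01, 0.1}. These values are consistent with the 60 candidates reported above, but the notebook's real `parameters` dict is defined in an earlier cell.

```python
from itertools import product

# Assumed grid for the polynomial SVM (hypothetical reconstruction; the real
# `parameters` dict is defined in an earlier cell of the notebook).
parameters = {
    'kernel': ['poly'],
    'C': [0.01, 0.1, 1, 10, 100],
    'degree': [2, 3, 4],
    'gamma': [0.0001, 0.001, 0.01, 0.1],
}

def expand_grid(grid):
    """Return every combination of grid values as a dict (one per candidate)."""
    keys = sorted(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(parameters)
n_folds = 5                      # cv = 5 in the GridSearchCV call
n_fits = len(candidates) * n_folds

print(f'Fitting {n_folds} folds for each of {len(candidates)} candidates, '
      f'totalling {n_fits} fits')
# Fitting 5 folds for each of 60 candidates, totalling 300 fits
```

The cost of the search therefore grows multiplicatively with each new hyperparameter value. Adding one more `degree`, for example, would add 20 candidates and 100 fits.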


In [62]:
print('Mejores parámetros:')
print('  SVM Lineal: ', optimalLinearSVM.best_params_)
print('  SVM Radial: ', optimalRadialSVM.best_params_)
print('  SVM Polinómico: ', optimalPolySVM.best_params_)

Mejores parámetros:
  SVM Lineal:  {'C': 0.01, 'kernel': 'linear'}
  SVM Radial:  {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}
  SVM Polinómico:  {'C': 0.01, 'degree': 2, 'gamma': 0.1, 'kernel': 'poly'}
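A best value that sits on the first or last value of its grid suggests the optimum may lie outside the searched range, so widening the grid in that direction may be worthwhile. The helper below is hypothetical and not part of scikit-learn. It flags such parameters. It is shown with an assumed polynomial grid (C ∈ {0.01, 0.1, 1, 10, 100}, degree ∈ {2, 3, 4}, gamma ∈ {0.0001, 0.001, 0.01, 0.1}, consistent with the 60 candidates of that search) and with the best polynomial parameters printed above. The same check could be applied to the linear and radial searches, whose grids are defined earlier in the notebook.

```python
# Hypothetical helper: report which best hyperparameters fall on the
# first or last value of their numeric grid.
def params_on_grid_edge(best_params, grid):
    edges = []
    for name, value in best_params.items():
        values = grid.get(name, [])
        numeric = [v for v in values if isinstance(v, (int, float))]
        # Only numeric grids with more than one value have meaningful edges.
        if len(numeric) > 1 and value in (min(numeric), max(numeric)):
            edges.append(name)
    return sorted(edges)

# Assumed polynomial-kernel grid (not copied from the notebook).
poly_grid = {
    'kernel': ['poly'],
    'C': [0.01, 0.1, 1, 10, 100],
    'degree': [2, 3, 4],
    'gamma': [0.0001, 0.001, 0.01, 0.1],
}

# Best parameters reported for the polynomial SVM.
best_poly = {'C': 0.01, 'degree': 2, 'gamma': 0.1, 'kernel': 'poly'}

print(params_on_grid_edge(best_poly, poly_grid))
# ['C', 'degree', 'gamma']
```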


In [63]:
# Comparison: rebuild each SVM with the hyperparameters selected by its grid search
classifiers = {
    'SVM Lineal': svm.SVC(**optimalLinearSVM.best_params_),
    'SVM Radial': svm.SVC(**optimalRadialSVM.best_params_),
    'SVM Polinómico': svm.SVC(**optimalPolySVM.best_params_)
}
comparisonResults = compareSVMClassifiers(classifiers,
                                          trainSentencesVectors, trainLabels,
                                          testSentencesVectors, testLabels)
display(comparisonResults.sort_values(by = 'Exactitud Prueba',
                                      ascending = False))

SVM Lineal entrenado en 0.04 s
SVM Radial entrenado en 0.04 s
SVM Polinómico entrenado en 0.04 s


|   | Clasificador | Exactitud Entrenamiento | Exactitud Prueba | Tiempo Entrenamiento |
|---|---|---|---|---|
| 1 | SVM Radial | 0.737143 | 0.386364 | 0.03956 |
| 0 | SVM Lineal | 0.691429 | 0.363636 | 0.036035 |
| 2 | SVM Polinómico | 0.737143 | 0.340909 | 0.039418 |
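The three test accuracies are close together. They are also consistent with a small test set: 17/44 ≈ 0.386364, 16/44 ≈ 0.363636 and 15/44 ≈ 0.340909, so the gap between the best and the worst classifier would amount to about two sentences. The training accuracies match 129/175 and 121/175 in the same way. The sketch below converts the reported accuracies back into counts of correctly classified sentences. The sizes 175 and 44 are an assumption inferred from these numbers, not read from the notebook. The wide gap between training accuracy (about 0.69 to 0.74) and test accuracy (about 0.34 to 0.39) also suggests overfitting.

```python
# Assumed set sizes, inferred from the reported accuracies (not from the notebook).
N_TRAIN = 175
N_TEST = 44

# (training accuracy, test accuracy) as shown in the comparison table.
reported = {
    'SVM Radial': (0.737143, 0.386364),
    'SVM Lineal': (0.691429, 0.363636),
    'SVM Polinómico': (0.737143, 0.340909),
}

def to_counts(acc_pair, n_train=N_TRAIN, n_test=N_TEST):
    """Convert (train, test) accuracies into numbers of correctly classified sentences."""
    train_acc, test_acc = acc_pair
    return round(train_acc * n_train), round(test_acc * n_test)

counts = {name: to_counts(pair) for name, pair in reported.items()}
for name, (train_hits, test_hits) in counts.items():
    print(f'{name}: {train_hits}/{N_TRAIN} train, {test_hits}/{N_TEST} test')
# SVM Radial: 129/175 train, 17/44 test
# SVM Lineal: 121/175 train, 16/44 test
# SVM Polinómico: 129/175 train, 15/44 test
```

With a test set of this size, each sentence is worth about 2.3 percentage points of accuracy. The ranking in the table should therefore be read with caution.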
