# Support Vector Machine (SVM) Assignment - AI Models

### SENTIMENT ANALYSIS

In this natural language processing assignment, we use an SVM-based model to automatically classify a large number of documents according to whether the language used in each one has a positive or negative connotation.

We will use the [Amazon fine food reviews](https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews?resource=download) dataset.

The course unit Applications of Natural Language Processing to Text Classification includes an example of training and using SVMs (Section 4.3).

Web resources:

- https://es.wikipedia.org/wiki/M%C3%A1quinas_de_vectores_de_soporte

- https://neuraldojo.org/proyectos/analisis-de-sentimiento/guia-basica-de-analisis-de-sentimiento-en-python/

In [1]:
# Import libraries

import pandas as pd
from sklearn import model_selection, svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report

In [2]:
# Load the data

df_food_reviews = pd.read_csv('Reviews.csv')
df_food_reviews

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,5,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,1,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,4,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...
3,4,B000UA0QIQ,A395BORC6FGVXV,Karl,3,3,2,1307923200,Cough Medicine,If you are looking for the secret ingredient i...
4,5,B006K2ZZ7K,A1UQRSCLF8GW1T,"Michael D. Bigham ""M. Wassir""",0,0,5,1350777600,Great taffy,Great taffy at a great price. There was a wid...
...,...,...,...,...,...,...,...,...,...,...
568449,568450,B001EO7N10,A28KG5XORO54AY,Lettie D. Carter,0,0,5,1299628800,Will not do without,Great for sesame chicken..this is a good if no...
568450,568451,B003S1WTCU,A3I8AFVPEE8KI5,R. Sawyer,0,0,2,1331251200,disappointed,I'm disappointed with the flavor. The chocolat...
568451,568452,B004I613EE,A121AA1GQV751Z,"pksd ""pk_007""",2,2,5,1329782400,Perfect for our maltipoo,"These stars are small, so you can give 10-15 o..."
568452,568453,B004I613EE,A3IBEVCTXKNOH,"Kathy A. Welch ""katwel""",1,1,5,1331596800,Favorite Training and reward treat,These are the BEST treats for training and rew...


In [3]:
# Drop the rows whose Score is 3 (neutral)
# ratings below 3 = 0 (negative review)
# ratings above 3 = 1 (positive review)

# .copy() makes the filtered frame independent of the original,
# which avoids pandas' SettingWithCopyWarning on the assignments below
df_food_reviews = df_food_reviews[df_food_reviews['Score'] != 3].copy()
df_food_reviews.loc[df_food_reviews['Score'] < 3, 'Score'] = 0
df_food_reviews.loc[df_food_reviews['Score'] > 3, 'Score'] = 1
df_food_reviews

Unnamed: 0,Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
0,1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,1,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned d...
1,2,B00813GRG4,A1D87F6ZCVE5NK,dll pa,0,0,0,1346976000,Not as Advertised,Product arrived labeled as Jumbo Salted Peanut...
2,3,B000LQOCH0,ABXLMWJIXXAIN,"Natalia Corres ""Natalia Corres""",1,1,1,1219017600,"""Delight"" says it all",This is a confection that has been around a fe...
3,4,B000UA0QIQ,A395BORC6FGVXV,Karl,3,3,0,1307923200,Cough Medicine,If you are looking for the secret ingredient i...
4,5,B006K2ZZ7K,A1UQRSCLF8GW1T,"Michael D. Bigham ""M. Wassir""",0,0,1,1350777600,Great taffy,Great taffy at a great price. There was a wid...
...,...,...,...,...,...,...,...,...,...,...
568449,568450,B001EO7N10,A28KG5XORO54AY,Lettie D. Carter,0,0,1,1299628800,Will not do without,Great for sesame chicken..this is a good if no...
568450,568451,B003S1WTCU,A3I8AFVPEE8KI5,R. Sawyer,0,0,0,1331251200,disappointed,I'm disappointed with the flavor. The chocolat...
568451,568452,B004I613EE,A121AA1GQV751Z,"pksd ""pk_007""",2,2,1,1329782400,Perfect for our maltipoo,"These stars are small, so you can give 10-15 o..."
568452,568453,B004I613EE,A3IBEVCTXKNOH,"Kathy A. Welch ""katwel""",1,1,1,1331596800,Favorite Training and reward treat,These are the BEST treats for training and rew...
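
The relabeling step above can also be written as a single boolean comparison. The following is a minimal sketch on a hypothetical toy DataFrame (`toy_reviews` and its contents are made up for illustration and are not part of `Reviews.csv`); it assumes, as in the notebook, that a score of 3 is treated as neutral and dropped.

```python
import pandas as pd

# Toy stand-in for Reviews.csv; only the two columns used here are included.
toy_reviews = pd.DataFrame({
    "Score": [5, 1, 4, 2, 3, 5],
    "Text": ["great", "awful", "good", "bad", "okay", "love it"],
})

# Drop neutral reviews and take an explicit copy, so the assignment below
# modifies an independent DataFrame rather than a slice of the original.
toy_labeled = toy_reviews[toy_reviews["Score"] != 3].copy()

# Boolean comparison converted to int: 1 for scores above 3, 0 otherwise.
toy_labeled["Score"] = (toy_labeled["Score"] > 3).astype(int)

print(toy_labeled["Score"].tolist())  # [1, 0, 1, 0, 1]
```

Because the neutral rows are removed first, every remaining score is either below or above 3, so one comparison is enough to produce both labels.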


### We use the first 10,000 rows for training

In [4]:
datos_entrenamiento = df_food_reviews.head(10000)

In [5]:
# Separate the input (review text) from the output (label)

documentos = datos_entrenamiento['Text'].tolist()
clases = datos_entrenamiento['Score'].tolist()

In [6]:
# Split the labeled collection: 20% test - 80% training

X_train, X_test, y_train, y_test = model_selection.train_test_split(documentos, clases, test_size=0.2)
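
The split above is random, so each run of the notebook produces a different partition. As a hedged sketch of one way to make it reproducible, the example below passes `random_state` and `stratify` to `train_test_split`; the toy documents and labels are hypothetical, and the seed value 42 is an arbitrary choice.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 documents, 16 positive (1) and 4 negative (0).
toy_docs = [f"review {i}" for i in range(20)]
toy_labels = [1, 1, 1, 1, 0] * 4

# random_state fixes the shuffle so the split is the same on every run;
# stratify keeps the class proportions similar in both partitions.
docs_train, docs_test, labels_train, labels_test = train_test_split(
    toy_docs, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)

print(len(docs_train), len(docs_test))
print(Counter(labels_train), Counter(labels_test))
```

Stratification matters most when one class is much rarer than the other, since an unlucky random split could otherwise leave very few negative reviews in the test partition.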

In [7]:
vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights from all 10,000 documents (both splits)
vectorizer.fit(documentos)

# Convert the training documents to TF-IDF vectors
Train_X_Tfidf = vectorizer.transform(X_train)

# Define the SVM model
SVM = svm.SVC(kernel='linear')

# Train the model
SVM.fit(Train_X_Tfidf,y_train)

SVC(kernel='linear')
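
In the cell above, the vectorizer is fitted on every document in `documentos`, including those held out in `X_test`. A common alternative is to wrap the vectorizer and the classifier in a scikit-learn pipeline, so that the vocabulary is learned only from the data passed to `fit`. The sketch below illustrates that idea on a hypothetical four-document corpus. Note that it swaps in `LinearSVC` in place of the notebook's `SVC(kernel='linear')`; both are linear SVMs, but they are different implementations and will not necessarily give identical results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus, not taken from the Amazon dataset.
train_docs = [
    "great taste, would buy again",
    "terrible flavor and stale",
    "delicious and fresh",
    "awful, arrived broken",
]
train_labels = [1, 0, 1, 0]

# The pipeline fits the TF-IDF vocabulary on the documents passed to fit()
# only, then trains the linear SVM on the resulting sparse matrix.
sentiment_model = make_pipeline(TfidfVectorizer(), LinearSVC())
sentiment_model.fit(train_docs, train_labels)

# predict() reuses the fitted vocabulary; words never seen in training are ignored.
predictions = sentiment_model.predict(["fresh and delicious", "stale and awful"])
print(predictions)
```

With a pipeline, the same object handles both vectorization and prediction, which removes the need to call `vectorizer.transform` by hand before each `predict`.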

### We use the last 1,000 rows for prediction

In [8]:
datos_final = df_food_reviews.tail(1000)
datos_pred = datos_final['Text'].tolist()
valor_pred = datos_final['Score'].tolist()

In [9]:
# Use the model to predict whether each review belongs to class 0 (negative) or class 1 (positive)

predictions_SVM = SVM.predict(vectorizer.transform(datos_pred))

In [10]:
# Model evaluation: accuracy (true labels first, predictions second)

accuracy_score(valor_pred, predictions_SVM)

0.892
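
An accuracy figure is easier to interpret next to a simple reference point. One common choice is the majority-class baseline, which always predicts the most frequent label. The sketch below computes it in plain Python on hypothetical labels (`true_labels` is made up for illustration and does not reproduce the notebook's data).

```python
from collections import Counter

# Hypothetical labels for illustration; not the notebook's actual data.
true_labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]

# The majority-class baseline always predicts the most frequent label,
# so its accuracy equals that label's share of the data.
majority_label, majority_count = Counter(true_labels).most_common(1)[0]
baseline_accuracy = majority_count / len(true_labels)

print(majority_label, baseline_accuracy)  # 1 0.8
```

When the classes are imbalanced, a model only adds value to the extent that its accuracy exceeds this baseline.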

In [11]:
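# Note: classification_report expects (y_true, y_pred). The arguments here are
# passed in the opposite order, so in the report below the precision and recall
# columns are exchanged and support counts predicted labels, not true labels.
print(classification_report(predictions_SVM, valor_pred))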

              precision    recall  f1-score   support

           0       0.55      0.93      0.69       132
           1       0.99      0.89      0.93       868

    accuracy                           0.89      1000
   macro avg       0.77      0.91      0.81      1000
weighted avg       0.93      0.89      0.90      1000
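
scikit-learn's metric functions take the true labels as the first argument and the predictions as the second. The sketch below uses hypothetical labels (not the notebook's data) to illustrate what happens when the two are exchanged: precision and recall trade places. It also prints a confusion matrix, which shows the underlying counts directly.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for illustration; not the notebook's actual data.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1, 1, 0]

# Documented order: true labels first, predictions second.
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)

# Swapping the arguments exchanges the roles of the two metrics.
swapped_precision = precision_score(y_pred, y_true)
swapped_recall = recall_score(y_pred, y_true)

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)
print(matrix)
print(precision, recall)
print(swapped_precision, swapped_recall)
```

Here the positive class has 4 true positives, 2 false positives, and 1 false negative, so precision is 4/6 and recall is 4/5; with the arguments swapped, the two values are reported the other way around.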

