<center>

<h1>📚 Master's in Applied Artificial Intelligence – 3rd Semester</h1>

<h3>Course: Natural Language Processing</h3>

<hr style="width:60%;">

<h2>👨‍🎓 Students</h2>
<ul style="list-style:none; padding:0; font-size:18px;">
    <li>Claudia Martínez</li>
    <li>Sebastián Murillas</li>
    <li>Mario J. Castellanos</li>
    <li>Enrique Manzano</li>
    <li>Octavio Guerra</li>
</ul>

<hr style="width:60%;">

<h3>📅 Date: August 12, 2025</h3>

</center>


# Sentiment Analysis of Hotel Reviews from Trip Advisor

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ohtar10/icesi-nlp/blob/main/Sesion1/7-sentiment-analysis.ipynb)

Now let's put some of these concepts into practice on a more realistic case. In this exercise we run sentiment analysis on hotel reviews from Trip Advisor. In principle this is a classification into three possible values: negative (neg), neutral (neu), and positive (pos). In this notebook we drop the ratings we treat as neutral and predict only neg or pos. Any model could be used for this task; what is new here is the preprocessing of the text inputs.

### References
* [Natural Language Processing in Action](https://www.manning.com/books/natural-language-processing-in-action)

In [43]:
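import pkg_resources
import warnings

# Silence library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')

# Detect Google Colab by looking for the google-colab distribution among the installed packages
installed_packages = [package.key for package in pkg_resources.working_set]
IN_COLAB = 'google-colab' in installed_packages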

In [46]:
!test '{IN_COLAB}' = 'True' && pip install -r https://raw.githubusercontent.com/semurillas/NLP_MIAA_252/refs/heads/main/requirements_7.txt



Let's start by loading the dataset:

In [23]:
import pandas as pd
import numpy as np

reviews = pd.read_csv('/content/tripadvisor_hotel_reviews.csv')
reviews.head()

| | Review | Rating |
|---|---|---|
| 0 | nice hotel expensive parking got good deal sta... | 4 |
| 1 | ok nothing special charge diamond member hilto... | 2 |
| 2 | nice rooms not 4* experience hotel monaco seat... | 3 |
| 3 | unique, great stay, wonderful time hotel monac... | 5 |
| 4 | great stay great stay, went seahawk game aweso... | 5 |


Next, let's do some cleanup: we check the data for nulls and empty values, which would have to be removed:

In [24]:
reviews.describe()

| | Rating |
|---|---|
| count | 20491.0 |
| mean | 3.952223 |
| std | 1.23303 |
| min | 1.0 |
| 25% | 3.0 |
| 50% | 4.0 |
| 75% | 5.0 |
| max | 5.0 |


In [25]:
reviews.isna().sum()

| | 0 |
|---|---|
| Review | 0 |
| Rating | 0 |


In [26]:
reviews[reviews.Review == ''].index

Index([], dtype='int64')

In [27]:
reviews.Rating.value_counts()

| Rating | count |
|---|---|
| 5 | 9054 |
| 4 | 6039 |
| 3 | 2184 |
| 2 | 1793 |
| 1 | 1421 |


In [29]:
# Drop ratings 3 and 4, which we treat as neutral.
# .copy() avoids SettingWithCopyWarning when new columns are assigned later.
reviews = reviews[~reviews.Rating.isin([3, 4])].copy()
reviews.shape

(12268, 2)



To keep things simple, we will use VADER, a lexicon- and rule-based sentiment model, to compute the positive and negative scores. It already ships with NLTK.

In [30]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [31]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()
reviews['scores'] = reviews.Review.apply(lambda r: sid.polarity_scores(r))
reviews.head()

| | Review | Rating | scores |
|---|---|---|---|
| 1 | ok nothing special charge diamond member hilto... | 2 | {'neg': 0.11, 'neu': 0.701, 'pos': 0.189, 'com... |
| 3 | unique, great stay, wonderful time hotel monac... | 5 | {'neg': 0.06, 'neu': 0.555, 'pos': 0.385, 'com... |
| 4 | great stay great stay, went seahawk game aweso... | 5 | {'neg': 0.135, 'neu': 0.643, 'pos': 0.221, 'co... |
| 5 | love monaco staff husband stayed hotel crazy w... | 5 | {'neg': 0.084, 'neu': 0.651, 'pos': 0.265, 'co... |
| 6 | cozy stay rainy city, husband spent 7 nights m... | 5 | {'neg': 0.026, 'neu': 0.609, 'pos': 0.364, 'co... |


With these scores we can now convert the result into a prediction label:

In [32]:
reviews['compound'] = reviews.scores.apply(lambda s: s['compound'])
reviews.head()

| | Review | Rating | scores | compound |
|---|---|---|---|---|
| 1 | ok nothing special charge diamond member hilto... | 2 | {'neg': 0.11, 'neu': 0.701, 'pos': 0.189, 'com... | 0.9787 |
| 3 | unique, great stay, wonderful time hotel monac... | 5 | {'neg': 0.06, 'neu': 0.555, 'pos': 0.385, 'com... | 0.9912 |
| 4 | great stay great stay, went seahawk game aweso... | 5 | {'neg': 0.135, 'neu': 0.643, 'pos': 0.221, 'co... | 0.9797 |
| 5 | love monaco staff husband stayed hotel crazy w... | 5 | {'neg': 0.084, 'neu': 0.651, 'pos': 0.265, 'co... | 0.987 |
| 6 | cozy stay rainy city, husband spent 7 nights m... | 5 | {'neg': 0.026, 'neu': 0.609, 'pos': 0.364, 'co... | 0.9925 |
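
Most compound values above sit very close to 1, even for a 2-star review. One plausible reason, assuming NLTK's VADER uses the normalization published with the original VADER code, is that the compound score is the summed word valence `x` squashed as `x / sqrt(x^2 + alpha)` with `alpha = 15`, so long reviews with many sentiment-bearing words saturate quickly. A minimal sketch of that formula follows; `normalize_compound` and the sample sums are illustrative and not part of the notebook:

```python
import math


def normalize_compound(valence_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into (-1, 1).

    Assumption: this mirrors the normalization used by VADER; it is a sketch,
    not a copy of the library code.
    """
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


# Hypothetical valence sums: a short, mildly positive text vs. a long, very positive one
for valence_sum in (1.0, 4.0, 40.0):
    print(valence_sum, round(normalize_compound(valence_sum), 4))
```

Under this assumption, a handful of strongly positive words is enough to push the score above 0.9, which would explain why review length matters as much as tone.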


In [33]:
reviews.tail()

| | Review | Rating | scores | compound |
|---|---|---|---|---|
| 20485 | not impressed unfriendly staff checked asked h... | 2 | {'neg': 0.182, 'neu': 0.666, 'pos': 0.152, 'co... | -0.5013 |
| 20486 | best kept secret 3rd time staying charm, not 5... | 5 | {'neg': 0.063, 'neu': 0.665, 'pos': 0.272, 'co... | 0.9834 |
| 20488 | ok just looks nice modern outside, desk staff ... | 2 | {'neg': 0.131, 'neu': 0.724, 'pos': 0.145, 'co... | 0.2629 |
| 20489 | hotel theft ruined vacation hotel opened sept ... | 1 | {'neg': 0.15, 'neu': 0.671, 'pos': 0.179, 'com... | 0.9867 |
| 20490 | people talking, ca n't believe excellent ratin... | 2 | {'neg': 0.193, 'neu': 0.668, 'pos': 0.14, 'com... | -0.6071 |


In [38]:
reviews.groupby('Rating')['compound'].agg(['min', 'max'])

| Rating | min | max |
|---|---|---|
| 1 | -0.9974 | 0.9976 |
| 2 | -0.9941 | 0.9995 |
| 5 | -0.9677 | 0.9999 |


In [39]:
# Binary rule: a strictly positive compound score is 'pos'; anything else (including 0) is 'neg'
reviews['prediction'] = reviews['compound'].apply(lambda c: 'pos' if c > 0 else 'neg')
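
The rule above collapses everything into two classes. If the neutral class mentioned in the introduction were kept, a three-way rule could look like the sketch below. The ±0.05 cutoffs are the ones commonly suggested in the VADER documentation, and `label_from_compound` is a hypothetical helper, not something the notebook defines:

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to 'pos', 'neg' or 'neu'.

    Assumption: symmetric cutoffs at +/- threshold; tune them for your data.
    """
    if compound >= threshold:
        return 'pos'
    if compound <= -threshold:
        return 'neg'
    return 'neu'


# Compound scores taken from the tables in this notebook, plus an exact zero
for score in (0.9787, -0.5013, 0.0):
    print(score, label_from_compound(score))
```

Raising `threshold` widens the neutral band, which is one simple knob to try when the lexicon is overly optimistic.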

In [40]:
reviews.head(10)

| | Review | Rating | scores | compound | prediction |
|---|---|---|---|---|---|
| 1 | ok nothing special charge diamond member hilto... | 2 | {'neg': 0.11, 'neu': 0.701, 'pos': 0.189, 'com... | 0.9787 | pos |
| 3 | unique, great stay, wonderful time hotel monac... | 5 | {'neg': 0.06, 'neu': 0.555, 'pos': 0.385, 'com... | 0.9912 | pos |
| 4 | great stay great stay, went seahawk game aweso... | 5 | {'neg': 0.135, 'neu': 0.643, 'pos': 0.221, 'co... | 0.9797 | pos |
| 5 | love monaco staff husband stayed hotel crazy w... | 5 | {'neg': 0.084, 'neu': 0.651, 'pos': 0.265, 'co... | 0.987 | pos |
| 6 | cozy stay rainy city, husband spent 7 nights m... | 5 | {'neg': 0.026, 'neu': 0.609, 'pos': 0.364, 'co... | 0.9925 | pos |
| 8 | hotel stayed hotel monaco cruise, rooms genero... | 5 | {'neg': 0.038, 'neu': 0.663, 'pos': 0.298, 'co... | 0.9618 | pos |
| 9 | excellent stayed hotel monaco past w/e delight... | 5 | {'neg': 0.064, 'neu': 0.451, 'pos': 0.484, 'co... | 0.9756 | pos |
| 10 | poor value stayed monaco seattle july, nice ho... | 2 | {'neg': 0.08, 'neu': 0.524, 'pos': 0.395, 'com... | 0.9666 | pos |
| 15 | horrible customer service hotel stay february ... | 1 | {'neg': 0.132, 'neu': 0.701, 'pos': 0.167, 'co... | 0.8496 | pos |
| 16 | disappointed say anticipating stay hotel monac... | 2 | {'neg': 0.096, 'neu': 0.668, 'pos': 0.236, 'co... | 0.9905 | pos |


In [41]:
reviews.tail(10)

| | Review | Rating | scores | compound | prediction |
|---|---|---|---|---|---|
| 20479 | lack customer service skills overpriced place ... | 2 | {'neg': 0.035, 'neu': 0.8, 'pos': 0.165, 'comp... | 0.9816 | pos |
| 20480 | great play stay stay loyal inn package deal ha... | 5 | {'neg': 0.132, 'neu': 0.403, 'pos': 0.465, 'co... | 0.9595 | pos |
| 20481 | ok price look hotel ok little run average clea... | 2 | {'neg': 0.046, 'neu': 0.703, 'pos': 0.25, 'com... | 0.8515 | pos |
| 20482 | great choice wife chose best western quite bit... | 5 | {'neg': 0.065, 'neu': 0.531, 'pos': 0.404, 'co... | 0.9945 | pos |
| 20484 | deceptive staff deceptive desk staff claiming ... | 2 | {'neg': 0.207, 'neu': 0.747, 'pos': 0.046, 'co... | -0.947 | neg |
| 20485 | not impressed unfriendly staff checked asked h... | 2 | {'neg': 0.182, 'neu': 0.666, 'pos': 0.152, 'co... | -0.5013 | neg |
| 20486 | best kept secret 3rd time staying charm, not 5... | 5 | {'neg': 0.063, 'neu': 0.665, 'pos': 0.272, 'co... | 0.9834 | pos |
| 20488 | ok just looks nice modern outside, desk staff ... | 2 | {'neg': 0.131, 'neu': 0.724, 'pos': 0.145, 'co... | 0.2629 | pos |
| 20489 | hotel theft ruined vacation hotel opened sept ... | 1 | {'neg': 0.15, 'neu': 0.671, 'pos': 0.179, 'com... | 0.9867 | pos |
| 20490 | people talking, ca n't believe excellent ratin... | 2 | {'neg': 0.193, 'neu': 0.668, 'pos': 0.14, 'com... | -0.6071 | neg |


In [42]:
reviews.prediction.value_counts()


| prediction | count |
|---|---|
| pos | 10876 |
| neg | 1392 |


Finally, let's compute a few quality metrics for the model:

In [43]:
# Map the 'Rating' values to 'neg' and 'pos'
reviews['sentiment_label'] = reviews['Rating'].apply(lambda rating: 'neg' if rating <= 2 else 'pos')

# Display the first few rows to show the new column
display(reviews.head())

| | Review | Rating | scores | compound | prediction | sentiment_label |
|---|---|---|---|---|---|---|
| 1 | ok nothing special charge diamond member hilto... | 2 | {'neg': 0.11, 'neu': 0.701, 'pos': 0.189, 'com... | 0.9787 | pos | neg |
| 3 | unique, great stay, wonderful time hotel monac... | 5 | {'neg': 0.06, 'neu': 0.555, 'pos': 0.385, 'com... | 0.9912 | pos | pos |
| 4 | great stay great stay, went seahawk game aweso... | 5 | {'neg': 0.135, 'neu': 0.643, 'pos': 0.221, 'co... | 0.9797 | pos | pos |
| 5 | love monaco staff husband stayed hotel crazy w... | 5 | {'neg': 0.084, 'neu': 0.651, 'pos': 0.265, 'co... | 0.987 | pos | pos |
| 6 | cozy stay rainy city, husband spent 7 nights m... | 5 | {'neg': 0.026, 'neu': 0.609, 'pos': 0.364, 'co... | 0.9925 | pos | pos |


In [46]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_true = reviews.sentiment_label.values
y_pred = reviews.prediction.values

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
cr = classification_report(y_true, y_pred)


print(f"Accuracy:\n{acc}\n")
print(f"Classification Report:\n{cr}")
print(f"Confusion Matrix:\n{cm}")

Accuracy:
0.843495272253016

Classification Report:
              precision    recall  f1-score   support

         neg       0.96      0.42      0.58      3214
         pos       0.83      0.99      0.90      9054

    accuracy                           0.84     12268
   macro avg       0.90      0.71      0.74     12268
weighted avg       0.86      0.84      0.82     12268

Confusion Matrix:
[[1343 1871]
 [  49 9005]]


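The accuracy (84%) is not great. It beats the 50% baseline of random guessing, but since 9,054 of the 12,268 reviews are positive, always predicting `pos` would already reach about 74%, so there is still a lot of room for improvement. The weak spot is the negative class: recall for `neg` is only 0.42, meaning 1,871 of the 3,214 negative reviews were predicted as positive.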
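
The figures in the report can be recomputed by hand from the confusion matrix, together with the accuracy of always predicting the majority class. A small sketch, assuming scikit-learn's default layout (rows are true labels, columns are predictions, in sorted order `neg`, `pos`); the variable names are illustrative:

```python
# Confusion matrix copied from the output above
# rows: true neg, true pos; columns: predicted neg, predicted pos
cm = [[1343, 1871],
      [49, 9005]]

(neg_as_neg, neg_as_pos), (pos_as_neg, pos_as_pos) = cm
total = neg_as_neg + neg_as_pos + pos_as_neg + pos_as_pos

# Share of all reviews that were labeled correctly
accuracy = (neg_as_neg + pos_as_pos) / total
# Share of truly negative reviews that were found
neg_recall = neg_as_neg / (neg_as_neg + neg_as_pos)
# Share of 'neg' predictions that were actually negative
neg_precision = neg_as_neg / (neg_as_neg + pos_as_neg)
# Accuracy of a trivial model that always predicts the most frequent true class
majority_baseline = max(neg_as_neg + neg_as_pos, pos_as_neg + pos_as_pos) / total

print(f"accuracy:          {accuracy:.3f}")
print(f"neg recall:        {neg_recall:.3f}")
print(f"neg precision:     {neg_precision:.3f}")
print(f"majority baseline: {majority_baseline:.3f}")
```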