**In this notebook, we use a transformer-based model to perform sentiment analysis on hotel reviews from New York City. The process includes the following steps:**

Data Preparation: The notebook begins by loading a dataset that includes pre-processed hotel reviews with previously identified topics.

Sentiment Analysis Setup: We use the Hugging Face transformers library to set up a sentiment analysis pipeline with `distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT model fine-tuned on the SST-2 dataset for binary sentiment classification. The pipeline runs on the GPU when one is available.

Sentiment Prediction: The model is applied to the cleaned review text in batches, predicting a sentiment label (POSITIVE or NEGATIVE) and a confidence score for each review.

Results Integration: The sentiment labels and scores are added back to the original DataFrame as new columns, so each review carries its predicted sentiment alongside its topic and ratings. This enriched data can be used for deeper analysis, such as correlating sentiment with specific topics or hotel attributes.

The notebook concludes by saving the sentiment-enriched dataset to `./data/reviews.csv`, setting the stage for further exploration of customer satisfaction and hotel performance.

In [1]:
import pandas as pd
import torch
from transformers import pipeline, AutoTokenizer
from datasets import Dataset

In [3]:
reviews_df = pd.read_csv('./data/df_with_topics.csv')

In [4]:
# Convert the DataFrame to a Hugging Face Dataset
dataset = Dataset.from_pandas(reviews_df[['cleaned_text']])
# Specify the model and tokenizer
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use the first GPU (device 0) if CUDA is available; otherwise run on the CPU (-1)
device = 0 if torch.cuda.is_available() else -1
# Load the sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis", model=model_name, tokenizer=tokenizer, device=device)


In [5]:
# Define a function to get the sentiment
def get_sentiment(batch):
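    # Keep only the first `tokenizer.model_max_length` characters of each review.
    # Note: this is a character slice, not token-level truncation, so it usually
    # keeps fewer tokens than the model's limit allows.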
    truncated_texts = [text[:tokenizer.model_max_length] for text in batch['cleaned_text']]
    sentiments = sentiment_analysis(truncated_texts)
    return {
        'sentiment': [s['label'] for s in sentiments],
        'score': [s['score'] for s in sentiments]
    }
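
The character slice above is a conservative stand-in for real truncation. A minimal sketch of a token-level alternative follows, assuming the text-classification pipeline forwards `truncation` and `max_length` to its tokenizer and accepts `batch_size` at call time, as recent transformers releases do. The function name `get_sentiment_token_level` and its `classifier` parameter are hypothetical and not part of the original notebook. Passing `batch_size` to the pipeline call may also address the "using the pipelines sequentially on GPU" warning printed further down, since the pipeline otherwise processes the list one text at a time.

```python
def get_sentiment_token_level(batch, classifier, max_length=512, batch_size=32):
    """Classify a batch of reviews, truncating by tokens instead of characters.

    `classifier` is assumed to behave like the `sentiment_analysis` pipeline
    defined earlier: called with a list of strings, it returns a list of
    {'label': ..., 'score': ...} dicts.
    """
    texts = list(batch["cleaned_text"])
    # truncation/max_length are handed to the tokenizer; batch_size controls
    # how many texts go through the model per forward pass.
    outputs = classifier(
        texts,
        truncation=True,
        max_length=max_length,
        batch_size=batch_size,
    )
    return {
        "sentiment": [o["label"] for o in outputs],
        "score": [o["score"] for o in outputs],
    }

# Possible usage with the objects defined in this notebook (not executed here):
# result = dataset.map(
#     get_sentiment_token_level,
#     batched=True,
#     batch_size=58,
#     fn_kwargs={"classifier": sentiment_analysis},
# )
```

Because more of each long review reaches the model, labels and scores from this variant can differ from the ones shown in the outputs below.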

In [6]:
# Apply the sentiment analysis in batches
result = dataset.map(get_sentiment, batched=True, batch_size=58)

# Add the sentiment results back to the original DataFrame
reviews_df['sentiment'] = result['sentiment']
reviews_df['score'] = result['score']

# Display the first few rows of the dataset with sentiment labels and scores
print(reviews_df[['cleaned_text', 'sentiment', 'score']].head())

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


                                        cleaned_text sentiment     score
0  stayed in a king suite for 11 nights and yes i...  NEGATIVE  0.680134
1  on every visit to nyc the hotel beacon is the ...  POSITIVE  0.999830
2  this is a great property in midtown we two dif...  POSITIVE  0.998442
3  the andaz is a nice hotel in a central locatio...  POSITIVE  0.999454
4  i have stayed at each of the us andaz properti...  POSITIVE  0.999725
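
Row 0 is labelled NEGATIVE with a fairly low score of 0.68, even though the full DataFrame shown later lists its `overall_rating` as 5.0. One way to surface such cases is to compare the predicted label with the star rating. The sketch below is only an illustration: the helper name `flag_rating_mismatches` and the thresholds (4.0 and above counts as a high rating, 2.0 and below as a low one) are assumptions, and the toy rows are made up apart from the first two, which mirror rows 0 and 1 above.

```python
import pandas as pd

def flag_rating_mismatches(df, high=4.0, low=2.0):
    """Return rows whose predicted sentiment contradicts the star rating."""
    negative_but_high = (df["sentiment"] == "NEGATIVE") & (df["overall_rating"] >= high)
    positive_but_low = (df["sentiment"] == "POSITIVE") & (df["overall_rating"] <= low)
    return df[negative_but_high | positive_but_low]

# Toy data for illustration only (rows 2 and 3 are invented)
toy = pd.DataFrame({
    "overall_rating": [5.0, 5.0, 1.0, 2.0],
    "sentiment": ["NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"],
    "score": [0.680134, 0.999830, 0.95, 0.70],
})
mismatches = flag_rating_mismatches(toy)
print(mismatches)
```

On the real data, the call would be `flag_rating_mismatches(reviews_df)`. Sorting the result by `score` can help separate borderline predictions from confident disagreements.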


In [7]:
reviews_df

Unnamed: 0,title,text,num_helpful_votes,date,hotel_class,type,name,service_rating,cleanliness_rating,overall_rating,...,business_service_(e_g_internet_access)_rating,check_in_front_desk_rating,cleaned_text,language,processed_text,topic,new_topics,label,sentiment,score
0,"“Truly is ""Jewel of the Upper Wets Side""”",Stayed in a king suite for 11 nights and yes i...,0,2012-12-17,3.0,hotel,Hotel Beacon,5.0,5.0,5.0,...,,,stayed in a king suite for 11 nights and yes i...,en,king suite 11 night yes cot bit happy standard...,-1,0,Overall Hotel Experience,NEGATIVE,0.680134
1,“My home away from home!”,"On every visit to NYC, the Hotel Beacon is the...",0,2012-12-17,3.0,hotel,Hotel Beacon,5.0,5.0,5.0,...,,,on every visit to nyc the hotel beacon is the ...,en,visit beacon place love conveniently located c...,0,0,Overall Hotel Experience,POSITIVE,0.999830
2,“Great Stay”,This is a great property in Midtown. We two di...,0,2012-12-18,4.0,hotel,Andaz 5th Avenue,4.0,5.0,4.0,...,,,this is a great property in midtown we two dif...,en,property midtown different different north tow...,0,0,Overall Hotel Experience,POSITIVE,0.998442
3,“Modern Convenience”,The Andaz is a nice hotel in a central locatio...,0,2012-12-17,4.0,hotel,Andaz 5th Avenue,5.0,5.0,4.0,...,,,the andaz is a nice hotel in a central locatio...,en,andaz central location manhattan hyatt come mo...,-1,11,"Evening Offerings: Wine, Cheese, and Breakfast",POSITIVE,0.999454
4,“Its the best of the Andaz Brand in the US....”,I have stayed at each of the US Andaz properti...,0,2012-12-17,4.0,hotel,Andaz 5th Avenue,4.0,5.0,4.0,...,,,i have stayed at each of the us andaz properti...,en,andaz property west hollywood property brand s...,2,2,Positive Experience: Location and Staff,POSITIVE,0.999725
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
206365,“Absolutely Perfect Stay”,My husband and I returned yesterday from NYC -...,1,2003-12-01,4.0,hotel,Hotel Giraffe,,,5.0,...,,,my husband and i returned yesterday from nyc w...,en,husband returned yesterday outstanding experie...,11,11,"Evening Offerings: Wine, Cheese, and Breakfast",POSITIVE,0.999796
206366,“Awesome boutique hotel”,"I stayed at Giraffe Hotel three times already,...",2,2003-09-28,4.0,hotel,Hotel Giraffe,,,5.0,...,,,i stayed at giraffe hotel three times already ...,en,giraffe favourite travel time giraffe neighbou...,11,11,"Evening Offerings: Wine, Cheese, and Breakfast",POSITIVE,0.999751
206367,“Fabulous!”,Stayed at the Giraffe for a weekend and loved ...,1,2003-09-10,4.0,hotel,Hotel Giraffe,,,5.0,...,,,stayed at the giraffe for a weekend and loved ...,en,giraffe weekend loved staff attentive remember...,11,11,"Evening Offerings: Wine, Cheese, and Breakfast",POSITIVE,0.999744
206368,“Hotel Giraffe Head & Shoulders Above the Rest”,"When they get it right, they REALLY get it rig...",10,2003-08-10,4.0,hotel,Hotel Giraffe,,,5.0,...,,,when they get it right they really get it righ...,en,right right giraffe week business getting righ...,2,2,Positive Experience: Location and Staff,POSITIVE,0.998954


In [8]:
reviews_df.to_csv('./data/reviews.csv', index=False)
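
As a starting point for correlating sentiment with topics, the saved columns can be aggregated per topic. This is a rough sketch rather than part of the original analysis: the helper name `sentiment_share_by_topic` is hypothetical, it assumes the `label` column holds the topic names as in the output above, and the toy rows are made up for illustration.

```python
import pandas as pd

def sentiment_share_by_topic(df, topic_col="label"):
    """Share of each predicted sentiment per topic, plus the review count."""
    # Each row of the crosstab sums to 1 because of normalize="index"
    shares = pd.crosstab(df[topic_col], df["sentiment"], normalize="index")
    shares["n_reviews"] = df.groupby(topic_col).size()
    return shares.sort_values("n_reviews", ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    "label": [
        "Overall Hotel Experience",
        "Overall Hotel Experience",
        "Positive Experience: Location and Staff",
    ],
    "sentiment": ["NEGATIVE", "POSITIVE", "POSITIVE"],
})
summary = sentiment_share_by_topic(toy)
print(summary)
```

On the real data, the call would be `sentiment_share_by_topic(reviews_df)`. Keeping the review count next to the shares makes it easier to discount topics with very few reviews.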