In [None]:
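import matplotlib.pyplot as plt
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon', quiet=True)  # VADER's lexicon; skipped if already present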

In this code block, I imported the libraries needed for data handling, visualization, and sentiment analysis in this project.

- **Matplotlib**: Used for creating visualizations.
- **Pandas**: Used for data manipulation and analysis.
- **NLTK VADER SentimentIntensityAnalyzer**: Used for scoring the sentiment of the review text.

In [None]:
data = pd.read_csv('dataset/hotel_reviews.csv')
display(data.head())

I loaded the dataset from 'dataset/hotel_reviews.csv' into a Pandas DataFrame and displayed the first five rows for initial exploration.

In [None]:
data.info()  # prints the summary itself and returns None, so display() is not needed

I used the .info() method to get an overview of the dataset's structure: the number of rows, each column's data type and non-null count, and the memory usage.

In [None]:
display(data.isnull().sum())

I checked for missing values within the dataset using the .isnull() method and calculated the sum of missing values for each column.
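
If this check reported missing review text, those rows would need handling before sentiment scoring, since the analyzer expects strings. A minimal sketch on a toy DataFrame follows. The sample rows are made up for illustration, and only the column names `Review` and `Rating` match the ones used in this notebook:

```python
import pandas as pd

# Toy data for illustration only; not taken from hotel_reviews.csv
toy = pd.DataFrame({
    'Review': ['great location', None, 'room was dirty'],
    'Rating': [5, 3, 1],
})

# Count missing values per column, as in the cell above
missing = toy.isnull().sum()

# Drop rows that have no review text, then renumber the index
cleaned = toy.dropna(subset=['Review']).reset_index(drop=True)

print(missing)
print(len(cleaned))
```

Dropping is only one option; whether it is appropriate depends on how many rows are affected.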

In [None]:
# Count how many reviews received each rating (sorted by count, descending)
ratings = data['Rating'].value_counts()
index = ratings.index
values = ratings.values

custom_colors = ['forestgreen', 'dodgerblue', 'darkorange', 'lightsalmon', 'red']
plt.rc('font', size=12)  # set the default font size before drawing
plt.figure(figsize=(7, 7))
plt.pie(values, labels=index, colors=custom_colors)

# Draw a white circle over the center to turn the pie into a donut chart
central_circle = plt.Circle((0, 0), 0.5, color='white')
plt.gca().add_artist(central_circle)
plt.title('Hotel Reviews Ratings', fontsize=20)

# Legend entries pair each rating with its review count
legend_labels = [f'{rating}: {count}' for rating, count in zip(index, values)]
plt.legend(legend_labels, title="Rating: Count", loc="center left", bbox_to_anchor=(1, 0.5))

plt.show()

I created this chart to show the distribution of hotel review ratings. Each colored segment of the donut chart represents one rating, and its size is proportional to the number of reviews with that rating. The legend lists the review count for each rating, so the exact numbers can be read alongside the proportions.
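
The counts behind the chart can also be expressed as percentages, which are easier to compare than segment sizes. A small sketch on made-up ratings, where `toy_ratings` is a hypothetical stand-in for `data['Rating']`:

```python
import pandas as pd

# Hypothetical ratings standing in for data['Rating']
toy_ratings = pd.Series([5, 5, 4, 5, 3, 4, 1, 5])

counts = toy_ratings.value_counts()                       # reviews per rating
shares = toy_ratings.value_counts(normalize=True) * 100   # percentage per rating

summary = pd.DataFrame({'Count': counts, 'Percent': shares.round(1)})
print(summary)
```

Passing `autopct='%1.1f%%'` to `plt.pie` would print the same percentages on the chart itself.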

In [None]:
sentiments = SentimentIntensityAnalyzer()

# Score each review once and reuse the result for all three columns
scores = [sentiments.polarity_scores(review) for review in data['Review']]
data['Positive'] = [s['pos'] for s in scores]
data['Negative'] = [s['neg'] for s in scores]
data['Neutral'] = [s['neu'] for s in scores]
display(data.head())

In this code block, I used the NLTK VADER SentimentIntensityAnalyzer to score the sentiment of each review in the dataset. The following actions were taken:

- Created an instance of the SentimentIntensityAnalyzer class to analyze the sentiment of text data.
- Calculated the positive, negative, and neutral sentiment scores for each review using list comprehensions and the polarity_scores method provided by the SentimentIntensityAnalyzer. Each score is the proportion of the review's text that falls into that category, so the values lie between 0 and 1.
- Added three new columns ('Positive', 'Negative', and 'Neutral') to the dataset, each containing the corresponding sentiment score for the reviews.
- Displayed the first few rows of the updated dataset to provide an initial look at the sentiment analysis results.

These per-review scores are the basis for the dataset-level sentiment summary in the next step.
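
Besides `pos`, `neg`, and `neu`, the dictionary returned by `polarity_scores` also contains a `compound` value between -1 and 1 that summarizes the whole text. A common convention, described in the VADER documentation, is to treat values of 0.05 or above as positive and -0.05 or below as negative. The helper below is a sketch of that convention. `label_from_compound` and the sample values are hypothetical and not part of this notebook:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return 'Positive'
    if compound <= -threshold:
        return 'Negative'
    return 'Neutral'

# Made-up compound scores for illustration
for value in (0.84, -0.42, 0.01):
    print(value, label_from_compound(value))
```

With the analyzer created above, this could be applied as `label_from_compound(sentiments.polarity_scores(text)['compound'])`.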

In [None]:
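# Total each sentiment score across all reviews
x = data['Positive'].sum()
y = data['Negative'].sum()
z = data['Neutral'].sum()

def sentiment_score(a, b, c):
    """Print the label whose total is strictly the largest; ties fall back to Neutral."""
    if (a > b) and (a > c):
        print('Positive 😊')
    elif (b > a) and (b > c):
        print('Negative 😠')
    else:
        print('Neutral 🙂')

sentiment_score(x, y, z)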

In this code block, I:

- Calculated the total positive, negative, and neutral sentiment scores for the entire dataset by summing the respective columns of the Pandas DataFrame.
- Defined a custom function `sentiment_score(a, b, c)` that takes the total positive, negative, and neutral scores as inputs and determines the overall sentiment of the dataset.
- Printed the overall sentiment label: 'Positive 😊' if the positive total is the largest, 'Negative 😠' if the negative total is the largest, and 'Neutral 🙂' otherwise, which covers both a dominant neutral total and ties.

This code reduces the sentiment distribution of the dataset to a single label based on the summed scores.
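
A single label hides how individual reviews are spread across the categories. As a complementary sketch, the same comparison can return a label per review, and the labels can then be counted. The function name `dominant_label` and the score tuples are assumptions for illustration; in the notebook the inputs would come from the 'Positive', 'Negative', and 'Neutral' columns:

```python
from collections import Counter

def dominant_label(pos, neg, neu):
    """Return the label with the strictly largest score; ties fall back to Neutral."""
    if pos > neg and pos > neu:
        return 'Positive'
    if neg > pos and neg > neu:
        return 'Negative'
    return 'Neutral'

# Made-up (pos, neg, neu) scores for four reviews
toy_scores = [(0.6, 0.1, 0.3), (0.1, 0.7, 0.2), (0.2, 0.1, 0.7), (0.4, 0.4, 0.2)]

label_counts = Counter(dominant_label(*row) for row in toy_scores)
print(label_counts)
```

Returning the label instead of printing it keeps the function reusable, for example with `DataFrame.apply` across rows.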