In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In this cell, I imported the libraries needed for data manipulation, visualization, and sentiment analysis in this project.

- **Matplotlib**: Creates the visualizations.
- **Pandas**: Handles data manipulation and analysis.
- **Seaborn**: Builds on Matplotlib to provide higher-level statistical plots and cleaner default styling.
- **NLTK VADER SentimentIntensityAnalyzer**: A rule-based, lexicon-driven analyzer for scoring the sentiment of text. It relies on the `vader_lexicon` resource, which can be fetched with `nltk.download('vader_lexicon')`.

In [None]:
data = pd.read_csv('dataset/hotel_reviews.csv')
display(data.head())

I loaded the dataset from `dataset/hotel_reviews.csv` into a Pandas DataFrame and displayed its first five rows (the default for `.head()`) for an initial look at the data.
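
To make the loading step concrete without the real file, here is a minimal sketch that reads a tiny in-memory CSV. The sample rows and the `Review` column name are made up for illustration; only the `Rating` column is known from this notebook.

```python
import io

import pandas as pd

# Hypothetical sample standing in for dataset/hotel_reviews.csv
sample_csv = io.StringIO(
    "Review,Rating\n"
    "nice hotel great location,4\n"
    "room was dirty and noisy,1\n"
    "excellent staff and breakfast,5\n"
)

sample = pd.read_csv(sample_csv)

# head() returns the first five rows by default; pass n to change that
preview = sample.head(2)
print(preview)
print(sample.shape)  # (rows, columns)
```

Passing a file path instead of the `StringIO` object works the same way, which is what the cell above does.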

In [None]:
data.info()  # info() prints its own report and returns None, so wrapping it in display() only adds a stray "None"

I used the `.info()` method to get an overview of the dataset's structure: the number of rows, each column's name, data type, and non-null count, and the DataFrame's memory usage.
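
As a small sketch of what `.info()` reports, the example below runs it on a made-up frame and captures the text through the `buf` parameter. The frame and its column names are illustrative assumptions, not the hotel dataset.

```python
import io

import pandas as pd

# Illustrative frame; one review is missing on purpose
frame = pd.DataFrame({
    "Review": ["great stay", None, "would not return"],
    "Rating": [5, 3, 1],
})

# info() writes its report to a stream and returns None,
# so capture the text through buf instead of the return value
buffer = io.StringIO()
result = frame.info(buf=buffer)
report = buffer.getvalue()

print(report)
```

The non-null count for `Review` comes out lower than the row count, which is how `.info()` hints at missing values.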

In [None]:
display(data.isnull().sum())

I checked for missing values within the dataset using the .isnull() method and calculated the sum of missing values for each column.
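
The missing-value check can be illustrated on a small made-up frame. The sketch also adds a per-column percentage via `.isnull().mean()`, which is not part of the original cell; the data and the `Review` column name are assumptions.

```python
import pandas as pd

# Made-up frame with a few deliberately missing entries
frame = pd.DataFrame({
    "Review": ["clean room", None, "friendly staff", None],
    "Rating": [4, 2, None, 5],
})

# isnull() yields a boolean frame; summing counts the True values per column
missing_counts = frame.isnull().sum()

# The mean of the same boolean frame gives the fraction missing per column
missing_percent = frame.isnull().mean() * 100

print(missing_counts)
print(missing_percent)
```

The percentage view can be easier to judge than raw counts when columns differ widely in how much data they are missing.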

In [None]:
# Count how many reviews received each rating (most frequent rating first)
ratings = data['Rating'].value_counts()
index = ratings.index
values = ratings.values

custom_colors = ['forestgreen', 'dodgerblue', 'darkorange', 'lightsalmon', 'red']
plt.rc('font', size=12)  # set before drawing so the wedge labels pick it up
plt.figure(figsize=(7, 7))
plt.pie(values, labels=index, colors=custom_colors)
# Overlay a white circle to turn the pie into a donut chart
central_circle = plt.Circle((0, 0), 0.5, color='white')
plt.gca().add_artist(central_circle)
plt.title('Hotel Reviews Ratings', fontsize=20)

# Pair each rating with its review count in the legend
legend_labels = [f'{rating}: {count}' for rating, count in zip(index, values)]
plt.legend(legend_labels, title="Rating: Count", loc="center left", bbox_to_anchor=(1, 0.5))

plt.show()

I created this chart to show the distribution of hotel review ratings. Each colored segment of the donut chart represents one rating value, and its size is proportional to the number of reviews with that rating. The legend lists the actual review count for each rating, so the chart conveys both the relative share and the absolute number of reviews per rating.
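
For comparison, here is a hedged sketch of an alternative way to draw a similar donut chart: it sorts the counts by rating so each color is tied to a rating rather than to its frequency rank, and it uses the `wedgeprops` width option instead of overlaying a white circle. The ratings below are made-up sample values, and this is not the code used for the chart above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ratings standing in for data['Rating']
sample_ratings = pd.Series([5, 5, 4, 5, 3, 4, 1, 2, 5, 4], name="Rating")

# Sort by rating (5 down to 1) so the color order follows the rating, not its frequency
counts = sample_ratings.value_counts().sort_index(ascending=False)
colors = ["forestgreen", "dodgerblue", "darkorange", "lightsalmon", "red"]

fig, ax = plt.subplots(figsize=(7, 7))
ax.pie(
    counts.values,
    labels=counts.index,
    colors=colors,
    wedgeprops={"width": 0.5},  # a ring of width 0.5 leaves a hole in the middle
)
ax.set_title("Hotel Reviews Ratings (sample data)", fontsize=20)

# Pair each rating with its count; ax.patches holds the wedges in drawing order
legend_labels = [f"{rating}: {count}" for rating, count in counts.items()]
ax.legend(ax.patches, legend_labels, title="Rating: Count",
          loc="center left", bbox_to_anchor=(1, 0.5))

# In a notebook the figure renders when the cell ends; in a script, call plt.show()
```

With the counts sorted by rating, green always marks the highest rating and red the lowest, regardless of which rating happens to be most common.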