# Hotel Reviews in Europe

__Business problems and initial data exploration__

## Table of Contents

* [Business problems](#business_problems)
* [Data exploration](#data_exploration)
* [Data-driven solutions](#data_driven_solutions)

## Business problems <a class="anchor" id="business_problems"></a>

This project addresses several problems in the hotel industry, particularly those concerning guests: which countries they come from and what they prefer. Another important challenge for a hotel is allocating the right amount of resources for the number of guests expected each season, and targeting advertising campaigns at the right type of guest. Last but not least, every hotel manager should try to understand the hotel's strong and weak points by listening to what guests say in their reviews.

Below we explore the dataset in the hope of finding data-driven solutions to these business problems.

## Data exploration <a class="anchor" id="data_exploration"></a>

__Import libraries and dataset__

In [None]:
import pandas as pd

df = pd.read_csv("data/Hotel_Reviews.csv")

__Size of dataset__

In [None]:
df.shape

__Column names and types__

In [None]:
df.dtypes

__Column description__

In [None]:
df.describe(include='all')

__Hotel countries__

In [None]:
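# Take the last space-separated token of each address; note that a multi-word country name such as "United Kingdom" would be reduced to its last word
set(address.split(" ")[-1] for address in df['Hotel_Address'])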
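
Taking only the last token of the address works for single-word country names but truncates multi-word ones. A minimal sketch of a more careful extraction follows. It assumes every address ends with the country name; the helper `extract_country`, the tuple `MULTI_WORD_COUNTRIES` and the sample addresses are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Hypothetical list of multi-word country names to check first; extend as needed.
MULTI_WORD_COUNTRIES = ("United Kingdom",)


def extract_country(address: str) -> str:
    """Return the country name found at the end of an address string."""
    address = address.strip()
    for country in MULTI_WORD_COUNTRIES:
        if address.endswith(country):
            return country
    # Fall back to the last whitespace-separated token.
    return address.split()[-1]


# Toy addresses made up for illustration (not taken from the dataset).
sample = pd.DataFrame({"Hotel_Address": [
    "1 Example Street London W1 United Kingdom",
    "2 Example Laan 1000 AA Amsterdam Netherlands",
    "3 Calle Ejemplo 08001 Barcelona Spain",
]})

sample["Hotel_Country"] = sample["Hotel_Address"].apply(extract_country)
countries = sorted(sample["Hotel_Country"].unique())
print(countries)
```

On the real data, the same function could be applied to `df['Hotel_Address']` to build a country column that later steps can reuse.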

__Number of guest nationalities__

In [None]:
df['Reviewer_Nationality'].nunique(dropna=False)

__Number of hotels__

In [None]:
df['Hotel_Name'].nunique(dropna=False)

__First rows__

In [None]:
df.head()

## Data-driven solutions <a class="anchor" id="data_driven_solutions"></a>

After exploring the data, we can propose possible solutions to our business problems:

| Business problem | Solution | Notebook |
|------------------|----------|----------|
| Find the most frequent nationality of the guests staying in Spanish hotels | Filter reviews to hotels whose _Hotel_Address_ contains the word 'Spain', group them by _Reviewer_Nationality_, and plot the counts on a color-coded world map | [02_world_map](./02_hotel_reviews_world_map.ipynb) |
| Find the busiest month for the hotels of each country | Extract the month from _Review_Date_ and group by month and hotel countries, counting the number of guest reviews (assuming reviews are written straight after the stay) | [03_season_analysis](./03_hotel_reviews_season_analysis.ipynb) |
| Analyse the sentiment of guest reviews | Apply NLP techniques to _Positive_Review_ and _Negative_Review_, together with _Reviewer_Score_, and build a single-label classifier | [04_sentiment_analysis](./04_hotel_reviews_sentiment_analysis.ipynb) |
| Target hotel ads to the right type of guests | Engineer new features from _Tags_ and build a clustering algorithm for hotels as well as a recommendation system for guests | [05_recommendation_system](./05_hotel_reviews_recommendation_system.ipynb) |
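
The first two rows of the table can be sketched with plain pandas operations. This is only a rough illustration on made-up rows, not the code of the linked notebooks. It assumes that _Review_Date_ is formatted as month/day/year and that nationality strings may carry surrounding spaces; the `Hotel_Country` and `Month` columns are hypothetical helper names.

```python
import pandas as pd

# Toy rows made up for illustration; only the column names follow the dataset.
reviews = pd.DataFrame({
    "Hotel_Address": [
        "3 Calle Ejemplo 08001 Barcelona Spain",
        "3 Calle Ejemplo 08001 Barcelona Spain",
        "3 Calle Ejemplo 08001 Barcelona Spain",
        "5 Rue Exemple 75001 Paris France",
        "5 Rue Exemple 75001 Paris France",
    ],
    "Reviewer_Nationality": [" Germany ", " Germany ", " Italy ", " Spain ", " Italy "],
    # Assumed date format: month/day/year.
    "Review_Date": ["8/3/2017", "8/15/2017", "1/9/2016", "7/1/2017", "7/20/2016"],
})

# Problem 1: most frequent guest nationality in Spanish hotels.
spanish = reviews[reviews["Hotel_Address"].str.contains("Spain")]
nationality_counts = spanish["Reviewer_Nationality"].str.strip().value_counts()
top_nationality = nationality_counts.idxmax()

# Problem 2: busiest month per hotel country, using review counts as a proxy.
# The last address token is used as the country (truncates multi-word names).
reviews["Hotel_Country"] = reviews["Hotel_Address"].str.split().str[-1]
reviews["Month"] = pd.to_datetime(reviews["Review_Date"], format="%m/%d/%Y").dt.month
monthly = (
    reviews.groupby(["Hotel_Country", "Month"])
    .size()
    .reset_index(name="n_reviews")
)
# Keep, for each country, the row with the highest review count.
busiest = monthly.loc[monthly.groupby("Hotel_Country")["n_reviews"].idxmax()]
busiest_month = {c: int(m) for c, m in zip(busiest["Hotel_Country"], busiest["Month"])}

print(top_nationality)
print(busiest_month)
```

Counting reviews per month only approximates occupancy, under the assumption already stated in the table that reviews are written straight after the stay.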