# Key topics extraction and contextual sentiment of users’ reviews

In this notebook I try to implement **key topic extraction and contextual sentiment analysis**, as discussed by Dhruv Pathak in his [blog post](https://tech.goibibo.com/key-topics-extraction-and-contextual-sentiment-of-users-reviews-20e63c0fd7ca).

## Setup

### Required Python libraries

* spaCy
* scikit-learn
* difflib (part of the Python standard library)
* jellyfish
* regex
* NLTK
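
Before running the rest of the notebook, it can help to confirm that these libraries are importable. The snippet below is a minimal sketch, not part of the original workflow: the helper name `missing_packages` is hypothetical, and it assumes that "Regex" in the list refers to the third-party `regex` package (the import cell in this notebook only uses the standard-library `re`).

```python
from importlib.util import find_spec

# Display name -> import name. "sklearn" is the import name of scikit-learn.
# Assumption: "regex" means the third-party package, not the built-in "re".
REQUIRED = {
    "spaCy": "spacy",
    "scikit-learn": "sklearn",
    "difflib": "difflib",
    "jellyfish": "jellyfish",
    "regex": "regex",
    "NLTK": "nltk",
}


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names if find_spec(name) is None]


# Prints an empty list when everything is installed.
print(missing_packages(REQUIRED.values()))
```

Anything reported as missing can usually be installed with `pip` under its distribution name (for example `scikit-learn` rather than `sklearn`).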

### NLP concepts used in this notebook

* __Stemming and lemmatization__, to work with the root forms of word variants.
* __Fuzzy matching__, for approximate phrase matches and paraphrase detection.
* __Tokenization, retokenization, and part-of-speech (POS) tagging__, to identify the concepts in the content.
* __Dependency parsing__, to find relations between the concepts and put them to use.
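
To make one of these concepts concrete, here is a small sketch of fuzzy matching using only `difflib` from the standard library. It is an illustration under stated assumptions, not the method from the blog post: the helper names `similarity` and `match_topic`, the topic vocabulary, and the `0.8` cutoff are all hypothetical choices.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_topic(phrase, known_topics, cutoff=0.8):
    """Map a phrase to its closest known topic, or None if nothing is close enough."""
    matches = get_close_matches(phrase.lower(), known_topics, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical topic vocabulary, for illustration only.
topics = ["breakfast", "location", "staff", "room", "wifi"]

for phrase in ["brekfast", "locaton", "parking"]:
    print(phrase, "->", match_topic(phrase, topics))
```

Misspelled phrases are mapped to the nearest topic, while a phrase with no sufficiently similar topic yields `None`. Lowering the cutoff catches more variants at the cost of more false matches.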

## Import the relevant packages

In [6]:
import numpy as np
import pandas as pd
import nltk
import re
import spacy

## Data

Here we load, inspect, and preprocess the data.

In [19]:
data_df = pd.read_csv("D:/Code/Resources/Data/booking_com_hotel_reviews_europe.csv")

In [20]:
# Data Information
print("Shape\n","="*100)
print(data_df.shape)
print("\n\nData Described\n", "="*100)
print(data_df.describe())
print("\n\nData Information\n","="*100)
# info() prints its report itself and returns None, so it is not wrapped in print()
data_df.info()
data_df.head()

Shape
(515738, 17)


Data Described
       Additional_Number_of_Scoring  Average_Score  \
count                 515738.000000  515738.000000   
mean                     498.081836       8.397487   
std                      500.538467       0.548048   
min                        1.000000       5.200000   
25%                      169.000000       8.100000   
50%                      341.000000       8.400000   
75%                      660.000000       8.800000   
max                     2682.000000       9.800000   

       Review_Total_Negative_Word_Counts  Total_Number_of_Reviews  \
count                      515738.000000            515738.000000   
mean                           18.539450              2743.743944   
std                            29.690831              2317.464868   
min                             0.000000                43.000000   
25%                             2.000000              1161.000000   
50%                             9.000000              2134.0000

| | Hotel_Address | Additional_Number_of_Scoring | Review_Date | Average_Score | Hotel_Name | Reviewer_Nationality | Negative_Review | Review_Total_Negative_Word_Counts | Total_Number_of_Reviews | Positive_Review | Review_Total_Positive_Word_Counts | Total_Number_of_Reviews_Reviewer_Has_Given | Reviewer_Score | Tags | days_since_review | lat | lng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 8/3/2017 | 7.7 | Hotel Arena | Russia | I am so angry that i made this post available... | 397 | 1403 | Only the park outside of the hotel was beauti... | 11 | 7 | 2.9 | [' Leisure trip ', ' Couple ', ' Duplex Double... | 0 days | 52.360576 | 4.915968 |
| 1 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 8/3/2017 | 7.7 | Hotel Arena | Ireland | No Negative | 0 | 1403 | No real complaints the hotel was great great ... | 105 | 7 | 7.5 | [' Leisure trip ', ' Couple ', ' Duplex Double... | 0 days | 52.360576 | 4.915968 |
| 2 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 7/31/2017 | 7.7 | Hotel Arena | Australia | Rooms are nice but for elderly a bit difficul... | 42 | 1403 | Location was good and staff were ok It is cut... | 21 | 9 | 7.1 | [' Leisure trip ', ' Family with young childre... | 3 days | 52.360576 | 4.915968 |
| 3 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 7/31/2017 | 7.7 | Hotel Arena | United Kingdom | My room was dirty and I was afraid to walk ba... | 210 | 1403 | Great location in nice surroundings the bar a... | 26 | 1 | 3.8 | [' Leisure trip ', ' Solo traveler ', ' Duplex... | 3 days | 52.360576 | 4.915968 |
| 4 | s Gravesandestraat 55 Oost 1092 AA Amsterdam ... | 194 | 7/24/2017 | 7.7 | Hotel Arena | New Zealand | You When I booked with your company on line y... | 140 | 1403 | Amazing location and building Romantic setting | 8 | 3 | 6.7 | [' Leisure trip ', ' Couple ', ' Suite ', ' St... | 10 days | 52.360576 | 4.915968 |
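
The preview above hints at two preprocessing needs: review text carries stray whitespace and a `No Negative` placeholder, and `Tags` appears to be stored as the string form of a list. The following is a rough sketch of how both might be handled. The helper names `clean_review` and `parse_tags` are hypothetical, the `no positive` placeholder is an assumed counterpart that does not appear in the preview, and the tag string used here is a shortened, made-up example because the real values are truncated in the output.

```python
import ast

# "No Negative" appears in row 1 of the preview.
# Assumption: "No Positive" is used the same way for empty positive reviews.
PLACEHOLDERS = {"no negative", "no positive"}


def clean_review(text):
    """Collapse whitespace and map placeholder reviews to an empty string."""
    text = " ".join(str(text).split())
    return "" if text.lower() in PLACEHOLDERS else text


def parse_tags(raw):
    """Turn the string form of a tag list into a list of stripped tags."""
    return [tag.strip() for tag in ast.literal_eval(raw)]


print(repr(clean_review(" No Negative ")))
print(repr(clean_review(" Amazing location and building Romantic setting")))
print(parse_tags("[' Leisure trip ', ' Couple ']"))  # shortened example value
```

Both helpers take a single value, so they could be applied column-wise, for example with `data_df["Negative_Review"].apply(clean_review)` and `data_df["Tags"].apply(parse_tags)`.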


In [21]:
data_df.loc[0].to_dict()

{'Hotel_Address': ' s Gravesandestraat 55 Oost 1092 AA Amsterdam Netherlands',
 'Additional_Number_of_Scoring': 194,
 'Review_Date': '8/3/2017',
 'Average_Score': 7.7,
 'Hotel_Name': 'Hotel Arena',
 'Reviewer_Nationality': ' Russia ',
 'Negative_Review': ' I am so angry that i made this post available via all possible sites i use when planing my trips so no one will make the mistake of booking this place I made my booking via booking com We stayed for 6 nights in this hotel from 11 to 17 July Upon arrival we were placed in a small room on the 2nd floor of the hotel It turned out that this was not the room we booked I had specially reserved the 2 level duplex room so that we would have a big windows and high ceilings The room itself was ok if you don t mind the broken window that can not be closed hello rain and a mini fridge that contained some sort of a bio weapon at least i guessed so by the smell of it I intimately asked to change the room and after explaining 2 times that i booked 