# Key topics extraction and contextual sentiment of users’ reviews

In this notebook I try to implement **key topics extraction and contextual sentiment** analysis, as discussed by Dhruv Pathak in his [blog post](https://tech.goibibo.com/key-topics-extraction-and-contextual-sentiment-of-users-reviews-20e63c0fd7ca).

## Setup

### Required Python libraries

* spaCy
* scikit-learn
* difflib
* Jellyfish
* Regex
* NLTK

### Various NLP concepts that are used in this notebook

* __Stemming and Lemmatization__, to work with root forms of multiple variations.
* __Fuzzy matching__, for approximate phrase matches and paraphrase detection.
* __Tokenization, retokenization, and part-of-speech (POS) tagging__, to identify the concepts in the content.
* __Dependency parsing__, to find relations between the concepts and put those relations to use.
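
To make the fuzzy matching item above concrete, here is a minimal sketch using only `difflib` from the standard library. It is not taken from the original blog post: the topic list, the misspelled phrases, the helper names `similarity` and `best_topic`, and the `0.8` cutoff are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher, get_close_matches


def similarity(a, b):
    """Return a similarity ratio in [0, 1] between two phrases, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_topic(phrase, topics, cutoff=0.8):
    """Return the closest topic for a phrase, or None if nothing scores >= cutoff."""
    matches = get_close_matches(phrase, topics, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical topic vocabulary and noisy review phrases (illustration only).
topics = ["room service", "swimming pool", "internet access", "restaurant"]
phrases = ["room servce", "swiming pool", "resturant", "parking"]

for phrase in phrases:
    print(phrase, "->", best_topic(phrase, topics))
```

A phrase with no sufficiently close topic maps to `None`, so unrelated text is not forced into a topic. The cutoff probably needs tuning on real reviews.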

## Import the relevant packages

In [6]:
import numpy as np
import pandas as pd
import nltk
import re
import spacy

## Data

This is where we load, inspect, and preprocess our data.

In [7]:
data_df = pd.read_csv("D:/Code/Resources/Data/goibibo_com-travel_sample.csv")

In [9]:
# Data Information
print("Shape\n","="*100)
print(data_df.shape)
print("\n\nData Described\n", "="*100)
print(data_df.describe())
print("\n\nData Information\n","="*100)
data_df.info()  # info() prints its summary itself and returns None, so no print() is needed
data_df.head()

Shape
(4000, 36)


Data Described
       guest_recommendation  hotel_star_rating  image_count     latitude  \
count           2416.000000        4000.000000  4000.000000  4000.000000   
mean              75.537666           1.306000    12.995500    21.288213   
std               22.698935           1.479159    11.631113     7.576905   
min                0.000000           0.000000     0.000000     8.080476   
25%               67.000000           0.000000     6.000000    13.748553   
50%               80.000000           1.000000     9.000000    22.225083   
75%               90.000000           3.000000    17.000000    28.018203   
max              100.000000           5.000000   129.000000    79.608077   

         longitude   room_count  site_review_count  site_review_rating  
count  4000.000000  4000.000000        2416.000000         2416.000000  
mean     77.432995    22.200250          47.765728            3.750993  
std       4.506588    96.132138          93.233924            

Unnamed: 0,additional_info,address,area,city,country,crawl_date,guest_recommendation,hotel_brand,hotel_category,hotel_description,...,room_count,room_facilities,room_type,similar_hotel,site_review_count,site_review_rating,site_stay_review_rating,sitename,state,uniq_id
0,Room Service|Internet Access|Restaurant|Free I...,"15th Mile, N.H.21,Manali, District Kullu,Himac...",Others,Manali,India,2016-07-24,85.0,,gostays,The standard check-in time is 12:00 PM and the...,...,17,Room Service |Basic Bathroom Amenities|Cable /...,Deluxe Room,https://www.goibibo.com/hotels/woodchime-homes...,87.0,4.0,Service Quality::3.9|Amenities::3.7|Food and D...,goibibo,Himachal Pradesh,2c8db027d43a9452a43e88eb30d9f983
1,Room Service|Gym/Spa,"A-585, Sushant Lok-1 ,Near Iffco Chowk Metro S...",Sushant Lok,Gurgaon,India,2016-07-24,87.0,,regular,The standard check-in time is 12:00 PM and the...,...,18,Room Service |Air Conditioning |Basic Bathroom...,Deluxe Room With Free WIFI,https://www.goibibo.com/hotels/stepinn-iffco-c...,8.0,4.5,Service Quality::4.7|Amenities::4.7|Food and D...,goibibo,Haryana,e98f69f889c0235e6dc480e7df6de0de
2,Restaurant|Swimming Pool,"Cobra Vaddo,Calungate Baga Road, Bardez, Calan...",Calangute Area,Goa,India,2016-07-24,50.0,,regular,The standard check-in time is 12:00 PM and the...,...,15,Room Service |Air Conditioning |Cable / Satell...,Standard Room,https://www.goibibo.com/hotels/sunrise-beach-r...,2.0,2.5,Service Quality::2.5|Amenities::2.5|Food and D...,goibibo,Goa,9b59d00eaffc273d83000ed7dcda0e83
3,,Simsa,Village Simsa,Manali,India,2016-07-24,100.0,,regular,The standard check-in time is 12:00 PM and the...,...,24,Basic Bathroom Amenities|Cable / Satellite / P...,Deluxe Room,https://www.goibibo.com/hotels/green-cottages-...,1.0,5.0,Service Quality::5.0|Amenities::5.0|Food and D...,goibibo,Himachal Pradesh,df0971f9c5501af112485ee28b468ce5
4,Internet Access|Restaurant,"8180 Street No.-6,Arakashan Road,Paharganj",Paharganj,Delhi,India,2016-07-24,63.0,,regular,The standard check-in time is 12:00 PM and the...,...,20,Basic Bathroom Amenities|Cable / Satellite / P...,Standard Room Non AC,https://www.goibibo.com/hotels/delhi-continent...,121.0,2.8,Service Quality::2.7|Amenities::2.6|Food and D...,goibibo,Delhi,0c3514344c9cda8718f558e84bdb44ef
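
Several columns in the preview above pack multiple values into one string: `additional_info` and `room_facilities` are pipe-delimited lists, and `site_stay_review_rating` appears to hold `Name::score` pairs separated by pipes. One possible preprocessing step is sketched below in plain Python. The helper names `parse_pipe_list` and `parse_ratings` are hypothetical and not part of the original notebook, and the rating sample uses only the first two entries visible in the truncated preview.

```python
def parse_pipe_list(value):
    """Split a pipe-delimited string into a list of trimmed, non-empty items."""
    if not isinstance(value, str):  # missing values are loaded as NaN (a float)
        return []
    return [item.strip() for item in value.split("|") if item.strip()]


def parse_ratings(value):
    """Parse 'Name::score|Name::score' into a dict mapping name to float score."""
    ratings = {}
    for item in parse_pipe_list(value):
        name, sep, score = item.partition("::")
        if not sep:
            continue  # skip entries without the '::' separator
        try:
            ratings[name.strip()] = float(score)
        except ValueError:
            continue  # skip entries whose score is not numeric
    return ratings


print(parse_pipe_list("Room Service|Internet Access|Restaurant|Free Internet"))
print(parse_ratings("Service Quality::3.9|Amenities::3.7"))
```

With pandas, these helpers could be applied per column, for example `data_df["additional_info"].apply(parse_pipe_list)`.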


In [18]:
data_df.loc[0].to_dict()

{'additional_info': 'Room Service|Internet Access|Restaurant|Free Internet',
 'address': '15th Mile, N.H.21,Manali, District Kullu,Himachal Pradesh',
 'area': 'Others',
 'city': 'Manali',
 'country': 'India',
 'crawl_date': '2016-07-24',
 'guest_recommendation': 85.0,
 'hotel_brand': nan,
 'hotel_category': 'gostays',
 'hotel_description': 'The standard check-in time is 12:00 PM and the standard check-out time is 12:00 PM. Early check-in or late check-out is strictly subjected to availability and may be chargeable by the hotel. Any early check-in or late check-out request must be directed and reconfirmed with hotel directly',
 'hotel_facilities': "Doctor on Call|Dry Cleaning|Laundry Service Available|Lobby|Parking Facilities Available|Gardens|Dance Performances (on demand)|Catering|Multi Lingual Staff|Wake-up Call / Service|Suitable For Children|Kitchen available (home cook food on request)|Open Air Restaurant / Dining |Veg / Non Veg Kitchens Separate |Vegetarian Food / Jain Food Avail