Reputation analysis using Natural Language Processing tools (text analytics), semi-supervised classification and time series analysis.
Analyse the text content of TripAdvisor reviews of the Pullman resort in Port Douglas using a variety of methods.
See the full Jupyter notebook here (Report).
I explore current trends in customer experience through online reviews on TripAdvisor for Pullman Sea Temple (PPD) in Port Douglas, Queensland, Australia.
By analysing the scores, I found:
- The distribution of scores across comments: most comments have the highest score (5 bubbles/stars)!
However, when applying time series analysis, I realised that:
- The monthly average number of comments has increased over the years, although
- the score shows a declining trend in recent years, and
- looking at the proportion of comments, although most are still positive, negative comments make up a growing share.
- In absolute terms, the number of negative comments has stayed about the same, but there are fewer new positive comments.
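The score distribution and the time series views above come down to a few pandas group-bys. The sketch below is a minimal illustration on a hypothetical toy DataFrame with `date` and `score` columns; the column names and the rule that 1-2 bubbles count as a negative comment are assumptions, not the notebook's actual schema:

```python
import pandas as pd

# Hypothetical toy data: one row per review with its date and bubble score (1-5).
reviews = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-05", "2018-01-20", "2018-02-11", "2019-01-03",
        "2019-01-15", "2019-01-28", "2019-02-09", "2019-02-22",
    ]),
    "score": [5, 5, 4, 5, 2, 1, 4, 2],
})

# Score distribution: number of reviews per bubble score.
distribution = reviews["score"].value_counts().sort_index()

# Assumption: 1-2 bubbles count as a negative review.
reviews["negative"] = reviews["score"] <= 2

# Monthly view: review count, mean score and share of negative reviews.
monthly = reviews.groupby(reviews["date"].dt.to_period("M")).agg(
    n_reviews=("score", "size"),
    mean_score=("score", "mean"),
    share_negative=("negative", "mean"),
)

# Yearly absolute counts of negative vs. non-negative reviews.
yearly = pd.crosstab(
    reviews["date"].dt.year,
    reviews["negative"].map({True: "negative", False: "non_negative"}),
)

print(distribution)
print(monthly)
print(yearly)
```

Grouping by calendar period only lists months that have at least one review; to show empty months as zero, the result would need to be reindexed over a full `pd.period_range`.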
To understand the customer experience and why the score is declining, I performed several text analyses of the actual comments, and discovered:
- Using multiple strategies, I extracted the most common phrases to see which factors matter most to customers, such as the swimming pool, the distance to town or the staff.
- Applying vector similarity, I built a semi-supervised sentence classifier that groups the text by content into 5 categories: Housekeeping, Infrastructure, Restaurant, Front Desk and Others. I then checked whether their prevalence changed over time, which wasn't the case: all 5 topics remain relevant throughout.
- I also used a fully unsupervised topic modelling technique to explore relevant topics the first analysis might have missed.
This analysis again showed that the distance to town, the swimming pool and the staff, especially at the front desk, were the most important, but also:
- The restaurant and room service
- Most rooms are fully equipped apartments: clean and spacious, with a kitchen and laundry
- The latter is important for families with kids, who are likely the main type of customer
- The hotel configuration and the different types of buildings
- Atmosphere: luxury and tropical
- Other surrounding attractions like the Daintree and the Coral Reef
- I then applied sentiment analysis to score how positive or negative each comment was based on its content, and found that Housekeeping has the least positive sentiment, while the Front Desk was mostly positive.
- Finally, I applied signal decomposition to the sentiment score over time:
- Seasonality puts pressure on both the Food and Beverage and Housekeeping areas.
- Environmental and infrastructure factors may need renewal as their novelty wears off over time, as shown by their declining trend.
- Because rooms are functional apartments with independent access, some are privately owned and rented through other channels such as Airbnb. These may not include services from the hotel management and may have separate housekeeping and other services. This can affect review scores as more and more rooms are sold to private owners.
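The phrase-extraction step described above can be approximated by counting bigrams across reviews. This is a minimal sketch on hypothetical sample reviews with a small illustrative stop-word list; it is not the notebook's exact strategy, which combined several methods:

```python
import re
from collections import Counter

# Hypothetical sample reviews (not real TripAdvisor data).
reviews = [
    "The swimming pool was amazing and the staff were friendly.",
    "Great swimming pool, but a long walk to town.",
    "Friendly staff at the front desk; the swimming pool is huge.",
]

# Small illustrative stop-word list; a real run would use spaCy's or NLTK's list.
STOP_WORDS = {"the", "was", "and", "were", "a", "to", "at", "is", "but"}


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


counts = Counter()
for text in reviews:
    tokens = re.findall(r"[a-z]+", text.lower())
    # Keep only bigrams in which neither word is a stop word.
    counts.update(
        " ".join(gram) for gram in ngrams(tokens, 2)
        if not any(tok in STOP_WORDS for tok in gram)
    )

print(counts.most_common(3))
```

Bigrams are formed before filtering and then dropped if they contain a stop word; removing stop words first would glue unrelated words together (for example "pool amazing").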
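The semi-supervised classifier mentioned above assigns each sentence to the category whose seed sentences it most resembles. The sketch below assumes a handful of hypothetical hand-labelled seed sentences per category, uses scikit-learn TF-IDF vectors as a stand-in for whatever sentence vectors the notebook used (for example spaCy embeddings), and sends sentences below an assumed similarity threshold to "Others":

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical hand-labelled seed sentences, a few per category.
SEEDS = {
    "Housekeeping": ["the room was not cleaned", "fresh towels and clean sheets"],
    "Infrastructure": ["the swimming pool is huge", "the building needs renovation"],
    "Restaurant": ["breakfast at the restaurant was great", "dinner menu was expensive"],
    "Front Desk": ["friendly staff at reception", "check in at the front desk was slow"],
}
THRESHOLD = 0.2  # assumed: below this best similarity a sentence goes to "Others"

# Flatten seeds into parallel lists of texts and their category labels.
seed_texts = [s for sents in SEEDS.values() for s in sents]
labels = [cat for cat, sents in SEEDS.items() for _ in sents]


def classify(sentences):
    """Label each sentence with the category of its most similar seed sentence."""
    vectorizer = TfidfVectorizer(stop_words="english").fit(seed_texts + sentences)
    sims = cosine_similarity(
        vectorizer.transform(sentences), vectorizer.transform(seed_texts)
    )
    result = []
    for row in sims:
        best = row.argmax()
        result.append(labels[best] if row[best] >= THRESHOLD else "Others")
    return result


print(classify(["the pool was lovely", "reception staff were friendly", "we saw a crocodile"]))
```

Only the seed sentences carry labels; every other sentence is labelled by similarity, which is what makes the approach semi-supervised rather than fully supervised.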
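The per-category sentiment comparison can be sketched with a lexicon-based scorer. The tiny lexicon and the labelled sentences below are hypothetical placeholders; a real analysis would use a full lexicon or a trained model (NLTK's VADER is one option) and the categories produced by the sentence classifier:

```python
import re
from collections import defaultdict
from statistics import mean

# Tiny hypothetical polarity lexicon: +1 positive, -1 negative.
LEXICON = {"clean": 1, "friendly": 1, "great": 1, "helpful": 1,
           "dirty": -1, "rude": -1, "slow": -1, "smelly": -1}


def sentiment(text):
    """Average polarity of lexicon words in the text, in [-1, 1]; 0.0 if none match."""
    hits = [LEXICON[tok] for tok in re.findall(r"[a-z]+", text.lower()) if tok in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical sentences already labelled with a category.
labelled = [
    ("Housekeeping", "The room was dirty and smelly."),
    ("Housekeeping", "Clean towels but a dirty floor."),
    ("Front Desk", "Friendly and helpful reception staff."),
    ("Front Desk", "Check-in was slow but staff were friendly."),
]

# Collect sentence scores per category, then average them.
scores_by_category = defaultdict(list)
for category, text in labelled:
    scores_by_category[category].append(sentiment(text))

averages = {cat: mean(scores) for cat, scores in scores_by_category.items()}
print(averages)
```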
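The signal decomposition of sentiment over time can be illustrated with a classical additive decomposition: a centred 2x12 moving average estimates the trend, the average detrended value for each calendar month gives the seasonal component, and whatever remains is the residual. The monthly series here is synthetic (a slow decline plus a yearly cycle), and this is one possible decomposition method, not necessarily the one used in the notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly mean sentiment: a slow decline plus a yearly cycle.
months = pd.date_range("2015-01-01", periods=60, freq="MS")
t = np.arange(60)
season = 0.1 * np.sin(2 * np.pi * t / 12)
series = pd.Series(0.6 - 0.002 * t + season, index=months)


def decompose(y, period=12):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    # Centred moving average; for an even period use the 2 x period weights.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = y.rolling(len(weights), center=True).apply(lambda w: w @ weights, raw=True)

    # Seasonal component: mean detrended value at each position in the cycle,
    # shifted so the seasonal effects sum to zero over one period.
    detrended = y - trend
    position = np.arange(len(y)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means = seasonal_means - seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=y.index)

    residual = y - trend - seasonal
    return trend, seasonal, residual


trend, seasonal, residual = decompose(series)
print(trend.dropna().head())
print(seasonal.head(12))
```

The first and last six months have no trend estimate because the centred window is incomplete there; a declining `trend` and a recurring `seasonal` pattern are the two signals the findings above refer to.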
Pullman Sea Temple Resort constantly checks online reviews to improve its service. However, it is difficult to get a systematic view of the text content of such reviews, especially to compare their evolution and trends.
- Python 3.8
- Jupyter notebook
- spaCy
- Gensim 3.8.3
- pyLDAvis
- Pandas
- Numpy
- NLTK
- scikit-learn
- Scrapy
pip install -r requirements.txt
To run the Scrapy spider, go to /tripullman and run:
scrapy crawl pull -o test.csv
Open the notebook in VS Code with the Jupyter extension, or alternatively open a terminal in the main folder and run:
jupyter notebook