Tripadvisor Review Analyzer App: a Python app that scrapes the latest reviews of a Tripadvisor attraction with Selenium and runs NLTK sentiment analysis on them.
The Tripadvisor Review Analyzer App is a review analyzer for tourist attractions. It uses Python and Selenium to scrape the latest reviews of an attraction from the Tripadvisor URL that the user enters on the app's landing page. The scraped review data is then cleaned, processed, and analyzed with the Natural Language Toolkit (NLTK), and sentiment analysis is performed on the review text.
When the user enters the Tripadvisor URL of an attraction, Selenium scrapes the latest 100 reviews written for that attraction on its Tripadvisor page. (Fewer than 100 reviews are analyzed if the attraction is fairly new or little known and has fewer than 100 reviews on its page.) The app then uses the Natural Language Toolkit Python package (NLTK) and its built-in VADER sentiment analyzer to classify each review as positive, negative, or neutral, based on a lexicon of positive and negative words.
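The classification step could look roughly like the sketch below. It is an illustration, not the app's actual code: the function names and the ±0.05 cutoff on VADER's compound score are assumptions (the cutoff is a commonly used convention, not something this README specifies).

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed, conventional cutoff.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_reviews(reviews):
    """Return (review_text, label) pairs for a list of review strings."""
    # Imported lazily so the pure helper above works without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    analyzer = SentimentIntensityAnalyzer()
    return [
        (text, label_from_compound(analyzer.polarity_scores(text)["compound"]))
        for text in reviews
    ]
```

Calling `classify_reviews` on the scraped review strings would give one label per review, which can then be used to split the reviews into positive and negative groups.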
Once the reviews are classified, the positive and negative reviews are processed separately. Tokenization breaks the review sentences into meaningful elements called tokens. The text is lowercased and punctuation is removed. Finally, stopwords are removed from the tokenized data: words such as "the", "is", and "what" that are irrelevant to sentiment and carry no useful information.
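A minimal sketch of this preprocessing, assuming NLTK's `word_tokenize` and its English stopword list are used (the README does not say which tokenizer or stopword list the app relies on). The helper names `clean_tokens` and `preprocess_review` are hypothetical.

```python
import string


def clean_tokens(tokens, stop_words):
    """Lowercase tokens, strip surrounding punctuation, and drop stopwords."""
    cleaned = []
    for token in tokens:
        word = token.lower().strip(string.punctuation)
        # Skip tokens that were pure punctuation, and skip stopwords.
        if word and word not in stop_words:
            cleaned.append(word)
    return cleaned


def preprocess_review(text):
    """Tokenize one review with NLTK and return its cleaned tokens."""
    # Imported lazily so clean_tokens can be used without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    # The tokenizer resource name depends on the NLTK release, so both are
    # requested; an unknown name is reported by the downloader, not raised.
    for resource in ("punkt", "punkt_tab", "stopwords"):
        nltk.download(resource, quiet=True)
    return clean_tokens(word_tokenize(text), set(stopwords.words("english")))
```

Keeping the cleaning logic in a function that takes the stopword set as a parameter makes it easy to try a different stopword list without touching the tokenization.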
The next step, again with NLTK, is to find the most common words in the positive and negative review groups. The following data is then displayed as the analysis results on the results page:
Number of reviews classified as positive
Number of reviews classified as negative
A few sample reviews classified as positive
A few sample reviews classified as negative
Most frequently used words and their frequencies in positive reviews
Most frequently used words and their frequencies in negative reviews
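The word-frequency results could be computed along these lines. This is a sketch under assumptions: it uses NLTK's `FreqDist` (a subclass of `collections.Counter`, used here as a fallback if NLTK is not installed), and the function name and the `top_n` default are hypothetical, since the README does not say how many words are shown.

```python
try:
    from nltk import FreqDist
except ImportError:
    # FreqDist subclasses Counter, so Counter offers the same most_common().
    from collections import Counter as FreqDist


def most_common_words(token_lists, top_n=10):
    """Return the top_n (word, count) pairs across all token lists in a group."""
    freq = FreqDist(token for tokens in token_lists for token in tokens)
    return freq.most_common(top_n)
```

The function would be called once with the cleaned tokens of the positive reviews and once with those of the negative reviews.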
This is what the results page looks like: