Prediction algorithm for detecting fake Yelp reviews

This project is for Q&A with Watson, a CS class at Columbia University taught by Professor Alfio Gliozzo.

Our project will build on Ott's (2011, 2013) work [1, 2] to predict fake Yelp reviews for 20 hotels in Chicago.

Value Proposition: An estimated 15-30% of online reviews are fake, driven by the financial incentive for independent businesses to cheat. Yelp, Google, and Amazon, among others, are cracking down on this spam epidemic in a number of ways. Algorithms that detect fake reviews with a low false-positive rate are key to this effort, and companies are investing millions in developing them. Predictive analytics in this area is therefore key to the continued success of online marketplaces. Our project aims to build on this research, using Watson tools to improve the predictive model.

Project Abstract: Online reviews have become a cornerstone of many shoppers' buying decisions. A UK study estimates that over 20 billion pounds are spent based on online reviews [3]. The financial incentive is therefore high for a company to boost its reputation with positive fake reviews or to attack competitors with negative ones. An HBS study estimates that a one-star increase on Yelp can increase a restaurant's revenue by 5-9% [4].

Both academic and industry research have been directed toward detecting fake reviews across various industries and sites. In particular, Ott and the Cornell research group used Mechanical Turk to generate 400 positive and 400 negative fake hotel reviews for hotels in Chicago [1, 2]. Their prediction model showed a 40% increase in accuracy over human judges. Our project seeks to build on this research by predicting fake reviews based on user and review interactions.
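As a rough sketch of the kind of text classifier this line of work trains on labeled truthful/deceptive reviews, the snippet below fits a minimal bag-of-words Naive Bayes model in plain Python. The tiny training set is invented for illustration only (the real Ott corpus has hundreds of reviews per class), and this is not the project's actual model:

```python
import math
from collections import Counter

# Hypothetical stand-in for a labeled corpus of truthful vs. deceptive
# hotel reviews (the real dataset is far larger).
TRAIN = [
    ("my husband and i stayed here on our chicago trip it was amazing", "deceptive"),
    ("the staff was friendly and the experience was truly luxurious", "deceptive"),
    ("room was small but the location near michigan ave was convenient", "truthful"),
    ("check in took twenty minutes and the elevator smelled odd", "truthful"),
]

def train(examples):
    """Collect per-class word counts, document counts, and the vocabulary."""
    counts = {"deceptive": Counter(), "truthful": Counter()}
    docs = Counter()
    for text, label in examples:
        docs[label] += 1
        counts[label].update(text.split())
    vocab = {w for c in counts.values() for w in c}
    return counts, docs, vocab

def predict(text, counts, docs, vocab):
    """Score each class with log prior + add-one-smoothed log likelihoods."""
    total_docs = sum(docs.values())
    best_label, best_lp = None, float("-inf")
    for label, word_counts in counts.items():
        lp = math.log(docs[label] / total_docs)
        denom = sum(word_counts.values()) + len(vocab)
        for w in text.split():
            lp += math.log((word_counts[w] + 1) / denom)
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

counts, docs, vocab = train(TRAIN)
print(predict("the staff was amazing and friendly", counts, docs, vocab))  # deceptive
```

Add-one smoothing keeps unseen words from zeroing out a class score; a real pipeline would add n-gram or stylometric features and evaluate with cross-validation, as in Ott's experiments.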

[1] M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

[2] M. Ott, C. Cardie, and J.T. Hancock. 2013. Negative Deceptive Opinion Spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[3] Competition and Markets Authority. "Online reviews and endorsements: Report on the CMA's call for information." June 2015.

[4] Luca, Michael. "Reviews, Reputation, and Revenue: The Case of Yelp.com." Harvard Business School Working Paper, No. 12-016, September 2011. (Revise and resubmit at the American Economic Journal - Applied Economics.)