mmonokuma/yelp_fake: prediction algorithm for detecting fake Yelp reviews

This project is for Q&A with Watson, a CS class at Columbia University with Professor Alfio Gliozzo.

Our project will build on the work of Ott et al. (2011, 2013) [1, 2] to predict fake Yelp reviews for 20 hotels in Chicago.

Value Proposition: An estimated 15-30% of online reviews are fake, driven by the financial incentives for independent businesses to cheat. Yelp, Google, and Amazon, among others, are cracking down on the spam epidemic in a number of ways. Algorithms that can detect fake reviews with a low false positive rate are key to this endeavor, and companies are investing millions in developing them. Predictive analytics in this area is therefore key to the continued success of online marketplaces. Our project aims to build on this research, using Watson tools to improve the predictive model.
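To make the "low false positive rate" criterion concrete, here is a minimal sketch of how that rate could be computed for a fake-review detector. The function name and the toy labels are illustrative assumptions, not code or data from this repository.

```python
def false_positive_rate(labels, predictions):
    """Share of genuine reviews that were wrongly flagged as fake.

    Both arguments are sequences of booleans where True means "fake".
    """
    false_pos = sum(1 for y, p in zip(labels, predictions) if not y and p)
    true_neg = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    genuine = false_pos + true_neg
    if genuine == 0:
        raise ValueError("labels contain no genuine reviews")
    return false_pos / genuine


# Made-up toy data: six reviews, two of them actually fake.
labels      = [True, False, False, True, False, False]
predictions = [True, False, True,  True, False, False]

fpr = false_positive_rate(labels, predictions)
print(f"false positive rate: {fpr:.2f}")
```

In this toy case one of the four genuine reviews is flagged, so the rate is 0.25. For a review site, each false positive is an honest customer whose review is suppressed, which is why this metric matters alongside overall accuracy.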

Project Abstract: Online reviews have become the cornerstone of many shoppers' buying decisions. A UK study estimates that over 20 billion pounds are spent based on online reviews [3]. The financial incentive is therefore high for a company to boost its reputation with positive fake reviews or attack competitors with negative fake reviews. An HBS study estimates that a one-star increase on Yelp can increase a restaurant's revenue by 5-9% [4].

Academic and industry research alike has been directed toward detecting fake reviews across various industries and sites. In particular, Ott and the Cornell research group [1, 2] used Mechanical Turk to generate 400 positive and negative fake reviews for hotels in Chicago. Their prediction model showed a 40% increase in accuracy over human judges. Our project seeks to build on this research by predicting fake reviews based on user and review interactions.
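As a rough illustration of what a text-based fake-review classifier looks like, the sketch below trains a unigram Naive Bayes model with add-one smoothing on a handful of invented reviews. It is a generic baseline chosen for illustration, not this project's model and not a reproduction of the model in [1, 2]. It uses review text only, whereas this project also considers user and review interactions. The class name, the helper, and all example reviews are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes over unigrams with add-one smoothing."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)   # documents per class
        self.word_counts = {}               # class -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior plus smoothed log likelihood of each known word
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training reviews; a real run would use a labeled corpus.
train_texts = [
    "my husband and i loved this amazing luxury hotel",
    "the best experience ever i will definitely return",
    "room was small but the location near the river was fine",
    "check in took ten minutes and the bathroom was clean",
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("amazing luxury experience i loved"))
print(model.predict("the bathroom was clean and the location was fine"))
```

With four training reviews the predictions only reflect word overlap with the toy data. The point is the shape of the pipeline: tokenize, count words per class, and score a new review against each class.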

[1] M. Ott, Y. Choi, C. Cardie, and J.T. Hancock. 2011. Finding Deceptive Opinion Spam by Any Stretch of the Imagination. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies.

[2] M. Ott, C. Cardie, and J.T. Hancock. 2013. Negative Deceptive Opinion Spam. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[3] Competition and Markets Authority. 2015. Online reviews and endorsements: Report on the CMA's call for information. June 2015.

[4] M. Luca. 2011. Reviews, Reputation, and Revenue: The Case of Yelp.com. Harvard Business School Working Paper No. 12-016, September 2011. (Revise and resubmit at the American Economic Journal: Applied Economics.)

Folders and files (65 commits):
css
data
fonts
img
js
models
op_spam_v1.4/positive_polarity_watson
python
.gitignore
README.md
abstract.md
business_data.csv
honest_web.key
hotel_yelp_reviews_with_sample.csv
hotels_data.csv
hotels_prior.csv
hotels_prior_09-2014.csv
index.html
tone_analyzer_extract.ipynb
