Summary: Recent debate over fake news has made it important to have a method for judging the validity of headlines. In the contemporary political atmosphere especially, satire and real news are increasingly difficult to tell apart. Starting from an extensive dataset of headlines from Kaggle, our group first augmented the original dataset with numerical variables describing each headline. Then, we performed a hypothesis test (A/B testing) to determine whether headline character length differs significantly between sarcastic and non-sarcastic headlines. Finally, we built, trained, and evaluated three classifiers (Naive Bayes, k-Nearest Neighbors, and Multilayer Perceptron) to detect sarcasm in news headlines.
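As a rough illustration of the hypothesis-testing step, the sketch below runs a two-sided permutation test on headline character length. It is not necessarily the test our group used: the function name `permutation_test`, its parameters, and the toy headlines are all hypothetical and chosen only for demonstration, and the real analysis would use the full Kaggle dataset.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value). Hypothetical helper, for
    illustration only.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are exchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy headlines invented for this example (not taken from the dataset).
sarcastic = [
    "area man confident he could run the country from his couch",
    "local cat announces plans to ignore owner indefinitely",
    "nation's dads agree thermostat must never be touched",
]
non_sarcastic = [
    "city council approves new budget",
    "storm expected to reach coast by friday",
    "school board names new superintendent",
]

# The numerical variable being tested: headline length in characters.
sarcastic_lengths = [len(h) for h in sarcastic]
non_sarcastic_lengths = [len(h) for h in non_sarcastic]

observed, p_value = permutation_test(sarcastic_lengths, non_sarcastic_lengths)
print(f"difference in mean length: {observed:.2f}, p-value: {p_value:.4f}")
```

A permutation test is used here because it makes no assumption about how headline lengths are distributed. With only a few toy headlines the p-value is not meaningful; the point is the shape of the procedure.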
Link to our dataset: https://www.kaggle.com/rmisra/news-headlines-dataset-for-sarcasm-detection
Link to our pitch deck: https://docs.google.com/presentation/d/1FRci0cazV-hCFm9mKHR-GUqdMgsujb5T9tqFilcxw54/edit?usp=sharing
A project by Seena Saiedian, Edward Liu, Alex Xu, Will Furtado, Jason Xiong, and Kathlee Wong for the Data Science Society at Berkeley's Fall 2019 General Membership program.