CS 725 : Foundations of Machine Learning
Sarcasm Detection in Tweets
Navjot Singh 130110071
Kalpesh Patil 130040019
Ashwin Bhat 13D070006
Yash Bhalgat 13D070014
- (stores intermediate data files)
To run the whole pipeline at once, run automate.sh, located in /code: $ ./automate.sh
The Flow :
The project proceeds through the following steps, each using the scripts and files named below:
Run the Senti-Strength tool on sarcasm_tweets.txt and nonsarcasm_tweets.txt to generate a polarity score for each word.
-> Output: sarcsam_tweets_clean.txt and nonsarcasm_tweets_clean.txt are obtained.
Run sarcasm_gen_features.py and nonsarcasm_gen_features.py. These generate the numerical features as discussed in the report. -> Output: sarcasm_gen_features.csv and nonsarcasm_gen_features.csv are obtained
Run sarcasm_tweets_gap.py to remove the gaps in sarcasm_tweets.txt; this is required before running 'x'_gen_emojis.py. -> Output: sarcasm_tweets_gap.txt is obtained.
Run both sarcasm_gen_emoji.py and nonsarcasm_gen_emojis.py -> Output: sarcasm_emoji.csv and nonsarcasm_emoji.csv are obtained.
Merge sarcasm_emoji.csv with sarcasm_gen_features.csv (and likewise for nonsarcasm); the provided Rscript does this. -> The numerical features are now complete.
Run create_vocab.py to extract unigrams from sarcasm_tweets.txt and nonsarcasm_tweets.txt. -> Output: vocab.txt, storing the occurrence count of each unigram in our data.
Run neural_network.py. -> Output: Returns precision, accuracy, and recall values for the NN.
Run nonsarcsam_final_features.py and sarcasm_final_features.py to generate features in LibSVM format. -> Output: sarcasm_final_libsvm.py sarcasm_final_libsvm.py
Run svm_classification.py. -> Output: Returns precision, accuracy, and recall values for the SVM.
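For readers unfamiliar with the three values reported by the NN and SVM steps, here is a minimal sketch of how precision, accuracy, and recall can be computed from true and predicted labels. It is not taken from neural_network.py or svm_classification.py; the function name classification_metrics, the label convention (1 = sarcastic, 0 = non-sarcastic), and the toy labels are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, accuracy, recall) for binary labels.

    Hypothetical helper for illustration; not part of the project code.
    """
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, accuracy, recall


if __name__ == "__main__":
    # Made-up labels, assumed convention: 1 = sarcastic, 0 = non-sarcastic.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0]
    precision, accuracy, recall = classification_metrics(y_true, y_pred)
    print("precision=%.3f accuracy=%.3f recall=%.3f"
          % (precision, accuracy, recall))
```

Precision is the fraction of tweets predicted sarcastic that really are sarcastic, recall is the fraction of sarcastic tweets that were found, and accuracy is the fraction of all tweets labelled correctly.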
(Please refer to the report)
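As a rough illustration of the vocabulary and LibSVM steps in the flow above, the sketch below counts unigrams and writes one example in LibSVM's sparse text format (label followed by ascending 1-based index:value pairs). It is not the code in create_vocab.py or the 'x'_final_features.py scripts; the helper names build_vocab and to_libsvm_line, the lowercase whitespace tokenization, and the sample tweets are all assumptions.

```python
from collections import Counter


def build_vocab(tweets):
    """Count how often each unigram occurs across all tweets.

    Assumes simple lowercase whitespace tokenization (hypothetical choice).
    """
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    Indices are 1-based and ascending; zero-valued features are omitted,
    which the sparse LibSVM format allows.
    """
    parts = [str(label)]
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value}")
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up tweets, for illustration only.
    tweets = ["Oh great another Monday", "great weather today"]
    vocab = build_vocab(tweets)
    words = sorted(vocab)  # fixed word order defines the feature indices

    # Unigram-count feature vector for the first tweet.
    tokens = tweets[0].lower().split()
    features = [tokens.count(word) for word in words]
    print(to_libsvm_line(1, features))
```

Sorting the vocabulary before building feature vectors keeps the index of each unigram stable across the sarcasm and nonsarcasm files, which a classifier trained on both would require.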
Github Repository : https://github.com/navisngh11/Sarcasm-Detection-Twitter