A fake news text classification system. It also performs bigram-level analysis on the corpus to compute summary statistics. The dataset was provided by the university. Several text representations are tested (Bag of Words, tf-idf, Doc2Vec from scratch), and a variety of classifiers are explored, including Complement Naive Bayes (https://scikit-learn.org/stable/modules/naive_bayes.html#complement-naive-bayes), SVM, and logistic regression.
Dependencies: pandas, matplotlib, nltk, numpy, wordcloud, gensim, scikit-learn, pandarallel