GitHub - fmohsen/Fake-news-res

Table of Contents

About The Project
Built With
Getting Started
Contributors

About The Project

This project extends an existing application designed to compare different machine learning libraries for fake news detection. It aims to ease the process of selecting and tuning algorithms by integrating an additional machine learning library, namely the Naive Bayes classifier, and by enabling user customization of key parameters. It also incorporates techniques such as transfer learning and incremental learning. The project provides a versatile tool for researchers, journalists, and other stakeholders in the battle against fake news. It also contributes to ongoing research by exploring the performance of various machine learning libraries, informing the development of more robust and efficient fake news detection systems.

Scope

This project focuses on:

Extension of Existing Application: The project builds upon an existing application for comparing machine learning libraries in terms of fake news detection.
Inclusion of Additional Library: The project involves incorporating an additional machine learning library, the Naive Bayes classifier, widening the range of comparison.
Parameter Customization: The project enables users to customize key parameters of the chosen algorithms, providing greater flexibility and adaptability to various datasets and requirements.
Transfer and Incremental Learning: The project explores the use of transfer learning and incremental learning methods. Both methodologies are evaluated for their potential in fake news detection, and their performance is compared to traditional machine learning approaches.
Binary Classification: The current scope is restricted to fake news detection in terms of binary classification - labeling news as either 'fake' or 'not fake'.
Language Limitation: The application primarily supports English language datasets at this stage.
Dataset Dependency: The project's effectiveness depends on the quality and diversity of datasets used for training and testing the models.
Algorithm Comparison: The project evaluates and compares the performance of the included machine learning libraries.
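
The incremental learning mentioned in the scope can be illustrated with scikit-learn's `partial_fit` interface. This is a minimal sketch, not the project's actual code: the toy batches, the `HashingVectorizer` settings, and the choice of `MultinomialNB` are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy batches arriving over time; labels: 1 = fake, 0 = not fake.
batches = [
    (["shocking miracle cure doctors hate", "council approves budget for roads"], [1, 0]),
    (["aliens secretly control the banks", "city library extends opening hours"], [1, 0]),
]

# A stateless vectorizer needs no fitting, so each new batch can be
# transformed without revisiting earlier data. alternate_sign=False keeps
# the features non-negative, which MultinomialNB requires.
vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False)
model = MultinomialNB()

for texts, labels in batches:
    X = vectorizer.transform(texts)
    # The full set of classes must be known on the first partial_fit call.
    model.partial_fit(X, labels, classes=[0, 1])

# Classify an unseen headline with the incrementally trained model.
predictions = model.predict(vectorizer.transform(["miracle cure banks hate"]))
```

The model is updated one batch at a time instead of being retrained on the full dataset, which is the property that distinguishes incremental learning from the traditional fit-once approach.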

❗ Disclaimer: This application explores the use of supervised machine learning methods for fake news detection, focusing on patterns such as writing style and keyword usage. While these techniques can effectively identify some elements common to fake news, they should not be misconstrued as a comprehensive fact-checking system.

Supported Machine Learning Models

Logistic Regression
Decision Trees
K-Nearest Neighbors
Gradient Boosting
Naïve Bayes (Multinomial and Bernoulli)
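
A comparison across the model families listed above might look like the following sketch. It assumes the scikit-learn implementations of these models; the toy dataset, the parameter values, and the use of accuracy as the metric are illustrative assumptions rather than the application's own settings.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data; labels: 1 = fake, 0 = not fake.
fake = [
    "shocking miracle cure doctors hate", "aliens secretly control the banks",
    "you will not believe this secret trick", "celebrity clone exposed in shocking leak",
    "secret government plot revealed by insider", "miracle pill melts fat overnight",
]
real = [
    "council approves budget for road repairs", "city library extends opening hours",
    "university publishes annual research report", "parliament debates new transport bill",
    "local school opens new science lab", "weather service forecasts rain this week",
]
texts = fake + real
labels = [1] * len(fake) + [0] * len(real)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Fit the vectorizer on the training split only to avoid leaking test data.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# User-customizable parameters are passed to each constructor.
models = {
    "Logistic Regression": LogisticRegression(C=1.0, max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=3),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=20, random_state=0),
    "Multinomial Naive Bayes": MultinomialNB(alpha=1.0),
    "Bernoulli Naive Bayes": BernoulliNB(alpha=1.0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_vec))
```

With a dataset this small the scores carry no meaning; the point is only that every model is trained and evaluated on the same split so the results are comparable.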

Preprocessing Techniques

Data Loading and Feature Extraction: The parse_data function is used for loading and decoding datasets of various formats. The feature_extraction function leverages the Empath tool to analyze text for lexical features.
Feature Selection: Feature selection is carried out by the computeAnova and getDatasetWithSignificantFeatures functions, which help to reduce the dimensionality of the dataset.
Data Cleaning and Preprocessing: Data cleaning and preprocessing tasks are performed by the drop_na function (which removes missing values), the drop_non_english function (which excludes non-English text, though it is not actively used in this project), and the shuffle_csv function (which randomizes DataFrame rows).
Text Preprocessing: Text preprocessing includes the removal of punctuation and stopwords (remove_punctuation_stopwords), and word stemming (word_stemming).
Dataset Modifications: Functions such as add_label and keep_columns are used to modify the dataset, while mergeDatasetWithLabels merges the dataset and labels.
Text Vectorization: Finally, the vectorize_feature function applies TF-IDF vectorization to the input text, preparing it for machine learning algorithms.
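
The punctuation removal, stopword removal, and TF-IDF steps described above can be sketched as follows. The helper `strip_punctuation` is a hypothetical stand-in and is not the repository's `remove_punctuation_stopwords` or `vectorize_feature`; the sketch also assumes scikit-learn's `TfidfVectorizer` with its built-in English stopword list and omits stemming.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer


def strip_punctuation(text: str) -> str:
    """Remove ASCII punctuation characters (hypothetical helper)."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Hypothetical example documents.
docs = [
    "Breaking: You WON'T believe this!!!",
    "The council approved the budget.",
]

# Step 1: drop punctuation and lowercase the text.
cleaned = [strip_punctuation(doc).lower() for doc in docs]

# Step 2: TF-IDF vectorization; stop_words="english" filters common
# English stopwords using scikit-learn's built-in list.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)

# vocabulary_ maps each retained term to its column index in X.
vocabulary = sorted(vectorizer.vocabulary_)
```

The result `X` is a sparse matrix with one row per document and one column per retained term, which is the form the classifiers consume.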


Built With

This project has used the following libraries / modules:


Getting Started

❗ Note: Install the required dependencies by running `pip install -r requirements.txt`

Run the main.py file and visit localhost:8050.

Step-by-step Guide

Load one or more datasets by clicking Select Files.
Choose the label column from the dropdown menu.
Choose one or more ML algorithms from the dropdown menu.
Customize their parameters.
Choose the datatype and vectorization algorithm.
Choose the feature column.
Divide the dataset into training set and testing set.
Generate predictions.
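
For readers who prefer to see the steps above as code, the same workflow can be approximated outside the web interface. This is a sketch under stated assumptions: `run_workflow`, its parameters, and the in-memory CSV are hypothetical and do not correspond to functions in main.py or helper.py.

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical in-memory CSV standing in for an uploaded dataset.
CSV_DATA = """text,label
shocking miracle cure doctors hate,1
aliens secretly control the banks,1
secret government plot revealed by insider,1
miracle pill melts fat overnight,1
council approves budget for road repairs,0
city library extends opening hours,0
parliament debates new transport bill,0
local school opens new science lab,0
"""


def run_workflow(csv_file, feature_col, label_col, test_size=0.25, C=1.0):
    """Load a CSV, pick columns, split, train, and predict (hypothetical helper)."""
    rows = list(csv.DictReader(csv_file))           # load the dataset
    texts = [row[feature_col] for row in rows]      # chosen feature column
    labels = [int(row[label_col]) for row in rows]  # chosen label column

    # Divide the dataset into a training set and a testing set.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=0, stratify=labels
    )

    # Vectorize the text and fit one algorithm with a customized parameter.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(C=C))
    model.fit(X_train, y_train)

    # Generate predictions for the held-out rows.
    return X_test, y_test, model.predict(X_test)


X_test, y_test, predictions = run_workflow(
    io.StringIO(CSV_DATA), feature_col="text", label_col="label"
)
```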

Contributors

Kevin Wang / S-Number: 3470016

Bedir Chaushi

