GitHub - seroetr/Spam_MultiLabel_Text_Classification: Spam Text Classification using Machine Learning Models

Spam Text Classification

Spam Text Classification using Machine Learning Models

Multi-Label Text Classification

The stackoverflow.csv dataset is used; it is available at: https://raw.githubusercontent.com/laxmimerit/All-CSV-ML-Data-Files-Download/master/stackoverflow.csv
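TF-IDF and MultiLabelBinarizer are used together to prepare the train and test datasets from the CSV file: TF-IDF turns the text into numeric features, and MultiLabelBinarizer encodes each sample's set of labels as a binary indicator vector.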
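Since Logistic Regression and SVM do not natively support multi-label classification (where one sample can carry several labels at once), OneVsRestClassifier is used.
The One-vs-Rest strategy splits the multi-label problem into one binary classification problem per label: each classifier decides whether its own label applies to a sample, independently of the others.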
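
The steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy texts and tags stand in for stackoverflow.csv, and the choice of LinearSVC with default settings is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy rows standing in for stackoverflow.csv.
texts = [
    "how to merge two dataframes in pandas",
    "pandas groupby and plot with matplotlib",
    "center a div with css in html",
    "html form styling using css",
    "read csv file with pandas in python",
    "python list comprehension syntax",
]
tags = [
    ["python", "pandas"],
    ["python", "pandas"],
    ["html", "css"],
    ["html", "css"],
    ["python", "pandas"],
    ["python"],
]

# Encode each tag list as a binary indicator row (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# Turn the raw text into TF-IDF features.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(texts)

# One-vs-Rest fits one binary SVM per label column of Y.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)

# Predict for a new text and map the indicator row back to tag names.
pred = clf.predict(tfidf.transform(["plot a pandas dataframe"]))
predicted_tags = mlb.inverse_transform(pred)
print(mlb.classes_, pred, predicted_tags)
```

In a real run, the fitted vectorizer and classifier would be applied to a held-out test split rather than to a single hand-written sentence.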

Hate Speech Classification

A Convolutional Neural Network (CNN) built with TensorFlow is used to perform hate speech classification.
The dataset hate_speech_data.csv is available at https://raw.githubusercontent.com/laxmimerit/hate_speech_dataset/master/data.csv.
First, the data is preprocessed. Then, since the dataset is not large, a CNN model with only one convolutional layer is built.
The Tokenizer class (from tensorflow.keras.preprocessing.text import Tokenizer) assigns an integer to every word and stores the mapping as a dictionary, word_index, so that each word has its own index.
Each sentence is then converted into a sequence in which every word is replaced by its number in the word index, using tokenizer.texts_to_sequences(sentences).
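
To make the word-index idea concrete, here is a small pure-Python sketch of the same mapping. It is an approximation for illustration only, not the Keras implementation: the helper names build_word_index and texts_to_sequences, the regular expression used to split words, and the example sentences are all assumptions. Like the Keras Tokenizer, it lowercases the text and gives smaller indices to more frequent words, starting at 1.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z0-9']+")  # assumed word pattern for this sketch


def build_word_index(sentences):
    """Map each word to an integer, most frequent word first, starting at 1."""
    counts = Counter()
    for sentence in sentences:
        counts.update(WORD_RE.findall(sentence.lower()))
    # most_common() sorts by count (descending); ties keep first-seen order.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(sentences, word_index):
    """Replace every known word by its index; unknown words are skipped."""
    return [
        [word_index[w] for w in WORD_RE.findall(sentence.lower()) if w in word_index]
        for sentence in sentences
    ]


sentences = ["I love my dog", "I love my cat", "You love my dog!"]
word_index = build_word_index(sentences)
sequences = texts_to_sequences(sentences, word_index)

print(word_index)  # {'love': 1, 'my': 2, 'i': 3, 'dog': 4, 'cat': 5, 'you': 6}
print(sequences)   # [[3, 1, 2, 4], [3, 1, 2, 5], [6, 1, 2, 4]]
```

Sequences produced this way have different lengths, so they are typically padded to a common length before being fed to a neural network.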

Hate Speech Classification.ipynb
Multi-Label Text Classification.ipynb
README.md
Spam Text Classification.ipynb
hate_speech_data.csv
spam.tsv
svm_multilabel.pkl
tfidf-multilabel.pkl
tokens.pkl
