# spam_tweet_detector

Uses three machine learning models (SVM, LSTM, and Naive Bayes) to classify a tweet as 'Spam' or 'Quality'.

The models have been trained in advance. To see the full project, including how the model architectures were chosen and a detailed analysis of their performance, have a look at this [repo](https://github.com/rodonguyen/showcase_AI_ML/tree/master/1.%20Twitter%20Spam%20Detection).

# Your input

Input values are set in the code block below.

Instructions:
- Choose a tweet by uncommenting one of the examples. More data is available in ./train.csv.
- Change the following, followers, actions, and is_retweet values if you know what they are :)

In [8]:
### CHOOSE A TWEET BY UNCOMMENTING ONE

tweet_content = "Big day.  #WeTheNorth #yyz #thesix #sunset #skyline @ The Six https://www.instagram.com/p/BFgrA9gBZay/"  # Quality example
# tweet_content = "Eastside Ivo (@eastside_ivo) came through Real Talk on Sunday. Sorry we couldn't be more… https://www.instagram.com/p/BFgq0pRCzfS/"  # Quality example
# tweet_content = "Is Hyperloop the future of travel? http://fb.me/3QdMfJDWC"  # Quality example
# tweet_content = "FDNY Fighting Blaze at Multiple Bronx Homes https://t.co/KAu9TfZuNX https://t.co/Cu6vRPWC30"  # Spam example
# tweet_content = "US jet shoots down Iranian-made drone https://t.co/TyDNtlFnJF https://t.co/z8gJ1fK3QX"  # Spam example
# tweet_content = "Jerry Sandusky denies child molestation charges in court https://t.co/YaAlPIhnve"  # Spam example
# tweet_content = "Oh damn!! https://t.co/KgGPYXogqf https://t.co/aLrDErjv2U"  # Spam example


### CHANGE THESE VALUES IF YOU KNOW WHAT THEY ARE

following = 4743    # Number of accounts the tweet owner follows; 4743 is the mean value
followers = 366142  # Number of followers of the tweet owner; 366142 is the mean value
actions = 7232      # Total number of favourites, replies, and retweets associated with the tweet; 7232 is the mean value
is_retweet = 0      # 1 if the tweet is a retweet, otherwise 0
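The five values above are later passed to the preprocessing functions as a single row in the order tweet_content, following, followers, actions, is_retweet. As a minimal sketch of how that row could be checked before use, the helper below validates the values and returns them in that order. The function name `build_feature_row` and its validation rules are assumptions made for illustration; they are not part of this repository.

```python
def build_feature_row(tweet_content, following, followers, actions, is_retweet):
    """Hypothetical helper: validate the inputs and return them as one row."""
    if not isinstance(tweet_content, str) or not tweet_content.strip():
        raise ValueError("tweet_content must be a non-empty string")

    # The three counts must be non-negative numbers.
    counts = (("following", following), ("followers", followers), ("actions", actions))
    for name, value in counts:
        if not isinstance(value, (int, float)) or value < 0:
            raise ValueError(f"{name} must be a non-negative number")

    if is_retweet not in (0, 1):
        raise ValueError("is_retweet must be 0 or 1")

    # Same column order as the rows passed to the preprocessing functions.
    return [tweet_content, following, followers, actions, is_retweet]


row = build_feature_row(
    "Is Hyperloop the future of travel? http://fb.me/3QdMfJDWC",
    following=4743,
    followers=366142,
    actions=7232,
    is_retweet=0,
)
print(row)
```

Failing early with a clear message is the only point of the sketch; the actual preprocessing in this project may accept a wider range of values.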

# Prepare

In [9]:
# Importing libraries
import os
# Suppress TensorFlow's C++ log output; this must be set before TensorFlow is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import utils, utils_naive_bayes, utils_svm, pandas, utils_lstm
# sys.modules.pop('utils')

# Load Model
lstm_model = utils_lstm.load_LSTM_model()
nb_model = utils_naive_bayes.load_naive_bayes_model()
svm_model = utils_svm.load_svm_model()

# Prepare data
lstm_tweet_tensor, lstm_others_col_std = utils_lstm.preprocessing_input([[tweet_content, following, followers, actions, is_retweet]])
nb_input = utils_naive_bayes.preprocess_input([tweet_content])
svm_input = utils_svm.preprocess_input([[tweet_content, following, followers, actions, is_retweet]])

# Predict

In [10]:
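# Collect one prediction per model; a value of True stands for spam
prediction = pandas.DataFrame(
  [
    # The LSTM returns a score, which is thresholded at 0.5
    ['LSTM', (lstm_model.predict([lstm_tweet_tensor, lstm_others_col_std]) >= 0.5)],
    ['Naive Bayes', nb_model.predict(nb_input)[0]],
    ['SVM', svm_model.predict(svm_input)[0]]
  ],
  columns=['model', 'prediction'])

# Replace the boolean predictions with readable labels
prediction[prediction == True] = "Spam"
prediction[prediction == False] = "Quality"
prediction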



|   | model | prediction |
|---|-------|------------|
| 0 | LSTM | Quality |
| 1 | Naive Bayes | Quality |
| 2 | SVM | Quality |
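
The labelling step in the Predict cell can also be written without pandas. The sketch below is a rough illustration under stated assumptions: `to_label` and `SPAM_THRESHOLD` are hypothetical names, and the values in `outputs` are made up for the example, not real model outputs. It applies the same 0.5 cut-off used for the LSTM score and maps truthy flags to "Spam".

```python
SPAM_THRESHOLD = 0.5  # same cut-off as the LSTM line in the Predict cell


def to_label(value):
    """Hypothetical helper: map a score or a 0/1 flag to 'Spam' or 'Quality'."""
    if isinstance(value, float):
        # Treat floats as scores and threshold them first.
        value = value >= SPAM_THRESHOLD
    return "Spam" if value else "Quality"


# Made-up example values: a score for the LSTM, flags for the other two models.
outputs = {"LSTM": 0.12, "Naive Bayes": False, "SVM": 0}
labels = {model: to_label(value) for model, value in outputs.items()}

for model, label in labels.items():
    print(f"{model:<12} {label}")
```

Keeping the threshold in one named constant makes it easy to see where the spam decision is made if the cut-off ever needs to change.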
