# **Lab 4: Adversarial Attacks Against Machine Learning-Based Spam Filters**

Please Type the Names of the Team Members:

## **Introduction**
Machine learning-based spam detection models learn from a set of labeled training data and, after the training phase, classify incoming emails as spam or ham. We study a class of vulnerabilities of such detection models in which an attacker manipulates a trained model, e.g., an SVM classifier, into misclassifying maliciously crafted spam emails at the detection phase. A key difficulty is that feature extraction methods often make it hard to translate a change in the feature space into a corresponding change in the textual email space. This lab uses a new attack method that makes guided changes to the text by taking advantage of adversarial examples that purposely modify the TF-IDF vector representing an email. We identify a set of "magic words", or malicious words, that, when added to a spam email, can cause the classifier to misclassify it.

For more information on this method, you can refer to the following publications:

(1) J. He, Q. Cheng, and X. Li, “Understanding the Impact of Bad Words on Email Management through Adversarial Machine Learning,” SIG-KM International Research Symposium 2021, Virtual Event, The University of North Texas, September 29, 2021. [Download](https://isi.jhu.edu/wp-content/uploads/2021/10/Bad-Words-He-Cheng-Li-Rev.pdf)

(2) C. Wang, D. Zhang, S. Huang, X. Li, and L. Ding, “Crafting Adversarial Email Content against Machine Learning Based Spam Email Detection,” in Proceedings of the 2021 International Symposium on Advanced Security on Software and Systems (ASSS ’21) with AsiaCCS 2021, Virtual Event, Hong Kong, June 7, 2021. [Download](https://isi.jhu.edu/wp-content/uploads/2021/04/ASSS_Workshop_Paper.pdf)

## **1. Loading Dataset**
The dataset we will be using is called Ling-Spam. The Ling-Spam dataset is a collection of 2,893 spam and non-spam messages curated from the Linguist List. These messages focus on linguistic interests around job postings, research opportunities and software discussion.
### Acknowledgements
All acknowledgements go to the original authors of A Memory-Based Approach to Anti-Spam Filtering for Mailing Lists. The dataset was made publicly available as a part of that paper. \\
**Run the code block below:**

(Choose messages.csv to upload, and wait until the upload shows 100% before you continue.)

In [None]:
import pandas as pd
from google.colab import files
uploaded = files.upload()

Saving messages.csv to messages.csv


**Run the code block below:**

In [None]:
from sklearn.model_selection import train_test_split


def data_extraction():
  # change 'messages.csv' to the name of the file you uploaded
  df = pd.read_csv('messages.csv')
  x = df.message
  y = df.label
  x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=99)
  return x_train, x_test, y_train, y_test

With the code section below, we extracted the dataset. \\
**Run the code block below:**

In [None]:
x_train, x_test, y_train, y_test = data_extraction()
print(x_train)

2501    call for squibs > from lisa cheng and rint syb...
2294    hello , my name is kevin elphick and i am the ...
2204    greetings to the list , i wish to thank everyo...
2628    editor 's note : we recently posted informatio...
1579    call for participation a workshop on minimizin...
                              ...                        
1092    this is a one time mailing , if you are not in...
1768    / / / / / / / / / / / / / / / / / / / / / / / ...
1737    this does n't quite qualify , but ' overlook '...
1209    a colloquium on translation is being proposed ...
641     call for participation epia ' 97 8th portugues...
Name: message, Length: 2314, dtype: object


In the code blocks above, `data_extraction()` reads the dataset into variables x and y and splits them into a training set (80%) and a test set (20%). Variable x contains the email messages, and variable y contains the class labels, with 0 being ham and 1 being spam.
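The 80/20 split performed by `train_test_split(x, y, test_size=0.2, random_state=99)` explains the size printed above: with 2,893 messages, the test set gets ⌈0.2 × 2893⌉ = 579 emails and the training set keeps the remaining 2,314 (matching `Length: 2314`). The stdlib-only sketch below reproduces that sizing. `holdout_split` is a hypothetical helper, and because it shuffles with Python's `random` module rather than NumPy, it does not select the same emails as scikit-learn; the 481 spam count is taken from the output of Section 7.

```python
import math
import random


def holdout_split(items, labels, test_size=0.2, seed=99):
    """Shuffle positions, then cut off ceil(test_size * n) of them as the test set.

    A plain (non-stratified) hold-out split. The test-set size rounds up as in
    scikit-learn's train_test_split, but the shuffle itself is different.
    """
    n = len(items)
    n_test = math.ceil(test_size * n)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # reproducible shuffle
    test_idx, train_idx = order[:n_test], order[n_test:]
    x_train = [items[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [items[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


# Placeholder data with Ling-Spam's size: 2,893 messages, 481 of them spam
messages = ["message %d" % i for i in range(2893)]
labels = [1 if i < 481 else 0 for i in range(2893)]
x_tr, x_te, y_tr, y_te = holdout_split(messages, labels)
print(len(x_tr), len(x_te))
```

Because the split is not stratified, the share of spam in the test set only approximately matches the share in the full dataset.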

## **2. Preprocessing the Emails**
For the emails we use, we remove all hyperlinks, numbers, punctuation marks, and English stop words in order to keep only useful information. We also convert all words to lowercase and join each multi-line message into a single line. In the last preprocessing step, we lemmatize all the words (a `word_stemmer` function is also defined as an alternative, but the pipeline below uses `word_lemmatizer`). \\
**Run the code block below:**

In [None]:
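import re
import string
from nltk.tokenize import word_tokenize
# public import path; `_stop_words` is a private scikit-learn module
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from nltk.stem import PorterStemmer
from nltk.stem import WordNetLemmatizer
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer tables required by newer NLTK releases
nltk.download('wordnet')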

def remove_hyperlink(word):
  return re.sub(r"http\S+", " ", word)


def to_lower(word):
    result = word.lower()
    return result


def remove_number(word):
    result = re.sub(r'\d+', ' ', word)
    return result


def remove_punctuation(word):
    result = word.translate(str.maketrans(dict.fromkeys(string.punctuation)))
    return result


def remove_whitespace(word):
    result = word.strip()
    return result


def replace_newline(word):
    return word.replace('\n', ' ')


def clean_up_pipeline(sentence):
    cleaning_utils = [remove_hyperlink,replace_newline,to_lower,
                      remove_number,
                      remove_punctuation,
                      remove_whitespace]
    for o in cleaning_utils:
        sentence = o(sentence)
    return sentence


def remove_stop_words(words):
    result = [i for i in words if i not in ENGLISH_STOP_WORDS]
    return result


def word_stemmer(words):
    stemmer = PorterStemmer()
    return [stemmer.stem(o) for o in words]


def word_lemmatizer(words):
    lemmatizer = WordNetLemmatizer()
    return [lemmatizer.lemmatize(o) for o in words]


def clean_token_pipeline(words):
    cleaning_utils = [remove_stop_words, word_lemmatizer]
    for o in cleaning_utils:
        words = o(words)
    return words


def preprocess(x_train, x_test):
    x_train = [clean_up_pipeline(o) for o in x_train]
    x_test = [clean_up_pipeline(o) for o in x_test]
    x_train = [word_tokenize(o) for o in x_train]
    x_test = [word_tokenize(o) for o in x_test]
    x_train = [clean_token_pipeline(o) for o in x_train]
    x_test = [clean_token_pipeline(o) for o in x_test]
    x_train = [" ".join(o) for o in x_train]
    x_test = [" ".join(o) for o in x_test]
    return x_train, x_test


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


With the code section below, we preprocessed the dataset. \\
**Run the code block below:**

In [None]:
x_train, x_test = preprocess(x_train, x_test)
print(x_train[0])

squib lisa cheng rint sybesma editor glot international year glot international start featuring squib section invite everybody send squib subject field theoretical linguistics appear monthly production time relatively short able publish squib soon acceptance review procedure set geared losing little time possible squib squib squib inspire present idea fleshed time connection fact thought related spell beginning new analysis necessarily daring new fact old language old fact new guise come beautiful observation theoretically relevant tell wonderful problem possibly hint solution length page glot international word including reference interested submitting squib send hard copy soft copy address sending consult guideline author web site www hagpub com glot htm send email prefer receive guideline email regular mail address email glot rullet leidenuniv nl regular mail lisng rint sybesma glot international department general linguistics leiden university p o box ra leiden netherlands lisa che
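The cleaning steps can be tried out without NLTK. The sketch below applies the same order of operations as `clean_up_pipeline`, followed by stop-word removal. `clean_text` and its small `STOP_WORDS` set are hypothetical stand-ins (the lab uses scikit-learn's `ENGLISH_STOP_WORDS`), whitespace splitting replaces `word_tokenize`, and lemmatization is skipped because it needs the WordNet data.

```python
import re
import string

# A small, hypothetical stop-word list; the lab uses scikit-learn's ENGLISH_STOP_WORDS
STOP_WORDS = {"a", "an", "and", "for", "is", "the", "to", "you", "now"}


def clean_text(text):
    """Clean one email in the same order as clean_up_pipeline, then drop stop words."""
    text = re.sub(r"http\S+", " ", text)   # remove hyperlinks
    text = text.replace("\n", " ")          # one line per message
    text = text.lower()                     # lowercase
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = text.split()                   # whitespace split instead of word_tokenize
    return [t for t in tokens if t not in STOP_WORDS]


print(clean_text("Visit http://example.com NOW!\nWin 100 free offers, and more."))
```

With a lemmatizer added, `offers` would additionally be reduced to `offer`, as in the printed output above where plural forms appear in singular.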

## **3. Feature Extraction**
In this step, we convert the text of each email into a numerical feature vector that represents the email for classification. There are many vectorization methods for converting text into numerical vectors; in this lab, we use TF-IDF, modified Word2vec, and modified Doc2vec to extract features from all the words in an email. \\
**Run the code block below:**

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec
from gensim.models import Doc2Vec
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import MinMaxScaler
from gensim.models.doc2vec import TaggedDocument


vectorizer = TfidfVectorizer()


def convert_to_feature(raw_tokenize_data):
    raw_sentences = [' '.join(o) for o in raw_tokenize_data]
    return vectorizer.transform(raw_sentences)


def TfidfConvert(x_train, x_test):
    x_train = [o.split(" ") for o in x_train]
    x_test = [o.split(" ") for o in x_test]
    raw_sentences = [' '.join(o) for o in x_train]
    vectorizer.fit(raw_sentences)
    x_train_features = convert_to_feature(x_train)
    x_test_features = convert_to_feature(x_test)
    return x_train_features, x_test_features


def getUniqueWords(allWords):
    uniqueWords = []
    for i in allWords:
        if i not in uniqueWords:
            uniqueWords.append(i)
    return uniqueWords


def input_split(x):
    new_x = []
    for line in x:
        newline = line.split(' ')
        new_x.append(newline)
    return new_x


def x2vec(input_x, feature_names, model):
    x_features = []
    for index in input_x:
        model_vector = [0] * len(feature_names)

        for token in index:
            if token in feature_names:
                feature_index = feature_names.index(token)

                if model.wv.has_index_for(token):
                    token_vecs = model.wv.get_vector(token)
                    model_vector[feature_index] = token_vecs[0]
        x_features.append(model_vector)
    return x_features


def single_transform(x, method, feature_model, feature_names, scaler, selection_model):
    if method == 'TFIDF':

        result = feature_model.transform(x)
        if selection_model != 'NaN':
            result = selection_model.transform(result)
        return result
    else:
        temp_x = x.values
        temp_x = temp_x[0].split(' ')
        model_vector = [0] * len(feature_names)
        for token in temp_x:
            if token in feature_names:
                feature_index = feature_names.index(token)
                if feature_model.wv.has_index_for(token):
                    token_vecs = feature_model.wv.get_vector(token)
                    model_vector[feature_index] = token_vecs[0]
        x_features = [model_vector]
        x_features = scaler.transform(x_features)
        x_train_features = sparse.csr_matrix(x_features)
        if selection_model != 'NaN':
            x_train_features = selection_model.transform(x_train_features)
        return x_train_features


def feature_extraction(x_train, x_test, method):

    if method == 'TFIDF':
        x_train_features, x_test_features = TfidfConvert(x_train, x_test)
        feature_names = vectorizer.get_feature_names_out()

        return x_train_features, x_test_features, feature_names, vectorizer, 'NaN'

    if method == 'word2vec':
        temp_x_train = input_split(x_train)
        temp_x_test = input_split(x_test)
        model_train = Word2Vec(temp_x_train, vector_size=1)
        feature_space = []
        for index in temp_x_train:
            feature_space = feature_space + getUniqueWords(index)
        feature_names = getUniqueWords(feature_space)
        x_train_features = x2vec(temp_x_train, feature_names, model_train)
        x_test_features = x2vec(temp_x_test, feature_names, model_train)
        x_train_features = np.array(x_train_features)
        x_test_features = np.array(x_test_features)
        pd.DataFrame(x_train_features).to_csv("x_train_features.csv", header=None, index=False)
        pd.DataFrame(x_test_features).to_csv("x_test_features.csv", header=None, index=False)
        scaler = MinMaxScaler()
        scaler.fit(x_train_features)
        x_train_features = scaler.transform(x_train_features)
        x_test_features = scaler.transform(x_test_features)
        x_train_features = sparse.csr_matrix(x_train_features)
        x_test_features = sparse.csr_matrix(x_test_features)
        return x_train_features, x_test_features, feature_names, model_train, scaler

    if method == 'doc2vec':
        temp_x_train = input_split(x_train)
        temp_x_test = input_split(x_test)
        # train Doc2Vec on the training split only, so no test data leaks into the features
        documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(temp_x_train)]
        model_train = Doc2Vec(documents, vector_size=1)
        feature_space = []
        for index in temp_x_train:
            feature_space = feature_space + getUniqueWords(index)
        feature_names = getUniqueWords(feature_space)
        x_train_features = x2vec(temp_x_train, feature_names, model_train)
        x_test_features = x2vec(temp_x_test, feature_names, model_train)
        scaler = MinMaxScaler()
        scaler.fit(x_train_features)
        x_train_features_scaled = scaler.transform(x_train_features)
        x_test_features_scaled = scaler.transform(x_test_features)
        x_train_features = sparse.csr_matrix(x_train_features_scaled)
        x_test_features = sparse.csr_matrix(x_test_features_scaled)
        return x_train_features, x_test_features, feature_names, model_train, scaler


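The `x2vec` helper behind the modified Word2vec and Doc2vec features builds one column per vocabulary word and stores that word's one-dimensional embedding (`vector_size=1`) in it, so the feature layout mirrors TF-IDF. Because `feature_names` is a Python list, every `token in feature_names` and `feature_names.index(token)` call scans the whole vocabulary. One possible speed-up, sketched below as the hypothetical `x2vec_fast`, builds a word-to-column dictionary once. It accepts any mapping from words to vectors; gensim's `model.wv` should also work because `KeyedVectors` supports `in` and `[]` lookups, but treat that as an assumption to check against your gensim version.

```python
def x2vec_fast(docs, feature_names, embedding):
    """Same feature layout as x2vec, but with O(1) vocabulary lookups.

    docs:          list of token lists
    feature_names: ordered vocabulary, one feature column per (unique) word
    embedding:     mapping word -> vector; only the first component is used,
                   and words missing from it keep the value 0
    """
    column = {word: i for i, word in enumerate(feature_names)}  # word -> column index
    features = []
    for tokens in docs:
        row = [0.0] * len(feature_names)
        for token in tokens:
            col = column.get(token)
            if col is not None and token in embedding:
                row[col] = embedding[token][0]  # the single embedding dimension
        features.append(row)
    return features


vocab = ["free", "offer", "syntax"]
emb = {"free": [0.5], "syntax": [-0.2]}  # made-up one-dimensional embeddings
print(x2vec_fast([["free", "offer"], ["syntax", "unknown"]], vocab, emb))
```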
With the code section below, we extracted the TFIDF values of all the emails in the dataset. \\
**Run the code block below:**

In [None]:
method = "TFIDF"
x_train_features, x_test_features, feature_names, feature_model, scalar = feature_extraction(x_train, x_test, method)
print(x_train_features[0])

  (0, 46349)	0.02621363735628304
  (0, 46104)	0.044243412442949326
  (0, 45930)	0.023335189665615534
  (0, 45897)	0.05871887896488733
  (0, 45259)	0.026876570150263467
  (0, 43511)	0.03332411093642023
  (0, 41716)	0.06961103844466418
  (0, 41592)	0.03875217458178802
  (0, 41445)	0.059961608008199815
  (0, 41444)	0.031870003742657305
  (0, 41069)	0.03690291355772966
  (0, 40436)	0.18193988855449547
  (0, 39890)	0.04905362530462098
  (0, 39851)	0.027435748736838014
  (0, 39317)	0.03730012147404211
  (0, 39167)	0.540230402858367
  (0, 38954)	0.059961608008199815
  (0, 38657)	0.03603358217054158
  (0, 38594)	0.0457128349501382
  (0, 38519)	0.06138243821681908
  (0, 38143)	0.028945471752544888
  (0, 37728)	0.03540022191725924
  (0, 37401)	0.03247438138722991
  (0, 37251)	0.04228239275638995
  (0, 37244)	0.06912511215074625
  :	:
  (0, 14601)	0.07996732569784008
  (0, 14313)	0.029211044385985103
  (0, 14114)	0.05710524903351722
  (0, 14056)	0.02092300773253752
  (0, 13822)	0.1009938832123625
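For reference, scikit-learn's `TfidfVectorizer` with its default settings (`smooth_idf=True`, `norm='l2'`, raw term counts) weights a term t in a document as its count times idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t, and then rescales each document vector to unit Euclidean length. That is why every value printed above lies between 0 and 1. The dependency-free sketch below follows the same formula on already-tokenized documents; `tfidf` is a hypothetical helper and, unlike `TfidfVectorizer`, does no tokenization of its own.

```python
import math
from collections import Counter


def tfidf(docs):
    """L2-normalised TF-IDF with smoothed IDF for a list of token lists.

    Returns (sorted vocabulary, list of {term: weight} dicts).
    """
    n = len(docs)
    df = Counter()                    # number of documents containing each term
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)          # raw term counts
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0  # avoid 0-division
        vectors.append({t: w / norm for t, w in weights.items()})
    return vocab, vectors


docs = [["free", "offer", "linguistics"],
        ["linguistics", "workshop", "syntax"],
        ["free", "free", "money"]]
vocab, vecs = tfidf(docs)
print(vocab)
print(vecs[2])
```

In the third document, `free` outweighs the rarer `money` because its count (2) more than compensates for its lower IDF.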

### **Question 1**
Look up Word2vec and Doc2vec online and describe, in your own words and in one paragraph, what they do.

## **4. Training SVM**
In this section, we train a Support Vector Machine (SVM) as a spam filter. \\
**Run the code block below:**

In [None]:
!pip install secml
from secml.data import CDataset
from secml.data.splitter import CDataSplitterKFold
from secml.ml.classifiers import CClassifierSVM
from secml.ml.peval.metrics import CMetricAccuracy
from secml.ml.peval.metrics import CMetricConfusionMatrix
from secml.adv.attacks.evasion import CAttackEvasionPGD
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
# from Feature_extraction import single_transform
import csv
from statistics import mean, stdev
import threading
import time


def train_test_SVM(x_train_features, x_test_features, y_train, y_test):
    tr_set = CDataset(x_train_features, y_train)
    # Train the SVM
    print("Build SVM")
    xval_splitter = CDataSplitterKFold()
    clf_lin = CClassifierSVM()
    xval_lin_params = {'C': [1]}
    print("Find the best params")
    best_lin_params = clf_lin.estimate_parameters(
        dataset=tr_set,
        parameters=xval_lin_params,
        splitter=xval_splitter,
        metric='accuracy',
        perf_evaluator='xval'
    )
    print("Finish Train")
    print("The best training parameters are: ", [
          (k, best_lin_params[k]) for k in sorted(best_lin_params)])
    print("Train SVM")
    clf_lin.fit(tr_set.X, tr_set.Y)

    # Test the Classifier
    ts_set = CDataset(x_test_features, y_test)
    y_pred = clf_lin.predict(ts_set.X)
    metric = CMetricAccuracy()
    acc = metric.performance_score(y_true=ts_set.Y, y_pred=y_pred)
    confusion_matrix = CMetricConfusionMatrix()
    cm = confusion_matrix.performance_score(y_true=ts_set.Y, y_pred=y_pred)
    print("Confusion Matrix: ")
    print(cm)
    return tr_set, ts_set, clf_lin

Collecting secml
  Downloading secml-0.15-py3-none-any.whl (463 kB)
     |████████████████████████████████| 463 kB 4.0 MB/s 
Installing collected packages: secml
Successfully installed secml-0.15
2022-04-15 03:30:52,413 - secml.settings - INFO - New `SECML_HOME_DIR` created: /root/secml-data
2022-04-15 03:30:52,417 - secml.settings - INFO - Default configuration file copied to: /root/secml-data/secml.conf
2022-04-15 03:30:52,424 - secml.settings - INFO - New `SECML_DS_DIR` created: /root/secml-data/datasets
2022-04-15 03:30:52,432 - secml.settings - INFO - New `SECML_MODELS_DIR` created: /root/secml-data/models

With the code section below, we trained an SVM classifier on the extracted TF-IDF features. \\
**Run the code block below:**

In [None]:
tr_set, ts_set, clf_lin = train_test_SVM(x_train_features, x_test_features, y_train, y_test)

Build SVM
Find the best params
Finish Train
The best training parameters are:  [('C', 1)]
Train SVM
Confusion Matrix: 
CArray([[483   1]
 [  3  92]])
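The confusion matrix can be turned into the usual detection metrics. The sketch below assumes scikit-learn's layout (rows are true labels, columns are predictions, class 0 = ham, class 1 = spam); the 579 test emails match the 20% split from Section 1. Spam recall is the quantity the attacks in the following sections try to drive down.

```python
# Confusion matrix printed above. Assumed layout (scikit-learn convention):
# rows = true class, columns = predicted class, class 0 = ham, class 1 = spam.
cm = [[483, 1],
      [3, 92]]

tn, fp = cm[0]  # ham: correctly kept / wrongly flagged as spam
fn, tp = cm[1]  # spam: missed (classified as ham) / caught

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision = tp / (tp + fp)  # share of flagged emails that really are spam
recall = tp / (tp + fn)     # share of spam emails that were caught

print("test emails:", total)
print("accuracy: %.4f  spam precision: %.4f  spam recall: %.4f"
      % (accuracy, precision, recall))
```

Accuracy does not depend on the assumed orientation; precision and recall would swap roles if secml printed the transpose.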


## **5. PGD Attack**
Our approach is based on successful adversarial perturbations of the model's input features. We employ the Projected Gradient Descent (PGD) method to modify the feature values and obtain the desired adversarial examples in the feature domain. PGD is widely regarded as one of the strongest first-order attacks. The PGD algorithm iteratively finds the changes that maximize the classification loss (here, pushing a spam email toward the ham class) subject to a constraint, *dmax*, the maximum allowed Euclidean (L2) distance from the original input, which sets the permitted level of perturbation. In our approach, we run PGD over a set of spam emails and generate adversarial examples in the feature space. Then we test these modified feature vectors to see whether they successfully bypass detection. \\
**Run the code block below:**

In [None]:
def pdg_attack(clf_lin, tr_set, ts_set, y_test, feature_names, nb_attack, dmax, lb, ub):

    class_to_attack = 1  # attack spam emails (label 1)
    cnt = 0  # the number of successful adversarial examples

    ori_examples2_x = []
    ori_examples2_y = []

    for i in range(nb_attack):
        # indices of all test emails that belong to the class under attack
        idx_candidates = np.where(y_test == class_to_attack)
        # pick one candidate at random (sampling is with replacement across iterations)
        rn = np.random.choice(idx_candidates[0].size, 1)
        x0, y0 = ts_set[idx_candidates[0][rn[0]],
                        :].X, ts_set[idx_candidates[0][rn[0]], :].Y

        x0 = x0.astype(float)
        y0 = y0.astype(int)
        x2 = x0.tondarray()[0]
        y2 = y0.tondarray()[0]

        ori_examples2_x.append(x2)
        ori_examples2_y.append(y2)

    # Perform adversarial attacks
    noise_type = 'l2'  # Type of perturbation 'l1' or 'l2'
    y_target = 0  # targeted attack: push the spam emails toward ham (label 0)
    # dmax (maximum perturbation) is passed in by the caller

    # PGD solver settings: step size, iteration cap, and convergence tolerance
    solver_params = {
        'eta': 0.01,
        'max_iter': 1000,
        'eps': 1e-4}

    # lb and ub bound every feature value in the attack space; the caller passes
    # the min and max of the training features. They can be set to `None` for unbounded.
    pgd_attack = CAttackEvasionPGD(
        classifier=clf_lin,
        double_init_ds=tr_set,
        distance=noise_type,
        dmax=dmax,
        lb=lb, ub=ub,
        solver_params=solver_params,
        y_target=y_target
    )

    ad_examples_x = []
    ad_examples_y = []
    ad_index = []
    cnt = 0
    for i in range(len(ori_examples2_x)):
        x0 = ori_examples2_x[i]
        y0 = ori_examples2_y[i]
        y_pred_pgd, _, adv_ds_pgd, _ = pgd_attack.run(x0, y0)
        if y_pred_pgd.item() == 0:
            cnt = cnt + 1
            ad_index.append(i)

        ad_examples_x.append(adv_ds_pgd.X.tondarray()[0])
        ad_examples_y.append(y_pred_pgd.item())

        attack_pt = adv_ds_pgd.X.tondarray()[0]
    print("PGD attack successful rate:", cnt / nb_attack)
    startTime2 = time.time()
    ori_examples2_x = np.array(ori_examples2_x)
    ori_examples2_y = np.array(ori_examples2_y)
    ad_examples_x = np.array(ad_examples_x)
    ad_examples_y = np.array(ad_examples_y)

    ori_dataframe = pd.DataFrame(ori_examples2_x, columns=feature_names)
    ad_dataframe = pd.DataFrame(ad_examples_x, columns=feature_names)

    # extract the success and fail examples
    ad_dataframe['ad_label'] = ad_examples_y
    ad_success = ad_dataframe.loc[ad_dataframe.ad_label == 0]
    ori_success = ori_dataframe.loc[ad_dataframe.ad_label == 0]
    ad_fail = ad_dataframe.loc[ad_dataframe.ad_label == 1]
    ori_fail = ori_dataframe.loc[ad_dataframe.ad_label == 1]

    ad_success_x = ad_success.drop(columns=['ad_label'])
    ad_fail_x = ad_fail.drop(columns=['ad_label'])

    result = (ad_success_x - ori_success)
    ori_dataframe.to_csv('ori_dataframe.csv')
    ad_dataframe.to_csv('ad_dataframe.csv')
    result.to_csv('result.csv')
    return result, cnt, ad_success_x, ori_dataframe, ori_examples2_y, cnt/nb_attack

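With the code section below, we run the PGD attack on the trained classifier using 100 randomly selected spam emails from the test set and dmax = 0.06. Because the emails are sampled at random (with replacement and without a fixed seed), the success rate varies from run to run. \\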
**Run the code block below:**

In [None]:
lb = np.ndarray.min(x_train_features.toarray())
ub = np.ndarray.max(x_train_features.toarray())
attack_amount = 100
dmax = 0.06
result, cnt, ad_success_x, ori_dataframe, ori_examples2_y, successful_rate = pdg_attack(clf_lin, tr_set, ts_set, y_test, feature_names, attack_amount, dmax, lb, ub)

PGD attack successful rate: 0.17
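To make the PGD update concrete, here is a minimal sketch against a linear classifier, where the gradient of the decision score f(x) = w·x + b is simply w. Each iteration takes a normalized step against the gradient (toward ham), projects the point back onto the L2 ball of radius dmax around the original email, and clips it to the feature box [lb, ub]. This is not secml's `CAttackEvasionPGD` implementation (which uses the classifier's own gradients, a convergence tolerance `eps`, and an optional double initialization from `double_init_ds`); `pgd_l2_linear` and the toy weights are hypothetical.

```python
import math


def pgd_l2_linear(x0, w, b, dmax, lb, ub, eta=0.01, max_iter=1000):
    """Minimal L2-bounded PGD against a linear score f(x) = w.x + b.

    A positive score means "spam"; the attack walks against the gradient w
    to lower the score, projecting back into the allowed region after every step.
    """
    x = list(x0)
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    for _ in range(max_iter):
        # 1) normalised gradient step that lowers the spam score
        x = [xi - eta * wi / w_norm for xi, wi in zip(x, w)]
        # 2) project onto the L2 ball of radius dmax around the original point
        delta = [xi - oi for xi, oi in zip(x, x0)]
        dist = math.sqrt(sum(d * d for d in delta))
        if dist > dmax:
            x = [oi + d * dmax / dist for oi, d in zip(x0, delta)]
        # 3) clip to the feature box; if x0 is inside the box, clipping
        #    never moves the point farther from x0, so step 2 still holds
        x = [min(max(xi, lb), ub) for xi in x]
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return x, score


x0 = [0.2, 0.0, 0.1]    # toy "spam" feature vector (score 0.15 > 0)
w = [1.0, -2.0, 0.5]    # toy linear weights: positive score means spam
b = -0.1
for dmax in (0.06, 0.1):
    x_adv, score = pgd_l2_linear(x0, w, b, dmax=dmax, lb=0.0, ub=1.0)
    print(dmax, round(score, 4), "ham" if score < 0 else "spam")
```

With these toy weights, the score can drop by at most dmax·‖w‖ ≈ 2.29·dmax, so dmax = 0.06 leaves the email on the spam side while dmax = 0.1 flips it. This budget effect is what Task 2 measures by varying dmax.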


## **6. Magical Words**
Adversarial emails are crafted by adding “magic words” to the original spam emails. The “magic words” are identified by intersecting the unique ham words with the “top words”. Specifically, the unique ham words are the words that appear only in ham emails and never in spam emails. After the PGD attack on the set of spam emails, we find which features were modified the most in order to bypass detection, and select the “top words” whose feature values changed the most. (The change of each feature is measured as the mean of its squared TF-IDF differences before and after the PGD perturbation, taken over the successfully attacked spam emails.) In our experiments, we use the top 100 words; this set is relatively small, efficient to work with, and yields a high success rate when the magic words are used to fool the classifier. \\
**Run the code block below:**

In [None]:
def magical_word(x_train, x_test, y_train, y_test, result, cnt):
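    # Method 2: score each feature by its mean squared change
    # between the adversarial and original TF-IDF values
    x2result1 = result
    x2result1 = np.array(x2result1)
    x2result = result
    x2result = x2result.multiply(x2result1)  # element-wise square of the differences

    sum_number = x2result.sum() / cnt  # average over the cnt successful attacks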
    sum_number = pd.DataFrame(sum_number, columns=['sum_number'])
    sum_number = sum_number.sort_values(
        by='sum_number', ascending=False, inplace=False)

    sum_number_pd = pd.DataFrame(sum_number.index[:100])
    sum_number_pd.to_csv("x2result.csv")
    d = {'message': x_train, 'label': y_train}
    df = pd.DataFrame(data=d)
    d1 = {'message': x_test, 'label': y_test}
    df1 = pd.DataFrame(data=d1)
    frames = [df, df1]
    messages = pd.concat(frames)
    # use a separate file name so the uploaded messages.csv is not overwritten
    messages.to_csv("messages_preprocessed.csv")
    spam = messages[messages.label == 1]
    ham = messages[messages.label == 0]

    # Tf-idf for spam datasets
    vect_spam = TfidfVectorizer()
    vect_spam.fit_transform(spam['message'])
    header_spam = vect_spam.get_feature_names_out()

    # Tf-idf for ham datasets
    vect_ham = TfidfVectorizer()
    vect_ham.fit_transform(ham['message'])
    header_ham = vect_ham.get_feature_names_out()

    # find unique ham words
    ham_unique = list(set(header_ham).difference(set(header_spam)))
    header_ham1 = pd.DataFrame(ham_unique)
    header_ham1.to_csv("ham_unique.csv")

    with open("x2result.csv", "r") as csvfile:
        reader = csv.reader(csvfile)
        top100_features = []
        for row in reader:
            top100_features.append(row[1])
    top100_features = top100_features[1:]
    # in ham & top100

    ham_unique_in_top = list(
        set(ham_unique).intersection(set(top100_features)))
    words14str = ""
    for item in ham_unique_in_top:
        words14str = words14str + " " + item
    return words14str, spam, ham

With the code section below, we identified a set of magic words from the successful perturbations. \\
**Run the code block below:**

In [None]:
words14str, spam, ham = magical_word(x_train, x_test, y_train, y_test, result, cnt)
print(words14str)

 benjamin discourse vt vowel ipa ohayosensei ammondt context phonetic workshop cascadilla gala grammar translation linguist bralich linguistic phonology chorus elra sentence risked
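The ranking in `magical_word` squares each feature's change (adversarial minus original TF-IDF value), averages it over the `cnt` successful attacks, keeps the 100 highest-scoring words, and intersects them with the words that occur only in ham. The sketch below restates those steps with plain Python sets and dictionaries on made-up numbers; `select_magic_words` is a hypothetical helper, and it breaks ties alphabetically, which the pandas version does not guarantee.

```python
def select_magic_words(diffs, ham_vocab, spam_vocab, top_k=100):
    """Pick ham-only words among the features the attack changed the most.

    diffs: one dict per successful adversarial example, mapping
           word -> (adversarial TF-IDF value - original TF-IDF value);
           absent words count as unchanged features.
    """
    n = len(diffs)
    score = {}
    for d in diffs:
        for word, delta in d.items():
            # accumulate the mean squared change over the n successful attacks
            score[word] = score.get(word, 0.0) + delta * delta / n
    # highest score first; ties broken alphabetically so the result is deterministic
    top_words = sorted(score, key=lambda w: (-score[w], w))[:top_k]
    ham_unique = set(ham_vocab) - set(spam_vocab)  # words seen only in ham
    return sorted(ham_unique.intersection(top_words))


# Made-up perturbations for two successfully attacked spam emails
diffs = [{"linguistics": 0.04, "free": -0.03, "syntax": 0.01},
         {"linguistics": 0.05, "workshop": 0.02}]
ham_vocab = {"linguistics", "syntax", "workshop", "meeting"}
spam_vocab = {"free", "money", "meeting"}
print(select_magic_words(diffs, ham_vocab, spam_vocab, top_k=3))
```

Because the change is squared, a feature the attack pushed down (here the spam word `free`) can rank as high as one it pushed up; the ham-only filter is what removes such words from the final list.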


## **7. Crafting Adversarial Emails & Attacking SVM**
After we find the magic words, we append them to the end of each original spam email. This process is what we call "crafting adversarial emails". Then, we feed the feature vectors of these crafted emails to the SVM classifier to see whether they are misclassified as ham emails.  \\
**Run the code block below:**


In [None]:
m2_empty = pd.DataFrame()
spam_cnt = 0
threads = []
m2_empty_l1 = pd.DataFrame()
m2_empty_l2 = pd.DataFrame()
m2_empty_l3 = pd.DataFrame()
m2_empty_l4 = pd.DataFrame()
m2_list = [m2_empty_l1, m2_empty_l2, m2_empty_l3, m2_empty_l4]
# A single lock shared by all worker threads protects the global counter spam_cnt
count_lock = threading.Lock()

# single_transform is not redefined here: the version from Section 3 handles
# TF-IDF as well as the modified Word2vec/Doc2vec features needed in Task 2.

class myThread(threading.Thread):

    def __init__(self, threadID, name, spam_message, words14str, method, feature_model, feature_names, scaler, clf_lin, list_index, selection_model):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.spam_message = spam_message
        self.words14str = words14str
        self.method = method
        self.feature_model = feature_model
        self.feature_names = feature_names
        self.scaler = scaler
        self.clf_lin = clf_lin
        self.list_index = list_index
        self.selection_model = selection_model

    def run(self):
        global spam_cnt
        print("Starting " + self.name)
        # craft and classify this thread's share of the spam emails
        spam_cnt_1 = m2_empty_out(self.name, self.spam_message, self.words14str, self.method,
                                  self.feature_model, self.feature_names, self.scaler, self.clf_lin,
                                  self.list_index, self.selection_model)
        # a read-modify-write on a shared global is not atomic, so guard it
        with count_lock:
            spam_cnt = spam_cnt + spam_cnt_1
        time.sleep(0.1)
        print("Exiting " + self.name)


def m2_empty_out(name, spam_message, words14str, method, feature_model, feature_names, scaler, clf_lin, list_index, selection_model):
    m2_empty_1 = pd.DataFrame()
    spam_cnt_1 = 0
    global m2_list

    for j in spam_message.message:
        choose_email = [j + words14str]
        message_14_email = pd.DataFrame(choose_email, columns=["message"])
        message_14_tf_idf = single_transform(
            message_14_email["message"], method, feature_model, feature_names, scaler, selection_model)
        message_14_tf_idf = pd.DataFrame(
            message_14_tf_idf.toarray(), columns=feature_names)
        message_14_y = [1]
        message_14_y = pd.Series(message_14_y)
        message_CData = CDataset(message_14_tf_idf, message_14_y)
        message_14_pred = clf_lin.predict(message_CData.X)

        if message_14_pred == 0:
            spam_cnt_1 = spam_cnt_1 + 1
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            m2_empty_1 = pd.concat(
                [m2_empty_1, message_14_tf_idf], ignore_index=True)

    m2_list[list_index] = pd.concat(
        [m2_list[list_index], m2_empty_1], ignore_index=True)

    return spam_cnt_1



def svm_attack(method, clf_lin, spam, words14str, feature_model, feature_names, scaler, selection_model):

    global m2_empty, spam_cnt, m2_list

    # reset the shared state so svm_attack can be called repeatedly (e.g., in Tasks 1 and 2)
    m2_empty = pd.DataFrame()
    spam_cnt = 0
    m2_list = [pd.DataFrame() for _ in range(4)]

    # split the spam emails into four chunks, one per worker thread
    spam_messages = np.array_split(spam, 4)
    print("Start processing message")
    # a fresh local list, because a Thread object can only be started once
    threads = []
    for k in range(4):
        threads.append(myThread(k + 1, "Thread-" + str(k + 1), spam_messages[k], words14str,
                                method, feature_model, feature_names, scaler, clf_lin, k, selection_model))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # gather the crafted emails that were misclassified, in thread order
    m2_empty = pd.concat(m2_list, ignore_index=True)

    print("Exiting Main Thread")
    print('White box attack with length on SVM:')
    print('Number of samples provided:', len(spam))
    print('Number of crafted sample that got misclassified:', spam_cnt)
    print('Successful rate:', spam_cnt / len(spam))

    return m2_empty

With the code section below, we crafted adversarial versions of the spam emails and fed them back to the trained classifier to see whether they could bypass it. \\
**Run the code block below:**

In [None]:
m2_empty = svm_attack('TFIDF', clf_lin, spam, words14str, feature_model, feature_names, scalar, 'NaN')

Start processing message
Starting Thread-1Starting Thread-2

Starting Thread-3Starting Thread-4

Exiting Thread-3
Exiting Thread-1
Exiting Thread-4
Exiting Thread-2
Exiting Main Thread
White box attack with length on SVM:
Number of samples provided: 481
Number of crafted sample that got misclassified: 186
Successful rate: 0.3866943866943867
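Conceptually, step 7 appends the same magic words to every spam email and counts how many of the crafted emails the classifier now labels as ham; the reported rate is that count divided by the number of spam emails (186 / 481 ≈ 0.387 above). The toy sketch below shows the idea with a made-up bag-of-words linear score instead of the trained SVM on TF-IDF features; `craft`, `spam_score`, and the weights are all hypothetical. With TF-IDF there is an extra effect the toy ignores: the appended words also shrink the weights of the original spam words, because each vector is rescaled to unit length.

```python
def craft(email, magic_words):
    """Append the magic words to the end of a spam email."""
    return email + " " + " ".join(magic_words)


def spam_score(email, weights, bias=0.0):
    """Toy bag-of-words linear score; a positive value means 'spam'."""
    return sum(weights.get(token, 0.0) for token in email.split()) + bias


# Made-up weights: positive = spam-leaning, negative = ham-leaning
weights = {"free": 1.0, "money": 1.2, "click": 0.8,
           "linguistics": -0.9, "workshop": -0.7, "syntax": -0.6}
spam_emails = ["free money click now", "click here", "free money money offer"]
magic_words = ["linguistics", "workshop", "syntax"]

crafted = [craft(e, magic_words) for e in spam_emails]
flipped = sum(1 for e in crafted if spam_score(e, weights) <= 0)
print("misclassified as ham:", flipped, "of", len(spam_emails))
print("success rate:", flipped / len(spam_emails))
```

Only the short email `click here` flips: the appended ham-leaning weights (-2.2 in total) outweigh its spam evidence, while the two longer emails stay above zero. This is a toy illustration of why a fixed set of magic words can flip some spam emails but not others.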


## **Tasks**
### **Task 1** ### 
Integrate steps 1-7 above into one function in the code block below.
This function should take only two inputs: the method used for feature extraction, and the dmax used for the PGD attack. It should return the set of magic words identified and print the success rate of step 7. \\
Hint: You can change the feature extraction method by changing the value of the "method" variable ('TFIDF', 'word2vec', or 'doc2vec').

In [1]:
def all_in_one(FE_method, PGD_dmax):
  # inject your code here:
  return magic_word

### **Task 2** ###
Using the function you wrote for Task 1, for each of the three feature extraction methods (TF-IDF, modified Word2vec, and modified Doc2vec), run it five times with dmax = 0.02, 0.04, 0.06, 0.08, and 0.1. Record the magic word attack success rate and the magic words for each run, and fill in the table below by replacing each "dmax =" entry with the actual success rate:


TF-IDF  | Word2vec | Doc2vec
-------------------|------------------|------------------
dmax = 0.02| dmax = 0.02 | dmax = 0.02
dmax = 0.04| dmax = 0.04 | dmax = 0.04
dmax = 0.06| dmax = 0.06 | dmax = 0.06
dmax = 0.08| dmax = 0.08 | dmax = 0.08
dmax = 0.1| dmax = 0.1 | dmax = 0.1





### **Task 3** ###
Draw a graph with dmax on the x-axis and the magic word attack success rate on the y-axis, and answer the questions below: \\


1.   Which feature extraction method has the single highest magic word attack success rate?
2.   Which feature extraction method do you think is better for magic word attack and why?



In [None]:
# Code for drawing graph

Your answers to the two questions

### **Task 4** ###
Do the following:
1. Select the set of magic words with the highest success rate in Task 2.
2. Train a KNN classifier on the provided dataset, using the same feature extraction method you used to obtain the selected set of magic words. You can use any parameters for the KNN classifier.
3. Pick 5 spam emails and add the magic words to the end of them. Convert them back to feature vectors and feed them to the KNN you trained. How many of the 5 spam emails got misclassified?

In [None]:
## Inject your code here