# Classifying Wikipedia Comments

***

Nathaniel Haddad - Northeastern University - 2019

CS5100 - Foundations of Artificial Intelligence

*Python 3*

# Building a classifier for personal attacks
In this section we will train a simple bag-of-words classifier for personal attacks using the [Wikipedia Talk Labels: Personal Attacks]() data set.

In [1]:
# urllib.request must be imported explicitly; `import urllib` alone does not load it
import urllib.request
from inspect import signature

import matplotlib.pyplot as plt
import nltk
import pandas as pd
import seaborn as sns
from nltk.corpus import stopwords
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import (CountVectorizer, HashingVectorizer,
                                             TfidfTransformer, TfidfVectorizer)
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import (confusion_matrix, precision_recall_curve,
                             precision_recall_fscore_support, roc_auc_score)
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.tree import DecisionTreeClassifier
# from google.colab import drive         # uncomment when running in Google Colab
# from spellchecker import SpellChecker  # uncomment to try spelling correction

In [2]:
!pip install pyspellchecker



# Dataset


In [None]:
# download annotated comments and annotations

ANNOTATED_COMMENTS_URL = 'https://ndownloader.figshare.com/files/7554634' 
ANNOTATIONS_URL = 'https://ndownloader.figshare.com/files/7554637' 

def download_file(url, fname):
    urllib.request.urlretrieve(url, fname)

download_file(ANNOTATED_COMMENTS_URL, 'attack_annotated_comments.tsv')
download_file(ANNOTATIONS_URL, 'attack_annotations.tsv')

Google Drive File Retrieval

**NOTE:** Use these cells only if you are running in Google Colab and your files are stored at the Drive path below (uncomment the `drive` import at the top as well).

In [None]:
# drive.mount('/content/drive')

In [None]:
# comments = pd.read_csv('/content/drive/My Drive/cs5100 project/attack_annotated_comments.tsv', sep = '\t', index_col = 0)
# annotations = pd.read_csv('/content/drive/My Drive/cs5100 project/attack_annotations.tsv',  sep = '\t')

Normal File Retrieval

In [None]:
comments = pd.read_csv('attack_annotated_comments.tsv', sep = '\t', index_col = 0)
annotations = pd.read_csv('attack_annotations.tsv',  sep = '\t')

In [None]:
len(annotations['rev_id'].unique())

In [None]:
# label a comment as an attack if the majority of annotators did so
labels = annotations.groupby('rev_id')['attack'].mean() > 0.5
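To make the majority-vote rule concrete, here is a minimal sketch on hypothetical toy annotations (not the real data). Because the comparison is a strict `> 0.5`, a comment whose annotators split evenly is labeled as *not* an attack.

```python
import pandas as pd

# Hypothetical toy annotations: three comments, each labeled by several annotators
toy = pd.DataFrame({
    "rev_id": [1, 1, 1, 2, 2, 3, 3],
    "attack": [1, 1, 0, 1, 0, 0, 0],
})

# Fraction of annotators who flagged each comment as an attack
attack_share = toy.groupby("rev_id")["attack"].mean()

# Strict majority: the 50/50 split on rev_id 2 is labeled as not an attack
toy_labels = attack_share > 0.5
print(toy_labels)
```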

In [None]:
# join labels and comments
comments['attack'] = labels

In [None]:
# remove newline and tab tokens
comments['comment'] = comments['comment'].apply(lambda x: x.replace("NEWLINE_TOKEN", " "))
comments['comment'] = comments['comment'].apply(lambda x: x.replace("TAB_TOKEN", " "))

# Text Preprocessing

In [None]:
comments.sample(5)

**Remove stopwords**

In [None]:
nltk.download("stopwords")
STOPWORDS = set(stopwords.words("english"))

In [None]:
def is_stopword(word):
  """
  param(s): word, a string
  returns: a boolean
  does: determines whether or not the given string is a stopword
  """
  return word in STOPWORDS

**Correct spelling**

This did not improve ROC AUC, so the code is left commented out.

In [None]:
# spell = SpellChecker()

# def correct_spelling(word):
#   """
#   param(s): word, a string
#   returns: a string
#   does: corrects the spelling of a word using the pyspellchecker package
#   """
#   # unknown() expects a list of words; passing a bare string would check each character
#   if word in spell.unknown([word]):
#     # correction() may return None when no candidate is found, so fall back to the word
#     return spell.correction(word) or word
#   return word

Parse `comments["comment"]` and clean the text.


**NOTE:** I tested each technique used in this function individually, and none increased ROC AUC. This is interesting, and is likely because the vectorizers below already perform some of these operations by default (lowercasing text and stripping punctuation). Aside from spellchecking, each text cleanup technique is fairly standard, but together they may oversimplify the problem at hand.

In [None]:
def parse_text(text):
  """
  param(s): text, a string
  returns: a string
  does: lowercases text, drops non-alphabetic tokens, corrects spelling, and removes stopwords
  """
  new_text = []
  # split text into a list of whitespace-separated tokens
  words_and_symbols = str(text).split()
  for item in words_and_symbols:
    # make item lower case
    item = item.lower()
    # keep only purely alphabetic tokens: a token containing any digit or
    # punctuation (e.g. "don't", "stupid!") is dropped entirely
    if item.isalpha():
      # correct spelling (requires correct_spelling from the cell above to be uncommented)
      word = correct_spelling(item)
      # remove stopwords
      if not is_stopword(word):
        new_text.append(word)
  return " ".join(new_text)

In [None]:
# After testing text cleaning results above, no need to use this
# comments["comment"] = comments["comment"].apply(parse_text)

In [None]:
comments.query('attack')['comment'].head(10)

# Metric Functions

Confusion Matrix

In [None]:
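def build_confusion_matrix(model, y_pred):
  """
  params: model, a fitted classifier (unused); y_pred, its predictions on the test set
  returns: nothing
  does: builds and prints a confusion matrix
  """
  # scikit-learn's signature is confusion_matrix(y_true, y_pred); passing y_pred first
  # transposes the result, so the printed matrix has rows = predicted labels and
  # columns = true labels, i.e. [[TN, FN], [FP, TP]] (the layout of all results below)
  cm = confusion_matrix(y_pred, test_comments['attack'])
  print(cm)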
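A toy check of the argument order, using hypothetical labels: scikit-learn's standard layout puts true labels on the rows, and swapping the two arguments yields exactly the transpose.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = attack, 0 = not an attack
y_true = [0, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# scikit-learn convention: rows = true labels, columns = predicted labels
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Swapping the arguments (as build_confusion_matrix does) gives the transpose
cm_swapped = confusion_matrix(y_pred, y_true)
print(cm)
print(cm_swapped)
```

Calling `cm.ravel()` on the standard layout is a convenient way to unpack the four counts by name.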

Precision-Recall

In [None]:
def plot_precision_recall(precision, recall):
  """
  params: precision, an array of precision values; recall, an array of recall values
  returns: nothing
  does: plots precision as a function of recall
  source: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
  """
  # older matplotlib versions do not support the `step` argument of fill_between
  step_kwargs = ({'step': 'post'}
                 if 'step' in signature(plt.fill_between).parameters
                 else {})

  plt.step(recall, precision, color='b', alpha=0.2, where='post')
  plt.fill_between(recall, precision, alpha=0.2, color='b', **step_kwargs)
  plt.xlabel('Recall')
  plt.ylabel('Precision')
  plt.ylim([0.0, 1.05])
  plt.xlim([0.0, 1.0])
  plt.title('2-class Precision-Recall curve')

In [None]:
def precision_recall_fscore(clf, y_pred):
  """
  params: clf, a fitted classifier; y_pred, its predictions on the test set
  returns: nothing
  does: prints support-weighted precision, recall, and F-score on the test set
        (weighted recall equals plain accuracy)
  """
  # named `scores` so it does not shadow the imported sklearn `metrics` module
  scores = precision_recall_fscore_support(y_true=test_comments['attack'], y_pred=y_pred, average='weighted')
  print('Test Precision: %.5f' % scores[0])
  print('Test Recall: %.5f' % scores[1])
  print('Test F-Score: %.5f' % scores[2])
  # a precision-recall curve needs scores, not hard predictions:
  # precision_curve, recall_curve, _ = precision_recall_curve(test_comments['attack'], clf.predict_proba(test_comments['comment'])[:, 1])
  # plot_precision_recall(precision_curve, recall_curve)
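A precision-recall curve is traced by sweeping a threshold over continuous scores, so it should be fed predicted probabilities rather than 0/1 predictions. A minimal sketch with hypothetical labels and scores; in this notebook the scores would come from `clf.predict_proba(test_comments['comment'])[:, 1]`.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and attack probabilities
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Precision and recall at each candidate score threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Average precision summarizes the whole curve as a single number
ap = average_precision_score(y_true, y_score)
print(precision, recall, thresholds)
print(round(ap, 4))
```

The returned `precision` and `recall` arrays can be passed straight to `plot_precision_recall`.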

Metrics Function

In [None]:
def get_metrics(clf):
  """
  params: clf, a fitted classifier or pipeline that supports predict and predict_proba
  returns: nothing
  does: prints out confusion matrix, precision, recall, f-score, and ROC AUC on the test set
  """
  y_pred = clf.predict(test_comments['comment'])
  build_confusion_matrix(clf, y_pred)
  precision_recall_fscore(clf, y_pred)
  # ROC AUC needs the predicted probability of the positive (attack) class
  auc = roc_auc_score(test_comments['attack'], clf.predict_proba(test_comments['comment'])[:, 1])
  print('Test ROC AUC: %.5f' % auc)

# Models

**Logistic Regression (strawman)**

In [None]:
# fit a simple text classifier

train_comments = comments.query("split=='train'")
test_comments = comments.query("split=='test'")

clf = Pipeline([
    ('vect', CountVectorizer(max_features = 10000, ngram_range = (1,2))),
    ('tfidf', TfidfTransformer(norm = 'l2')),
    ('clf', LogisticRegression()),
])

clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20280  1236]
 [  142  1520]]

Test Precision: 0.93923

Test Recall: 0.94055

Test F-Score: : 0.93396

Test ROC AUC: 0.95697

**Logistic Regression (\#2)**

Create New Training Set

In [None]:
# create a new training set made up of validation set and previous training set
train_comments = comments.query("split=='train'")
val_comments = comments.query("split=='dev'")
test_comments = comments.query("split=='test'")
train_comments = pd.concat([val_comments, train_comments])

In [None]:
clf = Pipeline([
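    # replace CountVectorizer with TfidfVectorizer
    ('vect', TfidfVectorizer(max_df=1.0, min_df=1, max_features=None, norm = 'l2')),
    # leave in the Transformer for now; note that this re-applies idf weighting
    # to features the TfidfVectorizer has already tf-idf weighted
    ('tfidf', TfidfTransformer(norm = 'l2')),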
    ('clf', LogisticRegression(n_jobs=-1)),
])

parameters = {'vect__lowercase': (True, False),
              'vect__analyzer': ('word', 'char', 'char_wb'),
              'vect__ngram_range': [(1,1), (1,2)],
              'vect__stop_words': ('english', None),
              'clf__solver': ('newton-cg', 'lbfgs')}

# now using GridSearchCV for tuning
clf = GridSearchCV(clf, parameters, cv=10, n_jobs=-1)
clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20300  1325]
 [  122  1431]]

Test Precision: 0.93667

Test Recall: 0.93757

Test F-Score: : 0.92975

Test ROC AUC: 0.96079



**Logistic Regression (\#3)**

Feature Union

In [None]:
# from lecture
vectorizerW = TfidfVectorizer(lowercase=True, analyzer='word', stop_words=None, ngram_range = (1,1), max_df=1.0, min_df=1, max_features=None, norm = 'l2')
vectorizerC = TfidfVectorizer(lowercase=True, analyzer='char', stop_words=None, ngram_range = (1,1), max_df=1.0, min_df=1, max_features=None, norm = 'l2')
# this variable will be used from now on
combined_features = FeatureUnion([('word', vectorizerW), ('char', vectorizerC)])

In [None]:
clf = Pipeline([
    # new FeatureUnion implementation
    ('features', combined_features),
    ('clf', LogisticRegression(n_jobs=-1)),
])

parameters = {'clf__solver': ('newton-cg', 'lbfgs')}

clf = GridSearchCV(clf, parameters, cv=5, n_jobs=-1)
clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20262  1152]
 [  160  1604]]
 
Test Precision: 0.94182

Test Recall: 0.94339

Test F-Score: : 0.93785

Test ROC AUC: 0.96259

**Logistic Regression (\#4)**

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html


In [None]:
clf = Pipeline([
    ('features', combined_features),
    ('clf', LogisticRegressionCV(cv=10,max_iter=100, n_jobs=-1, solver='lbfgs', random_state=12345)),
])

parameters = {'clf__fit_intercept': (True, False),
              'clf__refit': (True, False)}

clf = GridSearchCV(clf, parameters, cv=10, n_jobs=-1)
clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20189   993]
 [  233  1763]]
Test ROC AUC: 0.96438

**Multi-Layer Perceptron**

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

For the `MLPClassifier`, I chose the number of neurons in a single hidden layer by trying successive powers of two. The best result I found was with **32 neurons, although larger layers showed similar results.**

In [None]:
# try single hidden layers of 1, 2, 4, ..., 256 neurons and take the best from the printed results;
# the upper bound of 2**8 is an assumption (the original loop ran until manually interrupted)
for neuron_pow in range(9):

    print('MLP with {} neurons\n'.format(2**neuron_pow))

    clf = Pipeline([
        ('features', combined_features),
        ('clf', MLPClassifier(hidden_layer_sizes=(2**neuron_pow,), max_iter=200,
                              activation='relu', random_state=12345,
                              validation_fraction=0.1, verbose=True, early_stopping=True)),
    ])

    # parameters = {'clf__early_stopping': [True],
    #               'clf__warm_start': [True],
    #               'clf__solver': ['adam']}

    # clf = GridSearchCV(clf, parameters, cv=5, n_jobs=-1)
    clf = clf.fit(train_comments['comment'], train_comments['attack'])
    get_metrics(clf)

# note: after the loop, clf holds the last (largest) network, not the 32-neuron one

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

32 Neurons

[[20165  1040]
 [  257  1716]]
 
Test Precision: 0.94130

Test Recall: 0.94404

Test F-Score: : 0.93994

Test ROC AUC: 0.95322

**Multinomial Naive Bayes**

https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html

In [None]:
from sklearn.naive_bayes import MultinomialNB

clf = Pipeline([
    ('features', combined_features),
    ('clf', MultinomialNB()),
])

parameters = {'clf__fit_prior': (True, False)}

clf = GridSearchCV(clf, parameters, cv=10, n_jobs=-1)
clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20398  2146]
 [   24   610]]
 
Test Precision: 0.91163

Test Recall: 0.90638

Test F-Score: : 0.87939

Test ROC AUC: 0.86398

**Random Forest Classifier**

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

In [None]:
clf = Pipeline([
    ('features', combined_features),
    ('clf', RandomForestClassifier(n_estimators=150, n_jobs=-1)),
])

parameters = {'clf__bootstrap': (True, False)}

clf = GridSearchCV(clf, parameters, cv=5, n_jobs=-1)
clf = clf.fit(train_comments['comment'], train_comments['attack'])
get_metrics(clf)

In [None]:
# correctly classify nice comment
clf.predict(['Thanks for you contribution, you did a great job!'])

In [None]:
# correctly classify nasty comment
clf.predict(['People as stupid as you should not edit Wikipedia!'])

[[20392  1938]
 [   30   818]]
 
Test Precision: 0.91932

Test Recall: 0.91509

Test F-Score: : 0.89451

Test ROC AUC: 0.94202

# Results

**a. What are the text cleaning methods you tried? What are the ones you have included in the final code?**

For my project, I tried several text cleaning methods. When I first received the assignment, I did some research into text processing for natural language processing experiments. I am also working on another project for another class where word embeddings are used to derive similarities between documents. It was great that these two assignments converged at some points!

In my research, I learned about stopwords, spelling, alphabetic versus numeric characters, and how these features might impact the accuracy of a machine learning model. The following details some of the text processing methods I tried, most of which are included in the code for viewing, though some methods are commented out:

One of the first text cleaning methods I tried was removing stopwords from the `comment` column in the dataset. Stopwords are the most common words in a language (for example "the", "is", and "and"); they carry little meaning on their own, although removing some of them (such as "not") can change the meaning of a phrase. I used the Natural Language Toolkit (nltk), which provides lists of stopwords, and removed them from each instance in `comment` using the Pandas method `apply`. Overall, I decided not to include my stopword filtering (`is_stopword`) in my final code because it did not improve the ROC AUC score of my models. Later in my experiments I also used term frequency-inverse document frequency (tf-idf) vectorizers, which down-weight words that appear in many documents and remove stopwords outright when `stop_words='english'` is set. I detail this later in my findings.

Another text cleaning method I tried was correcting the spelling of words in the `comment` attribute of the dataset. In my research, I learned that spelling correction can cut both ways in text mining. It helps when misspelled words are mapped to their correct forms, but it hurts when valid words that are missing from the spellchecker's dictionary (slang, names, or deliberately mangled insults) are "corrected" into something else. I thought that correcting spelling might normalize some of the comments, so to speak. After implementing a spellchecker function with `pyspellchecker`, I saw no improvement in the ROC AUC score, which surprised me, so I decided not to include the spellchecker in the final code.

While implementing the two functions above, I built a parser function (`parse_text`) that applies a sequence of transformations to each comment. In addition to stopword removal and spelling correction, it lowercases all text and discards any token containing numbers or other non-alphabetic characters.

Again, even when I applied each of these steps independently, I saw no increase in ROC AUC score. Why? Most likely because the strawman's `CountVectorizer` already lowercases text and strips punctuation during tokenization by default, and the `TfidfTransformer` down-weights very common words such as stopwords. *It pays to read the fine manual!*

Here's what I did include: I read more about tf-idf vectorizers and transformers and decided to replace the strawman `CountVectorizer` with a `TfidfVectorizer`, whose options I could then tune with a hyperparameter search. I also followed the professor's advice from the lecture and included a `FeatureUnion` of two `TfidfVectorizer`s: one for words, another for characters. With plain `LogisticRegression`, this raised test ROC AUC from 0.95697 (strawman) to 0.96259.
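A `FeatureUnion` fits each of its transformers on the same input and stacks their output columns side by side. A minimal sketch with hypothetical comments, mirroring the word and character vectorizers used above (the character vectorizer keeps the default `ngram_range=(1, 1)`, so its features are single characters):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

docs = ["you are stupid", "thanks for your help"]  # hypothetical comments

word_vec = TfidfVectorizer(analyzer="word")
char_vec = TfidfVectorizer(analyzer="char")  # ngram_range=(1, 1): single characters

union = FeatureUnion([("word", word_vec), ("char", char_vec)])
X = union.fit_transform(docs)

# The union's width is the word vocabulary size plus the character vocabulary size
n_word = len(union.transformer_list[0][1].vocabulary_)
n_char = len(union.transformer_list[1][1].vocabulary_)
print(X.shape, n_word, n_char)
```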

***

**b. What are the features you considered using? What features did you use in the final code?**

For my project, I considered using several different types of feature extraction methods. First, I looked into fine-tuning the given strawman feature extraction methods, `CountVectorizer` and `TfidfTransformer`. Doing so, I saw an improvement in accuracy, albeit small. The results can be observed in my code above!

Next, per the professor's notes in the project assignment handout, I used a `TfidfVectorizer`, which is equivalent to a `CountVectorizer` followed by a `TfidfTransformer`. I also used `GridSearchCV` to fine-tune this feature extraction method.
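The equivalence can be checked directly on a few hypothetical comments. The same sketch also shows that stacking an extra `TfidfTransformer` after a `TfidfVectorizer`, as in Logistic Regression (\#2), is not a no-op: it applies idf weighting a second time and changes the features.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

docs = ["you are stupid", "you did a great job", "great job, thanks"]  # hypothetical

one_step = TfidfVectorizer().fit_transform(docs).toarray()
two_step = make_pipeline(CountVectorizer(), TfidfTransformer()).fit_transform(docs).toarray()

# Same tf-idf matrix either way (default parameters)
print(np.allclose(one_step, two_step))

# An extra TfidfTransformer re-weights values that are already tf-idf weighted
double = make_pipeline(TfidfVectorizer(), TfidfTransformer()).fit_transform(docs).toarray()
print(np.allclose(one_step, double))
```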

Finally, I added a `FeatureUnion` of two `TfidfVectorizer`s, as detailed in part a. Ultimately, I used the resulting `combined_features` object as the feature extractor for all of my remaining models.

I also considered using other features besides `comment`, such as `year`. Using `year` produced an ROC AUC of 0.95585, slightly worse than the strawman code. I also tried each of the other features in combination with `comment`; `logged_in`, `ns`, and `sample` all lowered the ROC AUC score. I have since removed these features from the code.
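The code that combined these columns with `comment` is no longer in the notebook. As one hypothetical way to do it, scikit-learn's `ColumnTransformer` can vectorize the text column and one-hot encode a metadata column such as `year` in a single pipeline. The rows below are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows shaped like the comments table (comment text plus a year column)
df = pd.DataFrame({
    "comment": ["you are stupid", "thanks for the edit", "stupid edit",
                "great job", "you idiot", "nice work"],
    "year": [2015, 2015, 2016, 2016, 2015, 2016],
})
y = [1, 0, 1, 0, 1, 0]

features = ColumnTransformer([
    # A single column name passes a 1-D series of strings to the vectorizer
    ("text", TfidfVectorizer(), "comment"),
    # A list of column names passes a 2-D frame, which OneHotEncoder expects
    ("year", OneHotEncoder(handle_unknown="ignore"), ["year"]),
])

model = Pipeline([("features", features), ("clf", LogisticRegression())])
model.fit(df, y)
proba = model.predict_proba(df)[:, 1]
print(proba)
```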

I believe these features might become useful with more powerful neural network tools such as TensorFlow and Keras, or with an external dataset of current events by year to give `year` some context. For now, though, the other features do not have a significant positive effect on ROC AUC: on their own, they carry too little signal to be useful out of the box.

***

**c. What optimizations did you add in your code, if any?**

In my answers to these questions, I have detailed many optimizations that I made to the strawman code that had a significant effect on the metrics of my models. 

One of the most important optimizations was combining the training and validation (`dev`) sets into a new, bigger training set, which gave the models more examples to learn from. I could still create validation folds using scikit-learn's built-in cross-validation (for example, inside `GridSearchCV`), so combining the two sets was not a problem.

Other optimizations I made include hyper-parameter tuning (discussed below), thoughtful analysis of the dataset to include specific feature extraction techniques (discussed above), and using functions to prevent code replication. 

My goal for this project was to learn the details of scikit-learn, and in doing so, learn how to clean data and optimize machine learning models to accurately predict the labels of unseen data. I believe I was able to do so!

***

**d. What are the ML methods you tried out, and what were your best results with each method? Which was the best ML method you saw before tuning hyperparameters?**

For this project, we were asked to try out three different machine learning methods. To truly understand the strawman code, however, I trained far more models than the professor asked for, so that I could better understand the effects of parameterization and feature extraction on the existing code.

For my first implementation, I kept the strawman's `LogisticRegression` model and pipeline and made several alterations: I combined the validation and training sets into a new, larger training set that could then be split into validation folds with scikit-learn functions. One of the first functions I tried was `GridSearchCV`, and training with it *took a very long time*, because every parameter combination is refit once per cross-validation fold. With that in mind, I read the documentation for each model carefully to see which parameters *really* needed to be passed into `GridSearchCV`.
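The cost is easy to estimate before launching a search: the number of fits is the number of parameter combinations times the number of folds, plus one final refit on all the training data (the `GridSearchCV` default `refit=True`). For the grid used in Logistic Regression (\#2) with `cv=10`:

```python
from sklearn.model_selection import ParameterGrid

# The grid searched for Logistic Regression (#2)
parameters = {'vect__lowercase': (True, False),
              'vect__analyzer': ('word', 'char', 'char_wb'),
              'vect__ngram_range': [(1, 1), (1, 2)],
              'vect__stop_words': ('english', None),
              'clf__solver': ('newton-cg', 'lbfgs')}

n_candidates = len(ParameterGrid(parameters))  # 2 * 3 * 2 * 2 * 2
n_folds = 10
# Each candidate is fit once per fold, plus one final refit on all training data
n_fits = n_candidates * n_folds + 1
print(n_candidates, n_fits)  # 48 481
```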

Another model I tested was `LogisticRegression` with a `FeatureUnion` of two `TfidfVectorizer`s, as described in part a. This model also improved ROC AUC over the strawman code. Finally, I implemented `LogisticRegressionCV`, a logistic regression model that chooses its regularization strength `C` by internal cross-validation. This model was even better than the previous two!
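A minimal sketch of what `LogisticRegressionCV` cross-validates, using hypothetical synthetic data in place of the tf-idf features. Note that the notebook additionally wraps `LogisticRegressionCV(cv=10)` in `GridSearchCV(cv=10)`, which nests one cross-validation inside another and multiplies training time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Hypothetical synthetic data standing in for the tf-idf features
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Try each inverse regularization strength in Cs with internal 5-fold
# cross-validation, then refit on all the data using the best one
Cs = [0.01, 0.1, 1.0, 10.0]
lrcv = LogisticRegressionCV(Cs=Cs, cv=5, random_state=0).fit(X, y)
print(lrcv.C_)  # the selected C (one entry for a binary problem)
```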

*For my three new machine learning models, I tested `MLPClassifier`, `MultinomialNB` (multinomial Naive Bayes), and `RandomForestClassifier`.*

I have been working with neural networks in another class, so I was excited to see how scikit-learn's multi-layer perceptron would perform. To my surprise, it was not as fast as I expected. I have been using TensorFlow and Keras and was struck by how much slower the scikit-learn implementation was (scikit-learn offers no GPU support for `MLPClassifier`). Nonetheless, I implemented the `MLPClassifier` after I had worked out my text mining methods and `GridSearchCV`, and I was able to train the network to a decent test ROC AUC of 0.95322.

As soon as I implemented the `MultinomialNB` classifier and saw its ROC AUC score (0.86398), I knew it was a poor fit for this dataset. The model has few parameters, and even after using `GridSearchCV`, the score did not improve significantly.

Finally, my `RandomForestClassifier` performed reasonably well on this dataset, reaching a test ROC AUC of 0.94202 after fine-tuning with `GridSearchCV`: below the strawman's 0.95697, but well above Naive Bayes.

***

**e. What hyper-parameter tuning did you do, and by how many percentage points did your accuracy go up?**

I used several techniques for hyper-parameter tuning. The most important was to read the fine manual for every model: the documentation was critical in deciding which hyper-parameters to search for each one. For example, the scikit-learn documentation notes that for `LogisticRegression`, `liblinear` is a good solver for small datasets while `sag` and `saga` are faster on large ones, and that for `MLPClassifier`, the default `adam` solver works well on relatively large datasets while `lbfgs` can converge faster on small ones. Notes like these let me drop options that were unlikely to help on a dataset of this size. After handpicking candidate values, I used `GridSearchCV` to tune the remaining hyper-parameters, as discussed above.

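On the test set, the tuned `LogisticRegressionCV` model (part g) improved accuracy over the strawman from 0.94055 to 0.94685 (about 0.6 percentage points; accuracy equals the weighted recall reported above) and ROC AUC from 0.95697 to 0.96361 (about 0.7 points).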

***

**f. What did you learn from the different metrics? Did you try cross-validation?**

I used a `get_metrics` function to keep track of all of my metrics for each experiment. As the project required, I implemented a confusion matrix, precision, recall, f-score, and ROC AUC. 

From these metrics, I learned how accurately each model predicts each class, `attack == 1` and `attack == 0`. It was interesting to see how differently the models trade off errors between the two classes: for example, `MultinomialNB` produced only 24 false positives but missed 2,146 attacks, while `LogisticRegressionCV` missed far fewer attacks (1,014) at the cost of more false positives (218). As the assignment specified, I used ROC AUC as the primary metric and focused on increasing it.

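I used cross-validation in several models. First, every `GridSearchCV` search evaluated each parameter combination with 5- or 10-fold cross-validation. Second, I used a `LogisticRegressionCV` model, which cross-validates its regularization strength internally. For my `MLPClassifier`, I used early stopping on a held-out validation split (`validation_fraction=0.1`) rather than full cross-validation.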

I think it is important to note that I combined the `dev` set from the Wikipedia dataset with the `train` set, and then created validation sets from that total, per the professor's advice during the Tuesday discussions.
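Since ROC AUC was the target metric, one option the notebook does not use is to make `GridSearchCV` rank candidates by ROC AUC as well; its default scoring for classifiers is accuracy. A minimal sketch on hypothetical toy comments:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy comments (1 = attack)
texts = ["you are stupid", "you idiot", "stupid edit", "shut up idiot",
         "thanks for the edit", "great job", "nice work", "thanks a lot"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipe = Pipeline([("vect", TfidfVectorizer()), ("clf", LogisticRegression())])
grid = {"clf__C": [0.1, 1.0, 10.0]}

# scoring="roc_auc" ranks candidates by cross-validated ROC AUC instead of accuracy
search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=2).fit(texts, labels)
print(search.best_params_, search.best_score_)
```

On the real data, the same `scoring="roc_auc"` argument could be added to any of the `GridSearchCV` calls above; with heavily imbalanced classes, accuracy and ROC AUC can favor different settings.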

***

**g. What are your best final Result Metrics? By how much is it better than the strawman figure? Which model gave you this performance?**

My final result metrics on the test set are below. Precision, recall, and F-score are support-weighted averages over the two classes. The confusion-matrix counts come from the printed matrices, which `build_confusion_matrix` lays out with rows as predicted labels and columns as true labels (`[[TN, FN], [FP, TP]]`).

| Model | TN | FN | FP | TP | Precision | Recall | F-Score | ROC AUC |
|---|---|---|---|---|---|---|---|---|
| `LogisticRegression` (strawman) | 20280 | 1236 | 142 | 1520 | 0.93923 | 0.94055 | 0.93396 | 0.95697 |
| `LogisticRegressionCV` | 20204 | 1014 | 218 | 1742 | 0.94467 | 0.94685 | 0.94287 | 0.96361 |
| `MLPClassifier` (32 neurons) | 20165 | 1040 | 257 | 1716 | 0.94130 | 0.94404 | 0.93994 | 0.95322 |
| `MultinomialNB` | 20398 | 2146 | 24 | 610 | 0.91163 | 0.90638 | 0.87939 | 0.86398 |
| `RandomForestClassifier` | 20392 | 1938 | 30 | 818 | 0.91932 | 0.91509 | 0.89451 | 0.94202 |

The best model was `LogisticRegressionCV`, whose test ROC AUC of 0.96361 is about 0.66 percentage points above the strawman's 0.95697.
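As a sanity check on how the reported numbers relate, the weighted precision, recall, and F-score for `LogisticRegressionCV` can be recomputed from its four confusion-matrix counts (printed above as `[[20204 1014] [218 1742]]`, rows = predicted, columns = true). This sketch rebuilds label arrays that reproduce those counts; it also confirms that weighted recall equals plain accuracy.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# LogisticRegressionCV counts: TN, FN, FP, TP
tn, fn, fp, tp = 20204, 1014, 218, 1742

# Rebuild label arrays that reproduce these counts
y_true = np.concatenate([np.zeros(tn), np.ones(fn), np.zeros(fp), np.ones(tp)])
y_pred = np.concatenate([np.zeros(tn), np.zeros(fn), np.ones(fp), np.ones(tp)])

# average='weighted' averages per-class scores weighted by class support,
# so weighted recall reduces to (TN + TP) / N, i.e. accuracy
precision, recall, fscore, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
accuracy = (tn + tp) / (tn + fn + fp + tp)
print(round(precision, 5), round(recall, 5), round(fscore, 5), round(accuracy, 5))
```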

***

**h. What is the most interesting thing you learned from doing the report?**

The most interesting thing I learned about was text processing methods.

In a close second, I think scikit-learn is worth mentioning. I have used the package before, but not at the level of detail that we used for this project. It was exciting to learn how everything works, and I feel that I will be able to accomplish a lot with this package. I am learning TensorFlow and Keras in another class at the moment, so it is great to learn more about a similar package.

***

**i. What was the hardest thing to do?**

The hardest thing to do was waiting for the models to train. 

I consider myself a very patient person and had no trouble waiting for the models to complete training. However, I realized that unlike in previous classes, where I could sit and watch a program compile in a short amount of time, with machine learning doing that is a waste of time.

The real challenge, then, was managing my time efficiently. I realized that training could run in the background while I worked on other tasks: writing documentation, answering some of these questions, or doing homework for other classes. In the beginning, I spent a lot of time waiting for models to train and treated that time as a break. This was an important lesson, and I am glad that I picked up on it early enough.