## Initial data inspection
Load in test and training data files. The data used for this analysis can be found [here](https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Druglib.com%29).

Optional: If working in Google Colab, use drive.
mount() so that you can import files from Google Drive into your code

In [None]:
# Optional data import from Google Drive
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd

train_file = '/content/drive/My Drive/drugLib_raw/drugLibTrain_raw.tsv'
test_file = '/content/drive/My Drive/drugLib_raw/drugLibTest_raw.tsv'

train_df = pd.read_csv(train_file,sep='\t')

This dataset contains information describing the drug information, the condition that it being treated and the patient reviews. 

The aim is to predict the `effectiveness` from this dataset. 

The patient review data is divided into three columns: `benefitsReview`, `sideEffectsReview` and `commentsReview`. The text will be mined from these three columns to try and predict drug `effectiveness`.

A more comprehensive description of the data can be found [here](https://archive.ics.uci.edu/ml/datasets/Drug+Review+Dataset+%28Druglib.com%29)



In [None]:
train_df.head()

Unnamed: 0.1,Unnamed: 0,urlDrugName,rating,effectiveness,sideEffects,condition,benefitsReview,sideEffectsReview,commentsReview
0,2202,enalapril,4,Highly Effective,Mild Side Effects,management of congestive heart failure,slowed the progression of left ventricular dys...,"cough, hypotension , proteinuria, impotence , ...","monitor blood pressure , weight and asses for ..."
1,3117,ortho-tri-cyclen,1,Highly Effective,Severe Side Effects,birth prevention,Although this type of birth control has more c...,"Heavy Cycle, Cramps, Hot Flashes, Fatigue, Lon...","I Hate This Birth Control, I Would Not Suggest..."
2,1146,ponstel,10,Highly Effective,No Side Effects,menstrual cramps,I was used to having cramps so badly that they...,Heavier bleeding and clotting than normal.,I took 2 pills at the onset of my menstrual cr...
3,3947,prilosec,3,Marginally Effective,Mild Side Effects,acid reflux,The acid reflux went away for a few months aft...,"Constipation, dry mouth and some mild dizzines...",I was given Prilosec prescription at a dose of...
4,1951,lyrica,2,Marginally Effective,Severe Side Effects,fibromyalgia,I think that the Lyrica was starting to help w...,I felt extremely drugged and dopey. Could not...,See above


## Data preprocessing

The data needs to be pre-processed before inputing it into a model.
A model will perform better if the input data consist of features that have a significant impact on what you are trying to predict and the amount of noise is minimized (data that is deemed insignificant). Several techniques are used here in this analysis exercise:

1.   Data cleansing
2.   Lemmatization
3.   Removal of stop words
4.   Using term frequency-inverse document frequency (TF-IDF)

### 1. Token Normalisation

The train and test data is currently in tab delimited format and will be converted into Pandas Dataframes. 

An additional column `combinedReview` has been added which contains all the review data from the 3 columns, concatenated.

Another additional column `label` has been included in these Dataframes that assigns classification labels `effectiveness` as integer values so that they can be read later by the model.

The `combinedReview` data will be cleaned to remove any special (invalid) characters, multiple spaces, numbers, escape characters and any words that are just 1 character long. The review text will also be case-folded (converted all to lower case) so that words that are spelled the same will be grouped together regardless of their upper/lower casing.

### 2. Case Folding
All words are converted to lower case so that words that are spelt the same can be matched. It is reasonable to assume that variations in casings will not impact the sentinment of the reviews.

### 3. Lemmatization

Lemmatization takes words and reduces them down to its base form i.e. its lemma. This helps to group together words that are similar and can be considered equivalent for the purposes of input features for a model. For example, the lemma of  `have` and `had` is `have`, so these words will be grouped together and treated the same.

### 4. Stemming
Porter's Algorithm is used for Stemming which is a more crude technique for grouping words that can be considered equivalent that were not already reduced by lemmatisation, for example the words "compressed" and "compression" have different lemmas but with stemming they can all be reduced to the same base, "compress". Plurals are also singularized if not already done so in lemmatization.

### 5. POS Tagging
POS tags are part of speech tags. Only nouns, adjectives, adverbs and adverbs are included for model inputs and all other words belonging to other POS types are considered less significant and are discarded (e.g. articles, conjunctions).


### 5. Removal of stop words
Stop words, or function words, are the most common words in a language. They typically do not provide a lot of information for text mining and are often excluded in text classification e.g. this, the, a, of. However, a lot of negation words are included in the stop words which can have great significant in sentiment analysis, for example, the sentiment of the phrases `very effective` and `not very effective` are contrasting, but the words `not` and `very` are considered stop words. So a customised list of selected stop words are excluded from the reviews to reduce the noise for the model inputs. 

In [None]:
import numpy as np

labels_dict = {}
for count,label in enumerate(train_df["effectiveness"].unique()):
  labels_dict[label] = count+1

def build_dataframe(filepath):
    # Loading the training data into a Pandas Dataframe
    df = pd.read_csv(filepath, sep='\t')
    # Creating new columns for cleaned data
    df["benefitsReview_cleaned"] = np.nan
    df["sideEffectsReview_cleaned"] = np.nan
    df["commentsReview_cleaned"] = np.nan

    columns_to_clean = {"benefitsReview": "benefitsReview_cleaned",
                        "sideEffectsReview": "sideEffectsReview_cleaned",
                        "commentsReview": "commentsReview_cleaned"}


    for row in df.itertuples():
        for raw_column, cleaned_column in columns_to_clean.items():
            drug_review = df.loc[row.Index, raw_column]
            # If the row is empty, no need to process, continue on processing the next line
            if pd.isnull(drug_review):
                continue

            drug_review = clean_review_text(drug_review)
            drug_review = ' '.join(drug_review)

            # Update the 'clean column' with the cleaned drug_review
            df.loc[row.Index, cleaned_column] = drug_review

    # Concatenating 3 columns of unstructured, descriptive customer review data into a single column in the Dataframe
    df["combinedReview"] = df["benefitsReview_cleaned"].str.cat(df["sideEffectsReview_cleaned"],
                                                                             sep=" ").str.cat(df["commentsReview_cleaned"], sep=" ")
    return df

# Function used to clean the test and training drug review data and returns a tokenized list of the words in their lemma form(root form)
def clean_review_text(drug_review):
    # Remove special characters
    drug_review = re.sub('\W', ' ', drug_review)
    # Remove underscores
    drug_review = re.sub('_', '', drug_review)
    # Remove single characters
    drug_review = re.sub(r'\s+[a-zA-Z]\s+', ' ', drug_review)
    # Remove all numbers
    drug_review = re.sub("\d+", "", drug_review)
    # Remove single characters from the start
    drug_review = re.sub(r'\^[a-zA-Z]\s+', ' ', drug_review)
    # Substituting multiple spaces with single space
    drug_review = re.sub(r'\s+', ' ', drug_review, flags=re.I)
    # Removing prefixed 'b'
    drug_review = re.sub(r'^b\s+', '', drug_review)
    # Remove any newline escape characters
    drug_review = re.sub("\n", "", drug_review)
    # Converting to Lowercase
    drug_review = drug_review.lower()
    # Convert all instances of side effect, plural and singular form, to one feature
    if 'side effect' in drug_review:
        # Feature engineer 'side effect' as one feature as this is very significant feature for sentiment analysis
        drug_review=drug_review.replace('side effect', 'side_effect')
    if 'side affect' in drug_review:
        drug_review=drug_review.replace('side affect','side_effect')  # correct frequent misspellings of side effect as this is an important input feature
    drug_review = drug_review.split()
    # Singularize all plurals to singular words
    drug_review = [singularize(word) for word in drug_review]
    # Lemmatization: Replace each word in each text file with its lemma form (root form) and convert to present tense
    drug_review = [WordNetLemmatizer().lemmatize(word,'v') for word in drug_review]
    # Stemming: Use Porter's algorithm to stem words
    ps = PorterStemmer()
    drug_review = [ps.stem(word) for word in drug_review]
    # Remove a subset of stop words that are useful for this sentiment analysis
    # If the entire term is a stopword, then do not include in the target vocab/lexicon
    # The stopwords were initially added for the purpose of finding n-grams with stopwords as the stopwords would
    # have value within an n-gram e.g. "very effective" and "much more" contain stopwords but are valuable
    # when part of n-grams
    all_stopwords = stopwords.words('english')
    custom_stopwords = []
    exclusions = ['no','not','down','more','few','most','other','some','no','not','only','same','so','very','aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't",'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
    for stopword in all_stopwords:
        if stopword not in exclusions:
            custom_stopwords.append(stopword)

    clean_drug_review = []

    postags = nltk.pos_tag(drug_review)
    for tag in postags:
        word = tag[0]
        pos = tag[1]
        if (len(word) > 2 and (pos[:2] == "NN" or pos[0] == "V" or pos[:2] == "RB" or pos[0] == "J")) or word in exclusions:
            clean_drug_review.append(word)

    clean_drug_review = ' '.join(clean_drug_review)
    clean_drug_review = clean_drug_review.replace('have have','have')
    clean_drug_review = clean_drug_review.split()
    return clean_drug_review

import re
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.stem import PorterStemmer

# Function used to clean the test and training drug review data and returns a tokenized list of the words in their lemma form(root form)
def clean_review_text(drug_review):
    # Remove special characters
    drug_review = re.sub('\W', ' ', drug_review)
    # Remove underscores
    drug_review = re.sub('_', '', drug_review)
    # Remove single characters
    drug_review = re.sub(r'\s+[a-zA-Z]\s+', ' ', drug_review)
    # Remove all numbers
    drug_review = re.sub("\d+", "", drug_review)
    # Remove single characters from the start
    drug_review = re.sub(r'\^[a-zA-Z]\s+', ' ', drug_review)
    # Substituting multiple spaces with single space
    drug_review = re.sub(r'\s+', ' ', drug_review, flags=re.I)
    # Removing prefixed 'b'
    drug_review = re.sub(r'^b\s+', '', drug_review)
    # Remove any newline escape characters
    drug_review = re.sub("\n", "", drug_review)
    # Converting to Lowercase
    drug_review = drug_review.lower()
    # Convert all instances of side effect, plural and singular form, to one feature
    if 'side effect' in drug_review:
        # Feature engineer 'side effect' as one feature as this is very significant feature for sentiment analysis
        drug_review=drug_review.replace('side effect', 'side_effect')
    if 'side affect' in drug_review:
        drug_review=drug_review.replace('side affect','side_effect')  # correct frequent misspellings of side effect as this is an important input feature
    drug_review = drug_review.split()
    # Lemmatization: Replace each word in each text file with its lemma form (root form) and convert to present tense
    drug_review = [WordNetLemmatizer().lemmatize(word,'v') for word in drug_review]
    # Stemming: Use Porter's algorithm to stem words
    ps = PorterStemmer()
    drug_review = [ps.stem(word) for word in drug_review]
    # Remove a subset of stop words that are useful for this sentiment analysis
    # If the entire term is a stopword, then do not include in the target vocab/lexicon
    # The stopwords were initially added for the purpose of finding n-grams with stopwords as the stopwords would
    # have value within an n-gram e.g. "very effective" and "much more" contain stopwords but are valuable
    # when part of n-grams
    all_stopwords = stopwords.words('english')
    custom_stopwords = []
    exclusions = ['no','not','down','more','few','most','other','some','no','not','only','same','so','very','aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't",'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"]
    for stopword in all_stopwords:
        if stopword not in exclusions:
            custom_stopwords.append(stopword)

    clean_drug_review = []

    postags = nltk.pos_tag(drug_review)
    for tag in postags:
        word = tag[0]
        pos = tag[1]
        if (len(word) > 2 and (pos[:2] == "NN" or pos[0] == "V" or pos[:2] == "RB" or pos[0] == "J")) or word in exclusions:
            clean_drug_review.append(word)

    clean_drug_review = ' '.join(clean_drug_review)
    clean_drug_review = clean_drug_review.replace('have have','have')
    clean_drug_review = clean_drug_review.split()
    return clean_drug_review


train_df = build_dataframe(train_file)
test_df = build_dataframe(test_file)

# Remove any duplicate lines in the train dataset and also any rows with no patient review data (no input for the model)
train_df.drop_duplicates()
train_df.dropna(subset=["combinedReview"],inplace=True)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


Viewing the updated, cleaned Dataframe for the training dataset which includes the new `combinedReview` column

In [None]:
train_df.head()

Unnamed: 0.1,Unnamed: 0,urlDrugName,rating,effectiveness,sideEffects,condition,benefitsReview,sideEffectsReview,commentsReview,benefitsReview_cleaned,sideEffectsReview_cleaned,commentsReview_cleaned,combinedReview
0,2202,enalapril,4,Highly Effective,Mild Side Effects,management of congestive heart failure,slowed the progression of left ventricular dys...,"cough, hypotension , proteinuria, impotence , ...","monitor blood pressure , weight and asses for ...",slow progress leav ventricular dysfunct overt ...,cough hypotens proteinuria impot renal failur ...,monitor blood pressur weight ass resolut fluid,slow progress leav ventricular dysfunct overt ...
1,3117,ortho-tri-cyclen,1,Highly Effective,Severe Side Effects,birth prevention,Although this type of birth control has more c...,"Heavy Cycle, Cramps, Hot Flashes, Fatigue, Lon...","I Hate This Birth Control, I Would Not Suggest...",thi type birth control have more con help cram...,heavi cycl cramp hot flash fatigu long last cy...,hate thi birth control not suggest thi anyon,thi type birth control have more con help cram...
2,1146,ponstel,10,Highly Effective,No Side Effects,menstrual cramps,I was used to having cramps so badly that they...,Heavier bleeding and clotting than normal.,I took 2 pills at the onset of my menstrual cr...,use have cramp so leav ball bed least day pons...,heavier bleed clot normal,take pill onset menstrual cramp then everi hou...,use have cramp so leav ball bed least day pons...
3,3947,prilosec,3,Marginally Effective,Mild Side Effects,acid reflux,The acid reflux went away for a few months aft...,"Constipation, dry mouth and some mild dizzines...",I was given Prilosec prescription at a dose of...,acid reflux away few month just few day drug h...,constip dri mouth some mild dizzi away medic s...,give prilosec prescript dose day medic take on...,acid reflux away few month just few day drug h...
4,1951,lyrica,2,Marginally Effective,Severe Side Effects,fibromyalgia,I think that the Lyrica was starting to help w...,I felt extremely drugged and dopey. Could not...,See above,think lyrica start help pain side_effect just ...,felt extrem drug dopey not drive while thi med...,see abov,think lyrica start help pain side_effect just ...


### 6. Using term frequency-inverse document frequency (TF-IDF)
The reviews contain a very large range of words and it would take a lot of computational power to include every single unique word when training a model. We can use term frequency-inverse document frequency (TF-IDF) which is a statistic that measures how important words are to a corpus (the set of all reviews). TF-IDF depends on how many times a word appears in a single document and also how many different documents that word appears in.

Higher TF-IDF values are given to words that appear  frequently within a single document *AND* if that word appears in a smaller number of documents. This means that if these words are seen in a document, it will be easier to predict what classification that document belongs to, based on that word. Using intuition, a word like `the` would have a low TF-IDF because it would probably appear in all documents. A word like `headache` may have a high TF-IDF because it seems like a word that would appear in a smaller number of reviews, and intuitively it would seem that this word highly suggests that the drug review is negative, for example. A word that is completely unique and only appears once in all the reviews would have a low TF-IDF because it appears infrequently in a single document (an example of this could be a misspelled word, like `headahce`.



Create the lexicon by calculating Term Frequency - Infrequent Document Frequency (TF-IDF). We will save the terms with the top 600 TF-IDF scores as the vocabulary chosen as input features for this model.

### 7. Using n-grams

Tri-grams and four-grams are phrases made up of three or four words. These are used instead of individual words as input features to the model as they provide more information about sentiment, e.g. an input feature `not longer feel symptoms` or `experienced slight headache` expresses more sentiment accurately than the single words `symptoms` and `headache`.

### 8. Named Entities
As one would expect from drug reviews, the text data consists of both descriptive information on the user's experience and also information relating to how the medicine was taken, which is not very informative for sentiment analysis. This includes information such as what time of day medicine was taken, if the medicine was taken for a few months, or if one pill was taken in the morning etc. A custom blacklist of words was created to exclude any terms with such words, such as: day, hour, once, twice, daily, take and pill.

### 9. Feature construction

It was found that the term `side effect` was common amongst high scoring TF-IDF terms. Thus these words were joined together as a single feature by clustering the two words `side effect` into one word, `side_effect`. Another predominant phrase that was clustered was `have had`. So all the terms `have`, `have had` and `had` all reduce down to the word `have` as this simplifies the input features whilst having no impact on sentiment.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_lexicon(train_df,size=400):
    # Engineer two features side effect as one grouped feature
    # Calculate Term Frequency-Infrequent document frequency (TF-IDF) for 1-3 ngrams
    # (groups of 1 word, groups of 2 words and groups of 3 words)
    tv = TfidfVectorizer(ngram_range = (3,4),
                         sublinear_tf = True,
                         max_features = int(size*2))
    train_tv = tv.fit_transform(train_df['combinedReview'])
    vocab = tv.get_feature_names()
    dist = np.sum(train_tv, axis=0)
    vocab_tfidf_df = pd.DataFrame(dist,columns = vocab)
    sorted_tfidf_df = vocab_tfidf_df.sort_values(vocab_tfidf_df.first_valid_index(),axis=1,ascending=False)
    i = 0
    lexicon = []
    custom_exclusions = [ 'day', 'hour','week', 'once', 'twice', 'daily', 'morning','year','month', 'take', 'medication','pill']
    for (word, score) in sorted_tfidf_df.iteritems():
        exclude = False
        for exclusion in custom_exclusions:
            if exclusion in word:
                exclude = True
        if not exclude:
            lexicon.append((word, score.values[0]))


    target_lexicon = []

    for term in lexicon:
        word = term[0]
        if len(word) == 2:
            target_lexicon.append(term)
        else:
            for term1 in lexicon:
                word1 = term1[0]
                if word!=word1 and word in word1:
                    continue
            target_lexicon.append(term)

    target_lexicon = target_lexicon[:size]

    return target_lexicon

lexicon_size=200
lexicon_tfidf = tfidf_lexicon(train_df,size=lexicon_size)
lexicon = [l[0] for l in lexicon_tfidf]
lexicon_df_demo = pd.DataFrame(lexicon_tfidf,columns=['Term','TF-IDF_Score'])
print("Top 20 terms in the review data, ranked by TF-IDF scores")
lexicon_df_demo.head(20)


Top 20 terms in the review data, ranked by TF-IDF scores


Unnamed: 0,Term,TF-IDF_Score
0,have no side_effect,32.484035
1,not experi ani,29.88738
2,there no side_effect,29.035621
3,not have ani,26.90545
4,lower blood pressur,24.50982
5,no longer have,21.030435
6,high blood pressur,20.466402
7,thi drug have,20.066177
8,have not have,19.148428
9,have ani side_effect,18.759807


Building the classifications which maps distinct classes from the dataset to integers

In [None]:
def build_classifications(df):
    classification_names = list(df.effectiveness.unique())
    classifications = {}
    id = 1
    for classification in classification_names:
        classifications[classification] = id
        id += 1
    return classifications

classifications = build_classifications(train_df)
print(classifications)

{'Highly Effective': 1, 'Marginally Effective': 2, 'Ineffective': 3, 'Considerably Effective': 4, 'Moderately Effective': 5}


## Text to Vector

The reviews now need to be converted into a format so that a machine learning model can read it.

The final feature set is converted into a vector representing the frequency of words from the targeted vocabulary that are present in each review. This vector representation is called a bag of words model because it only retains information of the frequency of terms, but does not retain its order. Since the lexicon is 200 words long, the vector representation for each review is also 200 numbers long. Taking the target vocabulary that is printed in the code block above, the first 3 vocab words are day, effect and side.
If the review text is `I experience this side effect on a day to day basis` then the vector representation would be [2,1,2] because `day` appears twice and `side` and `effect` appear once only.

*   `X_train` contains the vector representation of the reviews from the training dataset. The model will use this dataset to train.
*   `y_train` represents the classification for that review. Each review is represented by one number. For example, if the `effectiveness` of a drug review is `Highly Effective`, then this is represented in the `y_train` dataset as `1`. There are 3097 reviews in the final training dataset, so `y_train` would simply be a list of 3097 numbers, ranging from 1 to 5 for the different classes.
The model will use this dataset to train.

*   `X_test` contains the vector representation of the reviews from the test dataset. After the model is trained, it will use this as input features to try and predict the effectiveness.
*   `y_test` contains the actual effectiveness for the reviews in the test dataset. After the model has predicted the classifications, its guesses will be compared against this dataset to measure its accuracy, precision and recall later.





In [None]:
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')

def build_featureset(df,lexicon,classifications):
    featureset = []

    # N-gram terms from the lexicon kept as a separate list
    ngrams = [ngram for ngram in lexicon if (len(ngram.split()) > 1)]

    for row in df.itertuples():
        combined_drug_review = df.loc[row.Index, "combinedReview"]
        # If there is no review available for a row, no need to process it, continue on processing the next line
        if pd.isnull(combined_drug_review):
            continue
        word_tokens = word_tokenize(combined_drug_review.lower())
        features = np.zeros(len(lexicon))
        classification = classifications[row.effectiveness]
        for i in range(len(word_tokens)):
            # Check match to n-grams first
            for ngram in ngrams:
                if ngram in combined_drug_review:
                    index_value = lexicon.index(ngram.lower())
                    features[index_value] += 1

            # Matching one word terms
            if word_tokens[i].lower() in lexicon:
                index_value = lexicon.index(word_tokens[i].lower())
                features[index_value] += 1

            features = list(features)
            featureset.append([features, classification])
    return featureset


train_featureset = build_featureset(train_df,lexicon,classifications)
test_featureset = build_featureset(test_df,lexicon,classifications)

X_train = [features[0] for features in train_featureset]
y_train = [features[1] for features in train_featureset]
X_test = [features[0] for features in test_featureset]
y_test = [features[1] for features in test_featureset]

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


## Build and run the model

The Random Forest Classifier is built setting the number of trees in the forest to 100 and also calculating a class weight. The `effectiveness` classifications are not balanced which would create bias in the model so to counter this, weights are applied to each class, inversely proportional to the size of each class.

The performance of the model is measured below in terms of Precision, Recall and F-values.


In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

#n_estimators defines the number of trees for the random forest
regressor = RandomForestClassifier(n_estimators=100,random_state=1,verbose=3, class_weight='balanced')
regressor.fit(X_train, y_train)

# The random forest model returns a matrix of floats, this needs to be transformed into integers
# to define the clear cut classifications so that accuracy can be calculated.
y_pred_float = regressor.predict(X_test)
y_pred = []
for y in y_pred_float:
   y_pred.append(round(y))

print("\nRandom Forest Accuracy:", metrics.accuracy_score(y_test, y_pred))

print("Random forest confusion matrix: \n")
print(metrics.confusion_matrix(y_test,y_pred))

print("\nRandom forest precision score:")
print(metrics.precision_score(y_test,y_pred,average='weighted'))
print("\nRandom forest recall score:")
print(metrics.recall_score(y_test,y_pred,average='weighted'))
print("\nRandom forest f1 score:")
print(metrics.f1_score(y_test,y_pred,average='weighted'))

print("classification report:")
print(metrics.classification_report(y_test, y_pred))

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


building tree 1 of 100


[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    3.7s remaining:    0.0s


building tree 2 of 100


[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    7.0s remaining:    0.0s


building tree 3 of 100
building tree 4 of 100
building tree 5 of 100
building tree 6 of 100
building tree 7 of 100
building tree 8 of 100
building tree 9 of 100
building tree 10 of 100
building tree 11 of 100
building tree 12 of 100
building tree 13 of 100
building tree 14 of 100
building tree 15 of 100
building tree 16 of 100
building tree 17 of 100
building tree 18 of 100
building tree 19 of 100
building tree 20 of 100
building tree 21 of 100
building tree 22 of 100
building tree 23 of 100
building tree 24 of 100
building tree 25 of 100
building tree 26 of 100
building tree 27 of 100
building tree 28 of 100
building tree 29 of 100
building tree 30 of 100
building tree 31 of 100
building tree 32 of 100
building tree 33 of 100
building tree 34 of 100
building tree 35 of 100
building tree 36 of 100
building tree 37 of 100
building tree 38 of 100
building tree 39 of 100
building tree 40 of 100
building tree 41 of 100
building tree 42 of 100
building tree 43 of 100
building tree 44 of 100

[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:  5.6min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    5.4s finished



Random Forest Accuracy: 0.22197961122375193
Random forest confusion matrix: 

[[ 5348  1985  1259  3381 15697]
 [ 1079   450   633   472  3039]
 [  780   459   542   354  3907]
 [ 2597  1313  1097  2316 13086]
 [ 1878   604   560   693  7000]]

Random forest precision score:
0.31628646011551453

Random forest recall score:
0.22197961122375193

Random forest f1 score:
0.2110800339738471
classification report:
              precision    recall  f1-score   support

           1       0.46      0.19      0.27     27670
           2       0.09      0.08      0.09      5673
           3       0.13      0.09      0.11      6042
           4       0.32      0.11      0.17     20409
           5       0.16      0.65      0.26     10735

    accuracy                           0.22     70529
   macro avg       0.23      0.23      0.18     70529
weighted avg       0.32      0.22      0.21     70529

