# CNNs

This notebook contains the code needed to train and test the CNNs described in "Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives".

## Preparation

First, the necessary libraries are imported. Google Drive is mounted and the working directory is changed to "DLH project" (if this folder was shared with you, you will need to create a shortcut to "DLH project" in your own Drive to use this code). The GPU is used for training if one is available, the number of kernels per n-gram size is set to 100 (the value used in the original paper), and torch is told not to print decimals in scientific notation. The commented-out cells can be used to inspect the specifications of the GPU.

In [None]:
import matplotlib.pyplot as plt 
import numpy as np
import os
import pandas as pd
# import pycuda.driver as cuda
import time
import sys
import torch
import torch.nn as nn
import tqdm

from prettytable import PrettyTable
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
from torch.utils.data import Dataset

In [None]:
# !pip install pycuda

In [None]:
# # https://medium.com/ai%C2%B3-theory-practice-business/use-gpu-in-your-pytorch-code-676a67faed09
# print('__Python VERSION:', sys.version)
# print('__pyTorch VERSION:', torch.__version__)
# print('__CUDA VERSION', )
# from subprocess import call
# # call(["nvcc", "--version"]) does not work
# ! nvcc --version
# print('__CUDNN VERSION:', torch.backends.cudnn.version())
# print('__Number CUDA Devices:', torch.cuda.device_count())
# print('__Devices')
# # call(["nvidia-smi", "--format=csv", "--query-gpu=index,name,driver_version,memory.total,memory.used,memory.free"])
# print('Active CUDA Device: GPU', torch.cuda.current_device())
# print ('Available devices ', torch.cuda.device_count())
# print ('Current cuda device ', torch.cuda.current_device())

In [None]:
# cuda.init()
# ## Get Id of default device
# torch.cuda.current_device() # Tesla T4 with standard GPU on Google drive
# # 0
# cuda.Device(0).name() # '0' is the id of your GPU

In [None]:
kernels_per_n_gram_size = 100

In [None]:
# For better printing
torch.set_printoptions(sci_mode=False)

In [None]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [None]:
os.chdir("/content/drive/MyDrive/DLH project")

In [None]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "cpu"
)
print(f"Using {device} device")

Using cuda device


## Loading the data
Here, we load the data to be used. labelled_corpus_df contains the labels and the cleaned text for each of the discharge summaries we will use, and study_corpus_tensor contains the embedded representations of those documents: study_corpus_tensor[i, :, :] represents the i-th document and study_corpus_tensor[i, j, :] contains the word embedding of the j-th word in the i-th document.

The data is put into a custom dataset object for use in PyTorch.
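
Later in this section the dataset is split 80/20 with torch.utils.data.random_split(dataset, [0.8, 0.2]). As a rough guide to the resulting sizes, the sketch below reproduces in plain Python the documented rule for fractional lengths: each share is rounded down, and any leftover items are handed out one at a time. The helper name split_lengths is hypothetical and is not part of PyTorch.

```python
import math

def split_lengths(n_items, fractions):
    """Mimic how fractional split sizes become integer lengths (hypothetical helper)."""
    lengths = [math.floor(n_items * frac) for frac in fractions]
    remainder = n_items - sum(lengths)
    # Leftover items are handed out one at a time, starting with the first split
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths

# 1341 is the number of documents in this corpus
print(split_lengths(1341, [0.8, 0.2]))  # [1073, 268]
```

Under this rule, the 1341 documents yield 1073 training documents and 268 test documents.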

In [None]:
labelled_corpus_df = pd.read_csv("labelled_corpus_df.csv", index_col = 0)

In [None]:
labelled_corpus_df.head(1)

Unnamed: 0,HADM_ID,SUBJECT_ID,Advanced.Cancer,Advanced.Heart.Disease,Advanced.Lung.Disease,Chronic.Neurological.Dystrophies,Chronic.Pain.Fibromyalgia,Alcohol.Abuse,Other.Substance.Abuse,Obesity,Schizophrenia.and.other.Psychiatric.Disorders,Depression,Cleaned Text
0,118003.0,3644,0,0,0,0,1,0,0,0,0,1,admission date 2200 4 7 discharge date 2200 4 ...


In [None]:
# Takes some time (< 1 minute)
study_corpus_tensor = torch.load("embedded_docs.pt")

In [None]:
study_corpus_tensor.shape

torch.Size([1341, 5434, 100])

In [None]:
study_corpus_tensor.device

device(type='cpu')

In [None]:
study_corpus_tensor = study_corpus_tensor.to(device)
print(study_corpus_tensor.device)

cuda:0


In [None]:
embedding_vector_size = study_corpus_tensor.shape[2]
print(embedding_vector_size)

100


In [None]:
class CustomDatasetEmbedded(Dataset):
    def __init__(self, corpus_tensor, labels):
        """
        Store the corpus (of shape num_docs by max_num_words_per_doc by size_of_word_embedding)
        and the labels (for a single target variable)
        """

        self.x = corpus_tensor
        self.y = labels

    def __len__(self):

        """
        Return the number of documents
        """
        return len(self.y)

    def __getitem__(self, index):
        """
        Return one document (represented as a tensor, with each row being the embedding for one word in that document),
        and its label (whether the patient described has or does not have some phenotype)
        """
        return (self.x[index, :, :], self.y[index])

In [None]:
depression_y = torch.tensor(labelled_corpus_df["Depression"]).to(device)
print(depression_y.device)

cuda:0


In [None]:
depression_dataset = CustomDatasetEmbedded(study_corpus_tensor, depression_y)

In [None]:
train_dataset, test_dataset = torch.utils.data.random_split(depression_dataset, [0.8, 0.2])

In [None]:
# len(train_dataset)

In [None]:
# len(test_dataset)

In [None]:
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 32, shuffle = True) # Batch size?
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 32)

## CNN
This class creates CNNs for predicting patient characteristics from discharge summaries, as described in the paper being replicated. n_gram_sizes must be a non-empty list containing any of the integers 1, 2, 3, 4, and 5. The CNN created will consider n-grams of the sizes specified in n_gram_sizes when looking at discharge summaries. Each CNN is a binary classifier which predicts whether a patient has or lacks a single condition.
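
To make the layer sizes concrete, the following sketch works through the shape arithmetic used by the class in the next cell. With stride 1 and no padding, a kernel covering n words produces a feature map of height doc_length + 1 - n; max-pooling over that whole height leaves one value per kernel; the final linear layer therefore receives kernels_per_n_gram_size * len(n_gram_sizes) inputs. The function name cnn_shapes is hypothetical, and the default of 5434 words per document is taken from the tensor shape printed above.

```python
def cnn_shapes(n_gram_sizes, doc_length=5434, kernels_per_n_gram_size=100):
    """Return per-size feature-map heights and the linear layer's input size."""
    if not n_gram_sizes:
        raise ValueError("at least one n-gram size is required")
    # Height of the feature map produced by a kernel spanning n words
    feature_map_heights = {n: doc_length + 1 - n for n in n_gram_sizes}
    # One pooled value per kernel, for every n-gram size
    fc_inputs = kernels_per_n_gram_size * len(n_gram_sizes)
    return feature_map_heights, fc_inputs

heights, fc_inputs = cnn_shapes([1, 2, 3])
print(heights)    # {1: 5434, 2: 5433, 3: 5432}
print(fc_inputs)  # 300
```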

In [None]:
class CNN_n_gram(nn.Module):
    def __init__(self, n_gram_sizes):
        super(CNN_n_gram, self).__init__()

        self.n_gram_sizes = n_gram_sizes
        num_sizes = 0

        if 1 in self.n_gram_sizes:
            self.conv1 = nn.Conv2d(in_channels = 1,
                                  out_channels = kernels_per_n_gram_size,
                                  kernel_size = (1, embedding_vector_size),
                                  stride = 1,
                                  padding = 0)
            torch.nn.init.uniform_(self.conv1.weight, -0.01, 0.01)
            torch.nn.init.zeros_(self.conv1.bias)

            # each kernel's feature map is condensed to a single value
            conv1_output_height = study_corpus_tensor.shape[1] + 1 - 1
            self.pool1 = nn.MaxPool2d(kernel_size = (conv1_output_height, 1))
            num_sizes += 1

        if 2 in self.n_gram_sizes:
            self.conv2 = nn.Conv2d(in_channels = 1,
                                  out_channels = kernels_per_n_gram_size,
                                  kernel_size = (2, embedding_vector_size),
                                  stride = 1,
                                  padding = 0)
            torch.nn.init.uniform_(self.conv2.weight, -0.01, 0.01)
            torch.nn.init.zeros_(self.conv2.bias)

            # each kernel's feature map is condensed to a single value
            conv2_output_height = study_corpus_tensor.shape[1] + 1 - 2
            self.pool2 = nn.MaxPool2d(kernel_size = (conv2_output_height, 1))
            num_sizes += 1

        if 3 in self.n_gram_sizes:
            self.conv3 = nn.Conv2d(in_channels = 1,
                                  out_channels = kernels_per_n_gram_size,
                                  kernel_size = (3, embedding_vector_size),
                                  stride = 1,
                                  padding = 0)
            torch.nn.init.uniform_(self.conv3.weight, -0.01, 0.01)
            torch.nn.init.zeros_(self.conv3.bias)

            # each kernel's feature map is condensed to a single value
            conv3_output_height = study_corpus_tensor.shape[1] + 1 - 3
            self.pool3 = nn.MaxPool2d(kernel_size = (conv3_output_height, 1))
            num_sizes += 1

        if 4 in self.n_gram_sizes:
            self.conv4 = nn.Conv2d(in_channels = 1,
                                  out_channels = kernels_per_n_gram_size,
                                  kernel_size = (4, embedding_vector_size),
                                  stride = 1,
                                  padding = 0)
            torch.nn.init.uniform_(self.conv4.weight, -0.01, 0.01)
            torch.nn.init.zeros_(self.conv4.bias)

            # each kernel's feature map is condensed to a single value
            conv4_output_height = study_corpus_tensor.shape[1] + 1 - 4
            self.pool4 = nn.MaxPool2d(kernel_size = (conv4_output_height, 1))
            num_sizes += 1

        if 5 in self.n_gram_sizes:
            self.conv5 = nn.Conv2d(in_channels = 1,
                                  out_channels = kernels_per_n_gram_size,
                                  kernel_size = (5, embedding_vector_size),
                                  stride = 1,
                                  padding = 0)
            torch.nn.init.uniform_(self.conv5.weight, -0.01, 0.01)
            torch.nn.init.zeros_(self.conv5.bias)

            # each kernel's feature map is condensed to a single value
            conv5_output_height = study_corpus_tensor.shape[1] + 1 - 5
            self.pool5 = nn.MaxPool2d(kernel_size = (conv5_output_height, 1))
            num_sizes += 1

        assert num_sizes > 0

        self.do = nn.Dropout(p = 0.5) 

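        self.fc = nn.Linear(kernels_per_n_gram_size * num_sizes, 2) # Input size: one value per kernel, e.g. 300 for three n-gram sizes
        # The second positional argument of normal_ is the mean, not the standard deviation;
        # a standard deviation of 0.01 is assumed to be the intent here
        torch.nn.init.normal_(self.fc.weight, mean = 0.0, std = 0.01)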
        torch.nn.init.zeros_(self.fc.bias)

        self.activation = nn.LogSoftmax(dim = 1) # Called on tensor of shape [batch_size, 2]
        
    def forward(self, x):
        # Add an extra dimension, because nn.Conv2d takes an input of (batch_size, number of channels, height of input, width of input)
        # and not (batch_size, height of input, width of input)
        x = torch.unsqueeze(x, dim = 1)

        list_of_convolutional_outputs = []

        if 1 in self.n_gram_sizes:
            x1 = self.conv1(x)
            x1 = torch.relu(x1) # Point of this, given the max pooling?
            x1 = self.pool1(x1)
            list_of_convolutional_outputs.append(x1)
        
        if 2 in self.n_gram_sizes:
            x2 = self.conv2(x)
            x2 = torch.relu(x2)
            x2 = self.pool2(x2)
            list_of_convolutional_outputs.append(x2)
        
        if 3 in self.n_gram_sizes:
            x3 = self.conv3(x)
            x3 = torch.relu(x3)
            x3 = self.pool3(x3)
            list_of_convolutional_outputs.append(x3)

        if 4 in self.n_gram_sizes:
            x4 = self.conv4(x)
            x4 = torch.relu(x4)
            x4 = self.pool4(x4)
            list_of_convolutional_outputs.append(x4)
        
        if 5 in self.n_gram_sizes:
            x5 = self.conv5(x)
            x5 = torch.relu(x5)
            x5 = self.pool5(x5)
            list_of_convolutional_outputs.append(x5)

        # Combine the results of applying differently sized kernels in parallel
        x = torch.cat(list_of_convolutional_outputs, dim = 1)

        x = self.do(x)

        x = torch.flatten(x, start_dim = 1)
        
        x = self.fc(x)

        x = self.activation(x)

        return x

In [None]:
# cnn_1_gram_model = CNN_n_gram([1]).to(device)
# print(cnn_1_gram_model)
# cnn_1_2_gram_model = CNN_n_gram([1, 2]).to(device)
# print(cnn_1_2_gram_model)
# cnn_1_2_3_gram_model = CNN_n_gram([1, 2, 3]).to(device)
# print(cnn_1_2_3_gram_model)

In [None]:
# for name, param in cnn_1_2_3_gram_model.named_parameters():
#     print(name)
#     print(param.shape)

In [None]:
# test_conv = nn.Conv2d(in_channels = 1,
#                       out_channels = 100,
#                       kernel_size = (2, 100),
#                       stride = 1,
#                       padding = 0)
# for name, param in test_conv.named_parameters():
#     if name.endswith("weight"):
#         print(torch.linalg.vector_norm(param, dim = (1, 2, 3)))
#         print(torch.linalg.vector_norm(param, dim = (1, 2, 3)).shape)

In [None]:
# # This loss function was specified in main.lua within the GitHub repository provided by the authors of the original study
# # local criterion = nn.ClassNLLCriterion()
# criterion = nn.modules.loss.NLLLoss() # CrossEntropyLoss() doesn't help

# # Adadelta was specified in a PDF file attached to the original paper
# # The hyperparameters for the optimizer are specified in trainer.lua from the GitHub repository provided by the authors of the original study
# optimizer = torch.optim.Adadelta(cnn_1_2_3_gram_model.parameters(), rho = 0.95, eps = 1e-6)

In [None]:
# # Normalize the weights going into each output in the FC layer to 3?
# test_tensor = torch.tensor([[1., 2., 3.],
#                             [4., 5., 6.]])
# # norms = torch.linalg.vector_norm(test_tensor, dim = 1, keepdim = True)
# norms = torch.linalg.vector_norm(test_tensor, keepdim = True)
# print(norms)
# test_tensor = 3 * test_tensor / norms
# print(test_tensor)
# # print(torch.linalg.vector_norm(test_tensor, dim = 1, keepdim = True))
# print(torch.linalg.vector_norm(test_tensor, keepdim = True))

This function trains a CNN. It takes a CNN, a training dataloader, the number of training epochs, an optimizer, and a loss function, and returns the trained model.

You can uncomment the line with tqdm in it to see the progress within each epoch (it is currently replaced with a line without tqdm, as printing that when training 60 different CNNs would be excessively verbose).
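
After every optimizer step, the training loop in the next cell rescales each row of the final linear layer's weight matrix to an L2 norm of 3. The sketch below shows that rescaling on a plain list of lists, without torch, so the arithmetic is easy to check by hand. The helper name rescale_rows is hypothetical.

```python
import math

def rescale_rows(weights, target_norm=3.0, eps=1e-7):
    """Rescale every row of a 2-D list so its L2 norm is (almost exactly) target_norm."""
    rescaled = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        # eps guards against division by zero for an all-zero row
        rescaled.append([target_norm * w / (norm + eps) for w in row])
    return rescaled

weights = [[1.0, 2.0, 2.0],   # L2 norm 3
           [0.0, 3.0, 4.0]]   # L2 norm 5
for row in rescale_rows(weights):
    print(row, math.sqrt(sum(w * w for w in row)))
```

Note that this sets every row's norm to 3 regardless of its previous value. A max-norm constraint, by contrast, would only shrink rows whose norm exceeds the threshold and leave the others untouched.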

In [None]:
# From main.lua in the provided code:
# cmd:option('-epochs', 20, 'Number of training epochs')
# 20 is the default
n_epochs = 20

# From HW3 CNN
def train_model(model, train_dataloader, n_epoch, optimizer, criterion):
    """
    :param model: A CNN model
    :param train_dataloader: the DataLoader of the training data
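    :param n_epoch: number of epochs to train
    :param optimizer: the optimizer used to update the model's parameters
    :param criterion: the loss function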
    :return:
        model: trained model
    """
    model.train() # prep model for training
    
    
    for epoch in range(n_epoch):
        curr_epoch_loss = []
        # For testing
        loader_index = 0
        # for x, y in tqdm.tqdm(train_dataloader):
        for x, y in train_dataloader:
            # Standard training step: clear the gradients, run the forward pass,
            # compute the loss, backpropagate, then update the weights.

            # print(x.shape) # torch.Size([32, 5434, 100])
            # print(y.shape) # torch.Size([32])

            # Step 1. clear gradients
            optimizer.zero_grad()

            # Steps 2 and 3. forward pass, then compute the loss
            y_hat = model(x)
            loss = criterion(y_hat, y)

            # print(y_hat)
            # print(y)

            # Step 4. backward pass
            loss.backward()
            # Step 5. optimization
            optimizer.step()

            # Rescale each row of the final linear layer's weights (shape [2, 100] for
            # the unigram CNN) to an L2 norm of 3. The update is done in place under
            # no_grad so that the optimizer keeps tracking the same Parameter object;
            # assigning a new nn.Parameter here would detach the weights from the optimizer.
            with torch.no_grad():
                norms = torch.linalg.vector_norm(model.fc.weight, dim = 1, keepdim = True)
                model.fc.weight.mul_(3 / (norms + 1e-7))

            # Step 6. record loss
            curr_epoch_loss.append(loss.item())

            # print(torch.sum(torch.max(y_hat, 1).indices) / len(y))

            # if epoch == n_epochs - 1: # and loader_index == 0: #and loader_index == 33:
                # print(y_hat)
                # print(y)
                # print(torch.max(y_hat, 1).indices) # NOT RETURNING ALL 0s during training??????
                # print(torch.sum(torch.max(y_hat, 1).indices) / len(y))
            loader_index += 1

        # print(f"Epoch {epoch}: curr_epoch_loss={np.mean(curr_epoch_loss)}")
    return model

In [None]:
# start_time = time.time()
# cnn_1_2_3_gram_model = train_model(model = cnn_1_2_3_gram_model,
#                                   train_dataloader = train_loader,
#                                   n_epoch = n_epochs,
#                                   optimizer = optimizer,
#                                   criterion = criterion)
# end_time = time.time()
# print()
# print(end_time - start_time)

This function evaluates a trained model on the data in a test dataloader. It returns, in this order, the output of the CNN for the 1 class (the patient having the condition), the predicted labels, and the true labels. Because the model ends in a LogSoftmax layer, the returned output is a log-probability.
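
The scores produced by the model come from a LogSoftmax layer, so they are log-probabilities rather than probabilities. The sketch below illustrates, in plain Python, how such a score relates to a probability and to the predicted class. The function log_softmax here is a hand-written stand-in for illustration, and the two raw scores are made up.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

log_probs = log_softmax([0.5, 2.0])      # stand-in for one row of the CNN output
prob_positive = math.exp(log_probs[1])   # probability assigned to class 1
# The predicted class is the one with the larger log-probability
predicted = max(range(len(log_probs)), key=lambda k: log_probs[k])
print(prob_positive, predicted)
```

Since the exponential is monotonic, ranking-based metrics such as AUC give the same value whether they are computed on the log-probabilities or on the probabilities.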

In [None]:
# From HW3 CNN
def eval_model(model, dataloader):
    model.eval()
    Y_pred  = []
    Y_true  = []
    Y_score = []
    with torch.no_grad():
        for x, y in dataloader:
            # your code here
            Y_true.append(y)
            
            y_hat = model(x)
            # print(torch.max(y_hat, 1).indices)
            
            # Y_score.append(torch.exp(y_hat[:, 1]))
            Y_score.append(y_hat[:, 1])

            # print(y_hat[:, 1])
            # print(y)
            # raise NotImplementedError

            # https://campuswire.com/c/G902DEAF1/feed/823
            # Return class with higher probability
            Y_pred.append(torch.max(y_hat, 1).indices)
            
    #         raise NotImplementedError

    Y_score = [y_score.to("cpu") for y_score in Y_score]
    Y_pred  = [y_pred.to("cpu")  for y_pred  in Y_pred]
    Y_true  = [y_true.to("cpu")  for y_true  in Y_true]

    Y_score = np.concatenate(Y_score, axis = 0)    
    Y_pred  = np.concatenate(Y_pred,  axis=0)
    Y_true  = np.concatenate(Y_true,  axis=0)


    return Y_score, Y_pred, Y_true

In [None]:
# y_score, y_pred, y_true = eval_model(cnn_1_2_3_gram_model, test_loader)

In [None]:
# print("Predicted percent of patients that have the condition:", np.sum(y_pred) / len(y_pred)) # Seed = 4 -> all positive????
# print("Actual percent of patients that have the condition:", np.sum(y_true) / len(y_true))
# print("Accuracy:", accuracy_score(y_true, y_pred))
# print("Precision:", precision_score(y_true, y_pred))
# print("Recall:", recall_score(y_true, y_pred))
# print("F1 Score:", f1_score(y_true, y_pred))
# print("AUC:", roc_auc_score(y_true, y_score))

In [None]:
# plt.boxplot(y_score)
# plt.title("Seed = " + str(seed))
# plt.show()

In [None]:
# for x, y in train_loader:
#     print(torch.sum(y))

In [None]:
# for x, y in test_loader:
#     print(torch.sum(y))

In [None]:
# # From https://discuss.pytorch.org/t/how-do-i-check-the-number-of-parameters-of-a-model/4325/23?page=2
# def count_parameters(model):
#     table = PrettyTable(["Modules", "Parameters"])
#     total_params = 0
#     for name, parameter in model.named_parameters():
#         if not parameter.requires_grad: 
#             continue
#         param = parameter.numel()
#         table.add_row([name, param])
#         total_params+=param
#     print(table)
#     print(f"Total Trainable Params: {total_params}")
#     return total_params

# count_parameters(cnn_1_gram_model)

Here, we figure out the name of the file to store results to.
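
The naming rule used in the next cell (one more than the highest existing results_N index) can be sketched as a small function over a list of file names. This is an illustrative variant, not the notebook's exact code: the name next_results_filename is hypothetical, and unlike the cell below it skips files whose suffix is not a number instead of raising an error.

```python
def next_results_filename(filenames, prefix="results_", extension=".txt"):
    """Return a results file name one index past the highest existing one."""
    max_index = 0
    for filename in filenames:
        stem = filename.split(".")[0]
        suffix = stem[len(prefix):]
        # Skip files that do not look like results_<number>.*
        if stem.startswith(prefix) and suffix.isdigit():
            max_index = max(max_index, int(suffix))
    return f"{prefix}{max_index + 1}{extension}"

print(next_results_filename(["results_1.txt", "results_2.txt", "notes.md"]))  # results_3.txt
print(next_results_filename(["results_final.txt"]))                            # results_1.txt
```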

In [None]:
max_index_results_file = 0
for filename in os.listdir():
    filename_without_type = filename.split(".")[0]
    if filename_without_type.startswith("results_"):
        index = int(filename_without_type[8:])
        if index > max_index_results_file:
            max_index_results_file = index

new_file_name = "results_" + str(max_index_results_file + 1) + ".txt"
print(new_file_name)

results_3.txt


This function trains a CNN which considers the n-gram sizes listed in n_gram_sizes for detecting the provided condition, then evaluates it. You must provide a training dataloader, a test dataloader, an open file object to write results to, the device to use for training, and the number of epochs of training to perform.

It will both print and save to the indicated file information about training time and performance.
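
For reference, the reported accuracy, precision, recall, and F1 score can all be derived from the four confusion-matrix counts. The sketch below does this in plain Python; binary_metrics is a hypothetical helper, not a scikit-learn function. The counts are illustrative: the first set was chosen so that the resulting values agree with the first result block printed at the end of this notebook.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # With no predicted (or no actual) positives the ratio is undefined; 0.0 is used here
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return accuracy, precision, recall, f1

# Illustrative counts for 268 test documents with 25 predicted positives
print(binary_metrics(tp=19, fp=6, fn=8, tn=235))
# A model that never predicts the positive class
print(binary_metrics(tp=0, fp=0, fn=12, tn=256))
```

The second call shows why accuracy alone can be misleading for rare conditions: a model that predicts no positives at all still scores about 0.955 accuracy while its precision, recall, and F1 are all 0. By default, scikit-learn also reports 0.0 for an undefined precision and emits a warning when it does so.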

In [None]:
def train_test_model(n_gram_sizes, condition, train_loader, test_loader, output_file, device, n_epochs):
    which_model = str(n_gram_sizes) + "-gram CNN for " + condition
    print(which_model)
    output_file.write(which_model + "\n")

    model = CNN_n_gram(n_gram_sizes).to(device)

    criterion = nn.modules.loss.NLLLoss()

    optimizer = torch.optim.Adadelta(model.parameters(), rho = 0.95, eps = 1e-6)

    start_time = time.time()
    model = train_model(model = model,
                        train_dataloader = train_loader,
                        n_epoch = n_epochs,
                        optimizer = optimizer,
                        criterion = criterion)
    end_time = time.time()
    training_time_statement = "Training time: " + str(end_time - start_time) + " seconds"
    print(training_time_statement)
    output_file.write(training_time_statement + "\n")

    y_score, y_pred, y_true = eval_model(model, test_loader)

    predicted_positive_statement = "Predicted percent of patients that have the condition: " + str(np.sum(y_pred) / len(y_pred))
    print(predicted_positive_statement)
    output_file.write(predicted_positive_statement + "\n")

    actual_positive_statement = "Actual percent of patients that have the condition: " + str(np.sum(y_true) / len(y_true))
    print(actual_positive_statement)
    output_file.write(actual_positive_statement + "\n")

    accuracy_statement = "Accuracy: " + str(accuracy_score(y_true, y_pred))
    print(accuracy_statement)
    output_file.write(accuracy_statement + "\n")

    precision_statement = "Precision: " + str(precision_score(y_true, y_pred))
    print(precision_statement)
    output_file.write(precision_statement + "\n")

    recall_statement = "Recall: " + str(recall_score(y_true, y_pred))
    print(recall_statement)
    output_file.write(recall_statement + "\n")

    f1_statement = "F1 Score: " + str(f1_score(y_true, y_pred))
    print(f1_statement)
    output_file.write(f1_statement + "\n")

    auc_statement = "AUC: " + str(roc_auc_score(y_true, y_score))
    print(auc_statement)
    output_file.write(auc_statement + "\n")

    print()
    output_file.write("\n")

Here, we train and test 6 models considering different sets of n-grams for each of the 10 conditions.
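
As a quick check on the size of the experiment, the sketch below enumerates every (condition, n-gram sizes) pair in the order the next cell visits them. The variable names are illustrative; the condition names and the six n-gram size lists are the ones used in that cell.

```python
from itertools import product

n_gram_configurations = [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5]]
conditions = ["Advanced.Cancer", "Advanced.Heart.Disease", "Advanced.Lung.Disease",
              "Chronic.Neurological.Dystrophies", "Chronic.Pain.Fibromyalgia",
              "Alcohol.Abuse", "Other.Substance.Abuse", "Obesity",
              "Schizophrenia.and.other.Psychiatric.Disorders", "Depression"]

# Conditions form the outer loop and n-gram configurations the inner loop
runs = list(product(conditions, n_gram_configurations))
print(len(runs))  # 60
print(runs[0])    # ('Advanced.Cancer', [1])
```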

In [None]:
file = open(new_file_name, 'a')

seed =  3
torch.manual_seed(seed)

# advanced or metastatic cancer, advanced heart disease, advanced lung disease, chronic neurologic dystrophies, chronic pain, alcohol abuse, substance abuse, obesity, psychiatric disorders, or depression.
all_conditions = ["Advanced.Cancer",
                  "Advanced.Heart.Disease",
                  "Advanced.Lung.Disease",
                  "Chronic.Neurological.Dystrophies",
                  "Chronic.Pain.Fibromyalgia",
                  "Alcohol.Abuse",
                  "Other.Substance.Abuse",
                  "Obesity",
                  "Schizophrenia.and.other.Psychiatric.Disorders",
                  "Depression"]

for condition in all_conditions:
    labels = torch.tensor(labelled_corpus_df[condition]).to(device)

    dataset = CustomDatasetEmbedded(study_corpus_tensor, labels)

    train_dataset, test_dataset = torch.utils.data.random_split(dataset, [0.8, 0.2])

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size = 32, shuffle = True) # Batch size?
    test_loader = torch.utils.data.DataLoader(test_dataset, batch_size = 32)

    # Train and test one model per set of n-gram sizes
    for n_gram_sizes in [[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 2, 3, 4, 5], [2, 3, 4, 5]]:
        train_test_model(n_gram_sizes = n_gram_sizes,
                         condition = condition,
                         train_loader = train_loader,
                         test_loader = test_loader,
                         output_file = file,
                         device = device,
                         n_epochs = n_epochs)

file.close()

[1]-gram CNN for Advanced.Cancer
Training time: 16.104912757873535 seconds
Predicted percent of patients that have the condition: 0.09328358208955224
Actual percent of patients that have the condition: 0.10074626865671642
Accuracy: 0.9477611940298507
Precision: 0.76
Recall: 0.7037037037037037
F1 Score: 0.7307692307692308
AUC: 0.9329952358998003

[1, 2]-gram CNN for Advanced.Cancer
Training time: 22.6281635761261 seconds
Predicted percent of patients that have the condition: 0.007462686567164179
Actual percent of patients that have the condition: 0.10074626865671642
Accuracy: 0.9067164179104478
Precision: 1.0
Recall: 0.07407407407407407
F1 Score: 0.13793103448275862
AUC: 0.9661902566466882

[1, 2, 3]-gram CNN for Advanced.Cancer
Training time: 59.80095553398132 seconds
Predicted percent of patients that have the condition: 0.05970149253731343
Actual percent of patients that have the condition: 0.10074626865671642
Accuracy: 0.9514925373134329
Precision: 0.9375
Recall: 0.5555555555555556


  _warn_prf(average, modifier, msg_start, len(result))


Training time: 23.60663151741028 seconds
Predicted percent of patients that have the condition: 0.014925373134328358
Actual percent of patients that have the condition: 0.10820895522388059
Accuracy: 0.8992537313432836
Precision: 0.75
Recall: 0.10344827586206896
F1 Score: 0.18181818181818182
AUC: 0.8865243110662242

[1, 2, 3]-gram CNN for Advanced.Lung.Disease
Training time: 60.171048402786255 seconds
Predicted percent of patients that have the condition: 0.041044776119402986
Actual percent of patients that have the condition: 0.10820895522388059
Accuracy: 0.917910447761194
Precision: 0.8181818181818182
Recall: 0.3103448275862069
F1 Score: 0.45000000000000007
AUC: 0.9357957004761217

[1, 2, 3, 4]-gram CNN for Advanced.Lung.Disease
Training time: 101.2346203327179 seconds
Predicted percent of patients that have the condition: 0.055970149253731345
Actual percent of patients that have the condition: 0.10820895522388059
Accuracy: 0.9253731343283582
Precision: 0.8
Recall: 0.41379310344827586

  _warn_prf(average, modifier, msg_start, len(result))


Training time: 23.615981340408325 seconds
Predicted percent of patients that have the condition: 0.011194029850746268
Actual percent of patients that have the condition: 0.1044776119402985
Accuracy: 0.9067164179104478
Precision: 1.0
Recall: 0.10714285714285714
F1 Score: 0.19354838709677416
AUC: 0.9105654761904762

[1, 2, 3]-gram CNN for Other.Substance.Abuse
Training time: 60.13122582435608 seconds
Predicted percent of patients that have the condition: 0.029850746268656716
Actual percent of patients that have the condition: 0.1044776119402985
Accuracy: 0.9104477611940298
Precision: 0.75
Recall: 0.21428571428571427
F1 Score: 0.3333333333333333
AUC: 0.9208333333333333

[1, 2, 3, 4]-gram CNN for Other.Substance.Abuse
Training time: 101.16284942626953 seconds
Predicted percent of patients that have the condition: 0.033582089552238806
Actual percent of patients that have the condition: 0.1044776119402985
Accuracy: 0.9216417910447762
Precision: 0.8888888888888888
Recall: 0.2857142857142857
F

  _warn_prf(average, modifier, msg_start, len(result))


Training time: 23.621701955795288 seconds
Predicted percent of patients that have the condition: 0.0
Actual percent of patients that have the condition: 0.04477611940298507
Accuracy: 0.9552238805970149
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
AUC: 0.6982421875

[1, 2, 3]-gram CNN for Obesity
Training time: 60.182400941848755 seconds
Predicted percent of patients that have the condition: 0.007462686567164179
Actual percent of patients that have the condition: 0.04477611940298507
Accuracy: 0.9552238805970149
Precision: 0.5
Recall: 0.08333333333333333
F1 Score: 0.14285714285714285
AUC: 0.6500651041666667

[1, 2, 3, 4]-gram CNN for Obesity
Training time: 101.12406587600708 seconds
Predicted percent of patients that have the condition: 0.0
Actual percent of patients that have the condition: 0.04477611940298507
Accuracy: 0.9552238805970149
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
AUC: 0.7464192708333333

[1, 2, 3, 4, 5]-gram CNN for Obesity
Training time: 132.45684266090393 seconds
Predicted percent of patients that have the condition: 0.0
Actual percent of patients that have the condition: 0.04477611940298507
Accuracy: 0.9552238805970149
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
AUC: 0.814453125

[2, 3, 4, 5]-gram CNN for Obesity
Training time: 123.31896090507507 seconds
Predicted percent of patients that have the condition: 0.0
Actual percent of patients that have the condition: 0.04477611940298507
Accuracy: 0.9552238805970149
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
AUC: 0.8082682291666667

[1]-gram CNN for Schizophrenia.and.other.Psychiatric.Disorders
Training time: 9.585261821746826 seconds
Predicted percent of patients that have the condition: 0.0
Actual percent of patients that have the condition: 0.17164179104477612
Accuracy: 0.8283582089552238
Precision: 0.0
Recall: 0.0
F1 Score: 0.0
AUC: 0.8532119075597336

[1, 2]-gram CNN for Schizophrenia.and.other.Psychiatric.Disorders
Training time: 23.595029592514038 seconds
Predicted percent of patients that have the condition: 0.0708955223880597
Actual percent of patients that have the condition: 0.17164179104477612
Accuracy: 0.8768656716417911
Precision: 0.8421052631578947
Recall: 0.34782608695652173
F1 Score: 0.49230769230769234
AUC: 0.8668233450842147

[1, 2, 3]-gram CNN for Schizophrenia.and.other.Psychiatric.Disorders
Training time: 60.09175252914429 seconds
Predicted percent of patients that have the condition: 0.10820895522388059
Actual percent of patients that have the condition: 0.17164179104477612
Accuracy: 0.8917910447761194
Precision: 0.7931034482758621
Recall: 0.5
F1 Score: 0.6133333333333334
AUC: 0.8530650215432825

[1, 2, 3, 4]-gram CNN for Schizophrenia.and.other.Psychiatric.Disorders
Training time: 101.11079239845276 seconds
Predicted percent of patients that have the condition: 0.029850746268656716
Actual percent of patients that have the condition: 0.17164179104477612
Accuracy: 0.85820895522388