<p id="eda" style="font-size:30px; text-align:center; font-weight:bold">Pre-trained GPT-2 (Generative Pre-trained Transformer) Medium</p>

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 20px">The transformer architecture serves as the foundation for the OpenAI model GPT-2. It is intended to provide text over long periods that is coherent and contextually relevant. GPT-2 can produce writing in a range of styles and on a variety of topics due to pre-training on a sizable corpus of internet-sourced content. The term "pre-trained" refers to the initial phase of training on a large amount of data before any specific fine-tuning </p>

<p style="font-size: 20px; font-weight:bold"><u>Justification (Why used for medical chatbot (doctor-patient dialogues dataset))</u></p>

<p style="font-size: 20px">The transformer architecture serves as the foundation for the OpenAI model GPT-2. It is intended to provide text over long periods that is coherent and contextually relevant. GPT-2 can produce writing in a range of styles and on a variety of topics due to pre-training on a sizable corpus of internet-sourced content. The term "pre-trained" refers to the initial phase of training on a large amount of data before any specific fine-tuning </p>

<p style="font-size: 20px; font-weight:bold"><u>Version</u></p>

<p style="font-size: 20px"> GPT-2 medium, version has been used in this project for the training on patient-doctor dialogues. GPT-2 medium has approximately 355 million parameters. The training time is shorter as compared to the models (like BART) having the same number of trained parameters. This version of gpt-2 primarily also designed for generative tasks.</p>

<div style="width:100%;height:3px; background-color:black"></div>

<p id="lib" style="font-size:30px; text-align:center; font-weight:bold">Required libraries or packages</p> <a href="#top">Back To Top</a>

<div style="width:100%;height:3px; background-color:black"></div>

In [1]:
import torch # PyTorch open-source deep learning framework
import json # json library to load JSON data
from sklearn.model_selection import train_test_split # train_test_split method from scikit learn for datset splitting 
import pandas as pd # for dataframe

# Step 1: Load the CSV into a DataFrame
df = pd.read_csv('eval_metrics_results_dataframe.csv')

# Step 2: Create a new record and append it to the DataFrame
new_record = {
    'Column1': 'Value1',
    'Column2': 'Value2',
    # ... Add all required columns and their respective values
}
df = df.append(new_record, ignore_index=True)

# Step 3: Save the modified DataFrame back to the CSV
df.to_csv('eval_metrics_results_dataframe.csv', index=False)

from transformers import (
    GPT2LMHeadModel, # GPT-2 model language modeling
    GPT2Config,  # configuration class for GPT-2
    GPT2Tokenizer, # tokenizer class for GPT-2
    TextDataset,  # dataset class for reading text files and tokenizing them
    DataCollatorForLanguageModeling, # data collator to form batches of input for language modeling
    TrainingArguments, # class to store arguments for training transformers models
    Trainer,  # trainer class for training and evaluating transformers models
    TrainerCallback, # case class for callbacks in the Trainer class
    IntervalStrategy # strategy to define when to run callbacks
)

from datasets import load_metric # metric from the datasets library for evaluation metrics
from nltk.translate.bleu_score import sentence_bleu # to calculate BLEU score
from rouge import Rouge # to calculate ROUGE score
from language_tool_python import LanguageTool# used to check grammar
import spacy # library for natural language processing
from transformers import EvalPrediction # used to store predictions and labels ids together
import torch.nn as nn # to calculate loss 
from sklearn.metrics import accuracy_score # to calculate accuracy


2023-09-06 02:17:30.085868: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-06 02:17:30.933684: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-09-06 02:17:30.933737: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.2/lib64
2023-09-06 02:17:32.435730: I tensorflow/compiler/xla/stream_executor/cuda/c

<div style="width:100%;height:1px; background-color:black"></div>

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # check if GPU support available than use GPU otherwise use CPU


<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Load Dataset</p>

In [3]:
with open("cleaned_medical_dialogues_dataset.json", "r") as f: # load the data from the JSON file
    data = json.load(f)

data = data[::4] # sampling 1/4th of the dataset to manage time and computational resources

print(len(data)) # print length number of samples in the sampled data

61776


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Data Formatting</p>

<p style="font-size: 20px">Data formatting, GPT-2 is a generative model, when training this model on dialogue dataset. The following format will help the model to differentiate between the two roles during training and perform better.</p>

In [4]:
formatted_data = []

for item in data:
   
    formatted_data.append("Patient: " + item["Patient"])
    formatted_data.append("Doctor: " + item["Doctor"])



In [5]:
# Save to a .txt file for training
with open('gpt_2_formatted_dataset.txt', 'w', encoding='utf-8') as f: # save the formatted in the text file to give the input to the model
    for line in formatted_data:
        f.write(line + '\n')

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Load Pre-trained Model</p>

In [6]:
MODEL_NAME = 'gpt2-medium'  # model name to be loaded
config = GPT2Config.from_pretrained(MODEL_NAME, resid_pdrop=0.2, embd_pdrop=0.2) # configuration for the model
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_NAME) # load tokenizer for the model
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME, config=config).to(device) # load he gpt2-medium model


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Prepare the Dataset </p>

In [7]:
train_dataset = TextDataset( # TextDataset method from the transformers library
    tokenizer=tokenizer, # convert text into token ids
    file_path="gpt_2_formatted_dataset.txt", # path to the pre-processed file
    block_size=128 # maximum number of tokens for each training example, if in the dataset number of tokens in each sample are greater, it will be truncated 
)




In [8]:
# data collator for language modeling
data_collator = DataCollatorForLanguageModeling( 
    tokenizer=tokenizer, # assigning the tokenizer we have intialized earlier
    mlm=False # passing false to mention it is not masked langauge modelling (like BERT) task 
)

<p style="font-size: 20px">The data collator is responsible for batching and preprocessing the data into a format suitable for training.</p>

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Split Dataset</p>

<p style="font-size: 20px">
  The training set and validation/test set are two sets that are typically separated from the available dataset in machine learning and deep learning.
This separation helps in evaluating the model's performance on unseen data. The main goal is to prevent the model from overfitting the training set of data. If a model performs exceptionally well on training data but poorly on validation data, it is obviously overfitted.</p>

In [9]:
train_dataset, val_dataset = train_test_split(train_dataset, test_size=0.2)  # 20% of data as validation


<p style="font-size: 20px">The parameter <b>test_size=0.2</b> indicates that 20% of the dataset will be used as the validation set, while the remaining 80% will be used for training.</p>

<div style="width:100%;height:1px; background-color:black"></div>

<p id="lib" style="font-size:30px; text-align:center; font-weight:bold">Fine tuning GPT-2 medium</p> <a href="#top">Back To Top</a>

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Define Early Stopping</p>

<p style="font-size: 20px">Callback function to halt the training process when there is no improvement in the model
</p>

In [10]:
class EarlyStoppingCallback(TrainerCallback):
    def __init__(self, early_stopping_patience): # intialize the callback with number of epochs to wait
        self.early_stopping_patience = early_stopping_patience
        self.patience_counter = 0 # counter to keep track of epochs
        self.best_score = None # store the best evaluation score

    def on_evaluate(self, args, state, control, metrics, **kwargs):
        current_score = metrics.get("eval_loss") # monitoring 'eval_loss'

        if self.best_score is None or current_score < self.best_score: #compare the evaluation score and reset the patience counter
            self.best_score = current_score
            self.patience_counter = 0
        else:
            self.patience_counter += 1

        if self.patience_counter >= self.early_stopping_patience: # condition if the patient counter exceeds or equal than stop training 
            control.should_training_stop = True


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Define Training Arguments</p>

In [11]:
# setting the training arguments or parameters for training of the model
training_args = TrainingArguments( 
    per_device_train_batch_size=4, # batch size 4, couldnt tried it more than 4 as kernel was dying
    num_train_epochs=2, # number of epochs 2, tried with more than 2, but it was taking weeks so for the prototype just limited to 2 
    logging_steps=10, # logging after 10 steps to check losses
    save_steps=10000, # saving the model after 10000 steps 
    output_dir="./medical_gpt2_finetuned", # saved model directory
    overwrite_output_dir=True, # to overwrite the existing model in the output directory
    do_train=True, # training the model
)


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Initialize  Trainer</p>

In [12]:
# initialize the Trainer instance 
trainer = Trainer(
    model=model, # passing the model to be fine tuned
    args=training_args, # passing the training configurations
    data_collator=data_collator, # data batching and format
    train_dataset=train_dataset, # passing the training dataset
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)], # callback function for ealry stopping, with 2 epoch
    eval_dataset=val_dataset # passing the validation dataset
)


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Train Model</p>

In [None]:
trainer.train() # #training the model

***** Running training *****
  Num examples = 89231
  Num Epochs = 2
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 4
  Gradient Accumulation steps = 1
  Total optimization steps = 44616
  Number of trainable parameters = 354823168


Step,Training Loss
10,4.1474
20,4.0736
30,3.9043
40,3.7933
50,3.7366
60,3.6427
70,3.6735
80,3.6909
90,3.7034
100,3.6431


Saving model checkpoint to ./medical_gpt2_finetuned/checkpoint-10000
Configuration saved in ./medical_gpt2_finetuned/checkpoint-10000/config.json
Model weights saved in ./medical_gpt2_finetuned/checkpoint-10000/pytorch_model.bin
Saving model checkpoint to ./medical_gpt2_finetuned/checkpoint-20000
Configuration saved in ./medical_gpt2_finetuned/checkpoint-20000/config.json
Model weights saved in ./medical_gpt2_finetuned/checkpoint-20000/pytorch_model.bin


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Load the model from checkpoint</p>

<p style="font-size: 20px">From the last cell, model crash at step <b>29740</b> so there is a need to load the model from the checkpoint and continue training.</p>

In [13]:

model = GPT2LMHeadModel.from_pretrained("./medical_gpt2_finetuned/checkpoint-20000") # load the model from the checkpoint-20000
# setting the training arguments or parameters for training of the model
training_args = TrainingArguments( 
    per_device_train_batch_size=4, # batch size 4, couldnt tried it more than 4 as kernel was dying
    num_train_epochs=2, # number of epochs 2, tried with more than 2, but it was taking weeks so for the prototype just limited to 2 
    logging_steps=10, # logging after 10 steps to check losses
    save_steps=10000, # saving the model after 10000 steps 
    output_dir="./medical_gpt2_finetuned", # saved model directory
    overwrite_output_dir=True, # to overwrite the existing model in the output directory
    do_train=True, # training the model
)
# initialize the Trainer instance again
trainer = Trainer(
    model=model, # passing the model to be fine tuned
    args=training_args, # passing the training configurations
    data_collator=data_collator, # data batching and format
    train_dataset=train_dataset, # passing the training dataset
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)], # callback function for ealry stopping, with 2 epoch
    eval_dataset=val_dataset # passing the validation dataset
)

trainer.train(resume_from_checkpoint=True) # continue training again from the checkoint


loading configuration file ./medical_gpt2_finetuned/checkpoint-20000/config.json
Model config GPT2Config {
  "_name_or_path": "gpt2-medium",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.2,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.2,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transf

  0%|          | 0/7692 [00:00<?, ?it/s]

Step,Training Loss
30010,2.7568
30020,2.9876
30030,2.7252
30040,2.5813
30050,2.6907
30060,2.7386
30070,2.6432
30080,2.6212
30090,2.7591
30100,2.7986


Saving model checkpoint to ./medical_gpt2_finetuned/checkpoint-40000
Configuration saved in ./medical_gpt2_finetuned/checkpoint-40000/config.json
Model weights saved in ./medical_gpt2_finetuned/checkpoint-40000/pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)




TrainOutput(global_step=44616, training_loss=0.8986597944806506, metrics={'train_runtime': 45718.3347, 'train_samples_per_second': 3.904, 'train_steps_per_second': 0.976, 'total_flos': 4.143444583671398e+16, 'train_loss': 0.8986597944806506, 'epoch': 2.0})

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Save the model</p>

In [18]:
model.save_pretrained("./medical_gpt2_finetuned") # save the trained model in the directory
tokenizer.save_pretrained("./medical_gpt2_finetuned") # save the tokenizer in the directory

Configuration saved in ./medical_gpt2_finetuned/config.json
Model weights saved in ./medical_gpt2_finetuned/pytorch_model.bin
tokenizer config file saved in ./medical_gpt2_finetuned/tokenizer_config.json
Special tokens file saved in ./medical_gpt2_finetuned/special_tokens_map.json


('./medical_gpt2_finetuned/tokenizer_config.json',
 './medical_gpt2_finetuned/special_tokens_map.json',
 './medical_gpt2_finetuned/vocab.json',
 './medical_gpt2_finetuned/merges.txt',
 './medical_gpt2_finetuned/added_tokens.json')

<div style="width:100%;height:1px; background-color:black"></div>

<p ><center><u style="font-size: 28px; margin-top: 10px; font-weight: bold">Model Evaluation</u></center></p>

<p style="font-size: 20px">
Model evaluation is an important part of creating machine learning models. It's about testing how good the model is using data it hasn't seen during training, usually called test or validation data. We do this to see if the model's answers are right and understand any mistakes it might make.
</p

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Load the model</p>

In [13]:
model = GPT2LMHeadModel.from_pretrained("./medical_gpt2_finetuned")
tokenizer = GPT2Tokenizer.from_pretrained("./medical_gpt2_finetuned")


loading configuration file ./medical_gpt2_finetuned/config.json
Model config GPT2Config {
  "_name_or_path": "medical_gpt2_finetuned",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.2,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 1024,
  "n_head": 16,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 1024,
  "n_special": 0,
  "predict_special_tokens": true,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.2,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "torch_dtype": "float32",
  "transformers

<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold">Loss and Perplexity</p>

<p style="font-size: 20px">
Loss measures the difference between a model's predictions and the actual data. It helps in adjusting the model to make better predictions. By looking at the loss, we can see if the model is improving and make necessary changes if needed.
</p

<p style="font-size: 20px">
    Perplexity checks how good a model is at guessing the next word. For medical chatbots, a lower value means the bot can chat more smoothly and make sense.
</p

<p style="font-size: 20px; margin-top: 10px; font-weight: bold">ROUGE score</p>

<p style="font-size: 20px">
    The ROUGE score looks at how much the predicted text matches the reference text using different measures like precision, recall, and F1-score. It's particularly useful for tasks like summarization to see how much key information the model includes in its output. In the context of medical chatbot, ROUGE can help determine how closely the generated response matches a desired or reference answer, indicating the system's ability to provide accurate and relevant information.
</p>

In [14]:

def compute_metrics(eval_prediction: EvalPrediction):
    logits = torch.tensor(eval_prediction.predictions)  # these are raw logits
    labels = eval_prediction.label_ids
    labels = torch.tensor(labels, dtype=torch.long)  # convert to long tensor format

    
    criterion = nn.CrossEntropyLoss()  # intialize the cross-entropy loss
    loss = criterion(logits.view(-1, logits.shape[-1]), labels.view(-1))
    perplexity = torch.exp(loss)

   
    predictions = torch.argmax(logits, dim=-1)  # get the predicted token ids
    accuracy = accuracy_score(labels.view(-1).cpu().numpy(), predictions.view(-1).cpu().numpy()) # calculate accuracy


    pred_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in predictions]  # Decode predictions and labels to text
    label_texts = [tokenizer.decode(ids, skip_special_tokens=True) for ids in labels]

   
    rouge = load_metric("rouge") # load the rouge method
    rouge_scores = rouge.compute(predictions=pred_texts, references=label_texts) # calculate rouge score

    flattened_rouge = {
        "rouge-1/r": rouge_scores['rouge1'].mid.recall,
        "rouge-1/p": rouge_scores['rouge1'].mid.precision,
        "rouge-1/f": rouge_scores['rouge1'].mid.fmeasure,
        "rouge-2/r": rouge_scores['rouge2'].mid.recall,
        "rouge-2/p": rouge_scores['rouge2'].mid.precision,
        "rouge-2/f": rouge_scores['rouge2'].mid.fmeasure,
        "rouge-l/r": rouge_scores['rougeL'].mid.recall,
        "rouge-l/p": rouge_scores['rougeL'].mid.precision,
        "rouge-l/f": rouge_scores['rougeL'].mid.fmeasure,
    }

    return {
        "loss": loss.item(),
        "perplexity": perplexity.item(),
        "accuracy": accuracy,
        **flattened_rouge
    }


In [15]:
training_args = TrainingArguments(
    per_device_eval_batch_size=1,
    output_dir="./medical_gpt2_finetuned",
    do_train=False,  # passing false so it will be only used for evaluation
    do_eval=True,
)
subset_val_dataset = val_dataset[:100]

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    eval_dataset=subset_val_dataset
)

eval_results = trainer.evaluate()

print(eval_results)

PyTorch: setting up devices
The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
***** Running Evaluation *****
  Num examples = 100
  Batch size = 1




{'eval_loss': 10.379868507385254, 'eval_perplexity': 32204.7265625, 'eval_accuracy': 0.00953125, 'eval_rouge-1/r': 0.497471544679153, 'eval_rouge-1/p': 0.5265466190586023, 'eval_rouge-1/f': 0.5115013020298906, 'eval_rouge-2/r': 0.17695439009498778, 'eval_rouge-2/p': 0.18689878890315373, 'eval_rouge-2/f': 0.18173472982598352, 'eval_rouge-l/r': 0.3888662278959122, 'eval_rouge-l/p': 0.4111539644895145, 'eval_rouge-l/f': 0.39937917694833136, 'eval_runtime': 37.8654, 'eval_samples_per_second': 2.641, 'eval_steps_per_second': 2.641}


<p style="font-size: 23px; margin-top: 10px; font-weight: bold"><u>Saving Results to the dataframe</u></p>

<p style="font-size: 20px">
  I am saving the results in the dataframe one by one of each model so i can compare the results in the separate python file (medical_chatbot_eval_metrics.ipynb).
</p>

In [16]:
eval_results

{'eval_loss': 10.379868507385254,
 'eval_perplexity': 32204.7265625,
 'eval_accuracy': 0.00953125,
 'eval_rouge-1/r': 0.497471544679153,
 'eval_rouge-1/p': 0.5265466190586023,
 'eval_rouge-1/f': 0.5115013020298906,
 'eval_rouge-2/r': 0.17695439009498778,
 'eval_rouge-2/p': 0.18689878890315373,
 'eval_rouge-2/f': 0.18173472982598352,
 'eval_rouge-l/r': 0.3888662278959122,
 'eval_rouge-l/p': 0.4111539644895145,
 'eval_rouge-l/f': 0.39937917694833136,
 'eval_runtime': 37.8654,
 'eval_samples_per_second': 2.641,
 'eval_steps_per_second': 2.641}

In [17]:
eval_metrics_results = {
    'model_name': 'GPT-2 Medium',
    'loss': eval_results['eval_loss'],
    'perplexity': eval_results['eval_perplexity'],
    'accuracy': eval_results['eval_accuracy'],
    'rouge-1_r': eval_results['eval_rouge-1/r'],
    'rouge-1_p': eval_results['eval_rouge-1/p'],
    'rouge-1_f': eval_results['eval_rouge-1/f'],
    'rouge-2_r': eval_results['eval_rouge-2/r'],
    'rouge-2_p': eval_results['eval_rouge-2/p'],
    'rouge-2_f': eval_results['eval_rouge-2/f'],
    'rouge-l_r': eval_results['eval_rouge-l/r'],
    'rouge-l_p': eval_results['eval_rouge-l/p'],
    'rouge-l_f': eval_results['eval_rouge-l/f']
}


In [22]:
eval_metrics_results_dataframe = pd.read_csv('eval_metrics_results_dataframe.csv') # load the csv  
eval_metrics_results_dataframe

Unnamed: 0,model_name,loss,perplexity,accuracy,rouge-1_r,rouge-1_p,rouge-1_f,rouge-2_r,rouge-2_p,rouge-2_f,rouge-l_r,rouge-l_p,rouge-l_f
0,Encoder-Decoder LSTM,0.074382,1.05291,0.990507,0.985153,0.967791,0.976377,0.975947,0.946462,0.960935,0.985153,0.967791,0.976377


In [23]:
eval_metrics_results_dataframe = eval_metrics_results_dataframe.append(eval_metrics_results, ignore_index=True) # append the record in the dataframe
eval_metrics_results_dataframe.to_csv('eval_metrics_results_dataframe.csv', index=False) # save to the same file
eval_metrics_results_dataframe

Unnamed: 0,model_name,loss,perplexity,accuracy,rouge-1_r,rouge-1_p,rouge-1_f,rouge-2_r,rouge-2_p,rouge-2_f,rouge-l_r,rouge-l_p,rouge-l_f
0,Encoder-Decoder LSTM,0.074382,1.05291,0.990507,0.985153,0.967791,0.976377,0.975947,0.946462,0.960935,0.985153,0.967791,0.976377
1,GPT-2 Medium,10.379869,32204.726562,0.009531,0.497472,0.526547,0.511501,0.176954,0.186899,0.181735,0.388866,0.411154,0.399379


<div style="width:100%;height:1px; background-color:black"></div>

<p style="font-size: 23px; margin-top: 10px; font-weight: bold"><u>Answer to user queries by using the model</u></p>

In [28]:

nlp = spacy.load("en_core_web_sm") # load english model from the spacy

def correct_grammar(text): # function to correct the grammar
    tool = LanguageTool('en-US')
    matches = tool.check(text)
    return tool.correct(text)

def reduce_redundancy(text): # to remove duplicate sentences
    doc = nlp(text)
    sentences = list(doc.sents)
    to_remove = []

    for i in range(len(sentences) - 1):
        for j in range(i + 1, len(sentences)):
            if sentences[i].similarity(sentences[j]) > 0.8:  # setting the threshold
                to_remove.append(j)
    
    filtered_sentences = [str(sentences[i]) for i in range(len(sentences)) if i not in to_remove]
    return ' '.join(filtered_sentences)


In [29]:
def post_process(text): # calling above functions 
    refined_text = correct_grammar(text)
    refined_text = reduce_redundancy(refined_text)
    return refined_text


In [30]:
def generate_text(prompt, max_length=128, temperature=0.3): # defining a function, generate a response using the trained model
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    input_ids = input_ids.to(device)
    
    pad_token = tokenizer.pad_token_id or tokenizer.eos_token_id
    attention_mask = input_ids.ne(pad_token)

    output = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,  
        pad_token_id=tokenizer.eos_token_id,
        max_length=max_length,
        temperature=temperature,
        num_return_sequences=1
    )
    
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True) # decode response back to the text
    return generated_text

In [31]:
def chat_with_model(): # function to have a chat with the trained model
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Chatbot: Goodbye!")
            break
        elif user_input.lower() in ["hi", "hello", "hey", "hola"]:
            print("Chatbot: Hi, I hope you are fine. How can i help you?")
            continue  
        response = generate_text(user_input)
        processed_response = post_process(response)  # apply post-processing for grammatical mistakes
        print("Chatbot:", processed_response)

In [32]:
chat_with_model() # calling the chat function

You:  hi


Chatbot: Hi, I hope you are fine. How can i help you?


You:  what is flu?


  from ipykernel import kernelapp as app


Chatbot: What is flu?
Doctor: hi, thank you for posting your query. Flu is a viral illness caused by the influenza virus. It is a self limiting illness and usually resolves within a week. However, if the symptoms persist, it is better to consult a doctor and get evaluated. I hope this helps 
Patient: hi, I have a question about my daughter. She is 3 years old, and she has been having a lot of trouble with her stomach. She has been having a lot of gas, and she has been having a lot of diarrhea. She has been having a lot of gas and diarrhea for the past few days


You:  bye


Chatbot: Goodbye!
