Colab Link - https://colab.research.google.com/drive/1LSZjttnWx9LIuhPgXyNLQ_9JmGxVA6Xp?usp=sharing

## Installations and Imports

In [1]:
! pip -q install transformers

In [3]:
!ls

All5debate_docs.jsonl  sample_data


In [2]:
from transformers import AutoModelWithLMHead, AutoTokenizer
import torch
import os

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-small")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]



config.json:   0%|          | 0.00/641 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/351M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

In [4]:
"""
Fine-tuning the library models for language modeling on a text file (GPT, GPT-2, BERT, RoBERTa).
GPT and GPT-2 are fine-tuned using a causal language modeling (CLM) loss while BERT and RoBERTa are fine-tuned
using a masked language modeling (MLM) loss.
"""

import glob
import logging
import os
import pickle
import random
import re
import shutil
from typing import Dict, List, Tuple
import json

import pandas as pd
import numpy as np
import torch

from sklearn.model_selection import train_test_split

from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset, RandomSampler, SequentialSampler
from torch.utils.data.distributed import DistributedSampler
from tqdm.notebook import tqdm, trange

from pathlib import Path

from transformers import (
    MODEL_WITH_LM_HEAD_MAPPING,
    WEIGHTS_NAME,
    AdamW,
    AutoConfig,
    AutoModelWithLMHead,
    AutoTokenizer,
    PreTrainedModel,
    PreTrainedTokenizer,
    get_linear_schedule_with_warmup,
)


try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter

# Configs
logger = logging.getLogger(__name__)

MODEL_CONFIG_CLASSES = list(MODEL_WITH_LM_HEAD_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)

## Arguments

In [5]:
# Args to allow for easy convertion of python script to notebook
class Args():
    def __init__(self):
        self.output_dir = 'output-small-save'
        self.model_type = 'gpt2'
        self.model_name_or_path = 'microsoft/DialoGPT-small'
        self.config_name = 'microsoft/DialoGPT-small'
        self.tokenizer_name = 'microsoft/DialoGPT-small'
        self.cache_dir = 'cached'
        self.block_size = 512
        self.do_train = True
        self.do_eval = True
        self.evaluate_during_training = False
        self.per_gpu_train_batch_size = 4
        self.per_gpu_eval_batch_size = 4
        self.gradient_accumulation_steps = 1
        self.learning_rate = 5e-5
        self.weight_decay = 0.0
        self.adam_epsilon = 1e-8
        self.max_grad_norm = 1.0
        self.num_train_epochs = 3
        self.max_steps = -1
        self.warmup_steps = 0
        self.logging_steps = 1000
        self.save_steps = 3500
        self.save_total_limit = None
        self.eval_all_checkpoints = False
        self.no_cuda = False
        self.overwrite_output_dir = True
        self.overwrite_cache = True
        self.should_continue = False
        self.seed = 42
        self.local_rank = -1
        self.fp16 = False
        self.fp16_opt_level = 'O1'

args = Args()

## Preprocessing Data

In [None]:
import os

def check_file_permissions(file_path):
    if os.access(file_path, os.R_OK):
        print("File is readable")
    else:
        print("File is not readable")

# Specify the file path
file_path = '/content/All5debate_docs.jsonl'

# Check permissions
check_file_permissions(file_path)


File is readable


The max sequence length is 1024.

In [6]:
import pandas as pd
import json
from sklearn.model_selection import train_test_split

MAX_SEQUENCE_LENGTH = 1023  # Maximum sequence length allowed

def load_jsonl(file_path):
    with open(file_path, 'r') as file:
        data = [json.loads(line.strip()) for line in file if line.strip()]
    return data

def split_data(data, test_size=0.2, random_state=None):
    train, val = train_test_split(data, test_size=test_size, random_state=random_state)
    return train, val

# Load data from the JSON Lines file
data = load_jsonl("/content/All5debate_docs.jsonl")

# Reformat data for DataFrame with sequence length check
formatted_data = []
for obj in data:
    response = obj['prompt'][:MAX_SEQUENCE_LENGTH]  # Truncate if longer than 1023 tokens
    context = obj['completion'][:MAX_SEQUENCE_LENGTH]  # Truncate if longer than 1023 tokens
    formatted_data.append({'response': response, 'context': context})

# Split data into training and validation sets
train_data, val_data = split_data(formatted_data, test_size=0.2, random_state=42)

# Create DataFrames for training and validation
trn_df = pd.DataFrame(train_data)
val_df = pd.DataFrame(val_data)

# Display the first few rows of training and validation DataFrames
print("Training DataFrame:")
print(trn_df.head())
print("\nValidation DataFrame:")
print(val_df.head())

# Print the lengths of training and validation DataFrames
print("\nLength of Training DataFrame:", len(trn_df))
print("Length of Validation DataFrame:", len(val_df))


Training DataFrame:
                                            response  \
0  Zelensky has discussed the country's reforms a...   
1  The problems of energy and climate change have...   
2  wars have not been kind to American presidents...   
3  The Indigenous People of Biafra appeal to the ...   
4  many European policymakers worry that the U.S....   

                                             context  
0  Ukrainian President Volodymyr Zelensky has dis...  
1  The problems of energy and climate change have...  
2  Ever since World War II, wars have not been ki...  
3  The Indigenous People of Biafra (IPOB) under t...  
4  IS THE UNITED STATES NEGLECTING EUROPE? The au...  

Validation DataFrame:
                                            response  \
0  at the moment China can look relatively stable...   
1  China’s military modernization has increased t...   
2  until recently, macroeconomists have viewed ch...   
3  Do “good” decisions balance out “bad” decision...   
4  economi

In [7]:
trn_df.shape

(905, 2)

In [8]:
val_df.shape

(227, 2)

## Dataset Loader

In [9]:
def construct_conv(row, tokenizer, eos = True):
    flatten = lambda l: [item for sublist in l for item in sublist]
    conv = list(reversed([tokenizer.encode(x) + [tokenizer.eos_token_id] for x in row]))
    conv = flatten(conv)
    return conv

class ConversationDataset(Dataset):
    def __init__(self, tokenizer: PreTrainedTokenizer, args, df, block_size=512):

        block_size = block_size - (tokenizer.model_max_length - tokenizer.max_len_single_sentence)

        directory = args.cache_dir
        cached_features_file = os.path.join(
            directory, args.model_type + "_cached_lm_" + str(block_size)
        )

        if os.path.exists(cached_features_file) and not args.overwrite_cache:
            logger.info("Loading features from cached file %s", cached_features_file)
            with open(cached_features_file, "rb") as handle:
                self.examples = pickle.load(handle)
        else:
            logger.info("Creating features from dataset file at %s", directory)

            self.examples = []
            for _, row in df.iterrows():
                conv = construct_conv(row, tokenizer)
                self.examples.append(conv)

            logger.info("Saving features into cached file %s", cached_features_file)
            with open(cached_features_file, "wb") as handle:
                pickle.dump(self.examples, handle, protocol=pickle.HIGHEST_PROTOCOL)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, item):
        return torch.tensor(self.examples[item], dtype=torch.long)

## Caching and storing of Data/Checkpoints

In [10]:
# Cacheing and storing of data/checkpoints

def load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False):
    return ConversationDataset(tokenizer, args, df_val if evaluate else df_trn)


def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)


def _sorted_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> List[str]:
    ordering_and_checkpoint_path = []

    glob_checkpoints = glob.glob(os.path.join(args.output_dir, "{}-*".format(checkpoint_prefix)))

    for path in glob_checkpoints:
        if use_mtime:
            ordering_and_checkpoint_path.append((os.path.getmtime(path), path))
        else:
            regex_match = re.match(".*{}-([0-9]+)".format(checkpoint_prefix), path)
            if regex_match and regex_match.groups():
                ordering_and_checkpoint_path.append((int(regex_match.groups()[0]), path))

    checkpoints_sorted = sorted(ordering_and_checkpoint_path)
    checkpoints_sorted = [checkpoint[1] for checkpoint in checkpoints_sorted]
    return checkpoints_sorted


def _rotate_checkpoints(args, checkpoint_prefix="checkpoint", use_mtime=False) -> None:
    if not args.save_total_limit:
        return
    if args.save_total_limit <= 0:
        return

    # Check if we should delete older checkpoint(s)
    checkpoints_sorted = _sorted_checkpoints(args, checkpoint_prefix, use_mtime)
    if len(checkpoints_sorted) <= args.save_total_limit:
        return

    number_of_checkpoints_to_delete = max(0, len(checkpoints_sorted) - args.save_total_limit)
    checkpoints_to_be_deleted = checkpoints_sorted[:number_of_checkpoints_to_delete]
    for checkpoint in checkpoints_to_be_deleted:
        logger.info("Deleting older checkpoint [{}] due to args.save_total_limit".format(checkpoint))
        shutil.rmtree(checkpoint)

In [11]:
trn_df

Unnamed: 0,response,context
0,Zelensky has discussed the country's reforms a...,Ukrainian President Volodymyr Zelensky has dis...
1,The problems of energy and climate change have...,The problems of energy and climate change have...
2,wars have not been kind to American presidents...,"Ever since World War II, wars have not been ki..."
3,The Indigenous People of Biafra appeal to the ...,The Indigenous People of Biafra (IPOB) under t...
4,many European policymakers worry that the U.S....,IS THE UNITED STATES NEGLECTING EUROPE? The au...
...,...,...
900,China’s projects are related to guaranteeing f...,Many of China’s projects are related to guaran...
901,influential voices are calling for what amount...,There can be no disputing that the murder of K...
902,it’s worth considering whether some important ...,The conventional truth that US-Israeli relatio...
903,Thought of Canada being the region where the s...,Thought of Canada being the region where the s...


## Defining Training and Evaluating functions

### Function to train the Model

In [12]:
def train(args, train_dataset, model: PreTrainedModel, tokenizer: PreTrainedTokenizer) -> Tuple[int, float]:
    """ Train the model """
    if args.local_rank in [-1, 0]:
        tb_writer = SummaryWriter()

    args.train_batch_size = args.per_gpu_train_batch_size * max(1, args.n_gpu)

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    train_sampler = RandomSampler(train_dataset) if args.local_rank == -1 else DistributedSampler(train_dataset)
    train_dataloader = DataLoader(
        train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, collate_fn=collate, drop_last = True
    )

    if args.max_steps > 0:
        t_total = args.max_steps
        args.num_train_epochs = args.max_steps // (len(train_dataloader) // args.gradient_accumulation_steps) + 1
    else:
        t_total = len(train_dataloader) // args.gradient_accumulation_steps * args.num_train_epochs

    model = model.module if hasattr(model, "module") else model  # Take care of distributed/parallel training
    model.resize_token_embeddings(len(tokenizer))
    # add_special_tokens_(model, tokenizer)


    # Prepare optimizer and schedule (linear warmup and decay)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": args.weight_decay,
        },
        {"params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)], "weight_decay": 0.0},
    ]
    optimizer = AdamW(optimizer_grouped_parameters, lr=args.learning_rate, eps=args.adam_epsilon)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=args.warmup_steps, num_training_steps=t_total
    )

    # Check if saved optimizer or scheduler states exist
    if (
        args.model_name_or_path
        and os.path.isfile(os.path.join(args.model_name_or_path, "optimizer.pt"))
        and os.path.isfile(os.path.join(args.model_name_or_path, "scheduler.pt"))
    ):
        # Load in optimizer and scheduler states
        optimizer.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "optimizer.pt")))
        scheduler.load_state_dict(torch.load(os.path.join(args.model_name_or_path, "scheduler.pt")))

    if args.fp16:
        try:
            from apex import amp
        except ImportError:
            raise ImportError("Please install apex from https://www.github.com/nvidia/apex to use fp16 training.")
        model, optimizer = amp.initialize(model, optimizer, opt_level=args.fp16_opt_level)

    # multi-gpu training (should be after apex fp16 initialization)
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Distributed training (should be after apex fp16 initialization)
    if args.local_rank != -1:
        model = torch.nn.parallel.DistributedDataParallel(
            model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True
        )

    # Train!
    logger.info("***** Running training *****")
    logger.info("  Num examples = %d", len(train_dataset))
    logger.info("  Num Epochs = %d", args.num_train_epochs)
    logger.info("  Instantaneous batch size per GPU = %d", args.per_gpu_train_batch_size)
    logger.info(
        "  Total train batch size (w. parallel, distributed & accumulation) = %d",
        args.train_batch_size
        * args.gradient_accumulation_steps
        * (torch.distributed.get_world_size() if args.local_rank != -1 else 1),
    )
    logger.info("  Gradient Accumulation steps = %d", args.gradient_accumulation_steps)
    logger.info("  Total optimization steps = %d", t_total)

    global_step = 0
    epochs_trained = 0
    steps_trained_in_current_epoch = 0
    # Check if continuing training from a checkpoint
    if args.model_name_or_path and os.path.exists(args.model_name_or_path):
        try:
            # set global_step to gobal_step of last saved checkpoint from model path
            checkpoint_suffix = args.model_name_or_path.split("-")[-1].split("/")[0]
            global_step = int(checkpoint_suffix)
            epochs_trained = global_step // (len(train_dataloader) // args.gradient_accumulation_steps)
            steps_trained_in_current_epoch = global_step % (len(train_dataloader) // args.gradient_accumulation_steps)

            logger.info("  Continuing training from checkpoint, will skip to saved global_step")
            logger.info("  Continuing training from epoch %d", epochs_trained)
            logger.info("  Continuing training from global step %d", global_step)
            logger.info("  Will skip the first %d steps in the first epoch", steps_trained_in_current_epoch)
        except ValueError:
            logger.info("  Starting fine-tuning.")

    tr_loss, logging_loss = 0.0, 0.0

    model.zero_grad()
    train_iterator = trange(
        epochs_trained, int(args.num_train_epochs), desc="Epoch", disable=args.local_rank not in [-1, 0]
    )
    set_seed(args)  # Added here for reproducibility
    for _ in train_iterator:
        epoch_iterator = tqdm(train_dataloader, desc="Iteration", disable=args.local_rank not in [-1, 0])
        for step, batch in enumerate(epoch_iterator):
            # Skip past any already trained steps if resuming training
            if steps_trained_in_current_epoch > 0:
                steps_trained_in_current_epoch -= 1
                continue

            inputs, labels = (batch, batch)
            if inputs.shape[1] > 1024: continue
            inputs = inputs.to(args.device)
            labels = labels.to(args.device)
            model.train()
            outputs = model(inputs, labels=labels)
            loss = outputs[0]  # model outputs are always tuple in transformers (see doc)

            if args.n_gpu > 1:
                loss = loss.mean()  # mean() to average on multi-gpu parallel training
            if args.gradient_accumulation_steps > 1:
                loss = loss / args.gradient_accumulation_steps

            if args.fp16:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()

            tr_loss += loss.item()
            if (step + 1) % args.gradient_accumulation_steps == 0:
                if args.fp16:
                    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), args.max_grad_norm)
                else:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
                optimizer.step()
                scheduler.step()  # Update learning rate schedule
                model.zero_grad()
                global_step += 1

                if args.local_rank in [-1, 0] and args.logging_steps > 0 and global_step % args.logging_steps == 0:
                    # Log metrics
                    if (
                        args.local_rank == -1 and args.evaluate_during_training
                    ):  # Only evaluate when single GPU otherwise metrics may not average well
                        results = evaluate(args, model, tokenizer)
                        for key, value in results.items():
                            tb_writer.add_scalar("eval_{}".format(key), value, global_step)
                    tb_writer.add_scalar("lr", scheduler.get_lr()[0], global_step)
                    tb_writer.add_scalar("loss", (tr_loss - logging_loss) / args.logging_steps, global_step)
                    logging_loss = tr_loss

                if args.local_rank in [-1, 0] and args.save_steps > 0 and global_step % args.save_steps == 0:
                    checkpoint_prefix = "checkpoint"
                    # Save model checkpoint
                    output_dir = os.path.join(args.output_dir, "{}-{}".format(checkpoint_prefix, global_step))
                    os.makedirs(output_dir, exist_ok=True)
                    model_to_save = (
                        model.module if hasattr(model, "module") else model
                    )  # Take care of distributed/parallel training
                    model_to_save.save_pretrained(output_dir)
                    tokenizer.save_pretrained(output_dir)

                    torch.save(args, os.path.join(output_dir, "training_args.bin"))
                    logger.info("Saving model checkpoint to %s", output_dir)

                    _rotate_checkpoints(args, checkpoint_prefix)

                    torch.save(optimizer.state_dict(), os.path.join(output_dir, "optimizer.pt"))
                    torch.save(scheduler.state_dict(), os.path.join(output_dir, "scheduler.pt"))
                    logger.info("Saving optimizer and scheduler states to %s", output_dir)

            if args.max_steps > 0 and global_step > args.max_steps:
                epoch_iterator.close()
                break
        if args.max_steps > 0 and global_step > args.max_steps:
            train_iterator.close()
            break

    if args.local_rank in [-1, 0]:
        tb_writer.close()
    return global_step, tr_loss / global_step



### Function to evaluate the Model

In [13]:
# Function to evaluate the model

def evaluate(args, model: PreTrainedModel, tokenizer: PreTrainedTokenizer, df_trn, df_val, prefix="") -> Dict:
    # Loop to handle MNLI double evaluation (matched, mis-matched)
    eval_output_dir = args.output_dir

    eval_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=True)
    os.makedirs(eval_output_dir, exist_ok=True)
    args.eval_batch_size = args.per_gpu_eval_batch_size * max(1, args.n_gpu)
    # Note that DistributedSampler samples randomly

    def collate(examples: List[torch.Tensor]):
        if tokenizer._pad_token is None:
            return pad_sequence(examples, batch_first=True)
        return pad_sequence(examples, batch_first=True, padding_value=tokenizer.pad_token_id)

    eval_sampler = SequentialSampler(eval_dataset)
    eval_dataloader = DataLoader(
        eval_dataset, sampler=eval_sampler, batch_size=args.eval_batch_size, collate_fn=collate, drop_last = True
    )

    # multi-gpu evaluate
    if args.n_gpu > 1:
        model = torch.nn.DataParallel(model)

    # Eval!
    logger.info("***** Running evaluation {} *****".format(prefix))
    logger.info("  Num examples = %d", len(eval_dataset))
    logger.info("  Batch size = %d", args.eval_batch_size)
    eval_loss = 0.0
    nb_eval_steps = 0
    model.eval()

    for batch in tqdm(eval_dataloader, desc="Evaluating"):
        inputs, labels = (batch, batch)
        inputs = inputs.to(args.device)
        labels = labels.to(args.device)

        with torch.no_grad():
            outputs = model(inputs, labels=labels)
            lm_loss = outputs[0]
            eval_loss += lm_loss.mean().item()
        nb_eval_steps += 1

    eval_loss = eval_loss / nb_eval_steps
    perplexity = torch.exp(torch.tensor(eval_loss))

    result = {"perplexity": perplexity}

    output_eval_file = os.path.join(eval_output_dir, prefix, "eval_results.txt")
    with open(output_eval_file, "w") as writer:
        logger.info("***** Eval results {} *****".format(prefix))
        for key in sorted(result.keys()):
            logger.info("  %s = %s", key, str(result[key]))
            writer.write("%s = %s\n" % (key, str(result[key])))

    return result

## Running the Main Script

In [14]:
def main(df_trn, df_val):
    args = Args()

    if args.should_continue:
        sorted_checkpoints = _sorted_checkpoints(args)
        if len(sorted_checkpoints) == 0:
            raise ValueError("Used --should_continue but no checkpoint was found in --output_dir.")
        else:
            args.model_name_or_path = sorted_checkpoints[-1]

    if (
        os.path.exists(args.output_dir)
        and os.listdir(args.output_dir)
        and args.do_train
        and not args.overwrite_output_dir
        and not args.should_continue
    ):
        raise ValueError(
            "Output directory ({}) already exists and is not empty. Use --overwrite_output_dir to overcome.".format(
                args.output_dir
            )
        )

    # Setup CUDA, GPU & distributed training
    device = torch.device("cuda")
    args.n_gpu = torch.cuda.device_count()
    args.device = device

    # Setup logging
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s -   %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO if args.local_rank in [-1, 0] else logging.WARN,
    )
    logger.warning(
        "Process rank: %s, device: %s, n_gpu: %s, distributed training: %s, 16-bits training: %s",
        args.local_rank,
        device,
        args.n_gpu,
        bool(args.local_rank != -1),
        args.fp16,
    )

    # Set seed
    set_seed(args)

    config = AutoConfig.from_pretrained(args.config_name, cache_dir=args.cache_dir)
    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_name, cache_dir=args.cache_dir)
    model = AutoModelWithLMHead.from_pretrained(
        args.model_name_or_path,
        from_tf=False,
        config=config,
        cache_dir=args.cache_dir,
    )
    model.to(args.device)

    logger.info("Training/evaluation parameters %s", args)

    # Training
    if args.do_train:
        train_dataset = load_and_cache_examples(args, tokenizer, df_trn, df_val, evaluate=False)

        global_step, tr_loss = train(args, train_dataset, model, tokenizer)
        logger.info(" global_step = %s, average loss = %s", global_step, tr_loss)

    # Saving best-practices: if you use save_pretrained for the model and tokenizer, you can reload them using from_pretrained()
    if args.do_train:
        # Create output directory if needed
        os.makedirs(args.output_dir, exist_ok=True)

        logger.info("Saving model checkpoint to %s", args.output_dir)
        # Save a trained model, configuration and tokenizer using `save_pretrained()`.
        # They can then be reloaded using `from_pretrained()`
        model_to_save = (
            model.module if hasattr(model, "module") else model
        )  # Take care of distributed/parallel training
        model_to_save.save_pretrained(args.output_dir)
        tokenizer.save_pretrained(args.output_dir)

        # Good practice: save your training arguments together with the trained model
        torch.save(args, os.path.join(args.output_dir, "training_args.bin"))

        # Load a trained model and vocabulary that you have fine-tuned
        model = AutoModelWithLMHead.from_pretrained(args.output_dir)
        tokenizer = AutoTokenizer.from_pretrained(args.output_dir)
        model.to(args.device)

    # Evaluation
    results = {}
    if args.do_eval and args.local_rank in [-1, 0]:
        checkpoints = [args.output_dir]
        if args.eval_all_checkpoints:
            checkpoints = list(
                os.path.dirname(c) for c in sorted(glob.glob(args.output_dir + "/**/" + WEIGHTS_NAME, recursive=True))
            )
            logging.getLogger("transformers.modeling_utils").setLevel(logging.WARN)  # Reduce logging
        logger.info("Evaluate the following checkpoints: %s", checkpoints)
        for checkpoint in checkpoints:
            global_step = checkpoint.split("-")[-1] if len(checkpoints) > 1 else ""
            prefix = checkpoint.split("/")[-1] if checkpoint.find("checkpoint") != -1 else ""

            model = AutoModelWithLMHead.from_pretrained(checkpoint)
            model.to(args.device)
            result = evaluate(args, model, tokenizer, df_trn, df_val, prefix=prefix)
            result = dict((k + "_{}".format(global_step), v) for k, v in result.items())
            results.update(result)

    return results

In [15]:
main(trn_df, val_df)



config.json:   0%|          | 0.00/641 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]



model.safetensors:   0%|          | 0.00/351M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]



Epoch:   0%|          | 0/3 [00:00<?, ?it/s]

Iteration:   0%|          | 0/226 [00:00<?, ?it/s]

Iteration:   0%|          | 0/226 [00:00<?, ?it/s]

Iteration:   0%|          | 0/226 [00:00<?, ?it/s]



Evaluating:   0%|          | 0/56 [00:00<?, ?it/s]

{'perplexity_': tensor(15.2537)}

## Generate Test Results

In [None]:
f_test = open("test_data.json")
test_data = json.load(f_test)
f_test.close()

test_query = []
test_response = []

for i in range(len(test_data)):
  test_response.append(test_data[i][1])
  test_query.append(test_data[i][0])

print(len(test_response))
print(len(test_query))

In [18]:
import torch
from transformers import AutoModelWithLMHead, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_path = 'output-small-save'
tokenizer_path = 'microsoft/DialoGPT-small'
model = AutoModelWithLMHead.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Prompt
prompt = "Tell me about Russia"

# Tokenize the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate text based on the tokenized prompt using the model
output = model.generate(input_ids, max_length=100, num_return_sequences=1, early_stopping=True)

# Decode the generated token IDs into human-readable text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print("Generated Text:", generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text: Tell me about Russia’s military capabilities. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a Russian military presence in the Middle East. What are the consequences of a


In [22]:
import torch
from transformers import AutoModelWithLMHead, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_path = 'output-small-save'
tokenizer_path = 'microsoft/DialoGPT-small'
model = AutoModelWithLMHead.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Set the pad token ID
model.config.pad_token_id = tokenizer.pad_token_id

# Prompt
prompt = "Tell me about Russia"

# Tokenize the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Generate text based on the tokenized prompt using the model
output = model.generate(
      pad_token_id=tokenizer.eos_token_id,
      no_repeat_ngram_size=3,
      do_sample=True,
      top_k=10,
      top_p=0.7,
      temperature = 0.6
  )

# Decode the generated token IDs into human-readable text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

print("Generated Text:", generated_text)


A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Generated Text: .


test_chatbot is not our data.

In [None]:
test_chatbot = []

for i in range(len(test_query)):
  tokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')
  model = AutoModelWithLMHead.from_pretrained('output-small-save')
  # append the new user input tokens to the chat history
  bot_input_ids = tokenizer.encode(test_query[i] + tokenizer.eos_token, return_tensors='pt')
  print("Patient: {} \n".format(test_query[i]))
  print("Reference:  {} \n".format(test_response[i]))


  # generated a response while limiting the total chat history to 1000 tokens,
  chat_history_ids = model.generate(
      bot_input_ids, max_length=100,
      pad_token_id=tokenizer.eos_token_id,
      no_repeat_ngram_size=3,
      do_sample=True,
      top_k=10,
      top_p=0.7,
      temperature = 0.8
  )

  # pretty print last ouput tokens from bot
  print("Predict: {} \n\n".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
  test_chatbot.append(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True))

print(len(test_chatbot))

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have all the symptoms except fever, I went to Medicross and Dr said I can get tested if I want to I'm not sure if I should. She gave me antibiotics Klacid XL 500mg, she said I can take it if I feel worse I'm worried it will make immune system bad? 

Reference:  Antibiotic   I don't recommend antibiotics for a simple viral upper respiratory tract infection unless examination revealed signs of acute bronchitis or sinusitis.   They are not effective for viral infections like coVid 19 with no bacterial lung involvement either. If you've been exposed to someone with coVid 19 or or if you or someone you were exposed to travelled to a region where it was endemic, get tested  Would you like to video or text chat with me? 

Predict: If you have a fever, or if you have diarrhea, you are fine. If you have fever, you should be fine. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have pain/discomfort in my lungs. I don't experience simultaneous on both lungs and it not always at the hame position. I don't have a head nor do I have high temperature. I sneeze and cough maybe once a day. Do I have corona, should I get tested? 

Reference:  Exam needed. You are assuming that the discomfort is coming from both lungs, but pain of pleurisy or pneumonia wouldn't come and go In different sites. I doubt your symptoms are lung related, but fever, chills, sweats, worse cough, shortness of breath, wheezing, warrant testing.. If you've been exposed to someone with covid, or if you or someone you were exposed to  travelled to an affected region, get tested. 

Predict: No. COVID-19 is not contagious. You can get tested if you have a fever or if you are coughing or sneezing, but you will need to get tested 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I travelled to Mauritius and do not have symptoms. Should I get tested for covid19? 

Reference:  Self Quarantine . Suggest Self Quarantine for 14 days . If symptomatic , contact your Primary Care Physician or use this platform. 

Predict: Yes.    You can get tested.  If you have symptoms, you should get tested and be tested. If you do not get tested, you are not a suspect suspect.  You should be fine.  Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have a dry cough and a sore throat. I do not have a fever and a slight headache (could be I need more water) I am 4 months pregnant and not sure If I should be tested for Corona? 

Reference:  Socially isolate. get pleanty of fluids and rest and avoid others call dr if fever or difficulty breathing. 

Predict: Cough and sore throat is normal and normal. If you are in the right area and not in contact with anyone who has a fever, you should be alright. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I went to the Dr for a cough and fever on march 5. They diagnosed me with bronchitis. I was not doing any better, so I returned to the Dr on march 15. They sent me to the ER to get tested for covid. At the er they ran a number of tests including a covid t. 

Reference:  Question   You have not posted a question.  Would you like to video or text chat with me? 

Predict: Hello and Welcome to Healthcare Magic.Welcome to the Magic Magic Magic.Ask a Magic Magicologist and get a test. Magic Magic is the Magic. Magic is 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: What’s worse with the coronavirus, type 1 or type 2 diabetes? 

Reference:  Diabetes. My answer would be controlled or not. Type 1 or 2 if managed well and treated can still maintain good immunity. If either are out of control then infections of any type more likely. . 

Predict: It is very unlikely to have the same symptoms as diabetes. It depends on your severity of diabetes. If you are having a high risk of diabetes, then the coronovirus is more likely to have your symptoms. If it is, then you are more likely. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Just found out that my MD tested positive for Covid-19. I had direct interaction with him. He has no symptoms, he was tested because he traveled in February to Switzerland. Should I be tested. I’m 67, or wait to see if I develop any symptoms.? 

Reference:  Per RSA NICD: Self-isolate for 14 days.Seek testing if sick.Call 0800 029 999 for test information.Seek consultation if sick, preferably by phone or telemedicine.Seek care right away if difficulty breathing, other severe symptoms.Good luck, from the United States! 

Predict: No. You should definitely be tested for Covide 19.    You will be tested by a doctor.  You can be tested and have access to testing kits.  If you do not have 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Hi. My children and husband were at the doctor yesterday and he has now informed us he has tested positive for covid19 . He has come home to my 2 babies and mom. How long before we can go for a test? 

Reference:  Incubation period   2-14 days(time from exposure to time of getting illness).  I am unclear if the doctor is positive or your husband is?You should isolate yourselves at home for 14 days.If your husband is positive,he should be in his own room and own bathroom,if possible.Testing only recommended if there are symptoms(fever or cough or shortness of breath)and then you should be tested.Good hand washing/no touching face  Would you like to video or text chat with me? 

Predict: Covid19 is not contagious. The test is done at home. The tests are done at the hospital. The testing is done by a patient who has been tested. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: We went to bali via singapore, on my second day in bali (17 March) I started feeling a bit sick? 

Reference:  Monitor temp. Most cases of COVID-19 are associated with symptoms, but not all. Your symptoms may be related to coronavirus infection or another infection. I would recommend that you be tested for Coronavirus and then stay inside, avoiding contact with others while awaiting your test results. Use good hand washing, disinfect surfaces and drink fluids. 

Predict: Hello and Welcome to the sub, we have reviewed your query and will be happy to assist you in resolving your query. Please contact us via chat. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Is covering mouth and nose by mask sufficient to avoid corona exposure? Can It not be transmitted through eyes and ears? 

Reference:  Eyes, nose, mouth. Not ears.  See https://www.healthtap.com/blog/covid-19-care-guidelines/preventing-covid-19. 

Predict: Yes, it can be transmitted via the nose and mouth.    Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have scratchy throat, phlegm, loss of appetite, fatigue. I am pharmacist in research facility. My son had 3rd contact. He has flu symptoms. He was quarantined. We are waiting for the results for his contact. Do I need to test myself? 

Reference:  Cluster cases   I would recommend staying at home, rest, drink fluids and monitoring your temperature.   You may have COVID-19 or another illness. If you are otherwise healthy, await test results. Keep utensils, linens separate and washed. Use good hand washing and disinfect surfaces. If the test is positive, please have screening of all family members in your home. Avoid contact with others.   Would you like to video or text chat with me? 

Predict: Yes. It is possible to get a test for pneumonia. It may be possible to test for covid-19. If you are in a high risk situation, you should have been tested for Cov 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Can animal byproducts spread coronavirus? Do I have to worry about milk, eggs, or fruits? 

Reference:  No   we have no evidence of any concern with such spread.   It may be worth wiping all packages that come i nto the house with lysol wipes   Would you like to video or text chat with me? 

Predict: No, you are not. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Why are so many people with diabetes dying from the coronavirus? It’s really frightening me. 

Reference:  Immunocompromised    body is busy dealing with diabetes, not as much resiliancy remaining to fight viral symptoms, and subsequent opportunistic infections  Would you like to video or text chat with me? 

Predict: No. There are no cases of people with the virus in the US, and there are no coronaviruses in the USA, so if you are in contact with someone with the coronovirus, you are at risk of getting coronaviral disease. So, you should be alright. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I'm just a bit concerned as I have a sore throat and coughing a little bit as a result of my sore throat. My upper body is also aching a bit. Is this a cause for concern? 

Reference:  Possible virus. This is likely a viral infection. If you have a fever and/or the cough or sore throat get worse, you should be examined by your doctor. If you've been exposed to a particular illness, such as strep or the flu or coronavirus, you may need to be tested.For now, rest, drink fluids, take acetaminophen or ibuprofen as needed if you have no medical contraindication. 

Predict: I have been in contact with someone who has been in touch with a person with a fever, and they are experiencing a severe case of the virus. I have no symptoms, so I am not sure if I can get tested. I am also in contact right now with someone with a 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Will Lysol wipes kill coronavirus? How long can coronavirus live on household surfaces? 

Reference:  Several days   Lysol should  be good.    Seehttps://www.wired.com/story/coronavirus-disinfectant-cleaning-guide/  Would you like to video or text chat with me? 

Predict: No, you can not use any of these products. You can use any one of these but not all of these will be safe. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Can I catch coronavirus on public transportation from someone who has it? How long do germs on public transportation last? 

Reference:  Coronavirus lives   New coronavirus may live on surfaces for a few hours or a few days depending on the type of surface, temperature or humidity.   One report indicated that new coronavirus can survive on hard surfaces for up to 72 hours and on cardboard for up to 24 hours.   Would you like to video or text chat with me? 

Predict: No. It is a virus. The virus is spread through public transportation. If you are in the US you are not going to get it. If your travel is short then you are at risk. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Is it necessary to get tested for the corona virus if you arrive back in South Africa after visiting Zimbabwe and Victoria falls as part of a group? 

Reference:  It depends.   If the visited place has a high rate of coronavirus infections to take some precautions is important such as if you develop fever or cough or any other coronavirus infection symptoms.   To be tested is desired but will you be able to get tested? In some places testing kits are reserved to very sick and hospitalized patient. Self quarantine in cooperation with your doctor is the way to go.  Would you like to video or text chat with me? 

Predict: It is not necessary to stay at home, stay at your local hospital and stay in contact with someone who has been in contact for over a week, or stay at a local hospital that has a high rate of infection and has been treated with antibiotics. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Hi. This COVID-19 outbreak is scary. I got screened this week and it was negative. But prior to screening I had a week of continuous dry coughs and also throat was sore. I've put myself in a quarantine. What next? Do I still need to screen again? 

Reference:  Screening test if. If you develop symptoms of dry cough, fever and/or shortness of breath, consider re testing since the test may convert to a positive test if infected. Other causes should be considered including an allergic reaction. Try one dose of an antihistamine and gauge improvement. Stay at home for three days if there is no fever. . 

Predict: Not really. It is not a virus. The virus is a virus that causes you to cough, which is not transmitted through the air. So if you have a fever or a cough then you should 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Is it true (or a myth) that coronavirus will not survive hotter climates. As summer approaches, is it expected to reduce in tropical summers at least. 

Reference:  Coronavirus.    Yes, according to previous knowledge, most virus including coronavirus likely will not survive hotter climates.   I would expect or hope this will die down in summer time. Nevertheless, this is new coronavirus, let's wait and see. I also think that with strong human intervention, the virus may reduce or be eliminated before summer.   Would you like to video or text chat with me? 

Predict: No. The coronaviruses are not immune to COVID.    The coronas virus is not effective against COVID-19.  Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I heard ACE inhibitors might exacerbate coronavirus. I also heard NSAIDs might do the same. Does this mean any common meds could have an unexpected effect during coronavirus? 

Reference:  Coronavirus and ACEI.   Recent publications including the one from the lancet showed COVID-19 binds to ACE-2 receptor on target cells for entrance.   The expression of ACE-2 receptor is substantially increased in patients with diabetes who were treated with ACE inhibitors or ARB. NSAIDS has the similar effect. All medications can have some unexpected side effects.    Would you like to video or text chat with me? 

Predict: No. The coronaviruses are very different. The common med is the NSAIDs. The most common med in the US is the flu vaccine. The other common med are the flu and coronaviral vaccines. The US is not known for its ability to cure coronavitis. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have had mild chest pain for over a week. it now seems more persistent and pronounced. I don't have shortness of breath or any other COVID-19 symptoms, except some fatigue. I have been traveling a lot in high risk areas. Should I get tested? 

Reference:  Yes   I would advise screening due to your exposure.   Fever is very commonly associated with COVID-19. Stay at home, rest, drink fluids and monitor your temperature. Arrange the testing which also  may include a chest image with your PCP.  Since your have been traveling, a pulmonary embolism is another possible cause of your shortness of breath.   Would you like to video or text chat with me? 

Predict: No, but you should definitely be tested for COVID19. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Ons het teruggekom vanaf oostenryk op 8 Maart..reeds 13 dae gelede. Het tans seer keel maar ook nasale drip. wat staan my te doen? 

Reference:  COVID19   even though your have no fever or shortness of breath, your travel history fits criteria to be tested.   I suggest you get tested for COVID19 - speak with your GP to advise further, self isolate in the meanwhile  Would you like to video or text chat with me? 

Predict: Befor    Nederlanden   Wir   m  N  erd   J   n  en  J tt   s   M  t 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have shortness of breath and are worried, I don’t have a cough or sore throat, so they will not test me, should I do a private test? 

Reference:  Shortness of breath. Covid testing is hard to come by. At 37 you are in a lower risk group.  And I doubt you could get a test. Keep isolated for two weeks. I know we have been ordering tests through HealthTap, but that does not lead to getting the test done, due to shortages. (3/21/20). 

Predict: If you have short breath and cough then yes. If you have a sore throat then no, but if you have cough then no. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: How do you get HPV on the cervix? Is it from skin to skin contact during intercourse from a partner’s genital touching the cervix? Thank you. 

Reference:  Basically the HPV virus is transferred within fluids generated during intercourse.   A mechanical skin contact appears to be part of the process but defining the exact amount of contact required is difficult. The genital strains of HPV are specific to those locations. There are other HPV strains that cause things like skin warts but these are different.  Would you like to video or text chat with me? 

Predict: Virus. It is not a virus. You can get it through a virus and a person who has it. It does not spread through contact. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: My 2year old Son has had a dry cough and a runny nose since thursday . He's not talking yet so it is difficult to tell if he has any trouble breathing so I'm concerned. He does not have temperature ? 

Reference:  Observatn is enough.   By looking at your child you'll know if he is having breathing difficulties.   Cough with a runny nose is common in children. You should be concerned only if he is refusing to play and eat, or if he develops breathing difficulties. Having a fever doesn't mean he has the Coronavirus. He is more likely to be infected with any of the common infections in your area of Gauteng.  Would you like to video or text chat with me? 

Predict: No fever or cough. You should be fine. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Can I test?No known contact with Corona  but been in contact with with Congo dr friend 7march. She is still healthy now. I had Slight Sore throat wed 11 &12 March . phlegm and red eyes. Temp seemed ok. Got thermometer .ok. 19 felt better but coughing. 

Reference:  Per RSA NICD:   Seek care right away if difficulty breathing, other severe symptoms.  Seek testing if sick, PLUS known COVID-19 contact, or travel to high risk country, or been in facility treating people with COVID-19, or severe pneumonia of unknown cause.Call 0800 029 999 for test. If sick or possibly exposed (not both), stay home.Seek consultation if sick.If known exposure & not sick, self-isolate 14 days.  Would you like to video or text chat with me? 

Predict: Corona    Your symptoms are not normal.   Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I’m so Scared that I will die if I get the coronavirus, due to type1 diabetes? 

Reference:  Coronavirus. Stay calm. Most people with coronavirus infection only get mild respiratory symptoms, especially among younger patients. Diabetes and use of ACEI or ARB may increase your risk of infection. Practicing common sense of prevention is critical during this outbreak crisis. Avoid risk exposure, wash hand, wear mask as needed. Get tested if you develop symptoms. Effect drugs are coming. . 

Predict: It's not a virus. It is a virus that is causing you to die if you are not tested for it. If you are tested for the virus then you should not worry about it. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: A 2 year complaining that his head is sore. We gave him panado in the morning and at midday. He can't sleep he keeps waking up crying its painful. Should I take him to a Dr or hospital? 

Reference:  Fever?   Start by call Pediatrician ASAP.   if high fe ver or projectie vomiting take to ER .  take to ER  if inconsolable and cannot get pedicatrician IF mild respiratory or GI symptoms likely virus and want to make sure adequately hydrated and may just watch, isolate from contact , and give chicken soup . work on soothing and disstraction Keep in touch with pediatrician.   Would you like to video or text chat with me? 

Predict: No. You have to take him back to the hospital. He has a fever and has a sore throat. If you are not having trouble breathing then you should definitely call your doctor. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have covid 19 symptoms? 

Reference:  Can talk with doctor   Doctors can give you the up-to-date information.   Video consultations available (start over and click on Talk-to-Doctor). We have informational posts on our Healthtap.com/blog. As the US president has given mistaken information at various times during this pandemic, be sure to check with doctors and the CDC website. People usually should not be tested without symptoms or direct exposure. (3/21/20)  Would you like to video or text chat with me? 

Predict: Covid 19.    If you are not in contact with others who are covid19, then you should not worry.  If not, then don't worry. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I've been in contact with a possible covid-19 case who is waiting on results. My brother seems to be I'll with Corona symptoms and I'd like to get advice on how to help him, as his safety will ensure my safety and prevent the spread of the virus. 

Reference:  Stay home. Family members should stay home and avoid contact with your brother. Keep linens, bath area and utensils separated. He should monitor his temperature, drink fluids, rest and your family should be screened for Coronavirus. 

Predict: !!!Tweetbot!!twitter!!Twitter!!’!!Thanks!! 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have mild irritation in my chest but I am not coughing, I just feel a tingling sensation on my thought. I have no fever, no aches should I be worried about COVID 19? 

Reference:  Probably. We should all be cautious , not worried about Covid-19. It is prudent for us all to be socially isolating, washing our hands, supplementing with vits. D and C and working on making the best of it.  BE grateful for the things that are going well in your life, reach out with verbal and video support for ALL the loved ones and even people you just know are isolated with support. Don't bother to worry. 

Predict: No. COVID19 is not a virus. COV19 is a virus that causes you to cough, which can cause you to feel sick. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Swollen tonsils, cough, and feeling slightly feverish. 

Reference:  Coronavirus possible. Seek care right away if difficulty breathing, other severe symptoms.Seek testing if sick, PLUS known COVID-19 contact, or travel to high risk country, or been in facility treating people with COVID-19, or severe pneumonia of unknown cause.Call 0800 029 999 for test. If sick or possibly exposed (not both), stay home.Seek consultation if sick.If known exposure & not sick, self-isolate 14 days. 

Predict: Hello and Welcome to RAOA Magic.Hello and Thank you for your question on Magic.Best to ask here: Magic.Ask away.Best way to get a diagnosis is to go to Magic. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have a sore throat  and a bit of a cough. This morning i coughed up mucus with what look like blood. Is this a sign of covid 19? 

Reference:  Source of bleeding   There are many causes of the small amount of blood in your sputum. You may have a nasal or upper airway source.    Strep infection, influenza are possible causes. COVID-19 typically is associated with fever, dry cough and/or shortness of breath.. Please contact your PCP for an evaluation, possible chest image and evaluation of the source of the bleeding.   Would you like to video or text chat with me? 

Predict: No.    You have to go to the hospital to get tested.  You can also go to a local hospital and get tested for Covid 19.  




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Symptoms include irritation in the throat, dry cough, trouble breathing properly, headache. However not really a sign of a fever. Temperature is about 35 degrees. Should I get tested? 

Reference:  Many causes.   Most cases of COVID-19 with symptoms are associated with fever, however milder cases do occur. Your shortness of breath and difficulty breathing may have a number of causes and should be evaluated.   Please call your PCP. Avoid dust, mold, smoke, animal dander, pollen. Your symptoms also are consistent with an allergic reaction.  Would you like to video or text chat with me? 

Predict: No fever, no fever, not sure if you are in contact with anyone or if you have been infected with a virus. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I am 63 living in Jhb and have had a tight chest and sore throat and wet cough for more than a week ? 

Reference:  Flu like sx.   Stay at home, monitor your temperature, drink fluids and rest.   If your cough is associated with fever and/or does not begin to improve, please see your PCP.  You may have bacterial bronchitis -avoid smoke, dust, mold, animal dander and pollen. Influenza and COVID are associated with fever. The cough with COVID is typically dry. If you have underlying lung disease, test for CoV/flu.  Would you like to video or text chat with me? 

Predict: No. You have been exposed to a virus. The virus has been in your chest for over a week. You should not be exposed to it. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Chronic sore throat for 4 months. Not strep. No fever. Bloody green phlegm for a while. Sinus congestion and pressure sometimes. Bad breath in back of throat sometimes. Maybe post nasal drip. 2 wks. Of amoxicillin didn’t help. Is this mild COVID-19? 

Reference:  Bacterial likely. Use nasal saline rinses, a nasal steroid and discuss the use of Augmentin for 10 days with your PCP. It is more likely that you have a bacterial infection. An examination of your mouth or uploaded photo to a HT virtual consultation is important. Avoid smoke and if you are a smoker, please consider a biopsy of the area. . 

Predict: Hello,    You are in good shape.   If you have strep or fever,  You should get tested for COVID19.  If 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Been sneezing / stuffy nose and slight sore throat since returning from France, temperature is okay, not coughing or high fever symptoms, should I be alarmed? 

Reference:  Simple Cold but..... Sounds like simple Cold, But under the current health challenges of the world, it's best to be preemptive. And because Covid-19 is an infectious disease mostly transmitted through respiratory fluid droplets, One should first follow the WHO/CDC  current guidelines on hygiene. Then gargle with salt-water or vinegar-water after each meal/bedtime plus take Aleve or Ibuprofen for soreness/fever...   . 

Predict: Hello, symptoms are normal.    The fever is not high and you are not having any symptoms.  If you are having trouble breathing or have a fever, then you should be alright.  Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Can I go for Coronavirus testing if my nose is blocked and I have traveled from a high risk country? 

Reference:  Coronavirus testing. Yes, get tested if symptomatic (early symptoms could be mild respiratory symptoms and nonspecific) or self quarantine given your high risk of exposure. . 

Predict: Not sure if this is a concern, but you should be fine.    If you have a nose problem, you should not travel to the country with high risk people, but if you have access to a high dose of covid-19, you are good to go.  Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: My 5 year old daughter has just come down with a 39 degree fever, sneezing, runny nose and cough. Should we get her tested? 

Reference:  Maybe. Has she been around anyone who has tested positive for COVID-19?  Is she having shortness of breath or labored breathing?  If so, then, yes, consider getting her tested.   If not, it is more likely that she has the flu or some other virus.  If it seems like more than just a regular cold, reach out to her pediatrician for more advice. 

Predict: Yes. Yes. Yes, you should get tested. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I am a quadriplegic and have a tight chest and slight difficulty breathing.  Overnight I started feeling like I need to clear my throat. I flew home to Joburg from CPT 8 days ago & have been self isolating as a precaution. ? 

Reference:  Monitor.   Stay at home, monitor your temperature, drink fluids and rest.   Nasal congestion and post nasal drip is not typically associated with COVID-19. Watch for dry cough, fever and if your shortness of breath does not resolve, discuss whether you need an inhaler, chest physiotherapy and/or testing. If you have a trach, watch for signs of infection such as more secretions or a change in color .  Would you like to video or text chat with me? 

Predict: In brief: Yes, it is a common symptom of a fever.    You should consult your physician for this type of pneumonia.  You will need to consult your doctor for further questions.  Would you like 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Hi I am 39 years old and returned from Germany 19 days ago. Yesterday I started getting a sore throat, runny nose. Today I have sinus pressure and a headache with a blocked nose, throat seems to be improving. Should I get tested. If so how? 

Reference:  Stay home. Stay at home, rest, drink fluids and monitor your temperature. Your symptoms are more consistent with a cold or sinus infection. Nasal saline and a decongestant may be helpful. If you develop COVID symptoms including dry cough, fever, and/or shortness of breath, testing is indicated. Report fever and shortness of breath to your PCP. 

Predict: No. You are fine. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Hi I am 39 years old and have flu like symptoms as of yesterday. Runny nose, sore throat. No fever yet. I returned from Germany 19 days ago. Should I get tested for cov19? 

Reference:  If you get fever yes. but from what we know it may bee a little late to get the chines virus after so many days. Monitor tempreture. 

Predict: Yes.    If you have covid-19, get tested.  If your symptoms are worsening, then get tested and get tested again.  Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: If you have wolf Parkinson white syndrome does this make you more of a risk with Covid 19? 

Reference:  Avoid cold/ice... Though untreated WPW Syndrome can be regarded as pre-existing chronic condition, there is no current evidence that one is more susceptible to Covid-19, an infectious disease mostly transmitted through Respiratory Fluid Droplets. Best to follow WHO/CDC guideline on hygiene. Plus avoid cold/ice  drinks/foods. Drink warm water/tea, even chicken soup. Aleve or ibuprofen can be helpful for fever/soreness. 

Predict: No.    Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Criteria for COVID test? 

Reference:  Real vs. textbook. Real-life criteria is different because we don't have enough tests, supplies, or medical staff around the world at this time (March 2020). Textbook criteria are  suspected COVID in hospitalized persons or in those with underlying medical conditions; or persons who got symptoms less than 14 days after (1) contact with a suspect or confirmed COVID patient or (2) travel through high-risk COVID area. 

Predict: Not sure on the criteria.    Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have been feeling sick for a couple of days now and tried self medication but does not seem to help. I suffer from Polycystic Kidney disease, low blood pressure, Anemic, Endometriosis, Chronic sinus. 

Reference:  Screen sputum. You may have a bacterial illness such as bronchitis, pneumonia or a sinus infection.  Your PCP may collect sputum, screen for Strep/influenza. COVID-19 is usually associated with a dry cough, but secondary infection may occur. Please see your PCP. 

Predict: Hello and Welcome to the Magic Magic Magic.Hope I have answered your query. Magic Magic is a great service. I have been using Magic Magic for over a year. Magic. Magic is the Magic. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have had a runny nose and have been coughing and have had on and off headaches. I went to the doctor the first time, he gave me an antibiotic and i got worse then I went back a and he gave me a different one but it seems I still have these symptoms? 

Reference:  Likely a virus.   What you describe is likely caused by a virus, antibiotics  don't do anything against viruses.  Would you like to video or text chat with me? 

Predict: No, not at all. You have to go to the ER and get tested.    You are in the right place right now.  You should go to your doctor to get tested and get 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I am experiencing headache (severity: moderate) (time frame: gradual) (occurred: 1 - 3 hours) (side: front) (worsened by: psychological stress, being tired, menstruation, skipping meals) (unaffected by: bending forward, changes in sleep pattern, sneeze or. 

Reference:  Many possibilities.   Most people get a headache at some time, such as from colds, infections (usually viruses or sinusitis), lack of sleep, too much driving, long hours on the computer, stress at work, etc.  ..  A bad or severe headache, increasing headache, recurrent headache, weird headache, or one with other worrisome symptoms should be checked by one's primary care doctor or the ER. Can call one's Dr. first to chat.  Would you like to video or text chat with me? 

Predict: Hello and Welcome to r askhealthcare. I have reviewed your query and here is my advice for you. 1. Avoid strep 2. Do not have a 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: This may sound silly but I’m wondering, if touch something in store that has Coronavirus 19 on it and it gets on clothes should we go home wash them soon as possible then shower and new clothes on so it dont get on our stuff at home ? 

Reference:  Washing clothes.   Although so far it has not been shown to live on surfaces like clothes for as long as it lives on surfaces like plastic and metal, it is definitely good hygienic practice to wash your clothes in the hottest water the garment allows.  Would you like to video or text chat with me? 

Predict: If you have Coronabirus 19, wash your hands thoroughly and keep your hands clean. If you have access to a dryer and or soap you are washing your hands frequently, wash hands thoroughly after washing your hand 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Re Covid 19 I understand the incubation period of Covid 19 is up to 14 days. But how long is the contagious period? I have read one is contagious 2 days before showing symptoms. Is that true? Are you still contagious until symptom free? 

Reference:  Covid. For most people the infection can last up to two weeks. In people who get particularly ill the infection last 4 to 6 weeks. . 

Predict: I am not a doctor. I have been in contact with a patient who has been tested for Covid19. I am a non medical patient and I have tested positive for Covide 19. I also have been to a doctor who 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Hi I am a health worker and a chronic asthmatic, is it easier for me to contract Corona? Will I survive it with my comorbidity? Should I ask for special leave during this time? I already feel my sinus blocked, very minor sore throat, seldom dry cough. 

Reference:  Check with PCP.   Data emerging from preliminary studies in France suggest a decades old medicine, chloroquine may be useful as a preventative drug.   It has been used to prevent malaria for decades & manufactures in the US have announced they are increasing production to meet an expected demand. Your status & what they are doing in your country should be answered by your doc.  Would you like to video or text chat with me? 

Predict: Corona.    Would you like to video or text chat with me? 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Possible COVID-19 - mild symptoms - Next steps? Symptoms: - Initially: tight chest, stuffy nose - 10% difficulty breathing - Mildly feverish (hot and dry mouth, bit weak) for a few hours yesterday - Mostly fine this morning (5% breathing difficulty). 

Reference:  May confirm with. 

Predict: Hello and Welcome to the Healthcare Magic Magic Magic. I will be happy to answer your query. If you have any further questions or concerns please feel free to contact me. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Throat a bit sore and want to get a good imune booster, especially in light of the virus. Please advise. Have not been in contact with Nyone with the virus. 

Reference:  During this pandemic. Throat pain can be from a strep throat infection (antibiotics needed), a cold or influenza or other virus, or from some other cause such as allergies or irritants. Usually, a person sees the doctor (call first) if the sore throat is bothersome, recurrent, or doesn't go away quickly. COVID-19 infections tend to have cough, whereas strep throat usually lacks cough but has more throat pain. (3/21/20). 

Predict: Hello, Please contact your local doctor. I have been in touch with your doctor. He has been working with me. He is currently working on a vaccine. I will contact your doctor if I can assist you. Regards,Dr.Jolanda. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have had a sore throat (glands enlarged in throat) for over a week and I have constant headache. Do I need COVID-19 test. If yes, how do a get someone to come to my house to test. 

Reference:  Testing.   People are still getting other illnesses than COVID.    Given no cough or shortness of breath, other than watchful waiting COVID should pass in a week or two. Strep, influenza still possible. This testing can be done through your own provider. You can also get your own strep test at pharmacy or many can swab you there.   Would you like to video or text chat with me? 

Predict: Hello and Welcome to Healthcare Magic.   If you have COVID (or a cough) or a fever, get tested for COVID 19. If you are sick or have a fever or have been sick, get checked for COID-19. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Do I just go to a Dr to determine if it's a sinus infection or do I go get tested for COVID first? To not risk anyone in practice? 

Reference:  Phone triage.   Call your GP's office for instructions and identify local resources.   Your symptoms are suggesting sinusitis (other causes still possible) that is frequently viral and self limited. Local regulations should also be followed in such circumstances. Self quarantine will work for all in trying to prevent the spread. Hydration and rest will also work regardless of what the cause for your symptoms is.  Would you like to video or text chat with me? 

Predict: Yes. If you are in contact with someone who has been in contact for over a year, then you should be alright. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I have a dry cough and sore throat- it's been a week now and the cough seems to be getting worse- no runny nose or fever, sometimes a headache, no shortness of breath...should I get tested for covid19? 

Reference:  COVID   Good guidelines can be found at cdc.  gov/coronavirus/2019.  You would be considered low risk. Symptoms last up to two weeks. High probability many will get the disease and testing leads to no change in action at this time.  No obvious therapy (however some encouraging possibilites) and for most people the virus runs its course without incidence. Stay put and talk to your provider.   Would you like to video or text chat with me? 

Predict: No, you should not get tested. You should not be getting tested. The test is for the virus. If you have a fever or runny throat, you need to get tested, if you are not having a fever, then you should 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: If you've been infected by SARS-CoV-2, develop Covid-19 symptoms and recover, can you: 1) develop immunity to Covid-19 2) infect others after 14 days post-symptom onset & recovery 3) get re-infected later by SARS-CoV-2 despite developing immunity? 

Reference:  Yes, Maybe, Maybe. We presume one is immune to a respiratory virus for a while after getting the infection (that's the way it is with other viruses). A few people have trouble with their antibodies and don't have much immunity. Shedding of viruses sometimes is longer than 14 days. In life, there will always be some people who get reinfected, for whatever reason. Usually one is immune after the infection. (3/21/20). 

Predict: Infection is not a virus. It is a virus, and if you develop immunity you will be fine. However, if you have been 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: If I get coronavirus, are there ways to avoid getting other illnesses as well, like pneumonia? 

Reference:  Coronavirus,   COVID-19 has high affinity to the lung, causing viral pneumonia.   Fortunately, most people get only mild respiratory symptoms. Those who have high risks (age over 65, chronic diseases) tend to get pneumonia and complications. The most important way to prevent the complications is not to get infection, practice common sense of prevention. If you catch the infection, start early treatment.  Would you like to video or text chat with me? 

Predict: No. You can get coronovirus. It is not a virus. It can cause you to get other symptoms. If you have a high fever and a high cough you will have other symptoms, such as shortness of breath, or a shortness cough. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: I was on a plane on March 8th have been home ever since self quarantine myself I have had headaches and runny throat I have 2 days left of my self quarantine should I be having worse symptoms by now ? 

Reference:  COVID   These are HealthTap guidelines:https://www.  healthtap.com/blog/covid-19-care-guidelines/virtual-care-guidelines-for-covid-19?utm_campaign=COVID%20OUTREACH&utm_medium=email&_hsmi=84951814&_hsenc=p2ANqtz-_jlHAfqZrXrCncGR4mJ57eQcomM2TItlnybAQii_sH-f4IyZjpz7-eIVc1oKN2_qDpZnO3&utm_content=84951814&utm_source=hs_emailI agree your symptoms should be improving.  Contact provider.   Would you like to video or text chat with me? 

Predict: Yes, self quarantine is a good way to prevent COVID-19.    You can get tested for COVID and then get tested again.  If you have been in contact with someone who has been in touch with you for over a week, then get test. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: My son has a fever, I am concerned due to the coronavirus ? 

Reference:  Covid-19 unlikely.   The usual infections in your area of Kwa Zulu Natal are more likely to be the cause of his fever. If he is exposed to someone who is sick with Covid-19, then you need to be concerned.   Meanwhile, treat him as you would have treated him 2 months ago.  If he has shortness of breath or any form of difficulty breathing, get him to a doctor ASAP  Would you like to video or text chat with me? 

Predict: No fever. Only symptoms are a slight cough and a slight headache. 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Can coronavirus symptoms be mild for some people versus severe? For example, could it just involve being very fatigued, low grade fever for a few days and not the extreme symptoms? Or is it always a full blown cold and struggle to breathe? 

Reference:  Symptoms vary.   Some may have no symptoms at all. Some can be life threatening.   Would you like to video or text chat with me? 

Predict: Yes, you are at high risk of coronaviruses. The coronaviral virus is the cause of the fever and shortness of breath, so if you have been sick or have been in contact with someone with a fever or short 




A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Patient: Good day,  I am 29 weeks pregnant and my profession is teaching, I also have a 2 year old son.  My chest and throat is sore, I have a constant headache and feel dizzy at times. Is it necessary to get myself tested for the coronavirus? 

Reference:  Can give Dr. a call.   Throat pain can be from a strep throat infection (antibiotics needed), a cold or influenza (antibiotics usually not used), or from other causes such as allergies, irritants, or COVID-19.   Usually, a person sees the doctor (call first) if the sore throat is bothersome, recurrent, or doesn't go away quickly. COVID-19 tends to have cough, whereas strep usually lacks cough but has more throat pain  Would you like to video or text chat with me? 

Predict: No, but you should be alright.    You are in good health and have a good chance of getting tested.  You should also be alright if you have a fever, but if you do not 


61
