<a href="https://colab.research.google.com/github/kevalkamani/AIMLOps_Miniprojects/blob/mp_8_GPT2/M6_NB_MiniProject_1_Deploy_Medical_Q%26A_GPT2_Keval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Programme in AI and MLOps
## A programme by IISc and TalentSprint
### Mini-Project: Medical Q&A using GPT2 | Deployment on Hugging Face Spaces

## Learning Objectives

At the end of the experiment, you will be able to:

* perform data preprocessing, EDA and feature extraction on the Medical Q&A dataset
* load a pre-trained tokenizer
* fine-tune a GPT-2 language model for medical question-answering
* upload your fine-tuned model to the Hugging Face Model Hub
* deploy an application with the uploaded model on Hugging Face Spaces using Gradio

## Dataset Description

The dataset used in this project is the *Medical Question Answering Dataset* ([MedQuAD](https://github.com/abachaa/MedQuAD/tree/master)). It includes medical question-answer pairs along with additional information, such as the question type, the question *focus*, and its UMLS (Unified Medical Language System) details: the Concept Unique Identifier (*CUI*), Semantic *Type*, and Semantic *Group*.

To learn more about how this data was collected and constructed, refer to this [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4).

The data has been extracted into CSV format with the following features:

- **Focus**: the question focus
- **CUI**: concept unique identifier
- **SemanticType**
- **SemanticGroup**
- **Question**
- **Answer**

## Grading = 10 Points

## Information

Healthcare professionals often have to consult medical literature and documents when seeking answers to medical queries. Medical databases and search engines are powerful sources of up-to-date medical knowledge. However, the existing documentation is large, which makes it difficult for professionals to retrieve answers quickly in a clinical setting. The problem with search engines and information retrieval engines is that they return a list of documents rather than answers. Instead, healthcare professionals can use question-answering systems to retrieve short sentences or paragraphs in response to medical queries. The main advantage of such systems is that they generate answers and provide hints within a few seconds.

### Problem Statement

Fine-tune the GPT-2 model on the Medical Question Answering Dataset to generate responses to medical queries. Then deploy the fine-tuned model on Hugging Face Spaces.

Please refer to ***M6 Assignment-1 Fine-tune GPT2*** and ***M6 AdditionalNB Fine-tune GPT2 for TextClassification*** to get familiar with how to load the pre-trained GPT-2 tokenizer and model.

Please refer to ***The demo session held on 14 Sep - Hugging Face Spaces Deployment*** to get familiar with how to deploy using Hugging Face Spaces.

### Installing Dependencies

In [1]:
%%capture
!pip -q uninstall pyarrow -y
!pip -q install pyarrow==15.0.2
!pip -q install datasets
!pip -q install accelerate
!pip -q install transformers

### <font color="#990000">Restart Session/Runtime</font>

### Import required packages

In [1]:
import os
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

import warnings
warnings.filterwarnings('ignore')

In [2]:
#@title Download the dataset
!wget -q https://cdn.iisc.talentsprint.com/AIandMLOps/MiniProjects/Datasets/MedQuAD.csv
!ls | grep ".csv"

MedQuAD.csv


**Exercise 1: Read the MedQuAD.csv dataset**

**Hint:** pd.read_csv()

In [3]:
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

In [13]:
data = pd.read_csv('MedQuAD.csv')

In [14]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16412 entries, 0 to 16411
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Focus          16398 non-null  object
 1   CUI            15847 non-null  object
 2   SemanticType   15815 non-null  object
 3   SemanticGroup  15847 non-null  object
 4   Question       16412 non-null  object
 5   Answer         16407 non-null  object
dtypes: object(6)
memory usage: 769.4+ KB


In [15]:
data.head(2)

Unnamed: 0,Focus,CUI,SemanticType,SemanticGroup,Question,Answer
0,Adult Acute Lymphoblastic Leukemia,C0751606,T191,Disorders,What is (are) Adult Acute Lymphoblastic Leukemia ?,"Key Points - Adult acute lymphoblastic leukemia (ALL) is a type of cancer in which the bone marrow makes too many lymphocytes (a type of white blood cell). - Leukemia may affect red blood cells, white blood cells, and platelets. - Previous chemotherapy and exposure to radiation may increase the risk of developing ALL. - Signs and symptoms of adult ALL include fever, feeling tired, and easy bruising or bleeding. - Tests that examine the blood and bone marrow are used to detect (find) and diagnose adult ALL. - Certain factors affect prognosis (chance of recovery) and treatment options. Adult acute lymphoblastic leukemia (ALL) is a type of cancer in which the bone marrow makes too many lymphocytes (a type of white blood cell). Adult acute lymphoblastic leukemia (ALL; also called acute lymphocytic leukemia) is a cancer of the blood and bone marrow. This type of cancer usually gets worse quickly if it is not treated. Leukemia may affect red blood cells, white blood cells, and platelets. Normally, the bone marrow makes blood stem cells (immature cells) that become mature blood cells over time. A blood stem cell may become a myeloid stem cell or a lymphoid stem cell. A myeloid stem cell becomes one of three types of mature blood cells: - Red blood cells that carry oxygen and other substances to all tissues of the body. - Platelets that form blood clots to stop bleeding. - Granulocytes (white blood cells) that fight infection and disease. A lymphoid stem cell becomes a lymphoblast cell and then one of three types of lymphocytes (white blood cells): - B lymphocytes that make antibodies to help fight infection. - T lymphocytes that help B lymphocytes make the antibodies that help fight infection. - Natural killer cells that attack cancer cells and viruses. In ALL, too many stem cells become lymphoblasts, B lymphocytes, or T lymphocytes. 
These cells are also called leukemia cells. These leukemia cells are not able to fight infection very well. Also, as the number of leukemia cells increases in the blood and bone marrow, there is less room for healthy white blood cells, red blood cells, and platelets. This may cause infection, anemia, and easy bleeding. The cancer can also spread to the central nervous system (brain and spinal cord). This summary is about adult acute lymphoblastic leukemia. See the following PDQ summaries for information about other types of leukemia: - Childhood Acute Lymphoblastic Leukemia Treatment. - Adult Acute Myeloid Leukemia Treatment. - Childhood Acute Myeloid Leukemia/Other Myeloid Malignancies Treatment. - Chronic Lymphocytic Leukemia Treatment. - Chronic Myelogenous Leukemia Treatment. - Hairy Cell Leukemia Treatment."
1,Adult Acute Lymphoblastic Leukemia,C0751606,T191,Disorders,What are the symptoms of Adult Acute Lymphoblastic Leukemia ?,"Signs and symptoms of adult ALL include fever, feeling tired, and easy bruising or bleeding. The early signs and symptoms of ALL may be like the flu or other common diseases. Check with your doctor if you have any of the following: - Weakness or feeling tired. - Fever or night sweats. - Easy bruising or bleeding. - Petechiae (flat, pinpoint spots under the skin, caused by bleeding). - Shortness of breath. - Weight loss or loss of appetite. - Pain in the bones or stomach. - Pain or feeling of fullness below the ribs. - Painless lumps in the neck, underarm, stomach, or groin. - Having many infections. These and other signs and symptoms may be caused by adult acute lymphoblastic leukemia or by other conditions."


### Pre-processing and EDA

**Exercise 2: Perform the operations below on the dataset [0.5 Mark]**

- Handle missing values
- Remove duplicates from data considering `Question` and `Answer` columns

- **Handle missing values**

In [16]:
data.isna().sum()

Unnamed: 0,0
Focus,14
CUI,565
SemanticType,597
SemanticGroup,565
Question,0
Answer,5


In [17]:
df = data.dropna(subset=['Answer'])
print(f"Shape after dropping missing values: {df.shape}")

Shape after dropping missing values: (16407, 6)


In [18]:
df.isna().sum()

Unnamed: 0,0
Focus,14
CUI,565
SemanticType,597
SemanticGroup,565
Question,0
Answer,0


- **Remove duplicates from data considering `Question` and `Answer` columns**

In [19]:
df.duplicated(subset=['Question', 'Answer']).sum()

48

In [20]:
df = df.drop_duplicates(subset=['Question', 'Answer'])

In [21]:
print(f"Shape after removing duplicates: {df.shape}")

Shape after removing duplicates: (16359, 6)


**Exercise 3: Display the category names and the number of records for the top 100 categories of the `Focus` column [0.5 Mark]**

In [22]:
# Total categories in Focus column
print(f"Total categories in Focus column: {df['Focus'].nunique()}")

Total categories in Focus column: 5125


In [23]:
# Displaying the distinct categories of Focus column and the number of records belonging to each category
# (Top 100 only)

df_top = df.groupby(['Focus'], as_index=False).size().sort_values(by='size', ascending=False).reset_index(drop=True)[:100]
df_top

Unnamed: 0,Focus,size
0,Breast Cancer,53
1,Prostate Cancer,43
2,Stroke,35
3,Skin Cancer,34
4,Alzheimer's Disease,30
5,Colorectal Cancer,29
6,Lung Cancer,29
7,High Blood Cholesterol,28
8,Heart Attack,28
9,Heart Failure,28


In [24]:
# Top 100 Focus categories names
top_100_names = df_top['Focus'].unique()
top_100_names

array(['Breast Cancer', 'Prostate Cancer', 'Stroke', 'Skin Cancer',
       "Alzheimer's Disease", 'Colorectal Cancer', 'Lung Cancer',
       'High Blood Cholesterol', 'Heart Attack', 'Heart Failure',
       'High Blood Pressure', "Parkinson's Disease", 'Leukemia',
       'Osteoporosis', 'Shingles', 'Hemochromatosis', 'Diabetes',
       'Age-related Macular Degeneration', 'Psoriasis',
       'Gum (Periodontal) Disease', 'Diabetic Retinopathy',
       'Kidney Disease', 'Dry Mouth', 'Balance Problems', 'COPD',
       'Cataract', 'Glaucoma', 'Gout', 'Wilson Disease',
       'Prescription and Illicit Drug Abuse',
       'Medicare and Continuing Care', 'Rheumatoid Arthritis',
       'Short Bowel Syndrome', 'Osteoarthritis', 'Problems with Taste',
       'Endometrial Cancer', 'Narcolepsy', 'Neuroblastoma',
       'Pituitary Tumors', 'Dry Eye', 'Kidney Dysplasia',
       'Anxiety Disorders', 'Urinary Tract Infections in Children',
       'Problems with Smell', 'Surviving Cancer',
       'Perip

### Create Training and Validation set

**Exercise 4: Create training and validation set [1 Mark]**

- Take 4 samples per `Focus` category for each of the top 100 categories in the dataset (this gives 400 samples for training)

- Take 1 sample per `Focus` category (different from the training samples) for each of the top 100 categories in the dataset (this gives 100 samples for validation)

In [25]:
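# Keep only rows from the top 100 Focus categories; reset_index() (without drop=True)
# stores the original row labels in an 'index' column so they can be excluded later
filtered_df = df[df['Focus'].isin(top_100_names)].reset_index()

# Sample 4 rows per Focus category
selected_samples = filtered_df.groupby('Focus').apply(lambda x: x.sample(n=4, random_state=42)).reset_index(drop=True)

# Shuffle so that rows from the same category are not adjacent
shuffled_samples = selected_samples.sample(frac=1, random_state=42).reset_index(drop=True)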

train_df = shuffled_samples.copy()
print(f"Shape of training set: {train_df.shape}")

Shape of training set: (400, 7)


In [26]:
train_df.head(2)

Unnamed: 0,index,Focus,CUI,SemanticType,SemanticGroup,Question,Answer
0,15242,Knee Replacement,C2186386,T033,Disorders,What are the complications of Knee Replacement ?,"To reduce the risk of clots, your doctor may have you elevate your leg periodically and prescribe special exercises, support hose, or blood thinners. To reduce the risk of infection, your doctor may prescribe antibiotics for you to take prior to your surgery and for a short time afterward."
1,15357,Parkinson's Disease,C0030567,T047,Disorders,What are the treatments for Parkinson's Disease ?,"Deep Brain Stimulation Deep brain stimulation, or DBS, is a surgical procedure used to treat a variety of disabling disorders. It is most commonly used to treat the debilitating symptoms of Parkinsons disease. Deep brain stimulation uses an electrode surgically implanted into part of the brain. The electrodes are connected by a wire under the skin to a small electrical device called a pulse generator that is implanted in the chest. The pulse generator and electrodes painlessly stimulate the brain in a way that helps to stop many of the symptoms of Parkinson's such as tremor, bradykinesia, and rigidity. DBS is primarily used to stimulate one of three brain regions: the subthalamic nucleus, the globus pallidus, or the thalamus. Researchers are exploring optimal generator settings for DBS, whether DBS of other brain regions will also improve symptoms of Parkinsons disease, and also whether DBS may slow disease progression. Deep brain stimulation usually reduces the need for levodopa and related drugs, which in turn decreases dyskinesias and other side effects. It also helps to relieve on-off fluctuation of symptoms. People who respond well to treatment with levodopa tend to respond well to DBS. Unfortunately, older people who have only a partial response to levodopa may not improve with DBS. Complementary and Supportive Therapies A wide variety of complementary and supportive therapies may be used for Parkinson's disease. Among these therapies are standard physical, occupational, and speech therapies, which help with gait and voice disorders, tremors and rigidity, and decline in mental functions. Other supportive therapies include diet and exercise. Diet At this time there are no specific vitamins, minerals, or other nutrients that have any proven therapeutic value in Parkinson's disease. 
Some early reports have suggested that dietary supplements might protect against Parkinson's. Also, a preliminary clinical study of a supplement called coenzyme Q10 suggested that large doses of this substance might slow disease progression in people with early-stage Parkinson's. This supplement is now being tested in a large clinical trial. Other studies are being conducted to find out if caffeine, antioxidants, nicotine, and other dietary factors may help prevent or treat the disease. While there is currently no proof that any specific dietary factor is beneficial, a normal, healthy diet can promote overall well-being for people with Parkinson's disease, just as it would for anyone else. A high protein meal, however, may limit levodopa's effectiveness because for a time afterwards less levodopa passes through the blood-brain barrier. Exercise Exercise can help people with Parkinson's improve their mobility and flexibility. It can also improve their emotional well-being. Exercise may improve the brain's dopamine production or increase levels of beneficial compounds called neurotrophic factors in the brain. Other Therapies Other complementary therapies include massage therapy, yoga, tai chi, hypnosis, acupuncture, and the Alexander technique, which improves posture and muscle activity. There have been limited studies suggesting mild benefits from some of these therapies, but they do not slow Parkinson's disease and to date there is no convincing evidence that they help. However, this remains an active area of investigation."


In [27]:
filtered_df = df[df['Focus'].isin(top_100_names)]
# Exclude rows already used for training (train_df['index'] holds their original row labels)
filtered_df = filtered_df[~filtered_df.index.isin(train_df['index'])]

selected_samples = filtered_df.groupby('Focus').apply(lambda x: x.sample(n=1, random_state=42)).reset_index(drop=True)

shuffled_samples = selected_samples.sample(frac=1, random_state=42).reset_index(drop=True)

val_df = shuffled_samples.copy()
print(f"Shape of validation set: {val_df.shape}")

Shape of validation set: (100, 6)


In [28]:
# Drop the index column in train_df

train_df.drop('index', axis=1, inplace=True)
train_df.shape

(400, 6)
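
The split above relies on the saved `index` column to keep validation rows out of the training set. As a minimal, dependency-free sketch of the same idea (4 training and 1 validation record per category, with no overlap), the snippet below uses toy records and a hypothetical helper `split_per_category`; it is not the notebook's pandas code, only an illustration of the logic and of a disjointness check.

```python
import random
from collections import defaultdict

def split_per_category(records, n_train=4, n_val=1, seed=42):
    """Pick n_train + n_val distinct records per Focus category, without overlap."""
    rng = random.Random(seed)
    by_focus = defaultdict(list)
    for rec in records:
        by_focus[rec["Focus"]].append(rec)

    train, val = [], []
    for focus in sorted(by_focus):
        group = by_focus[focus]
        if len(group) < n_train + n_val:
            continue  # skip categories with too few records
        # Sampling once and slicing guarantees the two parts are disjoint
        picked = rng.sample(group, n_train + n_val)
        train.extend(picked[:n_train])
        val.extend(picked[n_train:])

    rng.shuffle(train)
    rng.shuffle(val)
    return train, val

# Toy data (hypothetical): 3 categories with 6 question-answer pairs each
records = [
    {"Focus": focus, "Question": f"Q{i} about {focus} ?", "Answer": f"A{i}"}
    for focus in ("Stroke", "Gout", "COPD")
    for i in range(6)
]

train, val = split_per_category(records)
train_keys = {(r["Question"], r["Answer"]) for r in train}
val_keys = {(r["Question"], r["Answer"]) for r in val}
print(len(train), len(val), train_keys.isdisjoint(val_keys))
```

The same disjointness check on `(Question, Answer)` pairs could be applied to `train_df` and `val_df` to confirm that no validation sample leaked into training.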

### Pre-process `Question` and `Answer` text

**Exercise 5: Perform the tasks below:  [1 Mark]**

- Combine `Question` and `Answer` for train and validation data as shown below:
    - sequence = *'\<question\>' + question-text + '\<answer\>' + answer-text + '\<end\>'*

- Join the combined text using '\n' into a single string for training and validation separately

- Save the training and validation strings as separate text files

- **Combine Question and Answer for train and val data**

In [29]:
# Combine Questions and Answers for train and val data
## sequence = '<question>' + question + '<answer>' + answer + '<end>'

train_df['combined'] = '<question>' + train_df['Question'] + '<answer>' + train_df['Answer'] + '<end>'
train_df.head(1)

Unnamed: 0,Focus,CUI,SemanticType,SemanticGroup,Question,Answer,combined
0,Knee Replacement,C2186386,T033,Disorders,What are the complications of Knee Replacement ?,"To reduce the risk of clots, your doctor may have you elevate your leg periodically and prescribe special exercises, support hose, or blood thinners. To reduce the risk of infection, your doctor may prescribe antibiotics for you to take prior to your surgery and for a short time afterward.","<question>What are the complications of Knee Replacement ?<answer>To reduce the risk of clots, your doctor may have you elevate your leg periodically and prescribe special exercises, support hose, or blood thinners. To reduce the risk of infection, your doctor may prescribe antibiotics for you to take prior to your surgery and for a short time afterward.<end>"


In [30]:
val_df['combined'] = '<question>' + val_df['Question'] + '<answer>' + val_df['Answer'] + '<end>'
val_df.head(1)

Unnamed: 0,Focus,CUI,SemanticType,SemanticGroup,Question,Answer,combined
0,Prostate Enlargement: Benign Prostatic Hyperplasia,C0426732,T191,Disorders,What causes Prostate Enlargement: Benign Prostatic Hyperplasia ?,"The cause of benign prostatic hyperplasia is not well understood; however, it occurs mainly in older men. Benign prostatic hyperplasia does not develop in men whose testicles were removed before puberty. For this reason, some researchers believe factors related to aging and the testicles may cause benign prostatic hyperplasia. Throughout their lives, men produce testosterone, a male hormone, and small amounts of estrogen, a female hormone. As men age, the amount of active testosterone in their blood decreases, which leaves a higher proportion of estrogen. Scientific studies have suggested that benign prostatic hyperplasia may occur because the higher proportion of estrogen within the prostate increases the activity of substances that promote prostate cell growth. Another theory focuses on dihydrotestosterone (DHT), a male hormone that plays a role in prostate development and growth. Some research has indicated that even with a drop in blood testosterone levels, older men continue to produce and accumulate high levels of DHT in the prostate. This accumulation of DHT may encourage prostate cells to continue to grow. Scientists have noted that men who do not produce DHT do not develop benign prostatic hyperplasia.","<question>What causes Prostate Enlargement: Benign Prostatic Hyperplasia ?<answer>The cause of benign prostatic hyperplasia is not well understood; however, it occurs mainly in older men. Benign prostatic hyperplasia does not develop in men whose testicles were removed before puberty. For this reason, some researchers believe factors related to aging and the testicles may cause benign prostatic hyperplasia. Throughout their lives, men produce testosterone, a male hormone, and small amounts of estrogen, a female hormone. 
As men age, the amount of active testosterone in their blood decreases, which leaves a higher proportion of estrogen. Scientific studies have suggested that benign prostatic hyperplasia may occur because the higher proportion of estrogen within the prostate increases the activity of substances that promote prostate cell growth. Another theory focuses on dihydrotestosterone (DHT), a male hormone that plays a role in prostate development and growth. Some research has indicated that even with a drop in blood testosterone levels, older men continue to produce and accumulate high levels of DHT in the prostate. This accumulation of DHT may encourage prostate cells to continue to grow. Scientists have noted that men who do not produce DHT do not develop benign prostatic hyperplasia.<end>"


- **Join the combined text using '\n' into a single string for training and validation separately**

In [31]:
# Train and Validation text for all Q&As

train_string = '\n'.join(train_df['combined'])
val_string = '\n'.join(val_df['combined'])

- **Save the training and validation strings as text files**

In [32]:
# Save the training and validation data as text files

with open("train.txt", "w") as f:
    f.write(train_string)

with open("val.txt", "w") as f:
    f.write(val_string)
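
Because the records are joined with `'\n'` and later read back one example per line, an answer that itself contained a newline would be split into several examples. In this run the line counts match (400 and 100), so that does not appear to happen here. The sketch below shows one defensive way to build the sequences; `build_sequence` and the toy pairs are hypothetical and not part of the notebook.

```python
import os
import tempfile

def build_sequence(question, answer):
    """Format one Q&A pair as '<question>...<answer>...<end>' on a single line."""
    # Collapse every whitespace run (including newlines) to one space so that
    # each record occupies exactly one line in the output file.
    q = " ".join(str(question).split())
    a = " ".join(str(answer).split())
    return f"<question>{q}<answer>{a}<end>"

# Toy pairs (placeholder text); the first answer contains an embedded newline
pairs = [
    ("What is X ?", "first line\nsecond line"),
    ("What causes X ?", "placeholder answer"),
]
lines = [build_sequence(q, a) for q, a in pairs]

# Write to a temporary file, then read it back line by line
path = os.path.join(tempfile.mkdtemp(), "train_demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(lines))

with open(path, encoding="utf-8") as f:
    read_back = f.read().split("\n")

print(len(read_back) == len(pairs))
print(read_back[0])
```

With the whitespace collapsed, the number of lines read back equals the number of Q&A pairs, which is the property the line-based text loader depends on.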

**Exercise 6: Load pre-trained GPT2Tokenizer**

- Use checkpoint = "gpt2"

In [33]:
# Set up the tokenizer
checkpoint = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(checkpoint)

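# GPT-2 has no dedicated pad token, so reuse unk_token for padding
# (for GPT-2, unk_token is '<|endoftext|>', the same string as eos_token)
tokenizer.pad_token = tokenizer.unk_token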

**Exercise 7: Tokenize train and validation data [0.5 Mark]**

- Use the loaded pre-trained tokenizer
- Use training and validation data saved in text files

In [34]:
from datasets import load_dataset

train_file_path = 'train.txt'
val_file_path = 'val.txt'

dataset = load_dataset("text", data_files={"train": train_file_path,
                                           "validation": val_file_path})

In [35]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 400
    })
    validation: Dataset({
        features: ['text'],
        num_rows: 100
    })
})

In [36]:
dataset['train']['text'][0]

'<question>What are the complications of Knee Replacement ?<answer>To reduce the risk of clots, your doctor may have you elevate your leg periodically and prescribe special exercises, support hose, or blood thinners. To reduce the risk of infection, your doctor may prescribe antibiotics for you to take prior to your surgery and for a short time afterward.<end>'

In [37]:
train_df['word_count'] = train_df['combined'].apply(lambda x: len(x.split()))

In [38]:
train_df['word_count'].mean(), train_df['word_count'].max()

(227.23, 2159)

In [39]:
sum(train_df['word_count'] > 512)

40

In [40]:
block_size = 512   # max tokens in an input sample

def tokenize_function(examples):
    return tokenizer(examples["text"], padding='max_length', truncation=True, max_length=block_size, return_tensors='pt')

tokenized_datasets = dataset.map(tokenize_function, batched=True)

In [41]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['text', 'input_ids', 'attention_mask'],
        num_rows: 400
    })
    validation: Dataset({
        features: ['text', 'input_ids', 'attention_mask'],
        num_rows: 100
    })
})

In [42]:
len(tokenized_datasets['train']['input_ids'][0])

512

In [43]:
tokenizer.decode(tokenized_datasets['train']['input_ids'][0])

'<question>What are the complications of Knee Replacement?<answer>To reduce the risk of clots, your doctor may have you elevate your leg periodically and prescribe special exercises, support hose, or blood thinners. To reduce the risk of infection, your doctor may prescribe antibiotics for you to take prior to your surgery and for a short time afterward.<end><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|endoftext|><|

**Exercise 8: Create a DataCollator object**

In [44]:
# Create a Data collator object
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="pt")
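
With `mlm=False`, the collator prepares causal language modeling batches: the labels are a copy of `input_ids`, and positions holding the pad token are replaced with `-100` so the loss ignores them. The pure-Python sketch below imitates that label construction on a hypothetical list of token ids; `causal_lm_labels` is an illustrative helper, not a library function.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking padding positions with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]

pad_id = 50256  # id of '<|endoftext|>' in the GPT-2 vocabulary
input_ids = [10, 11, 12, pad_id, pad_id]  # hypothetical token ids followed by padding

labels = causal_lm_labels(input_ids, pad_id)
print(labels)  # [10, 11, 12, -100, -100]
```

One consequence worth keeping in mind: since the pad token here is the same `<|endoftext|>` token used for end-of-sequence, every `<|endoftext|>` position is excluded from the loss. In this notebook the end of an answer is marked by the plain-text `<end>` tag instead.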

**Exercise 9: Load pre-trained GPT2LMHeadModel**

In [45]:
# Set up the model
model = GPT2LMHeadModel.from_pretrained(checkpoint)

**Exercise 10: Fine-tune GPT2 Model [1 Mark]**

- Specify training arguments and create a TrainingArguments object (Use 30 epochs)

- Train a GPT-2 model using the provided training arguments

- Save the resulting trained model and tokenizer to a specified output directory

In [46]:
# Set up the training arguments

model_output_path = "/content/gpt_model"

training_args = TrainingArguments(
    output_dir = model_output_path,
    overwrite_output_dir = True,
    per_device_train_batch_size = 4,
    per_device_eval_batch_size = 4,
    num_train_epochs = 30,
    save_steps = 1_000,
    save_total_limit = 2,
    logging_dir = './logs',
    )

In [47]:
# Train the model
trainer = Trainer(
    model = model,
    args = training_args,
    data_collator = data_collator,
    train_dataset = tokenized_datasets["train"],
    eval_dataset = tokenized_datasets["validation"],
)

trainer.train()

Step,Training Loss
500,2.0272
1000,1.3663
1500,0.968
2000,0.7073
2500,0.5528
3000,0.4825


TrainOutput(global_step=3000, training_loss=1.0173629404703777, metrics={'train_runtime': 1823.3162, 'train_samples_per_second': 6.581, 'train_steps_per_second': 1.645, 'total_flos': 3135504384000000.0, 'train_loss': 1.0173629404703777, 'epoch': 30.0})
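
The numbers in `TrainOutput` can be cross-checked with simple arithmetic. The sketch below assumes a single device and no gradient accumulation (the defaults, which the notebook does not override) and takes the runtime from the output above.

```python
import math

num_samples = 400          # rows in the training split
batch_size = 4             # per_device_train_batch_size
epochs = 30                # num_train_epochs
train_runtime = 1823.3162  # seconds, from the TrainOutput above

# Assumption: one device and no gradient accumulation
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs

samples_per_second = num_samples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(steps_per_epoch, total_steps)   # 100 3000
print(round(samples_per_second, 3))   # 6.581
print(round(steps_per_second, 3))     # 1.645
```

These values agree with `global_step=3000`, `train_samples_per_second` and `train_steps_per_second` reported by the trainer.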

In [48]:
saved_model_path = "finetuned_gpt2_model"

# Save the model
trainer.save_model(saved_model_path)

# Save the tokenizer
tokenizer.save_pretrained(saved_model_path)

('finetuned_gpt2_model/tokenizer_config.json',
 'finetuned_gpt2_model/special_tokens_map.json',
 'finetuned_gpt2_model/vocab.json',
 'finetuned_gpt2_model/merges.txt',
 'finetuned_gpt2_model/added_tokens.json')

**Exercise 11: Test Model with user input prompts [1 Mark]**

- Create `generate_response()` function that takes a trained *model*, *tokenizer*, and a *prompt* string as input and generates a response using the GPT-2 model

- Test it with some user input prompts

In [49]:
def generate_response(model, tokenizer, prompt, max_length=200):

    input_ids = tokenizer.encode(prompt, return_tensors="pt")      # 'pt' for returning pytorch tensor

    # Check the device of the model
    device = next(model.parameters()).device

    # Move input_ids to the same device as the model
    input_ids = input_ids.to(device)

    # Create the attention mask and pad token id
    attention_mask = torch.ones_like(input_ids)
    pad_token_id = tokenizer.eos_token_id

    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        attention_mask=attention_mask,
        pad_token_id=pad_token_id
    )

    return tokenizer.decode(output[0], skip_special_tokens=True)
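
The training sequences have the form `<question>...<answer>...<end>`, and these tags are ordinary text rather than special tokens, so `skip_special_tokens=True` does not remove them from the decoded output. The sketch below shows two hypothetical helpers: `build_prompt` wraps a question in the training format, and `extract_answer` keeps only the text between `<answer>` and the first `<end>`. Whether prompting in the training format improves generations is an assumption that the notebook does not test.

```python
def build_prompt(question):
    """Wrap a raw question in the tag format used during fine-tuning."""
    return f"<question>{question.strip()}<answer>"

def extract_answer(generated_text):
    """Return the text after '<answer>' and before the first '<end>', if present."""
    _, sep, tail = generated_text.partition("<answer>")
    answer = tail if sep else generated_text  # fall back to the full text
    answer = answer.split("<end>", 1)[0]
    return answer.strip()

# Placeholder string standing in for a decoded model output
raw = "<question>What is X ?<answer>X is a placeholder condition.<end><question>What"

print(build_prompt(" What is X ? "))  # <question>What is X ?<answer>
print(extract_answer(raw))            # X is a placeholder condition.
```

A possible usage would be `extract_answer(generate_response(model, tokenizer, build_prompt(question)))`, which would return only the answer portion instead of the echoed prompt plus tags.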

In [53]:
# Load the fine-tuned model and tokenizer

my_model = GPT2LMHeadModel.from_pretrained(saved_model_path)
my_tokenizer = GPT2Tokenizer.from_pretrained(saved_model_path)

In [54]:
# Testing with a sample prompt 1

prompt = 'What are the treatments for Gastrointestinal Stromal Tumors ?'
response = generate_response(my_model, my_tokenizer, prompt)
print("Generated response:")
response

Generated response:


'What are the treatments for Gastrointestinal Stromal Tumors?<answer>How might Gastrointestinal Stromal tumors be treated? Because gastroesophageal reflux (GI) is the bodys normal response to changes in fluid volumes, it is important to learn as much as you can about gastrointestinal reflux and how to best manage the disorder. Many gastroesophages are treated with medications, lifestyle changes, and possibly surgery. Learn more about the treatment of gastroesophageal reflux. How might Gastrointestinal Stromal Tumors Be Treated? Because gastroesophageal reflux (GI) is the bodys normal response to changes in fluid volumes, it is important to learn as much as you can about gastrointestinal reflux and how to best manage the disorder. Many gastroesophages are treated with medications, lifestyle changes, and possibly surgery. Learn more about the treatment of gastroesophageal reflux'

In [55]:
# Testing with a sample prompt 2

prompt = 'How to diagnose Parasites - Scabies ?'
response = generate_response(my_model, my_tokenizer, prompt)
print("Generated response:")
response

Generated response:


'How to diagnose Parasites - Scabies?<answer>Most people who have had scabies are not sure what is causing the condition. If symptoms appear or get worse, the most common treatment is probably to stop the disease altogether. If symptoms persist, the most common way to control them is to keep the disease under control. This involves taking steps, such as - decreasing the time it takes for the rash to get worse - increasing the time it takes for the rash to get worse - treating the underlying cause of the condition - recognizing the underlying cause of the condition and taking steps to control it Family doctors can diagnose scabies based on medical and family histories, a physical exam, laboratory tests, and test results. If symptoms appear or get worse, the most common treatment is probably to stop the disease altogether. If symptoms persist, the most common way to control them is to keep the disease under control. This involves taking steps, such as - decreasing the time it takes for t

**Exercise 12: Compare the performance of a *GPT2 model* with the *GPT2 model fine-tuned* on MedQuAD data [0.5 Mark]**

- Load another pre-trained GPT2LMHeadModel and do not fine-tune it

- To generate a response using the untuned model, pass it as a parameter to the `generate_response()` function

- Test both models (fine-tuned and untuned) with the user input prompts below:

    - "What precautions to take for a healthy life?"
    - "What to do after being diagnosed with cancer?"
    - "What to do when feeling sick?"
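
Comparing the two models by reading their outputs is subjective. As one rough, optional aid, the sketch below computes the fraction of distinct word n-grams in a response, so heavily repeated text scores low. The function `distinct_n` and the sample strings are hypothetical, and this measures repetition only, not medical correctness.

```python
def distinct_n(text, n=3):
    """Fraction of word n-grams in `text` that are unique (1.0 = no repetition)."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0  # text shorter than n words
    return len(set(ngrams)) / len(ngrams)

# Toy strings: one loops over the same phrase, the other does not
looping = "check the label . " * 5
varied = "talk with your doctor about symptoms and possible treatment options"

print(round(distinct_n(looping), 3))  # 0.222
print(distinct_n(varied))             # 1.0
```

Applying the same function to the responses of the fine-tuned and untuned models for each prompt would give a simple number to report alongside the qualitative comparison.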

In [56]:
# Load a pre-trained GPT2 model, do not finetune it with MedQuAD data
checkpoint = "gpt2"

base_model = GPT2LMHeadModel.from_pretrained(checkpoint)
base_tokenizer = GPT2Tokenizer.from_pretrained(checkpoint)

In [57]:
# Testing with finetuned model: prompt 1

prompt = "What precautions to take for a healthy life?"
response = generate_response(my_model, my_tokenizer, prompt)
print("Generated response:")
response

Generated response:


'What precautions to take for a healthy life? Check out these tips for healthy eating and diet. - Be sure to include salt in your diet. - Limit how much you eat. - Keep track of how much you eat and how much you get high. - Check your blood pressure and cholesterol daily. - Check your heart health and medicines for extra damage to your heart. - Check your blood pressure and cholesterol daily. - Check your heart health and medicines for extra damage to your heart. - Eat a well-balanced diet. - Limit how much you eat. Keep track of how much you eat and how much you get high. Check your blood pressure and cholesterol daily. Check your heart health and medicines for extra damage to your heart. Eat a well-balanced diet. - Limit how much you eat. Keep track of how much you eat and how much you get high. - Eat a well-balanced diet. - Limit how much you eat. Keep track of how much you eat. Keep track'

In [58]:
# Testing with untuned model: prompt 1

prompt = "What precautions to take for a healthy life?"
response = generate_response(base_model, base_tokenizer, prompt)
print("Generated response:")
response

Generated response:


"What precautions to take for a healthy life?\n\nThe following are some of the most common questions you'll hear from your doctor or nurse about your health.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause"

In [77]:
# Testing with finetuned model: prompt 2

prompt = "What to do after being diagnosed with cancer?"
response = generate_response(my_model, my_tokenizer, prompt)
print("Generated response:")
response

Generated response:


'What to do after being diagnosed with cancer? - Talk with your doctor or other health care provider about what you need to know about your symptoms and possible treatment. - Check out the labels on food, medicines, and other products you take. - Check out the ingredients list on the package. - Check the ingredients list on the package. - Check the ingredients list on the package. - Check the ingredients list on the package. - Check the packaging for the medicine. (Some medicines can have additives.) - Check the ingredients list on the package. (Some medicines can have additives.) Check the packaging for the medicine. (Some medicines can have additives.) Talk with your doctor or other health care provider about what you need to know about your symptoms and possible treatment. Check out the labels on food, medicines, and other products you take. Check the ingredients list on the package. Check the ingredients list on the package. Check the packaging for the medicine. (Some medicines can

In [60]:
# Testing with untuned model: prompt 2

prompt = "What to do after being diagnosed with cancer?"
response = generate_response(base_model, base_tokenizer, prompt)
print("Generated response:")
response

Generated response:


"What to do after being diagnosed with cancer?\n\nThe first step is to get your doctor's approval for a treatment.\n\nIf you have a cancer diagnosis, you may need to get a second opinion.\n\nIf you have a cancer diagnosis, you may need to get a second opinion. If you have a cancer diagnosis, you may need to get a third opinion.\n\nIf you have a cancer diagnosis, you may need to get a third opinion. If you have a cancer diagnosis, you may need to get a fourth opinion.\n\nIf you have a cancer diagnosis, you may need to get a fourth opinion. If you have a cancer diagnosis, you may need to get a fifth opinion.\n\nIf you have a cancer diagnosis, you may need to get a fifth opinion. If you have a cancer diagnosis, you may need to get a sixth opinion.\n\nIf you have a cancer diagnosis, you may need to get a sixth opinion. If you have"

In [61]:
# Testing with finetuned model: prompt 3

prompt = "What to do when feeling sick?"
response = generate_response(my_model, my_tokenizer, prompt)
print("Generated response:")
response

Generated response:


'What to do when feeling sick? Ask your doctor about how you can best manage your symptoms. If you have a serious illness, such as a stroke or heart attack, talk with your doctor about how you can best manage your symptoms. You may be able to manage your symptoms with medications. Talk with your doctor about medications. Ask your doctor about the medicines you take. If you smoke, tell your doctor about the possible side effects of smoking. You should also tell your doctor about the possible interactions with other medicines. Ask your doctor about the medicines you take. If you smoke, tell your doctor about the possible side effects of smoking. You should also tell your doctor about the possible interactions with other medicines. Talk with your doctor about the possible interactions with other medicines. Ask your doctor about the possible interactions with other medicines. (Watch the video to learn more about how to manage your symptoms. To enlarge the video, click the brackets in the l

In [62]:
# Testing with untuned model: prompt 3

prompt = "What to do when feeling sick?"
response = generate_response(base_model, base_tokenizer, prompt)
print("Generated response:")
response

Generated response:


"What to do when feeling sick?\n\nThe first thing you should do is to get your body to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick"
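
Both models fall into loops on these prompts, so a rough repetition score can make the comparison less subjective. The sketch below computes the fraction of distinct word n-grams in a response (often called distinct-n). It is an illustrative measure, not part of the original exercise, and `distinct_ngram_ratio` is a hypothetical helper name.

```python
def distinct_ngram_ratio(text, n=3):
    """Return unique n-grams / total n-grams for whitespace-split words.

    Values near 1.0 mean little repetition; values near 0.0 mean the
    text repeats the same phrases over and over.
    """
    words = text.split()
    if len(words) < n:
        return 1.0  # too short to contain any repeated n-gram
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams)


varied = "Talk with your doctor about symptoms and possible treatment options"
looping = "you should relax " * 10

varied_score = distinct_ngram_ratio(varied)
looping_score = distinct_ngram_ratio(looping)
print(varied_score, looping_score)
```

Applying the same function to the fine-tuned and untuned responses for each prompt gives one number per response that can be compared directly.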

## Push your model to Hugging Face Model Hub

**Exercise 13: Follow the steps below to push your fine-tuned model to the Hugging Face Model Hub**

1. [Sign up](https://huggingface.co/join) for a Hugging Face account
2. Create an access token for your account and save it
3. Store your access token in the Hugging Face cache folder within Colab
4. Push your fine-tuned model and tokenizer to the Model Hub
5. Load the model back from the Hub and test it with user input prompts

* **Create an access token for your account**

    Once you have an account, create an access token as follows:
    
    - Go to your `Settings`, then click on the `Access Tokens` tab. Click the `New token` button to create a new User Access Token.
    - Select `Write` as the token type and give your token a name
    - Click `Create token`
    - Once the token is created, save it somewhere safe
    - When it is required later, reuse the saved token or create a new one

    To learn more about access tokens, see the [User access tokens documentation](https://huggingface.co/docs/hub/security-tokens).

* **Store your access token in the Hugging Face cache folder within Colab**

    Once you have your User Access Token, run the following command to authenticate your identity to the Hub.
    - `!huggingface-cli login`
    - Paste your access token when prompted
    - Type **n** at the `Add token as git credential? (Y/n)` prompt

    For more details on login, see the [quick-start guide](https://huggingface.co/docs/huggingface_hub/quick-start#login).
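
When typing the token interactively is inconvenient, the login can also be done from Python. The sketch below assumes you have already stored the token in an environment variable named `HF_TOKEN`; `read_hf_token` and `login_from_env` are hypothetical helper names, while `login()` is the function exported by `huggingface_hub`.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Fetch the access token from an environment mapping, without printing it."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


def login_from_env():
    """Authenticate to the Hub using the token found in the environment."""
    # Imported lazily so the helper above works even without huggingface_hub.
    from huggingface_hub import login

    login(token=read_hf_token())
```

Calling `login_from_env()` in a cell would then take the place of the interactive prompt. Keeping the token out of the notebook source avoids committing it by accident.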

In [68]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: write)

* **Push your fine-tuned model and tokenizer to Model Hub [0.5 Mark]**

    - Use the `push_to_hub()` method of both your model and your tokenizer to push them to the Hub
    - Specify the name of the repository that will receive the model and tokenizer using the `repo_id` parameter
    - Push the model and tokenizer to the same repository

    - **Hint:**

        - Use the `push_to_hub()` method of your model. For parameter details, see the [`PreTrainedModel.push_to_hub` documentation](https://huggingface.co/docs/transformers/main_classes/model#transformers.PreTrainedModel.push_to_hub).
        - Use the `push_to_hub()` method of your tokenizer. For parameter details, see the [`PreTrainedTokenizer.push_to_hub` documentation](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer.push_to_hub).
        - Access your pushed model at `https://huggingface.co/[YOUR-USER-NAME]/[YOUR-MODEL-REPO-NAME]/tree/main`

In [70]:
# Push model
my_repo = "medquad-finetuned-gpt2"
my_model.push_to_hub(repo_id=my_repo, commit_message="Upload fine-tuned model")

model.safetensors:   0%|          | 0.00/498M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/kevalkamani/medquad-finetuned-gpt2/commit/f1afb96ff5344e7967aaf287b8828053968084b2', commit_message='Upload fine-tuned model', commit_description='', oid='f1afb96ff5344e7967aaf287b8828053968084b2', pr_url=None, pr_revision=None, pr_num=None)

In [71]:
# Push tokenizer
my_tokenizer.push_to_hub(repo_id=my_repo, commit_message="Upload tokenizer used")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/kevalkamani/medquad-finetuned-gpt2/commit/00b7df088c5d98cb4b6581d8bc95aba774006823', commit_message='Upload tokenizer used', commit_description='', oid='00b7df088c5d98cb4b6581d8bc95aba774006823', pr_url=None, pr_revision=None, pr_num=None)

* **Load the model and tokenizer back from Hub and test it with user input prompts [0.5 Mark]**

    - In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the `from_pretrained()` method. **AutoClasses** can be used to automatically retrieve the relevant model given the name/path to the pretrained weights/config/vocabulary.

    - Instantiating one of `AutoConfig`, `AutoModel`, and `AutoTokenizer` will directly create a class of the relevant architecture.

    - Because GPT-2 is used here with a language modeling head on top, load it with an auto class that includes that head. `AutoModelForCausalLM` is the current choice for causal language models such as GPT-2; the older `AutoModelWithLMHead` is deprecated.

    - Specify the full path of your model repo, i.e. ***"YOUR-USER-NAME/YOUR-REPO-NAME"***, when calling the `from_pretrained()` method.

In [72]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [73]:
username = "kevalkamani"
my_checkpoint = username + '/' + my_repo
my_checkpoint

'kevalkamani/medquad-finetuned-gpt2'

In [74]:
# Load your model from hub

loaded_model = AutoModelForCausalLM.from_pretrained(my_checkpoint)

config.json:   0%|          | 0.00/923 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/498M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/119 [00:00<?, ?B/s]

In [75]:
# Load your tokenizer from hub

loaded_tokenizer = AutoTokenizer.from_pretrained(my_checkpoint)

tokenizer_config.json:   0%|          | 0.00/525 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/999k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/583 [00:00<?, ?B/s]

In [76]:
# Response from loaded model

prompt = "What is the outlook for Skin Cancer ?"
response = generate_response(loaded_model, loaded_tokenizer, prompt)
response

'What is the outlook for Skin Cancer?<answer>The outlook for melanoma varies depending on the type of skin cancer. The most common type of melanoma is called acute myeloid leukemia or B3. The outlook for melanoma varies by type because the different types of melanoma develop differently. Acute myeloid leukemia or B3 usually goes away on its own. It may reappear in the future. Other types of melanoma may reappear in the future. Most cases of acute myeloid leukemia or B3 are treatable with radiation therapy and chemotherapy.<end>The Outlook for Melanoma in Young Adults Skin cancer is much less common than in childhood. About 5 percent of adults in the United States have melanoma. About 5 million to 10 million adults in the United States have melanoma. About half of all adults in the United States have melanoma. About half of all adults in the United States have melanoma. Melanoma is found in many different'
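
The raw output above still contains the prompt, the `<answer>` marker, and text generated after `<end>`. A minimal post-processing sketch follows, assuming the fine-tuning format was `question<answer>answer<end>` as this output suggests; `extract_answer` is a hypothetical helper, not part of the original notebook.

```python
def extract_answer(generated, start_tag="<answer>", end_tag="<end>"):
    """Return the text between the first start_tag and the following end_tag.

    If start_tag is missing, the whole string is treated as the answer;
    if end_tag is missing, everything after start_tag is returned.
    """
    _, found, tail = generated.partition(start_tag)
    body = tail if found else generated
    answer, _, _ = body.partition(end_tag)
    return answer.strip()


raw = (
    "What is the outlook for Skin Cancer?<answer>The outlook for melanoma "
    "varies depending on the type of skin cancer.<end>The Outlook for Melanoma"
)
cleaned = extract_answer(raw)
print(cleaned)
```

Applying such a step to the value returned by `generate_response()` would show users only the answer portion rather than the prompt and the trailing continuation.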

## Gradio Implementation

Gradio is an open-source Python library for quickly building easy-to-use, customizable UI components around an ML model, an API, or any arbitrary function in just a few lines of code. The GUI can be embedded directly in the Python notebook or shared with anyone through a link.

**Exercise 14: Create a Gradio app for your fine-tuned model pushed to the Hugging Face Model Hub [1 Mark]**

- Install and import the `gradio` library
- Create a function that uses your fine-tuned model for response generation
    - Use the model and tokenizer directly within the function; do not pass them as parameters
    - The function should take the input prompt text and the maximum response length as its input parameters
    - The function should return the generated response text
- Create the input and output Gradio elements
- Create a Gradio interface object
- Launch the interface to generate the UI

In [78]:
!pip -q install gradio


In [84]:
import gradio as gr

In [85]:
# Function for response generation

def generate_query_response(prompt, max_length=200):

    model = loaded_model
    tokenizer = loaded_tokenizer

    input_ids = tokenizer.encode(prompt, return_tensors="pt")      # 'pt' for returning pytorch tensor

    # Check the device of the model
    device = next(model.parameters()).device

    # Move input_ids to the same device as the model
    input_ids = input_ids.to(device)

    # Create the attention mask and pad token id
    attention_mask = torch.ones_like(input_ids)
    pad_token_id = tokenizer.eos_token_id

    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        attention_mask=attention_mask,
        pad_token_id=pad_token_id
    )

    return tokenizer.decode(output[0], skip_special_tokens=True)

In [86]:
# Quick check of the function before wiring it into Gradio

# Sample input from the user
in_prompt = 'how to lead a healthy life?'
in_max_length = 200

# Generated response
out_response = generate_query_response(in_prompt, in_max_length)
print(out_response)

how to lead a healthy life? The answer is no. You must have a healthy immune system to survive. You must also be able to tolerate certain medicines and the environment. You must also be able to handle stress well. You must also be able to handle your emotions well. You must also be able to handle the stress of life. You must also be able to handle the emotions of others well. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. You must also be able to handle the tedium of life. Finally,
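
The response above repeats one sentence many times, which is common with greedy decoding (what `generate()` does when sampling and beam search are not enabled). The sketch below gathers a few standard `generate()` arguments that are often used to reduce such loops. The specific values are untested assumptions chosen for illustration, and `build_generation_kwargs` is a hypothetical helper.

```python
def build_generation_kwargs(max_new_tokens=200, sample=True):
    """Assemble keyword arguments for `model.generate()`.

    All values here are illustrative starting points, not tuned settings.
    """
    kwargs = {
        "max_new_tokens": int(max_new_tokens),  # counts generated tokens only
        "no_repeat_ngram_size": 3,              # forbid repeating any 3-gram
        "repetition_penalty": 1.2,              # penalise already generated tokens
        "num_return_sequences": 1,
    }
    if sample:
        # Sampling trades determinism for variety.
        kwargs.update({"do_sample": True, "top_p": 0.9, "temperature": 0.7})
    return kwargs


example_kwargs = build_generation_kwargs(max_new_tokens=150)
print(example_kwargs)
```

These could be passed as `model.generate(input_ids, attention_mask=attention_mask, pad_token_id=pad_token_id, **build_generation_kwargs())`. Note that `max_new_tokens` limits only the generated part, whereas the `max_length` argument used in the notebook also counts the prompt tokens.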


In [87]:
# Gradio interface to generate UI link
iface = gr.Interface(
    fn=generate_query_response,
    inputs="textbox",
    outputs="textbox",
    title="Medical Question Answering Bot",
    description="via gradio",
    allow_flagging="never",
)

In [88]:
iface.launch()

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://d26e91eb501e77ebee.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
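
The interface above exposes only the prompt, so `max_length` always falls back to its default of 200, although the exercise asks for the response length as a second input. A minimal sketch with a text box and a slider follows. `build_interface` and `to_token_count` are hypothetical names, the slider bounds are arbitrary assumptions, and `gradio` is imported inside the function so that the small helper can be tried without it.

```python
def to_token_count(value, low=20, high=500):
    """Coerce a UI value (possibly a float) to an int within [low, high]."""
    return max(low, min(high, int(round(float(value)))))


def build_interface(generate_fn):
    """Wrap generate_fn(prompt, max_length) in a two-input Gradio UI."""
    import gradio as gr  # lazy import: only needed when building the UI

    def respond(prompt, max_length):
        # Slider values may arrive as floats; generate() expects an int.
        return generate_fn(prompt, to_token_count(max_length))

    return gr.Interface(
        fn=respond,
        inputs=[
            gr.Textbox(label="Question", lines=2),
            gr.Slider(minimum=20, maximum=500, value=200, step=10,
                      label="Max response length (tokens)"),
        ],
        outputs=gr.Textbox(label="Response"),
        title="Medical Question Answering Bot",
    )
```

With the function defined earlier, `build_interface(generate_query_response).launch()` would start the two-input version of the app.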




## Upload your Gradio application on Hugging Face Spaces

**Exercise 15: Upload your Gradio application on Hugging Face Spaces [2 Marks]**

1. Start a new Hugging Face Space by going to your profile and [clicking "New Space"](https://huggingface.co/new-space)

2. Provide details for your space:
    - Space name
    - License (eg. [MIT](https://opensource.org/licenses/MIT))
    - Space SDK (software development kit) (eg. `Gradio`)
    - Space hardware (CPU basic)
    - Choose whether your Space is public or private
    - Click "Create Space"

3. Use the ***Add files -> Create a new file*** option to add the files below:
    - `requirements.txt`: should list the dependencies needed to run your app, such as transformers, torch, and gradio
    - `app.py`: should contain steps to
        - import the required packages
        - load your fine-tuned model and tokenizer from the Model Hub
        - define a function that uses your fine-tuned model for response generation
        - create the input and output Gradio elements
        - create a Gradio interface object
        - launch the interface to generate the UI

4. Open the `App` tab of your repository to watch the build progress (check the build logs and debug if errors occur)

5. Once the app has built successfully, test the application running on your Space with a user input prompt
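
The same files can also be prepared locally and uploaded from Python instead of through the web editor. The sketch below is one possible approach, assuming you are already logged in with a write token. `write_space_files` and `upload_space` are hypothetical helpers and the Space id in the usage note is a placeholder; `HfApi.create_repo` and `HfApi.upload_file` come from `huggingface_hub`.

```python
import tempfile
from pathlib import Path

REQUIREMENTS = ["gradio", "torch", "transformers"]


def write_space_files(target_dir, app_source):
    """Write requirements.txt and app.py into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    req_path = target / "requirements.txt"
    app_path = target / "app.py"
    req_path.write_text("\n".join(REQUIREMENTS) + "\n")
    app_path.write_text(app_source)
    return [req_path, app_path]


def upload_space(space_id, paths):
    """Create a Gradio Space (if missing) and upload the given files to it."""
    from huggingface_hub import HfApi  # lazy import; requires a prior login

    api = HfApi()
    api.create_repo(repo_id=space_id, repo_type="space",
                    space_sdk="gradio", exist_ok=True)
    for path in paths:
        api.upload_file(path_or_fileobj=str(path), path_in_repo=Path(path).name,
                        repo_id=space_id, repo_type="space")


# Local dry run: write the files to a temporary folder and inspect them.
with tempfile.TemporaryDirectory() as tmp:
    written = write_space_files(tmp, "print('hello')\n")
    names = [p.name for p in written]
    requirements_text = written[0].read_text()
print(names)
```

After a login, `upload_space("YOUR-USER-NAME/YOUR-SPACE-NAME", paths)` would push the files returned by `write_space_files`, and the Space would rebuild on its own.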



Requirements file

In [None]:
# requirements.txt

gradio
torch
transformers

App file

In [None]:
# app.py

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

username = "kevalkamani"
my_repo = "medquad-finetuned-gpt2"
my_checkpoint = username + '/' + my_repo

# AutoModelForCausalLM replaces the deprecated AutoModelWithLMHead for GPT-2
loaded_model = AutoModelForCausalLM.from_pretrained(my_checkpoint)
loaded_tokenizer = AutoTokenizer.from_pretrained(my_checkpoint)



def generate_query_response(prompt, max_length=200):

    model = loaded_model
    tokenizer = loaded_tokenizer

    input_ids = tokenizer.encode(prompt, return_tensors="pt")      # 'pt' for returning pytorch tensor

    # Check the device of the model
    device = next(model.parameters()).device

    # Move input_ids to the same device as the model
    input_ids = input_ids.to(device)

    # Create the attention mask and pad token id
    attention_mask = torch.ones_like(input_ids)
    pad_token_id = tokenizer.eos_token_id

    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        attention_mask=attention_mask,
        pad_token_id=pad_token_id
    )

    return tokenizer.decode(output[0], skip_special_tokens=True)

medquad = gr.Interface(fn=generate_query_response,
                    inputs="text",
                    outputs="text",
                    title="Medical Question Answering Bot",
                    description="built via gradio")
medquad.launch()

HuggingFace Spaces link

https://huggingface.co/spaces/kevalkamani/medquad_finetuned_gpt2_spaces