# LeaBot

This is a chatbot designed to assist expectant mothers with access to information on pregnancy, postnatal care, and antenal care.

The chatbot is implemented using a pre-trained transformer model.

**STEPS**

1. Collecting and loading dataset containing conversational pairs around maternal health

2. Data Preprocessing; tokenisation, normalization, handling missing values.

3. Selecting pre-trained Transformer model and fine tuning.

4. Hyperparameter tuning; learning rate adjustments, batch size, optimizer selection, training epochs.

5. Evaluation using appropriate NLP metrics eg BLEU score, F1 score, perplixity.

**NOTE:** The chatbot will correctly answer relevant questions and will reject out of domain queries.

# 1. Loading the dataset.

This section focuses on **loading the dataset** that will be used to train and evaluate the LeaBot chatbot.

The dataset, *which contains conversational pairs related to maternal health*, is sourced from Hugging Face's `datasets` library. It is publicly available and can be accessed using the following identifier: `nyarkssss/maternal_1k`.

The `datasets` library will be utilized to efficiently load this data into our Colab environment.

We will then perform exploratory data analysis to gain insights into its structure and content.

In [1]:
# Install the datasets library
!pip install datasets



In [2]:
!pip install evaluate



In [3]:
# Importing libraries
import pandas as pd
import tensorflow as tf
from transformers import (
    T5Tokenizer,
    T5ForConditionalGeneration,
    TrainingArguments,
    Trainer
)
from datasets import load_dataset, Dataset
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datasets
from sklearn.model_selection import train_test_split
from evaluate import load
import os
import torch


In [4]:
# Loading the dataset from Datasets
ds = load_dataset("nyarkssss/maternal_1k")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


# 2. Exploring the Data

In [5]:
# Display the Dataset object to inspect structure and content
ds

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'domain', 'context'],
        num_rows: 1035
    })
})



---


Our dataset contains 1035 rows and is only split into train with features: question(questions relating to maternal health), answer(corresponding answers to the questions), domain(the area of maternal health) and context.


---



In [6]:
print(f"Original dataset size: {ds['train'].num_rows}")

Original dataset size: 1035


In [7]:
# Exploring the datatypes of each column
train_data = ds['train']
print(train_data.features)

{'question': Value(dtype='string', id=None), 'answer': Value(dtype='string', id=None), 'domain': Value(dtype='string', id=None), 'context': Value(dtype='string', id=None)}



```
# All columns are of datatype "string"
```

In [8]:
# Display row 5 from training dataset to get the content
print(train_data[5])

{'question': 'What are early signs of hunger in a baby?', 'answer': 'Early signs of hunger include rooting reflex (turning head and opening mouth), sucking on hands or fingers, making sucking motions with the mouth, and becoming more alert and active. Crying is a late hunger sign and can make latching more difficult.', 'domain': 'Baby Care', 'context': 'Postpartum'}


# 3. Data Preprocessing

**Check for missing values**

In [9]:
# Initialize a dictionary to store missing value counts for each column.
missing_counts = {col: 0 for col in train_data.features}

# Iterate through each data point in the training dataset.
for example in train_data:
    # Iterate through each column in the current data point.
    for col in train_data.features:
        # Check if the value in the current column is None
        if example[col] is None:
            # If missing, increment the count for that column in the dictionary.
            missing_counts[col] += 1

# Print the missing value counts for each column.
print(missing_counts)

{'question': 0, 'answer': 0, 'domain': 0, 'context': 1}




---


There is 1 missing value in the column 'context'. The rest of the columns have no missing values.

Let's check where the missing value is appearning to know how to handle it


---



In [10]:
# Convert the Dataset to pandas DataFrame
train_df = pd.DataFrame(train_data)
# Check for the row index of the missing value
missing_row_index = train_df[train_df['context'].isnull()].index[0]
# Prints the row
print(f"Missing value in 'context' found at row index: {missing_row_index}")

Missing value in 'context' found at row index: 8




---


The missing value is in row index 8, we will check the whole row


---



In [11]:
# prints row 8 with the missing context
print(train_data[8])

{'question': 'What should I do if my baby falls asleep too soon while breastfeeding?', 'answer': 'If your baby becomes too sleepy while feeding, try stroking the bottom of the baby’s foot, sitting baby up for burping, loosening baby’s clothing, changing breastfeeding positions, or gently squeezing and massaging the breast to encourage milk flow.', 'domain': 'Baby Care', 'context': None}




---


From the 5th row, we had the same domain which was baby care and the context was postpartum, so we can as well use that context postpartum in row 8.


---



In [12]:
# Convert train_data to pandas Dataframe
train_df = pd.DataFrame(train_data)

# Using the same context in row 5 to fill in the missing context
train_df.loc[missing_row_index, 'context'] = train_df.loc[5, 'context']

# Convert back to DatasetDict
train_data = datasets.Dataset.from_pandas(train_df)

In [13]:
# Confirm if the missing data was handled
# Initialize a dictionary to store missing value counts for each column.
missing_counts = {col: 0 for col in train_data.features}

# Iterate through each data point in the training dataset.
for example in train_data:
    # Iterate through each column in the current data point.
    for col in train_data.features:
        # Check if the value in the current column is None
        if example[col] is None:
            # If missing, increment the count for that column in the dictionary.
            missing_counts[col] += 1

# Print the missing value counts for each column.
print(missing_counts)

{'question': 0, 'answer': 0, 'domain': 0, 'context': 0}


**Check for duplicates**

On reviewing the dataset, I found duplicate questions with varying answers.

Check and confirm that then handle by keeping only first instance.

In [14]:
# Convert the dataset to a Pandas DataFrame
train_df = pd.DataFrame(train_data)

# Group by 'question' and check if there are multiple unique answers
duplicate_questions = train_df.groupby('question')['answer'].nunique()
duplicate_questions = duplicate_questions[duplicate_questions > 1]

# Print the duplicate questions with different answers
for question, count in duplicate_questions.items():
    print(f"Question: {question}")
    print(f"Number of Different Answers: {count}")
    print(train_df[train_df['question'] == question][['question', 'answer']])
    print("-" * 20) # Line separator

Question: How can I manage constipation during pregnancy?
Number of Different Answers: 2
                                            question  \
385  How can I manage constipation during pregnancy?   
937  How can I manage constipation during pregnancy?   

                                                answer  
385  Eat fiber-rich foods like fruits, vegetables, ...  
937  Ensure your diet includes plenty of fresh frui...  
--------------------
Question: How can I relieve abdominal and groin pain during pregnancy?
Number of Different Answers: 2
                                              question  \
898  How can I relieve abdominal and groin pain dur...   
929  How can I relieve abdominal and groin pain dur...   

                                                answer  
898  You can relieve abdominal and groin pain by ly...  
929  You can lie on your side with your knees and h...  
--------------------
Question: How can I relieve leg cramps during pregnancy?
Number of Different Answ

In [15]:
# Drop duplicates, keeping only the first instance of each question
train_df = train_df.drop_duplicates(subset=['question'], keep='first')

# Convert back to a Hugging Face Dataset
train_data = Dataset.from_pandas(train_df)



---


Now, there are no missing values. nor duplicates. We can perform other preprocessing steps; tokenization and augmentation.

But first let's define the model and tokenizer.


---



In [16]:
# Defines the model and tokenizer
tokenizer = T5Tokenizer.from_pretrained('google-t5/t5-base')
model = T5ForConditionalGeneration.from_pretrained('google-t5/t5-base')


You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565



**Check the distribution of token lengths and maximum values in the dataset to determine a suitable value for padding and truncation.**

In [17]:
# Function to calculate the distribution of sequence lengths
def calculate_length_distribution(dataset, tokenizer):
    input_lengths = [len(tokenizer(q)["input_ids"]) for q in dataset["question"]]
    output_lengths = [len(tokenizer(a)["input_ids"]) for a in dataset["answer"]]

    # Calculate percentiles
    input_90th = np.percentile(input_lengths, 90)
    input_95th = np.percentile(input_lengths, 95)
    input_99th = np.percentile(input_lengths, 99)

    output_90th = np.percentile(output_lengths, 90)
    output_95th = np.percentile(output_lengths, 95)
    output_99th = np.percentile(output_lengths, 99)

    return {
        "input_length_percentiles": (input_90th, input_95th, input_99th),
        "output_length_percentiles": (output_90th, output_95th, output_99th)
    }

# Calculate and display the distributions before back translation
distributions = calculate_length_distribution(train_data, tokenizer)
print("Input Length Percentiles (90th, 95th, 99th):", distributions["input_length_percentiles"])
print("Output Length Percentiles (90th, 95th, 99th):", distributions["output_length_percentiles"])


Input Length Percentiles (90th, 95th, 99th): (20.0, 23.0, 28.0)
Output Length Percentiles (90th, 95th, 99th): (65.0, 78.94999999999993, 117.57999999999993)


In [18]:
def get_max_token_length(dataset, field):
    """
    Calculates the maximum token length for a specific field in a dataset.

    Args:
        dataset: The Hugging Face dataset.
        field: The field name (e.g., 'question', 'answer').

    Returns:
        The maximum token length.
    """
    max_length = 0
    for example in dataset:
        tokenized_text = tokenizer(example[field])
        length = len(tokenized_text['input_ids'])
        if length > max_length:
            max_length = length
    return max_length

# Find maximum token lengths for questions and answers
max_question_length = get_max_token_length(train_data, 'question')
max_answer_length = get_max_token_length(train_data, 'answer')

print(f"Maximum token length for questions: {max_question_length}")
print(f"Maximum token length for answers: {max_answer_length}")

Maximum token length for questions: 39
Maximum token length for answers: 237




**For the inputs (questions):**

**90%** of the questions are *20 tokens or fewer*.
**95%** are *23 tokens or fewer*.
**99%** are *28 tokens or fewer*.

In addition, the max length is 39.

Therefore, using a max_length of **45** for inputs should be suitable.

**For the outputs (answers):**

**90%** of the answers are 65 tokens or fewer.
**95%** are 78 tokens or fewer.
**99%** are 117 tokens or fewer.

The maximum token length for answers is 237.

Hence, using a max_length of **256** for outputs is appropriate, as it covers nearly all answers without truncating any significant content.


---



# Data Augmentation: Back translation

Since our dataset currently only includes a training split, we need to further split it into a test (validation) set. This will reduce the size of the training set and result in fewer variations for our chatbot to learn from.

To address this, we use back translation as a data augmentation technique. This involves translating both the questions(input) into a different language (in this case, French) and then translating them back into English. This process generates additional samples and introduces subtle variations in the wording, which helps increase the diversity of the training data and improve the model's robustness.

In [19]:
!pip install nltk==3.8.1 googletrans==4.0.0-rc1



In [20]:
import nltk
import random
from nltk.corpus import wordnet
from googletrans import Translator

nltk.download('punkt')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [21]:
def back_translate(text, target_lang='fr'):
  # Initialize Google translator API
  translator = Translator()

  # Translate input to target language(French)
  translated = translator.translate(text, dest=target_lang).text

  # Translate back to English
  back_translated = translator.translate(translated, dest='en').text

  # return the back-translated text
  return back_translated

In [22]:
def augment_data_with_back_translation(dataset, num_augmented_samples=1000):
    """
    Augments the dataset using back translation.

    Args:
        dataset: The original dataset to augment.
        num_augmented_samples: The number of augmented samples to generate.

    Returns:
        A new dataset with augmented samples.
    """

    augmented_samples = []
    # List to store examples for inspection
    examples_for_inspection = []

    for _ in range(num_augmented_samples):
        # Randomly select a sample from the dataset
        random_index = random.randint(0, len(dataset) - 1)
        sample = dataset[random_index]

        # Apply back translation to the 'question' field
        augmented_question = back_translate(sample['question'])

        # Create a new augmented sample
        augmented_sample = {
            'question': augmented_question,
            'answer': sample['answer'],
            'domain': sample['domain'],
            'context': sample['context']
        }
        augmented_samples.append(augmented_sample)

        # Store the original and augmented questions for inspection
        examples_for_inspection.append({
            'original_question': sample['question'],
            'augmented_question': augmented_question
        })

    # Create a new dataset with the augmented samples
    augmented_dataset = datasets.Dataset.from_pandas(pd.DataFrame(augmented_samples))
    augmented_data = datasets.concatenate_datasets([dataset, augmented_dataset])

    # Print or inspect examples
    print("Examples of Original and Augmented Questions:")
    for example in examples_for_inspection[:5]:  # Print first 5 examples
        print(f"Original: {example['original_question']}")
        print(f"Augmented: {example['augmented_question']}")
        print("-" * 20)

    return augmented_data

# Tokenization

Here we will have a function that will use the T5Tokenizer to tokenizer our inputs and labels.

In [23]:
# Function for tokenization
def tokenize_data(examples):
    """
    Tokenizes the input dataset for training a T5 model.

    Args:
        examples (dict): A batch of data points from a Hugging Face dataset.
                         Each batch contains a list of "question" and "answer" pairs.

    Returns:
        dict: A dictionary containing tokenized input sequences and labels,
              formatted for model training.
    """

    # Format the input text as "chatbot: <question>"
    inputs = [f"chatbot: {q}" for q in examples["question"]]

    # Extract the corresponding target (answer)
    targets = [a for a in examples["answer"]]

    # Tokenize the inputs (questions)
    # Truncation ensuring the input doesn't exceed `max_length=128`
    # Padding ensures all inputs have the same length
    model_inputs = tokenizer(inputs, max_length=45, truncation=True, padding="max_length")

    # Tokenize the target (labels)
    # Truncation and padding are applied just like with the inputs
    labels = tokenizer(targets, max_length=256, truncation=True, padding="max_length")

    # Add the tokenized target labels to the input dictionary
    model_inputs["labels"] = labels["input_ids"]

    return model_inputs


# Splitting

Here we will split our dataset to train and validation sets.

The train set will be augmented adding 500 extra samples.

The sets are then passed to the preprocess function for tokenization.

In [25]:
# 80-20 train split on training dataset
ds = train_data.train_test_split(test_size=0.2, seed=42)

# Apply augmentation to the training set
train_dataset = augment_data_with_back_translation(ds['train'], num_augmented_samples=500)




Examples of Original and Augmented Questions:
Original: What should I do if my nipples are sore from breastfeeding?
Augmented: What should I do if my nipples are painful from breastfeeding?
--------------------
Original: I’ve noticed some cramping and pulling sensations in my lower abdomen. Is this normal, or should I be worried?
Augmented: I noticed feelings of cramps and shooting at the bottom of the abdomen. Is it normal or should I worry?
--------------------
Original: What are the danger signs for mothers and newborns after childbirth?
Augmented: What are the signs of danger for mothers and newborns after childbirth?
--------------------
Original: I have persistent lower back pain after childbirth. Is this normal?
Augmented: I have persistent pain in the lower back after childbirth. Is it normal?
--------------------
Original: What are common postpartum perineal changes and how can I care for them?
Augmented: What are the common postpartum perineal changes and how can I take care 

From the few examples, the augmented questions do make sense and we can proceed.

In [26]:
# Apply (tokenization) to the train and val sets
train_dataset = train_dataset.map(tokenize_data, batched=True)
val_dataset = ds['test'].map(tokenize_data, batched=True)

Map:   0%|          | 0/1317 [00:00<?, ? examples/s]

Map:   0%|          | 0/205 [00:00<?, ? examples/s]

In [27]:
# Print dataset sizes
print(f"Train set size: {train_dataset.num_rows}")
print(f"Validation set size: {val_dataset.num_rows}")


Train set size: 1317
Validation set size: 205


Since we are using T5 model, normalisation is not required as T5 uses subword tokenization and is also pretrained on massive dataset allowing the model learn words and different forms reducing need for normalization.

# 4. Building the model

Learning rates: 3e-5, 5e-5, 1e-4, 3e-5, 3e-6, 1e-5.



In [30]:
training_args = TrainingArguments(
    report_to="none",
    output_dir="./maternal_bot",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=1e-4,
    num_train_epochs=6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    logging_strategy="steps",
    logging_steps=10,
    weight_decay=3e-4,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    eval_steps=500,
    save_total_limit=1,
    fp16=True
)



In [None]:
# # Define DataCollator
# from transformers import DataCollatorForSeq2Seq

# data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

In [31]:
# Load evaluation metrics
bleu = load("bleu")
f1_metric = load("f1")
accuracy = load("accuracy")


Downloading builder script:   0%|          | 0.00/5.94k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/1.55k [00:00<?, ?B/s]

Downloading extra modules:   0%|          | 0.00/3.34k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/6.79k [00:00<?, ?B/s]

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]

In [32]:
# Preprocess logits before calculating metrics
def preprocess_logits_for_metrics(logits, labels):
  # Get first element of the logit
  logits = logits[0]
  # Apply softmax to convert logits to probabilities
  probabilities = torch.nn.functional.softmax(logits, dim=-1)
  return torch.max(probabilities, dim=-1)
  # return torch.stack((max_values, max_indices), dim=-1)

In [33]:
def calculate_perplexity(preds):
  probs, ids = preds

  # n is the sequence size
  n = probs.shape[1]

  # calculate the logs of probabilities
  log_probs = np.log(probs)

  # calculate the mean of all probabilities
  avg_log_prob = probs.mean()

  # calculate the mean perplexity of all training batches
  perplexity = np.exp(-avg_log_prob)

  return perplexity

In [34]:
def compute_metrics(eval_pred):
  # Get the predictions and labels from eval_pred
  predictions, labels = eval_pred

  # Get the word_ids and probabilities from predictions
  probs, word_ids = predictions

  # Flatten word_ids and labels
  flat_word_ids = word_ids.flatten()
  flat_labels = labels.flatten()

  # Decode the word_ids into actual words
  candidates = tokenizer.batch_decode(word_ids, skip_special_tokens=True)
  references = tokenizer.batch_decode(labels, skip_special_tokens=True)

  # Each training example can have multiple references, so references is a 2D array
  references = [[l] for l in references]

  # Compute BLEU
  bleu_score = bleu.compute(predictions=candidates, references=references)["bleu"]

  # Compute Accuracy
  accuracy_score = accuracy.compute(predictions=flat_word_ids, references=flat_labels)["accuracy"]

  # Compute F1 Score (micro)
  f1_score = f1_metric.compute(predictions=flat_word_ids, references=flat_labels, average="micro")["f1"]

  # Compute Perplexity score manually since the perplexity.compute function only supports
  # causal language models and not encode-decoder models like T5
  perplexity_score = calculate_perplexity(predictions)

  return {
      "accuracy": accuracy_score,
      "bleu": bleu_score,
      "f1": f1_score,
      "perplexity": perplexity_score
  }


In [35]:
from transformers import EarlyStoppingCallback

trainer = Trainer (
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
    #callbacks=[EarlyStoppingCallback(early_stopping_patience=3)]
)

  trainer = Trainer (


In [36]:
trainer.train()

Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_legacy_cache(past_key_values)`.


Epoch,Training Loss,Validation Loss,Accuracy,Bleu,F1,Perplexity
1,0.4415,0.381344,0.917264,0.122384,0.917264,0.397891
2,0.3105,0.370747,0.919912,0.135976,0.919912,0.39391
3,0.3485,0.366281,0.921037,0.144078,0.921037,0.392327
4,0.3116,0.368665,0.92237,0.154075,0.92237,0.390799
5,0.239,0.370795,0.922313,0.156996,0.922313,0.390003
6,0.2339,0.373345,0.922142,0.158804,0.922142,0.389452


There were missing keys in the checkpoint model loaded: ['encoder.embed_tokens.weight', 'decoder.embed_tokens.weight', 'lm_head.weight'].


TrainOutput(global_step=1980, training_loss=0.37330904151454114, metrics={'train_runtime': 744.4465, 'train_samples_per_second': 10.615, 'train_steps_per_second': 2.66, 'total_flos': 422928391219200.0, 'train_loss': 0.37330904151454114, 'epoch': 6.0})

# 5. Evaluating the model

1.  **BLEU Score:** A higher bleu score indicates a better performing model.

2.   **PerplexitY:** A low perplexity score indicates better language modeling and understanding.

4. **Accuracy**

5. **Loss**




```
compares the generated text (predictions) against the expected
or ground truth text (references) and returns a BLEU score, which is a
common metric for evaluating text generation quality. Higher BLEU scores
generally indicate better text quality and closer alignment with the
expected outputs.
```



In [37]:
eval_results = trainer.evaluate()
print(f"Evaluation results: {eval_results}")

Evaluation results: {'eval_loss': 0.36628085374832153, 'eval_accuracy': 0.9210365853658536, 'eval_bleu': 0.14407771287977986, 'eval_f1': 0.9210365853658536, 'eval_perplexity': 0.3923273980617523, 'eval_runtime': 9.2817, 'eval_samples_per_second': 22.086, 'eval_steps_per_second': 5.602, 'epoch': 6.0}


In [38]:
# Save model and tokenizer
model.save_pretrained("./t5_maternal_bot1")
tokenizer.save_pretrained("./t5_maternal_bot1")

('./t5_maternal_bot1/tokenizer_config.json',
 './t5_maternal_bot1/special_tokens_map.json',
 './t5_maternal_bot1/spiece.model',
 './t5_maternal_bot1/added_tokens.json')

In [60]:
def generate_response(question):
    """
    Generates responses from the chatbot for a given question
    """
    # Handle empty input
    if not question.strip():
        return "Please enter a valid question."

    # limits maternal related questions
    maternal_keywords = ["pregnancy", "baby", "birth", "mother", "preterm",
                         'breast', 'miscarriage', 'pregnant', 'fertility',
                         'fertile', 'abortion', 'malnourished', 'ovulation',
                         'menstrual cycle', 'menstruation', 'stillbirth',
                         "antenatal", 'postnatal', 'doctor', 'nurse', 'babycare',
                         "labor", "postpartum", "maternal", "neonatal"]
    if not any(keyword in question.lower() for keyword in maternal_keywords):
        return "Sorry, I can only answer maternal health-related questions."

    # format the input
    input_text = f"question: {question}"

    # Encodes the input text using tokenizer, specifying PyTorch tensors
    input_ids = tokenizer.encode(input_text, return_tensors="pt")

    # Move input_ids to the same device as the model
    # If using GPU, use: device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    input_ids = input_ids.to(device)
    model.to(device)

    # Generate output IDs using model
    output_ids = model.generate(
        input_ids,
        max_new_tokens=120,
        do_sample=True,
        top_k=45,
        temperature=0.9, # controls randomness of generated text
        top_p=0.8, # Controls diversity of generated text
        repetition_penalty=1.3, # Discourages model from repeating same words

        )

    # Decodes output IDs back to text
    response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Condition to handle empty generated responses
    # If the generated response is empty, replace it with a message
    if not response:
        response = "I am unable to answer that question."
    return response

In [70]:
print("Chatbot: Hello! Ask me anything.")

while True:
    # Get user input
    user_input = input("You: ")

    # Exit condition
    if user_input.lower() in ['quit', 'exit', 'bye']:
        print("Chatbot: Goodbye!")
        break

    # Get the chatbot response
    response = generate_response(user_input)

    # Print the response
    print(f"Chatbot: {response}")


Chatbot: Hello! Ask me anything.
You: What is the definition of miscarriage?
Chatbot: miscarriage is the loss of a baby by a faulty pregnancy or by a defective pregnancy.
You: Why is breast milk good food for the baby?
Chatbot: Breast milk is good for the baby due to its low calorie content and excellent protein levels.
You: Can I have sex when pregnant?
Chatbot: Yes, sex is legal in pregnancy
You: How do I manage constipation when pregnant?
Chatbot: Using a bowel spray, a bowel rinse, and a swollen belly button are the best ways to manage constipation
You: What is the importance of rest and sleep after child birth?
Chatbot: Rest and sleep after childbirth are extremely important for preventing infections, maintaining good health, and maintaining a healthy weight.
You: Why is hand washing good for mothers?
Chatbot: Hand washing helps prevent colds, flu, and bacterial infections.
You: Hello?
Chatbot: Sorry, I can only answer maternal health-related questions.
You: Is using a tampon safe

In [39]:
!zip -r t5_maternal_bot1.zip t5_maternal_bot1
from google.colab import files
files.download("t5_maternal_bot1.zip")

  adding: t5_maternal_bot1/ (stored 0%)
  adding: t5_maternal_bot1/tokenizer_config.json (deflated 94%)
  adding: t5_maternal_bot1/generation_config.json (deflated 29%)
  adding: t5_maternal_bot1/config.json (deflated 62%)
  adding: t5_maternal_bot1/added_tokens.json (deflated 83%)
  adding: t5_maternal_bot1/model.safetensors (deflated 8%)
  adding: t5_maternal_bot1/special_tokens_map.json (deflated 85%)
  adding: t5_maternal_bot1/spiece.model (deflated 48%)


<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

In [None]:
# Evaluate Model

# Get predictions for all questions in test set
predictions = [generate_response(q) for q in val_dataset["question"]]

# Get correct answers from the set
references = val_dataset["answer"]

# Print some examples for comparison
for i in range(5):  # Print 10 examples
    print(f"Question: {val_dataset['question'][i]}")
    print(f"Predicted Answer: {predictions[i]}")
    print(f"Expected Answer: {references[i]}")
    print("-" * 20)

