# T5 - Question answering with indexing of answers

In this notebook, we implement a pipeline that uses [T5](https://github.com/google-research/text-to-text-transfer-transformer) (Text-To-Text Transfer Transformer) on the Medical Meadow Anki flashcards dataset. We use the [flan-t5-large](https://huggingface.co/google/flan-t5-large) checkpoint. \
To perform the task we will:
- encode all the answers in the dataset as embeddings, using
[multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1)
- given a user question, first retrieve the answers whose embeddings are most similar to it, then use the 5 best of them as context to answer.
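
As a rough illustration of the retrieval idea, here is a toy sketch that does not use the notebook's actual models. The 3-dimensional vectors are hand-made stand-ins for sentence embeddings, and `cosine_similarity` and `top_k_by_cosine` are hypothetical helper names introduced only for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_by_cosine(query_vec, corpus_vecs, k):
    """Return the indices of the k corpus vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hand-made toy "embeddings" (assumption: the real ones have 384 dimensions)
toy_answers = ["answer about cortisol", "answer about asthma", "answer about insulin"]
toy_vectors = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.0, 0.4]]
toy_question_vector = [1.0, 0.0, 0.1]

# Keep the 2 closest answers and join them into a context string
best = top_k_by_cosine(toy_question_vector, toy_vectors, k=2)
context = "\n".join(toy_answers[i] for i in best)
```

The notebook replaces this brute-force scan with an approximate nearest-neighbour index, which scales better to tens of thousands of answers.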

## 0) Imports, loading datasets and models

In [None]:
!pip -q install -U transformers sentence-transformers
import numpy as np

In [None]:
dataset_filepath = './medical_meadow_wikidoc_medical_flashcards.json'

In [None]:
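# Load dataset
import json

# The answers contain non-ASCII characters (e.g. "ü"), so read the file as UTF-8
with open(dataset_filepath, "r", encoding="utf-8") as f:
    dataset = json.load(f)

# Keep only the answer text of each flashcard
answers = [data['output'] for data in dataset]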

print(f"Retrieved {len(answers)} answers")
answers[:10]

Retrieved 33955 answers


['Very low Mg2+ levels correspond to low PTH levels which in turn results in low Ca2+ levels.',
 'Low estradiol production leads to genitourinary syndrome of menopause (atrophic vaginitis).',
 'Low REM sleep latency and experiencing hallucinations/sleep paralysis suggests narcolepsy.',
 'PTH-independent hypercalcemia, which can be caused by cancer, granulomatous disease, or vitamin D intoxication.',
 'The level of anti-müllerian hormone is directly related to ovarian reserve - a lower level indicates a lower ovarian reserve.',
 'Low Mobility and bulging of TM is suggestive of Acute otitis media.',
 'Low glucose and high C-peptide levels can be caused by an insulinoma or the use of sulfonylurea drugs.',
 'Insulinoma or sulfonylurea drugs can cause low Glucose and high C-peptide levels.',
 'Low Ejection fraction is commonly associated with systolic dysfunction.',
 'Emphysema is associated with low DLCO.']

In [None]:
# Import models
from sentence_transformers import SentenceTransformer, CrossEncoder

# sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and was designed for semantic search.
# trained on 215M (question, answer) pairs from diverse sources
semb_model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

# cross-encoder re-ranking model: it scores each (query, passage) pair for relevance, so candidate passages can be sorted in decreasing order of score.
xenc_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')



### 1.1) Compute embeddings
We now embed all the answers in the dataset (checkpointing the results).

In [None]:
import os
import pickle

# Define embeddings cache path
embeddings_cache_path = './qa_embeddings_cache.pkl'

# Load cache if available
if os.path.exists(embeddings_cache_path):
    print('Loading embeddings cache')
    with open(embeddings_cache_path, 'rb') as f:
        corpus_embeddings = pickle.load(f)
# Else compute embeddings
else:
    print('Computing embeddings')
    corpus_embeddings = semb_model.encode(answers, convert_to_tensor=True, show_progress_bar=True)
    # Save the embeddings to a file for future loading
    print(f'Saving embeddings to: \'{embeddings_cache_path}\'')
    with open(embeddings_cache_path, 'wb') as f:
        pickle.dump(corpus_embeddings, f)

Loading embeddings cache
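
The load-or-compute pattern used in the cell above can be factored into a small helper. This is a minimal sketch: `load_or_compute` and `fake_encode` are hypothetical names, and a plain list stands in for the embeddings so that the example runs without any model.

```python
import os
import pickle
import tempfile

def load_or_compute(cache_path, compute_fn):
    """Return the cached object if cache_path exists, else compute and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result

calls = []

def fake_encode():
    """Stand-in for the expensive encoding step; records each invocation."""
    calls.append(1)
    return [[0.1, 0.2], [0.3, 0.4]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cache.pkl")
    first = load_or_compute(path, fake_encode)   # computes and writes the cache
    second = load_or_compute(path, fake_encode)  # reads the cache, no recomputation
```

In the notebook, `compute_fn` would wrap the `semb_model.encode(answers, ...)` call.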


### 1.2) Index the answers


In [None]:
!pip -q install hnswlib

# Index embeddings
import os
import hnswlib

# Create empty index
index = hnswlib.Index(space='cosine', dim=384)

# Define hnswlib index path
index_path = './qa_hnswlib.index'

# Load index if available
if os.path.exists(index_path):
    print('Loading index...')
    index.load_index(index_path)
# Else index data collection
else:
    # Initialise the index
    print('Started creating HNSWLIB index')
    index.init_index(max_elements=corpus_embeddings.size(0), ef_construction=400, M=64)
    #  Compute the HNSWLIB index (it may take a while)
    index.add_items(corpus_embeddings.cpu(), list(range(len(corpus_embeddings))))
    # Save the index to a file for future loading
    print(f'Saving index to: {index_path}')
    index.save_index(index_path)

Loading index...
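
The index above is built with `space='cosine'`. In hnswlib this space reports a distance of 1 minus the cosine similarity, so smaller values mean closer matches. The following is a minimal pure-Python sketch of that quantity; the helper name `cosine_distance` is ours and not part of hnswlib.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal, 2 for opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])       # no shared direction
opposite = cosine_distance([1.0, 0.0], [-2.0, 0.0])        # pointing away from each other
```

Because the distance ignores vector length, only the direction of an embedding matters when ranking neighbours.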


### 2) Loading model and tokenizer

In [None]:
!pip -q install sentencepiece
!pip install accelerate
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-large")
# Move the model to the device selected above, so the cell also runs without a GPU
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large").to(device)  # optionally pass torch_dtype=torch.bfloat16





### 3) Define pipeline
We now define the pipeline with the following steps:
- retrieve the top k (= 64) answers from the index
- re-rank these k answers with the cross-encoder
- take the top 5 answers according to the re-ranking scores and join them together, creating the "context"
- create the input to the T5 model concatenating question and context
- tokenize the input
- generate tokenized output
- decode output
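
The retrieve, re-rank, and prompt-building steps above can be sketched with toy stand-ins. This is an illustration under stated assumptions, not the notebook's implementation: `retrieve` and `rerank` are hypothetical word-overlap functions replacing the hnswlib index and the cross-encoder, and the prompt wording is simplified.

```python
def retrieve(question, corpus, k):
    """Stand-in for the index lookup: rank answers by number of shared words."""
    q_words = set(question.lower().split())
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: len(q_words & set(corpus[i].lower().split())),
        reverse=True,
    )
    return ranked[:k]

def rerank(question, candidates):
    """Stand-in for the cross-encoder: one relevance score per (question, candidate) pair."""
    q_words = set(question.lower().split())
    return [len(q_words & set(c.lower().split())) / len(c.split()) for c in candidates]

def build_prompt(question, corpus, k=3, n_best=2):
    candidate_ids = retrieve(question, corpus, k)
    scores = rerank(question, [corpus[i] for i in candidate_ids])
    # Positions within the candidate list, best score first
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n_best]
    # Map candidate positions back to corpus ids before looking up the text
    context = "\n".join(corpus[candidate_ids[j]] for j in order)
    return f"Given the following facts:\n\n{context}\n\nQuestion: {question}"

toy_corpus = [
    "asthma causes wheezing",
    "cortisol is a hormone",
    "high cortisol causes weight gain and high blood pressure",
    "insulin lowers glucose",
]
prompt = build_prompt("what does high cortisol cause", toy_corpus)
```

Note that the re-ranking scores are indexed by position in the candidate list, so each selected position has to be mapped back to a corpus id; the pipeline does the same when it builds the context.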

In [None]:
# Define the QA pipeline

def qa_pipeline(
    question,
    n_best_answers=5,
    similarity_model=semb_model,
    embeddings_index=index,
    re_ranking_model=xenc_model,
    generative_model=model,
    device=device
):
    if not question.endswith('?'):
        question = question + '?'
    # Embed question (use the models passed as arguments, not the globals)
    question_embedding = similarity_model.encode(question, convert_to_tensor=True)
    # Search the index for the answers closest to the question
    corpus_ids, distances = embeddings_index.knn_query(question_embedding.cpu(), k=64)
    # Re-rank the retrieved candidates
    xenc_model_inputs = [(question, answers[idx]) for idx in corpus_ids[0]]
    cross_scores = re_ranking_model.predict(xenc_model_inputs)
    # Get best matching answers (top_answers_idx holds positions within corpus_ids[0])
    top_answers_idx = np.argsort(-cross_scores)[:n_best_answers]
    context = [answers[corpus_ids[0][idx]] for idx in top_answers_idx]
    context = '\n'.join(context)
    # Encode input
    input_text = f"Given the following facts:\n\n{context}\n\nPlease answer the following question exhaustively, providing comprehensive explanation: {question}"
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to(device)
    # Generate output
    output_ids = generative_model.generate(input_ids, max_new_tokens=64)
    # Decode output
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    # Return result
    return f"Facts:\n\n{context}\n\nQ: {question}\n\nA: {output_text}"

Now we can test the model:

In [None]:
# Try out the model with custom questions

question = input("Ask a (medical related) question >>> ")  # e.g., "What are the causes of asthma?", "What are the symptoms of high levels of cortisol?", ...
print()
print(qa_pipeline(question))


Facts:

The most common endogenous cause of Cushing's syndrome is Cushing's disease, which is characterized by the presence of an ACTH-secreting pituitary adenoma. Cushing's syndrome is a rare disorder that occurs when the body is exposed to high levels of the hormone cortisol for an extended period. Cortisol is a hormone that is produced by the adrenal glands and plays a vital role in regulating metabolism, immune function, and stress response. When cortisol levels are too high, it can lead to a range of symptoms, including weight gain, muscle weakness, high blood pressure, and mood changes. Cushing's disease is responsible for around 70% of all cases of Cushing's syndrome and is more common in women than men. Diagnosis of Cushing's disease may involve blood tests, imaging studies, and a physical exam to evaluate cortisol levels and identify the presence of a pituitary adenoma. Treatment may involve surgery to remove the adenoma, radiation therapy, or medications to lower cortisol le