# Medical Question Answering with LangChain and Hugging Face

This document outlines a Python code snippet that demonstrates a natural language processing (NLP) workflow for medical question-answering using the LangChain framework and Hugging Face's Transformers library. The code sets up components, defines a data processing pipeline, executes the pipeline, and measures the execution time. It is tailored for the MASHQA dataset, which focuses on medical questions and answers.

The document is divided into the following sections:

1. **Setting up Components for Medical Question Answering**: In this section, we initialize the essential components for NLP tasks, such as embeddings, tokenizers, and models, specifically designed for medical question-answering.

2. **Initializing a Text-to-Text Generation Model**: This section introduces the code responsible for initializing a text-to-text generation model tailored for medical questions and answers.

3. **Building a Data Processing Pipeline with RAG Model**: The next section focuses on constructing a data processing pipeline using the LangChain framework and explains the role of each stage within the pipeline for medical question-answering.

4. **Executing the Data Processing Pipeline and Measuring Execution Time**: The final section demonstrates the execution of the data processing pipeline with a specific medical question from the MASHQA dataset and measures the time taken for the pipeline to complete.

These code snippets and explanations offer a comprehensive overview of how to set up and utilize NLP components for medical question-answering, build a processing pipeline, and measure execution time in the context of the MASHQA dataset.

In [1]:
# IMPORTS
import os

# Hide all GPUs so everything runs on the CPU; set this before importing torch
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import json
from operator import itemgetter

import pandas as pd
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    OpenAIGPTModel,
    OpenAIGPTTokenizer,
    pipeline,
)

from langchain import hub
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.embeddings import HuggingFaceEmbeddings, OpenAIEmbeddings
from langchain.llms import HuggingFaceHub, HuggingFacePipeline
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnableLambda, RunnablePassthrough
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS, Chroma

# Run on the CPU only
device = torch.device('cpu')




## Setting up Components for LangChain Modules

This code segment sets up the components that the LangChain framework needs for retrieval-augmented generation over custom data. It performs the following tasks:

1. **Embeddings Model**: Sets the embeddings model name to "alibidaran/medical_transcription_generator". This checkpoint is not packaged as a sentence-transformers model, so the loader wraps it with mean pooling, as the log message after the code reports.

2. **Initializing Embeddings**: Initializes a `HuggingFaceEmbeddings` object with that model.

3. **Vector Store Creation**: Opens a Chroma vector store at `persist_directory` and passes it the embedding function. The directory is assumed to already hold the indexed documents.

4. **Retriever Creation**: Creates a retriever from the vector store, which returns the stored documents most similar to a query.

5. **Prompt Loading**: Pulls the `rlm/rag-prompt` prompt template for Retrieval-Augmented Generation (RAG) with `hub.pull`.

These components are essential for retrieving relevant text from a vector database and generating answers from it.
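
To make the retriever's role concrete, the following is a minimal sketch of similarity search over stored vectors. It is not how Chroma is implemented: the three-dimensional vectors, the `toy_store` list, and the choice of cosine similarity are all assumptions for illustration, and the metric a real vector store uses depends on its configuration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
toy_store = [
    ("chest pain on exertion", [0.9, 0.1, 0.0]),
    ("seasonal allergy relief", [0.0, 0.2, 0.9]),
    ("shortness of breath", [0.7, 0.3, 0.1]),
]
toy_query = [1.0, 0.2, 0.0]

print(top_k(toy_query, toy_store, k=2))
# ['chest pain on exertion', 'shortness of breath']
```

The retriever in this notebook plays the same role: it embeds the question, ranks the stored chunks by similarity, and hands the best matches to the next stage.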

In [2]:
# Define the model name for embeddings
embeddings_model_name = "alibidaran/medical_transcription_generator"

# Initialize an embeddings object using the Hugging Face model specified
embeddings = HuggingFaceEmbeddings(model_name=embeddings_model_name)

# Initialize a vector store using the Chroma library, specifying the persist_directory and the embedding function
vectorstore = Chroma(persist_directory="./vector_stores/vectorstore_train/", embedding_function=embeddings)

# Create a retriever using the vector store
retriever = vectorstore.as_retriever()

# Load a prompt for the RAG (Retrieval-Augmented Generation) model using hub.pull
rag_prompt = hub.pull("rlm/rag-prompt")

No sentence-transformers model found with name /home/balu/.cache/torch/sentence_transformers/alibidaran_medical_transcription_generator. Creating a new one with MEAN pooling.
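
The log message above says that the checkpoint is wrapped with mean pooling. As a rough sketch of what mean pooling does, the function below averages per-token vectors into one sentence vector while skipping padding positions. The function name, the two-dimensional vectors, and the mask values are hypothetical; the library's own implementation works on tensors and may differ in detail.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padding (mask == 0)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

# Two real tokens followed by one padding token (hypothetical values).
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
toy_mask = [1, 1, 0]

print(mean_pool(toy_tokens, toy_mask))  # [2.0, 3.0]
```

Because this model was not trained to produce sentence embeddings, the pooled vectors may retrieve less accurately than those of a model built for semantic search.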


In [3]:
# !pip install accelerate

## Initializing a Text-to-Text Generation Model

In this section, we initialize a text-to-text generation model for natural language processing tasks. The code accomplishes the following:

1. **Model Selection**: Sets the model ID to 'google/flan-t5-small', the model used for text generation.

2. **Tokenizer Initialization**: Loads the pretrained tokenizer that matches the selected model.

3. **Model Initialization**: Loads the text-to-text generation model from the model ID, with `load_in_8bit=False` (no 8-bit quantization) and `device_map='cpu'` (all weights on the CPU).

4. **Pipeline Setup**: Creates a text-to-text generation pipeline from the model and tokenizer. It caps the generated output at 128 tokens (`max_length=128`); the limit counts tokens, not characters.

5. **HuggingFace Pipeline**: Wraps the pipeline in a `HuggingFacePipeline` object so that LangChain can call the model as an LLM.

These steps prepare the model and associated components for text-to-text generation tasks.
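
Generation length limits such as `max_length` count tokens rather than characters. The sketch below illustrates the difference with a hypothetical whitespace tokenizer; the real FLAN-T5 tokenizer splits text into subword pieces, so actual token counts will differ.

```python
def whitespace_tokenize(text):
    """Hypothetical stand-in tokenizer: one token per whitespace-separated word."""
    return text.split()

def cap_tokens(text, max_length):
    """Keep at most max_length tokens, mirroring how a token cap truncates output."""
    return " ".join(whitespace_tokenize(text)[:max_length])

answer = "Chest pain, shortness of breath, and fatigue are common symptoms."
capped = cap_tokens(answer, max_length=5)

print(capped)
print(len(whitespace_tokenize(capped)), "tokens,", len(capped), "characters")
```

A cap of five tokens still leaves several times as many characters, which is why a 128-token limit allows answers well beyond 128 characters.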

In [4]:
# Model ID of the text-to-text generation model
model_id = 'google/flan-t5-small'

# Load the tokenizer that matches the selected model
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model on the CPU without 8-bit quantization
# (device_map requires the accelerate package)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=False, device_map='cpu')

# Create a text-to-text generation pipeline, capping output at 128 tokens.
# Use a new name so the imported pipeline() function is not overwritten.
text2text_pipeline = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=128
)

# Wrap the pipeline in a HuggingFacePipeline object so LangChain can call it as an LLM
hf = HuggingFacePipeline(pipeline=text2text_pipeline)

## Building a Data Processing Pipeline with RAG Model

In this section, we construct a data processing pipeline that involves the use of a Retrieval-Augmented Generation (RAG) model. The code accomplishes the following:

1. **Pipeline Definition**: We define the chain with the LangChain framework, using `|` to pipe each stage's output into the next. The chain has three stages: assembling the prompt inputs, filling in the prompt template, and generating a response.

2. **Retriever**: In the first stage, the `retriever` receives the question and returns the most relevant documents from the vector store. These become the `context`.

3. **Question Passthrough**: In the same stage, `RunnablePassthrough()` forwards the input question unchanged under the `question` key. It does not generate or rewrite the question.

4. **RAG Prompt**: The second stage fills the `rag_prompt` template with the context and the question. `rag_prompt` is a prompt template, not a model.

5. **HuggingFace Pipeline**: The third stage passes the filled prompt to the FLAN-T5 model wrapped in `hf`, which generates the answer.

The result of this pipeline is a generated response based on the provided context and question.
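
The composition can be sketched in plain Python to show how data flows between the stages. Every function below (`fake_retriever`, `passthrough`, `fan_out`, `fake_prompt`, `fake_llm`, `run_chain`) is a hypothetical stand-in, not a LangChain API; the sketch only mirrors the shape of the chain.

```python
def fake_retriever(question):
    """Hypothetical retriever: returns canned context for any question."""
    return "Ischemic heart disease can cause chest pain (angina)."

def passthrough(value):
    """Return the input unchanged, like RunnablePassthrough."""
    return value

def fan_out(branches, value):
    """Run every branch on the same input and collect the results in a dict."""
    return {key: fn(value) for key, fn in branches.items()}

def fake_prompt(inputs):
    """Fill a simple template with the context and the question."""
    return f"Context: {inputs['context']}\nQuestion: {inputs['question']}\nAnswer:"

def fake_llm(prompt_text):
    """Hypothetical model: reports the prompt size instead of generating text."""
    return f"<answer generated from a {len(prompt_text)}-character prompt>"

def run_chain(question):
    """Stage 1: build inputs. Stage 2: fill the prompt. Stage 3: call the model."""
    inputs = fan_out({"context": fake_retriever, "question": passthrough}, question)
    return fake_llm(fake_prompt(inputs))

print(run_chain("What are the symptoms of ischemic heart disease?"))
```

The key point is that the dictionary stage sends the same question to both branches: one branch looks up context, the other leaves the question untouched.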

In [5]:
from langchain.schema.runnable import RunnablePassthrough

# Define the RAG chain with LangChain
rag_chain = (
    # First stage: retrieve context and pass the question through unchanged
    {"context": retriever, "question": RunnablePassthrough()}

    # Second stage: fill the RAG prompt template with context and question
    | rag_prompt

    # Third stage: generate the answer with the wrapped HuggingFace pipeline
    | hf
)
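
In the chain above, the retriever's output goes straight into the prompt's `context` slot, so the list of retrieved documents is converted to a string as-is, including metadata. A common refinement is to join only the document texts first. The sketch below assumes each document exposes its text as `page_content`; the `Doc` class is a hypothetical stand-in for the real document type, and `format_docs` is a helper name chosen for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs):
    """Join document texts with blank lines so the prompt sees clean prose."""
    return "\n\n".join(doc.page_content for doc in docs)

retrieved = [
    Doc("Angina is chest pain caused by reduced blood flow.", {"row": 12}),
    Doc("Shortness of breath may accompany heart disease.", {"row": 48}),
]

print(format_docs(retrieved))
```

In the chain, such a helper would typically sit right after the retriever (for example `retriever | format_docs` as the `context` entry), which keeps metadata out of the prompt and saves tokens for a small model like FLAN-T5.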

## Executing the Data Processing Pipeline and Measuring Execution Time

In this section, we execute the data processing pipeline and measure the execution time. The code accomplishes the following:

1. **Time Measurement Start**: We record the current time before executing the pipeline to measure the time it takes to complete the processing.

2. **Pipeline Execution**: The `rag_chain` is invoked with a specific question, "What are the symptoms of ischemic heart disease?".

3. **Result Output**: We print the result of the pipeline's execution, which is the generated answer string.

4. **Time Measurement End**: We calculate the time taken to execute the pipeline and print the elapsed time.

This code provides a practical example of using the data processing pipeline with a specific question and measures the processing time.
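
A single measurement also includes one-off costs such as warming up the model, so it can overstate the steady-state latency. The sketch below shows one way to repeat a call and keep the best time. The `time_call` helper and the `slow_echo` stand-in are hypothetical; in the notebook, the timed call would be `rag_chain.invoke`.

```python
import time

def time_call(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times; return the last result and the best time in seconds."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

def slow_echo(text):
    """Hypothetical stand-in for a slow pipeline call."""
    time.sleep(0.01)
    return text

result, seconds = time_call(slow_echo, "hello")
print(result, f"{seconds:.3f}s")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock intended for measuring short durations.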

In [7]:
import time

# Record the start time for measuring execution time
start = time.perf_counter()

# Execute the RAG chain with a specific question
result = rag_chain.invoke("What are the symptoms of ischemic heart disease?")

# Print the generated answer
print(result)

# Calculate and print the elapsed time in seconds
print(time.perf_counter() - start)

systolic heart failure
24.305009603500366
