<a href="https://colab.research.google.com/github/oldflag/Symptom_Dx_ICD/blob/main/Symptom_Dx_ICD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Project Overview

In a simple clinical setting, the process is straightforward:


1. A patient visits a doctor’s office and describes their symptoms.
2. The physician writes a diagnosis in the visit note.
3. Medical coders extract ICD codes for the diagnoses from the visit note.

This project aims to simplify and implement this clinical scenario using Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and fine-tuning techniques. It integrates and builds upon my previous two GitHub projects: [FineTuning_Llama_3_8b_Symptom_Dx](https://github.com/oldflag/FineTuning_Llama_3_8b_Symptom_Dx) and [GenAI_RAG_ICDcoding](https://github.com/oldflag/GenAI_RAG_ICDcoding).

## Previous Projects Integration

### FineTuning_Llama_3_8b_Symptom_Dx
In this project, I fine-tuned the Llama 3 8B model on symptom-and-diagnosis Q&A data. The goal was to enable the model to suggest accurate diagnoses from symptom descriptions.

### GenAI_RAG_ICDcoding
In this project, I developed a RAG application using the OpenAI API and a subset of the ICD code descriptions published by CMS. The application retrieves the descriptions most relevant to a clinical note and suggests appropriate ICD codes for it.
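
The retrieve-then-prompt pattern behind this kind of application can be sketched without any external service. The sketch below is only an illustration under stated assumptions: it ranks descriptions by character-trigram overlap rather than by the embedding search a real vector store performs, and the names `trigrams`, `retrieve`, and `build_prompt` are hypothetical, not functions from GenAI_RAG_ICDcoding. The three code descriptions are the ones that appear in the sample output at the end of this notebook.

```python
from typing import Dict, List, Set

# Three ICD-10-CM entries taken from the sample output of this notebook.
ICD_DESCRIPTIONS: Dict[str, str] = {
    "A067": "Cutaneous amebiasis",
    "A073": "Isosporiasis",
    "A074": "Cyclosporiasis",
}


def trigrams(text: str) -> Set[str]:
    """Return the set of lowercase character trigrams in text."""
    cleaned = text.lower()
    return {cleaned[i:i + 3] for i in range(len(cleaned) - 2)}


def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank codes by Jaccard similarity between query and description trigrams."""
    query_grams = trigrams(query)

    def score(code: str) -> float:
        grams = trigrams(ICD_DESCRIPTIONS[code])
        union = query_grams | grams
        return len(query_grams & grams) / len(union) if union else 0.0

    return sorted(ICD_DESCRIPTIONS, key=score, reverse=True)[:k]


def build_prompt(note: str, codes: List[str]) -> str:
    """Place the retrieved descriptions in front of the note as context."""
    context = "\n".join(f"{code}: {ICD_DESCRIPTIONS[code]}" for code in codes)
    return f"Candidate ICD codes:\n{context}\n\nClinical note:\n{note}\n\nRelated ICD codes:"


top_codes = retrieve("psoriasis", k=2)
print(build_prompt("Diagnosis: psoriasis", top_codes))
```

Whatever similarity measure is used, the retriever can only return entries that were indexed, so the quality of the suggested codes depends on how much of the ICD table is loaded.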

## Project Goals

This project combines the strengths of the above two projects to create a comprehensive solution for clinical note processing. The primary objectives are:

* Symptom to Diagnosis: Use the fine-tuned Llama 3 8B model to interpret patient symptoms and suggest probable diagnoses.
* Diagnosis to ICD Coding: Use a RAG approach to extract ICD codes for the diagnoses mentioned in clinical notes.

By integrating these techniques, the project aims to streamline the clinical workflow and improve the efficiency and accuracy of medical coding.
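
The two objectives form a two-stage pipeline, which can be sketched as plain function composition. This is a minimal sketch, not the notebook's implementation: `CodingResult`, `run_pipeline`, and the two placeholder stages are hypothetical names, and the placeholders stand in for the fine-tuned model and the RAG chain so that the example runs without a GPU or an API key.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CodingResult:
    symptoms: str
    diagnosis: str
    icd_answer: str


def run_pipeline(
    symptoms: str,
    diagnose: Callable[[str], str],
    code_diagnosis: Callable[[str], str],
) -> CodingResult:
    """Chain the two stages: symptoms -> diagnosis -> ICD answer."""
    diagnosis = diagnose(symptoms)
    icd_answer = code_diagnosis(diagnosis)
    return CodingResult(symptoms, diagnosis, icd_answer)


# Placeholder stages (assumptions for illustration only).
def fake_diagnose(symptoms: str) -> str:
    return "psoriasis"


def fake_code_diagnosis(diagnosis: str) -> str:
    return f"ICD codes related to {diagnosis}"


result = run_pipeline(
    "dry, itchy skin rash and joint pain", fake_diagnose, fake_code_diagnosis
)
print(result.diagnosis)
print(result.icd_answer)
```

Keeping the stages as interchangeable callables makes it possible to test the wiring separately from the models.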

In [None]:
# install necessary packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes
!pip install langchain openai huggingface_hub langchain_community langchain_openai faiss-gpu

In [33]:
# set up environment variables for Colab

import os
from google.colab import userdata

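# Export the Hugging Face settings so transformers and huggingface_hub can read them
os.environ['HF_TOKEN'] = userdata.get('HF_TOKEN')
os.environ['TRANSFORMERS_CACHE'] = userdata.get('TRANSFORMERS_CACHE')
os.environ['HF_HOME'] = userdata.get('HF_HOME')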
os.environ['LANGCHAIN_TRACING_V2'] = userdata.get('LANGCHAIN_TRACING_V2')
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['LANGCHAIN_API_KEY'] = userdata.get('LANGCHAIN_API_KEY')
os.environ['LANGCHAIN_ENDPOINT'] = userdata.get('LANGCHAIN_ENDPOINT')


In [None]:
# Load the fine-tuned model from the Hugging Face Hub
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "oldflag/symptom_dx_finetue_Llama-3_8b_Unsloth"
tokenizer = AutoTokenizer.from_pretrained(
    model_id
)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model,
                tokenizer=tokenizer, max_new_tokens=512)

hf = HuggingFacePipeline(pipeline=pipe)

In [6]:
# Build a LangChain chain that turns a symptom description into suggested diagnoses
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

template = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Based on the following symptoms, suggest possible diagnoses

### Input:
{input}

### Response:

"""
prompt = PromptTemplate.from_template(template)

# Chain: fill the prompt, run the fine-tuned model, return the generated text as a string
chain_Dx = prompt | hf | StrOutputParser()
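
A text-generation pipeline can return the prompt together with the completion, which is what the printed output at the end of this notebook shows. If only the diagnosis should be passed on to the ICD step, the echoed prompt can be cut off at the `### Response:` marker. The helper below is a minimal sketch; `extract_response` is a hypothetical name, not a LangChain or Transformers function.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the last marker, or the whole text if the marker is absent."""
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


sample = (
    "### Instruction:\n"
    "Based on the following symptoms, suggest a possible diagnosis\n\n"
    "### Input:\nI have a skin rash that gets worse in the winter.\n\n"
    "### Response:\n\npsoriasis\n"
)
print(extract_response(sample))  # prints: psoriasis
```

One way to use it, assuming the installed LangChain version provides `RunnableLambda` in `langchain_core.runnables`, is to append `RunnableLambda(extract_response)` to the chain after `StrOutputParser()`.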



In [26]:
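# Make the helper module on Google Drive importable (Drive must already be mounted)
import sys
module_dir = "/content/drive/MyDrive/Colab Notebooks/LangChains/"  # folder that holds rag_icdcoding.py
sys.path.append(module_dir)
from rag_icdcoding import *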


In [36]:
# set up RAG with an OpenAI GPT model and the ICD code descriptions
from langchain_openai import ChatOpenAI  # explicit import instead of relying on the wildcard import

icdfile_path = '/content/drive/MyDrive/Colab Notebooks/LangChains/icd10cm_order_2024_top100.txt'  # only first 100 rows for testing
icd10_df = parse_icd10_description(icdfile_path)
data = load_dfdata(icd10_df)
vectordb = create_vectorstore(data)
# Initialize the language model (LLM); temperature=0 keeps the output as repeatable as possible
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)
chain_demo = extract_demo(llm, vectordb)
chain_icd = extract_icd(llm, vectordb)
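
`parse_icd10_description` comes from the `rag_icdcoding` module, and its implementation is not shown here. As a rough idea of what parsing one row could involve, the sketch below assumes the fixed-width layout that CMS documents for the ICD-10-CM order file (order number in columns 1-5, code in 7-13, a billable flag in 15, short description in 17-76, long description from 78 on). That layout should be checked against the documentation shipped with the file, and `parse_order_line` is a hypothetical helper, not the author's function.

```python
from typing import Dict


def parse_order_line(line: str) -> Dict[str, object]:
    """Split one fixed-width row of the order file into named fields."""
    return {
        "order": int(line[0:5]),
        "code": line[6:13].strip(),
        "billable": line[14:15] == "1",
        "short_description": line[16:76].strip(),
        "long_description": line[77:].strip(),
    }


# Build a sample row with the assumed column widths instead of copying one from the file.
sample_row = f"{1:05d} {'A00':<7} {'0'} {'Cholera':<60} {'Cholera'}"
record = parse_order_line(sample_row)
print(record["code"], record["long_description"])
```

Rows whose flag is `0` are category headers rather than billable codes, so a real loader may want to filter on that field before building the vector store.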


In [7]:
# symptom -> diagnosis -> ICD code

# input: symptom
symptom = "I have a skin rash that gets worse in the winter when the air is dry. I have to moisturize more regularly and use humidifiers to keep my skin moisturized. I also have joint pain"

# from symptom to diagnoses
diagnoses = chain_Dx.invoke({"input": symptom})

# from diagnoses to ICD codes
icds = chain_icd.invoke(diagnoses)

print(diagnoses)
print(icds)

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Based on the following symptoms, suggest a possible diagnosis

### Input:
I have a skin rash that gets worse in the winter when the air is dry. I have to moisturize more regularly and use humidifiers to keep my skin moisturized. I also have joint pain

### Response:

psoriasis


'Based on the provided note, the related ICD codes for psoriasis would be:\n- A074 for Cyclosporiasis\n- A073 for Isosporiasis\n- A067 for Cutaneous amebiasis\n\nThese are the closest matches based on the information provided.'
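
The suggested codes all belong to the infectious-disease chapter, most likely because the index was built from only the first 100 rows of the ICD file, which do not include the psoriasis codes (category L40). When the answer comes back as free text like this, it helps to pull the codes out for checking. The sketch below is one possible approach under an assumed pattern for undotted ICD-10-CM codes; `extract_codes` and `add_dot` are hypothetical helpers, not part of `rag_icdcoding`.

```python
import re
from typing import List

# Assumed shape of an undotted ICD-10-CM code: a letter, a digit, an
# alphanumeric character, then up to four more alphanumeric characters.
ICD_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z][0-9A-Z]{0,4}\b")


def extract_codes(answer: str) -> List[str]:
    """Return the distinct code-like tokens in answer, in order of appearance."""
    seen: List[str] = []
    for code in ICD_PATTERN.findall(answer):
        if code not in seen:
            seen.append(code)
    return seen


def add_dot(code: str) -> str:
    """Insert the conventional dot after the third character (A074 -> A07.4)."""
    return code if len(code) <= 3 else f"{code[:3]}.{code[3:]}"


answer = (
    "Based on the provided note, the related ICD codes for psoriasis would be:\n"
    "- A074 for Cyclosporiasis\n- A073 for Isosporiasis\n- A067 for Cutaneous amebiasis"
)
codes = extract_codes(answer)
print([add_dot(code) for code in codes])  # prints: ['A07.4', 'A07.3', 'A06.7']
```

The extracted list can then be compared with the codes present in the loaded ICD table to flag suggestions that fall outside it.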