# Local LLM interactive demo

This notebook uses Colab to demonstrate how to set up a local LLM service and how to use LangChain to work with that service through several LLM techniques.

First, we need to set up the LLM service. Make sure you use a GPU runtime, so that you won't wait too long for responses. If the runtime is configured correctly, the following cell displays the NVIDIA GPU information. For example:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   57C    P0             29W /   70W |    1578MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

```

In [None]:
!nvidia-smi

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

In [None]:
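# Install some Linux dependencies so that Ollama can recognize the GPU hardware. tmux lets the service run in the background.
# The -y flag answers the confirmation prompt automatically, since a notebook cell cannot answer it interactively.
!sudo apt install -y pciutils lshw tmux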

In [None]:
!lspci | grep -i nvidia
!lshw -C display

In [None]:
# start ollama server in the background
!tmux new-session -d -s ollama 'ollama serve'

In [None]:
# download model in the background
!tmux new-session -d -s qwen 'ollama pull qwen2.5-coder:1.5b'
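
Because both the server and the model download run in the background, the next cell can fail if it runs too early. Below is a minimal sketch of a polling helper, using only the standard library. The names `server_is_up` and `wait_until` are hypothetical helpers, not part of Ollama or LangChain, and the probe only checks that the server answers on port 11434; it does not check whether the model download has finished.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:11434/"):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error, etc.: treat as "not up yet"
        return False


def wait_until(probe, timeout=60.0, interval=1.0):
    """Call `probe()` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a notebook cell you could then call `wait_until(server_is_up)` before sending the first request.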

In [None]:
%%bash
# Test whether the model loads successfully; you might need to wait around half a minute
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt":"Hello"
}'
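
By default, the `/api/generate` endpoint streams its answer as one JSON object per line, each carrying a text fragment in a `response` field, with `done` set to true on the last one. That makes the raw `curl` output hard to read. The sketch below joins the fragments back into one string. `join_streamed_response` is a hypothetical helper name, and the sample body is hand-written in the shape of the stream, not real model output.

```python
import json


def join_streamed_response(raw_body):
    """Concatenate the `response` fragments of a newline-delimited JSON body."""
    parts = []
    for line in raw_body.splitlines():
        line = line.strip()
        if not line:
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


# Hand-written sample in the shape of the streamed output (not real model output)
sample = "\n".join([
    '{"model": "qwen2.5-coder:1.5b", "response": "Hello", "done": false}',
    '{"model": "qwen2.5-coder:1.5b", "response": "!", "done": false}',
    '{"model": "qwen2.5-coder:1.5b", "response": "", "done": true}',
])
print(join_streamed_response(sample))
```

Alternatively, adding `"stream": false` to the request body asks the server for a single JSON object instead of a stream.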

## Install python packages

In [None]:
!pip install langchain requests langchain-ollama

## Start an LLM client

In [None]:
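# Optional: unload deepseek-r1:1.5b if it is still loaded from an earlier session, to free GPU memory
!ollama stop deepseek-r1:1.5b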

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser


# Client for the model pulled above, served by the local Ollama instance
llm = OllamaLLM(model="qwen2.5-coder:1.5b")
# Converts the model output into a plain string
output_parser = StrOutputParser()


# Zero-shot learning

In [None]:
# Define a zero-shot prompt for clinical entity extraction
prompt_template = """
You are a clinical NLP assistant. Extract all clinical entities (diseases, medications, procedures) from the following text:
{text}
"""

# Convert the above prompt string into a chain, so that we can use it to process input text
chain= PromptTemplate(template=prompt_template) | llm | output_parser

# now we make a small input text and request the chain to process it and see what we can get in "result"
clinical_text = "Patient John Doe, 58, presents with chest pain and shortness of breath. Administered aspirin and nitroglycerin."
result = chain.invoke(clinical_text)
print("Extracted Entities:", result)

# Exercise 1:  

Run the cell above several times. You may notice that the response differs each time, even though the main idea stays the same. How would you tweak the prompt to fix this?
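
Before tweaking the prompt, it can help to measure how much the responses actually vary. The sketch below calls a text-generating function several times and counts the distinct outputs. `response_variability` is a hypothetical helper, and it only normalizes whitespace, so two answers with the same meaning but different wording still count as different.

```python
from collections import Counter


def response_variability(generate, text, n=5):
    """Call `generate(text)` n times and count distinct whitespace-normalized outputs."""
    outputs = [" ".join(generate(text).split()) for _ in range(n)]
    return Counter(outputs)
```

With the chain defined above, `response_variability(chain.invoke, clinical_text)` returns a `Counter`; a single key means every run produced the same text.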

# Few-shot learning
Instead of tweaking the instruction itself, we can show the LLM examples of the expected output. This is called few-shot learning.

In [None]:
# Define a few-shot prompt for clinical entity extraction
prompt_template = """
You are a clinical NLP assistant. Extract all clinical entities (diseases, medications, procedures) from the following text.

For example:
Input:
Patient John Doe, 58, presents with chest pain and shortness of breath. Administered aspirin and nitroglycerin.
Output:
1. Signs & Symptoms: chest pain, shortness of breath
2. Medications: aspirin, nitroglycerin
3. Procedures: None

Input:
{text}
"""

# Convert the above prompt string into a chain, so that we can use it to process input text
chain= PromptTemplate(template=prompt_template) | llm | output_parser

# now we make a small input text and request the chain to process it and see what we can get in "result"
clinical_text = "Pt is a 58yo male, presents with chest pain and shortness of breath. Administered aspirin and nitroglycerin."
result = chain.invoke(clinical_text)
print(result)
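
With more than one example, writing the prompt by hand gets tedious. Here is a minimal sketch that builds the same kind of few-shot template from a list of (input, output) pairs. `escape_braces` and `build_few_shot_template` are hypothetical helpers. Two assumptions differ from the cell above: braces inside examples are doubled so they are not read as template variables, and a trailing `Output:` cue is added after the `{text}` placeholder.

```python
def escape_braces(s):
    """Double literal braces so they survive f-string style template formatting."""
    return s.replace("{", "{{").replace("}", "}}")


def build_few_shot_template(instruction, examples):
    """Build a template with a `{text}` placeholder from (input, output) example pairs."""
    lines = [escape_braces(instruction), "", "For example:"]
    for example_input, example_output in examples:
        lines += ["Input:", escape_braces(example_input),
                  "Output:", escape_braces(example_output), ""]
    lines += ["Input:", "{text}", "Output:"]
    return "\n".join(lines)


examples = [(
    "Patient John Doe, 58, presents with chest pain and shortness of breath. "
    "Administered aspirin and nitroglycerin.",
    "1. Signs & Symptoms: chest pain, shortness of breath\n"
    "2. Medications: aspirin, nitroglycerin\n"
    "3. Procedures: None",
)]
few_shot_template = build_few_shot_template(
    "You are a clinical NLP assistant. Extract all clinical entities "
    "(diseases, medications, procedures) from the following text.",
    examples,
)
print(few_shot_template.format(text="Pt is a 58yo male with chest pain."))
```

The resulting string can be passed to `PromptTemplate(template=...)` in the same way as the hand-written prompt.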

In [None]:
# You might suspect that the model just copied the answer from the example. Let's try a different input text.
prompt_template = """
You are a clinical NLP assistant. Extract all clinical entities (diseases, medications, procedures) from the following text.

For example:
Input:
Patient John Doe, 58, presents with chest pain and shortness of breath. Administered aspirin and nitroglycerin.
Output:
1. Signs & Symptoms: chest pain, shortness of breath
2. Medications: aspirin, nitroglycerin
3. Procedures: None

Input:
{text}
"""

# Convert the above prompt string into a chain, so that we can use it to process input text
chain= PromptTemplate(template=prompt_template) | llm | output_parser

# now we make a small input text and request the chain to process it and see what we can get in "result"
clinical_text = "A 67-year-old female presented to the emergency department with a sudden, severe headache, blurred vision, and confusion, accompanied by elevated blood pressure and an irregular pulse. On physical examination, she exhibited diaphoresis and nausea. A CT scan of the head revealed evidence of a hemorrhagic stroke. She was promptly managed with aggressive blood pressure control and underwent emergency neurosurgical intervention to evacuate the hematoma."
result = chain.invoke(clinical_text)
print(result)
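
One benefit of showing the model a fixed output format is that the response becomes easier to process in code. The sketch below parses lines of the form used in the example (`1. Category: item, item`) into a dictionary. `parse_entities` is a hypothetical helper, and it assumes the model follows the format; a small model may not always do so, in which case non-matching lines are simply skipped.

```python
import re

# Matches lines such as "2. Medications: aspirin, nitroglycerin"
LINE_PATTERN = re.compile(r"^\s*\d+\.\s*([^:]+):\s*(.*)$")


def parse_entities(response):
    """Parse numbered 'Category: item, item' lines into {category: [items]}."""
    entities = {}
    for line in response.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # ignore anything that does not follow the format
        category = match.group(1).strip()
        items = match.group(2).strip()
        if not items or items.lower() == "none":
            entities[category] = []
        else:
            entities[category] = [i.strip() for i in items.split(",") if i.strip()]
    return entities


# The example output from the prompt above, used here as test input
example_output = (
    "1. Signs & Symptoms: chest pain, shortness of breath\n"
    "2. Medications: aspirin, nitroglycerin\n"
    "3. Procedures: None"
)
print(parse_entities(example_output))
```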

# Let's try chain of thought (CoT)

Chain of Thought prompting for clinical entity extraction

Chain-of-thought (CoT) prompting is an approach that encourages language models to break down complex problems into intermediate reasoning steps before arriving at a final answer. By guiding the model to “think aloud,” CoT helps reveal the internal decision-making process, making its outputs more transparent and often more accurate, especially for tasks that involve multi-step logic or complex reasoning. This method not only improves problem-solving performance but also provides valuable insights into how the model processes information, a feature that can be crucial in sensitive domains like clinical decision support.


In [None]:
# let's try a different task using zero shot learning, and see how it does:

prompt_template = """
You are a clinical NLP assistant, given the provided clinical text, try to list the most likely diagnoses.

Input:
{text}
"""
chain= PromptTemplate(template=prompt_template) | llm | output_parser
clinical_text = "A 67-year-old female presented to the emergency department with a sudden, severe headache, blurred vision, and confusion, accompanied by elevated blood pressure and an irregular pulse. On physical examination, she exhibited diaphoresis and nausea. A CT scan of the head revealed evidence of a hemorrhagic stroke. She was promptly managed with aggressive blood pressure control and underwent emergency neurosurgical intervention to evacuate the hematoma."
result = chain.invoke(clinical_text)
print("Proposed diagnoses:", result)


In [None]:
# Define a chain-of-thought prompt for proposing diagnoses
prompt_template = """
You are a clinical NLP assistant. Analyze the following clinical text by breaking down your reasoning into steps.

Step 1: Identify and list key signs, symptoms, and clinical details.
Step 2: Note any tests, procedures, or treatments mentioned.
Step 3: Based on the collected information, propose possible diagnoses.
Step 4: Explain briefly why these diagnoses are the most likely.
Step 5: Verify and summarize the proposed diagnoses using a json format string: {{"diagnoses": ["diagnosis1", "diagnosis2", ...]}}

Input:
{text}
"""
# Convert the above prompt string into a chain, so that we can use it to process input text
chain= PromptTemplate(template=prompt_template) | llm | output_parser

# now we make a small input text and request the chain to process it and see what we can get in "result"
clinical_text = "A 67-year-old female presented to the emergency department with a sudden, severe headache, blurred vision, and confusion, accompanied by elevated blood pressure and an irregular pulse. On physical examination, she exhibited diaphoresis and nausea. A CT scan of the head revealed evidence of a hemorrhagic stroke. She was promptly managed with aggressive blood pressure control and underwent emergency neurosurgical intervention to evacuate the hematoma."
result = chain.invoke(clinical_text)
print("Reasoning and proposed diagnoses:", result)
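
Step 5 asks the model to end with a JSON string, so the final answer can be pulled out of the longer reasoning text. Below is a minimal sketch that scans a response for the first JSON object with a `diagnoses` list. `extract_diagnoses` is a hypothetical helper, the sample response is hand-written rather than real model output, and the function returns `None` when the model did not produce valid JSON.

```python
import json


def extract_diagnoses(response):
    """Return the list under "diagnoses" from the first matching JSON object, else None."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and isinstance(obj.get("diagnoses"), list):
            return obj["diagnoses"]
        start = response.find("{", start + 1)
    return None


# Hand-written sample response (not real model output)
sample_response = (
    "Step 4: The CT finding supports a bleed {see step 2}.\n"
    'Step 5: {"diagnoses": ["hemorrhagic stroke", "hypertension"]}'
)
print(extract_diagnoses(sample_response))
```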

# Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) combines information retrieval techniques with large language models to enhance response quality. In RAG, relevant documents or passages are first retrieved from a knowledge base, then the language model uses that context to generate more informed and accurate responses. This hybrid approach helps ensure that the generated text is both contextually rich and up-to-date.

I prepared the following guideline by asking ChatGPT to "Summarize the page below to a guideline for LLM use: " and pasting in the content of this page: https://www.mayoclinic.org/diseases-conditions/stroke/diagnosis-treatment/drc-20350119.


This is just a demo of RAG; the retrieval step that fetches the augmenting content is not included here. But you can see how the augmentation helps the LLM's response. You can read more about RAG if you are interested.
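
To give a rough idea of what a retrieval step does, here is a toy sketch that picks the passage sharing the most words with a query. This word-overlap scoring is a stand-in chosen for simplicity; real RAG systems typically use embeddings and a vector store. The names `tokenize` and `retrieve` are hypothetical, and the passages are shortened paraphrases of the stroke guideline used in this notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, passages, k=1):
    """Return the k passages that share the most distinct words with the query."""
    query_words = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(query_words & tokenize(p)), reverse=True)
    return ranked[:k]


passages = [
    "Hemorrhagic stroke: bleeding from a ruptured vessel; control bleeding and relieve brain pressure.",
    "Ischemic stroke: a blood clot blocks blood flow; timely treatment restores circulation.",
    "Rehabilitation: recovery is gradual and involves physical, occupational, and speech therapy.",
]
query = "CT scan revealed a hemorrhagic stroke with bleeding in the brain"
context = retrieve(query, passages, k=1)[0]
print(context)
```

The retrieved `context` would then be placed in the prompt ahead of the instructions.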

In [None]:

guideline='''# Guideline for Recognition of Stroke Symptoms (FAST):

F: Facial weakness

A: Arm weakness

S: Speech slurring

T: Time – immediate action is critical
Emphasize that if these symptoms are observed—even if transient—they warrant urgent emergency evaluation.

Diagnostic Evaluation:

Initial Assessment: Highlight the importance of a physical and neurological exam along with vital sign monitoring.

Imaging and Tests: Include the use of CT scans, MRI, carotid ultrasound, cerebral angiograms, and echocardiograms to differentiate stroke types and determine the extent of brain injury.

Stroke Types and Treatments:

Ischemic Stroke: Explain that blood clots can block blood flow, and timely treatment (e.g., IV TPA within the therapeutic window and endovascular procedures) is essential to restore circulation and minimize brain damage.

Hemorrhagic Stroke: Describe how a ruptured aneurysm or other vascular abnormality causes bleeding, requiring rapid measures to control bleeding, relieve brain pressure, and possibly surgical intervention (clipping, coiling, or radiosurgery).

Prevention and Risk Management:

Risk Factors: Distinguish between non-modifiable (age, gender, family history) and modifiable (high blood pressure, cholesterol, smoking, diabetes, etc.) risk factors.

Prevention Strategies: Stress the importance of managing modifiable risk factors through lifestyle changes and medications, which are more effective than post-stroke treatments.

Rehabilitation and Recovery:

Multidisciplinary Approach: Outline that recovery involves a team of healthcare professionals (neurologists, rehabilitation physicians, therapists, etc.) who guide the patient through physical, occupational, and speech therapies.

Gradual Process: Communicate that stroke recovery is gradual, and continuous therapy over months can lead to significant improvements.

Patient Education and Urgency:

Emergency Action: Reinforce that stroke symptoms require immediate medical attention to improve outcomes.

Ongoing Support: Provide reassurance regarding the availability of support, both medical and emotional, throughout the recovery process.'''

In [None]:
# Define a prompt that prepends the guideline (the augmenting context) to the chain-of-thought instructions
prompt_template = guideline+"""

You are a clinical NLP assistant. Analyze the following clinical text using the guideline above and break down your reasoning into steps.

Step 1: Identify and list key signs, symptoms, and clinical details.
Step 2: Note any tests, procedures, or treatments mentioned.
Step 3: Based on the collected information, propose possible diagnoses.
Step 4: Explain briefly why these diagnoses are the most likely.
Step 5: Verify and summarize the proposed diagnoses using a json format string: {{"diagnoses": ["diagnosis1", "diagnosis2", ...]}}

Input:
{text}
"""
# Convert the above prompt string into a chain, so that we can use it to process input text
chain= PromptTemplate(template=prompt_template) | llm | output_parser

# now we make a small input text and request the chain to process it and see what we can get in "result"
clinical_text = "A 67-year-old female presented to the emergency department with a sudden, severe headache, blurred vision, and confusion, accompanied by elevated blood pressure and an irregular pulse. On physical examination, she exhibited diaphoresis and nausea. A CT scan of the head revealed evidence of a hemorrhagic stroke. She was promptly managed with aggressive blood pressure control and underwent emergency neurosurgical intervention to evacuate the hematoma."
result = chain.invoke(clinical_text)
print("Reasoning and proposed diagnoses:", result)
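
Concatenating the guideline into the template string works here because the guideline contains no curly braces. If retrieved text did contain braces, they would likely be misread as template variables. One way around this, sketched below with plain `str.format` (which uses the same brace syntax as the default f-string style templates), is to pass the context as its own variable. The names `TEMPLATE` and `render` and the sample snippet are hypothetical.

```python
TEMPLATE = """{guideline}

You are a clinical NLP assistant. Analyze the following clinical text using the guideline above.

Input:
{text}
"""


def render(template, **values):
    """Fill the template; braces inside the substituted values are left untouched."""
    return template.format(**values)


# Hypothetical retrieved snippet that contains literal braces
retrieved = 'Report format: {"type": "ischemic" | "hemorrhagic"}'
prompt = render(TEMPLATE, guideline=retrieved, text="Sudden severe headache.")
print(prompt)
```

With LangChain, the equivalent would be `PromptTemplate.from_template(TEMPLATE)` and `chain.invoke({"guideline": guideline, "text": clinical_text})`.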

# Exercise 2:

Think about your own project. Can you take one or two examples from your data and design a prompt that works toward a solution?