# Medical Question Answering with HuatuoGPT-o1

Authored by: [Alan Ponnachan](https://huggingface.co/AlanPonnachan)

## Introduction

This notebook demonstrates [HuatuoGPT-o1](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B), a large language model (LLM) designed specifically for medical applications. Unlike many other medical LLMs, HuatuoGPT-o1 is built for **complex reasoning**: it works through challenging medical questions step by step before answering, much as a human doctor would.

**What makes HuatuoGPT-o1 special?**

*   **"Thinks-Before-It-Answers" Approach:** HuatuoGPT-o1 doesn't just provide answers. It demonstrates its reasoning process through a unique "Thinking" section, followed by a "Final Response." This makes the model's output more transparent and understandable.
*   **Trained for Complex Reasoning:** The model was trained in two stages: it first learns complex reasoning trajectories, then refines them through reinforcement learning. The goal is more accurate and reliable answers.
*   **Strong Performance:** According to the HuatuoGPT-o1 paper, the model outperforms both general-purpose and other medical-specific LLMs on several medical benchmarks.

**Objective of this Notebook:**

In this notebook, we will demonstrate how to use the pre-trained HuatuoGPT-o1-7B model to answer a variety of medical questions **without any fine-tuning**. This will highlight the model's impressive out-of-the-box capabilities and its potential as a valuable tool for medical professionals, researchers, and anyone interested in medical AI.

## How to Use This Notebook

This notebook is designed to be self-contained and easy to follow. It's broken down into the following sections:

1. **Setup:** Installing the necessary libraries and importing them.
2. **Loading the Model and Tokenizer:** Downloading and loading the HuatuoGPT-o1-7B model and its associated tokenizer from the Hugging Face Hub.
3. **Defining Medical Questions:** Defining a set of medical questions to test the model.
4. **Generating and Analyzing Answers:** Using the model to answer the questions, then examining its "Thinking" and "Final Response" sections to understand its reasoning process.
5. **Discussion:** Discussing the model's strengths, limitations, and ethical considerations, and suggesting further exploration.
6. **Conclusion:** Summarizing the key takeaways.

Let's get started!

## Setup

First, we need to install the required libraries. You can uncomment and run the following cell if you don't already have them installed in your environment.

**Important:** Before running the code, make sure you are using a GPU runtime for faster performance. You can change the runtime type by going to **"Runtime" -> "Change runtime type"** and selecting **"GPU"** under "Hardware accelerator".
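
If you want to confirm that the runtime actually exposes a GPU, a small check like the following can help. This is a minimal sketch: `pick_device` is a hypothetical helper name introduced here, and it falls back to `"cpu"` when PyTorch is not installed yet.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch is installed and sees a GPU, else "cpu"."""
    # PyTorch may not be installed yet if the install cell has not been run
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```

If this prints `cpu` on Colab, the runtime type has most likely not been switched to GPU yet.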

In [None]:
# Install necessary libraries (Optional: Run only if needed)
# !pip install transformers torch accelerate

In [1]:
# Import required libraries
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

## Load the HuatuoGPT-o1 Model and Tokenizer

Now, let's load the pre-trained HuatuoGPT-o1-7B model and its corresponding tokenizer from the Hugging Face Model Hub:

In [2]:
# Load the pre-trained model and tokenizer
model_name = "FreedomIntelligence/HuatuoGPT-o1-7B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
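
Loading in `torch.bfloat16` roughly halves the memory needed for the weights compared with 32-bit floats. The back-of-the-envelope sketch below illustrates the arithmetic. The parameter count is an assumption (about 7.6 billion, a typical figure for a model labeled "7B"), and the estimate covers weights only, not activations or the key-value cache.

```python
# Bytes used per parameter for common dtypes
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}

# Assumption: approximate parameter count for a "7B" model, not an official figure
ASSUMED_NUM_PARAMS = 7.6e9


def estimate_weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Estimate memory for the model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    size_gb = estimate_weight_memory_gb(ASSUMED_NUM_PARAMS, dtype)
    print(f"{dtype}: ~{size_gb:.1f} GB")
```

Under these assumptions the bfloat16 weights need on the order of 15 GB, which is why a GPU runtime with enough memory matters for this notebook.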


## Define Medical Questions

Let's create a list of diverse medical questions to test HuatuoGPT-o1's capabilities. These questions cover a range of topics and require different levels of reasoning.

In [3]:
# Define a list of medical questions
medical_questions = [
    "What are the common causes of chest pain?",
    "How is atrial fibrillation diagnosed and treated?",
    "What are the risk factors for developing type 2 diabetes?",
    "A patient presents with sudden onset of severe headache, what are the possible differential diagnoses?",
    "What are the latest treatment guidelines for managing hypertension?",
    "A 55-year-old male presents to the clinic with complaints of increasing shortness of breath on exertion and a persistent cough for the past 3 months. He reports occasional wheezing and chest tightness, especially at night. He has a 30-pack-year smoking history but quit 5 years ago. On physical examination, his respiratory rate is 20 breaths/min, and there are decreased breath sounds with prolonged expiration and scattered wheezes bilaterally. What is the most likely diagnosis and what further investigations would you recommend to confirm the diagnosis and assess the severity of the condition?"
]

## Generate and Analyze Answers

We'll now loop through each question, feed it to the model, and generate an answer. We'll then analyze the output, paying close attention to the "Thinking" section to understand how the model arrives at its conclusions.
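
Each question is wrapped in a list of chat messages and rendered into a single prompt string by the tokenizer's chat template. The sketch below illustrates what such a rendering can look like. It assumes a ChatML-style template, which is common for Qwen-based models but is not confirmed here for HuatuoGPT-o1. `render_chatml` is a hypothetical helper for illustration only; in practice, rely on `tokenizer.apply_chat_template`.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render chat messages in a ChatML-style layout (illustrative assumption)."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What are the common causes of chest pain?"},
]
print(render_chatml(example_messages))
```

To see the template the model really uses, print the string returned by `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.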

In [4]:
# Generate answers for each question
for question in medical_questions:
    print(f"\n\033[94m\033[1mQuestion:\033[0m {question}")

    messages = [
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": question},
    ]

    # Render the chat template to a string, then tokenize it
    inputs = tokenizer(
        tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        ),
        return_tensors="pt",
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,  # long reasoning chains may be cut off at this limit
            do_sample=True,  # temperature, top_p and top_k only take effect when sampling
            temperature=0.7,
            top_p=0.95,
            top_k=50,
            repetition_penalty=1.2,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the echoed prompt
    prompt_length = inputs["input_ids"].shape[1]
    answer = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

    # str.find returns -1 instead of raising, so check the markers explicitly
    thinking_start = answer.find("## Thinking")
    response_start = answer.find("## Final Response")

    if thinking_start != -1 and response_start > thinking_start:
        reasoning = answer[thinking_start:response_start].strip()
        final_response = answer[response_start + len("## Final Response"):].strip()

        print("\033[94m\033[1mReasoning:\033[0m")  # Blue and bold
        print(reasoning)
        print("\n\033[92m\033[1mFinal Response:\033[0m")  # Green and bold
        print(final_response)
    else:
        print("Could not parse output, printing raw output instead.")
        print(answer)


[94m[1mQuestion:[0m[0m What are the common causes of chest pain?
[94m[1mReasoning:[0m[0m
## Thinking

Okay, let's think about what could cause someone to have chest pain.

First off, heart issues definitely come up as one big category. We're talking angina or even something more serious like a heart attack here. These can really be scary because they involve problems with how blood flows in and out of the heart itself. Angina is when you feel discomfort due to not enough oxygen getting to your heart muscle; it usually happens during exercise but might get better if you rest for a bit. A heart attack would mean that part of the heart isn't receiving any blood at all – yikes!

Next on my list would be lung-related stuff since we often hear people talk about having trouble breathing along with their chest pains sometimes. Pulmonary embolism jumps right into mind there - basically an artery in your lungs gets blocked by some sort of clot. That sounds pretty dangerous too! Then aga
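
Because generation can stop before the "## Final Response" marker appears (for example when `max_new_tokens` is reached), it can be useful to parse the output defensively. The helper below is a minimal sketch of that idea. `split_reasoning` is a hypothetical name, and the two marker strings are the ones used in this notebook.

```python
from typing import Optional, Tuple

THINKING_MARKER = "## Thinking"
RESPONSE_MARKER = "## Final Response"


def split_reasoning(answer: str) -> Tuple[Optional[str], Optional[str]]:
    """Split model output into (reasoning, final_response).

    An element is None when its marker is missing, e.g. when the
    generation was truncated before the final response started.
    """
    thinking_start = answer.find(THINKING_MARKER)
    response_start = answer.find(RESPONSE_MARKER)

    reasoning = None
    final_response = None
    if thinking_start != -1:
        start = thinking_start + len(THINKING_MARKER)
        # Without a later response marker, the reasoning runs to the end of the text
        end = response_start if response_start > thinking_start else len(answer)
        reasoning = answer[start:end].strip()
    if response_start != -1:
        final_response = answer[response_start + len(RESPONSE_MARKER):].strip()
    return reasoning, final_response


complete = "## Thinking\n\nStep one.\n\n## Final Response\n\nAnswer."
truncated = "## Thinking\n\nStep one, then"
print(split_reasoning(complete))
print(split_reasoning(truncated))
```

A `None` final response is a signal to either raise `max_new_tokens` or show the partial reasoning with a note that the answer is incomplete.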

## Discussion

In this notebook, we explored the out-of-the-box medical question-answering capabilities of the pre-trained HuatuoGPT-o1-7B model. We observed that the model can generate comprehensive and informative answers to a variety of medical questions, even without any fine-tuning. The "Thinking" and "Final Response" format provides valuable insights into the model's reasoning process.

**Key Observations:**

*   **Strong Reasoning:** HuatuoGPT-o1-7B demonstrates a remarkable ability to reason through complex medical scenarios, considering various factors and arriving at logical conclusions.
*   **Interpretability:** The model's "Thinking" section provides a degree of transparency, allowing us to understand the steps it takes to reach an answer.
*   **Potential for Medical Applications:** The model's performance suggests its potential as a valuable tool for medical professionals, researchers, and students, assisting in tasks like preliminary diagnosis, treatment planning, and medical education.

**Limitations:**

*   **Reliance on Pre-trained Knowledge:** Without fine-tuning, the model relies solely on its pre-existing knowledge, which may not be completely up-to-date or specific enough for all medical situations.
*   **Potential for Errors:** Like all LLMs, HuatuoGPT-o1 is not infallible and may occasionally produce incorrect or misleading information.
*   **Not a Replacement for Doctors:** It's crucial to remember that this model is not a substitute for qualified medical professionals. Its output should always be reviewed and validated by a human expert.

**Ethical Considerations:**

*   **Patient Privacy:** When using medical LLMs, it's essential to prioritize patient privacy and data security.
*   **Transparency and Explainability:** The model's reasoning process should be as transparent as possible to build trust and ensure responsible use.
*   **Bias and Fairness:** LLMs can inherit biases from their training data. It's important to be aware of potential biases and work towards mitigating them.
*   **Accountability:** Clear guidelines should be established for the responsible use of medical LLMs, including accountability for errors or misuse.

**Further Exploration:**

*   **Experiment with Different Questions:** Try asking the model your own medical questions to explore its capabilities further.
*   **Fine-tuning:** If you have a specific medical task in mind, consider fine-tuning the model on a relevant dataset to improve its performance.
*   **Compare with Other Models:** Compare HuatuoGPT-o1's performance with other medical LLMs to understand its strengths and weaknesses relative to other models.
*   **Investigate Prompt Engineering:** Experiment with different prompt formats and instructions to see how they affect the model's responses.
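
As a starting point for the prompt-engineering idea above, the sketch below builds several message lists for the same question, one per system prompt. Only the "baseline" prompt comes from this notebook. The other prompts and the helper name `build_prompt_variants` are hypothetical examples.

```python
from typing import Dict, List

Message = Dict[str, str]

# "baseline" is the system prompt used in this notebook; the others are illustrative
SYSTEM_PROMPTS: Dict[str, str] = {
    "baseline": "You are a helpful medical assistant.",
    "concise": "You are a helpful medical assistant. Keep the final response brief.",
    "guideline": "You are a helpful medical assistant. State when guidance may be outdated.",
}


def build_prompt_variants(
    question: str, system_prompts: Dict[str, str]
) -> Dict[str, List[Message]]:
    """Return one chat-message list per named system prompt."""
    return {
        name: [
            {"role": "system", "content": prompt},
            {"role": "user", "content": question},
        ]
        for name, prompt in system_prompts.items()
    }


variants = build_prompt_variants(
    "What are the risk factors for developing type 2 diabetes?", SYSTEM_PROMPTS
)
for name, messages in variants.items():
    print(name, "->", messages[0]["content"])
```

Each message list can then be passed through the same chat template and `model.generate` call used earlier, so that the responses can be compared side by side.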

## Conclusion

HuatuoGPT-o1-7B represents a significant advancement in the field of medical AI. Its ability to perform complex reasoning and provide interpretable answers makes it a promising tool for various medical applications. However, it's essential to use this technology responsibly, acknowledging its limitations and addressing ethical considerations. As research in this area continues to progress, we can expect even more powerful and reliable medical LLMs to emerge, potentially revolutionizing healthcare and improving patient outcomes.

## References

*   **HuatuoGPT-o1 Paper:** [https://arxiv.org/pdf/2412.18925](https://arxiv.org/pdf/2412.18925)
*   **HuatuoGPT-o1-7B Model:** [https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B](https://huggingface.co/FreedomIntelligence/HuatuoGPT-o1-7B)
*   **Hugging Face Transformers Documentation:** [https://huggingface.co/docs/transformers](https://huggingface.co/docs/transformers)