Redefining Digital Health Interfaces with Large Language Models

This repository provides the implementation of our paper, Redefining Digital Health Interfaces with Large Language Models. Our work demonstrates how large language models (LLMs) can be effectively integrated with digital health tools using LangChain.

Overview

The repository includes scripts and resources to replicate the experiments and results presented in the paper. It also provides an interactive interface using Streamlit to explore the functionality of a Retrieval-Augmented Generation (RAG)-based system.

Prerequisites

To use this repository, you will need an API key for querying the LLM. The simplest way to set this up is to create a Python file named openai_config.py in the root directory with the following structure:

def get_openai_config():
    openai_config = {
        "api_type": "azure",
        "api_base": api_base, # URL that specifies the endpoint for the API request
        "api_version": api_version, # Azure OpenAI REST API version to use
        "api_key": api_key_main, # Your API key
        "deployment_id": deployment_name, # Azure deployment name for the model
        "deployment_id_ada": deployment_name_ada, # Name of text embedding model, e.g.text-embedding-ada-002
        "temperature": 0.0,
        "seed": 0
    }
    return openai_config

Replace the placeholders (api_base, api_version, api_key_main, deployment_name, deployment_name_ada) with your actual configuration values. This will allow you to query LLMs using our framework via the Azure API.

Getting started

1. Install dependencies

Ensure that you have Python 3.10 or later installed. Create a virtual environment and install the required packages by running:

python -m venv venv
source venv/bin/activate  # On macOS/Linux
venv\Scripts\activate     # On Windows
pip install -r requirements.txt

2. Repository structure

Here is the structure of the repository.

.
├── README.md
├── app_cvd.py                # Streamlit-based interactive app
├── main_all_questions.py     # Script to replicate all questions from the paper
├── main_specific_question.py # Script to replicate a specific question
├── custom_tools.py           # Custom utilities for the project
├── model_utils.py            # Model-related utilities
├── openai_config.py          # Configuration file for LLM API keys (user-created)
├── resources/                # Folder containing data and documents
│   ├── documents/            # Supporting documents
│   ├── model/                # Pre-trained model files
│   └── patient_info/         # Synthetic patient information and background data
├── conversations/            # Conversation logs
└── requirements.txt          # Required Python packages

3. Running the project

Interactive interfaces. To explore the RAG-based system interactively, launch the Streamlit app streamlit run app_cvd.py. This will open a web-based interface where you can interact with the system.
Reproducing results from the paper. To replicate results for all questions discussed in the paper, run python main_all_questions.py.
Reproducing specific questions. To reproduce specific questions, run python main_specific_question.py <index>. Replace <index> with the index of the question you want to replicate.

4. Data description

Synthetic background dataset: The file resources/patient_info/background_dataset.csv is a synthetically generated dataset used to compute SHAP values for the Autoprognosis2 model. This synthetic data replaces the original dataset, which is private. Substituting this file with the original data (if available) will yield accurate SHAP values.
Patient information: The resources/patient_info/ directory contains additional synthetic patient records used in the paper.

5. Finding the key questions

The questions that were used to prompt the language models can be found under conversations/LLM_interfaces_questions_answers.csv. The tools and their respective documentation is equivalent to the docstrings of the given tools in the custom_tools.py file.

6. Citation

The details on the implementations can be found in the paper. If you found this repo useful, please consider citing our paper:

@Article{imrie2023redefining,
    title={Redefining Digital Health Interfaces with Large Language Models}, 
    author={Fergus Imrie and Paulius Rauba and Mihaela van der Schaar},
    year={2023},
    journal={arXiv preprint arXiv:2310.03560}
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Redefining Digital Health Interfaces with Large Language Models

Overview

Prerequisites

Getting started

1. Install dependencies

2. Repository structure

3. Running the project

4. Data description

5. Finding the key questions

6. Citation

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 39 Commits
conversations		conversations
resources		resources
.DS_Store		.DS_Store
.gitignore		.gitignore
README.md		README.md
app_cvd.py		app_cvd.py
custom_tools.py		custom_tools.py
main_all_questions.py		main_all_questions.py
main_all_questions_stroke.py		main_all_questions_stroke.py
main_specific_question.py		main_specific_question.py
model_utils.py		model_utils.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Redefining Digital Health Interfaces with Large Language Models

Overview

Prerequisites

Getting started

1. Install dependencies

2. Repository structure

3. Running the project

4. Data description

5. Finding the key questions

6. Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages