Skip to content

som-shahlab/Clinfo.AI

Repository files navigation

logo

Welcome to the official repository for Clinfo.ai: An Open-Source Retrieval-Augmented Large Language Model System for Answering Medical Questions using Scientific Literature.

Arxiv: Arxiv     |     Paper: PSB 2024     |     Demo: Live App

If you would like to see some functionality or have a comment, open an issue on this repo, we will try to reply as soon as possible

📖 Table of Contents

  1. Intro
  2. Clinfo+Closed Source Models
  3. Clinfo+Open Source Models
  4. Citation

Updates

Jul 05, 2024 Update: Added Support for Google's API models

Millions of medical research articles are published every year. On the other side, healthcare professionals and medical researchers are expected to stay abreast of the latest scientific discoveries pertinent to their daily practice. However, with limited time and a broad field to cover, keeping up-to-date can be a challenging task. Clinfo.AI searches and synthesizes medical literature tailored to a specific clinical question to provide an answer grounded on indexed literature. By leveraging a chain of LLMS clinfo.ai, can analyze the context of the inquiry to identify and present the most relevant articles pertinent to a scientific question.

comparison

What type of questions can I ask?

Questions based on scientific evidence reported in the literature, for example:

  1. What percentage of HIV-positive patients transmit the virus to their children?

  2. When do most episodes of COVID-19 rebound after stopping paxlovid treatment?

  3. Does magnesium consumption significantly improve sleep quality?

Type of questions you can’t answer with clinfo.AI

Broad questions: These types of questions could potentially be answered by clinfo.AI, but it is highly probable you won’t get what you are looking for. How to correct this type of question? Provide context. For example: "Chest pain pediatrics?"

We recommend asking a specific question to get the best answer:

Original Question: "Chest pain pediatrics?"

Improved Question: "What are common causes of chest pain in pediatric patients?"


How does Clinfo.AI work?

diagram

Clinfo.AI is a RetA LLM system, it consists of a collection of four LLMs working conjointly (an LLM chain) coupled to a Search Index as depicted in the above Figure:

  1. First, the input (the question submitted by the user) is converted to a query by an LLM (Question2Query). E.g. for PubMed, the question is converted to a query containing MeSH terms.
  2. The generated queries are then used to retrieve articles from indexed sources (e.g. PubMed)
  3. Then give an article and the original question an LLM is tasked to classify if the article is relevant (if enabled BM25 is used to rank the selected articles).
  4. Relevant articles are individually summarized by an LLM.
  5. Lasyty an LLM aggregates all summaries to provide an overview of all relevant articles.

OPENAI API:

Create an OpenAI account, get an API Key, and edit the key field OPENAI_API_KEY in config.py with your own key.

NCBI API:

Clinfo.ai retrieves literature using the NCBI API; while access does not require an account, calls are limited for unregistered users. We recommend creating an NCBI account. Once generated, save the NCBI API key and email under NCBI_API_KEY and EMAIL, respectively.

In summary edit the following variables inside config.py:

OPENAI_API_KEY = "YOUR API TOKEN"
NCBI_API_KEY   = "YOUR API TOKEN"  (optional)
EMAIL          = "YOUR EMAIL"      (optional)

Using Clinfo.AI:

from  src.clinfoai.clinfoai import ClinfoAI
from config   import OPENAI_API_KEY, NCBI_API_KEY, EMAIL
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

question = "What is the prevalence of COVID-19 in the United States?"
clinfo   = ClinfoAI(llm="gpt-3.5-turbo",openai_key=OPENAI_API_KEY, email= EMAIL)
answer   = clinfo.forward(question=question)         

src/notebooks/01_UsingClinfoAI.ipynb has a quick run-through and explanation for each individaul clinfo.AI component.

Clinfo.ai has full integration with vLLM. We can use any open source LLM as a backbone following two simple steps:

Setting an API server

First, we use vLLM to create an API selecting the model you want to work with: In the following example we use Qwen/Qwen2-beta-7B-Chat

 python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-beta-7B-Chat

Switch the LLM model name to the selected model

Instantiate a clinfoAI object with the desired LLM :

from  src.clinfoai.clinfoai import ClinfoAI
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

question = "What is the prevalence of COVID-19 in the United States?"
clinfo   = ClinfoAI(llm="Qwen/Qwen2-beta-7B-Chat")
answer   = clinfo.forward(question=question)         

IMPORTANT:

While anyone can use Clinfo.AI, our goal is to augment medical experts not replace them. Read our disclaimer disclaimer and DO NOT use clinfo.AI for medical diagnosis.

If you use Clinfo.ai, please consider citing:

@inproceedings{lozano2023clinfo,
  title={Clinfo. ai: An open-source retrieval-augmented large language model system for answering medical questions using scientific literature},
  author={Lozano, Alejandro and Fleming, Scott L and Chiang, Chia-Chun and Shah, Nigam},
  booktitle={PACIFIC SYMPOSIUM ON BIOCOMPUTING 2024},
  pages={8--23},
  year={2023},
  organization={World Scientific}
}

About

This is Clinfo.AI Demo Instruction

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published