MED_LLM – Gut / IBD Knowledge Graph QA

This repository contains the code and assets for a prototype system that builds a biomedical knowledge graph for gut health and inflammatory bowel disease (IBD) and exposes it through a Streamlit question‑answering (QA) interface. The project compares a custom GenAI‑driven extraction pipeline with Neo4j’s LLM Graph Builder on the same set of biomedical articles (e.g. Burger et al. on the Human Gut Cell Atlas (HGCA) common coordinate framework).

The main goals are:

- Extract entities and relations from gut / IBD literature using LLMs.
- Populate a custom knowledge graph and a Neo4j Aura instance with a consistent schema.
- Support question‑answering over the graph (Streamlit app).
- Qualitatively compare answers and graph structure against those produced by the Neo4j LLM Graph Builder.

Repository structure

src/
Core Python code:
- preprocessing (PDF → text, sentence splitting)
- LLM prompts and extraction
- KG construction and Neo4j loading
- Streamlit app entry point
data/
- raw/ – source PDFs or text (not all tracked in Git)
- processed/ – cleaned text, sentence indices, extracted triples, etc.
eval/
Notebooks and scripts for:
- baseline keyword search
images/
Screenshots and figures used in the dissertation and README (Streamlit UI, Neo4j graphs, HGCA examples).
requirement.txt
Python dependencies for the project.

Setup

Clone the repo

git clone https://github.com/lostiithi/MED_LLM.git
cd MED_LLM

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

Install dependencies

pip install -r requirement.txt

Configure environment variables

Create a .env file (not committed to Git) with at least:

OPENAI_API_KEY=...
NEO4J_URI=...
NEO4J_USER=...
NEO4J_PASSWORD=...
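A missing or misspelled variable usually surfaces only when a step fails midway, so a quick check up front can help. The sketch below is a stdlib-only illustration under stated assumptions: the helpers `parse_env_text`, `read_env_file` and `missing_keys` are hypothetical and not part of this repository. In practice the `python-dotenv` package's `load_dotenv()` is the more common way to read a `.env` file.

```python
import os
from pathlib import Path

# The settings listed above; the pipeline and the app expect all four.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Deliberately simplified (hypothetical helper): no multi-line values and
    no variable expansion, unlike python-dotenv.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding quote characters, e.g. NEO4J_URI="neo4j+s://..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path=".env"):
    """Return the parsed .env contents, or {} if the file does not exist."""
    env_path = Path(path)
    if not env_path.is_file():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def missing_keys(values, required=REQUIRED_KEYS):
    """Names that are empty or absent in both the .env values and os.environ."""
    return [k for k in required if not (values.get(k) or os.environ.get(k))]
```

For example, a non-empty result from `missing_keys(read_env_file())` signals that `.env` needs fixing before running the pipeline.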

Running the pipeline

1. Preprocess input articles

Example:

python src/parse/parse_pdf.py \
  --input data/raw \
  --output data/processed

This produces cleaned text and sentence indices for each PDF.
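The sentence-splitting part of this step can be pictured roughly as follows. This is a simplified sketch, assuming the PDF text has already been extracted into a string; `clean_text`, `split_sentences`, `index_sentences` and the `doc_id:position` id format are hypothetical and do not describe what `parse_pdf.py` actually does. A regex splitter also breaks on abbreviations such as "et al." or "e.g.", so biomedical text usually benefits from a trained sentence splitter.

```python
import re

# Words hyphenated across PDF line breaks, e.g. "inflam-\nmation".
# Note: this also drops genuine hyphens that happen to fall at a line end.
_HYPHEN_BREAK = re.compile(r"(\w)-\n(\w)")
# A boundary is whitespace after . ! or ? followed by an uppercase letter or digit.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def clean_text(raw):
    """Re-join broken words and collapse all whitespace runs to single spaces."""
    text = _HYPHEN_BREAK.sub(r"\1\2", raw)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive regex-based sentence splitting."""
    return [s for s in _SENT_BOUNDARY.split(text) if s]


def index_sentences(doc_id, raw):
    """Return rows of (sentence_id, doc_id, position, sentence)."""
    sentences = split_sentences(clean_text(raw))
    return [(f"{doc_id}:{i}", doc_id, i, s) for i, s in enumerate(sentences)]
```

Keeping a stable sentence id per row makes it possible to trace every extracted triple back to its evidence sentence later on.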

2. Run LLM‑based extraction

python src/run_extraction.py \
  --config src/config/extraction_hgca.yml \
  --input data/processed \
  --output data/processed/sentences.csv

This step calls the LLM to extract entities and relations and saves triples to CSV.
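LLM output is worth validating before it reaches the graph. The sketch below assumes the prompt asks the model to reply with a JSON object of the form `{"triples": [{"head": ..., "relation": ..., "tail": ...}]}`; that schema, the CSV column names and the helpers `parse_triples` / `triples_to_csv` are hypothetical, and the prompts in `src/` may use a different format. The OpenAI call itself is left out so the example runs offline.

```python
import csv
import io
import json

FIELDS = ("sentence_id", "head", "relation", "tail")


def parse_triples(sentence_id, llm_reply):
    """Validate one model reply; malformed JSON or incomplete triples are dropped."""
    try:
        payload = json.loads(llm_reply)
    except json.JSONDecodeError:
        return []
    items = payload.get("triples", []) if isinstance(payload, dict) else []
    if not isinstance(items, list):
        items = []
    rows = []
    for item in items:
        if not isinstance(item, dict):
            continue
        head = str(item.get("head", "")).strip()
        tail = str(item.get("tail", "")).strip()
        # Normalise predicates such as "located in" to LOCATED_IN.
        relation = "_".join(str(item.get("relation", "")).upper().split())
        if head and relation and tail:
            rows.append({"sentence_id": sentence_id, "head": head,
                         "relation": relation, "tail": tail})
    return rows


def triples_to_csv(rows):
    """Serialise triple rows to CSV text (header first), ready to write to disk."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Upper-case, underscore-separated predicates are a design choice here: they match the usual Neo4j relationship-type convention and keep near-duplicate predicates ("affects" vs "Affects") from splitting into separate edge types.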

3. Build the custom knowledge graph

python src/llm_test.py \
  --triples data/processed/sentences.csv \
  --output data/processed/llm_entities.csv \
  --output data/processed/llm_relations.csv

This produces the entity and relation CSV files used in the next step.
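One way to turn triples into separate entity and relation files is to deduplicate entity names into a node table and keep each triple as an edge row. This is a sketch under assumptions: the input rows use hypothetical `head` / `relation` / `tail` / `sentence_id` keys, entity ids are generated here, names are merged case-insensitively, and the output columns (`id`, `name`, `source`, `target`, `type`, `sentence_id`) are illustrative rather than what `llm_test.py` writes.

```python
import csv
import io


def build_tables(triple_rows):
    """Return (entities, relations) lists of dicts from head/relation/tail rows."""
    ids = {}  # case-folded entity name -> generated entity id
    entities = []
    relations = []
    for row in triple_rows:
        endpoint_ids = []
        for name in (row["head"].strip(), row["tail"].strip()):
            key = name.casefold()
            if key not in ids:
                # First spelling seen becomes the node's display name.
                ids[key] = f"e{len(ids)}"
                entities.append({"id": ids[key], "name": name})
            endpoint_ids.append(ids[key])
        # One edge row per triple, keeping the evidence sentence id.
        relations.append({
            "source": endpoint_ids[0],
            "target": endpoint_ids[1],
            "type": row["relation"],
            "sentence_id": row.get("sentence_id", ""),
        })
    return entities, relations


def to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Case-insensitive merging is only a first step towards entity normalisation; synonyms such as "CD" and "Crohn's disease" would still become separate nodes.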

4. Load triples into Neo4j

Upload the CSV files from step 3 (llm_entities.csv and llm_relations.csv) using Neo4j's data import option. Neo4j connection details (NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD) are read from .env.
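As an alternative to the import tool, the CSV files could be loaded programmatically with the official `neo4j` Python driver. This is a hedged sketch, not the project's loader: it assumes entity rows with `id` and `name` columns and relation rows with `source`, `target` and `type` columns (hypothetical names, to be adjusted to the real headers), an `Entity` label, and a generic `RELATED` relationship that stores the predicate as a property, because Cypher cannot take a relationship type as a query parameter. Creating a uniqueness constraint first (for example `CREATE CONSTRAINT entity_id IF NOT EXISTS FOR (e:Entity) REQUIRE e.id IS UNIQUE`) keeps the `MERGE` lookups fast.

```python
import csv
import os
from itertools import islice

# Assumed schema: Entity nodes keyed by id; predicate kept as a property.
ENTITY_QUERY = """
UNWIND $rows AS row
MERGE (e:Entity {id: row.id})
SET e.name = row.name
"""

RELATION_QUERY = """
UNWIND $rows AS row
MATCH (a:Entity {id: row.source})
MATCH (b:Entity {id: row.target})
MERGE (a)-[r:RELATED {type: row.type}]->(b)
"""


def batches(rows, size=500):
    """Yield lists of at most `size` rows so each transaction stays small."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def load_rows(driver, rows, query, size=500):
    """Send rows in batches; each batch is bound to the $rows parameter."""
    with driver.session() as session:
        for batch in batches(rows, size):
            session.run(query, rows=batch).consume()


def load_csv(driver, path, query, size=500):
    """Stream a CSV file (header row required) into Neo4j."""
    with open(path, newline="", encoding="utf-8") as handle:
        load_rows(driver, csv.DictReader(handle), query, size)


def main():
    from neo4j import GraphDatabase  # official driver: pip install neo4j

    # Assumes the .env values have already been exported into the environment.
    uri = os.environ["NEO4J_URI"]
    auth = (os.environ["NEO4J_USER"], os.environ["NEO4J_PASSWORD"])
    with GraphDatabase.driver(uri, auth=auth) as driver:
        driver.verify_connectivity()
        # Nodes first, so the MATCH clauses in RELATION_QUERY can find them.
        load_csv(driver, "data/processed/llm_entities.csv", ENTITY_QUERY)
        load_csv(driver, "data/processed/llm_relations.csv", RELATION_QUERY)
```

Because both queries use `MERGE`, running `main()` a second time should update existing nodes and edges rather than duplicate them.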

Streamlit QA app

To start the Gut / IBD Knowledge Graph QA interface:

streamlit run src/app.py

The app lets you:

- Ask HGCA competency questions (e.g. about Crohn’s lesions in the terminal ileum).
- See answers generated over the custom KG.
- Inspect the retrieved nodes, edges, and supporting sentences.
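A simplified picture of how an answer can be grounded in the graph: find entity names mentioned in the question, collect the edges that touch them together with their supporting sentences, and pass those facts to the LLM as context. This is a sketch under assumptions, with an in-memory edge list standing in for Neo4j and hypothetical names (`Edge`, `retrieve`, `build_prompt`); it is not the logic of `src/app.py`, which presumably queries the graph directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    head: str
    relation: str
    tail: str
    sentence: str  # supporting evidence sentence from the source article


def retrieve(question, edges, max_edges=10):
    """Return (matched entity names, edges touching them).

    Matching is a plain case-insensitive substring test; matched names are
    sorted longest first, while edges keep their original order.
    """
    q = question.casefold()
    names = {e.head for e in edges} | {e.tail for e in edges}
    matched = sorted((n for n in names if n.casefold() in q), key=len, reverse=True)
    hits = [e for e in edges if e.head in matched or e.tail in matched]
    return matched, hits[:max_edges]


def build_prompt(question, hits):
    """Format retrieved facts plus evidence as context for the LLM."""
    facts = "\n".join(
        f"- {e.head} {e.relation} {e.tail} (evidence: {e.sentence})" for e in hits
    )
    return f"Answer using only these facts.\n{facts}\n\nQuestion: {question}"
```

Returning the evidence sentences alongside the edges is what would let an interface show supporting sentences next to each answer.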

Evaluation scripts

Under eval/ you will find:

- A baseline keyword search for each competency question.
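A keyword baseline of this kind can be as small as ranking sentences by how many question terms they share. The sketch below assumes the sentences are available as a list of strings; the stop-word list and the `terms` / `keyword_search` names are illustrative, and the scripts in eval/ may tokenise or score differently.

```python
import re

STOP_WORDS = {
    "a", "an", "and", "are", "do", "does", "how", "in", "is", "of",
    "on", "or", "the", "to", "what", "where", "which",
}


def terms(text):
    """Lower-cased alphanumeric tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS}


def keyword_search(question, sentences, k=3):
    """Rank sentences by the number of distinct question terms they contain.

    Ties keep the original sentence order; sentences with no overlap are dropped.
    """
    q_terms = terms(question)
    scored = [(len(q_terms & terms(s)), i, s) for i, s in enumerate(sentences)]
    scored = [row for row in scored if row[0] > 0]
    scored.sort(key=lambda row: (-row[0], row[1]))
    return [(s, score) for score, _, s in scored[:k]]
```

Such a baseline has no notion of synonyms or relations, which is what makes it a useful lower bound when judging whether the knowledge graph adds anything.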

Contact

For questions about this project, please contact:

Mohammed Ihthisham Neelam Kadavil (GitHub: lostiithi)
