# 🧬 Reactome LNP Agent: Ionizable Lipid Reaction Templates

LangGraph agent with local FAISS RAG over:
- Research papers (PDFs)
- LNP design rules & reaction templates
- Building block & liver score data

**RAG Stack:** FAISS (vector DB) + Bedrock Titan Embeddings + ChatBedrock (LLM)

In [1]:
import os, glob, json
import pandas as pd
from dotenv import load_dotenv

from langchain_aws import ChatBedrock, BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langgraph.graph import StateGraph, START, END
from typing import TypedDict

load_dotenv()

REGION = os.getenv('AWS_REGION')
MODEL_ID = os.getenv('BEDROCK_MODEL_ID')

print(f'Region: {REGION} | Model: {MODEL_ID}')

Region: us-west-2 | Model: us.anthropic.claude-sonnet-4-5-20250929-v1:0
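`os.getenv` returns `None` for a missing variable, so a gap in `.env` would only surface later as an opaque Bedrock client error. A minimal fail-fast check, assuming the same two variable names; `require_env` is a hypothetical helper, not part of `python-dotenv`:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast if any is unset or empty."""
    # Hypothetical helper: collect every missing name so the error lists them all at once.
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


# Usage, right after load_dotenv():
# cfg = require_env('AWS_REGION', 'BEDROCK_MODEL_ID')
# REGION, MODEL_ID = cfg['AWS_REGION'], cfg['BEDROCK_MODEL_ID']
```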


## 1. Document Ingestion

In [2]:
DATA_DIR = '../data'
docs = []

# --- PDFs ---
pdf_paths = glob.glob(f'{DATA_DIR}/papers/*.pdf') + glob.glob(f'{DATA_DIR}/lnp_data/*.pdf')
for path in pdf_paths:
    loader = PyMuPDFLoader(path)
    loaded = loader.load()
    for d in loaded:
        d.metadata['source_type'] = 'paper'
    docs.extend(loaded)
    print(f'  PDF: {os.path.basename(path)} → {len(loaded)} pages')

# --- Markdown files ---
for md_path in glob.glob(f'{DATA_DIR}/**/*.md', recursive=True):
    # Close the file handle and avoid depending on the platform's default encoding.
    with open(md_path, encoding='utf-8') as f:
        text = f.read()
    docs.append(Document(page_content=text, metadata={'source': md_path, 'source_type': 'rules'}))
    print(f'  MD:  {os.path.basename(md_path)} → {len(text)} chars')

# --- Reaction templates (Python code) ---
rxn_path = f'{DATA_DIR}/lnp_data/lnp_reaction.py'
with open(rxn_path, encoding='utf-8') as f:
    rxn_code = f.read()
docs.append(Document(
    page_content=f'# LNP Reaction Templates (SMARTS)\n\n{rxn_code}',
    metadata={'source': rxn_path, 'source_type': 'reaction_templates'}
))
print(f'  PY:  lnp_reaction.py → {len(rxn_code)} chars')

# --- Liver score data (small CSV, embed as context) ---
liver_df = pd.read_csv(f'{DATA_DIR}/lnp_data/final_liver.csv')
liver_summary = f"""# Liver Targeting Score Dataset
Total compounds: {len(liver_df)}
Target range: [{liver_df['target'].min():.4f}, {liver_df['target'].max():.4f}]
Mean target: {liver_df['target'].mean():.4f}
Top 10 compounds by liver score:\n"""
for _, row in liver_df.nlargest(10, 'target').iterrows():
    liver_summary += f"  SMILES: {row['smiles'][:80]}... | score: {row['target']:.4f}\n"
docs.append(Document(page_content=liver_summary, metadata={'source': 'final_liver.csv', 'source_type': 'data'}))
print(f'  CSV: final_liver.csv → {len(liver_df)} compounds')

# --- Building blocks summary (too large to embed all, summarize) ---
bb_df = pd.read_csv(f'{DATA_DIR}/lnp_data/filtered_building_blocks.csv')
bb_summary = f"""# Building Blocks Dataset
Total building blocks: {len(bb_df)}
Columns: {list(bb_df.columns)}
Sample SMILES (first 20):\n"""
for _, row in bb_df.head(20).iterrows():
    bb_summary += f"  {row['smiles']} (id={row['reagent_id']})\n"
docs.append(Document(page_content=bb_summary, metadata={'source': 'filtered_building_blocks.csv', 'source_type': 'data'}))
print(f'  CSV: filtered_building_blocks.csv → {len(bb_df)} blocks (summary only)')

print(f'\nTotal documents: {len(docs)}')

  PDF: 2024_A Deep Generative Model for the Design of Synthesizable Ionizable Lipids.pdf → 19 pages
  PDF: 2025_SyntheMol-RL- a flexible reinforcement learning framework for designing novel and synthesizable antibiotics.pdf → 63 pages
  PDF: 2024_SyntheMol-RL a flexible reinforcement learning framework for designing novel and synthesizable antibiotics.pdf → 63 pages
  PDF: rule_for_lnp_design.pdf → 3 pages
  MD:  LNP_DESIGN_RULES.md → 4711 chars
  MD:  paper_summary.md → 2866 chars
  MD:  synthmol_rl.md → 659 chars
  PY:  lnp_reaction.py → 4604 chars
  CSV: final_liver.csv → 292 compounds
  CSV: filtered_building_blocks.csv → 217279 blocks (summary only)

Total documents: 154
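The two CSV cells follow one pattern: instead of embedding every row, they embed a short text summary with aggregate statistics plus the top rows. A stdlib-only sketch of that pattern, assuming a CSV with `smiles` and `target` columns as in `final_liver.csv`; `summarize_scores` is a hypothetical name. Unlike the notebook cell, which appends `...` to every SMILES string, it marks truncation only when characters were actually cut.

```python
import csv
import heapq
import io


def summarize_scores(csv_text: str, top_n: int = 10, max_len: int = 80) -> str:
    """Build a compact, embeddable text summary of a (smiles, target) score table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    scores = [float(r['target']) for r in rows]
    lines = [
        '# Liver Targeting Score Dataset',
        f'Total compounds: {len(rows)}',
        f'Target range: [{min(scores):.4f}, {max(scores):.4f}]',
        f'Mean target: {sum(scores) / len(scores):.4f}',
        f'Top {top_n} compounds by liver score:',
    ]
    # heapq.nlargest keeps the highest-scoring rows without sorting the whole table.
    for r in heapq.nlargest(top_n, rows, key=lambda r: float(r['target'])):
        smiles = r['smiles']
        # Append '...' only when the SMILES string really was shortened.
        shown = smiles if len(smiles) <= max_len else smiles[:max_len] + '...'
        lines.append(f"  SMILES: {shown} | score: {float(r['target']):.4f}")
    return '\n'.join(lines)


sample = 'smiles,target\nCCO,0.10\nCCN(CC)CC,0.90\nCCCCCCCC,0.50\n'
print(summarize_scores(sample, top_n=2))
```

With the toy table above, the summary lists `CCN(CC)CC` (0.9000) before `CCCCCCCC` (0.5000) and reports a mean of 0.5000.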


## 2. Build FAISS Vector Store

In [3]:
FAISS_INDEX_PATH = '../data/faiss_lnp_index'

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
print(f'Split into {len(chunks)} chunks')

embeddings = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0', region_name=REGION)

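# Reuse the cached index when present; delete FAISS_INDEX_PATH to rebuild after the source documents change.
if os.path.exists(FAISS_INDEX_PATH):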
    print('Loading existing FAISS index...')
    vectorstore = FAISS.load_local(FAISS_INDEX_PATH, embeddings, allow_dangerous_deserialization=True)
else:
    print('Building FAISS index (this may take a minute)...')
    vectorstore = FAISS.from_documents(chunks, embeddings)
    vectorstore.save_local(FAISS_INDEX_PATH)
    print(f'Saved to {FAISS_INDEX_PATH}')

retriever = vectorstore.as_retriever(search_kwargs={'k': 6})
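# len(chunks) counts this run's split; if the cached index is stale, vectorstore.index.ntotal gives its real size.
print(f'✓ FAISS ready: {len(chunks)} vectors')

Split into 457 chunks


Loading existing FAISS index...
✓ FAISS ready: 457 vectors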
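With `chunk_size=1000` and `chunk_overlap=200`, consecutive chunks share up to 200 characters, so text near a boundary (for example, the end of one reaction template) appears in both neighbouring chunks. `RecursiveCharacterTextSplitter` first tries to break on paragraph, line, and word separators; the sketch below swaps in a simpler fixed-window splitter (not LangChain's algorithm) purely to show how size and overlap interact.

```python
def fixed_window_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character windows; each shares `overlap` chars with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError('overlap must be in [0, chunk_size)')
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so the tail is not emitted twice.
        if start + chunk_size >= len(text):
            break
    return chunks


chunks = fixed_window_chunks('x' * 2500)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The window advances by `chunk_size - overlap` = 800 characters, which is why a 2500-character text yields three chunks rather than three non-overlapping ones of 1000, 1000, and 500.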


## 3. Define Agent State & LLM

In [4]:
class ReactomeState(TypedDict):
    query: str
    retrieved_context: str
    reaction_analysis: str
    design_rules_check: str
    synthesis_plan: str
    final_answer: str

llm = ChatBedrock(model_id=MODEL_ID, region_name=REGION)
print('✓ State & LLM ready')

✓ State & LLM ready
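Each node returns only the keys it writes, and LangGraph merges those partial updates into the shared `ReactomeState`. Because `reaction_expert` and `design_rules` run in parallel but write disjoint keys, their updates never collide; two parallel nodes writing the same key without a reducer would make LangGraph report an invalid update. The toy model below illustrates that rule; it is a simplification, not LangGraph's implementation, and `apply_superstep` is a hypothetical name.

```python
from typing import TypedDict


class ReactomeState(TypedDict):
    query: str
    retrieved_context: str
    reaction_analysis: str
    design_rules_check: str
    synthesis_plan: str
    final_answer: str


def apply_superstep(state: dict, updates: list[dict]) -> dict:
    """Merge the partial updates from nodes that ran in the same step (toy model)."""
    merged = dict(state)
    written: set[str] = set()
    for update in updates:
        for key, value in update.items():
            if key not in ReactomeState.__annotations__:
                raise KeyError(f'unknown state key: {key}')
            # Mirrors the "one value per key per step" rule for keys without a reducer.
            if key in written:
                raise ValueError(f'conflicting parallel writes to {key!r}')
            written.add(key)
            merged[key] = value
    return merged


state = {key: '' for key in ReactomeState.__annotations__}
state = apply_superstep(state, [{'reaction_analysis': 'A'}, {'design_rules_check': 'B'}])
print(state['reaction_analysis'], state['design_rules_check'])  # A B
```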


## 4. LangGraph Nodes

In [5]:
def retrieval_node(state: ReactomeState) -> dict:
    """RAG retrieval from FAISS."""
    print('üîç Retrieving relevant documents...')
    docs = retriever.invoke(state['query'])
    context = '\n\n---\n\n'.join(
        f"[{d.metadata.get('source_type', 'unknown')}] {d.page_content}" for d in docs
    )
    print(f'  Retrieved {len(docs)} chunks')
    return {'retrieved_context': context}


def reaction_expert_node(state: ReactomeState) -> dict:
    """Analyze applicable reaction templates."""
    print('⚗️ Reaction Expert analyzing...')
    prompt = f"""You are an expert in ionizable lipid synthesis reactions.
Using the following context about available reaction templates and research:

{state['retrieved_context']}

Question: {state['query']}

Analyze which reaction templates (SMARTS) are applicable. For each relevant reaction:
1. Reaction ID and type
2. Reactant requirements (functional groups)
3. Expected product structure
4. Any known issues or limitations
5. Recommended reaction conditions"""
    result = llm.invoke(prompt).content
    print('✓ Reaction analysis done')
    return {'reaction_analysis': result}


def design_rules_node(state: ReactomeState) -> dict:
    """Check against LNP design rules."""
    print('üìè Checking design rules...')
    prompt = f"""You are an expert in ionizable lipid design rules for LNP formulations.
Using the following context:

{state['retrieved_context']}

Question: {state['query']}

Evaluate against LNP design constraints:
1. Tail configuration rules (2-4 tails, max 2 distinct types, symmetry preference)
2. MCTS tree structure compatibility
3. Head-tail compatibility
4. Synthesizability assessment
5. Any rule violations or warnings"""
    result = llm.invoke(prompt).content
    print('✓ Design rules check done')
    return {'design_rules_check': result}


def synthesis_planner_node(state: ReactomeState) -> dict:
    """Plan synthesis route combining reactions + rules."""
    print('üß™ Planning synthesis route...')
    prompt = f"""You are a synthesis planning expert for ionizable lipids.

Reaction Analysis:
{state['reaction_analysis']}

Design Rules Check:
{state['design_rules_check']}

Original Question: {state['query']}

Provide a concrete synthesis plan:
1. Step-by-step synthesis route with specific reaction IDs
2. Building block selection criteria
3. Expected intermediate and final products
4. MCTS-compatible action sequence
5. Potential optimization points"""
    result = llm.invoke(prompt).content
    print('✓ Synthesis plan done')
    return {'synthesis_plan': result}


def final_answer_node(state: ReactomeState) -> dict:
    """Synthesize all analyses into final answer."""
    print('üìã Generating final answer...')
    prompt = f"""Synthesize the following expert analyses into a clear, actionable answer.

Question: {state['query']}

Reaction Analysis:
{state['reaction_analysis']}

Design Rules Check:
{state['design_rules_check']}

Synthesis Plan:
{state['synthesis_plan']}

Provide a comprehensive answer covering:
1. Direct answer to the question
2. Recommended reaction templates with IDs
3. Design rule compliance summary
4. Actionable synthesis route
5. Caveats and next steps"""
    result = llm.invoke(prompt).content
    print('✓ Final answer ready')
    return {'final_answer': result}

print('✓ All nodes defined')

✓ All nodes defined
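`retrieval_node` delegates to `retriever.invoke`, which embeds the query and returns the `k=6` nearest chunks; the node then tags each chunk with its `source_type` and joins them with `---` separators. A stdlib sketch of both steps, assuming hand-made 2-D vectors in place of Titan embeddings and an exact brute-force Euclidean search in place of the FAISS index (LangChain's `FAISS` store defaults to Euclidean distance, but this is not FAISS itself):

```python
import math


def top_k(query_vec, items, k=6):
    """Return the k items whose vectors are closest to query_vec (exact L2 search)."""
    return sorted(items, key=lambda item: math.dist(query_vec, item['vec']))[:k]


def format_context(chunks):
    """Tag each chunk with its source_type and join with separators, as retrieval_node does."""
    return '\n\n---\n\n'.join(
        f"[{c.get('source_type', 'unknown')}] {c['text']}" for c in chunks
    )


# Toy "embeddings": two template chunks point one way, the rule chunk another.
items = [
    {'vec': (1.0, 0.0), 'text': 'Esterification template', 'source_type': 'reaction_templates'},
    {'vec': (0.0, 1.0), 'text': 'Tail symmetry rule', 'source_type': 'rules'},
    {'vec': (0.9, 0.1), 'text': 'Amide template', 'source_type': 'reaction_templates'},
]
hits = top_k((1.0, 0.0), items, k=2)
print(format_context(hits))
```

The two template chunks win because they lie closest to the query vector; the rule chunk, at distance √2, is dropped when `k=2`.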


## 5. Build LangGraph

In [6]:
workflow = StateGraph(ReactomeState)

workflow.add_node('retrieve', retrieval_node)
workflow.add_node('reaction_expert', reaction_expert_node)
workflow.add_node('design_rules', design_rules_node)
workflow.add_node('synthesis_planner', synthesis_planner_node)
workflow.add_node('final_answer', final_answer_node)

# Flow: retrieve → [reaction_expert, design_rules] (parallel) → synthesis_planner → final_answer
workflow.add_edge(START, 'retrieve')
workflow.add_edge('retrieve', 'reaction_expert')
workflow.add_edge('retrieve', 'design_rules')
workflow.add_edge('reaction_expert', 'synthesis_planner')
workflow.add_edge('design_rules', 'synthesis_planner')
workflow.add_edge('synthesis_planner', 'final_answer')
workflow.add_edge('final_answer', END)

graph = workflow.compile()
print('✓ LangGraph compiled')
print()
print('Graph flow:')
print('  START → retrieve → [reaction_expert ∥ design_rules] → synthesis_planner → final_answer → END')

✓ LangGraph compiled

Graph flow:
  START → retrieve → [reaction_expert ∥ design_rules] → synthesis_planner → final_answer → END
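LangGraph executes a graph in supersteps: `retrieve` triggers `reaction_expert` and `design_rules` in the same step, and `synthesis_planner` runs in the following step, which matches the interleaved log in the next cell. The sketch below groups the same edge list into steps with level-by-level topological ordering (Kahn's algorithm). It is an approximation of the scheduling, not LangGraph's engine: it always waits for every predecessor, whereas separate `add_edge` calls trigger a node when any predecessor finishes. The two agree here only because both branches have the same length; LangGraph's `add_edge` also accepts a list of start nodes, e.g. `workflow.add_edge(['reaction_expert', 'design_rules'], 'synthesis_planner')`, to wait for all of them explicitly.

```python
from collections import defaultdict

# Edge list copied from the workflow above.
EDGES = [
    ('START', 'retrieve'),
    ('retrieve', 'reaction_expert'),
    ('retrieve', 'design_rules'),
    ('reaction_expert', 'synthesis_planner'),
    ('design_rules', 'synthesis_planner'),
    ('synthesis_planner', 'final_answer'),
    ('final_answer', 'END'),
]


def supersteps(edges):
    """Group nodes into steps: a node runs once all of its predecessors have run."""
    succ = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    level = sorted(n for n in nodes if indegree[n] == 0)
    steps = []
    while level:
        steps.append(level)
        nxt = []
        for node in level:
            for dst in succ[node]:
                indegree[dst] -= 1
                if indegree[dst] == 0:  # all predecessors done
                    nxt.append(dst)
        level = sorted(nxt)
    return steps


for i, step in enumerate(supersteps(EDGES)):
    print(i, step)
```

Step 2 contains both `design_rules` and `reaction_expert`, and `synthesis_planner` appears alone in step 3.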


## 6. Run the Agent

In [7]:
query = """I want to design a novel ionizable lipid with 3 tails using amine head groups.
Which reaction templates should I use, and what is the step-by-step synthesis route?"""

result = graph.invoke({
    'query': query,
    'retrieved_context': '',
    'reaction_analysis': '',
    'design_rules_check': '',
    'synthesis_plan': '',
    'final_answer': '',
})

print('\n' + '=' * 60)
print('AGENT COMPLETE')
print('=' * 60)

🔍 Retrieving relevant documents...
  Retrieved 6 chunks
📏 Checking design rules...
⚗️ Reaction Expert analyzing...
✓ Reaction analysis done
✓ Design rules check done
🧪 Planning synthesis route...
✓ Synthesis plan done
📋 Generating final answer...
✓ Final answer ready

AGENT COMPLETE
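Both `graph.invoke` calls spell out every state key as an empty string by hand. A small hypothetical helper, `initial_state`, can derive that dict from the `TypedDict` annotations so that adding a field to `ReactomeState` cannot leave an invocation out of sync. A sketch, assuming the state class from Section 3:

```python
from typing import TypedDict


class ReactomeState(TypedDict):
    query: str
    retrieved_context: str
    reaction_analysis: str
    design_rules_check: str
    synthesis_plan: str
    final_answer: str


def initial_state(query: str) -> ReactomeState:
    """Hypothetical helper: every state field empty except the query."""
    state = {key: '' for key in ReactomeState.__annotations__}
    state['query'] = query
    return ReactomeState(**state)


# In the notebook this would read: result = graph.invoke(initial_state(query))
s = initial_state('3-tail amine lipid?')
print(sorted(k for k, v in s.items() if v))  # ['query']
```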


In [8]:
# Display results
sections = [
    ('⚗️ Reaction Analysis', 'reaction_analysis'),
    ('📏 Design Rules Check', 'design_rules_check'),
    ('🧪 Synthesis Plan', 'synthesis_plan'),
]
for title, key in sections:
    print(f'\n{title}')
    print('-' * 50)
    print(result[key][:800] + '...' if len(result[key]) > 800 else result[key])
    print()


⚗️ Reaction Analysis
--------------------------------------------------
# Analysis: Designing a Novel Ionizable Lipid with 3 Tails Using Amine Head Groups

Based on the research context provided, here's a comprehensive analysis for synthesizing a 3-tailed ionizable lipid with an amine head group:

## Overview of Synthesis Strategy

According to the papers, ionizable lipids are synthesized by **sequentially adding lipid tails to lipid heads**. For a 3-tailed lipid, you'll need:
- **1 ionizable lipid head** with amine functionality
- **3 lipid tails** with reactive functional groups
- **Sequential reactions** to attach each tail

## Key Design Constraints

### Head Group Requirements:
1. **Must contain 1-3 functional groups** (excluding the ionizable amine) to facilitate tail attachment
2. **Molecular weight**: ≤500 g/mol
3. **LogP**: <0 (hydrophilic preference)...


📏 Design Rules Check
--------------------------------------------------
# Analysis of Your Ionizable Lipid Desig

In [9]:
# Final Answer
print('üìã FINAL ANSWER')
print('=' * 60)
print(result['final_answer'])

📋 FINAL ANSWER
# Comprehensive Synthesis Plan for 3-Tailed Ionizable Lipid with Amine Head

## 1. DIRECT ANSWER

To design a novel ionizable lipid with 3 tails using amine head groups, you should:

**Use these reaction templates in sequence:**
1. **Esterification** (primary choice) - for attaching carboxylic acid tails to hydroxyl groups
2. **Amide formation** (alternative) - for attaching carboxylic acid tails to amine groups
3. **Epoxide ring opening** (for branching) - creates additional hydroxyl attachment points

**Synthesis approach:** Sequential tail addition starting from a multi-functional amine head (e.g., triethanolamine scaffold) with 3 reactive sites, adding one lipid tail at a time.

---

## 2. RECOMMENDED REACTION TEMPLATES

### **Template 1: ESTERIFICATION (Primary Recommendation)**

**Reaction ID:** Carboxylic Acid + Alcohol → Ester

**SMARTS Pattern:**
```
[C:1](=O)[OH].[O:2][H] >> [C:1](=O)[O:2]
```

**Reactants:**
- **Head/Intermediate:** Hydroxyl group (-OH)
-

## 7. Interactive Query

In [10]:
# Try your own query
custom_query = "What are the issues with reaction 10012 (N-methylation) and 10017 (amine+aldehyde→amide)? How should I fix them?"

result2 = graph.invoke({
    'query': custom_query,
    'retrieved_context': '',
    'reaction_analysis': '',
    'design_rules_check': '',
    'synthesis_plan': '',
    'final_answer': '',
})

print('\nüìã ANSWER')
print('=' * 60)
print(result2['final_answer'])

üîç Retrieving relevant documents...
  Retrieved 6 chunks
üìè Checking design rules...
‚öóÔ∏è Reaction Expert analyzing...
‚úì Reaction analysis done
‚úì Design rules check done
üß™ Planning synthesis route...
‚úì Synthesis plan done
üìã Generating final answer...
‚úì Final answer ready

üìã ANSWER
# Comprehensive Fix for Reactions 10012 and 10017

## 1. DIRECT ANSWER

### **Reaction 10012 (N-methylation): INVALID LEAVING GROUP**
- **Problem**: `[*:4][CH3:5]` suggests a methyl group can transfer from any atom, but C-C bonds don't break spontaneously. This will match ANY methyl-containing compound (toluene, ethane, acetic acid), causing massive false positives in your search tree.
- **Root cause**: Missing electrophilic activation (no leaving group like I, Br, OTs)

### **Reaction 10017 (Amine + Aldehyde ‚Üí Amide): CHEMICALLY INCORRECT**
- **Problem**: This reaction doesn't exist in organic chemistry. Aldehydes react with primary amines to form **imines** (C=N bonds), not amides (

## Summary

**RAG Stack:**
- **Vector DB:** FAISS (local, in-process; no separate database server)
- **Embeddings:** Amazon Titan Embed Text v2 (via Bedrock)
- **LLM:** Claude Sonnet 4.5 (via Bedrock)

**Graph Flow:**
```
START → RAG Retrieval → [Reaction Expert ∥ Design Rules] → Synthesis Planner → Final Answer → END
```

**Data Sources:**
- Research papers (PDFs) on lipid generation & SyntheMol-RL
- LNP design rules (MCTS tree structure, tail constraints)
- 13 reaction templates (SMARTS) with known issues
- 217K building blocks, 292 liver-scored compounds