# Framework Demo: Single Question Solving



## 📖 Overview



This notebook demonstrates the **Adaptive LLM-Symbolic Reasoning Framework** solving a TREC clinical trial matching problem.



### Framework Capabilities



- **Automatic Problem Type Identification:** Detects the type of reasoning problem (SAT, CSP, FOL, LP, etc.)

- **Dynamic Solver Selection:** Chooses an appropriate solver based on the identified problem type

- **Hybrid Reasoning:** Combines LLM analysis with formal symbolic solvers

- **Dual Model Support:** Works with Azure OpenAI (gpt-4o, gemini, ...) or local models (Qwen2.5-Coder-7B, ...)



### Available Solvers



| Solver | Problem Type |
|--------|--------------|
| SMT | Satisfiability Modulo Theories |
| LP | Logic Programming |
| FOL | First-Order Logic |
| CSP | Constraint Satisfaction Problems |


---



**⚠️ Prerequisites:** Configure `config.yaml` with API keys and model settings before running.
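
The cells in this notebook read `data_dir.trec` and, for Option A, four fields under `api_config.gpt-4o-azure`. As a minimal sketch, a check like the one below could flag missing or empty entries before any cell runs. The helper name `find_missing_keys` and the placeholder values are hypothetical and not part of the framework; only the key names come from the cells in this notebook.

```python
# Hypothetical pre-flight check; not part of the framework.
# The key paths mirror the lookups made by the cells in this notebook.
REQUIRED_KEYS = [
    ("data_dir", "trec"),
    ("api_config", "gpt-4o-azure", "model_name"),
    ("api_config", "gpt-4o-azure", "api_key"),
    ("api_config", "gpt-4o-azure", "openai_api_version"),
    ("api_config", "gpt-4o-azure", "azure_endpoint"),
]


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return the key paths in `required` that are absent or empty in `config`."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, ""):
            missing.append(" > ".join(path))
    return missing


# Placeholder values for illustration only (assumptions, not real settings).
example_config = {
    "data_dir": {"trec": "path/to/trec.json"},
    "api_config": {
        "gpt-4o-azure": {
            "model_name": "gpt-4o",
            "api_key": "",  # left empty on purpose to trigger the check
            "openai_api_version": "2024-02-01",
            "azure_endpoint": "https://example.invalid/",
        }
    },
}

print(find_missing_keys(example_config))
```

In practice the dictionary would come from `yaml.safe_load`, as in Step 1, rather than from a literal.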

### Processing Workflow
<img src="./main_pic_single.png"  width="800">

## Step 1: Load Problem Sample

Load a clinical trial matching problem from the TREC dataset.

In [13]:
import json
import yaml
import warnings
warnings.filterwarnings("ignore")

# Load configuration
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)

# Load a sample from TREC dataset
data_path = config['data_dir']['trec']
with open(data_path, 'r') as f:
    trec_data = json.load(f)

# Select the first sample (you can modify the index to try other samples)
sample = trec_data[0]
if not isinstance(sample['answer'], list):
    sample['answer'] = [sample['answer']]

print(f"Sample ID: {sample['id']}")
print(f"Problem Type: {sample['type']}")
print("\nProblem Description:")
print(sample['problem'])
print(f"\nGround Truth Answer: {sample['answer'][0]}")

Sample ID: SMT_0
Problem Type: Analytic Reasoning

Problem Description:
You get a trial and a patient and have to say if there is a match:

TRIAL: Inclusion Criteria:

* Adult patients (age \>18 years) with acute pancreatitis according to the revised Atlanta criteria;12
* Informed consent;
* Known time of debut of symptoms.

Exclusion Criteria:

* Chronic pancreatitis;
* Pregnancy;
* Known malignant disease;
* More than 72 hours from debut of symptoms to inclusion.

PATIENT: The patient is a 57-year-old man with abdominal pain and vomiting. The pain started gradually about 20 hours ago in the epigastric and periumbilical regions, radiating to his back. He drinks around 60 units of alcohol per week and smokes 22 cigarettes per day. He is healthy with no history of allergies or using any medications. His family history is positive for type 2 diabetes (his father and sister). He lives alone and has no children. The abdomen is tender and soft. His bowel sounds are normal. His heart rate is

---

## Option A: Reasoning with Azure OpenAI (gpt-4o)

This section demonstrates using Azure OpenAI API for reasoning.

### Stage 1: Configure LLM

Initialize the Azure OpenAI client with credentials from `config.yaml`.

In [14]:
from agents.generation.api import AzureOpenAIGenerator

# Initialize Azure OpenAI (using gpt-4o-azure config from config.yaml)
azure_config = config['api_config']['gpt-4o-azure']
llm = AzureOpenAIGenerator(
    model_name=azure_config['model_name'],
    api_key=azure_config['api_key'],
    model_version=azure_config['openai_api_version'],
    azure_endpoint=azure_config['azure_endpoint']
)

print("✓ Azure OpenAI (gpt-4o) initialized successfully")

✓ Azure OpenAI (gpt-4o) initialized successfully


### Stage 2: Create Reasoning Plan

The **router** analyzes the problem and creates an execution plan:
- Identifies the problem type (e.g., SAT, CSP, FOL)
- Selects appropriate solver(s) from the portfolio
- Constructs a DAG (Directed Acyclic Graph) showing the execution workflow
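
The three routing steps above can be sketched as a lookup from problem type to solver followed by edge construction. This is an illustration under stated assumptions, not the `Planner` implementation: the table `SOLVER_FOR_TYPE` and the function `build_plan_edges` are hypothetical, and only the `SAT` to `smt_solver` pairing is taken from the logged run in this notebook.

```python
# Hypothetical type-to-solver table. Only the SAT -> smt_solver row is
# observed in this notebook's logs; the remaining rows are assumptions.
SOLVER_FOR_TYPE = {
    "SAT": "smt_solver",
    "CSP": "csp_solver",
    "FOL": "fol_solver",
    "LP": "lp_solver",
}


def build_plan_edges(problem_types):
    """Build (source, target) edges with one question node per problem type."""
    edges = []
    for index, problem_type in enumerate(problem_types, start=1):
        solver = SOLVER_FOR_TYPE.get(problem_type)
        if solver is None:
            raise ValueError(f"No solver registered for problem type {problem_type!r}")
        question = f"ques_{index}"
        for edge in [("<PLAN_START>", question), (question, solver), (solver, "<PLAN_END>")]:
            if edge not in edges:  # two questions may share one solver
                edges.append(edge)
    return edges


print(build_plan_edges(["SAT"]))
```

For a single `SAT` question this yields the chain `<PLAN_START>`, `ques_1`, `smt_solver`, `<PLAN_END>`.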

In [15]:
from agents.meta_agents.planner import Planner

# Initialize Carnap router with the LLM
router = Planner(generator=llm)
print("✓ Carnap router initialized")

# Router creates execution plan by analyzing the problem
plan, memory, problem_ids = router(sample)
plan = plan[0]

print("✓ Plan created successfully")
print(f"\nIdentified problem type(s): {[memory.read(f'problem_type_{pid}') for pid in problem_ids]}")
print(f"Agents in plan: {list(plan.agents.keys())}")

2025-10-27 19:54:32,460 | INFO | Registered agent '<PLAN_START>'
2025-10-27 19:54:32,461 | INFO | Registered agent '<PLAN_END>'
2025-10-27 19:54:32,462 | INFO | Registered agent 'lp_solver'
2025-10-27 19:54:32,462 | INFO | Registered agent 'fol_solver'
2025-10-27 19:54:32,463 | INFO | Registered agent 'csp_solver'
2025-10-27 19:54:32,463 | INFO | Registered agent 'smt_solver'
2025-10-27 19:54:32,464 | INFO | Registered agent 'ilp_solver'
2025-10-27 19:54:32,464 | INFO | Registered agent 'epistemic_solver'
2025-10-27 19:54:32,464 | INFO | Registered agent 'risk_solver'
2025-10-27 19:54:32,465 | INFO | Registered agent 'compositional_solver'
2025-10-27 19:54:32,465 | INFO | Registered agent 'causal_solver'


✓ Carnap router initialized


2025-10-27 19:54:35,156 | INFO | HTTP Request: POST https://lunarchatgpt.openai.azure.com/openai/deployments/lunar-chatgpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2025-10-27 19:54:35,161 | INFO | Scratchpad WRITE problem_type_ques_1=SAT (ttl=None)
2025-10-27 19:54:35,161 | INFO | Scratchpad WRITE trial_description_ques_1=Inclusion Criteria:

* Adult patients (age >18 years) with a (ttl=None)
2025-10-27 19:54:35,162 | INFO | Scratchpad WRITE sample_description_ques_1=The patient is a 57-year-old man with abdominal pain and vom (ttl=None)
2025-10-27 19:54:35,162 | INFO | Scratchpad WRITE options_ques_1=A) True
B) False (ttl=None)
2025-10-27 19:54:35,162 | INFO | Scratchpad WRITE smt_input_ques_1={'problem': 'Trial: Inclusion Criteria:\n\n* Adult patients  (ttl=None)
2025-10-27 19:54:35,163 | INFO | Scratchpad WRITE description_ques_1=For a Boolean Satisfiability (SAT) problem, verify whether a (ttl=None)
2025-10-27 19:54:35,163 | INFO | Scratchpad WRITE GOAL=Solve th

✓ Plan created successfully

Identified problem type(s): ['SAT']
Agents in plan: ['<PLAN_START>', 'ques_1', 'smt_solver', '<PLAN_END>']


### Stage 3: Execute Plan and Show Intermediate Results



Execute the reasoning plan using the selected solvers.



**Plan Structure (DAG)** shows the execution flow:

- `<PLAN_START> → ques_1`: Problem enters the pipeline
- `ques_1 → solver`: Question is passed to the appropriate solver (e.g., smt_solver)
- `solver → <PLAN_END>`: Results flow to completion
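
The execution log reports that the plan runs in topological order. A minimal sketch of that idea, using the standard-library `graphlib` module (Python 3.9+) rather than the framework's own scheduler, is shown here. The function name `execution_order` is hypothetical; the edge list matches the DAG printed in this notebook.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(edges):
    """Return node names so that every edge source precedes its target."""
    sorter = TopologicalSorter()
    for source, target in edges:
        # `target` depends on `source`, so `source` must run first.
        sorter.add(target, source)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # The second element of `args` holds the nodes forming the cycle.
        raise ValueError(f"Plan is not a DAG: {exc.args[1]}") from exc


edges = [
    ("<PLAN_START>", "ques_1"),
    ("ques_1", "smt_solver"),
    ("smt_solver", "<PLAN_END>"),
]
print(execution_order(edges))
# ['<PLAN_START>', 'ques_1', 'smt_solver', '<PLAN_END>']
```

A plan containing a cycle has no valid order, which is why the sketch turns `CycleError` into an explicit failure.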

In [16]:
from agents.meta_agents.planner import TracePersister

# Execute the reasoning plan
print("Executing reasoning plan...\n")
plan.execute(memory, TracePersister())
print("\n✓ Reasoning completed")

# Display plan structure (DAG)
print("\nPlan Structure (DAG):")
for edge in plan.edges:
    print(f"  {edge.source} -> {edge.target}")

2025-10-27 19:54:47,171 | INFO | Execute plan with topological order...
2025-10-27 19:54:47,172 | INFO | Trace saved: <PLAN_START> → None
2025-10-27 19:54:47,172 | INFO | Trace saved: ques_1 → None
2025-10-27 19:54:47,173 | INFO | Scratchpad WRITE smt_problem_queue_smt_solver=[] (ttl=None)


Executing reasoning plan...



2025-10-27 19:54:51,320 | INFO | HTTP Request: POST https://lunarchatgpt.openai.azure.com/openai/deployments/lunar-chatgpt-4o/chat/completions?api-version=2024-02-01 "HTTP/1.1 200 OK"
2025-10-27 19:54:51,322 | INFO | Attempt 1/1
2025-10-27 19:54:51,323 | INFO | Running Z3 command: z3 -smt2 /tmp/tmp_p2gjszn.smt2
2025-10-27 19:54:51,337 | INFO | Z3 ran successfully. Found 1 solutions
2025-10-27 19:54:51,337 | INFO | Scratchpad WRITE smt_results_queue=[{'ori_answer': {}, 'parsed_answer': 'A', 'success': True, ' (ttl=None)
2025-10-27 19:54:51,338 | INFO | Scratchpad WRITE result_ques_1={'ori_answer': {}, 'parsed_answer': 'A', 'success': True, 'e (ttl=None)
2025-10-27 19:54:51,338 | INFO | Trace saved: smt_solver → {'ori_answer': {}, 'parsed_answer': 'A', 'success': True, 'error': None, 'is_satisfiable': 'A', 'code': '; Declare variables and their types\n(declare-const age Int)\n(declare-const has_acute_pancreatitis Bool)\n(declare-const informed_consent Bool)\n(declare-const known_time_o


✓ Reasoning completed

Plan Structure (DAG):
  <PLAN_START> -> ques_1
  ques_1 -> smt_solver
  smt_solver -> <PLAN_END>
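
The generated SMT-LIB program is truncated in the log above, so its full text is not reproduced here. As a rough plain-Python analogue of the constraints it encodes, the sketch below checks the trial's inclusion and exclusion criteria against a set of patient facts. The class `PatientFacts` and the function `matches_trial` are hypothetical. The age (57) and time since onset (20 hours) come from the patient note; every other field value is an assumption made for illustration, because the visible note does not state it.

```python
from dataclasses import dataclass


@dataclass
class PatientFacts:
    age: int
    hours_since_onset: float
    has_acute_pancreatitis: bool
    informed_consent: bool
    known_onset_time: bool
    chronic_pancreatitis: bool
    pregnant: bool
    known_malignancy: bool


def matches_trial(p: PatientFacts) -> bool:
    """Return True if all inclusion criteria hold and no exclusion criterion does."""
    included = (
        p.age > 18
        and p.has_acute_pancreatitis
        and p.informed_consent
        and p.known_onset_time
    )
    excluded = (
        p.chronic_pancreatitis
        or p.pregnant
        or p.known_malignancy
        or p.hours_since_onset > 72
    )
    return included and not excluded


patient = PatientFacts(
    age=57,                       # from the patient note
    hours_since_onset=20,         # from the patient note
    has_acute_pancreatitis=True,  # assumption
    informed_consent=True,        # assumption
    known_onset_time=True,        # onset time is stated in the note
    chronic_pancreatitis=False,   # assumption
    pregnant=False,               # the patient is a man
    known_malignancy=False,       # assumption
)
print(matches_trial(patient))  # True
```

Under these assumptions the check agrees with the solver's answer `A` (True). An SMT solver adds value over this direct evaluation when some facts are unknown and must be treated as free variables.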


### Stage 4: Extract and Evaluate Results



Extract predictions from the reasoning process and compare with ground truth.
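
One detail worth keeping in mind when comparing lists: `zip` stops at the shorter input, so `all(...)` over a zipped pair is vacuously true when no predictions were produced. The sketch below is a stricter exact-match check under that observation. The function name `evaluate` is hypothetical and is not part of the framework.

```python
def evaluate(predictions, ground_truth):
    """Return True only if every prediction matches its ground-truth label."""
    # Guard against the vacuous case and against length mismatches,
    # which zip() would otherwise silently truncate.
    if not predictions or len(predictions) != len(ground_truth):
        return False
    return all(pred == gt for pred, gt in zip(predictions, ground_truth))


print(evaluate(["A"], ["A"]))   # True
print(evaluate([], ["A"]))      # False
print(evaluate([None], ["A"]))  # False
```

A missing result, represented as `None`, counts as incorrect rather than being skipped.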

In [18]:
# Extract results from memory
predictions = []
for pid in problem_ids:
    result = memory.read(f"result_{pid}") or memory.read("result_no_problem") or {}
    predictions.append(result.get('parsed_answer', None))

# Evaluate results
is_correct = all(pred == gt for pred, gt in zip(predictions, sample['answer']))

# Display results
print("=" * 60)
print("Result Comparison")
print("=" * 60)
print(f"Ground Truth: {sample['answer']}")
print(f"Prediction:   {predictions}")
print(f"\nEvaluation: {'✅ Correct' if is_correct else '❌ Incorrect'}")
print("=" * 60)

Result Comparison
Ground Truth: ['A']
Prediction:   ['A']

Evaluation: ✅ Correct


---

## Option B: Reasoning with Local Model (Qwen2.5-Coder-7B)

This section uses a **local open-source model** (Qwen2.5-Coder-7B-Instruct) instead of Azure OpenAI.

### Key Difference: LLM Initialization

The main difference from Option A is how we initialize the LLM:
- **Option A:** Uses `AzureOpenAIGenerator` to connect to Azure OpenAI API
- **Option B:** Uses `LocalGenerator` to load a local model with transformers

All other steps (routing, plan creation, execution, result extraction) follow the same process as Option A.
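
The reason the remaining steps stay unchanged is that downstream code depends only on the generator object it is handed, not on which backend produced it. The sketch below illustrates that pattern with a structural interface. Every name in it (`TextGenerator`, `generate`, `StubGenerator`, `classify_problem`) is hypothetical; the actual interface shared by `AzureOpenAIGenerator` and `LocalGenerator` is not shown in this notebook.

```python
from typing import Protocol


class TextGenerator(Protocol):
    """Hypothetical minimal interface that any backend would satisfy."""

    def generate(self, prompt: str) -> str: ...


class StubGenerator:
    """Stand-in backend used only for this illustration."""

    def __init__(self, name: str) -> None:
        self.name = name

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


def classify_problem(generator: TextGenerator, problem: str) -> str:
    """Downstream step that works with any object exposing `generate`."""
    return generator.generate(f"Classify: {problem}")


# Swapping the backend changes only the object passed in.
print(classify_problem(StubGenerator("api"), "trial matching"))
print(classify_problem(StubGenerator("local"), "trial matching"))
```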

---

**Note:** The first run includes model loading time. GPU acceleration is recommended for better performance.

In [None]:
from agents.generation.local import LocalGenerator
from agents.meta_agents.planner import TracePersister, Planner

# Initialize local model (using qwen2.5-coder-7b config from config.yaml)
local_config = config['api_config']['qwen2.5-coder-7b']
llm_local = LocalGenerator(
    model_name=local_config['model_name'],
    api_key=local_config['api_key'],
    lora_path=local_config.get('lora_path', None)
)

# Initialize Carnap router
router_local = Planner(generator=llm_local)
print("✓ Carnap router initialized with Local Model (qwen2.5-coder-7b)")

# Router creates execution plan
plan_local, memory_local, problem_ids_local = router_local(sample)
plan_local = plan_local[0]

# Execute reasoning plan
print("\nExecuting reasoning...")
plan_local.execute(memory_local, TracePersister())

# Extract results
predictions_local = []
for pid in problem_ids_local:
    result = memory_local.read(f"result_{pid}") or memory_local.read("result_no_problem") or {}
    predictions_local.append(result.get('parsed_answer', None))

print("✓ Reasoning completed")

# Evaluate results
is_correct_local = all(pred == gt for pred, gt in zip(predictions_local, sample['answer']))

# Get predicted problem types
problem_types_local = [memory_local.read(f'problem_type_{pid}') for pid in problem_ids_local]
print(f"\nPredicted problem type(s): {problem_types_local}")
print(f"Agents used: {list(plan_local.agents.keys())}")

# Display plan structure
print("\nPlan Structure (DAG):")
for edge in plan_local.edges:
    print(f"  {edge.source} -> {edge.target}")

# Display results
print("\n" + "=" * 60)
print("Result Comparison")
print("=" * 60)
print(f"Ground Truth: {sample['answer']}")
print(f"Prediction:   {predictions_local}")
print(f"\nEvaluation: {'✅ Correct' if is_correct_local else '❌ Incorrect'}")
print("=" * 60)