# **Run Completion System**

This notebook generates responses from the completion system for a set of synthetic test questions. It evaluates the **generation component** in isolation: each question is answered from a fixed context (no retrieval), so the results reflect how accurately the LLM answers from the context it is given.

### Key Steps:
1. **Load Test Data:** Read synthetic questions with their associated chunks
2. **Define Completion System:** Set up the LLM with instructions for answering from context
3. **Generate Responses:** Run all questions through the system with fixed context (main + 5 similar chunks)
4. **Save Results:** Store responses for groundedness evaluation

**Important:** The context for each question is the fixed set of chunks stored with it in the test data. No retrieval is performed, which isolates LLM performance from search quality.

## **Load Test Data**

In [1]:
import json
import pandas as pd

dataset = "synthetic_samples"

fpath = f'data/{dataset}.csv'
data = pd.read_csv(fpath)

# Deserialize similar_chunks from JSON string
data['similar_chunks'] = data['similar_chunks'].apply(json.loads)

print(f"Total samples: {len(data)}")
print(f"Grounded: {data['is_grounded'].sum()}")
print(f"Not-grounded: {(~data['is_grounded']).sum()}")

data.head(1)

Total samples: 172
Grounded: 88
Not-grounded: 84


Unnamed: 0,synthetic_question,explanation,synthetic_response,chunk_id,synthetic_chunk_id,is_grounded,main_chunk,similar_chunks,domain,difficulty,tone,language,question_length,synthetic_question_embedding
0,Does CompactCook Camping Stove come with a man...,"The provided context details product features,...",I don't have information about a manufacturer ...,c34dafd3cd3a_aHR0cHM6Ly9zdG9yYWdlcG92ZWwuYmxvY...,c34dafd3cd3a_aHR0cHM6Ly9zdG9yYWdlcG92ZWwuYmxvY...,False,# Information about product item_number: 20\nC...,[# Information about product item_number: 13\n...,Related to Customer,Intermediate,Neutral,English,11,"[-0.025468595325946808, -0.01461399719119072, ..."
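The key steps describe a fixed context of one main chunk plus 5 similar chunks. A quick sanity check after loading could confirm that every row actually carries that many. The sketch below is a hypothetical helper (`find_bad_chunk_counts` is not part of the notebook) and works on plain dictionaries, such as those returned by `data.to_dict("records")`; the toy rows are illustrative only.

```python
import json

def find_bad_chunk_counts(rows, expected=5):
    """Return (row position, actual count) for rows whose similar_chunks
    list does not have the expected length."""
    bad = []
    for pos, row in enumerate(rows):
        chunks = row["similar_chunks"]
        # Accept both the raw JSON string from the CSV and the parsed list.
        if isinstance(chunks, str):
            chunks = json.loads(chunks)
        if len(chunks) != expected:
            bad.append((pos, len(chunks)))
    return bad

# Toy rows standing in for the DataFrame records (illustrative data only).
rows = [
    {"similar_chunks": ["a", "b", "c", "d", "e"]},
    {"similar_chunks": json.dumps(["a", "b"])},
]
print(find_bad_chunk_counts(rows))
```

An empty result would mean every row has the expected number of similar chunks.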


In [2]:
import os
from dotenv import load_dotenv
load_dotenv(".env", override=True)

# AOAI
chatModel = "gpt-4o-mini" # os.getenv("chatModelMini")
aoai_version = os.getenv("AOAI_API_VERSION")
aoai_endpoint = os.getenv("AOAI_ENDPOINT")
aoai_key = os.getenv("FOUNDRY_KEY")
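`os.getenv` returns `None` silently when a variable is missing, so a misconfigured `.env` file would only surface later as failed LLM calls. A minimal sketch of an up-front check follows, assuming all three variables read above are required by `invoke_llm` (the notebook does not show that function, so this is an assumption). `missing_settings` is a hypothetical helper name.

```python
import os

# Variable names taken from the configuration cell above.
REQUIRED_VARS = ("AOAI_API_VERSION", "AOAI_ENDPOINT", "FOUNDRY_KEY")

def missing_settings(env=None, required=REQUIRED_VARS):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
example_env = {"AOAI_ENDPOINT": "https://example.invalid", "FOUNDRY_KEY": ""}
print(missing_settings(example_env))
```

Calling `missing_settings()` with no arguments checks the real environment after `load_dotenv` has run.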

## **Define Completion System**

**Important:** The context is built from the exact chunks stored with each question (main + similar chunks).  
No retrieval is performed; this tests the LLM's ability to work with a fixed context.

In [3]:
from utils.llm import invoke_llm

SYSTEM_PROMPT = """You are a helpful assistant that answers questions based on the provided context.


Instructions:
- If the context contains information relevant to the question, provide a clear and accurate answer based ONLY on that context.
- If the context does NOT contain enough information to answer the question, respond with: "I don't have enough information in the provided context to answer this question."
- Do not make up information or use knowledge outside the provided context.
- Be concise and direct in your responses.
- Always respond in the same language as the most recent question."""

def generate_response(row, chatModel):
    question = row['synthetic_question']
    main_chunk = row['main_chunk']
    similar_chunks = row['similar_chunks']  # Already deserialized as list
    
    # Reconstruct full context (main + similar chunks)
    context_parts = [f"MAIN CONTEXT:\n{main_chunk}"]
    for i, chunk in enumerate(similar_chunks, 1):
        context_parts.append(f"ADDITIONAL CONTEXT {i}:\n{chunk}")
    
    full_context = "\n\n".join(context_parts)
    
    # Create user message with context and question
    user_message = f"""
    Context:
    {full_context}

    Question: {question}

    Answer:"""
    
    try:
        response = invoke_llm(
            system=SYSTEM_PROMPT,
            user=user_message,
            model=chatModel
        )
        return response
    except Exception as e:
        print(f"❌ Error generating response: {e}")
        return f"ERROR: {str(e)}"
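Because the `user_message` template is indented inside the function body, its four leading spaces become part of the prompt, and only the first line of the multi-line context is indented. One possible variant that keeps the template flush left is sketched below; `build_user_message` is a hypothetical helper, and the question and chunks are illustrative. Changing the prompt layout may change model outputs, so results produced with this variant would not be strictly comparable to the run recorded here.

```python
def build_user_message(question, main_chunk, similar_chunks):
    """Assemble the context and question without stray indentation."""
    context_parts = [f"MAIN CONTEXT:\n{main_chunk}"]
    for i, chunk in enumerate(similar_chunks, 1):
        context_parts.append(f"ADDITIONAL CONTEXT {i}:\n{chunk}")
    full_context = "\n\n".join(context_parts)
    # Joining explicit lines keeps every template line flush left.
    return "\n".join([
        "Context:",
        full_context,
        "",
        f"Question: {question}",
        "",
        "Answer:",
    ])

message = build_user_message(
    "What colour is the stove?",        # illustrative question
    "Product: stove\nColour: red",      # illustrative main chunk
    ["Product: tent\nColour: green"],   # illustrative similar chunk
)
print(message)
```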

## **Generate Responses**
Run all test questions through the completion system in parallel, using a thread pool with 10 workers.

In [4]:
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Initialize the new column
data['completion_response'] = None

print(f"🚀 Generating completion responses for {len(data)} questions...")

# Parallel processing with progress bar
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = []
    for idx, row in data.iterrows():
        future = executor.submit(generate_response, row, chatModel)
        futures.append((future, idx))
    
    with tqdm(total=len(data), desc="Generating completion responses") as pbar:
        for future, idx in futures:
            try:
                response = future.result()
                data.at[idx, 'completion_response'] = response
            except Exception as e:
                print(f"\n❌ Error at index {idx}: {e}")
                data.at[idx, 'completion_response'] = f"ERROR: {str(e)}"
            pbar.update(1)

print("✅ All responses generated!")

🚀 Generating completion responses for 172 questions...


Generating completion responses: 100%|██████████| 172/172 [01:07<00:00,  2.53it/s]

✅ All responses generated!
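The loop above collects results in submission order, so the progress bar only advances once the next future in line has finished. A variant using `concurrent.futures.as_completed` handles each result as soon as it is ready. The sketch below uses a stand-in worker (`fake_generate`, hypothetical) instead of the real LLM call, and a plain dictionary instead of the DataFrame.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_generate(question):
    """Stand-in for generate_response; no network call is made."""
    return f"answer to {question}"

questions = {0: "q0", 1: "q1", 2: "q2"}  # row index -> question (toy data)
results = {}

with ThreadPoolExecutor(max_workers=10) as executor:
    # Map each future back to the row index it belongs to.
    future_to_idx = {
        executor.submit(fake_generate, q): idx for idx, q in questions.items()
    }
    # as_completed yields futures in the order they finish.
    for future in as_completed(future_to_idx):
        idx = future_to_idx[future]
        try:
            results[idx] = future.result()
        except Exception as e:
            results[idx] = f"ERROR: {e}"

print(sorted(results.items()))
```

Because each result is written back by index, the final ordering of rows is unaffected by the order in which tasks complete.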





In [5]:
# Quick analysis of generated responses
print("\n📊 Response Statistics:")
print(f"   Total responses: {len(data)}")
print(f"   Errors: {data['completion_response'].str.startswith('ERROR').sum()}")


📊 Response Statistics:
   Total responses: 172
   Errors: 0
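Beyond counting errors, a rough first look at behaviour could compare how often the model returns the refusal sentence from the system prompt for grounded versus not-grounded questions. The sketch below assumes that `is_grounded=False` marks questions the context cannot answer, as the sample row suggests. Substring matching is only a heuristic, since the model may rephrase its refusal, and it does not replace the groundedness evaluation. `refusal_counts` is a hypothetical helper and the rows are illustrative, not actual results.

```python
from collections import Counter

# Opening of the refusal sentence defined in SYSTEM_PROMPT.
REFUSAL = "I don't have enough information in the provided context"

def refusal_counts(rows):
    """Count (is_grounded, refused) pairs across rows."""
    counts = Counter()
    for row in rows:
        refused = REFUSAL.lower() in str(row["completion_response"]).lower()
        counts[(bool(row["is_grounded"]), refused)] += 1
    return counts

# Illustrative rows only; real rows could come from data.to_dict("records").
rows = [
    {"is_grounded": True, "completion_response": "The stove weighs 2 kg."},
    {"is_grounded": False, "completion_response": REFUSAL + " to answer this question."},
    {"is_grounded": False, "completion_response": "It has a 1-year warranty."},
]
counts = refusal_counts(rows)
print(counts[(False, True)], counts[(False, False)], counts[(True, False)])
```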


## **Save Results**

In [6]:
# Save results to new CSV
# Re-serialize similar_chunks to JSON before saving
data['similar_chunks'] = data['similar_chunks'].apply(json.dumps)

output_path = f'data/{dataset}_with_completion_responses_{chatModel}.csv'
data.to_csv(output_path, index=False)
print(f"\n💾 Saved results to: {output_path}")
print(f"\n📋 Final columns: {list(data.columns)}")


💾 Saved results to: data/synthetic_samples_with_completion_responses_gpt-4o-mini.csv

📋 Final columns: ['synthetic_question', 'explanation', 'synthetic_response', 'chunk_id', 'synthetic_chunk_id', 'is_grounded', 'main_chunk', 'similar_chunks', 'domain', 'difficulty', 'tone', 'language', 'question_length', 'synthetic_question_embedding', 'completion_response']
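CSV cells are plain strings, which is why `similar_chunks` is serialized with `json.dumps` before saving and parsed with `json.loads` after loading. The round trip can be sketched with the standard library alone; the chunk texts below are illustrative, and an in-memory buffer stands in for the file on disk.

```python
import csv
import io
import json

chunks = ["# Product 13\nTent, green", 'He said "ok", then left']

# Write: serialize the list to a JSON string, as in the save cell.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["similar_chunks"])
writer.writerow([json.dumps(chunks)])

# Read: parse the JSON string back into a list, as in the load cell.
buffer.seek(0)
reader = csv.DictReader(buffer)
restored = [json.loads(row["similar_chunks"]) for row in reader]
print(restored[0] == chunks)
```

Commas, quotes, and newlines inside a chunk survive the round trip because JSON escapes them within the string and the CSV writer quotes the whole field.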
