In [1]:
import litellm
litellm._turn_on_debug()


def query_litellm(model_name: str, prompt: str) -> str:
    """
    Query a LiteLLM-compatible model with a given prompt.

    Args:
        model_name (str): The model to use (e.g. 'ollama/qwen2.5' or 'huggingface/your-model').
        prompt (str): The user's input question.

    Returns:
        str: The model's response text.
    """
    response = litellm.completion(
        model=model_name,
        messages=[{"role": "user", "content": prompt}]
    )

    return response['choices'][0]['message']['content']

def call_models(prompt: str,
                model1_name: str = 'ollama/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF',
                model2_name: str = 'ollama/phi3') -> tuple:
    """
    Call the two models with the same prompt and return their responses
    as a (model1_response, model2_response) tuple.
    """
    model1_response = query_litellm(model_name=model1_name, prompt=prompt)
    model2_response = query_litellm(model_name=model2_name, prompt=prompt)
    return model1_response, model2_response


In [2]:
import sqlite3

# --- Step 1: Define KPI change scenarios ---
kpi_scenarios = [
    {
        "kpi": "Net Income",
        "change": "dropped by 15% in Q4 2024",
        "context": "due to rising input costs and slower sales in Europe"
    },
    {
        "kpi": "Gross Margin",
        "change": "improved by 3% in Q1 2025",
        "context": "following a successful shift to in-house component manufacturing"
    }
]

def generate_kpi_prompt(scenario):
    """Build the analyst prompt for one KPI scenario dict (keys: kpi, change, context)."""
    return f"""
    You are a financial analyst preparing a quarterly report for executive stakeholders.
    Analyze the following KPI change using Chain-of-Thought (CoT) reasoning.
    Wrap your reasoning in <think> tags and summarize the implications in <answer> tags.

    KPI: {scenario['kpi']}  
    Change: {scenario['change']}  
    Context: {scenario['context']}
    """
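
The prompt above asks each model to wrap its reasoning in <think> tags and its summary in <answer> tags. A minimal sketch of how those two sections could be separated from a response is shown here. The helper name `extract_tagged_sections` and the sample string are hypothetical and not part of the notebook; a model that leaves a tag unclosed simply yields `None` for that section.

```python
import re


def extract_tagged_sections(response_text: str) -> dict:
    """Return the text of the first <think> and <answer> blocks (None if absent or unclosed)."""
    sections = {}
    for tag in ("think", "answer"):
        # DOTALL lets the reasoning span multiple lines; the lazy match stops at the first closing tag.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", response_text,
                          flags=re.DOTALL | re.IGNORECASE)
        sections[tag] = match.group(1).strip() if match else None
    return sections


# Hypothetical response text, used only to exercise the helper.
sample = ("<think>Costs rose while revenue fell.</think>\n"
          "<answer>Margins are under pressure.</answer>")
parsed = extract_tagged_sections(sample)
print(parsed["answer"])
```

Keeping the two sections apart makes it possible to store or display only the executive summary while retaining the reasoning for review.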


In [3]:
def judge_kpi_outputs(scenario, output_a, output_b):
    """Build the judge prompt that compares two model outputs for one KPI scenario."""
    return f"""
You are a CFO reviewing two model-generated CoT analyses for a KPI shift.
Evaluate which is more insightful, executive-ready, and grounded in financial reasoning.

KPI: {scenario['kpi']}  
Change: {scenario['change']}  
Context: {scenario['context']}

---
Model A:
{output_a}

Model B:
{output_b}

Evaluate each model on:
1. Business relevance
2. Reasoning clarity
3. Correct use of <think> and <answer> tags

Respond in this format:

Model A Score: <score>/10
Justification A: <reasoning>

Model B Score: <score>/10
Justification B: <reasoning>

Preferred Model: Model A or Model B
"""
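
The judge is asked to reply in a fixed format, so its scores can in principle be read back with a few regular expressions. The sketch below is one way to do that, assuming the reply follows the template; `parse_judge_result` and the sample reply are hypothetical names and data. In the captured runs further down, the judge sometimes departs from the template (for example `**Score:** 8/10`, or "Neither model" as the preference), so every field falls back to `None` rather than guessing.

```python
import re
from typing import Optional


def parse_judge_result(judge_text: str) -> dict:
    """Extract both scores and the preferred model; fields the judge omitted become None."""

    def score_for(label: str) -> Optional[int]:
        match = re.search(rf"Model {label} Score:\s*(\d+)\s*/\s*10", judge_text)
        return int(match.group(1)) if match else None

    # Accept "Preferred Model: Model B" as well as "Preferred Model: B".
    preferred = re.search(r"Preferred Model:\s*(?:Model\s+)?([AB])\b", judge_text)
    return {
        "score_a": score_for("A"),
        "score_b": score_for("B"),
        "preferred": preferred.group(1) if preferred else None,
    }


# Hypothetical judge reply that follows the requested format exactly.
sample_reply = (
    "Model A Score: 5/10\nJustification A: generic.\n\n"
    "Model B Score: 6/10\nJustification B: slightly more specific.\n\n"
    "Preferred Model: Model B"
)
result = parse_judge_result(sample_reply)
print(result)
```

Rows with `None` fields can then be flagged for manual review instead of being counted as a win for either model.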


In [4]:
# --- Step 5: Set up SQLite storage ---
conn = sqlite3.connect("executive_kpi_forecast.db")
cursor = conn.cursor()

cursor.execute("""
CREATE TABLE IF NOT EXISTS kpi_forecasts (
    kpi TEXT,
    change TEXT,
    context TEXT,
    fin_r1_output TEXT,
    phi3_output TEXT,
    judge_request TEXT,
    judge_result TEXT
)
""")


# --- Step 6: Run analysis and store ---
for scenario in kpi_scenarios:
    fin_r1_output, phi3_output = call_models(generate_kpi_prompt(scenario))
    print("\n--- Fin-R1 Output ---\n", fin_r1_output)
    print("\n--- Phi-3 Output ---\n", phi3_output)
    judge_request = judge_kpi_outputs(scenario, fin_r1_output, phi3_output)
    judge_result = query_litellm(model_name='ollama/qwen2.5', prompt=judge_request)
    print("\n>> Judge Evaluation:\n", judge_result)
    cursor.execute(
        """
        INSERT INTO kpi_forecasts (kpi, change, context, fin_r1_output, phi3_output, judge_request, judge_result)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        """,
        (scenario['kpi'], scenario['change'], scenario['context'],
         fin_r1_output, phi3_output, judge_request, judge_result),
    )
    conn.commit()

conn.close()
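
Once the loop has finished, the stored evaluations can be read back for review. The following is a minimal sketch, assuming the same `kpi_forecasts` schema as above; it uses an in-memory database and clearly marked placeholder strings instead of real model output so that it runs without Ollama. Pointing `sqlite3.connect` at `executive_kpi_forecast.db` would read the actual rows.

```python
import sqlite3

# Assumption: an in-memory database stands in for executive_kpi_forecast.db.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access to columns by name

conn.execute("""
CREATE TABLE IF NOT EXISTS kpi_forecasts (
    kpi TEXT, change TEXT, context TEXT,
    fin_r1_output TEXT, phi3_output TEXT,
    judge_request TEXT, judge_result TEXT
)
""")

# Placeholder row: the scenario fields come from kpi_scenarios, the rest are dummies.
conn.execute(
    "INSERT INTO kpi_forecasts VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Net Income", "dropped by 15% in Q4 2024",
     "due to rising input costs and slower sales in Europe",
     "<placeholder Fin-R1 output>", "<placeholder Phi-3 output>",
     "<placeholder judge request>", "<placeholder judge result>"),
)
conn.commit()

rows = conn.execute(
    "SELECT kpi, change, judge_result FROM kpi_forecasts"
).fetchall()
for row in rows:
    print(f"{row['kpi']} ({row['change']}): {row['judge_result']}")

conn.close()
```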


22:06:40 - LiteLLM:DEBUG: utils.py:311 - 

22:06:40 - LiteLLM:DEBUG: utils.py:311 - Request to litellm:
22:06:40 - LiteLLM:DEBUG: utils.py:311 - litellm.completion(model='ollama/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', messages=[{'role': 'user', 'content': '\n    You are a financial analyst preparing a quarterly report for executive stakeholders.\n    Analyze the following KPI change using Chain-of-Thought (CoT) reasoning.\n    Wrap your reasoning in <think> tags and summarize the implications in <answer> tags.\n\n    KPI: Net Income  \n    Change: dropped by 15% in Q4 2024  \n    Context: due to rising input costs and slower sales in Europe\n    '}])
22:06:40 - LiteLLM:DEBUG: utils.py:311 - 

22:06:40 - LiteLLM:DEBUG: litellm_logging.py:388 - self.optional_params: {}
22:06:40 - LiteLLM:DEBUG: utils.py:311 - SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
22:06:40 - LiteLLM


--- Fin-R1 Output ---
 <think>
Okay, let's try to figure out what's going on here with the net income dropping by 15% in Q4 2024. So first, I need to understand the context provided: rising input costs and slower sales in Europe. That seems like two main factors contributing to the decline.

Starting with the numbers, a 15% drop is significant. Net income is the bottom line after all expenses, so if it's dropping, that means either revenues decreased more than expenses, or expenses increased more than revenues. Or both. Let's look at each factor mentioned.

Rising input costs: This would increase expenses. For example, if you're a manufacturing company, higher raw material prices mean higher production costs. If you can't pass those costs onto customers through price increases (maybe due to competition), your profit margin could shrink. So this is definitely a cost-push factor reducing net income.

Slower sales in Europe: Reduced revenue from the European market would directly lower n

22:07:46 - LiteLLM:INFO: utils.py:1165 - Wrapper: Completed Call, calling success_handler
22:07:46 - LiteLLM:INFO: cost_calculator.py:588 - selected model name for cost calculation: ollama/qwen2.5
22:07:46 - LiteLLM:DEBUG: litellm_logging.py:1089 - Logging Details LiteLLM-Success Call: Cache_hit=None
22:07:46 - LiteLLM:DEBUG: utils.py:4300 - checking potential_model_names in litellm.model_cost: {'split_model': 'qwen2.5', 'combined_model_name': 'ollama/qwen2.5', 'stripped_model_name': 'qwen2.5', 'combined_stripped_model_name': 'ollama/qwen2.5', 'custom_llm_provider': 'ollama'}
22:07:46 - LiteLLM:INFO: cost_calculator.py:588 - selected model name for cost calculation: ollama/qwen2.5
22:07:46 - LiteLLM:DEBUG: utils.py:4300 - checking potential_model_names in litellm.model_cost: {'split_model': 'qwen2.5', 'combined_model_name': 'ollama/qwen2.5', 'stripped_model_name': 'qwen2.5', 'combined_stripped_model_name': 'ollama/qwen2.5', 'custom_


>> Judge Evaluation:
 ### Evaluation of Models A and B

#### Model A:
**Score:** 8/10

**Justification A:**
- **Business Relevance:** The analysis is highly relevant as it addresses the key drivers (rising input costs and slower sales in Europe) that contributed to the net income drop, providing a clear understanding of the financial context.
- **Reasoning Clarity:** The reasoning is detailed and structured. It logically breaks down the impact of each factor on net income, explaining how rising costs and decreased revenue interact to create a significant decline. Additionally, it suggests potential solutions, which are practical and actionable.
- **Correct Use of <think> and <answer> Tags:** The use of these tags is appropriate; however, there could be improvements in structuring the content within them.

#### Model B:
**Score:** 6/10

**Justification B:**
- **Business Relevance:** While the analysis touches on key factors like rising input costs and slower sales in Europe, it lacks d

22:07:49 - LiteLLM:DEBUG: utils.py:311 - 

22:07:49 - LiteLLM:DEBUG: utils.py:311 - Request to litellm:
22:07:49 - LiteLLM:DEBUG: utils.py:311 - litellm.completion(model='ollama/hf.co/ernanhughes/Fin-R1-Q8_0-GGUF', messages=[{'role': 'user', 'content': '\n    You are a financial analyst preparing a quarterly report for executive stakeholders.\n    Analyze the following KPI change using Chain-of-Thought (CoT) reasoning.\n    Wrap your reasoning in <think> tags and summarize the implications in <answer> tags.\n\n    KPI: Gross Margin  \n    Change: improved by 3% in Q1 2025  \n    Context: following a successful shift to in-house component manufacturing\n    '}])
22:07:49 - LiteLLM:DEBUG: utils.py:311 - 

22:07:49 - LiteLLM:DEBUG: litellm_logging.py:388 - self.optional_params: {}
22:07:49 - LiteLLM:DEBUG: utils.py:311 - SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
22:0


--- Fin-R1 Output ---
 <think>
Okay, let's try to figure out what this means. The KPI here is Gross Margin, and it improved by 3% in Q1 2025 compared to the previous quarter or the same period last year. The context given is that this improvement follows a successful shift to in-house component manufacturing.

First, I need to recall what Gross Margin is. From what I remember, Gross Margin is calculated as (Revenue - Cost of Goods Sold) divided by Revenue, then multiplied by 100 to get a percentage. So it's the profit made from sales before subtracting other expenses like operating costs or marketing.

Now, the question is asking for an analysis of this 3% improvement in Q1 2025 due to the shift to in-house manufacturing. Let me break this down step by step.

### Step 1: Understanding the Context
The company moved its component manufacturing in-house. In-house manufacturing usually means they're producing their own parts instead of buying them from external suppliers. This can have se

22:09:29 - LiteLLM:INFO: utils.py:1165 - Wrapper: Completed Call, calling success_handler
22:09:29 - LiteLLM:INFO: cost_calculator.py:588 - selected model name for cost calculation: ollama/qwen2.5
22:09:29 - LiteLLM:DEBUG: litellm_logging.py:1089 - Logging Details LiteLLM-Success Call: Cache_hit=None
22:09:29 - LiteLLM:DEBUG: utils.py:4300 - checking potential_model_names in litellm.model_cost: {'split_model': 'qwen2.5', 'combined_model_name': 'ollama/qwen2.5', 'stripped_model_name': 'qwen2.5', 'combined_stripped_model_name': 'ollama/qwen2.5', 'custom_llm_provider': 'ollama'}
22:09:29 - LiteLLM:INFO: cost_calculator.py:588 - selected model name for cost calculation: ollama/qwen2.5
22:09:29 - LiteLLM:DEBUG: utils.py:4300 - checking potential_model_names in litellm.model_cost: {'split_model': 'qwen2.5', 'combined_model_name': 'ollama/qwen2.5', 'stripped_model_name': 'qwen2.5', 'combined_stripped_model_name': 'ollama/qwen2.5', 'custom_


>> Judge Evaluation:
 Model A Score: 5/10
Justification A: 
- The response lacks detailed analysis and concrete connections between vertical integration through component manufacturing and the improved gross margin.
- It does not provide specific data points, financial impacts, or actionable insights relevant to the company's performance in Q1 2025.

Model B Score: 6/10
Justification B: 
- The response acknowledges the positive impact of vertical integration on gross margins but remains generic without specific details.
- It does not delve into pricing strategies, future financial projections, or potential risks and benefits unique to the company's context.

Preferred Model: Neither model fully meets the criteria. However, if a choice must be made, Model B is slightly preferred because it begins to address some of the necessary elements but needs significant improvement in detail and relevance.

### Revised Response

#### Analysis of Vertical Integration Benefits on Gross Margin Impro