# General Tips
## Using virtual environments
**Step 1:** `cd` to the desired directory and create a virtual environment: `python3 -m venv myenv`. (On Windows, run `py -3.13 -m venv myenv` to use a specific version of Python.)

Check your installed Python versions with `py -0` on Windows (`python3 --version` on Linux).

**Step 2:** Activate the environment: `source myenv/bin/activate` (on Linux) or `myenv\Scripts\activate` (on Windows).

**Step 3:** Install any needed packages, e.g. `pip install requests pandas`. Better still, list them in a `requirements.txt` file and run `pip install -r requirements.txt`.

**Step 4:** List all installed packages with `pip list`.
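
To confirm from inside Python that the environment from Step 2 is actually active, a minimal sketch like the following may help. It uses only the standard library; the function name `in_virtualenv` is illustrative and not part of any tool mentioned above.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtualenv())
```

If the second line prints `False`, the activation command from Step 2 was probably not run in the current shell.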

## Connecting the Jupyter Notebook to the virtual environment
1. Make sure that `myenv` is activated (`myenv\Scripts\activate` on Windows, `source myenv/bin/activate` on Linux)
2. Run this inside the virtual environment: `pip install ipykernel`
3. Still inside the environment: `python -m ipykernel install --user --name=myenv --display-name "Whatever Python Kernel Name"`
   
   `--name=myenv`: internal identifier for the kernel
   
   `--display-name`: name that shows up in the VS Code kernel picker
4. Open VS Code and select the kernel

   At the top-right, click "Select Kernel".
   Look for “Whatever Python Kernel Name” — pick that.
5. If you don't see it right away, try reloading VS Code or running "Reload Window" from the Command Palette (Ctrl+Shift+P)
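
Once the kernel is selected, a quick check in a notebook cell can show whether it really points at the environment. This is only a sketch using the standard library; `has_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if `name` can be imported by the current interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # With the right kernel selected, this path should sit inside the
    # myenv directory created earlier.
    print("Kernel interpreter:", sys.executable)
    print("ipykernel available:", has_module("ipykernel"))
```

If the interpreter path points somewhere else, the notebook is most likely still attached to a different kernel.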

## Useful Commands
1. Use `py -0` to list the Python installations available on Windows

In [1]:
# Step 1: Load and Preview FinanceBench Data

from datasets import load_dataset
import pandas as pd

# Load the FinanceBench dataset (train split)
ds = load_dataset("PatronusAI/financebench", split="train")

# Display the number of records
print(f"Number of records: {len(ds)}")

# Convert first 3 records to a DataFrame for easy preview
df_preview = pd.DataFrame(ds[:3])
pd.set_option('display.max_colwidth', 300)
display(df_preview.T)  # Transposed for easier reading

# Optionally: check the structure of the 'evidence' field in the first record
print("Sample 'evidence' field:")
print(ds[0]['evidence'])



Number of records: 150


Unnamed: 0,0,1,2
financebench_id,financebench_id_03029,financebench_id_04672,financebench_id_00499
company,3M,3M,3M
doc_name,3M_2018_10K,3M_2018_10K,3M_2022_10K
question_type,metrics-generated,metrics-generated,domain-relevant
question_reasoning,Information extraction,Information extraction,Logical reasoning (based on numerical reasoning)
domain_question_num,,,dg06
question,What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.,Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.,Is 3M a capital-intensive business based on FY2022 data?
answer,$1577.00,$8.70,"No, the company is managing its CAPEX and Fixed Assets pretty efficiently, which is evident from below key metrics:\nCAPEX/Revenue Ratio: 5.1%\nFixed assets/Total Assets: 20%\nReturn on Assets= 12.4%"
justification,"The metric capital expenditures was directly extracted from the company 10K. The line item name, as seen in the 10K, was: Purchases of property, plant and equipment (PP&E).","The metric ppne, net was directly extracted from the company 10K. The line item name, as seen in the 10K, was: Property, plant and equipment – net.",CAPEX/Revenue\nFixed Assets/Total Assets\nROA=Net Income/Total Assets
dataset_subset_label,OPEN_SOURCE,OPEN_SOURCE,OPEN_SOURCE


Sample 'evidence' field:
[{'evidence_text': 'Table of Contents \n3M Company and Subsidiaries\nConsolidated Statement of Cash Flow s\nYears ended December 31\n \n(Millions)\n \n2018\n \n2017\n \n2016\n \nCash Flows from Operating Activities\n \n \n \n \n \n \n \nNet income including noncontrolling interest\n \n$\n5,363 \n$\n4,869 \n$\n5,058 \nAdjustments to reconcile net income including noncontrolling interest to net cash\nprovided by operating activities\n \n \n \n \n \n \n \nDepreciation and amortization\n \n \n1,488 \n \n1,544 \n \n1,474 \nCompany pension and postretirement contributions\n \n \n(370) \n \n(967) \n \n(383) \nCompany pension and postretirement expense\n \n \n410 \n \n334 \n \n250 \nStock-based compensation expense\n \n \n302 \n \n324 \n \n298 \nGain on sale of businesses\n \n \n(545) \n \n(586) \n \n(111) \nDeferred income taxes\n \n \n(57) \n \n107 \n \n 7 \nChanges in assets and liabilities\n \n \n \n \n \n \n \nAccounts receivable\n \n \n(305) \n \n(245) \n \n(313)

In [7]:
# Step 2: Normalize FinanceBench Data (using all evidence texts)

import pandas as pd

def extract_evidence_text(evidence_list):
    """Concatenate all evidence_text fields from the evidence list."""
    if evidence_list and isinstance(evidence_list, list):
        if len(evidence_list) > 1:
            print(f"containing {len(evidence_list)} evidence items")
        texts = [e.get("evidence_text", "") for e in evidence_list if "evidence_text" in e]
        return "\n\n".join(texts)
    return ""

# Build a normalized DataFrame
data = []
for record in ds:
    data.append({
        "financebench_id": record.get("financebench_id"),
        "company": record.get("company"),
        "doc_name": record.get("doc_name"),
        "question": record.get("question"),
        "answer": record.get("answer"),
        "evidence_text": extract_evidence_text(record.get("evidence", []))
    })

df = pd.DataFrame(data)

# Show a preview
print("Preview of normalized data:")
display(df.head(3).T)

containing 3 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 3 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 3 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 3 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
containing 2 evidence items
Preview of normalized data:

Unnamed: 0,0,1,2
financebench_id,financebench_id_03029,financebench_id_04672,financebench_id_00499
company,3M,3M,3M
doc_name,3M_2018_10K,3M_2018_10K,3M_2022_10K
question,What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.,Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.,Is 3M a capital-intensive business based on FY2022 data?
answer,$1577.00,$8.70,"No, the company is managing its CAPEX and Fixed Assets pretty efficiently, which is evident from below key metrics:\nCAPEX/Revenue Ratio: 5.1%\nFixed assets/Total Assets: 20%\nReturn on Assets= 12.4%"
evidence_text,"Table of Contents \n3M Company and Subsidiaries\nConsolidated Statement of Cash Flow s\nYears ended December 31\n \n(Millions)\n \n2018\n \n2017\n \n2016\n \nCash Flows from Operating Activities\n \n \n \n \n \n \n \nNet income including noncontrolling interest\n \n$\n5,363 \n$\n4,869 \n$\n5,058...","Table of Contents \n3M Company and Subsidiaries\nConsolidated Balance Shee t\nAt December 31\n \n \n \nDecember 31,\n \nDecember 31,\n \n(Dollars in millions, except per share amount)\n \n2018\n \n2017\n \nAssets\n \n \n \n \n \nCurrent assets\n \n \n \n \n \nCash and cash equivalents\n \n$\n2,8...","3M Company and Subsidiaries\nConsolidated Statement of Income\nYears ended December 31\n(Millions, except per share amounts)\n2022\n2021\n2020\nNet sales\n$\n34,229 $\n35,355 $\n32,184\n\n3M Company and Subsidiaries\nConsolidated Balance Sheet\nAt December 31\n(Dollars in millions, except per sh..."


In [42]:
# Step 3A: Configure the list of LLM models for Ollama (via LangChain)

LLM_MODELS = [
    "llama3:8b",
    "gemma:7b",
    "deepseek-r1:7b"
]

In [43]:
# Step 3B: LangChain Ollama LLM Setup
from langchain_community.llms import Ollama

# Build a dictionary of LLMs (keyed by model name)
llm_dict = {}
for model_name in LLM_MODELS:
    llm = Ollama(model=model_name)
    llm_dict[model_name] = llm

print(f"LLMs configured: {list(llm_dict.keys())}")

LLMs configured: ['llama3:8b', 'gemma:7b', 'deepseek-r1:7b']


In [44]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "You are a financial assistant. Using only the information provided in the context below, "
        "answer the question as concisely and factually as possible. "
        "Do not provide explanations, reasoning, or analysis—only provide the answer itself.\n\n"
        "Context:\n{context}\n\n"
        "Question:\n{question}\n\n"
        "Answer:"
    )
)

# Example: formatting a prompt for the first row
test_prompt = prompt_template.format(
    context=df.iloc[0]["evidence_text"],
    question=df.iloc[0]["question"]
)
print(test_prompt)

You are a financial assistant. Using only the information provided in the context below, answer the question as concisely and factually as possible. Do not provide explanations, reasoning, or analysis—only provide the answer itself.

Context:
Table of Contents 
3M Company and Subsidiaries
Consolidated Statement of Cash Flow s
Years ended December 31
 
(Millions)
 
2018
 
2017
 
2016
 
Cash Flows from Operating Activities
 
 
 
 
 
 
 
Net income including noncontrolling interest
 
$
5,363 
$
4,869 
$
5,058 
Adjustments to reconcile net income including noncontrolling interest to net cash
provided by operating activities
 
 
 
 
 
 
 
Depreciation and amortization
 
 
1,488 
 
1,544 
 
1,474 
Company pension and postretirement contributions
 
 
(370) 
 
(967) 
 
(383) 
Company pension and postretirement expense
 
 
410 
 
334 
 
250 
Stock-based compensation expense
 
 
302 
 
324 
 
298 
Gain on sale of businesses
 
 
(545) 
 
(586) 
 
(111) 
Deferred income taxes
 
 
(57) 
 
107 
 
 7

In [45]:
# Step 4: Run each question/context through each LLM, collect and store results

from tqdm.auto import tqdm

num_samples = 5  # Max = 150
df_subset = df.head(num_samples)  # Creates a new DataFrame with the first N rows

results = []

for model_name, llm in llm_dict.items():
    print(f"Running inference with model: {model_name}")
    for idx, row in tqdm(df_subset.iterrows(), total=len(df_subset), desc=f"{model_name}"):
        # Format the prompt
        prompt = prompt_template.format(
            context=row["evidence_text"],
            question=row["question"]
        )
        # Call the LLM (LangChain wrapper)
        try:
            # invoke() is the current Runnable entry point; calling llm(prompt) directly is deprecated
            generated_answer = llm.invoke(prompt)
        except Exception as e:
            print(f"Error on row {idx}, model {model_name}: {e}")
            generated_answer = "ERROR: " + str(e)
        
        # Store results
        results.append({
            "financebench_id": row["financebench_id"],
            "company": row["company"],
            "doc_name": row["doc_name"],
            "question": row["question"],
            "evidence_text": row["evidence_text"],
            "reference_answer": row["answer"],
            "model": model_name,
            "generated_answer": generated_answer
        })

# Convert results to DataFrame
results_df = pd.DataFrame(results)
print("Preview of results:")
display(results_df.head(3).T)

# Save results for future evaluation
results_df.to_csv("financebench_llm_results.csv", index=False)
results_df.to_json("financebench_llm_results.json", orient="records", lines=True)

Running inference with model: llama3:8b


llama3:8b: 100%|██████████| 5/5 [00:26<00:00,  5.23s/it]


Running inference with model: gemma:7b


gemma:7b: 100%|██████████| 5/5 [00:41<00:00,  8.37s/it]


Running inference with model: deepseek-r1:7b


deepseek-r1:7b: 100%|██████████| 5/5 [04:29<00:00, 53.83s/it]

Preview of results:





Unnamed: 0,0,1,2
financebench_id,financebench_id_03029,financebench_id_04672,financebench_id_00499
company,3M,3M,3M
doc_name,3M_2018_10K,3M_2018_10K,3M_2022_10K
question,What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.,Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.,Is 3M a capital-intensive business based on FY2022 data?
evidence_text,"Table of Contents \n3M Company and Subsidiaries\nConsolidated Statement of Cash Flow s\nYears ended December 31\n \n(Millions)\n \n2018\n \n2017\n \n2016\n \nCash Flows from Operating Activities\n \n \n \n \n \n \n \nNet income including noncontrolling interest\n \n$\n5,363 \n$\n4,869 \n$\n5,058...","Table of Contents \n3M Company and Subsidiaries\nConsolidated Balance Shee t\nAt December 31\n \n \n \nDecember 31,\n \nDecember 31,\n \n(Dollars in millions, except per share amount)\n \n2018\n \n2017\n \nAssets\n \n \n \n \n \nCurrent assets\n \n \n \n \n \nCash and cash equivalents\n \n$\n2,8...","3M Company and Subsidiaries\nConsolidated Statement of Income\nYears ended December 31\n(Millions, except per share amounts)\n2022\n2021\n2020\nNet sales\n$\n34,229 $\n35,355 $\n32,184\n\n3M Company and Subsidiaries\nConsolidated Balance Sheet\nAt December 31\n(Dollars in millions, except per sh..."
reference_answer,$1577.00,$8.70,"No, the company is managing its CAPEX and Fixed Assets pretty efficiently, which is evident from below key metrics:\nCAPEX/Revenue Ratio: 5.1%\nFixed assets/Total Assets: 20%\nReturn on Assets= 12.4%"
model,llama3:8b,llama3:8b,llama3:8b
generated_answer,"According to the cash flow statement, the FY2018 capital expenditure amount for 3M is:\n\n$(1,577)\n\nin USD millions. This is listed under ""Purchases of property, plant and equipment (PP&E)"" in the Cash Flows from Investing Activities section.","The year-end FY2018 net PPNE (Property, Plant and Equipment) for 3M is $0.874 billion.",Yes.


In [46]:
# Step 5: Manual Inspection with Pandas

# A. Load Results (if needed)
import pandas as pd

# Load CSV results
results_df = pd.read_csv("financebench_llm_results.csv")

# OR, load JSON if you prefer
# results_df = pd.read_json("financebench_llm_results.json", lines=True)

# Assume results_df is loaded and has columns:
# 'financebench_id', 'question', 'reference_answer', 'model', 'generated_answer'

# Pivot so each row is a question, and each model's answer is a column
pivot_df = results_df.pivot_table(
    index=['financebench_id', 'question', 'reference_answer'],
    columns='model',
    values='generated_answer',
    aggfunc='first'  # In case of duplicates
).reset_index()

# Optional: drop the leftover 'model' label from the column axis
pivot_df.columns.name = None

# Use wider columns for easier reading
pd.set_option('display.max_colwidth', 300)
display(pivot_df)


Unnamed: 0,financebench_id,question,reference_answer,deepseek-r1:7b,gemma:7b,llama3:8b
0,financebench_id_00499,Is 3M a capital-intensive business based on FY2022 data?,"No, the company is managing its CAPEX and Fixed Assets pretty efficiently, which is evident from below key metrics:\nCAPEX/Revenue Ratio: 5.1%\nFixed assets/Total Assets: 20%\nReturn on Assets= 12.4%","<think>\nAlright, so I need to determine if 3M is a capital-intensive business based on the 2022 financial data provided. Let me break this down step by step.\n\nFirst, what's a capital-intensive business? From what I remember, it refers to a company that requires a large amount of physical asse...","Yes, 3M is a capital-intensive business based on FY2022 data, as evidenced by its high property, plant and equipment investments and low cash flow from investing activities.",Yes.
1,financebench_id_01226,"What drove operating margin change as of FY2022 for 3M? If operating margin is not a useful metric for a company like this, then please state that and explain why.","Operating Margin for 3M in FY2022 has decreased by 1.7% primarily due to: \n-Decrease in gross Margin\n-mostly one-off charges including Combat Arms Earplugs litigation, impairment related to exiting PFAS manufacturing, costs related to exiting Russia and divestiture-related restructuring\ncharges","<think>\nOkay, I need to figure out what drove the change in operating margin for 3M in FY2022 based on the provided context. The question also asks if operating margin isn't a useful metric and why.\n\nFirst, looking at the context given. It mentions that SG&A (SG& A is selling, general, and ad...","The provided text does not contain information regarding operating margin for FY2022, so I am unable to answer the question.",The significant litigation costs primarily related to Combat Arms Earplugs litigation.
2,financebench_id_01865,"If we exclude the impact of M&A, which segment has dragged down 3M's overall growth in 2022?",The consumer segment shrunk by 0.9% organically.,"<think>\nOkay, so I need to figure out which business segment is dragging down 3M's overall sales growth for 2022 when excluding M&A. Let me look at the context provided.\n\nFirst, I see that the question mentions excluding the impact of M&A. The data table shows different percentages under each...","Consumer segment has dragged down 3M's overall growth in 2022, excluding the impact of M&A.",Consumer
3,financebench_id_03029,What is the FY2018 capital expenditure amount (in USD millions) for 3M? Give a response to the question by relying on the details shown in the cash flow statement.,$1577.00,"<think>\nOkay, so I need to find the FY2018 capital expenditure amount for 3M using the provided cash flow statements. Let me look through the context given.\n\nFirst, I remember that capital expenditures are typically recorded under the investing activities section of a cash flow statement. In ...","The FY2018 capital expenditure amount for 3M is (1,577) million USD.","According to the cash flow statement, the FY2018 capital expenditure amount for 3M is:\n\n$(1,577)\n\nin USD millions. This is listed under ""Purchases of property, plant and equipment (PP&E)"" in the Cash Flows from Investing Activities section."
4,financebench_id_04672,Assume that you are a public equities analyst. Answer the following question by primarily using information that is shown in the balance sheet: what is the year end FY2018 net PPNE for 3M? Answer in USD billions.,$8.70,"<think>\nOkay, so I need to figure out the year-end FY2018 net property, plant, and equipment (PPNE) for 3M based on the provided balance sheet. Let me break this down step by step.\n\nFirst, looking at the table, the total assets are listed as $36,500 million. Then, under Property, Plant, and E...","The provided text does not contain information regarding the year end FY2018 net PPNE for 3M, so I am unable to answer this question from the given context.","The year-end FY2018 net PPNE (Property, Plant and Equipment) for 3M is $0.874 billion."
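
The `deepseek-r1:7b` answers above begin with a `<think>` block, which makes side-by-side reading harder. One possible post-processing step is to strip that reasoning before comparing answers. The sketch below assumes the model closes each block with `</think>` (the truncated previews do not show this); the helper name `strip_think` and the sample string are hypothetical.

```python
import re

# Matches a <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Remove closed <think>...</think> blocks and trim surrounding whitespace.

    Text without a closed block is returned unchanged apart from trimming.
    """
    return THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    # Made-up sample, not an actual model output from the run above.
    sample = "<think>\nCapex sits under investing activities.\n</think>\n\n$1,577 million"
    print(strip_think(sample))
```

Assuming every cell in the column is a string, it could be applied to the pivot table with `pivot_df["deepseek-r1:7b"].map(strip_think)`.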
