# Finch: Personal Injury Case Scoring Demo

This notebook demonstrates the key features of the Finch pipeline: extracting structured features from call transcripts, training and evaluating a logistic regression model, and scoring new leads.

## 1. Setup and Imports

In [None]:
import pandas as pd
import jsonlines
from lead_score import extract_key_elements_with_llm, score_lead, score_lead_with_model
from llm_logreg_features import process_llm_outputs, train_logistic_regression
import joblib
import os

## 2. Load Example Transcripts

We load the first three transcripts from a JSONL file (one JSON object per line) for demonstration.

In [None]:
# Adjust this path to your sample transcripts file
transcript_path = 'filtered_transcripts.jsonl'
sample_transcripts = []
with jsonlines.open(transcript_path) as reader:
    for i, obj in enumerate(reader):
        if i >= 3: break  # Only show a few for demo
        sample_transcripts.append(obj)
pd.DataFrame(sample_transcripts)
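
If the `jsonlines` package is not available, the standard library can do the same "first N records" read. This is a minimal sketch assuming one JSON object per line. `parse_jsonl_lines` and `read_jsonl_head` are hypothetical helper names, not part of Finch. Blank lines are skipped, and a malformed line raises an error that names its line number.

```python
import itertools
import json
from typing import Iterable, Iterator, List


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-blank line of JSONL text."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc


def read_jsonl_head(path: str, n: int) -> List[dict]:
    """Return at most the first n records without parsing the rest of the file."""
    with open(path, encoding="utf-8") as f:
        # islice stops pulling lines once n records have been yielded
        return list(itertools.islice(parse_jsonl_lines(f), n))
```

Usage: `sample_transcripts = read_jsonl_head('filtered_transcripts.jsonl', 3)`. Keeping the parser separate from file I/O lets it be tested on a plain list of strings.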

## 3. Extract Structured Features with LLM

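We use an OpenAI model, via `extract_key_elements_with_llm`, to extract structured fields from each transcript. The model replies with a string that may be wrapped in a Markdown code fence. We strip the fence, parse the text as JSON, fall back to Python-literal syntax, and keep the raw text if both fail. This step requires a valid OpenAI API key in your config.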

In [None]:
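import ast
import json
import re

llm_outputs = []
for record in sample_transcripts:
    llm_result_str = extract_key_elements_with_llm(record)
    # Strip a Markdown code fence (```json ... ```) if the model added one
    cleaned = re.sub(r'^```json|```$', '', llm_result_str.strip(), flags=re.MULTILINE).strip()
    try:
        llm_result = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fall back to Python-literal syntax (single quotes, True/False/None)
        try:
            llm_result = ast.literal_eval(cleaned)
        except (ValueError, SyntaxError, TypeError):
            llm_result = {'raw_output': llm_result_str}
    if not isinstance(llm_result, dict):
        # A non-dict reply (e.g. a bare list) cannot be merged into the record
        llm_result = {'raw_output': llm_result_str}
    # Extracted fields override transcript fields that share a key
    merged = dict(record)
    merged.update(llm_result)
    llm_outputs.append(merged)
pd.DataFrame(llm_outputs)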
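
The parsing logic in the loop can be factored into a pure function that is easier to unit-test. This is a sketch; `parse_llm_json` is a hypothetical name. Its fence regex is a slight variant of the notebook's: it anchors to the start and end of the whole reply and also accepts a bare ```` ``` ```` opener without the `json` tag.

```python
import ast
import json
import re
from typing import Any, Dict

# Matches a leading ``` or ```json fence line, or a trailing ``` fence
_FENCE_RE = re.compile(r"^```(?:json)?[ \t]*\n?|\n?```[ \t]*$")


def parse_llm_json(text: str) -> Dict[str, Any]:
    """Parse an LLM reply into a dict, tolerating a Markdown code fence.

    Tries strict JSON first, then Python-literal syntax. Anything that does
    not parse to a dict comes back as {'raw_output': text}, so the caller
    can always merge the result into a record.
    """
    cleaned = _FENCE_RE.sub("", text.strip()).strip()
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError:
        try:
            parsed = ast.literal_eval(cleaned)
        except (ValueError, SyntaxError, TypeError):
            parsed = None
    return parsed if isinstance(parsed, dict) else {"raw_output": text}
```

With this helper, the body of the loop reduces to `merged = {**record, **parse_llm_json(llm_result_str)}`.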

## 4. Convert LLM Outputs to ML Features

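We convert the extracted fields into numeric features for model training. `process_llm_outputs` takes a JSONL input path and a CSV output path, so we round-trip through temporary files. `label_field='label'` names the field that holds the known outcome, so each record needs a `label` value.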

In [None]:
import json
import os
import tempfile

# process_llm_outputs works on file paths, so round-trip through a temporary directory
# (deleted automatically when the with-block exits)
with tempfile.TemporaryDirectory() as tmp_dir:
    jsonl_path = os.path.join(tmp_dir, 'llm_outputs.jsonl')
    csv_path = os.path.join(tmp_dir, 'features.csv')
    with open(jsonl_path, 'w') as f:
        for obj in llm_outputs:
            f.write(json.dumps(obj) + '\n')
    process_llm_outputs(jsonl_path, csv_path, label_field='label')
    # Read the CSV before the directory and its files are removed
    features_df = pd.read_csv(csv_path)
features_df
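
The internals of `process_llm_outputs` are not shown in this notebook. The sketch below only illustrates the general kind of conversion such a step performs. The field names (`has_attorney`, `sought_medical_treatment`, `days_since_incident`, `injury_type`) and the category list are hypothetical, not Finch's actual schema. Booleans become 0/1, numbers pass through, and a categorical string is one-hot encoded against a fixed vocabulary so every row has the same columns.

```python
from typing import Any, Dict

# Hypothetical schema for illustration; the real fields come from Finch's extraction step
BOOL_FIELDS = ["has_attorney", "sought_medical_treatment"]
NUMERIC_FIELDS = ["days_since_incident"]
CATEGORICAL_FIELDS = {"injury_type": ["whiplash", "fracture", "soft_tissue"]}


def to_feature_row(record: Dict[str, Any]) -> Dict[str, float]:
    """Flatten one LLM-extracted record into a fixed set of numeric columns."""
    row: Dict[str, float] = {}
    for field in BOOL_FIELDS:
        # Only a literal True counts as 1.0; missing, null, or string values become 0.0
        row[field] = 1.0 if record.get(field) is True else 0.0
    for field in NUMERIC_FIELDS:
        value = record.get(field)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        # Missing or non-numeric values become 0.0; a real pipeline might impute instead
        row[field] = float(value) if is_number else 0.0
    for field, vocabulary in CATEGORICAL_FIELDS.items():
        value = record.get(field)
        for category in vocabulary:
            # One column per known category; an unknown value leaves all of them at 0.0
            row[f"{field}={category}"] = 1.0 if value == category else 0.0
    return row
```

Fixing the vocabulary up front matters for scoring: a lead with an unseen category still produces exactly the columns the model was trained on.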

## 5. Train Logistic Regression Model

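We train a logistic regression model on these features and save it with `joblib` so the scoring step can reload it. This is a demo only: three records are far too few for a meaningful fit, and if they all share the same label, a binary classifier cannot be trained at all. Use a larger labeled dataset for real training.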

In [None]:
model = train_logistic_regression(features_df, label_col='label')
# Save model for later demo
joblib.dump(model, 'demo_logreg_model.joblib')
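
Before trusting a model, it should be evaluated on records it was not trained on. This sketch assumes scikit-learn is installed and labels are 0/1. It fits scikit-learn's `LogisticRegression` directly rather than calling `train_logistic_regression`, whose internals are not shown here. `evaluate_holdout` is a hypothetical helper. A stratified split keeps the positive rate similar in both halves, and ROC AUC measures how well the scores rank positive leads above negative ones on the held-out data.

```python
from typing import Tuple

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_holdout(
    X: np.ndarray, y: np.ndarray, test_size: float = 0.25, seed: int = 0
) -> Tuple[LogisticRegression, float]:
    """Fit on a stratified training split and return (model, test ROC AUC)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the predicted probability of label 1
    test_scores = model.predict_proba(X_test)[:, 1]
    return model, float(roc_auc_score(y_test, test_scores))
```

With a larger labeled set, the inputs could come from the notebook's frame, e.g. `X = features_df.drop(columns=['label']).to_numpy()` and `y = features_df['label'].to_numpy()`, assuming every remaining column is numeric. `stratify=y` needs at least two examples of each class, so it will raise an error on the three-record demo.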

## 6. Score New Leads

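We now score leads with the saved model. For illustration, we rescore the same records used for training; in practice, pass records the model has not seen. `model_score` is the model's predicted probability of a positive outcome.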

In [None]:
scored = []
for record in llm_outputs:
    result = score_lead_with_model(record, 'demo_logreg_model.joblib')
    # Merge into a new dict so llm_outputs is not modified in place
    scored.append({**record, **result})
pd.DataFrame(scored)[['call_id', 'model_score', 'label']]
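
For a binary logistic regression, the predicted probability is the logistic (sigmoid) function of a weighted sum of the features: `p = 1 / (1 + exp(-(b + sum_i w_i * x_i)))`. Assuming `model_score` comes from a model of this form, the sketch below computes such a score by hand. `logistic_score` is a hypothetical helper, and any weights, bias, or feature names passed to it here are made up for illustration, not taken from the trained model.

```python
import math
from typing import Dict


def logistic_score(features: Dict[str, float], weights: Dict[str, float], bias: float) -> float:
    """Return sigmoid(bias + sum of weight * feature); missing features count as 0."""
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    # Numerically stable sigmoid: only ever exponentiate a non-positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Each weight shifts the log-odds additively. For example, a hypothetical weight of 2.0 on `sought_medical_treatment` multiplies the odds of a positive outcome by e² ≈ 7.4 when that feature flips from 0 to 1.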

## 7. Visualize Results

Let's plot the model scores for a quick look.

In [None]:
import matplotlib.pyplot as plt

scores = [r['model_score'] for r in scored]
# Fix the x-axis to [0, 1] so each of the 10 bins spans 0.1 of probability
plt.hist(scores, bins=10, range=(0, 1))
plt.xlabel('Predicted Probability of Positive Outcome')
plt.ylabel('Number of Leads')
plt.title('Distribution of Model Scores')
plt.show()

---

This notebook demonstrates the core workflow of the Finch pipeline: LLM extraction, feature engineering, ML model training, and model-based scoring. For production, train on a larger labeled dataset and evaluate on a held-out test split rather than on the training records.