In [1]:
import pandas as pd
from pathlib import Path
import sys

sys.path.insert(0, str(Path.cwd().parent))  # adds parent directory
from experiments_lib import prompt_ilec_data, set_context_window_size, get_context_window

%load_ext autoreload
%autoreload 2

## Set up LLM params

In [2]:
set_context_window_size(10)

In [3]:
get_context_window()

deque([], maxlen=10)
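The empty `deque([], maxlen=10)` above suggests the context window is a bounded queue of recent turns. The sketch below shows how such a queue behaves; it is an illustration only, and the variable `context_window` and the turn strings are hypothetical rather than taken from `experiments_lib`.

```python
from collections import deque

# Hypothetical stand-in for the context window; not the experiments_lib object.
context_window = deque(maxlen=3)

for turn in ["q1", "a1", "q2", "a2"]:
    # Once maxlen entries are held, appending drops the oldest entry.
    context_window.append(turn)

print(context_window)  # deque(['a1', 'q2', 'a2'], maxlen=3)
```

With `maxlen=10`, only the ten most recent entries would be kept, so later prompts can lose sight of earlier results.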

# Actuarial Intern

The intern's manager has described the following task:

>We'd like to select an appropriate expected basis for each insurance plan type. We expect that the basis will vary by issue year, but are open to other ideas.  Which of the bases would provide the best fit, if we are looking at the actual-to-expected ratio on an amount basis? The following expected bases are available:

* Expected_Death_QX7580E_by_Amount
* Expected_Death_QX2001VBT_by_Amount
* Expected_Death_QX2008VBT_by_Amount
* Expected_Death_QX2008VBTLU_by_Amount
* Expected_Death_QX2015VBT_by_Amount
* Expected_Death_QX7580E_by_Policy
* Expected_Death_QX2001VBT_by_Policy
* Expected_Death_QX2008VBT_by_Policy
* Expected_Death_QX2008VBTLU_by_Policy
* Expected_Death_QX2015VBT_by_Policy
* ExpDeathQx2015VBTwMI_byPol
* ExpDeathQx2015VBTwMI_byAmt


In [4]:
print(prompt_ilec_data("How many different insurance plan types are there? List them."))

Running request...
There are 7 different insurance plan types. They are:

- Term (19,352,017 rows)  
- Perm (7,245,822 rows)  
- UL (5,957,049 rows)  
- ULSG (3,859,417 rows)  
- VL (3,417,789 rows)  
- VLSG (2,229,995 rows)  
- Other (307,327 rows)


In [5]:
# Trailing spaces keep the sentences from running together when the strings are concatenated.
prompt = (
    "Calculate the actual-to-expected ratio by amount (Actual Amount / Expected Amount) for each expected basis and plan type. "
    "The list of expected bases can be found in the schema for the ILEC_DATA, and they end with _by_Amount. "
    "Which basis looks like the best fit based on the A/E ratio?"
)

print(prompt_ilec_data(prompt))

Running request...
I found the expected bases in the schema that end with _by_Amount:
- Expected_Death_QX7580E_by_Amount
- Expected_Death_QX2001VBT_by_Amount
- Expected_Death_QX2008VBT_by_Amount
- Expected_Death_QX2008VBTLU_by_Amount
- Expected_Death_QX2015VBT_by_Amount

What I calculated
- For each Insurance_Plan I summed Death_Claim_Amount (Actual) and each expected-by-amount column (Expected), then computed A/E = SUM(Actual) / SUM(Expected).
- I also computed the overall A/E across all plans (SUM over the whole table) for each expected basis.

A/E ratios by plan (rounded to 3 decimals)

- Term
  - QX7580E: 0.312
  - QX2001VBT: 0.476
  - QX2008VBT: 0.720
  - QX2008VBTLU: 0.431
  - QX2015VBT: 0.860

- Perm
  - QX7580E: 0.547
  - QX2001VBT: 0.682
  - QX2008VBT: 0.856
  - QX2008VBTLU: 0.707
  - QX2015VBT: 0.974

- UL
  - QX7580E: 0.582
  - QX2001VBT: 0.714
  - QX2008VBT: 0.878
  - QX2008VBTLU: 0.720
  - QX2015VBT: 1.014

- ULSG
  - QX7580E: 0.518
  - QX2001VBT: 0.623
  - QX2008VBT: 0.73
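The aggregation the model describes (sum actual and expected amounts per plan, then divide) can be checked locally. The following is a minimal sketch on made-up numbers, assuming a DataFrame with the column names quoted in the output; the values and the names `toy`, `sums`, and `ae_by_plan` are hypothetical.

```python
import pandas as pd

# Toy rows; the values are made up for illustration, not taken from ILEC_DATA.
toy = pd.DataFrame({
    "Insurance_Plan": ["Term", "Term", "Perm", "Perm"],
    "Death_Claim_Amount": [100.0, 50.0, 80.0, 120.0],
    "Expected_Death_QX2008VBT_by_Amount": [150.0, 50.0, 100.0, 150.0],
    "Expected_Death_QX2015VBT_by_Amount": [120.0, 30.0, 90.0, 110.0],
})

# Same selection rule as the prompt: columns ending in "_by_Amount".
expected_cols = [c for c in toy.columns if c.endswith("_by_Amount")]

# Sum first, then divide: A/E is a ratio of totals, not a mean of row-level ratios.
sums = toy.groupby("Insurance_Plan")[["Death_Claim_Amount"] + expected_cols].sum()

# rdiv computes actual / expected for every expected column, aligned on plan.
ae_by_plan = sums[expected_cols].rdiv(sums["Death_Claim_Amount"], axis=0)

print(ae_by_plan.round(3))
```

Note that a suffix filter on `_by_Amount` would not pick up `ExpDeathQx2015VBTwMI_byAmt`, which is named differently.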

In [None]:
print(prompt_ilec_data("Provide a full CSV export of the A/E by basis and plan. Produce weighted goodness of fit measures", max_turns=50))

Running request...
Below are two CSV exports and the definitions of the weighted goodness-of-fit measures used.

1) CSV — A/E by Basis and Plan
Header: Insurance_Plan,Basis,Actual,Expected,AE

Insurance_Plan,Basis,Actual,Expected,AE
UL,QX7580E,57058510287.0,98056562359.99681,0.5818938469158247
VL,QX7580E,13178949122.0,27758762694.487873,0.4747671669320109
Perm,QX7580E,84799296392.0,154917254663.3217,0.5473844509850918
Term,QX7580E,78797530934.0,252205821017.7701,0.3124334347875659
VLSG,QX7580E,5040814763.0,11696038074.404696,0.43098481134660355
Other,QX7580E,301733854.0,748125981.7596339,0.4033195763236363
ULSG,QX7580E,34991664706.0,67566221632.30181,0.5178869538750605
Term,QX2001VBT,78797530934.0,165435567335.8289,0.4763034467312796
VLSG,QX2001VBT,5040814763.0,8540010964.813704,0.5902585820754813
Other,QX2001VBT,301733854.0,539915571.7927892,0.5588537722631206
ULSG,QX2001VBT,34991664706.0,56141747215.2847,0.6232735253468109
Perm,QX2001VBT,84799296392.0,124377476247.60884,0.68178981396

In [15]:
df = pd.read_csv("actuarial_intern_results.csv")
df = df.loc[:, ~df.columns.str.startswith("Exp_")]
df = df.sort_values(["Insurance_Plan", "AE"])

amt_cols = ["Actual", "Expected"]
pct_cols = ["AE"]

# If your numeric columns are PROPORTIONS (e.g., 0.123 -> 12.3%):
df_disp = df.copy()
df_disp[pct_cols] = (df_disp[pct_cols] * 100).round(3)

fmt = {c: "{:.3f}%".format for c in pct_cols}
for amt_col in amt_cols:
    fmt[amt_col] = lambda x: f"${(x/1_000_000):,.3f}M"

# Notebook (HTML) view; this renders only when it is the last expression in the cell:
df_disp.style.format(fmt)

# Plain text (e.g., console/logs):
print(df_disp.to_string(formatters=fmt))


   Insurance_Plan        Basis       Actual      Expected       AE
5           Other      QX7580E    $301.734M     $748.126M  40.332%
26          Other  QX2008VBTLU    $301.734M     $610.143M  49.453%
9           Other    QX2001VBT    $301.734M     $539.916M  55.885%
18          Other    QX2008VBT    $301.734M     $407.893M  73.974%
31          Other    QX2015VBT    $301.734M     $347.421M  86.850%
2            Perm      QX7580E $84,799.296M $154,917.255M  54.738%
11           Perm    QX2001VBT $84,799.296M $124,377.476M  68.179%
25           Perm  QX2008VBTLU $84,799.296M $119,969.833M  70.684%
19           Perm    QX2008VBT $84,799.296M  $99,070.552M  85.595%
32           Perm    QX2015VBT $84,799.296M  $87,027.914M  97.439%
3            Term      QX7580E $78,797.531M $252,205.821M  31.243%
23           Term  QX2008VBTLU $78,797.531M $183,024.053M  43.053%
7            Term    QX2001VBT $78,797.531M $165,435.567M  47.630%
14           Term    QX2008VBT $78,797.531M $109,476.478M  71.

In [16]:
print(prompt_ilec_data("Which expected basis do you think is the best and why?", max_turns=50))

Running request...
Short answer — it depends on your objective:

- If your goal is to get the aggregate expected amount (SUM(Expected)) closest to observed deaths (i.e., unbiased total A/E), pick Expected_Death_QX2015VBT_by_Amount. It has the Overall A/E closest to 1 (0.931).
- If your goal is to minimize plan-level deviations (absolute and squared residuals) and get the most consistent fit across plans, pick Expected_Death_QX7580E_by_Amount. QX7580E has the lowest WAPE (≈1.126), lowest Weighted-RMSE (≈3.723) and the smallest Pearson Chi2 — meaning it minimizes weighted absolute and squared errors across plan types even though it underpredicts in aggregate (Overall A/E ≈ 0.447).

Why:
- QX2015VBT: Overall_AE = 0.931 (best for total-dollar unbiasedness). But it has the largest Weighted-RMSE and Pearson Chi2 (and largest WAPE) — large squared/absolute deviations exist across plans even though totals are close.
- QX7580E: Overall_AE = 0.447 (substantial aggregate underprediction), but it 

In [17]:
print(prompt_ilec_data("Provide suggestions for issue year splits that might need to use different expected bases.", max_turns=500))

Running request...
Short answer / recommendation
- Use Expected_Death_QX2015VBT_by_Amount as the default expected basis for most issue years (it is the best single basis for 96 of the issue years in the data).
- For a small set of older / isolated issue years, use different bases because they produce an A/E closer to 1. Those exact years and the preferred basis are listed below.
- Implementation options:
  - Default = QX2015VBT, and apply per-issue-year overrides for the listed years (recommended — simple and preserves the good overall fit).
  - Or if you prefer fewer buckets, treat Issue_Year <= 1930 as a “legacy” bucket that receives special handling (older bases or per-year overrides), use QX2015VBT for most years >= 1931, and apply a few year-level overrides (1984–85, 1992, 1930).

Exact issue years where a non-QX2015 basis is recommended (AE closest to 1)
- Issue_Year 1910 → QX7580E
- Issue_Year 1911 → QX2001VBT
- Issue_Year 1912 → QX7580E
- Issue_Year 1913 → QX7580E
- Issue_Year 
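The per-year rule the model applies here (choose the basis whose A/E is closest to 1, then list the years that differ from the default) could be reproduced as below. This is a sketch on hypothetical values; the long-format table `ae_long` and its numbers are assumptions, not results from the data.

```python
import pandas as pd

# Hypothetical long-format A/E table; values are illustrative only.
ae_long = pd.DataFrame({
    "Issue_Year": [1910, 1910, 1911, 1911, 2000, 2000],
    "Basis": ["QX7580E", "QX2015VBT"] * 3,
    "AE": [0.95, 1.40, 0.70, 0.90, 0.45, 0.98],
})

DEFAULT_BASIS = "QX2015VBT"

# Distance of each A/E from a perfect fit of 1.0.
ae_long["abs_dev"] = (ae_long["AE"] - 1.0).abs()

# Row index of the closest-to-1 basis within each issue year.
best_idx = ae_long.groupby("Issue_Year")["abs_dev"].idxmin()
best_basis = ae_long.loc[best_idx].set_index("Issue_Year")["Basis"]

# Years where the winner differs from the default basis.
overrides = best_basis[best_basis != DEFAULT_BASIS]
print(overrides)
```

Picking the winner year by year takes no account of how much exposure sits behind each A/E, so small years can flip on noise.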

In [18]:
print(prompt_ilec_data("Use judgement to set the issue year bins, but start with a list of all the issue years so you know you've included them all. Test and refine using the raw numbers.", max_turns=500))

Running request...
Step 1 — issue years present (so we know we included them all)
- Issue years found in ILEC_DATA:
  1906, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018

(Notes: a few very early years have zero actual claims in the table — e.g., 1906/1908/1909 — they are included above because they exist in ILEC_DATA; 1907 does not appear.)

Step 2 — proposed issue-year bins (j

In [19]:
print(prompt_ilec_data("Is there anything else I should consider in assigning the best expected basis by issue year to this data?", max_turns=500))

Running request...
Yes — you’ve done the right core work (per-year A/E and a sensible set of bins). A few additional technical, statistical and practical considerations will improve the robustness and implementability of the final issue-year → basis decision. Below I list what to think about, why it matters, and concrete actions/tests you can run next.

1) Use exposure and credibility, not just dollar A/E
- Why: tiny years (low expected $ or few claims) can have very noisy A/E; choosing a basis based solely on a single-year AE risks overfitting.
- Action: compute expected-dollar and claim-count exposures by year. Require a minimum exposure (e.g., Expected >= $X or Claims >= N) to accept a per-year override; otherwise pool with neighbors. Compute credibility weights (e.g., z = Expected / (Expected + k)) and shrink year-level AE toward the default basis using credibility.

2) Look at count-based A/E and by-number vs by-amount
- Why: bases were compared by amount only. If claim frequency 
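The credibility formula quoted in item 1, z = Expected / (Expected + k), can be sketched as follows. The constant `K`, the toy amounts, and the choice to shrink toward the pooled A/E are all assumptions made for illustration; the model's output does not specify them.

```python
import pandas as pd

K = 1_000_000.0  # hypothetical credibility constant, in expected dollars

# Made-up actual and expected amounts by issue year.
years = pd.DataFrame({
    "Issue_Year": [1910, 1985, 2005],
    "Actual": [30_000.0, 4_000_000.0, 90_000_000.0],
    "Expected": [10_000.0, 5_000_000.0, 100_000_000.0],
})

# Pooled A/E across all years, used here as the shrinkage target (an assumption).
pooled_ae = years["Actual"].sum() / years["Expected"].sum()

years["AE"] = years["Actual"] / years["Expected"]
# Credibility weight: near 0 for tiny years, near 1 for large ones.
years["z"] = years["Expected"] / (years["Expected"] + K)
# Credibility-weighted A/E: blend of the year's own A/E and the pooled A/E.
years["AE_cred"] = years["z"] * years["AE"] + (1 - years["z"]) * pooled_ae

print(years[["Issue_Year", "AE", "z", "AE_cred"]].round(3))
```

In this toy data the 1910 row has a raw A/E of 3.0 but almost no weight, so its credibility-weighted A/E ends up close to the pooled value.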

## Additional Notes

The later parts of the analysis change substantially upon re-run, even though the temperature is set to zero.

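The model ignored the VBT w/ MI expected basis (ExpDeathQx2015VBTwMI_byAmt). The prompt was misleading on this point: it said the by-amount bases end with _by_Amount, which this column does not.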

The statistical analysis is suspect: it does not call out the relationship to the Poisson distribution, nor that the log of the A/E should be used if normality is assumed.
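As a rough illustration of the Poisson and log-A/E point, the sketch below builds an approximate interval for a count-based A/E. It assumes the death count is Poisson, so that the standard error of log(A/E) is about 1/sqrt(deaths) by the delta method. The function `ae_interval` and the inputs are hypothetical. On an amount basis the variance also depends on claim sizes, so the true interval would be wider than this.

```python
import math

def ae_interval(actual_deaths, expected_deaths, z=1.96):
    """Approximate interval for A/E on the log scale, assuming Poisson deaths."""
    ae = actual_deaths / expected_deaths
    # Delta method: Var(log D) is roughly 1 / E[D] for a Poisson count D.
    se_log = 1.0 / math.sqrt(actual_deaths)
    lower = ae * math.exp(-z * se_log)
    upper = ae * math.exp(z * se_log)
    return ae, lower, upper

# Hypothetical cell with 80 observed and 100 expected deaths.
ae, lo, hi = ae_interval(actual_deaths=80, expected_deaths=100.0)
print(f"A/E = {ae:.3f}, approx. 95% interval = ({lo:.3f}, {hi:.3f})")
```

The interval is symmetric on the log scale rather than around the A/E itself, which is why the log is the natural scale when assuming normality.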



