# Customizing the Summarized Row with `set_custom_summarized_row_fn`

The **summarized row** is a compact summary section inserted into the instruction prompt just before the task questions. By default, it includes:
- The most recent **genetic** information
- The most recent **Line of Therapy (LoT)** start
- The **last known value** for each target variable being forecasted
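The pandas sketch below shows one way to derive each of the three default components from a toy event table. It is not TwinWeaver's implementation. The column names (`date`, `event_category`, `event_name`, `event_value`) and the category labels `genetic`, `lot` and `lab` are assumptions made for illustration.

```python
import pandas as pd

# Toy event table; column names and category labels are assumed for illustration
events = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-05", "2020-02-01", "2020-02-10", "2020-03-01", "2020-03-15", "2020-04-01"]
        ),
        "event_category": ["genetic", "lot", "lab", "lab", "lot", "lab"],
        "event_name": ["egfr", "lot_start", "hemoglobin", "hemoglobin", "lot_start", "creatinine"],
        "event_value": ["positive", "carboplatin", "12.1", "11.4", "pembrolizumab", "0.9"],
    }
).sort_values("date")

# Most recent genetic result and most recent line-of-therapy start (last row after sorting by date)
latest_genetic = events[events["event_category"] == "genetic"].iloc[-1]
latest_lot = events[events["event_category"] == "lot"].iloc[-1]

# Last known value per forecasted variable: GroupBy.last() keeps the last non-null value per group
targets = ["hemoglobin", "creatinine"]
last_values = events[events["event_name"].isin(targets)].groupby("event_name")["event_value"].last()

print(latest_genetic["event_name"], latest_genetic["event_value"])  # egfr positive
print(latest_lot["date"].date(), latest_lot["event_value"])  # 2020-03-15 pembrolizumab
print(last_values.to_dict())  # {'creatinine': '0.9', 'hemoglobin': '11.4'}
```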

In many use cases, you may want to customize this section — for example, to:
- Add domain-specific summaries (e.g., latest vital signs, risk scores)
- Simplify or remove sections that aren't relevant to your dataset
- Change the formatting or language of the summary

TwinWeaver's `ConverterInstruction` class exposes the `set_custom_summarized_row_fn()` method to let you plug in your own logic.
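This notebook does not show how TwinWeaver stores and calls the registered function internally. A common way to let a plain function see an instance as `self` is to bind it with `types.MethodType`. The toy class below uses that approach to illustrate why the custom function takes `self` first. It is hypothetical and only mirrors the method name; it is not TwinWeaver's code.

```python
import types


class ToyConverter:
    """Hypothetical stand-in for a converter with a pluggable summary hook."""

    def __init__(self, label):
        self.label = label
        self._summary_fn = None  # None means "use the default summary"

    def _default_summary(self, events, meta):
        return f"[{self.label}] default summary of {len(events)} events"

    def set_custom_summarized_row_fn(self, fn):
        # Bind fn to this instance so calling it passes the instance as `self`
        self._summary_fn = types.MethodType(fn, self)

    def summarize(self, events, meta):
        if self._summary_fn is not None:
            return self._summary_fn(events, meta)
        return self._default_summary(events, meta)


def my_summary(self, events, meta):
    # `self` is the ToyConverter instance, so its attributes are available here
    return f"[{self.label}] custom summary, targets={sorted(meta)}"


conv = ToyConverter("demo")
print(conv.summarize([1, 2, 3], {"hb": []}))  # [demo] default summary of 3 events
conv.set_custom_summarized_row_fn(my_summary)
print(conv.summarize([1, 2, 3], {"hb": []}))  # [demo] custom summary, targets=['hb']
```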

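This notebook demonstrates:
1. Generating a prompt with the **default** summarized row
2. Writing and applying **custom** summarized row functions (a minimal one and a more advanced one)
3. Comparing the resulting prompts
4. How signature and runtime errors in a custom function surface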

## Setup

We follow the same setup as the main data preparation notebook.

In [None]:
import pandas as pd

from twinweaver import (
    DataManager,
    Config,
    DataSplitterForecasting,
    DataSplitterEvents,
    ConverterInstruction,
    DataSplitter,
)

### Load Data

In [None]:
# Load data - generated example data
df_events = pd.read_csv("../../example_data/events.csv")
df_constant = pd.read_csv("../../example_data/constant.csv")
df_constant_description = pd.read_csv("../../example_data/constant_description.csv")

### Configuration

In [None]:
config = Config()

# Required settings
config.split_event_category = "lot"
config.event_category_forecast = ["lab"]
config.data_splitter_events_variables_category_mapping = {
    "death": "death",
    "progression": "next progression",
}
config.constant_columns_to_use = ["birthyear", "gender", "histology", "smoking_history"]
config.constant_birthdate_column = "birthyear"

### Data Manager & Splitters

In [None]:
dm = DataManager(config=config)
dm.load_indication_data(df_events=df_events, df_constant=df_constant, df_constant_description=df_constant_description)
dm.process_indication_data()
dm.setup_unique_mapping_of_events()
dm.setup_dataset_splits()
dm.infer_var_types()

In [None]:
# Setup splitters
data_splitter_events = DataSplitterEvents(dm, config=config)
data_splitter_events.setup_variables()

data_splitter_forecasting = DataSplitterForecasting(data_manager=dm, config=config)
data_splitter_forecasting.setup_statistics()

data_splitter = DataSplitter(data_splitter_events, data_splitter_forecasting)

### Generate Splits for a Patient

We pick a patient and generate splits so we can compare the default and custom summarized rows on the same data.

In [None]:
patientid = dm.all_patientids[4]
patient_data = dm.get_patient_data(patientid)

forecasting_splits, events_splits, reference_dates = data_splitter.get_splits_from_patient_with_target(
    patient_data,
)
print(f"Patient: {patientid}")
print(f"Number of forecasting split groups: {len(forecasting_splits)}")
print(f"Number of event split groups: {len(events_splits)}")

---

## Step 1: Default Summarized Row

First, let's create a converter with the **default** summarized row function and generate an instruction to see what it looks like.

In [None]:
converter_default = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

p_default = converter_default.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
    override_mode_to_select_forecasting="both",
)

print("=" * 80)
print("DEFAULT INSTRUCTION (last 2500 chars to see the summarized row + tasks):")
print("=" * 80)
print(p_default["instruction"][-2500:])

---

## Step 2: Define a Custom Summarized Row Function

Now let's write a custom function. The function **must** follow this signature:

```python
def my_custom_fn(self, events_df: pd.DataFrame, combined_meta: dict) -> str:
    ...
```

**Parameters:**
- `self` — the `ConverterInstruction` instance (gives you access to `self.config`, `self.dm`, etc.)
- `events_df` — a DataFrame of the patient's events up to the split date
- `combined_meta` — a dict with keys:
  - `"dates_per_variable"`: maps target variable names → list of future dates being forecasted
  - `"variable_name_mapping"`: maps variable names → descriptive names

**Returns:** a string that will be inserted into the instruction prompt between the event history and the task questions.
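A custom function is plain Python, so you can test it before registering it by passing a stand-in for `self`. The sketch below uses `types.SimpleNamespace` to fake only the config attributes the function reads (`date_col`, `event_name_col`, `event_value_col`). The column names, the toy data and the function `count_targets` are hypothetical. The real `ConverterInstruction` instance carries far more state.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-in for the ConverterInstruction instance: only the config attributes used below
fake_self = SimpleNamespace(
    config=SimpleNamespace(date_col="date", event_name_col="event_name", event_value_col="event_value")
)

events_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-01-15"]),
        "event_name": ["hb", "hb", "crea"],
        "event_value": [12.0, 11.5, 0.8],
    }
)
combined_meta = {
    "dates_per_variable": {"hb": ["2021-03-01"], "crea": ["2021-03-01"]},
    "variable_name_mapping": {"hb": "Hemoglobin", "crea": "Creatinine"},
}


def count_targets(self, events_df, combined_meta):
    """Hypothetical summary function following the (self, events_df, combined_meta) -> str contract."""
    targets = combined_meta.get("dates_per_variable", {})
    latest = events_df[self.config.date_col].max().date()
    return f"\n{len(targets)} target variables; latest event on {latest}\n"


out = count_targets(fake_self, events_df, combined_meta)
assert isinstance(out, str)  # the converter inserts this string into the prompt
print(out)
```

The same stub also works with `custom_summarized_row_minimal` once it is defined. `custom_summarized_row_advanced` additionally reads `event_category_col` and `event_descriptive_name_col`, so add those attributes to the stub before testing it.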

### Example 1: Minimal custom summary

This example creates a simple summary that lists only the last known lab values, skipping genetic and LoT information entirely.

In [None]:
def custom_summarized_row_minimal(self, events_df, combined_meta):
    """
    A minimal custom summarized row that only shows the last known value
    for each target variable being forecasted.
    """
    ret = "\nSummary of latest known values:\n"

    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})

    if not dates_per_variable:
        ret += "\tNo target variables to summarize.\n"
        return ret

    # Sort events by date to get the most recent values
    sorted_events = events_df.sort_values(self.config.date_col)

    for var_name in sorted(dates_per_variable.keys()):
        descriptive_name = variable_name_mapping.get(var_name, var_name)
        var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]
        if not var_data.empty:
            last_val = var_data[self.config.event_value_col].iloc[-1]
            ret += f"\t- {descriptive_name}: {last_val}\n"
        else:
            ret += f"\t- {descriptive_name}: not available\n"

    return ret

### Apply the Custom Function

Use `set_custom_summarized_row_fn()` to register our custom function on the converter.

In [None]:
converter_custom = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

# Register the custom summarized row function
converter_custom.set_custom_summarized_row_fn(custom_summarized_row_minimal)

p_custom = converter_custom.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
    override_mode_to_select_forecasting="both",
)

print("=" * 80)
print("CUSTOM INSTRUCTION (last 1500 chars):")
print("=" * 80)
print(p_custom["instruction"][-1500:])

---

## Step 3: A More Advanced Custom Function

This example builds a richer summary that includes:
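- Event counts per category in the patient history
- The most recent treatment (the latest event in the `"drug"` category; adjust this label to match your data)
- The last known values for target variables, with a trend arrow comparing the last two observations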
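The trend arrow compares only the last two observations. The sketch below pulls that rule out into a standalone function so it can be tested in isolation, including the fallback for non-numeric values such as `'<0.5'`. The function `trend_symbol` is our own refactoring, not part of TwinWeaver. It adds one extra guard: NaN compares `False` both ways, so the inline version below would report `→` for a NaN value.

```python
import math


def trend_symbol(prev, last):
    """Return '↑', '↓' or '→' comparing last against prev, or None if either is not a usable number."""
    try:
        prev_f, last_f = float(prev), float(last)
    except (ValueError, TypeError):
        return None  # e.g. censored values such as '<0.5', or None
    if math.isnan(prev_f) or math.isnan(last_f):
        return None  # NaN would otherwise fall through to '→'
    if last_f > prev_f:
        return "↑"
    if last_f < prev_f:
        return "↓"
    return "→"


print(trend_symbol("11.4", 12.0))  # ↑
print(trend_symbol(12.0, "11.4"))  # ↓
print(trend_symbol(5, 5.0))  # →
print(trend_symbol("<0.5", "0.7"))  # None
```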

In [None]:
def custom_summarized_row_advanced(self, events_df, combined_meta):
    """
    A more advanced custom summarized row that includes event counts,
    the most recent drug, and last known target values with trend indicators.
    """
    ret = "\n--- Patient Summary ---\n"

    # 1. Event count per category
    category_counts = events_df[self.config.event_category_col].value_counts()
    ret += "Event counts: "
    ret += ", ".join([f"{cat}={count}" for cat, count in category_counts.items()])
    ret += "\n"

    # 2. Most recent drug/treatment
    drug_events = events_df[events_df[self.config.event_category_col] == "drug"]
    if not drug_events.empty:
        last_drug = drug_events.sort_values(self.config.date_col).iloc[-1]
        ret += f"Most recent treatment: {last_drug[self.config.event_descriptive_name_col]}\n"
    else:
        ret += "Most recent treatment: none recorded\n"

    # 3. Target variable last values with simple trend
    dates_per_variable = combined_meta.get("dates_per_variable", {})
    variable_name_mapping = combined_meta.get("variable_name_mapping", {})

    if dates_per_variable:
        ret += "Latest lab values:\n"
        sorted_events = events_df.sort_values(self.config.date_col)

        for var_name in sorted(dates_per_variable.keys()):
            descriptive_name = variable_name_mapping.get(var_name, var_name)
            var_data = sorted_events[sorted_events[self.config.event_name_col] == var_name]

            if len(var_data) >= 2:
                try:
                    prev_val = float(var_data[self.config.event_value_col].iloc[-2])
                    last_val = float(var_data[self.config.event_value_col].iloc[-1])
                    trend = "↑" if last_val > prev_val else ("↓" if last_val < prev_val else "→")
                    ret += f"\t{descriptive_name}: {last_val} ({trend})\n"
                except (ValueError, TypeError):
                    last_val = var_data[self.config.event_value_col].iloc[-1]
                    ret += f"\t{descriptive_name}: {last_val}\n"
            elif len(var_data) == 1:
                last_val = var_data[self.config.event_value_col].iloc[-1]
                ret += f"\t{descriptive_name}: {last_val}\n"
            else:
                ret += f"\t{descriptive_name}: not available\n"

    ret += "--- End Summary ---\n"
    return ret

In [None]:
# Apply the advanced custom function
converter_advanced = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)

converter_advanced.set_custom_summarized_row_fn(custom_summarized_row_advanced)

p_advanced = converter_advanced.forward_conversion(
    forecasting_splits=forecasting_splits[0],
    event_splits=events_splits[0],
    override_mode_to_select_forecasting="both",
)

print("=" * 80)
print("ADVANCED CUSTOM INSTRUCTION (last 2000 chars):")
print("=" * 80)
print(p_advanced["instruction"][-2000:])

---

## Step 4: Error Handling

TwinWeaver validates the function signature when you call `set_custom_summarized_row_fn()`. The function **must** have `self` as the first parameter, followed by at least two more parameters (`events_df` and `combined_meta`).

Here's what happens if you pass an invalid function:

In [None]:
# This will raise a TypeError because the signature is wrong (missing 'self')
def bad_fn(events_df, combined_meta):
    return "This won't work"


try:
    converter_default.set_custom_summarized_row_fn(bad_fn)
except TypeError as e:
    print(f"Caught expected error: {e}")

If the function has the correct signature but raises an error at runtime (e.g., accesses a column that doesn't exist), the error will be raised during `forward_conversion()` or `forward_conversion_inference()`.

In [None]:
# This has the correct signature but will fail at runtime
def broken_fn(self, events_df, combined_meta):
    return events_df["nonexistent_column"]  # KeyError!


converter_broken = ConverterInstruction(
    nr_tokens_budget_total=8192,
    config=config,
    dm=dm,
    variable_stats=data_splitter_forecasting.variable_stats,
)
converter_broken.set_custom_summarized_row_fn(broken_fn)

try:
    converter_broken.forward_conversion(
        forecasting_splits=forecasting_splits[0],
        event_splits=events_splits[0],
        override_mode_to_select_forecasting="both",
    )
except KeyError as e:
    # broken_fn indexes a missing column, so pandas raises KeyError (not TypeError)
    print(f"Caught runtime error: {e!r}")

---

## Summary

| Step | What | How |
|------|------|-----|
| 1 | Define a function | `def my_fn(self, events_df, combined_meta) -> str:` |
| 2 | Register it | `converter.set_custom_summarized_row_fn(my_fn)` |
| 3 | Generate prompts | `converter.forward_conversion(...)` uses your custom function |

**Key points:**
- The function signature **must** start with `(self, events_df, combined_meta, ...)`
- `self` gives access to the full `ConverterInstruction` instance (config, data manager, etc.)
- `events_df` contains all patient events up to the split date
- `combined_meta["dates_per_variable"]` tells you which variables are being forecasted
- `combined_meta["variable_name_mapping"]` maps variable names to descriptive names
- The returned string is inserted between the event history and the task questions in the instruction
- The custom function also works with `forward_conversion_inference()` for inference prompts