In [1]:
from graphnet.models.data_representation.graphs import ClusterSummaryFeatures



In [2]:
FEATURES = ["dom_x", "dom_y", "dom_z", "dom_time", "charge"]

In [3]:
print(ClusterSummaryFeatures.__doc__)

Represent pulse maps as clusters with summary features.

    If `cluster_on` is set to the xyz coordinates of optical modules
    e.g. `cluster_on = ['dom_x', 'dom_y', 'dom_z']`, each node will be
    a unique optical module and the pulse information (e.g. charge, time)
    is summarized.
    NOTE: Developed to be used with features
        [dom_x, dom_y, dom_z, charge, time]

    Possible features per cluster:
    - total charge
        feature name: `total_charge`
    - charge accumulated after <X> time units
        feature name: `charge_after_<X>ns`
    - time of first hit in the optical module
        feature name: `time_of_first_hit`
    - time spread per optical module
        feature name: `time_spread`
    - time std per optical module
        feature name: `time_std`
    - time took to collect <X> percent of total charge per cluster
        feature name: `time_after_charge_pct<X>`
    - number of pulses per clusters
        feature name: `counts`

    For more details on some

In [4]:
# help(ClusterSummaryFeatures)

In [5]:
node_definition = ClusterSummaryFeatures(
    cluster_on=["dom_x", "dom_y", "dom_z"],
    input_feature_names=FEATURES,
    charge_label="charge",
    time_label="dom_time",
    total_charge=True,
    charge_after_t=[10, 50, 100],
    time_of_first_hit=True,
    time_spread=True,
    time_std=True,
    time_after_charge_pct=[1, 3, 5, 11, 15, 20, 50, 80],
    charge_standardization="log",
    time_standardization=0.001,
    order_in_time=True,
    add_counts=True,
)

# each node will be a unique optical module and the pulse information (e.g. charge, time) is summarized.

graphnet [MainProcess] INFO 2026-01-05 11:43:22 - ClusterSummaryFeatures.__init__ - Writing log to logs/graphnet_20260105-114322.log


## ClusterSummaryFeatures (DOM-level clusters)

This node definition **clusters pulses into optical modules (DOMs)** using:

- `cluster_on = ["dom_x", "dom_y", "dom_z"]`

So **each cluster corresponds to one unique DOM position** `(dom_x, dom_y, dom_z)`.  
All pulses that share the same `(dom_x, dom_y, dom_z)` are aggregated, and the pulse-level information
(e.g., `charge`, `dom_time`) is summarized into **cluster-level features**.
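
To make the grouping step concrete, here is a minimal plain-Python sketch of clustering pulses by DOM position. It is not GraphNeT's implementation: the `pulses` values and the helper `group_by_dom` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical pulse map: one tuple per pulse, in FEATURES order
# (dom_x, dom_y, dom_z, dom_time, charge). The values are made up.
pulses = [
    (0.0, 0.0, -100.0, 12.0, 0.5),
    (0.0, 0.0, -100.0, 10.0, 1.5),
    (50.0, 0.0, -100.0, 30.0, 2.0),
    (0.0, 0.0, -100.0, 45.0, 1.0),
]


def group_by_dom(rows):
    """Group pulses by (dom_x, dom_y, dom_z).

    Returns {position: [(time, charge), ...]} with each list sorted by time.
    """
    clusters = defaultdict(list)
    for x, y, z, t, q in rows:
        clusters[(x, y, z)].append((t, q))
    # Sort each cluster's pulses by time, in the spirit of order_in_time=True.
    return {pos: sorted(hits) for pos, hits in clusters.items()}


clusters = group_by_dom(pulses)
for position, hits in clusters.items():
    print(position, hits)
```

Each key of `clusters` plays the role of one node, and the list of `(time, charge)` pairs is what the summary features below are computed from.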

### Output features per cluster (and what they mean)

> Assumes the input includes at least: `dom_x, dom_y, dom_z, charge, dom_time`.

#### Charge summary features

- **`total_charge`**  
  Sum of `charge` over all pulses in the DOM cluster.  
  Interpretable as the total detected light (charge) in that DOM for the event.

- **`charge_after_10ns`**  
  Accumulated charge collected up to 10 ns (after ordering pulses in time).  
  Captures how much charge arrives *early* in the DOM.

- **`charge_after_50ns`**  
  Accumulated charge collected up to 50 ns (after ordering pulses in time).

- **`charge_after_100ns`**  
  Accumulated charge collected up to 100 ns (after ordering pulses in time).
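
The charge features can be sketched for a single DOM as follows. This is an illustration under a stated assumption, not the library's code: the time window is measured here from the first pulse in the cluster, which the docstring does not spell out. The helper `charge_summary` and the `hits` values are hypothetical.

```python
def charge_summary(hits, windows=(10, 50, 100)):
    """Compute charge features for one DOM.

    hits: list of (time, charge) pairs for a single cluster.
    ASSUMPTION: each window is measured from the earliest pulse in the cluster.
    """
    t0 = min(t for t, _ in hits)
    features = {"total_charge": sum(q for _, q in hits)}
    for w in windows:
        # Sum the charge of all pulses arriving within w time units of t0.
        features[f"charge_after_{w}ns"] = sum(q for t, q in hits if t - t0 <= w)
    return features


# Made-up pulses for one DOM: (time, charge)
hits = [(10.0, 1.5), (12.0, 0.5), (45.0, 1.0), (130.0, 2.0)]
summary = charge_summary(hits)
print(summary)
```

With these made-up pulses, the last pulse arrives 120 time units after the first, so it contributes to `total_charge` but to none of the three windows.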

#### Time summary features

- **`time_of_first_hit`**  
  Time of the earliest pulse in the DOM cluster (`min(dom_time)`).  
  Indicates when the DOM first records light.

- **`time_spread`**  
  Temporal extent of activity in the DOM: `max(dom_time) - min(dom_time)`.  
  Measures how long the DOM keeps receiving pulses.

- **`time_std`**  
  Standard deviation of pulse times in the DOM cluster.  
  Captures how tightly clustered or dispersed the pulse times are.
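
A short sketch of the three time features for one DOM, using only the standard library. The helper `time_summary` is hypothetical, and the choice of the population standard deviation is an assumption; GraphNeT may use the sample standard deviation instead.

```python
import statistics


def time_summary(times):
    """Compute time features from the pulse times of a single cluster."""
    first = min(times)
    return {
        "time_of_first_hit": first,
        "time_spread": max(times) - first,
        # ASSUMPTION: population standard deviation (divides by N, not N - 1).
        "time_std": statistics.pstdev(times),
    }


# Made-up pulse times for one DOM
times = [10.0, 20.0, 30.0, 40.0]
time_features = time_summary(times)
print(time_features)
```

A DOM with a single pulse gets `time_spread = 0` and `time_std = 0` under this definition.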

#### Time-to-charge-percentile features

These quantify **how quickly charge accumulates** in time.  
Pulses are ordered by time, charge is accumulated, and the feature is the time at which
the accumulated charge reaches a given fraction of `total_charge`.

- **`time_after_charge_pct1`**  — time to reach **1%** of total charge  
- **`time_after_charge_pct3`**  — time to reach **3%** of total charge  
- **`time_after_charge_pct5`**  — time to reach **5%** of total charge  
- **`time_after_charge_pct11`** — time to reach **11%** of total charge  
- **`time_after_charge_pct15`** — time to reach **15%** of total charge  
- **`time_after_charge_pct20`** — time to reach **20%** of total charge  
- **`time_after_charge_pct50`** — time to reach **50%** of total charge (median charge time)  
- **`time_after_charge_pct80`** — time to reach **80%** of total charge
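
The accumulation procedure described above can be sketched like this. It is an illustration only: the helper `time_after_charge_pct` is hypothetical, and it returns the absolute time of the pulse that crosses the threshold, whereas the library may report the time relative to the first hit or interpolate between pulses.

```python
def time_after_charge_pct(hits, percentiles=(1, 3, 5, 11, 15, 20, 50, 80)):
    """Time at which accumulated charge first reaches each percentage of the total.

    hits: list of (time, charge) pairs for a single cluster.
    ASSUMPTION: the reported time is the absolute time of the crossing pulse.
    """
    hits = sorted(hits)  # order pulses in time
    total = sum(q for _, q in hits)
    features = {}
    for pct in percentiles:
        threshold = total * pct / 100.0
        cumulative = 0.0
        for t, q in hits:
            cumulative += q
            if cumulative >= threshold:
                features[f"time_after_charge_pct{pct}"] = t
                break
    return features


# Made-up pulses for one DOM: (time, charge); total charge is 10
hits = [(10.0, 1.0), (20.0, 1.0), (30.0, 2.0), (40.0, 6.0)]
pct_features = time_after_charge_pct(hits)
print(pct_features)
```

In this example most of the charge sits in the last pulse, so both the 50% and 80% thresholds are only reached at the final pulse time.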

#### Count feature

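- **`counts`**  
  A measure of the number of pulses in the DOM cluster.  
  In GraphNeT this is typically **`log10(number_of_pulses_in_cluster)`**; the docstring excerpt printed above lists only the feature name, so confirm the transform with `help(ClusterSummaryFeatures)`.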

---

### Notes on standardization (affects values, not names)

- `charge_standardization = "log"`  
  Charge-derived features (e.g., `total_charge`, `charge_after_*`) are transformed to **log10 scale**.

- `time_standardization = 0.001`  
  Time-derived features (e.g., `time_of_first_hit`, `time_spread`, `time_std`, `time_after_charge_pct*`)
  are typically multiplied by `0.001`, a unit scaling factor (for example, nanoseconds become microseconds if `dom_time` is in ns).

- `order_in_time = True`  
  Ensures pulses are time-ordered before computing time-dependent and accumulation features.
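
As a rough illustration of how these settings change the values, the sketch below applies a log10 transform to charge features and a multiplicative scale to time features. The helper `standardize` is hypothetical, and it assumes the transforms act on the already summarized values; GraphNeT may apply them at a different stage.

```python
import math


def standardize(features, time_scale=0.001):
    """Apply log10 to charge features and a scale factor to time features.

    ASSUMPTION: transforms are applied to summarized values, after aggregation.
    Charge values must be strictly positive for log10 to be defined.
    """
    out = {}
    for name, value in features.items():
        if name == "total_charge" or name.startswith("charge_after_"):
            out[name] = math.log10(value)
        elif name.startswith("time_"):
            out[name] = value * time_scale
        else:
            out[name] = value  # e.g. counts is left unchanged here
    return out


# Made-up, unstandardized features for one DOM
raw = {"total_charge": 100.0, "time_of_first_hit": 9500.0, "counts": 3}
scaled = standardize(raw)
print(scaled)
```

The feature names are unchanged by standardization; only the values differ, which is worth keeping in mind when comparing against raw pulse data.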
