# Watchlists from Providence


**Purpose**:
1. Demonstrate how to get usable predictions out of Providence models

2. Discuss the paradigms of "integrating Providence into production"

**Raytheon Technologies proprietary**

Export controlled - see license file


In [18]:
# %load production_watchlist_prediction.py
from pathlib import Path
import torch
from providence.dataloaders import ProvidenceDataLoader
from providence.datasets.adapters import BackblazeQuarter
from providence.datasets.backblaze import BackblazeDataset
from providence.datasets.core import ProvidenceDataset, DataSubsetId
from providence.distributions import Weibull
from providence.metrics import MetricsCalculator
from providence.nn.module import ProvidenceModule

In [5]:

def relative_dir(p: str) -> Path:
    return Path('.').parent / p

In [20]:
# just a wrapper around instantiating a ProvidenceDataset, which is easy to satisfy
ds: ProvidenceDataset = BackblazeDataset(
    subset_choice=DataSubsetId.Train,
    quarter=BackblazeQuarter._2019_Q4,
    data_dir=relative_dir("../.data")
)

In [21]:
ds.event_indicator_column, ds.grouping_field, ds.tte_column

('failure', 'serial_number', 'tte')

In [9]:
# some model that we speed-trained
model: ProvidenceModule = torch.load(relative_dir("../.tmp/ProvidenceRNN-epoch003-checkpoint-full.pt"),
    map_location='cpu'
)

### Quick and dirty: getting the RUL

In [23]:

dl = ProvidenceDataLoader(ds, batch_size=1)
# wrapper: Weibull.Params(alpha: Tensor, beta: Tensor)

inferred_weibull_params = [model(inputs, lengths) for (inputs, lengths, _) in dl]

predicted_rul = [Weibull.mode(params) for params in inferred_weibull_params]
t = torch.arange(2, 1, step=-1)  # yields tensor([2]): a single time step, not a countdown 10 -> 1
predicted_probabilities = [Weibull.pdf(params, t) for params in inferred_weibull_params]

print(f"{predicted_probabilities = }")
print(f"{predicted_rul = }")

# This works, but the device ids are lost, which is undesirable. Only suitable for small data.

predicted_probabilities = [tensor([[[0.0017]],

        [[0.0003]],

        [[0.0003]],

        [[0.0002]],

        [[0.0002]],

        ...

## How to actually do it

In [None]:
# a better way, that keeps everything in the foreground: use MetricsCalculator

calculator = MetricsCalculator(model, Weibull, ds)
# outputs from the model for every device, for every timestep in the dataset
outputs_df = calculator.outputs_per_device()

In [None]:

# Compute the Weibull parameters, then the corresponding probability at a given time step.
# This example uses tte=1; in an engagement you might want a spread, say 1, 5, 20, 50...
outputs_df["w"] = Weibull.Params(outputs_df["alpha"], outputs_df["beta"])

In [28]:
probs = Weibull.pdf(Weibull.Params(
    torch.from_numpy(outputs_df["alpha"].to_numpy()),
    torch.from_numpy(outputs_df["beta"].to_numpy())),
    torch.tensor([1.0]) # What's the probability for the next time step?
)

If you want to assess multiple time steps, tile $\alpha$ and $\beta$ against the time steps to get a distribution of likelihoods for each row.
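
Evaluating several time steps at once amounts to pairing every ($\alpha$, $\beta$) row with every time step. The sketch below does this in plain Python so that it runs without `torch` or Providence. It assumes the standard two-parameter Weibull density with scale $\alpha$ and shape $\beta$, which matches the `prob_fail_in_1_step` values shown in this notebook to the printed precision. The helper `weibull_pdf` and the variable names are illustrative and are not part of the Providence API.

```python
import math


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """Density at time t > 0 of a Weibull with scale alpha and shape beta (assumed form)."""
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1.0) * math.exp(-(z ** beta))


# (alpha, beta) pairs copied from rows 80 and 89 of the outputs_df excerpt
params = [(61.041393, 2.635849), (58.946575, 2.633062)]
horizons = [1.0, 5.0, 20.0, 50.0]  # the spread suggested in the cell comment

# "Tiling": one row per parameter pair, one column per horizon
density_grid = [[weibull_pdf(t, a, b) for t in horizons] for (a, b) in params]

for (a, b), row in zip(params, density_grid):
    print(f"alpha={a:.2f} beta={b:.2f}", [f"{d:.2e}" for d in row])
```

With tensors, the same grid would typically come from broadcasting a column of parameters of shape `(N, 1)` against a row of time steps of shape `(1, T)`; whether `Weibull.pdf` accepts those shapes directly is not shown here.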

In [30]:
outputs_df["prob_fail_in_1_step"] = probs.numpy()

In [32]:
outputs_df[80:90]

| | tte | censor | alpha | beta | mean | median | mode | id | prob_fail_in_1_step |
|---|---|---|---|---|---|---|---|---|---|
| 80 | 12.0 | 0.0 | 61.041393 | 2.635849 | 54.240299 | 53.117313 | 50.935974 | Z4D019KB | 5.2e-05 |
| 81 | 11.0 | 0.0 | 60.419025 | 2.634489 | 53.686405 | 52.571964 | 50.405888 | Z4D019KB | 5.3e-05 |
| 82 | 10.0 | 0.0 | 59.960049 | 2.633787 | 53.278126 | 52.170666 | 50.017467 | Z4D019KB | 5.5e-05 |
| 83 | 9.0 | 0.0 | 58.094105 | 2.630473 | 51.618092 | 50.538261 | 48.435658 | Z4D019KB | 6e-05 |
| 84 | 8.0 | 0.0 | 59.951927 | 2.634861 | 53.271591 | 52.166557 | 50.019119 | Z4D019KB | 5.5e-05 |
| 85 | 7.0 | 0.0 | 59.233665 | 2.632864 | 52.632114 | 51.536129 | 49.404358 | Z4D019KB | 5.7e-05 |
| 86 | 6.0 | 0.0 | 59.501381 | 2.633723 | 52.870533 | 51.771408 | 49.634354 | Z4D019KB | 5.6e-05 |
| 87 | 5.0 | 0.0 | 59.132774 | 2.632858 | 52.542465 | 51.448334 | 49.320168 | Z4D019KB | 5.7e-05 |
| 88 | 4.0 | 0.0 | 58.491142 | 2.631736 | 51.971653 | 50.887062 | 48.776398 | Z4D019KB | 5.9e-05 |
| 89 | 3.0 | 0.0 | 58.946575 | 2.633062 | 52.377144 | 51.286888 | 49.166447 | Z4D019KB | 5.7e-05 |
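
The `mean`, `median`, and `mode` columns in this excerpt are consistent with the closed-form summaries of a Weibull distribution with scale `alpha` and shape `beta`. The following sketch recomputes them for row 80 using only the standard library. The function `weibull_summaries` is a hypothetical helper written for this check; how Providence computes these columns internally is not shown in the notebook.

```python
import math


def weibull_summaries(alpha: float, beta: float) -> dict:
    """Closed-form mean, median, and mode of a Weibull(scale=alpha, shape=beta)."""
    mean = alpha * math.gamma(1.0 + 1.0 / beta)
    median = alpha * math.log(2.0) ** (1.0 / beta)
    # The mode sits at 0 when beta <= 1 (density is non-increasing there)
    mode = alpha * ((beta - 1.0) / beta) ** (1.0 / beta) if beta > 1.0 else 0.0
    return {"mean": mean, "median": median, "mode": mode}


# alpha and beta copied from row 80 of the outputs_df excerpt
row_80 = weibull_summaries(alpha=61.041393, beta=2.635849)
for name, value in row_80.items():
    print(f"{name}: {value:.4f}")
```

The recomputed values agree with the table to roughly two decimal places. Since the notebook renames `mode` to `RUL` further down, this also makes explicit that the reported RUL is the most likely failure time under the fitted distribution rather than its expected value.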


In [37]:
outputs_df[outputs_df["id"] == "Z4D019KB"].apply(lambda row: (
    Weibull.pdf( Weibull.Params(row["alpha"], row["beta"]), torch.tensor([90]))
    ), axis="columns"
)

0     [tensor(0.0002)]
1     [tensor(0.0027)]
2     [tensor(0.0031)]
3     [tensor(0.0035)]
4     [tensor(0.0035)]
            ...       
87    [tensor(0.0043)]
88    [tensor(0.0041)]
89    [tensor(0.0042)]
90    [tensor(0.0047)]
91    [tensor(0.0048)]
Length: 92, dtype: object

In [58]:

predictions_for_each_device = ( # yapf: skip
    outputs_df.sort_values(by=["id", "tte"], ascending=[True, False])
    [["id", "tte", "censor", "mode", "prob_fail_in_1_step"]]
    .groupby("id")
    .tail(1)
    .rename({"mode": "RUL"}, axis="columns")
)

In [59]:
predictions_for_each_device

| | id | tte | censor | RUL | prob_fail_in_1_step |
|---|---|---|---|---|---|
| 91 | Z4D019KB | 1.0 | 0.0 | 50.412067 | 0.000053 |
| 91 | Z4D04WV5 | 1.0 | 0.0 | 49.275341 | 0.000057 |
| 91 | Z4D09FH2 | 1.0 | 0.0 | 49.441166 | 0.000057 |
| 91 | Z4D09FQ5 | 1.0 | 0.0 | 49.226860 | 0.000057 |
| 90 | ZA101JHT | 1.0 | 0.0 | 58.986031 | 0.000037 |
| ... | ... | ... | ... | ... | ... |
| 56 | ZJV5KQY4 | 1.0 | 1.0 | 48.539429 | 0.000067 |
| 91 | ZJV5LFEH | 1.0 | 0.0 | 50.081440 | 0.000058 |
| 15 | ZJV5LWDJ | 1.0 | 1.0 | 48.299519 | 0.000074 |
| 35 | ZJV5MDP1 | 1.0 | 1.0 | 46.776276 | 0.000078 |


In [61]:
# parenthesize both comparisons: `&` binds more tightly than `==`
near_failure = (predictions_for_each_device["tte"] < 10) & (predictions_for_each_device["censor"] == 1)
predictions_for_each_device[near_failure]

| | id | tte | censor | RUL | prob_fail_in_1_step |
|---|---|---|---|---|---|
| 51 | ZA108HT7 | 1.0 | 1.0 | 65.356628 | 0.000029 |
| 50 | ZA10JGLG | 1.0 | 1.0 | 61.802681 | 0.000034 |
| 51 | ZA10NFKE | 1.0 | 1.0 | 61.852013 | 0.000034 |
| 84 | ZA10YPL3 | 1.0 | 1.0 | 66.497635 | 0.000027 |
| 40 | ZA10Z4BZ | 1.0 | 1.0 | 65.139740 | 0.000030 |
| ... | ... | ... | ... | ... | ... |
| 83 | ZJV5JSG5 | 1.0 | 1.0 | 50.810848 | 0.000057 |
| 3 | ZJV5JWCJ | 1.0 | 1.0 | 49.132351 | 0.000075 |
| 56 | ZJV5KQY4 | 1.0 | 1.0 | 48.539429 | 0.000067 |
| 15 | ZJV5LWDJ | 1.0 | 1.0 | 48.299519 | 0.000074 |
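
Note that `prob_fail_in_1_step` holds a Weibull density evaluated at t = 1, which is not the same thing as the probability of failing within one step. If a watchlist needs "probability of failure within the next h steps", one option is the Weibull CDF. The sketch below compares the two for row 80 of `outputs_df`, assuming the standard scale/shape parameterization and measuring time from the moment of prediction. `weibull_pdf` and `weibull_cdf` are illustrative helpers; whether Providence exposes an equivalent is not shown in this notebook.

```python
import math


def weibull_pdf(t: float, alpha: float, beta: float) -> float:
    """Density at time t > 0 of a Weibull with scale alpha and shape beta."""
    z = t / alpha
    return (beta / alpha) * z ** (beta - 1.0) * math.exp(-(z ** beta))


def weibull_cdf(t: float, alpha: float, beta: float) -> float:
    """P(T <= t) for the same distribution; expm1 keeps small values accurate."""
    return -math.expm1(-((t / alpha) ** beta))


alpha, beta = 61.041393, 2.635849  # row 80 of the outputs_df excerpt

density_at_1 = weibull_pdf(1.0, alpha, beta)
prob_within = {h: weibull_cdf(h, alpha, beta) for h in (1.0, 5.0, 20.0, 50.0)}

print(f"density at t=1: {density_at_1:.2e}")
for h, p in prob_within.items():
    print(f"P(failure within {h:g} steps): {p:.2e}")
```

For these parameters the density at t = 1 is more than twice the one-step interval probability, so the two quantities should not be mixed in the same watchlist column without labeling which is which.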


### Semantic differences in probabilities

One potential paradigm:
```
customer needs watchlist
    /          \
axiom         providence
    \          /
      watchdog
```

Another, where the UI (i.e. the watchlist) is variable: extract a ranking from Providence simply by sorting devices by the likelihood of an event, i.e. a failure.
For devices "a", "b", "c", and the inference models `axiom` and `providence`, we can have
```
axiom     : [("a", 1), ("b", 3), ("c", 2)] -> a c b
providence: [("a", "70%"), ("b", "90%"), ("c", "20%")] -> b a c
```

If you want to infuse axiom rankings with providence probabilities, you could keep axiom's order and annotate each device with its providence probability, a la `(a 70%), (c 20%), (b 90%)`.
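
The two orderings and the annotated variant can be reproduced with a few lines of plain Python. This is a minimal sketch using the toy values from the example above; the dictionaries `axiom_ranks` and `providence_probs` are hypothetical stand-ins for whatever each system actually returns.

```python
# Toy outputs from the example: axiom gives ranks (1 = most urgent),
# providence gives failure likelihoods.
axiom_ranks = {"a": 1, "b": 3, "c": 2}
providence_probs = {"a": 0.70, "b": 0.90, "c": 0.20}

# axiom: ascending rank
axiom_order = sorted(axiom_ranks, key=axiom_ranks.get)

# providence: descending likelihood
providence_order = sorted(providence_probs, key=providence_probs.get, reverse=True)

# axiom's order, annotated with providence's probability for each device
annotated = [(device, f"{providence_probs[device]:.0%}") for device in axiom_order]

print("axiom     :", " ".join(axiom_order))
print("providence:", " ".join(providence_order))
print("annotated :", ", ".join(f"({d} {p})" for d, p in annotated))
```

Keeping the ranking and the annotation as separate steps means the watchlist UI can switch which model drives the order without changing how the probabilities are displayed.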