Producing lists out of prediction files
===
This notebook produces lists as a JSON file in which each entry contains the following fields:
- siret
- periode
- score
- timestamp
- algo
- alert

It takes as inputs a set of CSV prediction files produced by `predictsignauxfaibles`, typically:
- one file corresponding to the "default" model
- one file corresponding to the "small" model

In [None]:
%config Completer.use_jedi = False

In [None]:
# Set logging level to INFO
import logging
logging.getLogger().setLevel(logging.INFO)

# Import required libraries and modules
from datetime import datetime
import pandas as pd
from pathlib import Path
import json

from predictsignauxfaibles.config import OUTPUT_FOLDER

Import the functions used to merge model outputs and build the alert flag:

In [None]:
from predictsignauxfaibles.make_list import merge_models, assign_flag, make_alert

Let's load CSV data produced by a run with the default model and a run with the small model:

In [None]:
default = pd.read_csv("/home/simon.lebastard/predictsignauxfaibles/predictsignauxfaibles/model_runs/20210507-195755/predictions-20210507-195755.csv")
small = pd.read_csv("/home/simon.lebastard/predictsignauxfaibles/predictsignauxfaibles/model_runs/20210507-195735/predictions-20210507-195735.csv")

In [None]:
merged = merge_models(model_list=[default, small])
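
The implementation of `merge_models` is not shown in this notebook. As a rough sketch of what such a merge could look like, assuming that models are listed by decreasing priority and that each establishment (`siret`) keeps the prediction from the first model that covers it, one might write the following. The helper name `merge_by_priority` and the toy data are hypothetical and are not taken from `predictsignauxfaibles`.

```python
import pandas as pd


def merge_by_priority(model_list):
    """Hypothetical helper: for each siret, keep the row from the first model that has it."""
    # Concatenate in priority order, then keep the first occurrence of each siret.
    stacked = pd.concat(model_list, ignore_index=True)
    return stacked.drop_duplicates(subset="siret", keep="first").reset_index(drop=True)


# Toy data (not real predictions)
default_toy = pd.DataFrame(
    {"siret": ["00000000000001", "00000000000002"], "predicted_probability": [0.80, 0.10]}
)
small_toy = pd.DataFrame(
    {"siret": ["00000000000002", "00000000000003"], "predicted_probability": [0.50, 0.40]}
)

merged_toy = merge_by_priority([default_toy, small_toy])
```

Under this assumption, the second model only contributes rows for establishments that the first model did not score.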

In [None]:
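# NOTE: log_splits_size is not imported in the cells above; import or define it before running this cell.
log_splits_size(merged, t_rouge=0.75, t_orange=0.3)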

In [None]:
merged["alert"] = merged["predicted_probability"].apply(
    lambda x: assign_flag(x, t_rouge=0.75, t_orange=0.3)
)
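
To make the thresholding step concrete, here is a minimal sketch of a flag function with the same call signature. It is a hypothetical stand-in, not the actual `assign_flag`: the use of `>=` comparisons, the mapping of the upper threshold to "Alerte seuil F1", and the two other labels are assumptions.

```python
def assign_flag_sketch(probability, t_rouge, t_orange):
    """Hypothetical stand-in for assign_flag: map a probability to an alert label."""
    if not 0 <= t_orange <= t_rouge <= 1:
        raise ValueError("expected 0 <= t_orange <= t_rouge <= 1")
    if probability >= t_rouge:
        return "Alerte seuil F1"  # assumed to correspond to the upper ("rouge") threshold
    if probability >= t_orange:
        return "Alerte seuil F2"  # assumed label
    return "Pas d'alerte"  # assumed label


flags = [assign_flag_sketch(p, t_rouge=0.75, t_orange=0.3) for p in (0.9, 0.5, 0.1)]
```

With thresholds of 0.75 and 0.3, the three example probabilities fall into three different classes.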

In [None]:
merged

In [None]:
list_id = datetime.now().strftime("%Y%m%d-%H%M%S")
run_path = Path(OUTPUT_FOLDER) / f"{list_id}"
run_path.mkdir(parents=True, exist_ok=True)

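# DataFrame.to_json() already returns a JSON string, so it is written as-is
# (passing it through json.dumps would encode it a second time).
# orient="records" produces one JSON object per row.
with open(run_path / "scores.json", "w", encoding="utf-8") as scores_file:
    scores_file.write(merged.to_json(orient="records"))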
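
When writing a DataFrame to JSON, it can help to check what the file will contain once it is parsed back. The sketch below uses toy data (not real predictions) to compare the output of `to_json(orient="records")` with the result of passing an already serialised string through `json.dumps`.

```python
import json

import pandas as pd

toy = pd.DataFrame({"siret": ["00000000000001"], "score": [0.5]})

# orient="records" serialises the frame as a list with one object per row.
records_json = toy.to_json(orient="records")
parsed_records = json.loads(records_json)

# Encoding the JSON string a second time yields a JSON *string*, not a list.
double_encoded = json.dumps(toy.to_json(orient="records"))
parsed_double = json.loads(double_encoded)
```

Here `parsed_records` is a list of dictionaries, whereas `parsed_double` is a plain string that a consumer would have to decode again.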

In [None]:
run_path

Preparing a new dummy list
---
From the output of a successful run of `python3 -m predictsignauxfaibles` using the new `explain` function, let's produce a list that we can share with the front-end team.

The `Scores` collection on MongoDB needs to receive documents that look like this:
```
{
    "siret": "12345678901234",
    "periode": "2019-01-01",
    "score": 0.996714234,
    "batch": "1904",
    "timestamp": 2019-01-01T14:56:58.418+00:00,
    "algo": "algo_avec_urssaf",
    "alert": "Alerte seuil F1"
}
```
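
As an illustration, one way to build a dictionary with this shape from individual values is sketched below. The helper `make_score_document` and the `REQUIRED_FIELDS` tuple are hypothetical, and serialising the timestamp as an ISO 8601 string is an assumption: the collection may expect a native date type instead.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("siret", "periode", "score", "batch", "timestamp", "algo", "alert")


def make_score_document(siret, score, alert, periode, batch, algo, timestamp=None):
    """Hypothetical helper: build one document shaped like the example above."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    return {
        "siret": str(siret).zfill(14),  # SIRET numbers have 14 digits; keep leading zeros
        "periode": periode,
        "score": float(score),
        "batch": batch,
        "timestamp": timestamp.isoformat(),  # assumption: ISO 8601 string
        "algo": algo,
        "alert": alert,
    }


doc = make_score_document(
    siret=12345678901234,
    score=0.996714234,
    alert="Alerte seuil F1",
    periode="2019-01-01",
    batch="1904",
    algo="algo_avec_urssaf",
    timestamp=datetime(2019, 1, 1, 14, 56, 58, 418000, tzinfo=timezone.utc),
)

# Sanity check: list any expected field that is absent from the document.
missing = [field for field in REQUIRED_FIELDS if field not in doc]
```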

In [None]:
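# NOTE: `predictions`, `decision_function` and `conf` are not defined earlier in this notebook;
# they need to be defined (e.g. loaded from the run described above) before running this cell.
merged["alert"] = predictions.predicted_probability.apply(decision_function, args=(0.45, 0.38))
merged["periode"] = "2020-02-01"
merged["batch"] = "<BATCH_NAME>"
merged["algo"] = conf.model_name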

In [None]:
merged

In [None]:
pred_dict = merged.to_dict('records')

In [None]:
import json

# Passing allow_nan=False would make json.dumps raise on NaN values instead of writing non-standard JSON.
js = json.dumps(pred_dict)
with open("/home/simon.lebastard/predictsignauxfaibles/data/explain/scores_export_test.json", "w", encoding="utf-8") as file:
    file.write(js)
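
By default `json.dumps` writes float NaN values as a bare `NaN` token, which is not valid JSON and may be rejected by strict parsers. A minimal sketch of one way to guard against this, using toy records and a hypothetical helper `nan_to_none`, is to replace NaN with `None` before serialising:

```python
import json
import math

# Toy records (not real predictions)
records = [
    {"siret": "00000000000001", "score": 0.42},
    {"siret": "00000000000002", "score": float("nan")},
]


def nan_to_none(value):
    """Replace float NaN with None so that it serialises as JSON null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


clean_records = [{key: nan_to_none(val) for key, val in rec.items()} for rec in records]

# allow_nan=False makes json.dumps raise ValueError if a NaN slipped through.
payload = json.dumps(clean_records, allow_nan=False)
```

Whether missing scores should be exported as `null` or dropped altogether depends on what the consumer of the list expects.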