# Analysis of Signaux Faibles risk scores - Building regional indicators
As part of a request from the French Ministry of Labour (Ministère du Travail), Signaux Faibles is running an analysis aggregated at the region and department levels, in order to provide several territorial risk indicators.

This notebook loads our risk predictions for March 2020 and produces aggregated indicators that will later be overlaid on base maps.

In [2]:
%config Completer.use_jedi = False

In [1]:
import os.path
from pymongo import MongoClient
from pymongo.cursor import Cursor

import pandas as pd
import numpy as np
from datetime import datetime
import pytz

# Set logging level to INFO
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

from predictsignauxfaibles.utils import MongoDBQuery, MongoParams
import predictsignauxfaibles.config as global_config

import config as cab_config
import utils


# Part 1 - Metrics as of March 2020 (model)

In [3]:
load_features_from_file = True
load_scores_from_file = True
features_path = "/home/simon.lebastard/predictsignauxfaibles/data/features_2103.json"
scores_path = "/home/simon.lebastard/predictsignauxfaibles/data/scores_2103.json"
postproc_path = "/home/simon.lebastard/predictsignauxfaibles/data/postproc_2103.json"

## Fetching features and scores data
### Option 1: Fetch data from each dataset

In [4]:
features = utils.load_features(date_min="2020-02-01", date_max="2020-02-28", from_file=load_features_from_file, filepath=features_path)
logging.info(f"Loaded {features.shape[0]} rows and {features.shape[1]} columns")

INFO:root:Succesfully loaded Features data from /home/simon.lebastard/predictsignauxfaibles/data/features_2103.json
INFO:root:Loaded 956765 rows and 31 columns


In [5]:
scores = utils.load_scores(batch_name="2102_altares", algo_name="mars2021_v0", from_file=load_scores_from_file, filepath=scores_path)
logging.info(f"Loaded {scores.shape[0]} rows and {scores.shape[1]} columns")

INFO:root:Succesfully loaded Scores data from /home/simon.lebastard/predictsignauxfaibles/data/scores_2103.json
INFO:root:Loaded 657296 rows and 7 columns


In [6]:
features["periode"] = features.periode.apply(utils.datetime_to_str)
scores["periode"] = scores.periode.apply(utils.datetime_to_str)

In [7]:
df = pd.merge(scores, features, on=['siret', 'periode'], how='inner')
df["code_reg"] = df.region.apply(utils.map_region_to_code)
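Scores and features are joined on the (`siret`, `periode`) pair, so the period column must have the same type and format on both sides; this is why it is converted to a string first. A minimal sketch of the same pattern on toy data, assuming a `YYYY-MM-DD` string format (the format actually produced by `utils.datetime_to_str` is not shown in this notebook; SIRET values are made up):

```python
import pandas as pd

# Toy inputs with hypothetical SIRET numbers
scores = pd.DataFrame({
    "siret": ["111", "222", "333"],
    "periode": pd.to_datetime(["2020-02-01", "2020-02-01", "2020-02-01"]),
    "score": [0.9, 0.4, 0.1],
})
features = pd.DataFrame({
    "siret": ["111", "222", "444"],
    "periode": pd.to_datetime(["2020-02-01", "2020-02-01", "2020-02-01"]),
    "effectif": [12, 40, 7],
})

# Normalise the join key to the same string format on both sides (assumed format)
for frame in (scores, features):
    frame["periode"] = frame["periode"].dt.strftime("%Y-%m-%d")

# Inner join: only establishments present in both frames are kept
merged = pd.merge(scores, features, on=["siret", "periode"], how="inner")
print(merged)
```

With `how="inner"`, scored establishments that have no features (and vice versa) are silently dropped; an outer join with `indicator=True` is one way to count how many rows are lost.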

In [None]:
if not os.path.isfile(postproc_path):
    logging.info("Saving joined post-processed data to disk...")
    df.to_json(postproc_path, orient="records", default_handler=str)
    logging.info(f"Saved to {postproc_path}")

### Option 2: Load data directly from df stored on disk

In [None]:
if os.path.isfile(postproc_path):
    logging.info(f"Loading post-processed data from {postproc_path}")
    df = pd.read_json(postproc_path, orient="records")
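Options 1 and 2 together form a simple on-disk cache: the joined frame is computed once, then reloaded from JSON. A sketch of that pattern folded into one helper; `load_or_compute` and its arguments are hypothetical names, not part of `utils`:

```python
import logging
import os.path
from typing import Callable

import pandas as pd


def load_or_compute(path: str, compute: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the DataFrame cached at `path`; compute and cache it if the file is absent."""
    if os.path.isfile(path):
        logging.info(f"Loading cached data from {path}")
        return pd.read_json(path, orient="records")
    df = compute()
    logging.info(f"Saving computed data to {path}")
    # default_handler=str serialises objects JSON cannot encode natively
    df.to_json(path, orient="records", default_handler=str)
    return df
```

In this notebook it could be called as `load_or_compute(postproc_path, lambda: pd.merge(scores, features, on=["siret", "periode"], how="inner"))`. Note that a JSON round trip may change dtypes (for instance numeric-looking strings can be parsed back as numbers).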

## Aggregation of regional and departmental indicators

Granularity levels considered:
- region
- department

For each granularity level:
- count the number of establishments flagged red in each area
- count the number of establishments flagged orange in each area
- count the number of establishments flagged red OR orange
- divide these counts by the total number of establishments in the geographic area
- count the number of actual failures over a given period, and compute the corresponding ratio

For every quantity above, compute the equivalent in number of employees concerned.

For establishments with a moderate or high risk of failure (flagged red OR orange), report:
- the mean ratio $\frac{\text{dette}_{\text{ouvrière}}}{\text{cotisation}}$ (employee-share social-contribution debt over social contributions)
- the mean use of short-time work (activité partielle)

These first metrics could later be complemented with:
- financial ratios from the Banque de France DE data
- the mean number of days of late payment to suppliers (Paydex data)
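For one granularity level, the counts, headcounts and rates described above reduce to a `groupby` followed by a division by the area total. A sketch on made-up data, reusing the column names of `df` (`region`, `siret`, `alert_flag`, `effectif`); it uses `transform("sum")` to broadcast area totals, which is one way to do it rather than the exact code of `aggregate_stats`:

```python
import pandas as pd

# Toy establishments: made-up values, same column names as df
toy = pd.DataFrame({
    "region": ["A", "A", "A", "B", "B"],
    "siret": ["1", "2", "3", "4", "5"],
    "alert_flag": [0, 1, 2, 0, 2],  # 0 = green, 1 = orange (F1), 2 = red (F2)
    "effectif": [10, 5, 5, 30, 10],
})

# Establishment count and headcount per (region, alert level)
stats = toy.groupby(["region", "alert_flag"]).agg(
    siret_count=("siret", "count"),
    effectif_tot=("effectif", "sum"),
)

# Share of each alert level within its region; transform("sum") broadcasts
# the regional total back onto every row of that region
by_region = stats.groupby(level="region")
stats["siret_rate"] = stats.siret_count / by_region.siret_count.transform("sum")
stats["effectif_rate"] = stats.effectif_tot / by_region.effectif_tot.transform("sum")

# Nationwide share of each alert level, then relative deviation of each region from it
national = toy.groupby("alert_flag").agg(siret_count=("siret", "count"))
national["siret_rate"] = national.siret_count / national.siret_count.sum()
ntl_rate = stats.index.get_level_values("alert_flag").map(national.siret_rate).to_numpy()
stats["siret_rate_to_ntl_avg"] = (stats.siret_rate - ntl_rate) / ntl_rate
print(stats)
```

Here region B has 50% of its establishments flagged red against 40% nationwide, i.e. a relative deviation of +25%.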
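Restricting to establishments flagged orange or red before averaging gives these two indicators per area. A sketch on invented values; `ratio_dette_ouvriere` and `apart_heures_consommees` are column names of `df`, the data are not:

```python
import pandas as pd

toy = pd.DataFrame({
    "region": ["A", "A", "A", "B"],
    "alert_flag": [0, 1, 2, 2],
    "ratio_dette_ouvriere": [0.0, 0.5, 1.5, 2.0],
    "apart_heures_consommees": [0.0, 100.0, 300.0, 50.0],
})

# Keep only establishments flagged orange (1) or red (2)
flagged = toy[toy.alert_flag > 0]
indicators = flagged.groupby("region").agg(
    ratiodette_ouvr_avg=("ratio_dette_ouvriere", "mean"),
    apart_cons_avg=("apart_heures_consommees", "mean"),
)
print(indicators)
```

`mean` skips NaN by default, so establishments without debt or short-time-work data are ignored in the average, unless `cab_config.NAN_RPL` fills these columns beforehand.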

### Preprocessing steps

In [8]:
def preprocess(df):
    # create an outcome flag based only on failures since the beginning of the COVID crisis
    df["failure"] = (df["time_til_failure"]>=0) & (df["time_til_failure"]<12) # TODO: automate the look-ahead window (number of months between March 2020 and <THIS_MONTH>)
    df["failure"] = df.failure.astype(int)

    # encode alert level into integer (0 = green, 1 = orange, 2 = red)
    df["alert_flag"] = df.alert.replace({"Pas d'alerte": 0, "Alerte seuil F1": 1, "Alerte seuil F2": 2})
    df["alert_bin"] = (df.alert_flag > 0)

    # debt/contribution ratios for the employee share (part ouvrière) and the employer share (part patronale) of social contributions;
    # a zero contribution is treated as missing so that the ratio is NaN rather than infinite
    cotisation = df["cotisation"].replace(0, np.nan)
    df["ratio_dette_ouvriere"] = df["montant_part_ouvriere"] / cotisation
    df["ratio_dette_patronale"] = df["montant_part_patronale"] / cotisation
    return df

def replace_nans(df, replace_dct):
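    for field, rpl in replace_dct.items():
        # assign back rather than calling fillna(inplace=True) on a column, which is unreliable under Copy-on-Write
        df[field] = df[field].fillna(value=rpl)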
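A toy run of the derived columns that matter most downstream: the 12-month failure window, the alert encoding and the debt ratio. The data are invented, and treating a zero `cotisation` as missing is an assumption about how zero contributions should be handled. This sketch uses `.map` instead of `.replace` for the alert labels, so an unexpected label shows up as NaN instead of passing through unchanged:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "time_til_failure": [np.nan, 3, 12, -1],  # months until failure, NaN if none observed
    "alert": ["Pas d'alerte", "Alerte seuil F1", "Alerte seuil F2", "Pas d'alerte"],
    "montant_part_ouvriere": [0.0, 500.0, 100.0, 50.0],
    "cotisation": [1000.0, 1000.0, 0.0, 100.0],
})

# Failure within the next 12 months (0 <= t < 12); comparisons with NaN are False
toy["failure"] = ((toy.time_til_failure >= 0) & (toy.time_til_failure < 12)).astype(int)

# Alert level as an ordinal integer: 0 = green, 1 = orange (F1), 2 = red (F2)
toy["alert_flag"] = toy.alert.map({"Pas d'alerte": 0, "Alerte seuil F1": 1, "Alerte seuil F2": 2})
toy["alert_bin"] = toy.alert_flag > 0

# Debt ratio; a zero contribution becomes NaN instead of inf
toy["ratio_dette_ouvriere"] = toy.montant_part_ouvriere / toy.cotisation.replace(0, np.nan)
print(toy)
```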

In [9]:
replace_nans(df, cab_config.NAN_RPL)
df = preprocess(df)

Check the counts in each category:
- entry into collective insolvency proceedings (procédure collective)
- flagging by the SF algorithm
- binary flagging by the SF algorithm (True if an establishment is flagged red OR orange, False if it is flagged green, i.e. not flagged)
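One way to run these checks, assuming `df` has gone through `preprocess`; shown here on a toy frame with the same three columns:

```python
import pandas as pd

toy = pd.DataFrame({
    "failure": [0, 1, 0, 0, 1],
    "alert_flag": [0, 2, 1, 0, 2],
    "alert_bin": [False, True, True, False, True],
})

# Number of establishments in each category; dropna=False also counts missing values
for col in ["failure", "alert_flag", "alert_bin"]:
    print(toy[col].value_counts(dropna=False).sort_index())

# Cross-tabulation: how many establishments in each alert level actually failed
print(pd.crosstab(toy.alert_flag, toy.failure))
```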

### Building aggregation dataframe

In [None]:
cab_config.FEATURES_LIST

In [10]:
def aggregate_stats(geo_attr: str, outcome_attr: str, siret_per_ape3_min: int=10, siret_per_naf_min: int=25):
        assert outcome_attr in ["alert_flag", "alert_bin", "failure", "outcome"]
                
        risk_ape3_stats = df.groupby(by=[geo_attr,outcome_attr,"code_naf","libelle_naf","code_ape_niveau3","libelle_ape3"]).agg(
                siret_count=('siret', 'count'),
                effectif_tot=('effectif', 'sum'),
        )
        risk_naf_stats = df.groupby(by=[geo_attr,outcome_attr,"code_naf","libelle_naf"]).agg(
                siret_count=('siret', 'count'),
                effectif_tot=('effectif', 'sum')
        )
        risk_stats = df.groupby(by=[geo_attr,outcome_attr]).agg(
            siret_count=('siret', 'count'),
            effectif_tot=('effectif', 'sum'),
            ratiodette_ouvr_avg=('ratio_dette_ouvriere', 'mean'),
            ratiodette_patr_avg=('ratio_dette_patronale', 'mean'),
            ratiodette_avg=('ratio_dette', 'mean'),
            apart_autr_avg=('apart_heures_autorisees', 'mean'),
            apart_cons_avg=('apart_heures_consommees', 'mean'),
            apart_cumcons_avg=('apart_heures_consommees_cumulees', 'mean'),
            paydex_avg=('paydex_nb_jours', 'mean'),
            taux_endettement_avg=('taux_endettement', 'mean'),
        )
        national_stats = df.groupby(by=[outcome_attr]).agg(
            siret_count=('siret', 'count'),
            effectif_tot=('effectif', 'sum'),
            ratiodette_ouvr_avg=('ratio_dette_ouvriere', 'mean'),
            ratiodette_patr_avg=('ratio_dette_patronale', 'mean'),
            ratiodette_avg=('ratio_dette', 'mean'),
            apart_autr_avg=('apart_heures_autorisees', 'mean'),
            apart_cons_avg=('apart_heures_consommees', 'mean'),
            apart_cumcons_avg=('apart_heures_consommees_cumulees', 'mean'),
            paydex_avg=('paydex_nb_jours', 'mean'),
            taux_endettement_avg=('taux_endettement', 'mean'),
        )
        
        # Share of establishments/employees relative to the total of the geographic area
        risk_stats['siret_rate'] = risk_stats.siret_count / risk_stats.groupby(by=geo_attr).siret_count.sum()
        national_stats['siret_rate'] = national_stats.siret_count / national_stats.siret_count.sum()
        
        risk_stats['effectif_rate'] = risk_stats.effectif_tot / risk_stats.groupby(by=geo_attr).effectif_tot.sum()
        national_stats['effectif_rate'] = national_stats.effectif_tot / national_stats.effectif_tot.sum()
        
        ####################
        ### SECTOR ANALYSES
        ####################

        risk_ape3_base = risk_ape3_stats.loc[risk_ape3_stats.index.get_level_values("code_naf").isin(cab_config.NAF_INDUSTRY)]
        risk_ape3_base = risk_ape3_base[risk_ape3_base["siret_count"] >= siret_per_ape3_min]
        risk_naf_base = risk_naf_stats.loc[risk_naf_stats.index.get_level_values("code_naf").isin(cab_config.NAF_INDUSTRY)]
        risk_naf_base = risk_naf_base[risk_naf_base["siret_count"] >= siret_per_naf_min]

        #############
        ## Sector analyses - Number of employees
        #############
        # APE level 3
        risk_stats['ape3_mostatrisk_eff_abs'] = risk_ape3_base.groupby(by=[geo_attr,outcome_attr]).effectif_tot.idxmax()
        risk_stats['ape3_mostatrisk_eff_abs_tot'] = risk_ape3_base.groupby(by=[geo_attr,outcome_attr]).effectif_tot.max()
        risk_stats['ape3_mostatrisk_eff_abs_code'] = risk_stats['ape3_mostatrisk_eff_abs'].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
        risk_stats['ape3_mostatrisk_eff_abs_libelle'] = risk_stats['ape3_mostatrisk_eff_abs'].apply(lambda x: x[5] if isinstance(x, tuple) else "None")
        risk_stats.drop(['ape3_mostatrisk_eff_abs'], axis=1, inplace=True)
        
        ape3_risk_overrepresentation_eff = risk_ape3_base.effectif_tot.div(risk_ape3_base.groupby(by=[geo_attr,"code_naf","libelle_naf","code_ape_niveau3","libelle_ape3"]).agg({"effectif_tot": 'sum'}).effectif_tot, axis=0)
        risk_stats['ape3_risk_eff_most_overrepresented'] = ape3_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).idxmax()
        risk_stats['ape3_risk_eff_most_overrepresented_rate'] = ape3_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).max()
        risk_stats['ape3_risk_eff_most_overrepresented_code'] = risk_stats['ape3_risk_eff_most_overrepresented'].apply(lambda x: x[3] if isinstance(x, tuple) else "None")
        risk_stats['ape3_risk_eff_most_overrepresented_libelle'] = risk_stats['ape3_risk_eff_most_overrepresented'].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
        risk_stats.drop(['ape3_risk_eff_most_overrepresented'], axis=1, inplace=True)
        
        for naf in cab_config.NAF_INDUSTRY:
            risk_stats[f"ape3_mostatrisk_eff_abs_{naf}"] = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,outcome_attr]).effectif_tot.idxmax()
            risk_stats[f"ape3_mostatrisk_eff_abs_{naf}_tot"] = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,outcome_attr]).effectif_tot.max()
            risk_stats[f"ape3_mostatrisk_eff_abs_{naf}_code"] = risk_stats[f"ape3_mostatrisk_eff_abs_{naf}"].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
            risk_stats[f"ape3_mostatrisk_eff_abs_{naf}_libelle"] = risk_stats[f"ape3_mostatrisk_eff_abs_{naf}"].apply(lambda x: x[5] if isinstance(x, tuple) else "None")
            risk_stats.drop([f"ape3_mostatrisk_eff_abs_{naf}"], axis=1, inplace=True)
        
            ape3_risk_overrepresentation_eff = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].effectif_tot.div(risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,"code_naf","libelle_naf","code_ape_niveau3","libelle_ape3"]).agg({"effectif_tot": 'sum'}).effectif_tot, axis=0)
            risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}"] = ape3_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).idxmax()
            risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}_rate"] = ape3_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).max()
            risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}_code"] = risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}"].apply(lambda x: x[3] if isinstance(x, tuple) else "None")
            risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}_libelle"] = risk_stats[f"ape3_risk_eff_most_overrepresented_{naf}"].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
            risk_stats.drop([f"ape3_risk_eff_most_overrepresented_{naf}"], axis=1, inplace=True)
        
        # NAF
        risk_stats['naf_mostatrisk_eff_abs'] = risk_naf_base.groupby(by=[geo_attr,outcome_attr]).effectif_tot.idxmax()
        risk_stats['naf_mostatrisk_eff_abs_tot'] = risk_naf_base.groupby(by=[geo_attr,outcome_attr]).effectif_tot.max()
        risk_stats['naf_mostatrisk_eff_abs_code'] = risk_stats.naf_mostatrisk_eff_abs.apply(lambda x: x[2] if isinstance(x, tuple) else "None")
        risk_stats['naf_mostatrisk_eff_abs_libelle'] = risk_stats.naf_mostatrisk_eff_abs.apply(lambda x: x[3] if isinstance(x, tuple) else "None")
        risk_stats.drop(['naf_mostatrisk_eff_abs'], axis=1, inplace=True)
        
        naf_risk_overrepresentation_eff = risk_naf_base.effectif_tot.div(risk_naf_base.groupby(by=[geo_attr,"code_naf","libelle_naf"]).agg({"effectif_tot": 'sum'}).effectif_tot, axis=0)
        risk_stats['naf_risk_eff_most_overrepresented'] = naf_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).idxmax()
        risk_stats['naf_risk_eff_most_overrepresented_rate'] = naf_risk_overrepresentation_eff.groupby(by=[geo_attr,outcome_attr]).max()
        risk_stats['naf_risk_eff_most_overrepresented_code'] = risk_stats['naf_risk_eff_most_overrepresented'].apply(lambda x: x[1] if isinstance(x, tuple) else "None")
        risk_stats['naf_risk_eff_most_overrepresented_libelle'] = risk_stats['naf_risk_eff_most_overrepresented'].apply(lambda x: x[2] if isinstance(x, tuple) else "None")
        risk_stats.drop(['naf_risk_eff_most_overrepresented'], axis=1, inplace=True)
        
        #############
        ## Sector analyses - Number of establishments
        #############
        # APE level 3
        risk_stats['ape3_mostatrisk_etab_abs'] = risk_ape3_base.groupby(by=[geo_attr,outcome_attr]).siret_count.idxmax()
        risk_stats['ape3_mostatrisk_etab_abs_tot'] = risk_ape3_base.groupby(by=[geo_attr,outcome_attr]).siret_count.max()
        risk_stats['ape3_mostatrisk_etab_abs_code'] = risk_stats['ape3_mostatrisk_etab_abs'].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
        risk_stats['ape3_mostatrisk_etab_abs_libelle'] = risk_stats['ape3_mostatrisk_etab_abs'].apply(lambda x: x[5] if isinstance(x, tuple) else "None")
        risk_stats.drop(['ape3_mostatrisk_etab_abs'], axis=1, inplace=True)
        
        ape3_risk_overrepresentation_etab = risk_ape3_base.siret_count.div(risk_ape3_base.groupby(by=[geo_attr,"code_naf","libelle_naf","code_ape_niveau3","libelle_ape3"]).agg({"siret_count": 'sum'}).siret_count, axis=0)
        risk_stats['ape3_risk_etab_most_overrepresented'] = ape3_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).idxmax()
        risk_stats['ape3_risk_etab_most_overrepresented_rate'] = ape3_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).max()
        risk_stats['ape3_risk_etab_most_overrepresented_code'] = risk_stats['ape3_risk_etab_most_overrepresented'].apply(lambda x: x[3] if isinstance(x, tuple) else "None")
        risk_stats['ape3_risk_etab_most_overrepresented_libelle'] = risk_stats['ape3_risk_etab_most_overrepresented'].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
        risk_stats.drop(['ape3_risk_etab_most_overrepresented'], axis=1, inplace=True)
        
        for naf in cab_config.NAF_INDUSTRY:
            risk_stats[f"ape3_mostatrisk_etab_abs_{naf}"] = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,outcome_attr]).siret_count.idxmax()
            risk_stats[f"ape3_mostatrisk_etab_abs_{naf}_tot"] = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,outcome_attr]).siret_count.max()
            risk_stats[f"ape3_mostatrisk_etab_abs_{naf}_code"] = risk_stats[f"ape3_mostatrisk_etab_abs_{naf}"].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
            risk_stats[f"ape3_mostatrisk_etab_abs_{naf}_libelle"] = risk_stats[f"ape3_mostatrisk_etab_abs_{naf}"].apply(lambda x: x[5] if isinstance(x, tuple) else "None")
            risk_stats.drop([f"ape3_mostatrisk_etab_abs_{naf}"], axis=1, inplace=True)
        
            ape3_risk_overrepresentation_etab = risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].siret_count.div(risk_ape3_base.loc[risk_ape3_base.index.get_level_values("code_naf")==naf].groupby(by=[geo_attr,"code_naf","libelle_naf","code_ape_niveau3","libelle_ape3"]).agg({"siret_count": 'sum'}).siret_count, axis=0)
            risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}"] = ape3_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).idxmax()
            risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}_rate"] = ape3_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).max()
            risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}_code"] = risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}"].apply(lambda x: x[3] if isinstance(x, tuple) else "None")
            risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}_libelle"] = risk_stats[f"ape3_risk_etab_most_overrepresented_{naf}"].apply(lambda x: x[4] if isinstance(x, tuple) else "None")
            risk_stats.drop([f"ape3_risk_etab_most_overrepresented_{naf}"], axis=1, inplace=True)
        
        # NAF
        risk_stats['naf_mostatrisk_etab_abs'] = risk_naf_base.groupby(by=[geo_attr,outcome_attr]).siret_count.idxmax()
        risk_stats['naf_mostatrisk_etab_abs_tot'] = risk_naf_base.groupby(by=[geo_attr,outcome_attr]).siret_count.max()
        risk_stats['naf_mostatrisk_etab_abs_code'] = risk_stats.naf_mostatrisk_etab_abs.apply(lambda x: x[2] if isinstance(x, tuple) else "None")
        risk_stats['naf_mostatrisk_etab_abs_libelle'] = risk_stats.naf_mostatrisk_etab_abs.apply(lambda x: x[3] if isinstance(x, tuple) else "None")
        risk_stats.drop(['naf_mostatrisk_etab_abs'], axis=1, inplace=True)
        
        naf_risk_overrepresentation_etab = risk_naf_base.siret_count.div(risk_naf_base.groupby(by=[geo_attr,"code_naf","libelle_naf"]).agg({"siret_count": 'sum'}).siret_count, axis=0)
        risk_stats['naf_risk_etab_most_overrepresented'] = naf_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).idxmax()
        risk_stats['naf_risk_etab_most_overrepresented_rate'] = naf_risk_overrepresentation_etab.groupby(by=[geo_attr,outcome_attr]).max()
        risk_stats['naf_risk_etab_most_overrepresented_code'] = risk_stats['naf_risk_etab_most_overrepresented'].apply(lambda x: x[1] if isinstance(x, tuple) else "None")
        risk_stats['naf_risk_etab_most_overrepresented_libelle'] = risk_stats['naf_risk_etab_most_overrepresented'].apply(lambda x: x[2] if isinstance(x, tuple) else "None")
        risk_stats.drop(['naf_risk_etab_most_overrepresented'], axis=1, inplace=True)
                
        # Compare the indicators with the national level: relative over/under-representation with respect to France as a whole
        risk_stats['siret_rate_to_ntl_avg'] = (risk_stats.siret_rate - national_stats.siret_rate) / national_stats.siret_rate
        risk_stats['effectif_rate_to_ntl_avg'] = (risk_stats.effectif_rate - national_stats.effectif_rate) / national_stats.effectif_rate
        risk_stats['ratiodette_ouvr_avg_to_ntl_avg'] = (risk_stats.ratiodette_ouvr_avg - national_stats.ratiodette_ouvr_avg) / national_stats.ratiodette_ouvr_avg
        risk_stats['ratiodette_patr_avg_to_ntl_avg'] = (risk_stats.ratiodette_patr_avg - national_stats.ratiodette_patr_avg) / national_stats.ratiodette_patr_avg
        risk_stats['ratiodette_avg_to_ntl_avg'] = (risk_stats.ratiodette_avg - national_stats.ratiodette_avg) / national_stats.ratiodette_avg
        risk_stats['apart_autr_avg_to_ntl_avg'] = (risk_stats.apart_autr_avg - national_stats.apart_autr_avg) / national_stats.apart_autr_avg
        risk_stats['apart_cons_avg_to_ntl_avg'] = (risk_stats.apart_cons_avg - national_stats.apart_cons_avg) / national_stats.apart_cons_avg
        risk_stats['apart_cumcons_avg_to_ntl_avg'] = (risk_stats.apart_cumcons_avg - national_stats.apart_cumcons_avg) / national_stats.apart_cumcons_avg
        risk_stats['paydex_avg_to_ntl_avg'] = (risk_stats.paydex_avg - national_stats.paydex_avg) / national_stats.paydex_avg
        risk_stats['taux_endettement_avg_to_ntl_avg'] = (risk_stats.taux_endettement_avg - national_stats.taux_endettement_avg) / national_stats.taux_endettement_avg
                
        return risk_stats
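The sector indicators rest on two operations: `idxmax` within each (area, outcome) group to find the sector with the largest absolute count, and the share of a sector's employees that fall into a given outcome (the "overrepresentation" rate). A reduced sketch on toy data, with a single sector level (`code_naf`) instead of the full NAF/APE hierarchy and without the minimum-count thresholds:

```python
import pandas as pd

toy = pd.DataFrame({
    "region": ["A", "A", "A", "A", "B", "B"],
    "alert_bin": [True, True, False, False, True, False],
    "code_naf": ["C", "F", "C", "F", "C", "C"],
    "effectif": [50, 20, 50, 5, 10, 90],
})

sector = toy.groupby(["region", "alert_bin", "code_naf"]).agg(effectif_tot=("effectif", "sum"))

# 1) Sector with the most employees in each (region, alert_bin) group.
#    idxmax returns the full index tuple; the sector code is its last element.
most_abs = (
    sector.groupby(level=["region", "alert_bin"]).effectif_tot.idxmax().map(lambda idx: idx[-1])
)

# 2) Overrepresentation: share of each sector's employees (within a region) in each outcome
totals = sector.groupby(level=["region", "code_naf"]).effectif_tot.transform("sum")
sector["share"] = sector.effectif_tot / totals
most_over = sector.groupby(level=["region", "alert_bin"]).share.idxmax().map(lambda idx: idx[-1])

print(most_abs)
print(most_over)
```

This shows why both indicators are kept: in region A, sector C has the most flagged employees in absolute terms, while sector F has the highest share of its own employees flagged (80%).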

In [11]:
reg_risk_stats = aggregate_stats(geo_attr="region", outcome_attr="alert_flag", siret_per_ape3_min=0, siret_per_naf_min=0)
reg_riskbin_stats = aggregate_stats(geo_attr="region", outcome_attr="alert_bin", siret_per_ape3_min=0, siret_per_naf_min=0)
reg_fail_stats = aggregate_stats(geo_attr="region", outcome_attr="failure", siret_per_ape3_min=0, siret_per_naf_min=0)

dpt_risk_stats = aggregate_stats(geo_attr="departement", outcome_attr="alert_flag", siret_per_ape3_min=0, siret_per_naf_min=0)
dpt_riskbin_stats = aggregate_stats(geo_attr="departement", outcome_attr="alert_bin", siret_per_ape3_min=0, siret_per_naf_min=0)
dpt_fail_stats = aggregate_stats(geo_attr="departement", outcome_attr="failure", siret_per_ape3_min=0, siret_per_naf_min=0)

In [12]:
reg_risque_rouge = reg_risk_stats[reg_risk_stats.index.get_level_values("alert_flag")==2]
reg_risque_orange = reg_risk_stats[reg_risk_stats.index.get_level_values("alert_flag")==1]
reg_risque_orange_ou_rouge = reg_riskbin_stats[reg_riskbin_stats.index.get_level_values("alert_bin")==True]
reg_risque_vert = reg_riskbin_stats[reg_riskbin_stats.index.get_level_values("alert_bin")==False]
reg_fail = reg_fail_stats[reg_fail_stats.index.get_level_values("failure")==True]
reg_nofail = reg_fail_stats[reg_fail_stats.index.get_level_values("failure")==False]

dpt_risque_rouge = dpt_risk_stats[dpt_risk_stats.index.get_level_values("alert_flag")==2]
dpt_risque_orange = dpt_risk_stats[dpt_risk_stats.index.get_level_values("alert_flag")==1]
dpt_risque_orange_ou_rouge = dpt_riskbin_stats[dpt_riskbin_stats.index.get_level_values("alert_bin")==True]
dpt_risque_vert = dpt_riskbin_stats[dpt_riskbin_stats.index.get_level_values("alert_bin")==False]
dpt_fail = dpt_fail_stats[dpt_fail_stats.index.get_level_values("failure")==True]
dpt_nofail = dpt_fail_stats[dpt_fail_stats.index.get_level_values("failure")==False]
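The slices above filter on one index level with a boolean mask. `DataFrame.xs` expresses the same selection directly; with `drop_level=False` the outcome level is kept in the index, as with the mask. A sketch on a toy frame:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 1), ("A", 2), ("B", 0), ("B", 2)],
    names=["region", "alert_flag"],
)
stats = pd.DataFrame({"siret_count": [10, 3, 2, 8, 1]}, index=idx)

# Boolean mask on one index level (as in the cell above)
mask_version = stats[stats.index.get_level_values("alert_flag") == 2]
# Cross-section on the same level, keeping it in the index
xs_version = stats.xs(2, level="alert_flag", drop_level=False)
print(xs_version)
```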

In [13]:
reg_risk_outpath_root = "/home/simon.lebastard/predictsignauxfaibles/predictsignauxfaibles/notebooks/exports/reg_2103"
dpt_risk_outpath_root = "/home/simon.lebastard/predictsignauxfaibles/predictsignauxfaibles/notebooks/exports/dpt_2103"

reg_risk_stats.to_csv(f"{reg_risk_outpath_root}_riskflag_all.csv")
dpt_risk_stats.to_csv(f"{dpt_risk_outpath_root}_riskflag_all.csv")
reg_riskbin_stats.to_csv(f"{reg_risk_outpath_root}_riskbin_all.csv")
dpt_riskbin_stats.to_csv(f"{dpt_risk_outpath_root}_riskbin_all.csv")
reg_fail_stats.to_csv(f"{reg_risk_outpath_root}_failures_all.csv")
dpt_fail_stats.to_csv(f"{dpt_risk_outpath_root}_failures_all.csv")

reg_risque_rouge.to_csv(f"{reg_risk_outpath_root}_risque_rouge.csv")
reg_risque_orange.to_csv(f"{reg_risk_outpath_root}_risque_orange.csv")
reg_risque_orange_ou_rouge.to_csv(f"{reg_risk_outpath_root}_risque_orange_ou_rouge.csv")
reg_risque_vert.to_csv(f"{reg_risk_outpath_root}_risque_vert.csv")
reg_fail.to_csv(f"{reg_risk_outpath_root}_defaillance.csv")
reg_nofail.to_csv(f"{reg_risk_outpath_root}_sans_defaillance.csv")

dpt_risque_rouge.to_csv(f"{dpt_risk_outpath_root}_risque_rouge.csv")
dpt_risque_orange.to_csv(f"{dpt_risk_outpath_root}_risque_orange.csv")
dpt_risque_orange_ou_rouge.to_csv(f"{dpt_risk_outpath_root}_risque_orange_ou_rouge.csv")
dpt_risque_vert.to_csv(f"{dpt_risk_outpath_root}_risque_vert.csv")
dpt_fail.to_csv(f"{dpt_risk_outpath_root}_defaillance.csv")
dpt_nofail.to_csv(f"{dpt_risk_outpath_root}_sans_defaillance.csv")
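The exported frames carry a two-level index (area, outcome), which `to_csv` writes as the first two columns, so reading a file back needs `index_col=[0, 1]` to restore it. A round-trip sketch using a temporary directory instead of the export paths above:

```python
import os
import tempfile

import pandas as pd

idx = pd.MultiIndex.from_tuples([("A", 0), ("A", 2), ("B", 2)], names=["region", "alert_flag"])
stats = pd.DataFrame({"siret_count": [10, 2, 1], "siret_rate": [0.8, 0.2, 1.0]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reg_riskflag_all.csv")
    stats.to_csv(path)
    # The first two CSV columns hold the (region, alert_flag) index
    reloaded = pd.read_csv(path, index_col=[0, 1])

print(reloaded)
```

If zero-padded codes (such as department numbers) must stay strings when reloaded, passing `dtype={"departement": str}` to `read_csv` prevents them from being parsed as integers.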