# Predicting Mechanical Failure
## Introduction
Predictive maintenance aims to predict when a piece of equipment will fail and to prevent that
failure through condition monitoring, so that maintenance can be scheduled before a problem
manifests. A distinguishing feature of predictive maintenance is that it keeps maintenance
frequency to a minimum, which helps avoid both unplanned reactive maintenance and the costs
associated with preventive maintenance.

## Problem Statement
In this example, you will use measured wind turbine data to predict whether a device will fail
within the next week (as shown in the figure below). The data were collected by a Supervisory
Control And Data Acquisition (SCADA) system. To capture a wide range of information, e.g.
environmental information (temperature, humidity), device status information (current,
voltage, vibration), and parameter information, 75 sensors were mounted on the devices.

All collected information is listed in the following table.

| Sensor No. | Information | Sensor No. | Information   | Sensor No. | Information      |
|----------|:-----------------:|------------------:|------------------:|-----------------|------------------------------------|
| 1  |  Wheel speed   |               26 | Inverter inlet temperature |      51       |  Pitch motor 1 power estimation          |
| 2  |  hub angle     |               27 | inverter outlet temperature             |       52       |  Pitch motor 2 power estimation         |
| 3  |  blade 1 angle  |               28 | inverter inlet pressure |       53       |  Pitch motor 3 power estimation          |
| 4  |  blade 2 angle        |               29 | inverter outlet pressure             |    54          |   Fan current status value         |
| 5  |  blade 3 angle        |               30 | generator power limit value |     55        |     hub current status value       |
| 6  |  pitch motor 1 current        |               31 | reactive power set value             |     56         |   yaw state value         |
| 7  |  pitch motor 2 current        |               32 | Rated hub speed |      57        |    yaw request value        |
| 8  |  Pitch motor 3 current        |               33 | wind tower ambient temperature       |        58      |   blade 1 battery box temperature         |
| 9  |  overspeed sensor speed detection value |               34 | generator stator temperature 1 |        59      |  blade 2 battery box temperature          |
| 10 |  5 second yaw against wind average      |               35 | generator stator temperature 2             |      60        |   blade 3 battery box temperature         |
| 11 |  x direction vibration value   |               36 | generator stator temperature 3 |      61       |   vane 1 pitch motor temperature         |
| 12 |  y direction vibration value   |               37 | generator stator temperature 4             |      62        |  blade 2 pitch motor temperature          |
| 13 |  hydraulic brake pressure      |               38 | generator stator temperature 5 |      63        |     blade 3 pitch motor temperature       |
| 14 |  Aircraft weather station wind speed      |               39 | generator stator temperature 6             |      64        |    blade 1 inverter box temperature        |
| 15 |  wind direction absolute value        |               40 | generator air temperature 1 |      65        |    blade 2 inverter box temperature        |
| 16 |  atmospheric pressure        |               41 | generator air temperature 2             |      66        |    blade 3 inverter box temperature        |
| 17 |  reactive power control status        |               42 | main bearing temperature 1 |       67       |   blade 1 super capacitor voltage         |
| 18 |  inverter grid side current        |               43 | main bearing temperature 2             |      68        |    blade 2 super capacitor voltage        |
| 19 |  inverter grid side voltage        |               44 | Wheel temperature |      69        |    blade 3 super capacitor voltage        |
| 20 |  Inverter grid side active power        |               45 | Wheel control cabinet temperature             |      70       |   drive 1 thyristor temperature         |
| 21 |  inverter grid side reactive power        |               46 | Cabin temperature |      71       |   Drive 2 thyristor temperature         |
| 22 |  inverter generator side power        |               47 | Cabin control cabinet temperature             |      72        |   Drive 3 thyristor temperature         |
| 23 |  generator operating frequency        |               48 | Inverter INU temperature|      73        |  Drive 1 output torque          |
| 24 |  generator current        |               49 | Inverter ISU temperature             |      74        |    Drive 2 output torque        |
| 25 |  generator torque        |               50 | Inverter INU RMIO temperature             |      75        |     Drive 3 output torque       |

## Data Description
Each training sample is stored in its own CSV file. Each CSV file contains sensor readings sampled over 4500 minutes. The label of each CSV file indicates whether the wind turbine malfunctions within the following week. The labels are listed in the file train_label.csv, which has the following format:

| ID                                         | Label | 
|----------|:-----------------|
| 01725e06-98ea-3447-83c0-b3aa70feff62.csv   |       0        |   
| 02c2cada-dbbe-304b-95b2-076ddba766c9.csv   |        1       |     

**0**: The corresponding device did not fail within the following week

**1**: The corresponding device failed within the following week
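
Before modelling, it can help to check how the two classes are balanced. The sketch below builds a tiny in-memory stand-in for `train_label.csv` from the two sample rows shown above and summarises it. The header names `ID` and `Label` are assumed from the table; the analysis code later in this notebook reads the columns as `file_name` and `ret`, so the real header may differ.

```python
import io

import pandas as pd

# Stand-in for data/train_label.csv, using the two sample rows from the table.
# The header names "ID" and "Label" are an assumption taken from that table.
csv_text = """ID,Label
01725e06-98ea-3447-83c0-b3aa70feff62.csv,0
02c2cada-dbbe-304b-95b2-076ddba766c9.csv,1
"""
labels = pd.read_csv(io.StringIO(csv_text))

# Absolute and relative frequency of each class (0 = no failure, 1 = failure).
class_counts = labels["Label"].value_counts().sort_index()
class_share = class_counts / class_counts.sum()
print(class_counts.to_dict())
print(class_share.to_dict())
```

On the real label file, a strongly skewed share would suggest using a metric such as the F1 score rather than plain accuracy.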

# Import

In [1]:
import warnings
warnings.filterwarnings("ignore")
import os
import pandas as pd

from tqdm import tqdm
import numpy as np
from sklearn.metrics import f1_score
from sklearn.neighbors import LocalOutlierFactor

# Parameter optimization
from skopt.space import Integer, Real, Categorical, Identity
from skopt.utils import use_named_args
from skopt import gp_minimize
from skopt.plots import plot_convergence

# Model
from sklearn import svm
from sklearn.model_selection import cross_val_score

import matplotlib.pyplot as plt

import seaborn as sns

from operator import itemgetter
import itertools
import multiprocessing
from joblib import Parallel, delayed

from collections import Counter

# Global Variables

In [2]:
TRAIN_LABEL_PATH = os.path.abspath("data/train_label.csv")
TEST_LABEL_PATH = os.path.abspath("data/test_label.csv")
TRAIN_PATH = os.path.abspath("data/train/")
TEST_PATH = os.path.abspath("data/test/")

In [3]:
train_label = pd.read_csv(TRAIN_LABEL_PATH)

folder_list = os.listdir(TRAIN_PATH)

In [4]:
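def test_duplicate(df: pd.DataFrame):
    """Find pairs of identical sensor columns and columns that contain only zeros.

    Sensors are identified by their 1-based column position, zero-padded to two digits.
    """
    sensors = df.columns
    dup_sensors = []         # position pairs such as "03_04"
    dup_sensors_names = []   # the same pairs as (column name, column name)
    zero_value_sensors = []  # positions of columns whose values are all 0
    for i in range(sensors.size):
        # Compare element-wise against 0; a plain sum would also be 0
        # when positive and negative readings cancel out.
        if (df[sensors[i]] == 0).all():
            zero_value_sensors.append(str(i + 1).zfill(2))
        for j in range(i + 1, sensors.size):
            # Two columns are duplicates if every element-wise difference is 0.
            res = df[sensors[i]] - df[sensors[j]]
            u = np.unique(res)
            if (len(u) == 1) and (u[0] == 0):
                dup_sensors.append(str(i + 1).zfill(2) + "_" + str(j + 1).zfill(2))
                dup_sensors_names.append((sensors[i], sensors[j]))
    return dup_sensors, dup_sensors_names, zero_value_sensors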
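
To illustrate what this check looks for, here is a minimal, self-contained sketch on a toy frame. The helpers `find_identical_columns` and `find_all_zero_columns` are hypothetical names introduced only for this example and are not part of the notebook; they use `Series.equals` instead of the subtraction above, which additionally requires matching dtypes and treats NaNs in the same position as equal.

```python
from itertools import combinations

import pandas as pd


def find_identical_columns(df: pd.DataFrame) -> list:
    """Return all pairs of column names whose values are equal in every row."""
    return [(a, b) for a, b in combinations(df.columns, 2) if df[a].equals(df[b])]


def find_all_zero_columns(df: pd.DataFrame) -> list:
    """Return the names of the columns that contain only zeros."""
    return [c for c in df.columns if (df[c] == 0).all()]


# Toy data: two blade angles that never differ and one torque signal stuck at 0.
toy = pd.DataFrame({
    "blade 1 angle": [21.0, 21.0, 21.0],
    "blade 2 angle": [21.0, 21.0, 21.0],
    "hub angle": [47.74, 159.52, 241.24],
    "Drive 1 output torque": [0.0, 0.0, 0.0],
})

identical_pairs = find_identical_columns(toy)
zero_columns = find_all_zero_columns(toy)
print(identical_pairs)
print(zero_columns)
```

Columns that are identical or constantly zero carry no additional information and are candidates for removal before training.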


In [5]:
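def search_outlier(df: pd.DataFrame, dev_factor: float = 3.):
    """For each column, return the row positions that lie more than
    dev_factor standard deviations away from the column mean."""
    sensors = df.columns
    ol_list = []
    for i in range(sensors.size):
        data = df[sensors[i]]
        # Boolean mask of samples outside mean ± dev_factor * std
        is_outlier = (data - data.mean()).abs() > (dev_factor * data.std())
        # Row positions (not index labels) of the flagged samples
        ol = np.flatnonzero(is_outlier.to_numpy()).tolist()
        ol_list.append(ol)
    return ol_list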
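
The rule applied here is a simple standard-deviation threshold. The following sketch runs the same computation on a single made-up signal (20 ordinary samples followed by one spike) so that the effect of `dev_factor` is easy to see; the data is purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical signal: 20 ordinary samples and one spike at position 20.
signal = pd.Series([1.0] * 20 + [100.0])

dev_factor = 3.0
deviation = (signal - signal.mean()).abs()
threshold = dev_factor * signal.std()  # pandas uses the sample std (ddof=1)

# Positions whose distance from the mean exceeds the threshold.
outlier_positions = np.flatnonzero((deviation > threshold).to_numpy()).tolist()
print(outlier_positions)
```

Because the mean and standard deviation are computed from the same data, a large spike also inflates the threshold, so this rule tends to be conservative on short or heavily contaminated signals.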

## Testing with a Single File

In [6]:
Data_frame = pd.read_csv("data/train/004/0000f25f-4d58-3eee-bbc3-5c7b7759ee66.csv")
sensors = Data_frame.columns
Data_frame.head()

Unnamed: 0,Wheel speed,hub angle,blade 1 angle,blade 2 angle,blade 3 angle,pitch motor 1 current,pitch motor 2 current,Pitch motor 3 current,overspeed sensor speed detection value,5 second yaw against wind average,...,blade 3 inverter box temperature,blade 1 super capacitor voltage,blade 2 super capacitor voltage,blade 3 super capacitor voltage,drive 1 thyristor temperature,Drive 2 thyristor temperature,Drive 3 thyristor temperature,Drive 1 output torque,Drive 2 output torque,Drive 3 output torque
0,0.14,47.74,21.0,21.0,21.0,2.12,2.16,3.08,0.16,-59.4,...,300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.08,159.52,21.0,21.0,21.0,2.62,1.7,3.14,1.12,-69.8,...,300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.3,241.24,21.0,21.0,21.0,2.26,1.5,3.02,1.35,31.4,...,300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.68,355.5,21.0,21.0,21.0,2.54,2.42,3.22,0.74,-1.8,...,300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.51,155.02,21.0,21.0,21.0,2.44,1.58,2.92,0.52,-41.2,...,300.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [7]:
ol = search_outlier(Data_frame)

outlier = list(itertools.chain.from_iterable(ol))
counts = Counter(outlier)
df = pd.DataFrame.from_dict(counts, orient="index",columns=["Anzahl"])
df.index.name = "Zeitpunkt"  # index.name is an attribute and must be assigned, not called
df.plot(kind='bar',figsize=(25,10),fontsize=35)
plt.show()
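
The cell above counts, for every point in time, how many sensors flag that sample as an outlier. A minimal sketch of the same counting step on made-up input is shown below; `per_sensor_outliers` is hypothetical data in the shape returned by `search_outlier` (one list of row positions per sensor).

```python
import itertools
from collections import Counter

import pandas as pd

# Hypothetical outlier positions for four sensors.
per_sensor_outliers = [[3, 7], [7], [], [3, 7, 9]]

# Flatten the nested lists and count how often each position occurs.
flat = list(itertools.chain.from_iterable(per_sensor_outliers))
counts = Counter(flat)

summary = pd.DataFrame.from_dict(counts, orient="index", columns=["Anzahl"])
summary.index.name = "Zeitpunkt"  # attribute assignment, not a method call
summary = summary.sort_index()
print(summary)
```

Positions flagged by many sensors at once are more likely to reflect a real event on the device than a glitch in a single sensor.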

In [None]:
ds, dn, zs = test_duplicate(Data_frame)
print("Duplicates: ")
print(ds)
print("Sensors with only 0 as value: ")
print(zs)

Duplicates: 
['03_04', '03_05', '04_05', '16_17', '18_20', '18_21', '18_22', '18_25', '18_31', '18_51', '18_52', '18_53', '18_67', '18_68', '18_69', '18_70', '18_71', '18_72', '18_73', '18_74', '18_75', '20_21', '20_22', '20_25', '20_31', '20_51', '20_52', '20_53', '20_67', '20_68', '20_69', '20_70', '20_71', '20_72', '20_73', '20_74', '20_75', '21_22', '21_25', '21_31', '21_51', '21_52', '21_53', '21_67', '21_68', '21_69', '21_70', '21_71', '21_72', '21_73', '21_74', '21_75', '22_25', '22_31', '22_51', '22_52', '22_53', '22_67', '22_68', '22_69', '22_70', '22_71', '22_72', '22_73', '22_74', '22_75', '25_31', '25_51', '25_52', '25_53', '25_67', '25_68', '25_69', '25_70', '25_71', '25_72', '25_73', '25_74', '25_75', '31_51', '31_52', '31_53', '31_67', '31_68', '31_69', '31_70', '31_71', '31_72', '31_73', '31_74', '31_75', '51_52', '51_53', '51_67', '51_68', '51_69', '51_70', '51_71', '51_72', '51_73', '51_74', '51_75', '52_53', '52_67', '52_68', '52_69', '52_70', '52_71', '52_72', '52_7

## Complete Training Dataset

In [None]:
analyse_list = []
num_cores = multiprocessing.cpu_count()

def analyse_file(folder_path, file_name):
    df = pd.read_csv(os.path.join(folder_path,file_name))
    ol = search_outlier(df)
    dup, dup_n, zvs = test_duplicate(df)
    label = train_label.loc[train_label["file_name"]==file_name]["ret"].values[0]
    return file_name, ol, dup, dup_n, zvs, label

def analyse_folder(folder_path):
    files = os.listdir(folder_path)
    processed_list = Parallel(n_jobs=num_cores)(delayed(analyse_file)(folder_path,f) for f in files)
    dup_list = [ele[2] for ele in processed_list]
    return set(itertools.chain.from_iterable(dup_list)), dup_list, processed_list

for folder in tqdm(folder_list,desc="data"):
    folder_path = os.path.join(TRAIN_PATH, folder)
    dup_set, dup_list, file_list = analyse_folder(folder_path)
    # Keep only the sensor pairs that are duplicated in every file of the folder
    for d in range(len(dup_list)):
        dup_set = dup_set.intersection(dup_list[d])
    # Sort the pair IDs lexicographically
    dup_filtered = sorted(dup_set)
    analyse_list.append((folder,file_list, dup_filtered))

data: 100%|██████████| 17/17 [20:12<00:00, 71.34s/it]
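
The per-folder loop reduces the duplicate pairs of all files to those that occur in every file. The core of that reduction can be sketched in a few lines; the pair lists below are made up for illustration.

```python
# Hypothetical duplicate-pair lists for three files in one folder.
dup_lists = [
    ["03_04", "03_05", "04_05", "16_17"],
    ["03_04", "03_05", "04_05"],
    ["03_04", "04_05", "18_20"],
]

# Sensor pairs that are identical in every file of the folder.
common_pairs = sorted(set.intersection(*(set(d) for d in dup_lists)))
print(common_pairs)
```

Note that `set.intersection(*...)` needs at least one set, so an empty folder would have to be handled separately.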


# Evaluation

## Sensors with Only Zero Values

In [None]:
%matplotlib inline

for item in analyse_list:  
    print(item[0])
    zl0 = []
    zl1 = []
    for fl in item[1]:
        #print(fl[4])
        if fl[5] == 0:
            zl0.extend(fl[4])
        else:
            zl1.extend(fl[4])
    zl0.sort()
    zl1.sort()
    counts0 = Counter(zl0)
    df0 = pd.DataFrame.from_dict(counts0, orient="index",columns=["Label_0"])
    counts1 = Counter(zl1)
    df1 = pd.DataFrame.from_dict(counts1, orient="index",columns=["Label_1"])
    df = pd.merge(df0, df1, how="outer", left_index=True, right_index=True)
    df.index.name = "Sensor Nummer"
    #cutoff = len(item[1]) * 0.75
    #df = df[df["Anzahl"] > cutoff]
    #print(df.index.values)
    #print(df.head())
    df.plot(kind='bar',figsize=(50,15),fontsize=35)
    plt.show()

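
The comparison above relies on an outer merge of two count tables, one per label. The sketch below shows that step on made-up sensor numbers. Filling the gaps with 0 is an addition made here for readability and is not done in the cell above, where missing combinations stay NaN.

```python
from collections import Counter

import pandas as pd

# Hypothetical all-zero sensors collected from files with label 0 and label 1.
zero_sensors_label0 = ["18", "20", "20", "67"]
zero_sensors_label1 = ["20", "67", "67", "70"]

df0 = pd.DataFrame.from_dict(Counter(zero_sensors_label0), orient="index", columns=["Label_0"])
df1 = pd.DataFrame.from_dict(Counter(zero_sensors_label1), orient="index", columns=["Label_1"])

# The outer merge keeps sensors that appear in only one of the two classes.
merged = pd.merge(df0, df1, how="outer", left_index=True, right_index=True)
merged = merged.fillna(0).astype(int).sort_index()
merged.index.name = "Sensor Nummer"
print(merged)
```

Sensors whose counts differ strongly between the two columns are the interesting ones, since they may help separate failing from healthy devices.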

## Identical Values Between Sensors

In [None]:
%matplotlib inline

for item in analyse_list:  
    print(item[0])
    dl0 = []
    dl1 = []
    c0 = 0
    c1 = 0
    for fl in item[1]:
        #print(fl[4])
        if fl[5] == 0:
            dl0.extend(fl[2])
            c0 += 1
        else:
            dl1.extend(fl[2])
            c1 += 1
    #dl = [fl[2] for fl in item[1]]
    #dl0 = list(itertools.chain.from_iterable(dl0))   
    counts0 = Counter(dl0)
    df0 = pd.DataFrame.from_dict(counts0, orient="index",columns=["Label_0"])
    #dl1 = list(itertools.chain.from_iterable(dl1))
    counts1 = Counter(dl1)
    df1 = pd.DataFrame.from_dict(counts1, orient="index",columns=["Label_1"])
    df = pd.merge(df0, df1, how="outer", left_index=True, right_index=True)
    df.index.name = "Sensor Nummer-Kombination"
    cutoff0 = c0 * 0.75
    cutoff1 = c1 * 0.75
    con0 = df["Label_0"] > cutoff0
    con1 = df["Label_1"] > cutoff1
    df = df[con0 | con1]
    print(df.index.values)
    #print(df.head())
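    df.plot(kind='bar',figsize=(150,30),fontsize=35)
    # Make sure the output directory exists before saving the figure
    os.makedirs(os.path.join("data", "plots"), exist_ok=True)
    save_path = os.path.join("data", "plots", item[0])
    plt.title(item[0])
    plt.savefig(save_path)
    plt.show()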
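
The filter in this cell keeps a sensor pair if it is duplicated in more than 75% of the files of at least one class. A small sketch with made-up counts shows how the two cutoffs interact, including the NaN that the outer merge produces for a pair missing in one class.

```python
import pandas as pd

# Hypothetical number of files per label and duplicate counts per sensor pair.
n_files_label0, n_files_label1 = 8, 4
counts = pd.DataFrame(
    {"Label_0": [8.0, 2.0, float("nan")], "Label_1": [1.0, 4.0, 2.0]},
    index=["03_04", "18_20", "51_52"],
)

# Keep a pair if it exceeds the 75% cutoff in either class.
# Comparisons against NaN evaluate to False, so missing counts never pass.
keep = (counts["Label_0"] > 0.75 * n_files_label0) | (counts["Label_1"] > 0.75 * n_files_label1)
frequent = counts[keep]
print(frequent.index.tolist())
```

Using a separate cutoff per class matters when the classes differ in size, because a single cutoff would hide pairs that are frequent only in the smaller class.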

# Evaluation Plots Without Splitting by Label

In [None]:
%matplotlib inline

for item in analyse_list:  
    print(item[0])
    zl = [fl[4] for fl in item[1]]
    zl = list(itertools.chain.from_iterable(zl))
    zl.sort()
    counts = Counter(zl)
    df = pd.DataFrame.from_dict(counts, orient="index",columns=["Anzahl"])
    df.index.name = "Sensor Nummer"
    #cutoff = len(item[1]) * 0.75
    #df = df[df["Anzahl"] > cutoff]
    #print(df.index.values)
    #print(df.head())
    df.plot(kind='bar',figsize=(50,15),fontsize=35)
    plt.show()

In [None]:
%matplotlib inline

for item in analyse_list:  
    print(item[0])
    dl = [fl[2] for fl in item[1]]
    dl = list(itertools.chain.from_iterable(dl))
    counts = Counter(dl)
    df = pd.DataFrame.from_dict(counts, orient="index",columns=["Anzahl"])
    df.index.name = "Sensor Nummer-Kombination"
    cutoff = len(item[1]) * 0.75
    df = df[df["Anzahl"] > cutoff]
    print(df.index.values)
    #print(df.head())
    df.plot(kind='bar',figsize=(50,15),fontsize=35)
    plt.show()

    