# Overview
This notebook serves as a user guide to the `wtphm` package, with worked examples.

More will be added to it.

# Grouping together similar events

The `get_grouped_event_data` function groups together similar faults or events, and turns them into the "same" event.

An example would be faults across different pitch motors on different turbine blades being grouped as the same type of fault. This is useful because there are typically _very_ few fault samples on wind turbines, so treating these as three separate types of fault would leave even fewer samples for each class.

In [18]:
import wtphm
import pandas as pd

# to fully display the dataframes:
pd.set_option('display.max_colwidth', 100)

events = pd.read_csv('examples/events_data.csv',
                     parse_dates=['time_on', 'time_off'])
events.duration = pd.to_timedelta(events.duration)

events.head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
0,2,9,2015-11-01 00:03:56,2015-11-01 00:23:56,00:20:00,ok,description anonymised
1,15,93,2015-11-01 00:04:45,2015-11-01 00:05:47,00:01:02,ok,description anonymised
2,15,97,2015-11-01 00:05:47,2015-11-01 00:34:37,00:28:50,ok,description anonymised
3,9,93,2015-11-01 00:07:12,2015-11-01 00:08:14,00:01:02,ok,description anonymised
4,11,93,2015-11-01 00:07:46,2015-11-01 00:08:48,00:01:02,ok,description anonymised


In [19]:
events.stop_cat.unique()

array(['ok', 'sensor', 'maintenance', 'fault_pt', 'test', 'fault_fc',
       'grid', 'fault_misc', 'fault_bk', 'fault_az', 'curtailed',
       'fault_tower', 'fault_battery', 'fault_gn', 'fault_gb'],
      dtype=object)

Note that the `events` data used in the examples here is anonymised: all codes have been mapped to a random set of numbers, and descriptions have been removed.

In [28]:
# codes that cause the turbine to come to a stop
stop_codes = events[(events.stop_cat.isin(
    ['maintenance', 'test', 'sensor', 'grid'])) |
                    (events.stop_cat.str.contains('fault'))]\
    .code.unique()

# these are groups of codes, where each group represents a set of pitch-related
# events, and each member of the set represents the same event but along a
# different blade axis
pitch_code_groups = [[300, 301, 302], [400, 401], [500, 501, 502], [600, 601],
                     [700, 701, 702]]

events[events.code.isin([i for s in pitch_code_groups for i in s])].head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
946,2,502,2015-11-01 21:04:26,2015-11-01 21:04:36,00:00:10,fault_pt,description anonymised pitch axis 3
948,2,601,2015-11-01 21:04:28,2015-11-01 21:04:36,00:00:08,fault_pt,description anonymised pitch axis 2
953,2,601,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 2
965,2,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1
966,2,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1


As can be seen, the events data has a number of different codes for data along different pitch axes.

Below, we group these together as the same code:

In [29]:
# group the data
grouped_events, grouped_stop_codes = wtphm.batch.get_grouped_event_data(
    event_data=events, code_groups=pitch_code_groups,
    fault_codes=stop_codes)

grouped_events[grouped_events.code.isin(
    [i for s in pitch_code_groups for i in s])].head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
946,2,500,2015-11-01 21:04:26,2015-11-01 21:04:36,00:00:10,fault_pt,description anonymised pitch axis 1/2/3 (original codes 500/501/502)
948,2,600,2015-11-01 21:04:28,2015-11-01 21:04:36,00:00:08,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
970,2,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
969,2,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
968,2,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
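
The effect of the grouping on the `code` column can be pictured as a simple remapping. The following is a minimal plain-Python sketch of that idea, not the `wtphm` implementation. It assumes that the first code in each group is used as the representative, which matches the output above (502 becomes 500, 601 becomes 600). The helper `group_codes` and the `code_map` dictionary are hypothetical names introduced only for this illustration.

```python
# Conceptual sketch only: this is NOT how wtphm implements the grouping.
pitch_code_groups = [[300, 301, 302], [400, 401], [500, 501, 502],
                     [600, 601], [700, 701, 702]]

# Map every member of a group to the group's first code. Using the first
# code as the representative is an assumption based on the output above.
code_map = {code: group[0] for group in pitch_code_groups for code in group}


def group_codes(codes, code_map):
    """Replace each code by its group representative; others are unchanged."""
    return [code_map.get(code, code) for code in codes]


# Codes of the five pitch events shown earlier, plus one ungrouped code (93)
original = [502, 601, 601, 600, 600, 93]
grouped = group_codes(original, code_map)
print(grouped)  # [500, 600, 600, 600, 600, 93]
```

Codes that do not belong to any group, such as 93 here, pass through unchanged.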


# Identifying events associated with a stoppage

Now, we get the batches. These represent groups of events all linked to the same down-time event, as described in [1]. More information can be found in the documentation for the `batch.get_batch_data` function.

In [30]:
# create the batches
batches = wtphm.batch.get_batch_data(
    event_data=grouped_events, fault_codes=grouped_stop_codes, ok_code=207,
    t_sep_lim='1 hours')

batches.head()

Unnamed: 0,turbine_num,fault_start_codes,all_start_codes,start_time,fault_end_time,down_end_time,fault_dur,down_dur,fault_event_ids,all_event_ids
0,2,"(144, 500)","(144, 500)",2015-11-01 21:04:26,2015-11-01 21:17:26,2015-11-01 21:21:32,00:13:00,00:17:06,"Int64Index([ 947, 946, 948, 950, 949, 951, 957, 961, 956, 953, 960,  954, ...","Int64Index([ 947, 946, 948, 949, 950, 951, 952, 963, 970, 969, 968,  967, ..."
1,2,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-05 14:24:09,2015-11-05 14:25:28,2015-11-05 14:46:18,00:01:19,00:22:09,"Int64Index([2564, 2563, 2562, 2561, 2565, 2566, 2567, 2574, 2569, 2570, 2571,  2572, ...","Int64Index([2564, 2563, 2562, 2561, 2565, 2566, 2567, 2568, 2581, 2580, 2579,  2578, ..."
2,2,"(53,)","(53,)",2015-11-06 20:32:16,2015-11-06 20:32:16,2015-11-06 20:32:21,00:00:00,00:00:05,"Int64Index([2995], dtype='int64')","Int64Index([2995, 2996, 2997], dtype='int64')"
3,2,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-07 10:01:49,2015-11-07 10:03:03,2015-11-07 10:07:08,00:01:14,00:05:19,"Int64Index([3330, 3331, 3329, 3328, 3332, 3333, 3334, 3344, 3343, 3342, 3341,  3345, ...","Int64Index([3328, 3329, 3330, 3331, 3332, 3333, 3334, 3335, 3336, 3337, 3338,  3339, ..."
4,2,"(144, 500)","(144, 500)",2015-11-13 23:37:21,2015-11-13 23:39:08,2015-11-13 23:59:31,00:01:47,00:22:10,"Int64Index([5237, 5238, 5239, 5240, 5241, 5242, 5244, 5245, 5252, 5247, 5248,  5249, ...","Int64Index([5237, 5238, 5240, 5239, 5241, 5242, 5243, 5244, 5245, 5246, 5258,  5257, ..."
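
To give a feel for what a batch is, here is a minimal plain-Python sketch of one possible batching rule. It is an illustration under stated assumptions, not the `wtphm` implementation. It assumes that a batch opens at the first event with a fault code, closes at the next event with the `ok_code`, and is merged with the previous batch if it opens less than `t_sep_lim` after that batch closed. These rules are a reading of the parameter names above. The function `sketch_batches` and the example events are hypothetical.

```python
from datetime import datetime, timedelta


def sketch_batches(events, fault_codes, ok_code, t_sep_lim):
    """Group time-ordered (time_on, code) events of one turbine into batches.

    Assumed rules (illustrative only): a batch opens at a fault code, closes
    at the next ok_code, and is merged into the previous batch if it opens
    less than t_sep_lim after that batch closed. A batch that is still open
    when the events run out is dropped.
    """
    batches = []    # closed batches: {'start': ..., 'end': ..., 'codes': [...]}
    current = None  # the batch currently open, if any
    for time_on, code in events:
        if current is None:
            if code not in fault_codes:
                continue  # turbine is running, nothing to record
            if batches and time_on - batches[-1]['end'] < t_sep_lim:
                current = batches.pop()  # re-open the previous batch
            else:
                current = {'start': time_on, 'end': None, 'codes': []}
            current['codes'].append(code)
        elif code == ok_code:
            current['end'] = time_on  # turbine back to normal: close batch
            batches.append(current)
            current = None
        else:
            current['codes'].append(code)  # any event during the stoppage
    return batches


t0 = datetime(2015, 11, 1, 21, 0)
events = [
    (t0, 93),                                    # not a fault code: ignored
    (t0 + timedelta(minutes=4), 500),            # fault: batch opens
    (t0 + timedelta(minutes=5), 144),
    (t0 + timedelta(minutes=20), 207),           # ok code: batch closes
    (t0 + timedelta(minutes=30), 600),           # 10 min later: merged
    (t0 + timedelta(minutes=40), 207),
    (t0 + timedelta(hours=5), 53),               # hours later: new batch
    (t0 + timedelta(hours=5, minutes=1), 207),
]
batches_sketch = sketch_batches(events, fault_codes={53, 144, 500, 600},
                                ok_code=207, t_sep_lim=timedelta(hours=1))
for b in batches_sketch:
    print(b['start'], b['end'], b['codes'])
```

With these example events, the first two stoppages are merged into a single batch because they are only ten minutes apart, while the third forms its own batch.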


In [35]:
batches.batch_cat.unique()

array(['pitch', 'test', 'grid', 'azimuth', 'sensor', 'maintenance',
       'tower', 'fc', 'gearbox', 'battery', 'msf', 'brake', 'generator'],
      dtype=object)

# Labelling the SCADA data - times leading up to a fault

To label SCADA data for classification purposes, we use the `classification.scada_labelling.label_stoppages` function.

## Labelling times leading up to a fault
We can label the data leading up to stoppages as "pre-fault" data. This can be done in a "pre-fault-window", e.g. between 48 hours before the fault and 2 hours before the fault, or between one hour before the fault and as the fault happens.
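
The idea of a pre-fault window can be sketched in plain Python as follows. This is only an illustration of the concept, not the `wtphm` implementation. The function `label_pre_stop`, its argument names, and the choice to include the far edge of the window but exclude the near edge are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def label_pre_stop(timestamps, fault_start, window_far, window_near):
    """Return 1 for timestamps inside the pre-fault window, else 0.

    The window runs from `window_far` before the fault up to, but not
    including, `window_near` before it (the boundary handling is assumed).
    """
    lo = fault_start - window_far
    hi = fault_start - window_near
    return [1 if lo <= t < hi else 0 for t in timestamps]


fault_start = datetime(2015, 11, 5, 14, 24)
# SCADA timestamps at selected offsets (in hours) before the fault
timestamps = [fault_start - timedelta(hours=h)
              for h in (60, 48, 36, 24, 12, 2, 0)]
labels = label_pre_stop(timestamps, fault_start,
                        window_far=timedelta(hours=48),
                        window_near=timedelta(hours=2))
print(labels)  # [0, 1, 1, 1, 1, 0, 0]
```

Samples more than 48 hours before the fault, and samples within the final 2 hours, are left unlabelled in this sketch.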

In the following, we identify downtimes where the turbine was down due to pitch system faults for at least two hours, and where repairs were carried out.

Then we label the SCADA data in the window leading up to these faults, from 48 hours to 6 hours before each one, as pre-fault data. We also remove the actual entries corresponding to the faults from the data.

In [None]:
# `scada` is assumed to be a DataFrame of SCADA data loaded beforehand

# get batches related to pitch system faults that lasted at least 2 hours and
# where the repair counter was active
pitch_batches = batches[
    (batches.down_dur >= pd.Timedelta('2 hours')) &
    (batches.batch_cat == 'pitch') &
    (batches.repair == True)
]

# label the SCADA data between 48 and 6 hours before each of these stoppages
scada_pre = wtphm.classification.label_stoppages(
    scada_data=scada, fault_batches=pitch_batches,
    pre_stop_lims=['48 hours', '6 hours'])

Note that the returned SCADA data has a new column, `batch_id`, which gives the corresponding batch for each entry.
A value of -1 means that there is no corresponding fault batch.

There is also a new column, `pre_stop`.

In [None]:
scada_pre.head()

# References
[1] *Leahy, Kevin, et al. “A robust prescriptive framework and performance metric for diagnosing and predicting wind turbine faults based on SCADA and alarms data with case study.” Energies 11.7 (2018): 1738.*