# Overview
This notebook is a user guide for the `wtphm` package, with worked examples.

More will be added to it.

In [2]:
cd C:\Users\leahy\Google Drive\UCC\PhD\Code\modules\wtphm

C:\Users\leahy\Google Drive\UCC\PhD\Code\modules\wtphm


# Importing Libraries and Data
The data used in this tutorial is available in the examples folder of this repo. It is 2 months of real data for 2 turbines, but has been fully anonymised. For the ``events``, all codes have been mapped to a random set of numbers, and descriptions have been removed. For the ``scada`` data, all values (except the availability counters) have been normalised between 0 and 1.
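
The normalisation of the ``scada`` values can be pictured as a min-max scaling. The snippet below is only an illustration of that idea: the helper `min_max_normalise`, the column values and the handling of constant columns are all assumptions, and it is not the code that was used to anonymise the example data.

```python
import pandas as pd


def min_max_normalise(df, cols):
    """Return a copy of df with each column in cols scaled to [0, 1]."""
    out = df.copy()
    for col in cols:
        lo, hi = out[col].min(), out[col].max()
        # a constant column would divide by zero, so map it to 0 instead
        out[col] = (out[col] - lo) / (hi - lo) if hi > lo else 0.0
    return out


# made-up raw values, purely for illustration
raw = pd.DataFrame({'wind_speed': [2.0, 6.0, 10.0],
                    'kw': [0.0, 500.0, 2000.0]})
normalised = min_max_normalise(raw, ['wind_speed', 'kw'])
```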

In [4]:
import wtphm
import pandas as pd

# to fully display the dataframes:
pd.set_option('display.max_colwidth', 300)

events = pd.read_csv('examples/event_data.csv',
                     parse_dates=['time_on', 'time_off'])
events.duration = pd.to_timedelta(events.duration)
scada  = pd.read_csv('examples/scada_data.csv',
                     parse_dates=['time'])
events.head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
0,22,9,2015-11-01 00:03:56,2015-11-01 00:23:56,0 days 00:20:00,ok,description anonymised
1,21,93,2015-11-01 00:09:54,2015-11-01 00:10:56,0 days 00:01:02,ok,description anonymised
2,21,97,2015-11-01 00:10:56,2015-11-01 00:37:39,0 days 00:26:43,ok,description anonymised
3,22,165,2015-11-01 00:16:39,2015-11-06 05:03:35,5 days 04:46:56,ok,description anonymised
4,22,93,2015-11-01 00:23:56,2015-11-01 00:24:58,0 days 00:01:02,ok,description anonymised


In [5]:
scada.head()

Unnamed: 0,time,turbine_num,wind_speed,kw,wind_speed_sd,wind_speed_max,torque_actual_value,blade_1_actual_angle,blade_2_actual_angle,blade_3_actual_angle,...,sot,dt,lot,wot,est,mt,rt,eect,num_48h,dur_48h
0,2015-11-01 00:00:00,22,0.148473,0.009655,0.064693,0.110283,0.025785,0.458179,0.458418,0.036115,...,600.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,13
1,2015-11-01 00:10:00,22,0.125081,0.004962,0.066886,0.084016,0.020163,0.466428,0.465187,0.050519,...,600.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,13
2,2015-11-01 00:20:00,22,0.121183,0.004913,0.060307,0.086624,0.020841,0.473221,0.470761,0.062381,...,600.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,13
3,2015-11-01 00:30:00,22,0.137752,0.004454,0.067982,0.104322,0.020841,0.62892,0.598517,0.334251,...,600.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,13
4,2015-11-01 00:40:00,22,0.17154,0.040889,0.066886,0.113077,0.075126,0.460969,0.460708,0.040987,...,600.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,13


# Grouping together similar events

The `batch.get_grouped_event_data` function groups together similar faults or events and relabels them with a single shared code, so that they are treated as the "same" event.

An example would be faults across different pitch motors on different turbine blades being grouped as the same type of fault. This is useful, as there are typically _very_ few fault samples on wind turbines, so treating these as three separate types of faults would give even fewer samples for each class.
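
The idea of collapsing a group of codes onto one can be sketched with plain pandas. This is not the `wtphm` implementation: the toy dataframe is made up, and the choice to map every member of a group onto the first code in that group is an assumption made for illustration.

```python
import pandas as pd

# hypothetical toy events; the codes mirror two of the pitch groups used here
toy_events = pd.DataFrame({
    'turbine_num': [22, 22, 22, 22],
    'code': [502, 601, 600, 9],
})
code_groups = [[500, 501, 502], [600, 601]]

# assumption: every member of a group is mapped to the group's first code
code_map = {code: group[0] for group in code_groups for code in group}

grouped = toy_events.copy()
# codes that are not in any group are left unchanged
grouped['code'] = grouped['code'].map(lambda c: code_map.get(c, c))
```

After this, the three axis-specific codes 500, 501 and 502 all count towards one class, which is the point of grouping when fault samples are scarce.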

In [6]:
# codes that cause the turbine to come to a stop
stop_codes = events[
    (events.stop_cat.isin(['maintenance', 'test', 'sensor', 'grid'])) |
    (events.stop_cat.str.contains('fault'))].code.unique()
# each of these lists represents a set of pitch-related events, where each
# member of the set represents the same event but along a different blade axis
pitch_code_groups = [[300, 301, 302], [400, 401], [500, 501, 502], [600, 601],
                     [700, 701, 702]]
events[events.code.isin([i for s in pitch_code_groups for i in s])].head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
112,22,502,2015-11-01 21:04:26,2015-11-01 21:04:36,00:00:10,fault_pt,description anonymised pitch axis 3
114,22,601,2015-11-01 21:04:28,2015-11-01 21:04:36,00:00:08,fault_pt,description anonymised pitch axis 2
119,22,601,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 2
131,22,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1
132,22,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1


As can be seen, the events data has a number of different codes for data along different pitch axes.

Below, we group these together as the same code:

In [7]:
# group the data
grouped_events, grouped_stop_codes = wtphm.batch.get_grouped_event_data(
    event_data=events, code_groups=pitch_code_groups,
    fault_codes=stop_codes)

grouped_events[grouped_events.code.isin(
    [i for s in pitch_code_groups for i in s])].head()

Unnamed: 0,turbine_num,code,time_on,time_off,duration,stop_cat,description
112,22,500,2015-11-01 21:04:26,2015-11-01 21:04:36,00:00:10,fault_pt,description anonymised pitch axis 1/2/3 (original codes 500/501/502)
114,22,600,2015-11-01 21:04:28,2015-11-01 21:04:36,00:00:08,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
136,22,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
135,22,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)
134,22,600,2015-11-01 21:04:36,2015-11-01 21:04:36,00:00:00,fault_pt,description anonymised pitch axis 1/2 (original codes 600/601)


# Identifying events associated with a stoppage

Now, we get the batches. These represent groups of events all linked to the same down-time event, as described in [1]. As always, for more information, view the function documentation.
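
To give a feel for what a batch is, here is a much simplified sketch for a single turbine. It assumes that a batch opens on the first event whose code is in `fault_codes`, closes on the next `ok_code` event, and that batches separated by less than `t_sep_lim` are merged into one. That reading of the parameters is an assumption; the function `sketch_batches` and the toy data are hypothetical, and the library's actual logic is the one described in [1] and in the function documentation.

```python
import pandas as pd


def sketch_batches(events, fault_codes, ok_code, t_sep_lim='1 hours'):
    """Group one turbine's events into (start, end) stoppage windows."""
    t_sep_lim = pd.Timedelta(t_sep_lim)
    windows = []
    start = None
    for row in events.sort_values('time_on').itertuples():
        if start is None and row.code in fault_codes:
            start = row.time_on                    # a fault event opens a batch
        elif start is not None and row.code == ok_code:
            windows.append([start, row.time_on])   # the ok event closes it
            start = None
    # merge windows that begin within t_sep_lim of the previous one ending
    merged = []
    for w in windows:
        if merged and w[0] - merged[-1][1] < t_sep_lim:
            merged[-1][1] = w[1]
        else:
            merged.append(w)
    return pd.DataFrame(merged, columns=['start_time', 'down_end_time'])


# made-up events: two stoppages 20 minutes apart, then one two days later
toy = pd.DataFrame({
    'code': [500, 144, 207, 500, 207, 53, 207],
    'time_on': pd.to_datetime([
        '2015-11-01 21:04', '2015-11-01 21:05', '2015-11-01 21:20',
        '2015-11-01 21:40', '2015-11-01 21:50',
        '2015-11-03 10:00', '2015-11-03 10:05']),
})
toy_batches = sketch_batches(toy, fault_codes={500, 144, 53}, ok_code=207)
```

With these toy events the first two stoppages are merged because they are less than an hour apart, so two batches are returned.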

In [8]:
# create the batches
batches = wtphm.batch.get_batch_data(
    event_data=grouped_events, fault_codes=grouped_stop_codes, ok_code=207,
    t_sep_lim='1 hours')

batches.head()

Unnamed: 0,turbine_num,fault_root_codes,all_root_codes,start_time,fault_end_time,down_end_time,fault_dur,down_dur,fault_event_ids,all_event_ids
0,22,"(144, 500)","(144, 500)",2015-11-01 21:04:26,2015-11-01 21:17:26,2015-11-01 21:21:32,00:13:00,00:17:06,"Int64Index([112, 113, 114, 116, 115, 117, 127, 122, 120, 119, 126, 130, 128,  133, 132, 123, 134, 135, 136, 129, 131, 137, 138, 139, 142, 143,  144, 145, 146, 151, 150, 148, 149, 155, 154, 153, 161, 162, 166,  168, 170, 171, 172],  dtype='int64')","Int64Index([112, 113, 114, 116, 115, 117, 118, 129, 136, 135, 134, 133, 132,  131, 128, 130, 126, 119, 120, 121, 122, 127, 124, 125, 123, 137,  138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 152, 151, 150,  148, 149, 153, 154, 155, 156, 157, 158, 159, 160, 161..."
1,22,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-05 14:24:09,2015-11-05 14:25:28,2015-11-05 14:46:18,00:01:19,00:22:09,"Int64Index([456, 457, 459, 458, 460, 461, 462, 467, 470, 464, 465, 466, 469,  468, 472, 473, 474, 475, 476, 477, 478, 479, 482, 484, 481],  dtype='int64')","Int64Index([459, 458, 457, 456, 460, 461, 462, 463, 476, 475, 474, 473, 472,  471, 468, 469, 467, 466, 465, 464, 470, 477, 478, 479, 481, 482,  483, 484, 486, 487, 488, 489, 491, 490],  dtype='int64')"
2,22,"(53,)","(53,)",2015-11-06 20:32:16,2015-11-06 20:32:16,2015-11-06 20:32:21,00:00:00,00:00:05,"Int64Index([545], dtype='int64')","Int64Index([545, 547, 546], dtype='int64')"
3,22,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-07 10:01:49,2015-11-07 10:03:03,2015-11-07 10:07:08,00:01:14,00:05:19,"Int64Index([660, 661, 658, 659, 664, 663, 662, 675, 674, 673, 672, 671, 669,  668, 666, 670, 678, 676, 679, 680, 677, 683, 684, 685, 686],  dtype='int64')","Int64Index([658, 659, 660, 661, 662, 663, 664, 665, 675, 674, 673, 672, 671,  667, 669, 668, 666, 670, 680, 679, 677, 676, 678, 682, 683, 684,  685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695],  dtype='int64')"
4,22,"(144, 500)","(144, 500)",2015-11-13 23:37:21,2015-11-13 23:39:08,2015-11-13 23:59:31,00:01:47,00:22:10,"Int64Index([1182, 1183, 1185, 1184, 1189, 1186, 1187, 1190, 1192, 1193, 1194,  1195, 1196, 1197, 1198, 1199, 1200, 1201, 1202, 1203, 1205, 1206,  1204],  dtype='int64')","Int64Index([1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1203,  1201, 1200, 1199, 1198, 1202, 1196, 1195, 1194, 1193, 1197, 1192,  1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213],  dtype='int64')"


# Assigning high-level root cause to the stoppage
The library can also assign high-level root causes to each stoppage, as shown below.

The ``get_batch_stop_cats`` function uses the ``get_root_cats``, ``get_root_cat_counts``, ``get_most_common_cats``, ``get_cat_all_ids``, ``get_cat_present_ids`` and ``get_counter_active_ids`` functions to assign stop categories.

The internal logic is identical to that described in [1].
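
Judging only by the helper names above, one step is to count the stop categories of a batch's root events and keep the most common. The sketch below shows that single step under that assumption. `sketch_batch_cat` is a hypothetical helper, and the way ties are handled here (all tied categories are returned) is a guess rather than the library's behaviour.

```python
from collections import Counter


def sketch_batch_cat(root_cats):
    """Return the most frequent stop categories among a batch's root events."""
    if not root_cats:
        return []
    counts = Counter(root_cats)
    top = max(counts.values())
    # keep every category that shares the top count, in sorted order
    return sorted(cat for cat, n in counts.items() if n == top)


single = sketch_batch_cat(['fault_pt', 'fault_pt', 'sensor'])
tied = sketch_batch_cat(['fault_pt', 'sensor'])
```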

In [10]:
batches = wtphm.batch.get_batch_stop_cats(
    batch_data=batches, event_data=events, scada_data=scada, grid_col='lot',
    maint_col='mt', rep_col='rt')

batches.head()

Unnamed: 0,turbine_num,fault_root_codes,all_root_codes,start_time,fault_end_time,down_end_time,fault_dur,down_dur,fault_event_ids,all_event_ids,batch_cat,repair
0,22,"(144, 500)","(144, 500)",2015-11-01 21:04:26,2015-11-01 21:17:26,2015-11-01 21:21:32,00:13:00,00:17:06,"Int64Index([112, 113, 114, 116, 115, 117, 127, 122, 120, 119, 126, 130, 128,  133, 132, 123, 134, 135, 136, 129, 131, 137, 138, 139, 142, 143,  144, 145, 146, 151, 150, 148, 149, 155, 154, 153, 161, 162, 166,  168, 170, 171, 172],  dtype='int64')","Int64Index([112, 113, 114, 116, 115, 117, 118, 129, 136, 135, 134, 133, 132,  131, 128, 130, 126, 119, 120, 121, 122, 127, 124, 125, 123, 137,  138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 152, 151, 150,  148, 149, 153, 154, 155, 156, 157, 158, 159, 160, 161...",fault_pt,False
1,22,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-05 14:24:09,2015-11-05 14:25:28,2015-11-05 14:46:18,00:01:19,00:22:09,"Int64Index([456, 457, 459, 458, 460, 461, 462, 467, 470, 464, 465, 466, 469,  468, 472, 473, 474, 475, 476, 477, 478, 479, 482, 484, 481],  dtype='int64')","Int64Index([459, 458, 457, 456, 460, 461, 462, 463, 476, 475, 474, 473, 472,  471, 468, 469, 467, 466, 465, 464, 470, 477, 478, 479, 481, 482,  483, 484, 486, 487, 488, 489, 491, 490],  dtype='int64')",fault_pt,False
2,22,"(53,)","(53,)",2015-11-06 20:32:16,2015-11-06 20:32:16,2015-11-06 20:32:21,00:00:00,00:00:05,"Int64Index([545], dtype='int64')","Int64Index([545, 547, 546], dtype='int64')",test,False
3,22,"(68, 113, 144, 500)","(68, 113, 144, 500)",2015-11-07 10:01:49,2015-11-07 10:03:03,2015-11-07 10:07:08,00:01:14,00:05:19,"Int64Index([660, 661, 658, 659, 664, 663, 662, 675, 674, 673, 672, 671, 669,  668, 666, 670, 678, 676, 679, 680, 677, 683, 684, 685, 686],  dtype='int64')","Int64Index([658, 659, 660, 661, 662, 663, 664, 665, 675, 674, 673, 672, 671,  667, 669, 668, 666, 670, 680, 679, 677, 676, 678, 682, 683, 684,  685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695],  dtype='int64')",fault_pt,False
4,22,"(144, 500)","(144, 500)",2015-11-13 23:37:21,2015-11-13 23:39:08,2015-11-13 23:59:31,00:01:47,00:22:10,"Int64Index([1182, 1183, 1185, 1184, 1189, 1186, 1187, 1190, 1192, 1193, 1194,  1195, 1196, 1197, 1198, 1199, 1200, 1201, 1202, 1203, 1205, 1206,  1204],  dtype='int64')","Int64Index([1182, 1183, 1184, 1185, 1186, 1187, 1188, 1189, 1190, 1191, 1203,  1201, 1200, 1199, 1198, 1202, 1196, 1195, 1194, 1193, 1197, 1192,  1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213],  dtype='int64')",fault_pt,False


# Plots

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

durations = batches[(batches.start_time >= 'nov 2015') &
                    (batches.batch_cat != 'msf') &
                    (batches.start_time <= 'apr 20 2016')].groupby(
    'batch_cat').down_dur.sum().reset_index().sort_values(by='down_dur')
durations.down_dur = durations.down_dur.apply(
    lambda x: x / np.timedelta64(1, 'h'))

# abbreviate the category names for the axis labels
durations.batch_cat = durations.batch_cat.replace({
    'test': 'no', 'azimuth': 'az', 'brake': 'bk', 'gearbox': 'gb',
    'grid': 'gd', 'generator': 'gn', 'maintenance': 'ma', 'pitch': 'pt',
    'sensor': 'sn', 'tower': 'to', 'battery': 'ba'})

sns.set(font_scale=1.2)
sns.set_style('white')

fig, ax = plt.subplots(figsize=(6,4))
sns.barplot(data=durations, x='batch_cat', y='down_dur', ax=ax,
            color=sns.color_palette()[0])
ax.set(xlabel='Stop Category', ylabel='Total Downtime (hrs)')
ax.yaxis.grid()

# Labelling the SCADA data

To label SCADA data for classification purposes, we use the `classification.scada_labelling.label_stoppages` function.

## Labelling times leading up to a fault
We can label the data leading up to stoppages as "pre-fault" data. This can be done in a "pre-fault-window", e.g. between 48 hours before the fault and 2 hours before the fault, or between one hour before the fault and as the fault happens.
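
A pre-fault window can be sketched in a few lines of pandas. The snippet below is an illustration, not the `wtphm` implementation: `sketch_label_pre_stop` and the toy data are hypothetical, and treating both ends of the window as inclusive is an assumption.

```python
import pandas as pd


def sketch_label_pre_stop(scada, stops, pre_stop_lims=('48 hours', '6 hours')):
    """Flag SCADA rows that fall inside the window before each stoppage."""
    far, near = (pd.Timedelta(lim) for lim in pre_stop_lims)
    out = scada.copy()
    out['pre_stop'] = 0
    for stop in stops.itertuples():
        in_window = (
            (out.turbine_num == stop.turbine_num) &
            (out.time >= stop.start_time - far) &
            (out.time <= stop.start_time - near))
        out.loc[in_window, 'pre_stop'] = 1
    return out


# made-up SCADA timestamps every 12 hours and a single stoppage
toy_scada = pd.DataFrame({
    'time': pd.to_datetime([
        '2015-11-01 00:00', '2015-11-01 12:00', '2015-11-02 00:00',
        '2015-11-02 12:00', '2015-11-03 00:00', '2015-11-03 12:00']),
    'turbine_num': 22,
})
toy_stops = pd.DataFrame({
    'turbine_num': [22],
    'start_time': [pd.Timestamp('2015-11-03 00:00')],
})
labelled = sketch_label_pre_stop(toy_scada, toy_stops)
```

Here the stoppage starts at 2015-11-03 00:00, so rows from 2015-11-01 00:00 up to 2015-11-02 18:00 are flagged, while the row at the stoppage itself and the one after it are not.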

In the following, we identify downtimes where the turbine was down due to pitch system faults, for two hours or more, and where repairs were carried out.

Then we label the SCADA data in the pre-fault window leading up to these faults as such (here, from 48 hours to 6 hours before each one, as set by `pre_stop_lims`). We also remove the actual entries corresponding to the faults from the data.

In [11]:
# get batches related to pitch system faults that lasted more than 2 hours and
# where the repair counter was active
pitch_batches = batches[
    (batches.down_dur >= '2 hours') & (batches.batch_cat == 'pitch') &
    (batches.repair == True)
]

scada_pre = wtphm.classification.label_stoppages(
        scada_data=scada, fault_batches=pitch_batches,
        pre_stop_lims=['48 hours', '6 hours'])

The returned SCADA data has a new column, `batch_id`, which gives the batch that each row corresponds to.
A value of -1 means that there is no corresponding fault batch.

There is also a new column, `pre_stop`, which holds the pre-fault label.

In [10]:
scada_pre.head()

Unnamed: 0,time,turbine_num,wind_speed,kw,wind_speed_sd,wind_speed_max,torque_actual_value,blade_1_actual_angle,blade_2_actual_angle,blade_3_actual_angle,...,wot,est,mt,rt,eect,num_48h,dur_48h,stoppage,batch_id,pre_stop
0,2015-11-01 00:00:00,22,0.148473,0.009655,0.064693,0.110283,0.025785,0.458179,0.458418,0.036115,...,0.0,0.0,0.0,0.0,0.0,2.0,13,0,-1,0
1,2015-11-01 00:10:00,22,0.125081,0.004962,0.066886,0.084016,0.020163,0.466428,0.465187,0.050519,...,0.0,0.0,0.0,0.0,0.0,2.0,13,0,-1,0
2,2015-11-01 00:20:00,22,0.121183,0.004913,0.060307,0.086624,0.020841,0.473221,0.470761,0.062381,...,0.0,0.0,0.0,0.0,0.0,2.0,13,0,-1,0
3,2015-11-01 00:30:00,22,0.137752,0.004454,0.067982,0.104322,0.020841,0.62892,0.598517,0.334251,...,0.0,0.0,0.0,0.0,0.0,2.0,13,0,-1,0
4,2015-11-01 00:40:00,22,0.17154,0.040889,0.066886,0.113077,0.075126,0.460969,0.460708,0.040987,...,0.0,0.0,0.0,0.0,0.0,2.0,13,0,-1,0


# References
[1] *Leahy, Kevin, et al. “A robust prescriptive framework and performance metric for diagnosing and predicting wind turbine faults based on SCADA and alarms data with case study.” Energies 11.7 (2018): 1738.*