
# Method training

Train a couple of models using different sets of predictors and pre-processing steps.

The data set consists of one year of fuel oil (FO) flow data from the engines of the cruise ship MS Birka. The system has two flow meters for each engine side, which means each flow meter serves 4 engines. The ship has legacy volume flow meters, which are logged by the machinery logging system. The volume flow meter readings agree, within a very small margin, with those of the mass flow meter, which the crew records manually once a day.


In [1]:
# Train a TPOT model

import pandas as pd
import sklearn
import time
import numpy as np
from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
#%%

t1 = time.time()
print('Loading database ...')
df = pd.read_hdf('database/all_data_comp.h5','table')
print('Time to load database:', time.time()-t1)
#%%



Loading database ...
Time to load database: 3.813807725906372


In [2]:

# Variable names.
import var_names
d = var_names.d

# Check if the variables exist in the dictionary.
# for names in d:
#     if d[names] in list(df):
#         pass
#     else:
#         print('*** VAR MISSING *** ', d[names], ' *** VAR MISSING ***')
#%%

gen = 50
cores = -1  # -1 = use all available cores

import train_model as tm
#
# def train_tpot(name,X,y,gen,cores):
#


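The helper `tm.train_tpot` is imported from a separate module and is not shown in this notebook. The block below is a hypothetical reconstruction based only on its call signature and on the log lines it prints further down; it is not the author's code. It assumes the classic TPOT API (pre-1.0) with `TPOTRegressor`, `fitted_pipeline_` and `export`. The name `train_tpot_sketch`, the `run_name` helper, the 25 % test split, the output file names and `population_size=50` (guessed from the 2550-pipeline total in the progress bars) are all assumptions.

```python
import time


def run_name(name, gen, date=None):
    """Build a run label shaped like the ones in the log: 'gen_<gen><name>_<yymmdd>'."""
    date = date or time.strftime('%y%m%d')
    return 'gen_' + str(gen) + name + '_' + date


def train_tpot_sketch(name, X, y, gen, cores, population_size=50):
    """Hypothetical stand-in for tm.train_tpot (assumed behaviour, not the original)."""
    # Heavy third-party imports are deferred so run_name() works without TPOT installed.
    import joblib
    from sklearn.model_selection import train_test_split
    from tpot import TPOTRegressor  # classic TPOT API (pre-1.0) assumed

    label = run_name(name, gen)
    print('Training with TPOT .... ', label)

    # Hold out part of the data for the score printed after the search (assumed split).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    t0 = time.time()
    tpot = TPOTRegressor(generations=gen,
                         population_size=population_size,  # assumed, see lead-in
                         n_jobs=cores,
                         verbosity=2)  # prints the per-generation CV score
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))
    print('Time to train...:', time.time() - t0)

    print('Saving the model ...')
    joblib.dump(tpot.fitted_pipeline_, label + '.pkl')  # assumed file name
    tpot.export(label + '.py')                          # assumed file name
    print(label, ' saved ... ')
    return tpot
```

With this sketch, `run_name('eng_13_exh_T_predictor_180116', 50, '180116')` gives the same label that appears in the first training log, which is the only part of the behaviour that can be checked against the notebook output.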

In [3]:

####
#### Training the first set with only one type of predictor (exh_T)
####
##
# Features and target for Eng 1/3


test_name = str('eng_13_exh_T_predictor_'+time.strftime('%y%m%d'))
features = [d['ae1_exh_T'],
          d['ae3_exh_T'],
          d['me1_exh_T'],
          d['me3_exh_T'],
          d['fo_booster_13']
          ]

print('Features and predictions for training...:\n')
for n in features:
    print('- ',d[n])

print('\nDate: ',time.strftime('%y%m%d'))
print('Time: ',time.strftime('%H:%M:%S'))

# Drop rows containing NaN, then create the training arrays:
# X holds the features for engine pair 1 and 3, y the fo_booster_13 target.

df_train = df[features].dropna()
X = np.array(df_train.drop(labels=d['fo_booster_13'],axis=1))
y = np.array(df_train[d['fo_booster_13']])

tm.train_tpot(test_name,X,y,gen,cores)

##
##
##%%
## Training next engine pair.
##
# X holds the features for engine pair 2 and 4, y the fo_booster_24 target.


test_name = str('eng24_exh_T_predictor_'+time.strftime('%y%m%d'))
features = [d['ae2_exh_T'],
          d['ae4_exh_T'],
          d['me2_exh_T'],
          d['me4_exh_T'],
          d['fo_booster_24']
          ]

print('Features and predictions for training...:\n')
for n in features:
    print('- ',d[n])
print('\nDate: ',time.strftime('%y%m%d'))
print('Time: ',time.strftime('%H:%M:%S'))


df_train = df[features].dropna()
X = np.array(df_train.drop(labels=d['fo_booster_24'],axis=1))
y = np.array(df_train[d['fo_booster_24']])

tm.train_tpot(test_name,X,y,gen,cores)

Features and predictions for training...:

-  ae1_exh_T
-  ae3_exh_T
-  me1_exh_T
-  me3_exh_T
-  fo_booster_13

Date:  180116
Time:  11:44:10
Training with TPOT ....  gen_50eng_13_exh_T_predictor_180116_180116


Optimization Progress:   4%|▍         | 100/2550 [02:44<14:37:33, 21.49s/pipeline]

Generation 1 - Current best internal CV score: 0.014959897981288467


Optimization Progress:   6%|▌         | 150/2550 [03:49<6:35:42,  9.89s/pipeline] 

Generation 2 - Current best internal CV score: 0.014959897981288467


Optimization Progress:   8%|▊         | 200/2550 [07:43<15:25:56, 23.64s/pipeline]

Generation 3 - Current best internal CV score: 0.014952865598080284


Optimization Progress:  10%|▉         | 250/2550 [12:14<20:25:46, 31.98s/pipeline]

Generation 4 - Current best internal CV score: 0.014776053422085943


Optimization Progress:  12%|█▏        | 300/2550 [14:51<20:44:31, 33.19s/pipeline]

Generation 5 - Current best internal CV score: 0.014776053422085943


Optimization Progress:  14%|█▎        | 350/2550 [19:54<24:31:16, 40.13s/pipeline]

Generation 6 - Current best internal CV score: 0.014776053422085943


Optimization Progress:  16%|█▌        | 400/2550 [25:48<23:30:22, 39.36s/pipeline]

Generation 7 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  18%|█▊        | 450/2550 [32:27<35:17:32, 60.50s/pipeline]

Generation 8 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  20%|█▉        | 500/2550 [40:00<37:38:57, 66.12s/pipeline]

Generation 9 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  22%|██▏       | 550/2550 [47:28<49:38:49, 89.36s/pipeline] 

Generation 10 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  24%|██▎       | 602/2550 [59:56<60:19:17, 111.48s/pipeline]

Generation 11 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  26%|██▌       | 653/2550 [1:09:37<62:06:02, 117.85s/pipeline]

Generation 12 - Current best internal CV score: 0.014774665579096319


Optimization Progress:  28%|██▊       | 703/2550 [1:16:00<38:22:00, 74.78s/pipeline] 

Generation 13 - Current best internal CV score: 0.014684535190116962


Optimization Progress:  30%|██▉       | 754/2550 [1:24:53<31:35:56, 63.34s/pipeline]

Generation 14 - Current best internal CV score: 0.014627763492190377


Optimization Progress:  32%|███▏      | 804/2550 [1:30:02<30:33:40, 63.01s/pipeline]

Generation 15 - Current best internal CV score: 0.014627763492190377


Optimization Progress:  34%|███▎      | 856/2550 [1:39:46<57:50:48, 122.93s/pipeline]

Generation 16 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  36%|███▌      | 906/2550 [1:45:37<35:12:16, 77.09s/pipeline] 

Generation 17 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  38%|███▊      | 957/2550 [1:53:59<18:36:37, 42.06s/pipeline]

Generation 18 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  40%|███▉      | 1008/2550 [2:04:07<16:07:52, 37.66s/pipeline]

Generation 19 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  41%|████▏     | 1058/2550 [2:10:47<16:05:45, 38.84s/pipeline]

Generation 20 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  43%|████▎     | 1108/2550 [2:15:57<14:46:06, 36.87s/pipeline]

Generation 21 - Current best internal CV score: 0.014595877919351023


Optimization Progress:  45%|████▌     | 1158/2550 [2:21:30<22:21:36, 57.83s/pipeline]

Generation 22 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  47%|████▋     | 1209/2550 [2:30:16<25:41:34, 68.97s/pipeline]

Generation 23 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  49%|████▉     | 1259/2550 [2:37:06<27:48:44, 77.56s/pipeline] 

Generation 24 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  51%|█████▏    | 1309/2550 [2:41:25<18:47:05, 54.49s/pipeline]

Generation 25 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  53%|█████▎    | 1360/2550 [2:48:39<12:32:21, 37.93s/pipeline]

Generation 26 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  55%|█████▌    | 1410/2550 [2:53:45<6:39:17, 21.02s/pipeline] 

Generation 27 - Current best internal CV score: 0.014412731007302787


Optimization Progress:  57%|█████▋    | 1460/2550 [2:59:24<7:10:19, 23.69s/pipeline] 

Generation 28 - Current best internal CV score: 0.014388477729960636


Optimization Progress:  59%|█████▉    | 1510/2550 [3:06:58<8:37:06, 29.83s/pipeline] 

Generation 29 - Current best internal CV score: 0.014388477729960636


Optimization Progress:  61%|██████    | 1560/2550 [3:12:51<6:33:52, 23.87s/pipeline]

Generation 30 - Current best internal CV score: 0.014388477729960636


Optimization Progress:  63%|██████▎   | 1613/2550 [3:20:54<8:33:57, 32.91s/pipeline] 

Generation 31 - Current best internal CV score: 0.014370071129270776


Optimization Progress:  65%|██████▌   | 1663/2550 [3:29:05<6:50:44, 27.78s/pipeline]

Generation 32 - Current best internal CV score: 0.014370071129270776


Optimization Progress:  67%|██████▋   | 1713/2550 [3:37:12<14:00:12, 60.23s/pipeline] 

Generation 33 - Current best internal CV score: 0.014370071129270776


Optimization Progress:  69%|██████▉   | 1764/2550 [3:45:18<10:18:47, 47.24s/pipeline]

Generation 34 - Current best internal CV score: 0.014361516399694763


Optimization Progress:  71%|███████   | 1814/2550 [3:52:44<5:43:14, 27.98s/pipeline] 

Generation 35 - Current best internal CV score: 0.014361516399694763


Optimization Progress:  73%|███████▎  | 1864/2550 [3:58:47<2:49:40, 14.84s/pipeline]

Generation 36 - Current best internal CV score: 0.014361516399694763


Optimization Progress:  75%|███████▌  | 1914/2550 [4:05:05<3:21:54, 19.05s/pipeline]

Generation 37 - Current best internal CV score: 0.014361516399694763


Optimization Progress:  77%|███████▋  | 1965/2550 [4:13:46<3:57:49, 24.39s/pipeline]

Generation 38 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  79%|███████▉  | 2017/2550 [4:22:42<10:35:57, 71.59s/pipeline]

Generation 39 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  81%|████████  | 2067/2550 [4:31:40<5:35:18, 41.65s/pipeline] 

Generation 40 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  83%|████████▎ | 2118/2550 [4:40:25<6:07:15, 51.01s/pipeline]

Generation 41 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  85%|████████▌ | 2170/2550 [4:48:40<3:58:11, 37.61s/pipeline]

Generation 42 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  87%|████████▋ | 2220/2550 [4:55:08<2:39:04, 28.92s/pipeline]

Generation 43 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  89%|████████▉ | 2270/2550 [5:05:20<1:45:53, 22.69s/pipeline]

Generation 44 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  91%|█████████ | 2322/2550 [5:16:40<3:57:20, 62.46s/pipeline] 

Generation 45 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  93%|█████████▎| 2372/2550 [5:24:50<2:39:26, 53.74s/pipeline]

Generation 46 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  95%|█████████▍| 2422/2550 [5:30:14<51:38, 24.21s/pipeline]  

Generation 47 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  97%|█████████▋| 2472/2550 [5:35:51<23:07, 17.78s/pipeline]

Generation 48 - Current best internal CV score: 0.014331359386087327


Optimization Progress:  99%|█████████▉| 2524/2550 [5:42:48<10:13, 23.59s/pipeline]

Generation 49 - Current best internal CV score: 0.014331359386087327



Generation 50 - Current best internal CV score: 0.014331359386087327

Best pipeline: RandomForestRegressor(ExtraTreesRegressor(PolynomialFeatures(input_matrix, degree=2, include_bias=False, interaction_only=False), bootstrap=False, max_features=0.45, min_samples_leaf=17, min_samples_split=3, n_estimators=100), bootstrap=False, max_features=0.1, min_samples_leaf=2, min_samples_split=3, n_estimators=100)
0.0140278966886
Time to train...: 20840.912600040436
Saving the model ...
gen_50eng_13_exh_T_predictor_180116_180116  saved ... 
Features and predictions for training...:

-  ae2_exh_T
-  ae4_exh_T
-  me2_exh_T
-  me4_exh_T
-  fo_booster_24

Date:  180116
Time:  17:31:31
Training with TPOT ....  gen_50eng24_exh_T_predictor_180116_180116


Optimization Progress:   4%|▍         | 100/2550 [03:17<7:10:24, 10.54s/pipeline]

Generation 1 - Current best internal CV score: 0.016272481103666768


Optimization Progress:   6%|▌         | 150/2550 [05:30<13:40:24, 20.51s/pipeline]

Generation 2 - Current best internal CV score: 0.015683845661954875


Optimization Progress:   8%|▊         | 200/2550 [08:32<13:29:04, 20.66s/pipeline]

Generation 3 - Current best internal CV score: 0.01492815305771098


Optimization Progress:  10%|▉         | 251/2550 [15:36<24:14:49, 37.97s/pipeline]

Generation 4 - Current best internal CV score: 0.01492815305771098


Optimization Progress:  12%|█▏        | 303/2550 [23:53<33:12:45, 53.21s/pipeline]

Generation 5 - Current best internal CV score: 0.014899357661953022


Optimization Progress:  14%|█▍        | 353/2550 [31:55<50:56:17, 83.47s/pipeline] 

Generation 6 - Current best internal CV score: 0.014804081206529355


Optimization Progress:  16%|█▌        | 404/2550 [40:18<56:26:25, 94.68s/pipeline] 

Generation 7 - Current best internal CV score: 0.014804081206529355


Optimization Progress:  18%|█▊        | 454/2550 [46:26<40:35:38, 69.72s/pipeline]

Generation 8 - Current best internal CV score: 0.014804081206529355


Optimization Progress:  20%|█▉        | 505/2550 [56:36<46:49:47, 82.44s/pipeline] 

Generation 9 - Current best internal CV score: 0.014801486543541959


Optimization Progress:  22%|██▏       | 555/2550 [1:03:40<39:28:09, 71.22s/pipeline]

Generation 10 - Current best internal CV score: 0.014801486543541959


Optimization Progress:  24%|██▍       | 607/2550 [1:13:56<56:20:53, 104.40s/pipeline]

Generation 11 - Current best internal CV score: 0.014763727411597757


Optimization Progress:  26%|██▌       | 659/2550 [1:25:55<57:11:22, 108.87s/pipeline]

Generation 12 - Current best internal CV score: 0.014763727411597757


Optimization Progress:  28%|██▊       | 711/2550 [1:33:21<19:57:35, 39.07s/pipeline] 

Generation 13 - Current best internal CV score: 0.014763727411597757


Optimization Progress:  30%|██▉       | 762/2550 [1:40:56<21:44:41, 43.78s/pipeline]

Generation 14 - Current best internal CV score: 0.014762297353561039


Optimization Progress:  32%|███▏      | 812/2550 [1:44:47<13:28:34, 27.91s/pipeline]

Generation 15 - Current best internal CV score: 0.014686766973617139


Optimization Progress:  34%|███▍      | 862/2550 [1:49:39<23:08:18, 49.35s/pipeline]

Generation 16 - Current best internal CV score: 0.014686766973617139


Optimization Progress:  36%|███▌      | 913/2550 [1:57:37<23:38:11, 51.98s/pipeline]

Generation 17 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  38%|███▊      | 964/2550 [2:08:41<40:23:18, 91.68s/pipeline] 

Generation 18 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  40%|███▉      | 1015/2550 [2:17:07<19:47:02, 46.40s/pipeline]

Generation 19 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  42%|████▏     | 1065/2550 [2:20:53<13:05:45, 31.75s/pipeline]

Generation 20 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  44%|████▎     | 1115/2550 [2:26:00<19:21:46, 48.58s/pipeline]

Generation 21 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  46%|████▌     | 1166/2550 [2:32:49<31:25:43, 81.75s/pipeline] 

Generation 22 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  48%|████▊     | 1217/2550 [2:41:30<22:28:01, 60.68s/pipeline]

Generation 23 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  50%|████▉     | 1268/2550 [2:48:57<9:03:51, 25.45s/pipeline] 

Generation 24 - Current best internal CV score: 0.014600557436622742


Optimization Progress:  52%|█████▏    | 1318/2550 [2:53:51<14:25:29, 42.15s/pipeline]

Generation 25 - Current best internal CV score: 0.014527514184977223


Optimization Progress:  54%|█████▎    | 1368/2550 [2:58:31<15:16:39, 46.53s/pipeline]

Generation 26 - Current best internal CV score: 0.014527514184977223


Optimization Progress:  56%|█████▌    | 1419/2550 [3:07:40<14:13:40, 45.29s/pipeline]

Generation 27 - Current best internal CV score: 0.014446339583908141


Optimization Progress:  58%|█████▊    | 1469/2550 [3:14:31<23:09:17, 77.11s/pipeline] 

Generation 28 - Current best internal CV score: 0.014446339583908141


Optimization Progress:  60%|█████▉    | 1519/2550 [3:19:50<14:02:22, 49.02s/pipeline]

Generation 29 - Current best internal CV score: 0.014446339583908141


Optimization Progress:  62%|██████▏   | 1570/2550 [3:29:20<17:31:34, 64.38s/pipeline]

Generation 30 - Current best internal CV score: 0.014443454824824693


Optimization Progress:  64%|██████▎   | 1620/2550 [3:37:53<16:41:16, 64.60s/pipeline]

Generation 31 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  66%|██████▌   | 1671/2550 [3:47:59<11:24:09, 46.70s/pipeline]

Generation 32 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  68%|██████▊   | 1722/2550 [3:55:02<10:46:24, 46.84s/pipeline]

Generation 33 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  69%|██████▉   | 1772/2550 [3:59:56<5:02:04, 23.30s/pipeline] 

Generation 34 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  71%|███████▏  | 1822/2550 [4:04:27<2:25:57, 12.03s/pipeline]

Generation 35 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  73%|███████▎  | 1872/2550 [4:10:06<3:48:01, 20.18s/pipeline]

Generation 36 - Current best internal CV score: 0.014402800949981216


Optimization Progress:  75%|███████▌  | 1923/2550 [4:18:10<4:42:08, 27.00s/pipeline]

Generation 37 - Current best internal CV score: 0.0143513374725668


Optimization Progress:  77%|███████▋  | 1973/2550 [4:23:26<8:59:16, 56.08s/pipeline] 

Generation 38 - Current best internal CV score: 0.0143513374725668


Optimization Progress:  79%|███████▉  | 2023/2550 [4:30:22<4:07:45, 28.21s/pipeline]

Generation 39 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  81%|████████▏ | 2073/2550 [4:36:15<3:40:23, 27.72s/pipeline]

Generation 40 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  83%|████████▎ | 2125/2550 [4:43:50<5:06:14, 43.23s/pipeline]

Generation 41 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  85%|████████▌ | 2175/2550 [4:53:50<10:31:06, 100.98s/pipeline]

Generation 42 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  87%|████████▋ | 2225/2550 [4:58:40<3:07:23, 34.59s/pipeline]  

Generation 43 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  89%|████████▉ | 2276/2550 [5:05:36<1:38:32, 21.58s/pipeline]

Generation 44 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  91%|█████████▏| 2328/2550 [5:16:48<1:28:36, 23.95s/pipeline]

Generation 45 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  93%|█████████▎| 2378/2550 [5:22:29<1:05:29, 22.85s/pipeline]

Generation 46 - Current best internal CV score: 0.014305626001784888


Optimization Progress:  95%|█████████▌| 2429/2550 [5:30:56<1:11:52, 35.64s/pipeline]

Generation 47 - Current best internal CV score: 0.014298457744336907


Optimization Progress:  97%|█████████▋| 2480/2550 [5:39:37<35:27, 30.40s/pipeline]  

Generation 48 - Current best internal CV score: 0.014298457744336907


Optimization Progress:  99%|█████████▉| 2532/2550 [5:50:30<25:01, 83.44s/pipeline]   

Generation 49 - Current best internal CV score: 0.01428670859466906



Generation 50 - Current best internal CV score: 0.01428670859466906

Best pipeline: RandomForestRegressor(CombineDFs(CombineDFs(MinMaxScaler(Normalizer(input_matrix, norm=l1)), input_matrix), input_matrix), bootstrap=False, max_features=0.15, min_samples_leaf=1, min_samples_split=3, n_estimators=100)
0.0133966106313
Time to train...: 21321.895904541016
Saving the model ...
gen_50eng24_exh_T_predictor_180116_180116  saved ... 
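
The progress bars above count up to 2550 pipelines per run. TPOT evaluates `population_size + generations * offspring_size` pipelines in total, so with `gen = 50` this total is consistent with a population and offspring size of 50 each. That population size is an inference from the log, not something the notebook states. The short sketch below reproduces the arithmetic and converts the two reported training times into hours and a wall-clock average per pipeline; the function name `total_evaluations` is made up for illustration.

```python
from datetime import timedelta


def total_evaluations(generations, population_size, offspring_size=None):
    """Pipelines evaluated in one TPOT run: initial population plus offspring per generation."""
    if offspring_size is None:
        offspring_size = population_size  # TPOT uses the population size by default
    return population_size + generations * offspring_size


# population_size=50 is an assumption inferred from the 2550 total in the log.
budget = total_evaluations(generations=50, population_size=50)
print(budget)  # 2550

# Training times in seconds, copied from the two "Time to train...:" lines.
runs = [('eng 1/3', 20840.912600040436), ('eng 2/4', 21321.895904541016)]
for label, seconds in runs:
    wall = timedelta(seconds=int(seconds))
    print(label, wall, round(seconds / budget, 2), 's/pipeline')
# eng 1/3 5:47:20 8.17 s/pipeline
# eng 2/4 5:55:21 8.36 s/pipeline
```

The averages are wall-clock figures with all cores in use (`cores = -1`), so they are not directly comparable to the per-pipeline rates shown in the progress bars.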
