<a href="https://colab.research.google.com/github/fboldt/SignalProcessing/blob/master/0.1RF_CWRU_EvaluationFramework.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The code is separated into four sections.

* In section "CWRU dataset", the CWRU Matlab files are downloaded and the acquisitions are extracted from the Matlab files into Numpy arrays. Then the data is segmented and the samples are selected by their labels using regular expressions.

* Section "Experimenter" defines the splitters mentioned in this work, i.e. GroupShuffleKFold and BySeverityKFold, and the setup of each experiment. The samples are grouped by their original labels also using regular expressions. For instance, to group samples by load, the regular expression '_\d' may be used. A list of evaluation methods is defined in this section. 

* Section "Classification Models" defines the estimators and their feature extraction methods. To instantiate a classification method with feature extraction, a Pipeline must be made. A list of classification methods is defined in this section. 

* Finally, section "Performing Experiments" executes the experiments as defined in the previous sections, showing and saving their results. It iterates over one list of classification methods and one list of evaluation methods in $r$ rounds. In this work, four classification methods were tested by three evaluation methods in four rounds, resulting in $4\times 3\times 4 = 48$ experiments. New classification or evaluation methods can be tested by adding them to their respective lists.
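The experiment count above can be checked with a short sketch. The names in the two lists are placeholders chosen for this illustration; in the notebook the lists hold estimators and evaluation setups rather than strings.

```python
from itertools import product

# Placeholder names (assumptions for this sketch, not the notebook's objects).
clf_names = ["knn", "svm", "rf", "cnn"]
eval_names = ["By Load", "By Severity", "Usual K-Fold"]
n_rounds = 4

# One experiment per (classifier, evaluation method, round) combination.
experiments = list(product(clf_names, eval_names, range(n_rounds)))
print(len(experiments))  # 48
```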

The code can be executed directly in the Colab environment when few samples and simple classification methods are tested. For the experiments presented in this work, it must be run on a local machine with a GPU and enough memory and processing capacity.
The results are presented for each round of each experiment of each classification method. The average and standard deviation over the rounds are also presented, as well as the average of the differences among the classification methods.

New feature extraction methods must receive a 3-D Numpy array and return a 2-D Numpy array. This is necessary because the same raw dataset is used both by methods that need feature extraction from the signal acquisitions, like K-NN, SVM and Random Forest, and by convolutional neural networks, which deal with 3-D arrays. The experiments presented here used just one channel of each acquisition, but newer experiments may use more channels.
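As a minimal sketch of that contract, the hypothetical extractor below (it is not part of the notebook) takes an array of shape `(n_samples, sample_size, n_channels)` and returns one of shape `(n_samples, n_features)`. The notebook's own extractors additionally derive from scikit-learn's `TransformerMixin`, which is omitted here to keep the sketch dependent on NumPy only.

```python
import numpy as np

class MeanStd:
    """Hypothetical extractor: (n_samples, sample_size, n_channels) -> (n_samples, 2 * n_channels)."""
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        X = np.asarray(X)
        # Reduce along the time axis (axis 1): one mean and one std per channel.
        return np.concatenate((X.mean(axis=1), X.std(axis=1)), axis=1)

X_demo = np.random.default_rng(0).normal(size=(5, 512, 1))  # toy data, one channel
feats = MeanStd().fit(X_demo).transform(X_demo)
print(feats.shape)  # (5, 2)
```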

New classification methods that need feature extraction can be added easily. All that is required is a Pipeline with the feature extraction method and the classifier, like those presented in the code. A new neural network architecture must be wrapped in a scikit-learn estimator, and the network must be defined in the method *fit*. It cannot be defined in the method *\_\_init\_\_* because of how the Keras library is implemented. If the network architecture is defined in the method *\_\_init\_\_*, it will remember the samples across the folds and the rounds, giving outstanding results that will never be replicated in a real scenario. It is worth highlighting that some parameters of the network, like kernel size and number of filters, should be selected by a GridSearchCV method to provide a fairer comparison with other methods.

#CWRU database

In [1]:
debug = False

## CWRU files.

Associate each Matlab file name to a bearing condition in a Python dictionary.
The dictionary keys identify the conditions.

There are only four normal conditions, with loads of 0, 1, 2 and 3 hp.
All conditions end with an underscore character followed by a digit representing the load applied during the acquisitions.
The remaining conditions follow the pattern:


* First two characters represent the bearing location, i.e. drive end (DE) and fan end (FE).
* The following one or two characters represent the failure location in the bearing, i.e. ball (B), inner race (IR) and outer race (OR).
* The next three digits indicate the severity of the failure, where 007 stands for 0.007 inches and 021 for 0.021 inches.
* For Outer Race failures, the character @ is followed by a number that indicates different load zones. 
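The naming pattern above can be made concrete by parsing a key back into its parts. The regular expression and the function `parse_condition` are written only for this illustration and are not used anywhere in the notebook.

```python
import re

# Pattern written for this sketch; the group names are assumptions.
KEY_PATTERN = re.compile(
    r'^(?:Normal|(?P<end>DE|FE)(?P<fault>IR|OR|B)\.(?P<severity>\d{3})(?:@(?P<zone>\d{1,2}))?)'
    r'_(?P<load>\d)$')

def parse_condition(key):
    """Splits a condition key into bearing end, fault, severity, load zone and load."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected condition key: " + key)
    return match.groupdict()

print(parse_condition("DEOR.007@6_0"))
print(parse_condition("Normal_3"))
```

For normal conditions every field except the load is `None`, since those keys carry no fault information.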

In [2]:
def cwru_12khz():
  '''
  Returns a dictionary with the names of all Matlab files read in 12kHz located in
  http://csegroups.case.edu/sites/default/files/bearingdatacenter/files/Datafiles/.
  The dictionary keys represent the bearing condition.
  '''
  matlab_files_name = {}
  # Normal
  matlab_files_name["Normal_0"] = "97.mat"
  matlab_files_name["Normal_1"] = "98.mat"
  matlab_files_name["Normal_2"] = "99.mat"
  matlab_files_name["Normal_3"] = "100.mat"
  # DE Inner Race 0.007 inches
  matlab_files_name["DEIR.007_0"] = "105.mat"
  matlab_files_name["DEIR.007_1"] = "106.mat"
  matlab_files_name["DEIR.007_2"] = "107.mat"
  matlab_files_name["DEIR.007_3"] = "108.mat"
  # DE Ball 0.007 inches
  matlab_files_name["DEB.007_0"] = "118.mat"
  matlab_files_name["DEB.007_1"] = "119.mat"
  matlab_files_name["DEB.007_2"] = "120.mat"
  matlab_files_name["DEB.007_3"] = "121.mat"
  # DE Outer race 0.007 inches centered @6:00
  matlab_files_name["DEOR.007@6_0"] = "130.mat"
  matlab_files_name["DEOR.007@6_1"] = "131.mat"
  matlab_files_name["DEOR.007@6_2"] = "132.mat"
  matlab_files_name["DEOR.007@6_3"] = "133.mat"
  # DE Outer race 0.007 inches centered @3:00
  matlab_files_name["DEOR.007@3_0"] = "144.mat"
  matlab_files_name["DEOR.007@3_1"] = "145.mat"
  matlab_files_name["DEOR.007@3_2"] = "146.mat"
  matlab_files_name["DEOR.007@3_3"] = "147.mat"
  # DE Outer race 0.007 inches centered @12:00
  matlab_files_name["DEOR.007@12_0"] = "156.mat"
  matlab_files_name["DEOR.007@12_1"] = "158.mat"
  matlab_files_name["DEOR.007@12_2"] = "159.mat"
  matlab_files_name["DEOR.007@12_3"] = "160.mat"
  # DE Inner Race 0.014 inches
  matlab_files_name["DEIR.014_0"] = "169.mat"
  matlab_files_name["DEIR.014_1"] = "170.mat"
  matlab_files_name["DEIR.014_2"] = "171.mat"
  matlab_files_name["DEIR.014_3"] = "172.mat"
  # DE Ball 0.014 inches
  matlab_files_name["DEB.014_0"] = "185.mat"
  matlab_files_name["DEB.014_1"] = "186.mat"
  matlab_files_name["DEB.014_2"] = "187.mat"
  matlab_files_name["DEB.014_3"] = "188.mat"
  # DE Outer race 0.014 inches centered @6:00
  matlab_files_name["DEOR.014@6_0"] = "197.mat"
  matlab_files_name["DEOR.014@6_1"] = "198.mat"
  matlab_files_name["DEOR.014@6_2"] = "199.mat"
  matlab_files_name["DEOR.014@6_3"] = "200.mat"
  # DE Ball 0.021 inches
  matlab_files_name["DEB.021_0"] = "222.mat"
  matlab_files_name["DEB.021_1"] = "223.mat"
  matlab_files_name["DEB.021_2"] = "224.mat"
  matlab_files_name["DEB.021_3"] = "225.mat"
  # FE Inner Race 0.021 inches
  matlab_files_name["FEIR.021_0"] = "270.mat"
  matlab_files_name["FEIR.021_1"] = "271.mat"
  matlab_files_name["FEIR.021_2"] = "272.mat"
  matlab_files_name["FEIR.021_3"] = "273.mat"
  # FE Inner Race 0.014 inches
  matlab_files_name["FEIR.014_0"] = "274.mat"
  matlab_files_name["FEIR.014_1"] = "275.mat"
  matlab_files_name["FEIR.014_2"] = "276.mat"
  matlab_files_name["FEIR.014_3"] = "277.mat"
  # FE Ball 0.007 inches
  matlab_files_name["FEB.007_0"] = "282.mat"
  matlab_files_name["FEB.007_1"] = "283.mat"
  matlab_files_name["FEB.007_2"] = "284.mat"
  matlab_files_name["FEB.007_3"] = "285.mat"
  # DE Inner Race 0.021 inches
  matlab_files_name["DEIR.021_0"] = "209.mat"
  matlab_files_name["DEIR.021_1"] = "210.mat"
  matlab_files_name["DEIR.021_2"] = "211.mat"
  matlab_files_name["DEIR.021_3"] = "212.mat"
  # DE Outer race 0.021 inches centered @6:00
  matlab_files_name["DEOR.021@6_0"] = "234.mat"
  matlab_files_name["DEOR.021@6_1"] = "235.mat"
  matlab_files_name["DEOR.021@6_2"] = "236.mat"
  matlab_files_name["DEOR.021@6_3"] = "237.mat"
  # DE Outer race 0.021 inches centered @3:00
  matlab_files_name["DEOR.021@3_0"] = "246.mat"
  matlab_files_name["DEOR.021@3_1"] = "247.mat"
  matlab_files_name["DEOR.021@3_2"] = "248.mat"
  matlab_files_name["DEOR.021@3_3"] = "249.mat"
  # DE Outer race 0.021 inches centered @12:00
  matlab_files_name["DEOR.021@12_0"] = "258.mat"
  matlab_files_name["DEOR.021@12_1"] = "259.mat"
  matlab_files_name["DEOR.021@12_2"] = "260.mat"
  matlab_files_name["DEOR.021@12_3"] = "261.mat"
  # FE Inner Race 0.007 inches
  matlab_files_name["FEIR.007_0"] = "278.mat"
  matlab_files_name["FEIR.007_1"] = "279.mat"
  matlab_files_name["FEIR.007_2"] = "280.mat"
  matlab_files_name["FEIR.007_3"] = "281.mat"
  # FE Ball 0.014 inches
  matlab_files_name["FEB.014_0"] = "286.mat"
  matlab_files_name["FEB.014_1"] = "287.mat"
  matlab_files_name["FEB.014_2"] = "288.mat"
  matlab_files_name["FEB.014_3"] = "289.mat"
  # FE Ball 0.021 inches
  matlab_files_name["FEB.021_0"] = "290.mat"
  matlab_files_name["FEB.021_1"] = "291.mat"
  matlab_files_name["FEB.021_2"] = "292.mat"
  matlab_files_name["FEB.021_3"] = "293.mat"
  # FE Outer race 0.007 inches centered @6:00
  matlab_files_name["FEOR.007@6_0"] = "294.mat"
  matlab_files_name["FEOR.007@6_1"] = "295.mat"
  matlab_files_name["FEOR.007@6_2"] = "296.mat"
  matlab_files_name["FEOR.007@6_3"] = "297.mat"
  # FE Outer race 0.007 inches centered @3:00
  matlab_files_name["FEOR.007@3_0"] = "298.mat"
  matlab_files_name["FEOR.007@3_1"] = "299.mat"
  matlab_files_name["FEOR.007@3_2"] = "300.mat"
  matlab_files_name["FEOR.007@3_3"] = "301.mat"
  # FE Outer race 0.007 inches centered @12:00
  matlab_files_name["FEOR.007@12_0"] = "302.mat"
  matlab_files_name["FEOR.007@12_1"] = "305.mat"
  matlab_files_name["FEOR.007@12_2"] = "306.mat"
  matlab_files_name["FEOR.007@12_3"] = "307.mat"
  # FE Outer race 0.014 inches centered @3:00
  matlab_files_name["FEOR.014@3_0"] = "310.mat"
  matlab_files_name["FEOR.014@3_1"] = "309.mat"
  matlab_files_name["FEOR.014@3_2"] = "311.mat"
  matlab_files_name["FEOR.014@3_3"] = "312.mat"
  # FE Outer race 0.014 inches centered @6:00
  matlab_files_name["FEOR.014@6_0"] = "313.mat"
  # FE Outer race 0.021 inches centered @6:00
  matlab_files_name["FEOR.021@6_0"] = "315.mat"
  # FE Outer race 0.021 inches centered @3:00
  matlab_files_name["FEOR.021@3_1"] = "316.mat"
  matlab_files_name["FEOR.021@3_2"] = "317.mat"
  matlab_files_name["FEOR.021@3_3"] = "318.mat"
  # DE Inner Race 0.028 inches
  matlab_files_name["DEIR.028_0"] = "3001.mat"
  matlab_files_name["DEIR.028_1"] = "3002.mat"
  matlab_files_name["DEIR.028_2"] = "3003.mat"
  matlab_files_name["DEIR.028_3"] = "3004.mat"
  # DE Ball 0.028 inches
  matlab_files_name["DEB.028_0"] = "3005.mat"
  matlab_files_name["DEB.028_1"] = "3006.mat"
  matlab_files_name["DEB.028_2"] = "3007.mat"
  matlab_files_name["DEB.028_3"] = "3008.mat"
  return matlab_files_name

def files_debug():
  """
  Associate each Matlab file name to a bearing condition in a Python dictionary. 
  The dictionary keys identify the conditions.
  
  NOTE: Used only for debug.
  """
  matlab_files_name = {}
  # Normal
  matlab_files_name["Normal_0"] = "97.mat"
  matlab_files_name["Normal_1"] = "98.mat"
  matlab_files_name["Normal_2"] = "99.mat"
  matlab_files_name["Normal_3"] = "100.mat"
  # FE Inner Race 0.007 inches
  matlab_files_name["FEIR.007_2"] = "280.mat"
  # DE Outer race 0.014 inches centered @6:00
  matlab_files_name["DEOR.014@6_1"] = "198.mat"
  # FE Outer race 0.021 inches centered @6:00
  matlab_files_name["FEOR.021@6_0"] = "315.mat"
  # DE Ball 0.028 inches
  matlab_files_name["DEB.028_3"] = "3008.mat"
  return matlab_files_name

##Download Matlab files
Downloads the Matlab files in the dictionary matlab_files_name.

In [3]:
import urllib.request
import os.path

def download_cwrufiles(matlab_files_name):
  '''
  Downloads the Matlab files in the dictionary matlab_files_name.
  '''
  url="http://csegroups.case.edu/sites/default/files/bearingdatacenter/files/Datafiles/"
  n = len(matlab_files_name)
  for i,key in enumerate(matlab_files_name):
    file_name = matlab_files_name[key]
    if not os.path.exists(file_name):
      urllib.request.urlretrieve(url+file_name, file_name)
    print("{}/{}\t{}\t{}".format(i+1, n, key, file_name))


##Extract data from Matlab files
Extracts the acquisitions of each Matlab file in the dictionary matlab_files_name.

In [4]:
import scipy.io
import numpy as np

def get_tensors_from_matlab(matlab_files_name):
  '''
  Extracts the acquisitions of each Matlab file in the dictionary matlab_files_name.
  '''
  acquisitions = {}
  for key in matlab_files_name:
    file_name = matlab_files_name[key]
    matlab_file = scipy.io.loadmat(file_name)
    for position in ['DE','FE', 'BA']:
      # Use a separate loop variable so the outer condition key is not shadowed.
      keys = [k for k in matlab_file if k.endswith(position+"_time")]
      if len(keys)>0:
        array_key = keys[0]
        acquisitions[key+position.lower()] = matlab_file[array_key].reshape(1,-1)[0]
  return acquisitions


##Downloading pickle file
The auxiliary functions below download a pickle file from a Google Drive account.
The pickle file already has the acquisitions properly extracted.
Therefore, these functions may speed up the whole process.

In [5]:
import requests
import os.path

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()
    response = session.get(URL, params = { 'id' : id }, stream = True)
    token = get_confirm_token(response)
    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params = params, stream = True)
    save_response_content(response, destination)    

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination):
    CHUNK_SIZE = 32768
    with open(destination, "wb") as f:
        for chunk in response.iter_content(CHUNK_SIZE):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

# https://drive.google.com/file/d/1qJezMiROz9NAYafPUDPh9BFkxYF4nOi2/view?usp=sharing
file_id = "1qJezMiROz9NAYafPUDPh9BFkxYF4nOi2"
if not debug:
  pickle_file = 'cwru.pickle'
else:
  pickle_file = 'debug.pickle'

if not os.path.isfile(pickle_file) and not debug:
  try:
    # Save the download under the file name that the next cell looks for.
    download_file_from_google_drive(file_id, pickle_file)
  except Exception:
    print("Download failed!")

##Save/Load data
If the CWRU pickle file has already been downloaded, it will not be downloaded again, and the dictionary with the acquisitions will be loaded.
Otherwise, the desired files are downloaded and the acquisitions are extracted.

In [6]:
import pickle
import os

if not debug:
  matlab_files_name = cwru_12khz()
else:
  matlab_files_name = files_debug()

if os.path.isfile(pickle_file) and  not debug:
  with open(pickle_file, 'rb') as handle:
    acquisitions = pickle.load(handle)
else:
  download_cwrufiles(matlab_files_name)
  acquisitions = get_tensors_from_matlab(matlab_files_name)
  with open(pickle_file, 'wb') as handle:
    pickle.dump(acquisitions, handle, protocol=pickle.HIGHEST_PROTOCOL)
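The load-or-build caching used above can be sketched in isolation. The helper `load_or_build` and the toy dictionary are assumptions made for this example; unlike the cell above, the sketch has no debug switch and always reuses an existing file.

```python
import os
import pickle
import tempfile

def load_or_build(path, build):
    """Returns the pickled object at path, building and saving it first if needed."""
    if os.path.isfile(path):
        with open(path, 'rb') as handle:
            return pickle.load(handle)
    obj = build()
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return obj

calls = []
def expensive_build():
    calls.append(1)  # count how often the expensive step runs
    return {"Normal_0de": [0.1, 0.2, 0.3]}  # toy stand-in for the acquisitions

with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "demo.pickle")
    first = load_or_build(cache, expensive_build)   # builds and saves
    second = load_or_build(cache, expensive_build)  # loads from disk
print(first == second, len(calls))
```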


##Segment data
Segments the acquisitions.
sample_size is the size of each segment.
max_samples is used for debugging purposes and
reduces the number of samples from each acquisition.


In [7]:
import numpy as np
def cwru_segmentation(acquisitions, sample_size=512, max_samples=None):
  '''
  Segments the acquisitions.
  sample_size is the size of each segment.
  max_samples is used for debugging purposes and
  reduces the number of samples from each acquisition.
  '''
  origin = []
  data = np.empty((0,sample_size,1))
  n = len(acquisitions)
  for i,key in enumerate(acquisitions):
    acquisition_size = len(acquisitions[key])
    n_samples = acquisition_size//sample_size
    if max_samples is not None and max_samples > 0 and n_samples > max_samples:
      n_samples = max_samples
    print('{}/{} --- {}: {}'.format(i+1, n, key, n_samples))
    origin.extend([key for _ in range(n_samples)])
    data = np.concatenate((data,
           acquisitions[key][:(n_samples*sample_size)].reshape(
               (n_samples,sample_size,1))))
  return data,origin

if not debug:
  signal_data,signal_origin = cwru_segmentation(acquisitions, 512)
else:
  signal_data,signal_origin = cwru_segmentation(acquisitions, 512, 15)
signal_data.shape

1/307 --- Normal_0de: 476
2/307 --- Normal_0fe: 476
3/307 --- Normal_1de: 945
4/307 --- Normal_1fe: 945
5/307 --- Normal_2de: 945
6/307 --- Normal_2fe: 945
7/307 --- Normal_3de: 948
8/307 --- Normal_3fe: 948
9/307 --- DEIR.007_0de: 236
10/307 --- DEIR.007_0fe: 236
11/307 --- DEIR.007_0ba: 236
12/307 --- DEIR.007_1de: 238
13/307 --- DEIR.007_1fe: 238
14/307 --- DEIR.007_1ba: 238
15/307 --- DEIR.007_2de: 238
16/307 --- DEIR.007_2fe: 238
17/307 --- DEIR.007_2ba: 238
18/307 --- DEIR.007_3de: 240
19/307 --- DEIR.007_3fe: 240
20/307 --- DEIR.007_3ba: 240
21/307 --- DEB.007_0de: 239
22/307 --- DEB.007_0fe: 239
23/307 --- DEB.007_0ba: 239
24/307 --- DEB.007_1de: 237
25/307 --- DEB.007_1fe: 237
26/307 --- DEB.007_1ba: 237
27/307 --- DEB.007_2de: 237
28/307 --- DEB.007_2fe: 237
29/307 --- DEB.007_2ba: 237
30/307 --- DEB.007_3de: 237
31/307 --- DEB.007_3fe: 237
32/307 --- DEB.007_3ba: 237
33/307 --- DEOR.007@6_0de: 238
34/307 --- DEOR.007@6_0fe: 238
35/307 --- DEOR.007@6_0ba: 238
36/307 --- DEOR.

(77527, 512, 1)
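The segmentation step can be illustrated on a toy signal. The function `segment` below is a simplified sketch for a single acquisition, not the notebook's `cwru_segmentation`: it drops the incomplete tail and folds the rest into non-overlapping windows with a single reshape.

```python
import numpy as np

def segment(signal, sample_size=512, max_samples=None):
    """Cuts a 1-D signal into non-overlapping windows of shape (n, sample_size, 1)."""
    signal = np.asarray(signal)
    n_samples = len(signal) // sample_size
    if max_samples is not None and max_samples > 0:
        n_samples = min(n_samples, max_samples)
    # Drop the incomplete tail, then fold the signal into windows.
    return signal[:n_samples * sample_size].reshape(n_samples, sample_size, 1)

toy = np.arange(1100, dtype=float)   # toy acquisition with 1100 points
windows = segment(toy, sample_size=512)
print(windows.shape)      # (2, 512, 1)
print(windows[1, 0, 0])   # 512.0
```

The last 76 points of the toy signal are discarded because they do not fill a complete window.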

## Clean dataset functions
The functions below help to select samples from acquisitions and form groups according to these acquisitions, using regular expressions.

In [8]:
import re
import numpy as np

def select_samples(regex, X, y):
  '''
  Selects samples which have some regex pattern in their names.
  '''
  mask = [re.search(regex,label) is not None for label in y]
  return X[mask],y[mask]

def join_labels(regex, y):
  '''
  Removes the parts of the labels that match the regex,
  so that some samples end up with the same label.
  '''
  return np.array([re.sub(regex, '', label) for label in y])

def get_groups(regex, y):
  '''
  Generates a list of groups of samples with
  the same regex pattern in their labels.
  '''
  groups = list(range(len(y)))
  for i,label in enumerate(y):
    match = re.search(regex,label)
    groups[i] = match.group(0) if match else None
  return groups
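To see what the selection and label-joining regular expressions used later in this notebook do, the sketch below applies them to four toy labels with plain `re` calls. The labels are examples chosen for this illustration; the two patterns are the ones that appear in the following cells.

```python
import re
import numpy as np

labels = np.array(['DEOR.007@6_0de', 'DEOR.007@6_0fe', 'FEB.021_3fe', 'Normal_1de'])

# Keep DE faults read at the DE sensor, FE faults read at the FE sensor, and all Normal data.
keep = r'^(DE).*(de)|^(FE).*(fe)|(Normal).*'
mask = np.array([re.search(keep, label) is not None for label in labels])
selected = labels[mask].tolist()
print(selected)     # ['DEOR.007@6_0de', 'FEB.021_3fe', 'Normal_1de']

# Strip load, load zone, sensor, severity and bearing end to leave only the fault type.
strip = r'_\d|@\d{1,3}|(de)|(fe)|\.\d{3}|(DE)|(FE)'
fault_types = [re.sub(strip, '', label) for label in selected]
print(fault_types)  # ['OR', 'B', 'Normal']
```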

##Selecting samples

In [9]:
#DE from 'de', FE from 'fe', Normal from 'de' and 'fe'
samples = '^(DE).*(de)|^(FE).*(fe)|(Normal).*'
X,y = select_samples(samples, signal_data, np.array(signal_origin))
print(len(set(y)),set(y))

113 {'DEB.028_2de', 'DEB.014_1de', 'FEIR.021_2fe', 'FEIR.007_2fe', 'FEOR.007@6_3fe', 'DEIR.014_3de', 'DEB.021_2de', 'FEIR.014_2fe', 'Normal_1de', 'FEOR.007@3_1fe', 'FEB.007_3fe', 'DEOR.014@6_3de', 'Normal_2fe', 'DEOR.021@12_1de', 'DEOR.007@12_3de', 'DEB.007_2de', 'DEB.014_2de', 'DEOR.007@6_2de', 'Normal_1fe', 'DEB.007_3de', 'DEOR.007@3_3de', 'DEIR.021_3de', 'FEOR.007@6_1fe', 'DEOR.007@6_1de', 'Normal_3de', 'DEOR.007@12_0de', 'Normal_0fe', 'FEOR.007@6_2fe', 'FEB.007_1fe', 'DEIR.014_1de', 'DEB.014_3de', 'DEB.021_3de', 'FEB.021_0fe', 'DEIR.028_1de', 'FEOR.007@3_2fe', 'DEIR.007_1de', 'DEOR.021@3_2de', 'FEIR.007_1fe', 'FEOR.014@3_1fe', 'DEOR.007@12_1de', 'DEIR.007_0de', 'FEIR.007_3fe', 'DEOR.021@6_3de', 'FEIR.014_3fe', 'DEOR.021@12_3de', 'DEIR.021_2de', 'FEIR.021_3fe', 'DEOR.021@6_1de', 'DEIR.028_2de', 'DEB.007_1de', 'DEB.028_1de', 'FEOR.007@12_0fe', 'FEOR.007@12_1fe', 'FEB.014_2fe', 'FEOR.007@6_0fe', 'FEOR.007@12_3fe', 'DEOR.021@12_0de', 'FEOR.014@3_2fe', 'FEOR.014@3_0fe', 'DEB.021_1de', '

#Experimenter

In [10]:
from sklearn.model_selection import cross_validate, KFold, PredefinedSplit

def experimenter(model, X, y, groups=None, scoring=None, cv=KFold(4, shuffle=True), verbose=0):
  '''
  Performs an experiment with some estimator (model) and validation.
  It works like the cross_validate function from sklearn; however,
  when an estimator has an internal validation with groups,
  it maintains the groups from the external validation.
  '''
  if hasattr(model,'cv') or (hasattr(model,'steps') and any(['gs' in step[0] for step in model.steps])):
    scores = {}
    # Use the splitter received as argument for the outer validation.
    lstval = list(cv.split(X,y,groups))
    for tr,te in lstval:
      # Build the inner validation only from the outer training indices.
      if groups is not None:
        innercv = list(GroupShuffleKFold(cv.n_splits-1).split(X[tr],y[tr],np.array(groups)[tr]))
      else:
        innercv = list(KFold(cv.n_splits-1, shuffle=True).split(X[tr],y[tr]))
      if hasattr(model,'cv'):
        model.cv = innercv
      else:
        for step in model.steps:
          if 'gs' in step[0]:
            step[1].cv = innercv
      # -1 keeps a sample in the training set; 0 places it in the single test fold.
      test_fold = np.zeros((len(y),), dtype=int)
      test_fold[tr] = -1
      # groups, scoring and cv are keyword-only in recent scikit-learn versions.
      score = cross_validate(model, X, y, groups=groups, scoring=scoring,
                             cv=PredefinedSplit(test_fold), verbose=verbose)
      for k in score.keys():
        if k not in scores:
          scores[k] = []
        scores[k].extend(score[k])
    return scores
  return cross_validate(model, X, y, groups=groups, scoring=scoring, cv=cv, verbose=verbose)
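The `test_fold` array built inside `experimenter` follows the convention of scikit-learn's `PredefinedSplit`: an entry of -1 keeps the sample out of every test set, and any other value names the test fold the sample belongs to. The sketch below emulates the resulting single split with plain NumPy on toy indices; it is an illustration of the convention, not a replacement for `PredefinedSplit`.

```python
import numpy as np

n = 8
train_idx = np.array([0, 1, 2, 3, 4, 5])   # outer training indices (toy values)

# Same construction as in experimenter: -1 marks training samples, 0 marks test fold 0.
test_fold = np.zeros(n, dtype=int)
test_fold[train_idx] = -1

# The single train/test split implied by test_fold.
train = np.where(test_fold == -1)[0]
test = np.where(test_fold == 0)[0]
print(train, test)   # [0 1 2 3 4 5] [6 7]
```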



## Custom Splitter

In [11]:
from sklearn.model_selection import KFold
from sklearn.utils import shuffle
from sklearn.utils.validation import check_array
import numpy as np

class GroupShuffleKFold(KFold):
  '''
  Neither GroupShuffleSplit nor GroupKFold is a good splitter for this case.
  A custom splitter must be made.
  '''
  def __init__(self, n_splits=4, shuffle=False, random_state=None):
    super().__init__(n_splits, shuffle=shuffle, random_state=random_state)
  def get_n_splits(self, X, y, groups=None):
    return self.n_splits
  def _iter_test_indices(self, X=None, y=None, groups=None):
    if groups is None:
      raise ValueError("The 'groups' parameter should not be None.")
    groups = check_array(groups, ensure_2d=False, dtype=None)
    unique_groups, groups = np.unique(groups, return_inverse=True)
    n_groups = len(unique_groups)
    if self.n_splits > n_groups:
      raise ValueError("Cannot have number of splits n_splits=%d greater"
                        " than the number of groups: %d."
                        % (self.n_splits, n_groups))
    # Distribute groups
    indices = np.arange(n_groups)
    if self.shuffle:
      for i in range(n_groups//self.n_splits):
        if self.random_state is None:
          indices[self.n_splits*i:self.n_splits*(i+1)] = shuffle(
              indices[self.n_splits*i:self.n_splits*(i+1)])
        else:
          indices[self.n_splits*i:self.n_splits*(i+1)] = shuffle(
              indices[self.n_splits*i:self.n_splits*(i+1)],
              random_state=self.random_state+i)
    #print(unique_groups[indices]) #Debug purpose
    # Total weight of each fold
    n_samples_per_fold = np.zeros(self.n_splits)
    # Mapping from group index to fold index
    group_to_fold = np.zeros(len(unique_groups))
    # Distribute samples 
    for group_index in indices:
      group_to_fold[indices[group_index]] = group_index%(self.n_splits)
    indices = group_to_fold[groups]
    for f in range(self.n_splits):
      yield np.where(indices == f)[0]
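The core idea of a group-aware K-fold can be sketched without scikit-learn. The function `group_kfold_indices` is a simplified, hypothetical stand-in for the class above: it assigns the unique groups to folds in round-robin order without shuffling, and checks that no group appears on both sides of a split.

```python
import numpy as np

def group_kfold_indices(groups, n_splits):
    """Yields (train, test) index arrays; each group lands in exactly one test fold."""
    unique_groups, inverse = np.unique(groups, return_inverse=True)
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of groups")
    # Round-robin assignment of groups to folds (no shuffling in this sketch).
    fold_of_group = np.arange(len(unique_groups)) % n_splits
    fold_of_sample = fold_of_group[inverse]
    for fold in range(n_splits):
        yield np.where(fold_of_sample != fold)[0], np.where(fold_of_sample == fold)[0]

groups = np.array(['_0', '_0', '_1', '_1', '_2', '_3', '_3', '_2'])  # toy load groups
folds = list(group_kfold_indices(groups, n_splits=4))
for train, test in folds:
    # No group may appear on both sides of a split.
    assert set(groups[train].tolist()).isdisjoint(groups[test].tolist())
    print(sorted(set(groups[test].tolist())), test.tolist())
```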

## BySeverity Splitter

In [12]:
from sklearn.model_selection import KFold
from sklearn.utils import shuffle
from sklearn.utils.validation import check_array
import numpy as np

class BySeverityKFold(KFold):
  '''
  Splits the CWRU dataset in severities.
  '''
  # Compatibility constructor
  def __init__(self, n_splits=4, shuffle=False, random_state=None):
    super().__init__(n_splits=4, shuffle=False, random_state=None)
    self.nround = random_state
  def _iter_test_indices(self, X=None, y=None, groups=None):
    if groups is None:
      raise ValueError("The 'groups' parameter should not be None.")
    groups = check_array(groups, ensure_2d=False, dtype=None)
    unique_groups, groups = np.unique(groups, return_inverse=True)
    n_groups = len(unique_groups)
    if self.n_splits > n_groups:
      raise ValueError("Cannot have number of splits n_splits=%d greater"
                        " than the number of groups: %d."
                        % (self.n_splits, n_groups))
    # Distribute groups
    indices = np.arange(n_groups)
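    # NOTE: random_state here is the notebook-level variable set under
    # "Common Variables", not an attribute of the splitter.
    nround = self.nround - random_state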
    for i in range(nround//4):
      indices[i],indices[i+1] = indices[i+1],indices[i]
    for i in range(self.n_splits):      
      indices[i+self.n_splits] = (i+nround)%self.n_splits+self.n_splits
    #print(unique_groups[indices]) #Debug purpose
    # Total weight of each fold
    n_samples_per_fold = np.zeros(self.n_splits)
    # Mapping from group index to fold index
    group_to_fold = np.zeros(len(unique_groups))
    # Distribute samples 
    for group_index in indices:
      group_to_fold[indices[group_index]] = group_index%(self.n_splits)
    print(group_to_fold)
    indices = group_to_fold[groups]
    for f in range(self.n_splits):
      yield np.where(indices == f)[0]

##Experiment setup

In [13]:
from collections import namedtuple

ExperimentSetup = namedtuple('ExperimentSetup', 
                             'groups, splitter_name, shuffle, rounds')

validations = {
    # Samples with the same load cannot be in the training folds and
    # the test folds simultaneously.
    "By Load": ExperimentSetup(groups = get_groups(r'_\d',y),
                               splitter_name = 'GroupShuffleKFold',
                               shuffle = False,
                               rounds=1,
                               ),
    # Samples with the same severity cannot be in the training folds and
    # the test folds simultaneously.
    "By Severity": ExperimentSetup(groups = get_groups(r'(\.\d{3})|(Normal_\d)',y),
                                   splitter_name = 'BySeverityKFold',
                                   shuffle = False,
                                   rounds=8),
    # Validation usually seen in publications with CWRU bearing dataset.
    "Usual K-Fold": ExperimentSetup(groups = None, 
                                    splitter_name = 'KFold',
                                    shuffle = True,
                                    rounds=8), 
}
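The two grouping patterns in the setup above can be checked on a few toy labels. The helper `first_match` is written for this sketch and mirrors what `get_groups` does for a single label.

```python
import re

def first_match(regex, label):
    """Returns the first substring of label that matches regex, or None."""
    match = re.search(regex, label)
    return match.group(0) if match else None

toy_labels = ['DEIR.007_0de', 'DEIR.021_3de', 'FEB.014_2fe', 'Normal_1de']
by_load = [first_match(r'_\d', label) for label in toy_labels]
by_severity = [first_match(r'(\.\d{3})|(Normal_\d)', label) for label in toy_labels]
print(by_load)       # ['_0', '_3', '_2', '_1']
print(by_severity)   # ['.007', '.021', '.014', 'Normal_1']
```

With the severity pattern, each normal condition forms its own group (Normal_0 to Normal_3), next to one group per fault severity.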

##Common Variables

In [14]:
# Only four conditions are considered: Normal, Ball, Inner Race and Outer Race.
selected_y = join_labels(r'_\d|@\d{1,3}|(de)|(fe)|\.\d{3}|(DE)|(FE)',y)
verbose = 0 #if not debug else 3
random_state = 42
scoring = ['accuracy', 'f1_macro']#, 'precision_macro', 'recall_macro']

#Classification Models

In [15]:
import warnings
warnings.filterwarnings('ignore')

##Feature Extraction Models

In [16]:
from sklearn.base import TransformerMixin

###Statistical functions

In [17]:
import numpy as np
import scipy.stats as stats

def rms(x):
  '''
  root mean square
  '''
  x = np.array(x)
  return np.sqrt(np.mean(np.square(x)))

def sra(x):
  '''
  square root amplitude
  '''
  x = np.array(x)
  return np.mean(np.sqrt(np.absolute(x)))**2

def ppv(x):
  '''
  peak to peak value
  '''
  x = np.array(x)
  return np.max(x)-np.min(x)

def cf(x):
  '''
  crest factor
  '''
  x = np.array(x)
  return np.max(np.absolute(x))/rms(x)

def ifa(x):
  '''
  impact factor
  '''
  x = np.array(x)
  return np.max(np.absolute(x))/np.mean(np.absolute(x))

def mf(x):
  '''
  margin factor
  '''
  x = np.array(x)
  return np.max(np.absolute(x))/sra(x)

def sf(x):
  '''
  shape factor
  '''
  x = np.array(x)
  return rms(x)/np.mean(np.absolute(x))

def kf(x):
  '''
  kurtosis factor
  '''
  x = np.array(x)
  return stats.kurtosis(x)/(np.mean(x**2)**2)
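A quick sanity check of some of these definitions uses a unit-amplitude sine wave, for which the RMS is $1/\sqrt{2}$, the crest factor is $\sqrt{2}$ and the peak to peak value is 2. The sketch recomputes the three quantities inline with the same formulas as `rms`, `cf` and `ppv`; the signal length and number of cycles are arbitrary choices for this example.

```python
import numpy as np

n = 512
t = np.arange(n)
x = np.sin(2 * np.pi * 8 * t / n)   # 8 full cycles, amplitude 1

rms_value = np.sqrt(np.mean(np.square(x)))          # same formula as rms(x)
crest_factor = np.max(np.absolute(x)) / rms_value   # same formula as cf(x)
peak_to_peak = np.max(x) - np.min(x)                # same formula as ppv(x)

print(np.isclose(rms_value, 1 / np.sqrt(2)))   # True
print(np.isclose(crest_factor, np.sqrt(2)))    # True
print(np.isclose(peak_to_peak, 2.0))           # True
```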



### Statistical Features from Time Domain

In [18]:
class StatisticalTime(TransformerMixin):
  '''
  Extracts statistical features from the time domain.
  '''
  def fit(self, X, y=None):
    return self
  def transform(self, X, y=None):
    return np.array([
                     [
                      rms(x), # root mean square
                      sra(x), # square root amplitude
                      stats.kurtosis(x), # kurtosis
                      stats.skew(x), # skewness
                      ppv(x), # peak to peak value
                      cf(x), # crest factor
                      ifa(x), # impact factor
                      mf(x), # margin factor
                      sf(x), # shape factor
                      kf(x), # kurtosis factor
                      ] for x in X[:,:,0]
                     ])


### Statistical Features from Frequency Domain

In [19]:
class StatisticalFrequency(TransformerMixin):
  '''
  Extracts statistical features from the frequency domain.
  '''
  def fit(self, X, y=None):
    return self
  def transform(self, X, y=None):
    sig = []
    for x in X[:,:,0]:
      fx = np.absolute(np.fft.fft(x)) # transform x from time to frequency domain
      fc = np.mean(fx) # frequency center
      sig.append([
                  fc, # frequency center
                  rms(fx), # RMS from the frequency domain
                  rms(fx-fc), # Root Variance Frequency
                  ])
    return np.array(sig)
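The three frequency-domain features are easy to verify on a unit impulse, whose magnitude spectrum is flat: the mean and the RMS of the spectrum are both 1 and the spread around the mean is 0. The sketch repeats the computations of `StatisticalFrequency.transform` for one toy signal; the variable names are chosen for this example.

```python
import numpy as np

x = np.zeros(512)
x[0] = 1.0                                   # unit impulse

fx = np.absolute(np.fft.fft(x))              # magnitude spectrum: all ones
fc = np.mean(fx)                             # "frequency center" as defined above
rms_f = np.sqrt(np.mean(np.square(fx)))      # RMS of the spectrum
rvf = np.sqrt(np.mean(np.square(fx - fc)))   # root variance frequency
print(fc, rms_f, rvf)
```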


###Statistical Features

In [20]:
class Statistical(TransformerMixin):
  '''
  Extracts statistical features from both time and frequency domain.
  '''
  def fit(self, X, y=None):
    return self
  def transform(self, X, y=None):
    st = StatisticalTime()
    stfeats = st.transform(X)
    sf = StatisticalFrequency()
    sffeats = sf.transform(X)
    return np.concatenate((stfeats,sffeats),axis=1)

###Wavelet Package Features

In [21]:
import numpy as np
import pywt

class WaveletPackage(TransformerMixin):
  '''
  Extracts Wavelet Package features.
  The features are calculated by the energy of the recomposed signal
  of the leaf nodes' coefficients.
  '''
  def fit(self, X, y=None):
    return self
  def transform(self, X, y=None):
    def Energy(coeffs, k):
      return np.sqrt(np.sum(np.array(coeffs[-k]) ** 2)) / len(coeffs[-k])
    def getEnergy(wp):
      coefs = np.asarray([n.data for n in wp.get_leaf_nodes(True)])
      return np.asarray([Energy(coefs,i) for i in range(2**wp.maxlevel)])
    return np.array([getEnergy(pywt.WaveletPacket(data=x, wavelet='db4', 
                                                  mode='symmetric', maxlevel=4)
                                                  ) for x in X[:,:,0]])

###Heterogeneous Features

In [22]:
class Heterogeneous(TransformerMixin):
  '''
  Mixes Statistical and Wavelet Package features.
  '''
  def fit(self, X, y=None):
    return self
  def transform(self, X, y=None):
    st = StatisticalTime()
    stfeats = st.transform(X)
    sf = StatisticalFrequency()
    sffeats = sf.transform(X)
    wp = WaveletPackage()
    wpfeats = wp.transform(X)
    return np.concatenate((stfeats,sffeats,wpfeats),axis=1)


##K-NN with Heterogeneous Features

In [23]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

knn = Pipeline([
                ('FeatureExtraction', Heterogeneous()),
                ('scaler', StandardScaler()),
                ('knn', KNeighborsClassifier()),
                ])

parameters = {'knn__n_neighbors': list(range(1,16,2))}
if not debug:
  knn = GridSearchCV(knn, parameters, verbose=verbose)
else:
  knn = GridSearchCV(knn, {'knn__n_neighbors': list(range(1,4,2))}, verbose=verbose)
knn

GridSearchCV(estimator=Pipeline(steps=[('FeatureExtraction',
                                        <__main__.Heterogeneous object at 0x7fce0bb38a00>),
                                       ('scaler', StandardScaler()),
                                       ('knn', KNeighborsClassifier())]),
             param_grid={'knn__n_neighbors': [1, 3, 5, 7, 9, 11, 13, 15]})

##SVM with Heterogeneous Features

In [24]:
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

svm = Pipeline([
                ('FeatureExtraction', Heterogeneous()),
                ('scaler', StandardScaler()),
                ('svc', SVC()),
                ])

parameters = {
    'svc__C': [10**x for x in range(-3,2)],
    'svc__gamma': [10**x for x in range(-3,1)],
    }
if not debug:
  svm = GridSearchCV(svm, parameters, verbose=verbose)
svm

GridSearchCV(estimator=Pipeline(steps=[('FeatureExtraction',
                                        <__main__.Heterogeneous object at 0x7fce0c12c850>),
                                       ('scaler', StandardScaler()),
                                       ('svc', SVC())]),
             param_grid={'svc__C': [0.001, 0.01, 0.1, 1, 10],
                         'svc__gamma': [0.001, 0.01, 0.1, 1]})

## Random Forest with Heterogeneous Features

In [25]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

rf = Pipeline([
               ('FeatureExtraction', Heterogeneous()),
               ('scaler', StandardScaler()),
               ('rf', RandomForestClassifier()),
               ])

parameters = {
    "rf__n_estimators": [10, 20, 50, 100, 200, 500],
    "rf__max_features": [1, 5, 10, 15, 20], #list(range(1,21)),
    }
if not debug:
  rf = GridSearchCV(rf, parameters, verbose=verbose)
rf

GridSearchCV(estimator=Pipeline(steps=[('FeatureExtraction',
                                        <__main__.Heterogeneous object at 0x7fce0c12c310>),
                                       ('scaler', StandardScaler()),
                                       ('rf', RandomForestClassifier())]),
             param_grid={'rf__max_features': [1, 5, 10, 15, 20],
                         'rf__n_estimators': [10, 20, 50, 100, 200, 500]})
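
The cost of each grid search grows with the number of parameter combinations. The sketch below counts the candidates for the three grids above using only the standard library. It assumes five inner folds (the `GridSearchCV` default in recent scikit-learn releases when `cv` is not given) and one final refit; both are assumptions about the installed version, not something this notebook sets.

```python
from itertools import product

# Grids copied from the cells above.
grids = {
  'knn': {'knn__n_neighbors': list(range(1, 16, 2))},
  'svm': {'svc__C': [10**x for x in range(-3, 2)],
          'svc__gamma': [10**x for x in range(-3, 1)]},
  'rf': {'rf__n_estimators': [10, 20, 50, 100, 200, 500],
         'rf__max_features': [1, 5, 10, 15, 20]},
}

inner_folds = 5  # assumption: default cv of GridSearchCV

n_fits = {}
for name, grid in grids.items():
  candidates = list(product(*grid.values()))
  # One fit per candidate and inner fold, plus one refit on the full training set.
  n_fits[name] = len(candidates) * inner_folds + 1
  print(name, len(candidates), 'candidates,', n_fits[name], 'fits')
```

These counts apply to every outer fold of every round, which helps explain why the random forest search dominates the fit times.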

## Convolutional Neural Network

In [26]:
try:
  # Colab line magic selecting TensorFlow 2.x; not available in other environments.
  %tensorflow_version 2.x
except Exception:
  print("Out of Colab")

Out of Colab


### Macro-averaged F1-score implemented for Keras

In [27]:
import tensorflow as tf
from tensorflow.keras import backend as K

def f1_score_macro(y_true, y_pred):
    """Batch-wise macro-averaged F1 for one-hot targets and softmax outputs."""
    # Binarize: a class counts as predicted when its probability rounds to 1.
    y_true = K.round(K.clip(y_true, 0, 1))
    y_pred = K.round(K.clip(y_pred, 0, 1))
    # Sum over the batch axis only, so each class keeps its own counts.
    # Summing over all axes would pool the classes and give a micro-style score.
    true_positives = K.sum(y_true * y_pred, axis=0)
    possible_positives = K.sum(y_true, axis=0)
    predicted_positives = K.sum(y_pred, axis=0)
    precision = true_positives / (predicted_positives + K.epsilon())
    recall = true_positives / (possible_positives + K.epsilon())
    f1 = 2 * precision * recall / (precision + recall + K.epsilon())
    # Unweighted mean over classes; classes absent from the batch score 0.
    return K.mean(f1)
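
Macro-averaged F1 is the unweighted mean of the per-class F1 scores, which differs from a single F1 computed on counts pooled over all classes. A small pure-Python sketch of the difference, on made-up labels:

```python
def f1(tp, fp, fn):
  # F1 from raw counts; defined as 0 when there are no true positives.
  return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def class_counts(y_true, y_pred, label):
  # True positives, false positives and false negatives for one class.
  tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
  fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
  fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
  return tp, fp, fn

# Toy imbalanced labels (made up for illustration).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1]

counts = [class_counts(y_true, y_pred, label) for label in (0, 1)]

# Macro: compute F1 per class, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts) / len(counts)

# Pooled (micro): sum the counts over classes first, then compute one F1.
pooled_f1 = f1(*[sum(col) for col in zip(*counts)])

print(round(macro_f1, 4), round(pooled_f1, 4))  # 0.7778 0.8333
```

The minority class pulls the macro score down, while the pooled score equals the accuracy in this single-label setting.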

### ANN wrapped in a scikit-learn estimator

In [28]:
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical
from sklearn.base import BaseEstimator, ClassifierMixin
import numpy as np
from tensorflow.keras.callbacks import EarlyStopping,ReduceLROnPlateau

class ANN(BaseEstimator, ClassifierMixin):
  def __init__(self, 
               dense_layer_sizes=[64], 
               kernel_size=32, 
               filters=32, 
               n_conv_layers=2,
               pool_size=8,
               dropout=0.25,
               epochs=50,
               validation_split=0.05,
               optimizer='sgd',  # alternatives: 'nadam', 'rmsprop'
               ):
    self.dense_layer_sizes = dense_layer_sizes
    self.kernel_size = kernel_size
    self.filters = filters
    self.n_conv_layers = n_conv_layers
    self.pool_size = pool_size
    self.dropout = dropout
    self.epochs = epochs
    self.validation_split = validation_split
    self.optimizer = optimizer
  
  def fit(self, X, y=None):
    dense_layer_sizes = self.dense_layer_sizes
    kernel_size = self.kernel_size
    filters = self.filters
    n_conv_layers = self.n_conv_layers
    pool_size = self.pool_size
    dropout = self.dropout
    epochs = self.epochs
    optimizer = self.optimizer
    validation_split = self.validation_split

    self.labels, ids = np.unique(y, return_inverse=True)
    y_cat = to_categorical(ids)
    num_classes = y_cat.shape[1]
    
    self.model = Sequential()
    self.model.add(layers.InputLayer(input_shape=(X.shape[1],X.shape[-1])))
    for _ in range(n_conv_layers):
      self.model.add(layers.Conv1D(filters, kernel_size))#, padding='valid'))
      self.model.add(layers.Activation('relu'))
      if pool_size>1:
        self.model.add(layers.MaxPooling1D(pool_size=pool_size))
    #self.model.add(layers.Dropout(0.25))
    self.model.add(layers.Flatten())
    for layer_size in dense_layer_sizes:
        self.model.add(layers.Dense(layer_size))
        self.model.add(layers.Activation('relu'))
    if dropout>0 and dropout<1:
      self.model.add(layers.Dropout(dropout))
    self.model.add(layers.Dense(num_classes))
    self.model.add(layers.Activation('softmax'))
    self.model.compile(loss='categorical_crossentropy',
                       optimizer=optimizer,
                       metrics=[f1_score_macro])
    if validation_split>0 and validation_split<1:
      prop = int(1/validation_split)
      mask = np.array([i%prop==0 for i in range(len(y))])
      self.history = self.model.fit(X[~mask], y_cat[~mask], epochs=epochs, 
                                    validation_data=(X[mask],y_cat[mask]),
                                    callbacks=[EarlyStopping(patience=3), ReduceLROnPlateau()],
                                    verbose=False
                                    )  
    else:
      self.history = self.model.fit(X, y_cat, epochs=epochs, verbose=False)
    # scikit-learn convention: fit returns the fitted estimator.
    return self
  
  def predict_proba(self, X, y=None):
    return self.model.predict(X)

  def predict(self, X, y=None):
    predictions = self.model.predict(X)
    return self.labels[np.argmax(predictions,axis=1)]
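
The `fit` method above holds out validation data deterministically: every `prop`-th sample, with `prop = int(1/validation_split)`. A short sketch of that rule, where `n_samples` is a hypothetical value chosen for illustration:

```python
validation_split = 0.05  # default used by the ANN class above
n_samples = 100          # hypothetical number of training samples

# Same rule as in fit: every prop-th sample goes to the validation set.
prop = int(1 / validation_split)
mask = [i % prop == 0 for i in range(n_samples)]

val_idx = [i for i, held_out in enumerate(mask) if held_out]
print(prop, val_idx)  # 20 [0, 20, 40, 60, 80]
```

The effective validation fraction is `1/prop`, so a split such as 0.3 is truncated to one sample in three. The held-out samples are spread evenly over the input order instead of being taken from the end.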

### ANN instantiation

In [29]:
parameters = {
    'filters': [16, 32],
    'kernel_size': [16, 32],
    'n_conv_layers': [1, 2],
    #'pool_size': [2, 4, 6, 8],
    }
ann = ANN()
if not debug:
  ann = GridSearchCV(ann, parameters, verbose=verbose)
ann

GridSearchCV(estimator=ANN(),
             param_grid={'filters': [16, 32], 'kernel_size': [16, 32],
                         'n_conv_layers': [1, 2]})
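
The grid varies `kernel_size` and `n_conv_layers`, and both change how many time steps reach the dense layers. The sketch below tracks the sequence length through the convolution and pooling stack. It assumes the Keras defaults the class relies on (`Conv1D` with `padding='valid'` and stride 1, `MaxPooling1D` with stride equal to `pool_size`); `input_length` is a hypothetical value, not taken from the notebook.

```python
def conv_stack_length(length, kernel_size, pool_size, n_conv_layers):
  # Sequence length after n_conv_layers blocks of Conv1D + MaxPooling1D.
  for _ in range(n_conv_layers):
    length = length - kernel_size + 1  # 'valid' convolution, stride 1
    if pool_size > 1:
      length = length // pool_size     # non-overlapping max pooling
    if length < 1:
      return 0  # the stack does not fit this input length
  return length

input_length = 1024  # hypothetical segment length
for kernel_size in (16, 32):
  for n_conv_layers in (1, 2):
    steps = conv_stack_length(input_length, kernel_size, 8, n_conv_layers)
    print(kernel_size, n_conv_layers, steps)
```

A result of 0 means the candidate cannot be built for that input length, so a check like this can filter the grid before a long search.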

## List of Estimators

In [30]:
clfs = [
        # ('KNN - KNeighborsClassifier, Heterogeneous Features', knn),
        # ('SVM - SVC with Heterogeneous Features', svm),
        # ('ANN - Artificial Neural Network with Convolutional Layers', ann),
        ('RF - RandomForestClassifier with Heterogeneous Features', rf),
        ]
if not debug:
  dirres = 'cwru_rf'
  # dirres = 'cwru_res'
else:
  dirres = 'debugres'

# Performing Experiments

In [None]:
import numpy as np

scores = {}
trtime = {}
tetime = {}
# Estimators
for clf_name, estimator in clfs:
  if clf_name not in scores:
    scores[clf_name] = {}
    trtime[clf_name] = {}
    tetime[clf_name] = {}
  print("*"*(len(clf_name)+8),'\n***',clf_name,'***\n'+"*"*(len(clf_name)+8))
  # Validation forms
  for val_name in validations.keys():
    print("#"*(len(val_name)+8),'\n###',val_name,'###\n'+"#"*(len(val_name)+8))
    # Number of repetitions
    for r in range(validations[val_name].rounds):
      round_str = "Round {}".format(r+1)
      print("@"*(len(round_str)+8),'\n@@@',round_str,'@@@\n'+"@"*(len(round_str)+8))
      groups = validations[val_name].groups
      if val_name not in scores[clf_name]:
        scores[clf_name][val_name] = {}
      validation = eval(validations[val_name].splitter_name
                        +'(4,shuffle='+str(validations[val_name].shuffle)
                        +',random_state='+str(random_state+r)+')')
      score = experimenter(estimator, X, selected_y, groups, 
                           scoring, validation, verbose)
      for metric,s in score.items():
        print(metric, ' \t', s)
        if metric not in scores[clf_name][val_name]:
          scores[clf_name][val_name][metric] = []
        scores[clf_name][val_name][metric].append(s)
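
The loop above builds a fresh splitter for each round from its class name and seeds it with `random_state + r`, so the rounds use different but reproducible partitions. A minimal sketch of that pattern with an explicit name-to-class registry; `ToyShuffleSplit`, `SPLITTERS` and the seed value are hypothetical and only stand in for the splitters defined in the "Experimenter" section.

```python
import random

class ToyShuffleSplit:
  # Hypothetical stand-in for a splitter taking (n_splits, shuffle, random_state).
  def __init__(self, n_splits, shuffle=False, random_state=None):
    self.n_splits = n_splits
    self.shuffle = shuffle
    self.random_state = random_state

  def split(self, n_samples):
    indices = list(range(n_samples))
    if self.shuffle:
      random.Random(self.random_state).shuffle(indices)
    # Deal the indices into n_splits folds, round-robin.
    return [indices[k::self.n_splits] for k in range(self.n_splits)]

# Explicit name-to-class registry (an assumption of this sketch).
SPLITTERS = {'ToyShuffleSplit': ToyShuffleSplit}

random_state = 42  # hypothetical base seed
folds_per_round = []
for r in range(2):
  splitter = SPLITTERS['ToyShuffleSplit'](4, shuffle=True, random_state=random_state + r)
  folds_per_round.append(splitter.split(8))
  print('Round', r + 1, folds_per_round[-1])
```

Looking the class up in a dictionary keeps the set of allowed splitters explicit, and passing the arguments as keywords avoids building a string of Python code.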


## Save results

In [None]:
from pathlib import Path

clf = {}
val = {}
src = {}
for c, clf_name in enumerate(scores.keys()):
  if c not in clf:
    clf[c] = clf_name
  for v, val_name in enumerate(scores[clf_name].keys()):
    if v not in val:
      val[v] = val_name
    for s, scr_name in enumerate(scores[clf_name][val_name].keys()):
      scores[clf_name][val_name][scr_name] = np.array(scores[clf_name][val_name][scr_name])
      if s not in src:
        src[s] = scr_name
      Path(dirres).mkdir(parents=True, exist_ok=True)
      np.savetxt('{}/{}-{}-{}.txt'.format(dirres,clf_name,val_name,scr_name), 
                 scores[clf_name][val_name][scr_name], delimiter=',')
      print('{}/{} - {} - {}\n'.format(dirres,clf_name.split('-')[0],val_name,scr_name),
            scores[clf_name][val_name][scr_name])


## Average & Standard Deviation

In [None]:
c,v,s = len(clf),len(val),len(src)
for i in range(s):
  print(src[i])
  for k in range(v):
    print('\t'+val[k]+' ', end='')
  print()
  for j in range(c):
    print(clf[j].split('-')[0], end='\t')
    for k in range(v):
      print("{0:.3f} ({1:.3f})".format(
          scores[clf[j]][val[k]][src[i]].mean(),
          scores[clf[j]][val[k]][src[i]].std()), end='\t')
    print()
  print()
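
The summary above reports `mean()` and `std()` of each score array. NumPy's `std()` defaults to the population standard deviation (`ddof=0`), which is smaller than the sample standard deviation. A standard-library sketch on the four fold accuracies of "By Load", round 1, copied from the results in this notebook:

```python
from statistics import mean, pstdev, stdev

# "By Load", round 1, test_accuracy per fold.
accuracy = [0.9480130647795318, 0.9812515520238391,
            0.9662363455809335, 0.9365000620116581]

avg = mean(accuracy)
pop_std = pstdev(accuracy)    # population std, what ndarray.std() computes by default
sample_std = stdev(accuracy)  # sample std, equivalent to ndarray.std(ddof=1)

print("{0:.3f} ({1:.3f})".format(avg, pop_std))  # 0.958 (0.017)
```

With only a few folds per round the two estimates differ noticeably, so it is worth stating which one a reported table uses.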

## Experiment results



```
*************************************************************** 
*** RF - RandomForestClassifier with Heterogeneous Features ***
***************************************************************
############### 
### By Load ###
###############
@@@@@@@@@@@@@@@ 
@@@ Round 1 @@@
@@@@@@@@@@@@@@@
fit_time  	 [4072.0048599243164, 3971.1273851394653, 4007.050894498825, 3923.62695813179]
score_time  	 [9.202571630477905, 9.959919691085815, 9.863134145736694, 9.595574140548706]
test_accuracy  	 [0.9480130647795318, 0.9812515520238391, 0.9662363455809335, 0.9365000620116581]
test_f1_macro  	 [0.9541007434900952, 0.981709376022543, 0.9663312387552976, 0.9356125065034845]
################### 
### By Severity ###
###################
@@@@@@@@@@@@@@@ 
@@@ Round 1 @@@
@@@@@@@@@@@@@@@
[0. 1. 2. 3. 0. 1. 2. 3.]
fit_time  	 [3301.591542005539, 3844.1913075447083, 3531.6875348091125, 4570.869230747223]
score_time  	 [12.796550512313843, 9.3943030834198, 11.179861783981323, 4.535309314727783]
test_accuracy  	 [0.5101532567049808, 0.539150460593654, 0.5631395716847769, 0.49841521394611726]
test_f1_macro  	 [0.5280373901589933, 0.5056137271948168, 0.5454059975595691, 0.24940523394131642]
@@@@@@@@@@@@@@@ 
@@@ Round 2 @@@
@@@@@@@@@@@@@@@
[0. 1. 2. 3. 3. 0. 1. 2.]
fit_time  	 [3100.424046754837, 3838.168055295944, 3521.3323690891266, 4736.201282501221]
score_time  	 [13.206766366958618, 9.292587041854858, 12.063929080963135, 3.476682424545288]
test_accuracy  	 [0.6051151344700298, 0.5381269191402251, 0.5546652609383237, 0.33497536945812806]
test_f1_macro  	 [0.5525010407714559, 0.5046672033816783, 0.53943790799564, 0.25]
@@@@@@@@@@@@@@@ 
@@@ Round 3 @@@
@@@@@@@@@@@@@@@
[0. 1. 2. 3. 2. 3. 0. 1.]
fit_time  	 [3122.9404289722443, 3821.518924474716, 3800.3353536129, 4884.549899816513]
score_time  	 [13.460021734237671, 9.806526184082031, 10.750390529632568, 4.655071973800659]
test_accuracy  	 [0.596765688170153, 0.5278701099463053, 0.5162158997775436, 0.5002645502645503]
test_f1_macro  	 [0.5530607585702343, 0.4955428274727212, 0.5467125252922331, 0.25052798310454066]
@@@@@@@@@@@@@@@ 
@@@ Round 4 @@@
@@@@@@@@@@@@@@@
[0. 1. 2. 3. 1. 2. 3. 0.]
fit_time  	 [3124.4579796791077, 4080.3560025691986, 3529.207218170166, 4621.081122875214]
score_time  	 [13.211638689041138, 8.480123043060303, 11.16506052017212, 4.515885353088379]
test_accuracy  	 [0.5346099789177794, 0.48197150334399536, 0.5544888701339804, 0.5]
test_f1_macro  	 [0.5174391652041983, 0.5102736015712057, 0.5296003549190389, 0.25]
@@@@@@@@@@@@@@@ 
@@@ Round 5 @@@
@@@@@@@@@@@@@@@
[1. 0. 2. 3. 0. 1. 2. 3.]
fit_time  	 [3838.603193283081, 2956.779947280884, 3339.363736629486, 3658.5913541316986]
score_time  	 [7.868077516555786, 13.02027153968811, 10.6065034866333, 3.2213878631591797]
test_accuracy  	 [0.47484733934283224, 0.6205835823519071, 0.553117417449098, 0.5007923930269413]
test_f1_macro  	 [0.5051672636289304, 0.561606318832955, 0.5410200655302071, 0.25]
@@@@@@@@@@@@@@@ 
@@@ Round 6 @@@
@@@@@@@@@@@@@@@
[1. 0. 2. 3. 3. 0. 1. 2.]
fit_time  	 [2700.387570619583, 2234.948934316635, 2493.2651419639587, 3344.0244669914246]
score_time  	 [6.557081699371338, 9.528404474258423, 7.923113822937012, 2.4130032062530518]
test_accuracy  	 [0.5547594677584442, 0.612673580594129, 0.5499209277807063, 0.33497536945812806]
test_f1_macro  	 [0.5171233375499282, 0.557289673850635, 0.529242166332487, 0.25]
@@@@@@@@@@@@@@@ 
@@@ Round 7 @@@
@@@@@@@@@@@@@@@
[1. 0. 2. 3. 2. 3. 0. 1.]
fit_time  	 [2675.2328929901123, 2205.2475955486298, 2588.479674100876, 3264.188013315201]
score_time  	 [6.503838300704956, 9.486050844192505, 7.428351163864136, 3.1671860218048096]
test_accuracy  	 [0.5496417604912999, 0.600843288826423, 0.5130546774382391, 0.5007936507936508]
test_f1_macro  	 [0.5110375700866188, 0.5495968715665601, 0.543148044285606, 0.25158061116965225]
@@@@@@@@@@@@@@@ 
@@@ Round 8 @@@
@@@@@@@@@@@@@@@
[1. 0. 2. 3. 1. 2. 3. 0.]
fit_time  	 [2667.922997713089, 2251.012385368347, 2483.071825027466, 3267.6658494472504]
score_time  	 [6.51307225227356, 8.647204160690308, 8.156021356582642, 3.1517128944396973]
test_accuracy  	 [0.5230120173868575, 0.5183908045977011, 0.5542778774132292, 0.5023809523809524]
test_f1_macro  	 [0.49040240376973293, 0.5409535640281503, 0.5401308960145214, 0.2547120418848168]
```

