# Synthetic Minority Oversampling Technique (SMOTE) for Imbalanced Classification
1. [Blog: SMOTE for Imbalanced Classification with Python](https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/)
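
The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal NumPy sketch of that idea only, not the `imblearn` implementation used later in this notebook; the function name `smote_sketch`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly picked sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)             # starting samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour per start
    gap = rng.random((n_new, 1))                   # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class: the four corners of the unit square (illustrative data only)
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_sketch(X_min, n_new=6)
print(X_new.shape)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already covered by the minority class.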

# Neural Architecture Search Network (NASNet)
1. [Paper: Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012)
2. [Keras doc: NasNetLarge and NasNetMobile](https://keras.io/api/applications/nasnet/#nasnetlarge-function)

## Preprocessing
### Loading the Data

In [1]:
from others import load_all_dataset, rename_dataset
X_train, y_train, X_test, y_test = load_all_dataset()
import numpy as np
np.set_printoptions(edgeitems=5,
                    linewidth=1000,
                    formatter={"float":lambda x: "{:.3f}".format(x)})

Train data
Optical Dataset composed of
46110 source samples
50862 source background samples
438 target labeled samples
8202 target unlabeled samples
29592 target background samples
 Optical Dataset labels composed of
46110 labels of source samples
438 labels of target samples

Test data
Optical Dataset composed of
0 source samples
0 source background samples
17758 target labeled samples
0 target unlabeled samples
47275 target background samples
 Optical Dataset labels composed of
0 labels of source samples
17758 labels of target samples



In [1]:
# Remove NaN values
import numpy as np
class FeatureExtractor:
    def transform(self, X):
        '''
        Parameters
        ----------
        `X`: ndarray of (sample, 672, 10)
            3D input dataset (sample, time, features)
        
        Returns
        -------
        `X`: ndarray of (sample, 6720)
            The filtered dataset
        '''
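        # Replace NaN with 0 in place (this modifies the caller's array),
        # then flatten the (time, features) axes into a single feature axis
        np.nan_to_num(X, copy=False)
        X = X.reshape(X.shape[0], -1)
        return X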

fe = FeatureExtractor()
[X_source, X_source_bkg, X_target, X_target_unlabeled, X_target_bkg,
    y_source, y_target, X_test] = rename_dataset(
    fe, X_train, y_train, X_test, y_test, show_imbalance=True)

NameError: name 'rename_dataset' is not defined

### Preparing the Data (Normalization, Oversampling, ...)

In [3]:
import imblearn as il
from collections import Counter
over = il.over_sampling.SMOTE(sampling_strategy=0.5) # minority/majority ratio
X_source, y_source = over.fit_resample(X_source, y_source)

under = il.under_sampling.RandomUnderSampler(sampling_strategy=1.0)
X_source, y_source = under.fit_resample(X_source, y_source)

print(X_source.shape, y_source.shape)
print(Counter(y_source))

(41340, 6720) (41340,)
Counter({0.0: 20670, 1.0: 20670})
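
The final counts can be checked with a little arithmetic. The sketch below assumes that class 0.0 is the majority class and that the resamplers truncate the target count to an integer; the original class split is inferred from the printed results (46110 source samples, 20670 per class at the end) rather than printed by the notebook. The helper `resampled_counts` is hypothetical.

```python
def resampled_counts(n_majority, n_minority, over_ratio, under_ratio):
    """Class counts after oversampling to minority/majority = over_ratio,
    followed by random undersampling to minority/majority = under_ratio."""
    # Oversampling only adds minority samples
    n_minority = max(n_minority, int(over_ratio * n_majority))
    # Undersampling only drops majority samples
    n_majority = min(n_majority, int(n_minority / under_ratio))
    return n_majority, n_minority

n_total = 46110             # source samples reported when the data was loaded
n_majority = 2 * 20670      # inferred: SMOTE at 0.5 left the minority at half the majority
n_minority = n_total - n_majority   # inferred original minority count

print(n_majority, n_minority)
print(resampled_counts(n_majority, n_minority, over_ratio=0.5, under_ratio=1.0))
```

Under these assumptions, SMOTE raises the minority class to half the majority count, and the undersampler then trims the majority class down to the same size, which matches the balanced `Counter` above.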


## Building the Model

In [4]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
model_DT = DecisionTreeClassifier(max_depth=2, random_state=44)
model_RF = RandomForestClassifier(
    n_estimators=2, max_depth=2, random_state=44, n_jobs=-1)

# Define the pipeline (currently disabled)
# over = il.over_sampling.SMOTE(sampling_strategy=0.1)
# under = il.under_sampling.RandomUnderSampler(sampling_strategy=0.5)
# steps = [('over', over), ('under', under), ('model', model_DT)]
# pipeline_DT = il.pipeline.Pipeline(steps=steps)

# steps = [('over', over), ('under', under), ('model', model_RF)]
# pipeline_RF = il.pipeline.Pipeline(steps=steps)

In [5]:
model_DT.fit(X_source, y_source)
model_RF.fit(X_source, y_source)

RandomForestClassifier(max_depth=2, n_estimators=2, n_jobs=-1, random_state=44)

## Predicting Probabilities

In [7]:
print("X_test.target.shape:", X_test.target.shape)
y_pred = model_DT.predict(X_test.target)
# For each model, the first score is the accuracy on the labeled target
# training samples and the second is the accuracy on the target test set
print("Decision Tree:", model_DT.score(X_target, y_target))
print("Decision Tree:", model_DT.score(X_test.target, y_test.target))
# Compare the predicted class distribution with the true one
print("Predicted:", Counter(y_pred), y_pred.shape)
print("True:      ", Counter(y_test.target), y_test.target.shape)

print("Random Forest:", model_RF.score(X_target, y_target))
print("Random Forest:", model_RF.score(X_test.target, y_test.target))

X_test.target.shape: (17758, 6720)
Decision Tree: 0.7054794520547946
Decision Tree: 0.7059353530803019
Predicted: Counter({0.0: 13356, 1.0: 4402}) (17758,)
True:       Counter({0.0: 15464, 1.0: 2294}) (17758,)
Random Forest: 0.7191780821917808
Random Forest: 0.7348800540601419
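
Accuracy alone is hard to interpret on an imbalanced test set. The sketch below uses only the numbers printed above for the Decision Tree on the test set to compute the accuracy of always predicting class 0.0 and to reconstruct the confusion matrix. It assumes binary labels with 1.0 as the positive class; the variable names are hypothetical.

```python
n_total = 17758
actual_neg, actual_pos = 15464, 2294     # "True" counter in the output
pred_pos = 4402                          # predicted 1.0 in the "Predicted" counter
accuracy = 0.7059353530803019            # Decision Tree score on the test set

# Accuracy of a trivial model that always predicts the majority class 0.0
majority_baseline = actual_neg / n_total

n_correct = round(accuracy * n_total)    # TP + TN
n_errors = n_total - n_correct           # FP + FN
# FP = pred_pos - TP and FN = actual_pos - TP,
# so FP + FN = pred_pos + actual_pos - 2 * TP
tp = (pred_pos + actual_pos - n_errors) // 2
fp = pred_pos - tp
fn = actual_pos - tp
tn = actual_neg - fp

precision = tp / pred_pos
recall = tp / actual_pos
print(f"majority baseline accuracy: {majority_baseline:.4f}")
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under these assumptions, both models score below the majority-class baseline of about 0.87, and the Decision Tree recovers roughly a third of the positive samples, so metrics such as precision, recall, or balanced accuracy are likely more informative here than plain accuracy.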


## Viewing TensorBoard

In [None]:
%tensorboard