## Using a backbone from the pre-trained Keras ResNet50/VGG19 applications

In [1]:
%matplotlib inline
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.applications.resnet50 import *
from tensorflow.keras.datasets import cifar10
# from keras.datasets import cifar10
# from keras.applications.resnet50 import *
# from keras.layers import *
# from keras.models import *
# import keras

import numpy as np
import pandas as pd
keras.__version__

'2.2.4-tf'

In [2]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


### Let's train a binary classifier and see what happens when positive and negative samples are extremely unbalanced

In [3]:
x_train_bin = x_train.copy()
y_train_bin = y_train==0
x_train_bin.shape, y_train_bin.shape

((50000, 32, 32, 3), (50000, 1))
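To make the imbalance concrete, here is a minimal sketch that rebuilds the label layout of the CIFAR-10 training set (10 classes, 5000 images each) with synthetic labels and counts the positives. The array `labels` is a stand-in for `y_train`, not the real data.

```python
import numpy as np

# Synthetic stand-in for y_train: 10 classes x 5000 labels, shape (50000, 1).
labels = np.repeat(np.arange(10), 5000).reshape(-1, 1)

# Same conversion as in the cell above: class 0 is the positive class.
labels_bin = labels == 0

pos_fraction = labels_bin.mean()      # share of positive labels
majority_acc = 1.0 - pos_fraction     # accuracy of always predicting "negative"
print(pos_fraction, majority_acc)
```

With one positive class out of ten, a model that always answers "negative" already reaches 90% accuracy, so accuracy alone says little about this task.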

In [4]:
x = preprocess_input(x_train_bin[0])

In [5]:
x = np.expand_dims(x, axis=0)
x.shape

(1, 32, 32, 3)

In [8]:
m = ResNet50(weights='imagenet', include_top=False)


In [9]:
for layer in m.layers:
    layer.trainable = False

In [10]:
x_features = m.predict(x=x)
x_features.shape

(1, 1, 1, 2048)
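The `(1, 1, 1, 2048)` shape follows from the backbone's downsampling. Below is a rough sketch of the arithmetic, assuming the usual Keras ResNet50 layout (a padded 7x7 stride-2 convolution, a padded 3x3 stride-2 max pool, then three stages that each halve the resolution with a stride-2 1x1 convolution). The helper names are made up for illustration, and padding details may differ between Keras versions.

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of an unpadded ('valid') conv/pool applied after zero padding."""
    return (size + 2 * pad - kernel) // stride + 1

def resnet50_spatial_size(size):
    """Side length of the final feature map for a square input of side `size`."""
    size = conv_out(size, kernel=7, stride=2, pad=3)   # conv1 preceded by conv1_pad
    size = conv_out(size, kernel=3, stride=2, pad=1)   # max pooling
    for _ in range(3):                                 # stages 3, 4 and 5
        size = conv_out(size, kernel=1, stride=2)
    return size

print(resnet50_spatial_size(32))    # CIFAR-10 sized input -> 1
print(resnet50_spatial_size(224))   # ImageNet sized input -> 7
```

Under these assumptions a 32x32 image leaves a single spatial position, so the `GlobalAveragePooling2D` layer in the head has nothing to average over.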

In [11]:
x = m.output
x = GlobalAveragePooling2D()(x)
x = Dense(10, activation='relu')(x)
prediction = Dense(1, activation='sigmoid')(x)

In [12]:
baseline_model = Model(inputs=m.input, outputs=prediction)
baseline_model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            [(None, None, None,  0                                            
__________________________________________________________________________________________________
conv1_pad (ZeroPadding2D)       (None, None, None, 3 0           input_3[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, None, None, 6 9472        conv1_pad[0][0]                  
__________________________________________________________________________________________________
bn_conv1 (BatchNormalization)   (None, None, None, 6 256         conv1[0][0]                      
______________________________________________________________________________________________

In [13]:
baseline_model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['acc'])

In [14]:
x_valid_bin = x_test.copy()
# NOTE: class 1 is marked positive here, while y_train_bin marks class 0 (y_train==0)
y_valid_bin = y_test==1

In [15]:
import pandas as pd
hist = baseline_model.fit(x=x_train_bin, y=y_train_bin, validation_data=(x_valid_bin, y_valid_bin), epochs=10, 
                          verbose=1)

W0717 09:47:54.533415 140223054829376 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where


Train on 50000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [16]:
baseline_model.evaluate(x=x_valid_bin, y=y_valid_bin)



[0.4758749816417694, 0.8614]

### Train the binary classifier with skewed data

In [17]:
feats = m.output

In [18]:
x = GlobalAveragePooling2D()(feats)
x = Dense(10, activation='relu')(x)
y = Dense(1, activation='sigmoid')(x)
test_model = Model(inputs=m.inputs, outputs=y)

In [19]:
test_model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['acc'])

In [20]:
# keep class 1 only; since y_train_bin = (y_train==0), every label in this subset is False
mask = (y_train==1)
mask = np.squeeze(mask)
mask.shape

(50000,)

In [21]:
x_train_bin_sess1 = x_train_bin[mask]
x_train_bin_sess1.shape

(5000, 32, 32, 3)

In [22]:
y_train_bin_sess1 = y_train_bin[mask]
y_train_bin_sess1.shape

(5000, 1)

In [23]:
test_model.fit(x=x_train_bin_sess1, y=y_train_bin_sess1, epochs=10, verbose=1)

Train on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f8614794f28>

In [24]:
test_model.evaluate(x=x_valid_bin, y=y_valid_bin)



[1.5290701082196272, 0.9]

Apparently that won't work, since the model has only been taught examples carrying a single label. Some way of automatically supplying examples with the other label seems necessary. For now, just keep streaming in negative samples and see what happens.
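A small sanity check helps read the numbers above. The sketch below scores a hypothetical constant predictor that outputs the same tiny probability for every image, on a validation set with 10% positives (CIFAR-10 has 1000 test images per class). The value `p = 1e-7` is an arbitrary assumption, not something measured from `test_model`.

```python
import math

def constant_predictor_metrics(p, pos_fraction, threshold=0.5):
    """Accuracy and mean binary cross-entropy when every sample gets score p."""
    predicts_positive = p > threshold
    accuracy = pos_fraction if predicts_positive else 1.0 - pos_fraction
    # Positives contribute -log(p), negatives contribute -log(1 - p).
    bce = -(pos_fraction * math.log(p) + (1.0 - pos_fraction) * math.log(1.0 - p))
    return accuracy, bce

acc, loss = constant_predictor_metrics(p=1e-7, pos_fraction=0.1)
print(acc, loss)
```

High accuracy together with a high loss, as in the `[1.529..., 0.9]` result, is consistent with a predictor that has collapsed onto one label and is confidently wrong on the minority class.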

In [25]:
# this time stream in samples of class 2
mask = y_train==2
mask = np.squeeze(mask)
x_train_bin_sess2 = x_train_bin[mask]
y_train_bin_sess2 = y_train_bin[mask]
mask.shape, x_train_bin_sess2.shape, y_train_bin_sess2.shape

((50000,), (5000, 32, 32, 3), (5000, 1))

In [26]:
test_model.fit(x=x_train_bin_sess2, y=y_train_bin_sess2, epochs=10, verbose=1)

Train on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f85f72fbc50>

In [27]:
testout = test_model.predict(x_valid_bin)

The model simply collapses to an extreme: as the next cell shows, no validation prediction exceeds 0.1.

In [28]:
np.sum(testout>0.1)

0

Even for a binary classifier, skewed data hurts the model badly.
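One cheap safeguard, sketched here with a made-up helper name `has_both_labels`, is to check a session's labels before calling `fit`, so that a single-label batch is caught early. The toy labels are invented for illustration.

```python
import numpy as np

def has_both_labels(y_bin):
    """Return True if the binary label array contains positives and negatives."""
    y_flat = np.ravel(np.asarray(y_bin, dtype=bool))
    return bool(y_flat.any() and not y_flat.all())

# Toy labels standing in for y_train: three samples each of classes 0, 1, 2.
y_toy = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]).reshape(-1, 1)
y_toy_bin = y_toy == 0                       # class 0 is positive

only_class_1 = np.squeeze(y_toy == 1)        # same masking as in the sessions above
print(has_both_labels(y_toy_bin))                 # mixed labels -> True
print(has_both_labels(y_toy_bin[only_class_1]))   # one label only -> False
```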

## Train the bin classifier incrementally

Next, we manually create 'perfect' learning sessions, each pairing the positive class with one negative class, and try to beat the baseline model (about 86% validation accuracy in the evaluation above). The purpose of this experiment is to find out whether extra negative samples help the model.

In [29]:
def get_model_bin_classify(feature_extractor):
    x = feature_extractor.output
    x = GlobalAveragePooling2D()(x)
    x = Dense(10, activation='relu')(x)
    x = Dense(1, activation='sigmoid')(x)
    model = Model(inputs=feature_extractor.input, outputs=x)
    model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['acc'])
    return model

In [30]:
def get_train_samples(X_train, y_train, cls_li):
    # select the samples whose original label is in cls_li
    mask = np.zeros(y_train.shape, dtype=bool)  # np.bool was removed in NumPy 1.24
    for cls in cls_li:
        mask |= (y_train==cls)
    mask = np.squeeze(mask)
    return X_train[mask], y_train[mask]
    
def eva_bin_on_cls(model, X_valid, y_valid, cls_li, pos_samples):
    # evaluate the binary model separately on each original class in cls_li
    for cls in cls_li:
        mask = (y_valid==cls)
        mask = np.squeeze(mask)
        X_valid_cur_sess = X_valid[mask]
        y_valid_cur_sess = y_valid[mask]
        y_valid_cur_sess = convert_to_bin_samples(y_valid_cur_sess, pos_samples)
        loss, metric = model.evaluate(x=X_valid_cur_sess, y=y_valid_cur_sess)
        print('Evaluation loss: %f. Acc: %f' % (loss, metric))

def convert_to_bin_samples(labels, pos_samples):
    # labels = original class labels, pos_samples = classes to mark as positive
    re_label = np.zeros(labels.shape, dtype=bool)
    for it in pos_samples:
        re_label |= (labels == it)
    return re_label

In [31]:
feat_ext = ResNet50(weights='imagenet', include_top=False)

In [32]:
test_model = get_model_bin_classify(feat_ext)

In [33]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 1])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.158220. Acc: 0.944000
Evaluation loss: 0.068989. Acc: 0.976000


In [34]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 2])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.320182. Acc: 0.887000
Evaluation loss: 1.066504. Acc: 0.717000
Evaluation loss: 0.175457. Acc: 0.952000


In [35]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 3])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.415038. Acc: 0.898000
Evaluation loss: 0.885310. Acc: 0.788000
Evaluation loss: 0.366227. Acc: 0.915000
Evaluation loss: 0.066786. Acc: 0.972000


In [36]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 4])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.191287. Acc: 0.957000
Evaluation loss: 2.051499. Acc: 0.555000
Evaluation loss: 0.910684. Acc: 0.828000
Evaluation loss: 0.328513. Acc: 0.906000
Evaluation loss: 0.093305. Acc: 0.969000


In [37]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 5])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.198906. Acc: 0.954000
Evaluation loss: 2.235072. Acc: 0.577000
Evaluation loss: 0.929616. Acc: 0.831000
Evaluation loss: 0.146471. Acc: 0.961000
Evaluation loss: 0.240733. Acc: 0.931000
Evaluation loss: 0.084341. Acc: 0.978000


In [38]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 6])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5, 6], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.258557. Acc: 0.944000
Evaluation loss: 1.719180. Acc: 0.666000
Evaluation loss: 0.888452. Acc: 0.852000
Evaluation loss: 0.172473. Acc: 0.954000
Evaluation loss: 0.265078. Acc: 0.933000
Evaluation loss: 0.117410. Acc: 0.974000
Evaluation loss: 0.033669. Acc: 0.993000


In [39]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 7])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5, 6, 7], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.398534. Acc: 0.919000
Evaluation loss: 1.272159. Acc: 0.733000
Evaluation loss: 0.762422. Acc: 0.861000
Evaluation loss: 0.096325. Acc: 0.976000
Evaluation loss: 0.136346. Acc: 0.968000
Evaluation loss: 0.043817. Acc: 0.988000
Evaluation loss: 0.051430. Acc: 0.980000
Evaluation loss: 0.055206. Acc: 0.988000


In [40]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 8])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5, 6, 7, 8], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.451995. Acc: 0.895000
Evaluation loss: 0.321037. Acc: 0.899000
Evaluation loss: 1.492800. Acc: 0.681000
Evaluation loss: 0.511559. Acc: 0.853000
Evaluation loss: 0.646994. Acc: 0.825000
Evaluation loss: 0.413486. Acc: 0.876000
Evaluation loss: 0.394960. Acc: 0.875000
Evaluation loss: 0.766840. Acc: 0.780000
Evaluation loss: 0.150032. Acc: 0.954000


In [41]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 9])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=3)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.390695. Acc: 0.916000
Evaluation loss: 0.149020. Acc: 0.960000
Evaluation loss: 1.930993. Acc: 0.621000
Evaluation loss: 0.461053. Acc: 0.881000
Evaluation loss: 0.612185. Acc: 0.839000
Evaluation loss: 0.250886. Acc: 0.930000
Evaluation loss: 0.228665. Acc: 0.935000
Evaluation loss: 0.406943. Acc: 0.888000
Evaluation loss: 0.945941. Acc: 0.758000
Evaluation loss: 0.124537. Acc: 0.965000
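The session cells above differ only in the negative class, so they could be driven by a loop. This is a sketch, assuming any object with Keras-style `fit` and `evaluate` methods that return `[loss, acc]`; `select_classes` and `run_sessions` are hypothetical names, and the per-class accuracies are returned instead of printed.

```python
import numpy as np

def select_classes(X, y, classes):
    """Keep the samples whose original label is in `classes`."""
    mask = np.isin(np.ravel(y), classes)
    return X[mask], y[mask]

def run_sessions(model, train, test, pos_cls, neg_classes, epochs=5):
    """Train on one (positive, negative) pair per session, then evaluate on all seen classes."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    seen, history = [pos_cls], []
    for neg in neg_classes:
        X_sess, y_sess = select_classes(X_tr, y_tr, [pos_cls, neg])
        model.fit(x=X_sess, y=(y_sess == pos_cls), epochs=epochs, verbose=0)
        seen.append(neg)
        accs = {}
        for cls in seen:
            X_cls, y_cls = select_classes(X_te, y_te, [cls])
            _, accs[cls] = model.evaluate(x=X_cls, y=(y_cls == pos_cls), verbose=0)
        history.append(accs)   # one {class: accuracy} dict per session
    return history
```

With the notebook's objects this would be called as `run_sessions(test_model, (x_train, y_train), (x_test, y_test), 0, range(1, 10))`.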


The result is fairly satisfying yet confusing: why does the performance on the second class fluctuate from session to session? Let's see how a partially trained binary classifier performs on unseen data.
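To put a number on the fluctuation, the per-session accuracies printed above can be collected by class. The lists below are copied from the second and third `Acc` values of each session's output (classes 1 and 2); the helper names are made up.

```python
# Accuracy on class 1 after each of the nine sessions (negative classes 1 through 9).
acc_cls1 = [0.976, 0.717, 0.788, 0.555, 0.577, 0.666, 0.733, 0.899, 0.960]
# Accuracy on class 2, first evaluated in the session that introduces it.
acc_cls2 = [0.952, 0.915, 0.828, 0.831, 0.852, 0.861, 0.681, 0.621]

def spread(accs):
    """Difference between the best and worst accuracy in a series."""
    return max(accs) - min(accs)

def session_deltas(accs):
    """Change in accuracy from one session to the next."""
    return [round(b - a, 3) for a, b in zip(accs, accs[1:])]

print(round(spread(acc_cls1), 3), session_deltas(acc_cls1))
print(round(spread(acc_cls2), 3), session_deltas(acc_cls2))
```

Class 1 accuracy ranges from 0.555 to 0.976 even though class 1 samples are only trained on in the first session.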

## Experiments on unseen negative samples

In [42]:
X_train_sess, y_train_sess = get_train_samples(x_train, y_train, [0, 1])
y_train_sess = convert_to_bin_samples(y_train_sess, [0])
test_model.fit(x=X_train_sess, y=y_train_sess, epochs=5, verbose=1)
eva_bin_on_cls(test_model, x_test, y_test, [0, 1, 2, 3, 4, 5], [0])

Train on 10000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Evaluation loss: 0.365189. Acc: 0.931000
Evaluation loss: 0.079423. Acc: 0.983000
Evaluation loss: 2.866716. Acc: 0.533000
Evaluation loss: 0.720703. Acc: 0.832000
Evaluation loss: 1.294636. Acc: 0.697000
Evaluation loss: 0.536288. Acc: 0.856000


In [43]:
X_test_subset, Y_test_set = get_train_samples(x_test, y_test, [2])
prediction = test_model.predict(X_test_subset)

In [44]:
np.sum(prediction < 0.9) / np.squeeze(prediction).shape[0]

0.621

The model **does not generalize well to unseen negative samples**: even with the decision threshold raised to 0.9, only 62.1% of the class 2 test samples score below it, and a threshold that high will **significantly degrade performance on positive samples**. Consequently, every relevant classifier needs to be updated in turn.
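The trade-off can be sketched with a small helper that reports, for a given threshold, the share of negatives rejected and the share of positives still accepted. The scores below are invented for illustration; with the notebook's data they would come from `test_model.predict` on negative and positive test subsets.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Return (negative rejection rate, positive acceptance rate) at `threshold`."""
    rejected = sum(s < threshold for s in neg_scores) / len(neg_scores)
    accepted = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    return rejected, accepted

# Hypothetical sigmoid outputs, not measured values.
pos_scores = [0.99, 0.95, 0.85, 0.70, 0.97]
neg_scores = [0.05, 0.40, 0.92, 0.96, 0.60]

for t in (0.5, 0.9):
    print(t, rates_at_threshold(pos_scores, neg_scores, t))
```

In this toy data, raising the threshold from 0.5 to 0.9 rejects more negatives (0.4 to 0.6) but also drops positive acceptance from 1.0 to 0.6, which is the kind of cost described above.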

## Conclusion
- Performance on previously seen negative classes is disturbed when the model is trained on a newly arrived negative class; sometimes it improves, sometimes it degrades.
- The model's generalization appears to improve once enough negative classes have been trained on.
- The model performs poorly on unseen negative samples, which underlines the need to update it whenever a new negative class arrives.