- Strategy ; get higher accuracy and lower loss by increasing the number of parameters of the neural network
- Based on CNN1k
- In order to increase the parameters, the 1st Conv2D layer is changed as follows ;
- Before ; model.add(layers.Conv2D(64, (5, 5), activation='relu', input_shape=(28, 28, 1)))
- After ; model.add(layers.Conv2D(64, (7, 7), activation='relu', padding='same', input_shape=(28, 28, 1)))
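For reference, this raises the first layer's parameter count from 5*5*1*64 + 64 = 1,664 to 7*7*1*64 + 64 = 3,200, and padding='same' keeps the output at 28x28 (a (7,7) valid convolution would shrink it to 22x22), which also enlarges all downstream layers.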
No | batch_size | LR | BatchNormalization | Dropout | Min of val_loss | Max of val_accuracy | Score |
---|---|---|---|---|---|---|---|
00 | 32 | default | No | No | 0.03399 (epochs=20) | 0.99298 (epochs=34) | 0.99092 (epochs=20) |
01 | 32 | reducing | No | No | 0.02485 (epochs=47) | 0.99417 (epochs=47) | 0.99389 (epochs=47) |
02 | 32 | reducing (initial=0.004257) | No | No | 0.04623 (epochs=68) | 0.98798 (epochs=68) | |
03 | 32 | reducing | No | Yes (0.4) | 0.02212 (epochs=79) | 0.99476 (epochs=75) | 0.99450 (epochs=63) |
04 | 32 | reducing | No | Yes (0.4) | 0.02218 (epochs=55) | 0.99452 (epochs=73) | 0.99407 (epochs=55) |
05 | 32 | reducing | No | Yes (0.7) | 0.10244 (epochs=62) | 0.99167 (epochs=33) | |
06 | 32 | reducing | No | Yes (0.4) | 0.02138 (epochs=65) | 0.99512 (epochs=68) | 0.99507 (epochs=62) |
07 | 32 | reducing | No | Yes (0.4) | 0.02393 (epochs=75) | 0.99429 (epochs=44) | 0.99407 (epochs=75) |
08 | 32 | reducing | No | Yes (0.4) | 0.02300 (epochs=75) | 0.99452 (epochs=53) | |
09 | 32 | reducing | No | Yes (0.4) | 0.02332 (epochs=80) | 0.99452 (epochs=74) | |
10 | 32 | reducing | No | Yes (0.4) | | | 0.99432 (epochs=73) |
Standard conditions of CNN1l ;
- keras.callbacks.ReduceLROnPlateau is used to reduce the learning rate. Its parameters are as follows ;
- monitor='val_loss'
- factor=0.47
- patience=5
- min_lr=0.00001
The initial learning rate of the Adam optimizer is 0.001, so the learning rate will change as 0.001 -> 0.00047 -> 0.0002209 -> 0.000103823 -> 0.00004879681 -> 0.0000229345007 -> 0.00001077921532 .
epochs is set to 100. A sketch of this setup is shown below.
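A minimal sketch of this standard setup, assuming model is the CNN defined above; the loss function and the variable names x_train, y_train, x_val, y_val are assumptions, while the callback parameters, batch_size, and epochs are taken from the notes.

```python
from tensorflow import keras

# reduce the learning rate by factor 0.47 when val_loss stops improving for 5 epochs
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                              factor=0.47,
                                              patience=5,
                                              min_lr=0.00001)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss='categorical_crossentropy',   # assumption: one-hot labels
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=32,
                    epochs=100,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr])
```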
- Initial learning rate ; 0.004257 (larger than the default)
- The learning rate will change as 0.004257 -> 0.0020 -> 0.00094 -> 0.00044 -> 0.00020 -> 0.000098 -> 0.00005 (clipped by min_lr; see the check below)
- Parameters of keras.callbacks.ReduceLROnPlateau are as follows ;
  - monitor='val_loss'
  - factor=0.47
  - patience=5
  - min_lr=0.00005
  - verbose=1
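A quick check of this schedule, assuming the plateau triggers every time (Keras applies new_lr = max(old_lr * factor, min_lr)):

```python
lr = 0.004257
for step in range(7):
    print(f'step {step}: lr = {lr:.6f}')
    lr = max(lr * 0.47, 0.00005)   # ReduceLROnPlateau update rule
```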
- Based on 01 (learning rate reducing, starting from lr=0.001, the default)
- Dropout(0.4) is set after every Conv2D layer, as in the sketch below
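A sketch of the resulting model; the Flatten/Dense tail and the activations are assumptions inferred from the model summaries shown further below, the rest follows the layer shapes in the first summary.

```python
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(64, (7, 7), activation='relu', padding='same', input_shape=(28, 28, 1)))
model.add(layers.Dropout(0.4))   # Dropout directly after each Conv2D
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Dropout(0.4))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
```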
- Based on 03
- The 2nd Conv2D layer is changed as follows ;
- Before ; model.add(layers.Conv2D(128, (3, 3), activation='relu'))
- After ; model.add(layers.Conv2D(128, (5, 5), activation='relu'))
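This raises the 2nd layer's parameter count from 3*3*64*128 + 128 = 73,856 to 5*5*64*128 + 128 = 204,928, and shrinks its output from 12x12 to 10x10 (valid convolution on the 14x14 pooled feature map).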
- Based on 03, Dropout(0.7) is set instead of Dropout(0.4)
- Based on 03, the channels of the Conv2D layers are doubled
- Before ;
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 64)        3200
_________________________________________________________________
dropout (Dropout)            (None, 28, 28, 64)        0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 128)       73856
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 128)       0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 128)         0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 128)         147584
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 128)         0
- After ;
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 128)       6400
_________________________________________________________________
dropout (Dropout)            (None, 28, 28, 128)       0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 128)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 256)       295168
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 256)       0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 256)         0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 256)         590080
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 256)         0
- Based on 06, the channels of the Conv2D layers are doubled again
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 256)       12800
_________________________________________________________________
dropout (Dropout)            (None, 28, 28, 256)       0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 256)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 12, 12, 512)       1180160
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 512)       0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 512)         0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 512)         2359808
_________________________________________________________________
dropout_2 (Dropout)          (None, 4, 4, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 8192)              0
_________________________________________________________________
dense (Dense)                (None, 256)               2097408
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570
=================================================================
Total params: 5,652,746
Trainable params: 5,652,746
Non-trainable params: 0
_________________________________________________________________
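For reference, doubling the channels roughly quadruples each Conv2D's parameter count (e.g. conv2d_1 grows from 295,168 to 1,180,160), and the bulk of the 5.65M parameters sits in conv2d_2 (3*3*512*512 + 512 = 2,359,808) and dense (8192*256 + 256 = 2,097,408).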
- Based on 07 ; parameters of ImageDataGenerator are changed as follows (a usage sketch follows the After block)
- Before ;
datagen = ImageDataGenerator(rotation_range=30,
                             width_shift_range=0.20,
                             height_shift_range=0.20,
                             shear_range=0.2,
                             zoom_range=0.2,
                             fill_mode='nearest')
- After ;
datagen = ImageDataGenerator(rotation_range=35,
                             width_shift_range=0.25,
                             height_shift_range=0.20,
                             shear_range=2,
                             zoom_range=0.2,
                             fill_mode='nearest')
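A minimal sketch of how such a generator is typically wired into training; x_train, y_train, x_val, y_val, model, and reduce_lr are assumptions carried over from the earlier sketches. Augmentation is applied on the fly per batch, while the validation data stays unaugmented.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=35,
                             width_shift_range=0.25,
                             height_shift_range=0.20,
                             shear_range=2,
                             zoom_range=0.2,
                             fill_mode='nearest')

# x_train has shape (N, 28, 28, 1); each batch is randomly transformed
history = model.fit(datagen.flow(x_train, y_train, batch_size=32),
                    epochs=100,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr])
```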
- Based on 07, the kernel size of the 1st Conv2D is changed from (7,7) to (9,9)
- Based on 07, trained with no validation data. All data are used as training data (see the sketch below).
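A sketch of this variant; datagen and model are as above, and x_all/y_all are hypothetical names for the full labeled set. Note that ReduceLROnPlateau(monitor='val_loss') has nothing to watch without validation data, so the assumption here is that it monitors the training loss instead.

```python
from tensorflow import keras

# all labeled data used for training; no validation split
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='loss',   # assumption: switched from 'val_loss'
                                              factor=0.47, patience=5, min_lr=0.00001)
history = model.fit(datagen.flow(x_all, y_all, batch_size=32),
                    epochs=100,
                    callbacks=[reduce_lr])
```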
- 00, epochs=20 ; 0.99092
- 01, epochs=47 ; 0.99389
- 03
- epochs=79 ; 0.99392
- epochs=63 ; 0.99450 (316 / 2326 = 0.1358)
- epochs=50 ; 0.99410
- epochs=43 ; 0.99392
- 04
- epochs=55 ; 0.99407
- epochs=41 ; 0.99382
- 06
- epochs=65 ; 0.99492 (271 / 2053 = 0.1320)
- epochs=62 ; 0.99507 (260 / 2069 = 0.1257)
- 07
- epochs=75 ; 0.99407
- epochs=43 ; 0.99403
- 10
- epochs=73 ; 0.99432
- On the training data, accuracy is higher and loss is lower than for CNN1h/00.
- val_accuracy and val_loss are not stable. Their values are similar to those of CNN1h/00.
- It seems that a learning rate under 10^-4 does not work. At the beginning, a larger learning rate might be preferable (?)
- For a stable reduction of loss or val_loss, BatchNormalization may work well (see the sketch below).
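A sketch of where BatchNormalization could go, as a hypothetical variant (the table above shows it was not used in these runs); one common placement is between the convolution and its activation.

```python
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(64, (7, 7), padding='same', input_shape=(28, 28, 1)))
model.add(layers.BatchNormalization())   # normalize pre-activations for a smoother loss curve
model.add(layers.Activation('relu'))
model.add(layers.Dropout(0.4))
model.add(layers.MaxPooling2D((2, 2)))
# ... remaining blocks follow the same pattern
```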
- Bad (accuracy is low and loss is high).
- val_loss is smaller than that of 01.
- accuracy is getting higher and loss is getting lower, which agrees with the increase in the number of parameters.
- It might be better to try a "grid search" to find appropriate parameters (see the sketch at the end of this section), or to try the same cases with more parameters.
- There are more parameters in this case than in the previous one, so the Dropout rate should perhaps be higher (?)
- Bad (accuracy is low and loss is high).
- Seems good
- val_loss is lower and val_accuracy is higher than those of 03.
- loss and accuracy are better than those of 06, but val_loss and val_accuracy seem the same as those of 06.
- I think this CNN can be trained further than 06, but it cannot predict the test data better because the variation of the training data is not enough. So the parameters of ImageDataGenerator should be changed (?)
- Compared to 07, accuracy and loss are worse, but val_accuracy and val_loss are the same.
- Compared to 07, accuracy and loss are worse, but val_accuracy and val_loss are the same.
- Seems the same as 08.
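Finally, the grid search suggested above could look like this; build_model, the candidate values, and the data variables are all hypothetical.

```python
import itertools
from tensorflow import keras

results = {}
for rate, factor in itertools.product([0.3, 0.4, 0.5], [0.3, 0.47, 0.6]):
    model = build_model(dropout_rate=rate)          # assumed helper that builds the CNN above
    reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=factor,
                                                  patience=5, min_lr=0.00001)
    history = model.fit(x_train, y_train, batch_size=32, epochs=100,
                        validation_data=(x_val, y_val), callbacks=[reduce_lr],
                        verbose=0)
    results[(rate, factor)] = min(history.history['val_loss'])

best = min(results, key=results.get)
print('best (dropout, factor):', best, 'val_loss:', results[best])
```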