### Objective
Compare inference times for keras_retinanet with different backbones and image sizes, on CPU and GPU. <br>
The measurements were run on the following hardware: <br>
<b>CPU</b>: Intel® Core™ i9-9880H CPU @ 2.30GHz × 16, 16 GB RAM <br>
<b>GPU</b>: GeForce RTX 2080 with Max-Q Design, 8 GB memory

### Results summary (to save scrolling)
The number in parentheses shows inference time relative to the resnet50 backbone at the same image size. All times are mean seconds per image.
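
As a minimal sketch of how the relative factors can be recomputed (the dictionary layout and the helper name `relative_to_baseline` are illustrative assumptions, not part of the notebook), using two columns copied from the CPU table:

```python
# seconds per image, copied from the CPU table (two sizes only, for brevity)
cpu_times = {
    'resnet50':     {'1333x800': 0.662, '2000x1500': 2.171},
    'mobilenet128': {'1333x800': 0.364, '2000x1500': 1.140},
}


def relative_to_baseline(times, baseline='resnet50'):
    """Divide every timing by the baseline backbone's timing at the same size."""
    base = times[baseline]
    return {
        backbone: {size: round(t / base[size], 2) for size, t in per_size.items()}
        for backbone, per_size in times.items()
    }


ratios = relative_to_baseline(cpu_times)
for backbone, per_size in ratios.items():
    print(backbone, per_size)
```

A factor below 1 means the backbone is faster than resnet50 at that size.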

#### CPU

| backbone           | 1333x800       |  2000x1500      |  3000x2250     |  4000x3000       |
| -------------------| -------------- |-----------------|----------------|------------------|
| resnet50           |  0.662 (1x)    |   2.171 (1x)    |  5.055 (1x)    |   9.969 (1x)     |
| mobilenet128       |  0.364 (0.55x) |   1.140 (0.53x) |  2.678 (0.53x) |   4.767 (0.48x)  |
| mobilenet224       |  0.372 (0.56x) |   3.323 (1.53x) |  2.641 (0.52x) |   4.708 (0.47x)  |
| mobilenetv3_small  |  0.434 (0.65x) |   1.379 (0.64x) |  3.565 (0.71x) |   5.869 (0.59x)  |
| EfficientNetB0     |  0.868 (1.31x) |   2.959 (1.36x) | 11.656 (2.31x) |  12.571 (1.26x)  |
| EfficientNetB1     |  1.117 (1.69x) |   3.779 (1.74x) |  9.125 (1.81x) |  16.296 (1.63x)  |

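Note: the mobilenet224 value for 2000x1500 comes from a 20-sample rerun; the first 100-sample run gave 7.839 s.

Some smaller sizes: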

| backbone           |   224x224      |   500x375       |  1000x750      |  
| -------------------| -------------- |-----------------|----------------|
| resnet50           |  0.042 (1x)    |   0.134 (1x)    |  0.544 (1x)    |
| mobilenet128       |  0.026 (0.62x) |   0.082 (0.61x) |  0.285 (0.52x) |
| mobilenetv3_small  |  0.037 (0.88x) |   0.097 (0.72x) |  0.336 (0.62x) |
| EfficientNetB0     |  0.055 (1.31x) |   0.166 (1.24x) |  0.671 (1.23x) |

#### GPU

| backbone           | 1333x800       |  2000x1500      |  3000x2250     |  4000x3000       |
| -------------------| -------------- |-----------------|----------------|------------------|
| resnet50           |  0.094 (1x)    |   0.230 (1x)    |  0.519 (1x)    |   0.820 (1x)     | 
| mobilenet128       |  0.071 (0.75x) |   0.112 (0.49x) |  0.259 (0.50x) |   0.448 (0.55x)  |
| mobilenet224       |  0.065 (0.69x) |   0.106 (0.46x) |  0.234 (0.45x) |   0.420 (0.51x)  |
| mobilenetv3_small  |  0.077 (0.82x) |   0.129 (0.56x) |  0.282 (0.54x) |   0.475 (0.58x)  |
| EfficientNetB0     |  0.084 (0.89x) |   0.195 (0.85x) |  0.439 (0.85x) |   0.779 (0.95x)  |
| EfficientNetB1     |  0.088 (0.94x) |   0.243 (1.05x) |  0.535 (1.03x) |   0.940 (1.15x)  |

#### Larger sizes with inference time similar to resnet50 at 1333x800 (CPU)

| backbone           | Size       |  Inference time | 
| -------------------| -----------|-----------------|
| resnet50           |  1333x800  |   0.662         |
| mobilenet128       |  1792x1076 |   0.667         |
| mobilenet224       |  1716x1030 |   0.615         |
| mobilenetv3_small  |  1635x981  |   0.661         |

### Experiment code and output

In [1]:
import numpy as np
import progressbar
import random
import time
# keras_retinanet must be installed in the system or virtualenv,
# for instance by running 'pip install . --user' in the lacmus project directory
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image
from keras_retinanet.preprocessing.pascal_voc import PascalVocGenerator
from keras_retinanet.utils.gpu import setup_gpu
from keras_retinanet import models

Using TensorFlow backend.


In [35]:
def measure_processing_time(model, generator, samples_count=100):
    inference_time = 0.0
    # time for loading, preprocessing, resizing etc. 
    accessory_time = 0.0
    
    # warm up
    image = generator.load_image(0)
    for i in range(3):
        boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    
    # run and measure
    for i in progressbar.progressbar(range(samples_count)):
        start = time.time()
        image_index = random.randint(0, generator.size() - 1)
        image = generator.load_image(image_index) 
        image = generator.preprocess_image(image)
        #image, scale = generator.resize_image(image)
        scale = 1.0
        accessory_end = time.time()
        accessory_time += accessory_end - start
        
        # process image
        boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
        inference_end = time.time()
        inference_time += inference_end - accessory_end
        
        # correct for image scale
        boxes /= scale
        accessory_time += time.time() - inference_end
        
    return inference_time / samples_count, accessory_time / samples_count
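
The function above reports only the mean over `samples_count` runs, so a single stalled run can shift the result noticeably. A hedged sketch of a complementary helper that keeps per-call durations and reports mean, median and standard deviation; `time_calls` and `summarize` are hypothetical names, and the lambda is a stand-in workload rather than a model call:

```python
import statistics
import time


def time_calls(fn, repeats=20, warmup=3):
    """Return per-call wall-clock durations of fn() in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Mean, median and sample standard deviation of a list of durations."""
    return {
        'mean': statistics.mean(durations),
        'median': statistics.median(durations),
        'stdev': statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


# stand-in workload; in this notebook it would wrap model.predict_on_batch(...)
stats = summarize(time_calls(lambda: sum(range(10000)), repeats=10))
print(stats)
```

A median far below the mean, or a large standard deviation, suggests that a few slow runs dominate the average.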

In [3]:
dataset_path = '../data/laddv4/full'

In [4]:
def create_model(backbone_name, num_classes=1):
    backbone_factory = models.backbone(backbone_name)
    model = backbone_factory.retinanet(num_classes)
    return models.convert_model(model)

In [5]:
sizes = [
    (1333, 800),
    (2000, 1500),
    (3000, 2250),
    (4000, 3000)
]

### CPU

In [6]:
backbone = 'resnet50'
model = create_model(backbone)
for (max_side, min_side) in sizes:
    generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
    inference, _ = measure_processing_time(model, generator)
    print(backbone, str(max_side) + 'x' + str(min_side), inference)


tracking <tf.Variable 'Variable:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_1:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_2:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_3:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_4:0' shape=(9, 4) dtype=float32> anchors



100% (100 of 100) |######################| Elapsed Time: 0:01:32 Time:  0:01:32


resnet50 1333x800 0.6628084802627563


100% (100 of 100) |######################| Elapsed Time: 0:04:04 Time:  0:04:04


resnet50 2000x1500 2.1718873810768127


100% (100 of 100) |######################| Elapsed Time: 0:08:53 Time:  0:08:53


resnet50 3000x2250 5.055593535900116


100% (100 of 100) |######################| Elapsed Time: 0:17:05 Time:  0:17:05


resnet50 4000x3000 9.969695029258729


In [7]:
mobile_backbones = [
    'mobilenet128_0.1',
    'mobilenet224_0.1',
    'mobilenet_v3_small'
]

In [8]:
for backbone in mobile_backbones:
    model = create_model(backbone)
    for (max_side, min_side) in sizes:
        generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
        inference, _ = measure_processing_time(model, generator)
        print(backbone, str(max_side) + 'x' + str(min_side), inference)

tracking <tf.Variable 'Variable_5:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_6:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_7:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_8:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_9:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:01:03 Time:  0:01:03


mobilenet128_0.1 1333x800 0.3642017483711243


100% (100 of 100) |######################| Elapsed Time: 0:02:21 Time:  0:02:21


mobilenet128_0.1 2000x1500 1.1396586179733277


100% (100 of 100) |######################| Elapsed Time: 0:04:56 Time:  0:04:56


mobilenet128_0.1 3000x2250 2.67762978553772


100% (100 of 100) |######################| Elapsed Time: 0:08:26 Time:  0:08:26


mobilenet128_0.1 4000x3000 4.766540586948395
tracking <tf.Variable 'Variable_10:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_11:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_12:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_13:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_14:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:01:03 Time:  0:01:03


mobilenet224_0.1 1333x800 0.3723181390762329


100% (100 of 100) |######################| Elapsed Time: 0:13:31 Time:  0:13:31


mobilenet224_0.1 2000x1500 7.838997864723206


100% (100 of 100) |######################| Elapsed Time: 0:04:50 Time:  0:04:50


mobilenet224_0.1 3000x2250 2.640888912677765


100% (100 of 100) |######################| Elapsed Time: 0:08:18 Time:  0:08:18


mobilenet224_0.1 4000x3000 4.70863573551178
tracking <tf.Variable 'Variable_15:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_16:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_17:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_18:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_19:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:01:08 Time:  0:01:08


mobilenet_v3_small 1333x800 0.43451151371002195


100% (100 of 100) |######################| Elapsed Time: 0:02:43 Time:  0:02:43


mobilenet_v3_small 2000x1500 1.3799791264533996


100% (100 of 100) |######################| Elapsed Time: 0:06:22 Time:  0:06:22


mobilenet_v3_small 3000x2250 3.5652736115455625


100% (100 of 100) |######################| Elapsed Time: 0:10:16 Time:  0:10:16


mobilenet_v3_small 4000x3000 5.869212965965271


In [9]:
efficientnet_backbones = [
    'EfficientNetB0',
    'EfficientNetB1',
    #'EfficientNetB2',
    #'EfficientNetB3',
    'EfficientNetB4',
    #'EfficientNetB5',
    #'EfficientNetB6',
    'EfficientNetB7'
]

In [10]:
for backbone in efficientnet_backbones:
    model = create_model(backbone)
    for (max_side, min_side) in sizes:
        generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
        inference, _ = measure_processing_time(model, generator)
        print(backbone, str(max_side) + 'x' + str(min_side), inference)

tracking <tf.Variable 'Variable_20:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_21:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_22:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_23:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_24:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:01:52 Time:  0:01:52


EfficientNetB0 1333x800 0.8687555909156799


100% (100 of 100) |######################| Elapsed Time: 0:05:22 Time:  0:05:22


EfficientNetB0 2000x1500 2.959630126953125


100% (100 of 100) |######################| Elapsed Time: 0:19:53 Time:  0:19:53


EfficientNetB0 3000x2250 11.656407475471497


100% (100 of 100) |######################| Elapsed Time: 0:21:26 Time:  0:21:26


EfficientNetB0 4000x3000 12.571136150360108
tracking <tf.Variable 'Variable_25:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_26:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_27:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_28:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_29:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:02:17 Time:  0:02:17


EfficientNetB1 1333x800 1.1175025749206542


100% (100 of 100) |######################| Elapsed Time: 0:06:44 Time:  0:06:44


EfficientNetB1 2000x1500 3.779855136871338


100% (100 of 100) |######################| Elapsed Time: 0:15:39 Time:  0:15:39


EfficientNetB1 3000x2250 9.125654451847076


100% (100 of 100) |######################| Elapsed Time: 0:27:39 Time:  0:27:39


EfficientNetB1 4000x3000 16.296377265453337
tracking <tf.Variable 'Variable_30:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_31:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_32:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_33:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_34:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:03:45 Time:  0:03:45


EfficientNetB4 1333x800 1.9716568541526795


  6% (6 of 100) |#                       | Elapsed Time: 0:00:38 ETA:   0:08:42

KeyboardInterrupt: 

Rerun of the anomalous mobilenet224 2000x1500 result (7.84 s above)

In [14]:
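backbone = 'mobilenet224_0.1'
model = create_model(backbone)
# set the size explicitly instead of relying on values left over from the interrupted loop
max_side, min_side = 2000, 1500
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)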

inference, _ = measure_processing_time(model, generator, samples_count=20)
print(backbone, str(max_side) + 'x' + str(min_side), inference)

100% (20 of 20) |########################| Elapsed Time: 0:01:13 Time:  0:01:13


mobilenet224_0.1 2000x1500 3.3226078152656555
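
One simple way to spot results like this is to check that the time does not drop when the image gets larger. A sketch, assuming a hypothetical helper named `non_monotonic_sizes` and using the mobilenet224 CPU timings printed above (rounded):

```python
def non_monotonic_sizes(times_by_size):
    """Return labels whose time exceeds the time of the next larger size.

    times_by_size: list of (label, seconds), ordered from smallest to largest image.
    """
    flagged = []
    for (label, seconds), (_, next_seconds) in zip(times_by_size, times_by_size[1:]):
        if seconds > next_seconds:
            flagged.append(label)
    return flagged


# mobilenet224_0.1 on CPU, first 100-sample run
mobilenet224_cpu = [
    ('1333x800', 0.372),
    ('2000x1500', 7.839),
    ('3000x2250', 2.641),
    ('4000x3000', 4.709),
]
print(non_monotonic_sizes(mobilenet224_cpu))
```

With the rerun value 3.323 in place of 7.839, the same size is still flagged, because it remains above the 3000x2250 time.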


Some smaller sizes

In [18]:
small_sizes = [
    (224, 224),
    (500, 375),
    (1000, 750)
]

In [19]:
backbones = [
    'resnet50',
    'mobilenet128_0.1',
    'mobilenet_v3_small',
    'EfficientNetB0'
]

In [20]:
for backbone in backbones:
    model = create_model(backbone)
    for (max_side, min_side) in small_sizes:
        generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
        inference, _ = measure_processing_time(model, generator)
        print(backbone, str(max_side) + 'x' + str(min_side), inference)

tracking <tf.Variable 'Variable_60:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_61:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_62:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_63:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_64:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:26 Time:  0:00:26


resnet50 224x224 0.04232275009155274


100% (100 of 100) |######################| Elapsed Time: 0:00:37 Time:  0:00:37


resnet50 500x375 0.13475305080413819


100% (100 of 100) |######################| Elapsed Time: 0:01:20 Time:  0:01:20


resnet50 1000x750 0.544783263206482
tracking <tf.Variable 'Variable_65:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_66:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_67:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_68:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_69:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:25 Time:  0:00:25


mobilenet128_0.1 224x224 0.026973366737365723


100% (100 of 100) |######################| Elapsed Time: 0:00:31 Time:  0:00:31


mobilenet128_0.1 500x375 0.08248742818832397


100% (100 of 100) |######################| Elapsed Time: 0:00:52 Time:  0:00:52


mobilenet128_0.1 1000x750 0.2859938931465149
tracking <tf.Variable 'Variable_70:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_71:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_72:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_73:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_74:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:27 Time:  0:00:27


mobilenet_v3_small 224x224 0.03709912061691284


100% (100 of 100) |######################| Elapsed Time: 0:00:34 Time:  0:00:34


mobilenet_v3_small 500x375 0.09725175619125366


100% (100 of 100) |######################| Elapsed Time: 0:00:58 Time:  0:00:58


mobilenet_v3_small 1000x750 0.33656031847000123
tracking <tf.Variable 'Variable_75:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_76:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_77:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_78:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_79:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:30 Time:  0:00:30


EfficientNetB0 224x224 0.05594579219818115


100% (100 of 100) |######################| Elapsed Time: 0:00:42 Time:  0:00:42


EfficientNetB0 500x375 0.1666136837005615


100% (100 of 100) |######################| Elapsed Time: 0:01:33 Time:  0:01:33


EfficientNetB0 1000x750 0.6717886352539062


### GPU

In [6]:
setup_gpu(0)






In [7]:
backbone = 'resnet50'
model = create_model(backbone)
for (max_side, min_side) in sizes:
    generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
    inference, _ = measure_processing_time(model, generator)
    print(backbone, str(max_side) + 'x' + str(min_side), inference)


tracking <tf.Variable 'Variable:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_1:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_2:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_3:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_4:0' shape=(9, 4) dtype=float32> anchors



100% (100 of 100) |######################| Elapsed Time: 0:00:36 Time:  0:00:36


resnet50 1333x800 0.09466996431350708


100% (100 of 100) |######################| Elapsed Time: 0:00:51 Time:  0:00:51


resnet50 2000x1500 0.23067954540252686


100% (100 of 100) |######################| Elapsed Time: 0:01:17 Time:  0:01:17


resnet50 3000x2250 0.5190374398231506


100% (100 of 100) |######################| Elapsed Time: 0:01:51 Time:  0:01:51


resnet50 4000x3000 0.8200269746780395
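
Dividing the CPU timings for the same backbone by these GPU timings gives the speedup on this hardware. A small sketch using the resnet50 values printed in this notebook, rounded to four decimals; the variable names are illustrative:

```python
# seconds per image for resnet50, rounded from the CPU and GPU outputs
cpu_seconds = {'1333x800': 0.6628, '2000x1500': 2.1719, '3000x2250': 5.0556, '4000x3000': 9.9697}
gpu_seconds = {'1333x800': 0.0947, '2000x1500': 0.2307, '3000x2250': 0.5190, '4000x3000': 0.8200}

# speedup factor per image size: how many times faster the GPU run was
speedups = {size: cpu_seconds[size] / gpu_seconds[size] for size in cpu_seconds}
for size, factor in speedups.items():
    print(f'{size}: GPU is {factor:.1f}x faster than CPU')
```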


In [8]:
mobile_backbones = [
    'mobilenet128_0.1',
    'mobilenet224_0.1',
    'mobilenet_v3_small'
]

In [9]:
for backbone in mobile_backbones:
    model = create_model(backbone)
    for (max_side, min_side) in sizes:
        generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
        inference, _ = measure_processing_time(model, generator)
        print(backbone, str(max_side) + 'x' + str(min_side), inference)

tracking <tf.Variable 'Variable_5:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_6:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_7:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_8:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_9:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:33 Time:  0:00:33


mobilenet128_0.1 1333x800 0.0714805269241333


100% (100 of 100) |######################| Elapsed Time: 0:00:38 Time:  0:00:38


mobilenet128_0.1 2000x1500 0.11212804794311523


100% (100 of 100) |######################| Elapsed Time: 0:00:54 Time:  0:00:54


mobilenet128_0.1 3000x2250 0.25909081935882566


100% (100 of 100) |######################| Elapsed Time: 0:01:13 Time:  0:01:13


mobilenet128_0.1 4000x3000 0.44818278789520266
tracking <tf.Variable 'Variable_10:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_11:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_12:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_13:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_14:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:32 Time:  0:00:32


mobilenet224_0.1 1333x800 0.0659174394607544


100% (100 of 100) |######################| Elapsed Time: 0:00:38 Time:  0:00:38


mobilenet224_0.1 2000x1500 0.10697633743286133


100% (100 of 100) |######################| Elapsed Time: 0:00:51 Time:  0:00:51


mobilenet224_0.1 3000x2250 0.23436393737792968


100% (100 of 100) |######################| Elapsed Time: 0:01:10 Time:  0:01:10


mobilenet224_0.1 4000x3000 0.42085651636123655
tracking <tf.Variable 'Variable_15:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_16:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_17:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_18:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_19:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:33 Time:  0:00:33


mobilenet_v3_small 1333x800 0.07704132795333862


100% (100 of 100) |######################| Elapsed Time: 0:00:41 Time:  0:00:41


mobilenet_v3_small 2000x1500 0.129308979511261


100% (100 of 100) |######################| Elapsed Time: 0:00:56 Time:  0:00:56


mobilenet_v3_small 3000x2250 0.2826223611831665


100% (100 of 100) |######################| Elapsed Time: 0:01:14 Time:  0:01:14


mobilenet_v3_small 4000x3000 0.47521668434143066


In [10]:
efficientnet_backbones = [
    'EfficientNetB0',
    'EfficientNetB1',
    'EfficientNetB2',
    'EfficientNetB3',
    'EfficientNetB4',
    'EfficientNetB5',
    'EfficientNetB6',
    'EfficientNetB7'
]

In [None]:
for backbone in efficientnet_backbones:
    model = create_model(backbone)
    for (max_side, min_side) in sizes:
        generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=min_side, image_max_side=max_side)
        inference, _ = measure_processing_time(model, generator)
        print(backbone, str(max_side) + 'x' + str(min_side), inference)

tracking <tf.Variable 'Variable_20:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_21:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_22:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_23:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_24:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:34 Time:  0:00:34


EfficientNetB0 1333x800 0.08422770500183105


100% (100 of 100) |######################| Elapsed Time: 0:00:47 Time:  0:00:47


EfficientNetB0 2000x1500 0.1957192587852478


100% (100 of 100) |######################| Elapsed Time: 0:01:11 Time:  0:01:11


EfficientNetB0 3000x2250 0.43989492654800416


100% (100 of 100) |######################| Elapsed Time: 0:01:46 Time:  0:01:46


EfficientNetB0 4000x3000 0.7795923185348511
tracking <tf.Variable 'Variable_25:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_26:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_27:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_28:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_29:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:35 Time:  0:00:35


EfficientNetB1 1333x800 0.08809706449508667


100% (100 of 100) |######################| Elapsed Time: 0:00:52 Time:  0:00:52


EfficientNetB1 2000x1500 0.24392809867858886


100% (100 of 100) |######################| Elapsed Time: 0:01:20 Time:  0:01:20


EfficientNetB1 3000x2250 0.5353184771537781


100% (100 of 100) |######################| Elapsed Time: 0:02:02 Time:  0:02:02


EfficientNetB1 4000x3000 0.9409803867340087
tracking <tf.Variable 'Variable_30:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_31:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_32:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_33:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_34:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:36 Time:  0:00:36


EfficientNetB2 1333x800 0.09191283464431763


100% (100 of 100) |######################| Elapsed Time: 0:00:54 Time:  0:00:54


EfficientNetB2 2000x1500 0.2640200686454773


100% (100 of 100) |######################| Elapsed Time: 0:01:23 Time:  0:01:23


EfficientNetB2 3000x2250 0.5759849405288696


100% (100 of 100) |######################| Elapsed Time: 0:02:10 Time:  0:02:10


EfficientNetB2 4000x3000 1.02139301776886
tracking <tf.Variable 'Variable_35:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_36:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_37:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_38:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_39:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:35 Time:  0:00:35


EfficientNetB3 1333x800 0.10080732583999634


100% (100 of 100) |######################| Elapsed Time: 0:00:58 Time:  0:00:58


EfficientNetB3 2000x1500 0.2978340244293213


100% (100 of 100) |######################| Elapsed Time: 0:01:33 Time:  0:01:33


EfficientNetB3 3000x2250 0.6732856345176697


100% (100 of 100) |######################| Elapsed Time: 0:02:25 Time:  0:02:25


EfficientNetB3 4000x3000 1.1714683508872985
tracking <tf.Variable 'Variable_40:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_41:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_42:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_43:0' shape=(9, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_44:0' shape=(9, 4) dtype=float32> anchors


100% (100 of 100) |######################| Elapsed Time: 0:00:38 Time:  0:00:38


EfficientNetB4 1333x800 0.12273181915283203


100% (100 of 100) |######################| Elapsed Time: 0:01:03 Time:  0:01:03


EfficientNetB4 2000x1500 0.3567672824859619


100% (100 of 100) |######################| Elapsed Time: 0:01:47 Time:  0:01:47


EfficientNetB4 3000x2250 0.8189979839324951


 74% (74 of 100) |#################      | Elapsed Time: 0:02:04 ETA:   0:00:47

### Finding larger sizes with the same inference time as resnet50 at 1333x800

In [5]:
setup_gpu('cpu')

In [None]:
resnet50 = create_model('resnet50')
mobilenet128 = create_model('mobilenet128_0.1')
mobilenet224 = create_model('mobilenet224_0.1')
mobilenetv3 = create_model('mobilenet_v3_small') 

In [8]:
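def get_relatively_increased_size(acc_samples, width=1333, height=800):
    # acc_samples: inference-time ratios relative to resnet50 (the factors from the result tables)
    avg = sum(acc_samples) / len(acc_samples)
    # target pixel area, assuming inference time is proportional to area
    square_increased = width * height * (1.0 / avg)
    square_original = width * height
    relative = square_increased / square_original
    # scale both sides by the square root of the area ratio to keep the aspect ratio
    return width * relative**0.5, height * relative**0.5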
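
The function above appears to assume that inference time grows roughly in proportion to pixel count: if a backbone takes a fraction `avg` of the resnet50 time, its image area can grow by `1 / avg`, so each side grows by the square root of that. A condensed sketch of the same arithmetic, with the hypothetical name `equal_time_size` and the mobilenet128 ratios used in this notebook:

```python
def equal_time_size(ratios, width=1333, height=800):
    """Scale width and height so the expected time matches the baseline.

    Assumes time is proportional to pixel area; ratios are time factors
    relative to the baseline backbone.
    """
    avg_ratio = sum(ratios) / len(ratios)
    side_scale = (1.0 / avg_ratio) ** 0.5  # area scales by 1/avg, each side by its square root
    return width * side_scale, height * side_scale


m128_ratios = [0.55, 0.53, 0.53, 0.48, 0.62, 0.61, 0.52, 0.55, 0.45, 0.49, 0.75]
new_w, new_h = equal_time_size(m128_ratios)
print(round(new_w), round(new_h))  # 1793 1076
```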

#### Mobilenet128

In [9]:
m128_acc = [0.55, 0.53, 0.53, 0.48, 0.62, 0.61, 0.52, 0.55, 0.45, 0.49, 0.75]
new_width, new_height = get_relatively_increased_size(m128_acc)
print(new_width, new_height)

1792.9769331741936 1076.0551736979407


In [10]:
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=new_height, image_max_side=new_width)
inference128, _ = measure_processing_time(mobilenet128, generator)
print(inference128)




100% (100 of 100) |######################| Elapsed Time: 0:01:33 Time:  0:01:33


0.6672049713134766


#### Mobilenet224

In [19]:
m224_acc = [0.56, 0.52, 0.47, 0.69, 0.46, 0.45, 0.51]
new_width, new_height = get_relatively_increased_size(m224_acc)
print(new_width, new_height)

1843.4808338302444 1106.3650915710393


In [20]:
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=new_height, image_max_side=new_width)
inference224, _ = measure_processing_time(mobilenet224, generator)
print(inference224)

100% (100 of 100) |######################| Elapsed Time: 0:01:39 Time:  0:01:39


0.7177152252197265


In [21]:
new_width *= 0.99
new_height *= 0.99
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=new_height, image_max_side=new_width)
inference224, _ = measure_processing_time(mobilenet224, generator)
print(inference224)

100% (100 of 100) |######################| Elapsed Time: 0:01:36 Time:  0:01:36


0.6885157752037049


In [23]:
new_width *= 0.95
new_height *= 0.95
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=new_height, image_max_side=new_width)
inference224, _ = measure_processing_time(mobilenet224, generator)
print(inference224)

100% (100 of 100) |######################| Elapsed Time: 0:01:30 Time:  0:01:30


0.6157496452331543


In [24]:
print(new_width, new_height)

1716.4557869751711 1030.1310049363367
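
The manual shrinking above (multiplying by 0.99, then by 0.95) could also be automated with a bisection over the scale factor. This is only a sketch under stated assumptions: `find_scale` and `fake_measure` are hypothetical, and the stand-in timing function assumes time proportional to area instead of running a real model.

```python
def find_scale(measure, target, lo=0.5, hi=2.0, tolerance=0.02, max_steps=20):
    """Bisect a side-length scale until measure(scale) is within tolerance of target.

    measure: callable returning seconds for a given scale, assumed to
    increase with scale. tolerance is relative to target.
    """
    for _ in range(max_steps):
        mid = (lo + hi) / 2.0
        elapsed = measure(mid)
        if abs(elapsed - target) <= tolerance * target:
            return mid
        if elapsed > target:
            hi = mid  # too slow: try a smaller image
        else:
            lo = mid  # too fast: try a larger image
    return (lo + hi) / 2.0


def fake_measure(scale):
    # synthetic stand-in: 0.372 s at scale 1.0, growing with pixel area
    return 0.372 * scale ** 2


scale = find_scale(fake_measure, target=0.662)
print(scale, 1333 * scale, 800 * scale)
```

In the notebook, `measure` would build a generator for the scaled size and call `measure_processing_time`, so each bisection step costs one full measurement run.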


#### Mobilenetv3

In [25]:
mv3_acc = [0.65, 0.64, 0.71, 0.59, 0.88, 0.72, 0.62, 0.82, 0.56, 0.54, 0.58]
new_width, new_height = get_relatively_increased_size(mv3_acc)
print(new_width, new_height)

1635.1884223142567 981.3583929868008


In [26]:
generator = PascalVocGenerator(dataset_path, 'trainval', image_min_side=new_height, image_max_side=new_width)
inferencev3, _ = measure_processing_time(mobilenetv3, generator)

100% (100 of 100) |######################| Elapsed Time: 0:01:33 Time:  0:01:33


In [27]:
print(inferencev3)

0.6613560914993286
