<a href="https://colab.research.google.com/github/ilkoretskiy/jockey_detection/blob/master/Tensorflow_Object_Detection_API_Train_jockey_detection_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%tensorflow_version 1.x

TensorFlow 1.x selected.


## Check GPU

In [2]:
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


In [3]:
tf.version

<module 'tensorflow._api.v1.version' from '/tensorflow-1.15.2/python3.6/tensorflow_core/_api/v1/version/__init__.py'>

In [0]:
import os

At this point I assume that you've already uploaded all your data to the `ml_research/jockey_detection` folder on your Google Drive, and that Drive is mounted at `/content/drive`.
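Before going further, it may be worth checking that the files the config below refers to are really there. This is a minimal sketch using only the standard library; `EXPECTED_FILES` and `missing_inputs` are hypothetical names, and the relative paths simply mirror the ones used in the config, so adjust them if your layout differs.

```python
from pathlib import Path

# Files the pipeline config expects, relative to the jockey_detection folder.
# Assumption: these mirror the paths in ssd_mobilenet_v1.config; edit if yours differ.
EXPECTED_FILES = [
    "data/labelmap.pbtxt",
    "data/tfrecords/jockey_train.record",
    "data/tfrecords/jockey_eval.record",
]


def missing_inputs(project_dir):
    """Return the expected input files that do not exist under project_dir."""
    root = Path(project_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Usage in Colab, once Drive is mounted:
# missing = missing_inputs("/content/drive/My Drive/ml_research/jockey_detection")
# if missing:
#     raise FileNotFoundError("Missing inputs: {}".format(missing))
```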

In [0]:
ml_folder = "/content/drive/My Drive/ml_research"
%cd {ml_folder}

## Install dependencies

In [0]:
!apt-get install -qq protobuf-compiler python-pil python-lxml python-tk
!pip install -q Cython contextlib2 pillow lxml matplotlib

## Download the Object Detection API

In [0]:
!git clone https://github.com/tensorflow/models.git
%cd {ml_folder}/models/research
!protoc object_detection/protos/*.proto --python_out=.
!python object_detection/builders/model_builder_test.py

/content/drive/My Drive/colab_tf_od_train_results/models/research


## Download cocoapi

In [0]:
%cd {ml_folder}
!git clone https://github.com/cocodataset/cocoapi.git
%cd {ml_folder}/cocoapi/PythonAPI
!make
!cp -r pycocotools "{ml_folder}/models/research/"

## Download pretrained SSD MobileNet

In [0]:
!mkdir -p "{ml_folder}/pretrained_models/"
%cd {ml_folder}/pretrained_models/
!wget http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz
!tar -xzvf ssd_mobilenet_v1_coco_2018_01_28.tar.gz
%rm ./ssd_mobilenet_v1_coco_2018_01_28.tar.gz
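The `fine_tune_checkpoint` field in the config expects a checkpoint *prefix* such as `.../model.ckpt`, not one of the files on disk (`model.ckpt.index`, `model.ckpt.meta`, `model.ckpt.data-00000-of-00001`). If you're unsure what the prefix is, a small helper can derive it from the `.index` files in the extracted folder. This is a sketch: `checkpoint_prefixes` is a hypothetical helper, and it assumes every checkpoint has exactly one `.index` file, which is how TF 1.x saves checkpoints in its default format.

```python
from pathlib import Path


def checkpoint_prefixes(ckpt_dir):
    """Return the checkpoint prefixes found in ckpt_dir, e.g. '.../model.ckpt'.

    A TF 1.x checkpoint is several files sharing one prefix; the prefix
    (not any single file) is what fine_tune_checkpoint should point to.
    """
    root = Path(ckpt_dir)
    # Each checkpoint has one .index file; dropping that suffix gives the prefix.
    return sorted(str(p.with_suffix("")) for p in root.glob("*.index"))


# Usage:
# checkpoint_prefixes(os.path.join(ml_folder, "pretrained_models",
#                                  "ssd_mobilenet_v1_coco_2018_01_28"))
```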

## Fix ssd_mobilenet config

Go to the folder where the config file will be stored:

In [0]:
%cd {ml_folder}/jockey_detection/model

The most convenient way I've found to fix the paths in the config is to write the whole file directly from a cell.

Just copy the config you've already prepared on your computer and paste it below the `%%writefile` line.

*This is not flexible enough if you want to switch between jobs or configurations easily.*
*However, I think it's good enough for now.*
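If you do want more flexibility, one option is to keep the config as a template with placeholders and fill the paths in from Python before writing the file. The sketch below uses `string.Template` from the standard library; `CONFIG_TEMPLATE`, `render_config` and `write_config` are hypothetical names, and only a fragment of the config is shown.

```python
from string import Template

# Hypothetical template: a fragment of the config with ${...} placeholders
# instead of hard-coded paths.
CONFIG_TEMPLATE = Template('''
train_input_reader: {
  tf_record_input_reader {
    input_path: "${data_dir}/tfrecords/jockey_train.record"
  }
  label_map_path: "${data_dir}/labelmap.pbtxt"
}
''')


def render_config(template, **paths):
    # substitute() raises KeyError if a placeholder is left unfilled,
    # which is safer than silently writing a broken path.
    return template.substitute(**paths)


def write_config(path, text):
    with open(path, "w") as f:
        f.write(text)


# Usage (instead of %%writefile):
# write_config("./ssd_mobilenet_v1.config",
#              render_config(CONFIG_TEMPLATE,
#                            data_dir=ml_folder + "/jockey_detection/data"))
```

`Template` fits here better than `str.format`, because protobuf text format is full of `{}` braces that `str.format` would try to interpret.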


In [0]:
%%writefile ./ssd_mobilenet_v1.config

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "/content/drive/My Drive/ml_research/pretrained_models/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/ml_research/jockey_detection/data/tfrecords/jockey_train.record"
  }
  label_map_path: "/content/drive/My Drive/ml_research/jockey_detection/data/labelmap.pbtxt"
}

eval_config: {
  num_examples: 11
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  # max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/ml_research/jockey_detection/data/tfrecords/jockey_eval.record"
  }
  label_map_path: "/content/drive/My Drive/ml_research/jockey_detection/data/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}
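The note in `train_config` says that stopping at 200K steps effectively bypasses the learning-rate schedule. A quick check of the arithmetic, assuming the schedule is applied in staircase mode (I believe `staircase` defaults to true for `exponential_decay_learning_rate` in the Object Detection API protos, but verify it against your version): the rate is `0.004 * 0.95 ** floor(step / 800720)`, so for every step below 800720 it stays at 0.004. `exponential_decay_lr` is a hypothetical helper that only reproduces the formula.

```python
import math


def exponential_decay_lr(step, initial_lr=0.004, decay_steps=800720,
                         decay_factor=0.95, staircase=True):
    """Learning rate after `step` steps of an exponential-decay schedule."""
    exponent = step / decay_steps
    if staircase:
        # Staircase decay only changes the rate once every `decay_steps` steps.
        exponent = math.floor(exponent)
    return initial_lr * decay_factor ** exponent


# exponential_decay_lr(200000)                   -> still the initial 0.004
# exponential_decay_lr(200000, staircase=False)  -> slightly below 0.004
```

Without staircase mode the rate would drift down slightly even before step 800720, so the "never decays" remark depends on that setting.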


## Train model

In [0]:
%cd {ml_folder}/models/research/

! PYTHONPATH=${PYTHONPATH}:./:./object_detection:./slim python object_detection/model_main.py \
  --pipeline_config_path="/content/drive/My Drive/ml_research/jockey_detection/model/ssd_mobilenet_v1.config" \
  --model_dir="/content/drive/My Drive/ml_research/jockey_detection/model/" \
  --sample_1_of_n_eval_examples=1 \
  --num_train_steps=200000 \
  --alsologtostderr


## Evaluate model once

In [0]:
%cd {ml_folder}/models/research/

! PYTHONPATH=${PYTHONPATH}:./:./object_detection:./slim python object_detection/model_main.py \
  --pipeline_config_path="/content/drive/My Drive/ml_research/jockey_detection/model/ssd_mobilenet_v1.config" \
  --checkpoint_dir="/content/drive/My Drive/ml_research/jockey_detection/model/" \
  --model_dir="/content/drive/My Drive/ml_research/jockey_detection/model/eval_train" \
  --eval_training_data=True \
  --run_once \
  --alsologtostderr

! PYTHONPATH=${PYTHONPATH}:./:./object_detection:./slim python object_detection/model_main.py \
  --pipeline_config_path="/content/drive/My Drive/ml_research/jockey_detection/model/ssd_mobilenet_v1.config" \
  --checkpoint_dir="/content/drive/My Drive/ml_research/jockey_detection/model/" \
  --model_dir="/content/drive/My Drive/ml_research/jockey_detection/model/eval_test" \
  --run_once \
  --alsologtostderr
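With `--checkpoint_dir` set, `model_main.py` runs in eval-only mode and evaluates the latest checkpoint in that directory, so it's worth confirming that training actually wrote checkpoints there. In TF 1.x, `tf.train.latest_checkpoint(model_dir)` is the proper way to do this; the stdlib-only sketch below assumes the Estimator naming scheme `model.ckpt-<global_step>.index`, and `latest_checkpoint_step` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumption: training checkpoints are named model.ckpt-<global_step>.index
# (plus matching .data and .meta files).
_CKPT_RE = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir):
    """Return the highest global step that has a checkpoint in model_dir, or None."""
    steps = []
    for p in Path(model_dir).iterdir():
        m = _CKPT_RE.match(p.name)
        if m:
            steps.append(int(m.group(1)))
    return max(steps) if steps else None


# Usage:
# latest_checkpoint_step("/content/drive/My Drive/ml_research/jockey_detection/model/")
```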

## Download ngrok (optional)

You can read more about ngrok [here](https://ngrok.com/product).

You don't need ngrok if you're working with TF v2, but I found it very useful.

In [0]:
!mkdir -p "{ml_folder}/ngrok"
%cd {ml_folder}/ngrok

!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
!rm ngrok-stable-linux-amd64.zip
!chmod 755 "{ml_folder}/ngrok/ngrok"

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   
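Because `ml_folder` contains a space (`My Drive`), unquoted shell commands are easy to get wrong. As an alternative, the unzip and `chmod 755` steps can be done in pure Python. `install_binary` is a hypothetical helper, and the archive member name `ngrok` is taken from the `unzip` output above.

```python
import os
import zipfile


def install_binary(zip_path, dest_dir, name):
    """Extract `name` from zip_path into dest_dir and make it executable."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        # extract() returns the path of the file it created.
        target = zf.extract(name, dest_dir)
    os.chmod(target, 0o755)  # same as `chmod 755`
    return target


# Usage:
# install_binary("ngrok-stable-linux-amd64.zip",
#                os.path.join(ml_folder, "ngrok"), "ngrok")
```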


## Launching tensorboard

I'm not sure whether it's just me, but I often got errors when I tried to launch TensorBoard with TF v1 enabled.

The following code is supposed to work with TF v1, but I can't guarantee it.

In [0]:
LOG_DIR = os.path.join(ml_folder, "jockey_detection")


get_ipython().system_raw(
    'tensorboard --logdir "{}" --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)


Let's check whether TensorBoard started successfully.

If it didn't, you'll get
`Failed to connect to localhost port 6006: Connection refused`

Otherwise, you'll see a listing of HTML code.


In [0]:
! curl http://localhost:6006
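TensorBoard can take a few seconds to start, so a single `curl` right after launching it may fail even though nothing is wrong. Here is a small polling sketch using only `urllib`; `wait_for_http` is a hypothetical helper, and it treats any HTTP response (even an error status) as a sign that the server is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=30.0, interval=1.0):
    """Poll `url` until it answers or `timeout` seconds pass; return True if it answered."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server responded, just with an error status: it is running.
            return True
        except OSError:
            pass  # e.g. "Connection refused": nothing is listening yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage:
# wait_for_http("http://localhost:6006")
```

The same check also works for the ngrok API at `http://localhost:4040`.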

Launch ngrok

In [0]:
ngrok_path = os.path.join(ml_folder, "ngrok", "ngrok")
get_ipython().system_raw('"{}" http 6006 &'.format(ngrok_path))

Get the public URL where you can view your TensorBoard:

In [17]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

http://0bef416e.ngrok.io
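The one-liner above simply takes the first tunnel. The sketch below does the same parsing in Python but prefers an `https` tunnel when one exists. The `proto` field is an assumption about the ngrok v2 local API response beyond the `['tunnels'][0]['public_url']` access used above, and `pick_public_url` / `fetch_public_url` are hypothetical names.

```python
import json
import urllib.request


def pick_public_url(tunnels_payload, prefer="https"):
    """Return a public URL from ngrok's /api/tunnels response, preferring `prefer`."""
    tunnels = tunnels_payload.get("tunnels", [])
    if not tunnels:
        return None  # ngrok may not have opened the tunnel yet
    for t in tunnels:
        # Assumption: each tunnel entry carries a "proto" field ("http"/"https").
        if t.get("proto") == prefer:
            return t["public_url"]
    return tunnels[0]["public_url"]


def fetch_public_url(api="http://localhost:4040/api/tunnels"):
    # Same local endpoint the curl command queries.
    with urllib.request.urlopen(api, timeout=5) as resp:
        return pick_public_url(json.load(resp))
```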


## Launching tensorboard with TF v2
Please note that you need to restart the runtime if you've already switched to TF v1.

You can do this via "Runtime -> Restart runtime".

In [1]:
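ml_folder = "/content/drive/My Drive/ml_research"  # variables are lost when the runtime is restarted
%cd {ml_folder}/jockey_detection/model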

/content/drive/My Drive/colab_tf_od_train_results/models/jockey_detection/model


In [10]:
%load_ext tensorboard
%tensorboard --logdir=./ --port=6006
# %reload_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard



If you see TensorBoard in the cell output, everything went fine.

Personally, I find it more convenient to keep a separate notebook that only launches TensorBoard. That way you don't need to restart the runtime.