SSD-ResNet34 FP32 training

Description

This document has instructions for running SSD-ResNet34 FP32 training using Intel-optimized TensorFlow.

Datasets

SSD-ResNet34 training uses the COCO training dataset. Use the instructions to download and preprocess the dataset.

Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| fp32_training.sh | Runs 100 training steps using mpirun for the specified number of processes (defaults to MPI_NUM_PROCESSES=1). |
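
As a minimal sketch of the "defaults to 1" behavior described for MPI_NUM_PROCESSES, the function below resolves the process count with a standard shell parameter default. The name `resolve_mpi_procs` is hypothetical and is not taken from the quickstart script, whose internals may differ.

```shell
# Hypothetical helper (not part of fp32_training.sh): print the number of
# MPI processes to use, falling back to 1 when MPI_NUM_PROCESSES is unset
# or empty.
resolve_mpi_procs() {
  echo "${MPI_NUM_PROCESSES:-1}"
}
```

Calling `resolve_mpi_procs` before launching shows the value a run would use if the script applies the default in this way.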

Run the model

Set up your environment using the instructions below, depending on whether you are using AI Tools:

Setup using AI Tools Setup without AI Tools

To run using AI Tools you will need:

  • git
  • numactl
  • contextlib2
  • cpio
  • Cython
  • horovod>=0.27.0
  • jupyter
  • lxml
  • matplotlib
  • numpy>=1.17.4
  • opencv
  • openmpi
  • openssh
  • pillow>=9.3.0
  • protoc
  • pycocotools
  • tensorflow-addons==0.18.0
  • Activate the tensorflow conda environment
    conda activate tensorflow

To run without AI Tools you will need:

  • Python 3
  • git
  • numactl
  • intel-tensorflow>=2.5.0
  • contextlib2
  • cpio
  • Cython
  • horovod>=0.27.0
  • jupyter
  • lxml
  • matplotlib
  • numpy>=1.17.4
  • opencv
  • openmpi
  • openssh
  • pillow>=9.3.0
  • protoc
  • pycocotools
  • tensorflow-addons==0.18.0
  • A clone of the AI Reference Models repo
    git clone https://github.com/IntelAI/models.git

For more information on the dependencies, see the installation instructions for object detection models at the TensorFlow Model Garden repository.
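
Before continuing, it can help to confirm that the command-line tools from the lists above are on your PATH. The sketch below uses only the POSIX `command -v` builtin. The function name `missing_commands` is hypothetical, and the command names in the usage comment are assumptions about how the listed packages expose their executables (for example, openmpi providing `mpirun`).

```shell
# Hypothetical helper: print each given command that is not found on PATH,
# one per line. Returns non-zero if at least one command is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (command names are assumptions, adjust for your system):
# missing_commands git numactl mpirun protoc python3
```

This checks executables only. Python packages such as horovod or pycocotools would need a separate check in the active Python environment.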

SSD-ResNet34 training uses code from the TensorFlow Model Garden. Clone the repo, check out the commit specified below, and set the TF_MODELS_DIR environment variable to point to that directory.

# Clone the tensorflow/models repo at the specified commit.
# Note: the commit required for this section differs from the one used for dataset preparation.
git clone https://github.com/tensorflow/models.git tf_models
cd tf_models
export TF_MODELS_DIR=$(pwd)
git checkout 8110bb64ca63c48d0caee9d565e5b4274db2220a
cd ..
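
Because the commit used for training differs from the one used for dataset preparation, a quick check that the clone is at the expected commit can catch mix-ups. The following is a sketch using `git rev-parse HEAD`. The function name `repo_at_commit` and the variable `EXPECTED_COMMIT` are hypothetical, and the hash is copied from the checkout step above.

```shell
# Commit hash copied from the checkout step above.
EXPECTED_COMMIT=8110bb64ca63c48d0caee9d565e5b4274db2220a

# Hypothetical helper: return 0 if the git repository at $1 has HEAD at
# commit $2, and non-zero otherwise (including when $1 is not a repository).
repo_at_commit() {
  actual=$(git -C "$1" rev-parse HEAD 2>/dev/null) || return 1
  [ "$actual" = "$2" ]
}

# Usage (after TF_MODELS_DIR has been set):
# repo_at_commit "$TF_MODELS_DIR" "$EXPECTED_COMMIT" || echo "tf_models is not at the expected commit" >&2
```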

Set DATASET_DIR to the directory containing the COCO training TFRecord files, and OUTPUT_DIR to the location where log and checkpoint files will be written. Use an empty output directory to avoid conflicts with checkpoint files from previous runs. You can optionally set the MPI_NUM_PROCESSES environment variable (defaults to 1). Once setup is complete, run the quickstart script.

# cd to your AI Reference Models directory
cd models

export TF_MODELS_DIR=<path to your clone of the TensorFlow models repo>
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<path to the directory where log and checkpoint files will be written>
export MPI_NUM_PROCESSES=<number of MPI processes (optional, defaults to 1)>
# Optional: set `BATCH_SIZE` to use a custom batch size; if unset, the script uses its default value.
export BATCH_SIZE=<customized batch size value>

./quickstart/object_detection/tensorflow/ssd-resnet34/training/cpu/fp32/fp32_training.sh
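
The directory requirements described above (an existing dataset directory and an empty output directory) can be checked before launching. The function below is a minimal sketch of such a pre-flight check. `preflight_check` is a hypothetical name and is not part of the quickstart script, which may perform its own validation.

```shell
# Hypothetical pre-flight check (not part of the quickstart script).
# $1: dataset directory, $2: output directory.
# Returns 0 when the dataset directory exists and the output directory is
# either absent or empty. Otherwise prints a reason to stderr and returns 1.
preflight_check() {
  dataset_dir=$1
  output_dir=$2
  if [ ! -d "$dataset_dir" ]; then
    echo "DATASET_DIR is not a directory: $dataset_dir" >&2
    return 1
  fi
  if [ -d "$output_dir" ] && [ -n "$(ls -A "$output_dir")" ]; then
    echo "OUTPUT_DIR is not empty: $output_dir" >&2
    return 1
  fi
  return 0
}

# Usage:
# preflight_check "$DATASET_DIR" "$OUTPUT_DIR" || exit 1
```

Treating a missing output directory as acceptable is a design choice of this sketch. If you prefer to create the directory yourself first, tighten the check accordingly.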

Additional Resources