SSD-ResNet34 FP32 training

Description

This document has instructions for running SSD-ResNet34 FP32 training using Intel-optimized TensorFlow.

Datasets

SSD-ResNet34 training uses the COCO training dataset. Use the instructions to download and preprocess the dataset.

Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `fp32_training.sh` | Runs 100 training steps using mpirun for the specified number of processes (defaults to `MPI_NUM_PROCESSES=1`). |

Run the model

Set up your environment using the instructions below, depending on whether you are using AI Tools:

Setup using AI Tools

Setup without AI Tools

To run using AI Tools you will need:

git
numactl
contextlib2
cpio
Cython
horovod>=0.27.0
jupyter
lxml
matplotlib
numpy>=1.17.4
opencv
openmpi
openssh
pillow>=9.3.0
protoc
pycocotools
tensorflow-addons==0.18.0

Activate the `tensorflow` conda environment:
```
conda activate tensorflow
```

To run without AI Tools you will need:

Python 3
git
numactl
intel-tensorflow>=2.5.0
contextlib2
cpio
Cython
horovod>=0.27.0
jupyter
lxml
matplotlib
numpy>=1.17.4
opencv
openmpi
openssh
pillow>=9.3.0
protoc
pycocotools
tensorflow-addons==0.18.0

A clone of the AI Reference Models repo

git clone https://github.com/IntelAI/models.git

For more information on the dependencies, see the installation instructions for object detection models at the TensorFlow Model Garden repository.
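
Before going further, it can help to confirm that the command-line tools from the lists above are on your `PATH`. The following is a minimal sketch, not part of the repository: `missing_commands` is a hypothetical helper, and the executable names `mpirun`, `ssh`, and `python3` are assumptions about what the openmpi, openssh, and Python 3 installs provide on your system. It does not check Python packages such as horovod or pycocotools.

```shell
# Hypothetical helper: prints every command from its arguments that is not on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Executable names are assumptions; adjust them to match your installation.
missing="$(missing_commands git numactl cpio protoc mpirun ssh python3)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing command-line tools:" $missing >&2
fi
```

If nothing is printed, all of the listed executables were found.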

Running SSD-ResNet34 training uses code from the TensorFlow Model Garden. Clone the repo at the commit specified below, and set the TF_MODELS_DIR environment variable to point to that directory.

# Clone the tensorflow/models repo at the specified commit.
# Note that the commit required for this section is different from the one used for dataset preparation.
git clone https://github.com/tensorflow/models.git tf_models
cd tf_models
export TF_MODELS_DIR=$(pwd)
git checkout 8110bb64ca63c48d0caee9d565e5b4274db2220a
cd ..
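
Because this section needs a different commit than dataset preparation, a quick check that the clone is at the expected commit can catch a mix-up early. This is a sketch under the assumption that `TF_MODELS_DIR` was exported as shown above; `check_commit` is a hypothetical helper, not something provided by either repository.

```shell
# Hypothetical helper: succeeds only if the repo at $1 has commit $2 checked out.
# Returns 2 if $1 is not a readable git repository.
check_commit() {
    repo_dir="$1"
    expected="$2"
    actual="$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null)" || return 2
    [ "$actual" = "$expected" ]
}

# The commit hash is the one from the clone instructions above.
if [ -n "${TF_MODELS_DIR:-}" ]; then
    if check_commit "$TF_MODELS_DIR" 8110bb64ca63c48d0caee9d565e5b4274db2220a; then
        echo "TF_MODELS_DIR is at the expected commit"
    else
        echo "TF_MODELS_DIR is not a git repo or is at a different commit" >&2
    fi
fi
```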

Set DATASET_DIR to point to the directory with the COCO training TFRecord files, and set OUTPUT_DIR to the location where log and checkpoint files will be written. Use an empty output directory to prevent conflicts with checkpoint files from previous runs. You can optionally set the MPI_NUM_PROCESSES environment variable (defaults to 1). After the setup is complete, run the quickstart script.

# cd to your AI Reference Models directory
cd models

export TF_MODELS_DIR=<path to your clone of the TensorFlow models repo>
export DATASET_DIR=<path to the dataset>
export OUTPUT_DIR=<path to the directory where log and checkpoint files will be written>
export MPI_NUM_PROCESSES=<number of MPI processes (optional, defaults to 1)>
# For a custom batch size, set env var `BATCH_SIZE` or it will run with a default value.
export BATCH_SIZE=<customized batch size value>

./quickstart/object_detection/tensorflow/ssd-resnet34/training/cpu/fp32/fp32_training.sh
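
The environment requirements described above can be checked before launching a run. The sketch below is illustrative only: `preflight` is a hypothetical helper, and it assumes that `TF_MODELS_DIR` and `DATASET_DIR` must already exist as directories while `OUTPUT_DIR` only needs to be set and, if it exists, empty.

```shell
# Hypothetical pre-flight helper: reports problems on stderr and
# returns non-zero if any are found.
preflight() {
    status=0
    # These two must be set and must point to existing directories.
    for var in TF_MODELS_DIR DATASET_DIR; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$var is not a directory: $value" >&2
            status=1
        fi
    done
    # OUTPUT_DIR must be set; if it already exists it should be empty
    # so checkpoints from earlier runs do not conflict.
    if [ -z "${OUTPUT_DIR:-}" ]; then
        echo "OUTPUT_DIR is not set" >&2
        status=1
    elif [ -d "$OUTPUT_DIR" ] && [ -n "$(ls -A "$OUTPUT_DIR")" ]; then
        echo "OUTPUT_DIR is not empty: $OUTPUT_DIR" >&2
        status=1
    fi
    return "$status"
}

# Possible usage: only launch the quickstart script when the checks pass.
# preflight && ./quickstart/object_detection/tensorflow/ssd-resnet34/training/cpu/fp32/fp32_training.sh
```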

Additional Resources

To run more advanced use cases, see the instructions here for calling the `launch_benchmark.py` script directly.
To run the model using Docker, see the oneContainer workload container:
https://www.intel.com/content/www/us/en/developer/articles/containers/ssd-resnet34-fp32-training-tensorflow-container.html
