Object Detector and Feature Extractor

emma-heriot-watt/perception

EMMA: Perception

Python 3.9 · PyTorch Lightning · Poetry · pre-commit · style: black · wemake-python-styleguide

Workflows: Continuous Integration · Tests · Build and push images

Important

If you have questions, find bugs, or need anything else, contact us in our organisation's Discussions.

About

This repository holds the object detector and feature extractor for EMMA: the model that takes an image and returns a set of features for that image. It can be used standalone to extract features before running the policy, or as an API to extract features during inference.

Writing code and running things

Run the server for the Alexa Arena

Running this command as-is will automatically download the fine-tuned checkpoint from our HF models repo and use the same settings as our Alexa Arena experiments.

python src/emma_perception/commands/run_server.py
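Once the server is running, you can query it over HTTP. The snippet below is a minimal client sketch; the endpoint path, port, and JSON payload shape are assumptions for illustration, not the repository's actual API — check `run_server.py` for the real host, port, and routes.

```python
# Minimal client sketch. The endpoint path, port, and payload format are
# ASSUMPTIONS for illustration -- check run_server.py for the real API.
import base64


def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes so they can travel in a JSON payload."""
    return base64.b64encode(image_bytes).decode("utf-8")


def build_request(
    image_bytes: bytes, endpoint: str = "http://localhost:8000/features"
) -> dict:
    """Build a JSON-serialisable request for the (hypothetical) feature endpoint."""
    return {"url": endpoint, "json": {"image": encode_image(image_bytes)}}
```

With `requests`, you would then send it as `requests.post(req["url"], json=req["json"])` and read the extracted features from the response body.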

Extracting features

For training, we first need to extract the features for each image.

Use the following command to extract features from a folder of images. Adjust the path to the image folder, the output directory, and any other options as needed.

python src/emma_perception/commands/extract_visual_features.py --images_dir <path_to_images> --output_dir <path to output dir>
The command accepts the following argparse arguments:

import argparse

from pytorch_lightning import Trainer


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser = Trainer.add_argparse_args(parser)  # type: ignore[assignment]
    parser.add_argument(
        "-i",
        "--images_dir",
        required=True,
        help="Path to a folder of images to extract features from",
    )
    parser.add_argument(
        "--is_arena",
        action="store_true",
        help="If we are extracting features from the Arena images, use the Arena checkpoint",
    )
    parser.add_argument("-b", "--batch_size", type=int, default=2)
    parser.add_argument("-w", "--num_workers", type=int, default=0)
    parser.add_argument(
        "-c", "--output_dir", default="storage/data/cache", help="Path to store visual features"
    )
    parser.add_argument(
        "--num_gpus",
        type=int,
        default=None,
        help="Number of GPUs to use for visual feature extraction",
    )
    parser.add_argument(
        "opts",
        default=None,
        nargs=argparse.REMAINDER,
        help="Modify config options using the command-line. Used for VinVL extraction",
    )
    return parser.parse_args()
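If you drive extraction from another script (for example, to shard a large dataset), it can help to build the invocation programmatically. The sketch below uses only the flags documented above; the helper function itself is ours for illustration and is not part of this repository.

```python
# Build the extract_visual_features.py invocation from the flags documented
# above. This helper is illustrative and not part of the repository.
from pathlib import Path


def build_extract_command(
    images_dir: Path,
    output_dir: Path,
    batch_size: int = 2,
    num_workers: int = 0,
    is_arena: bool = False,
) -> list[str]:
    """Return the argv list for the feature-extraction command."""
    command = [
        "python",
        "src/emma_perception/commands/extract_visual_features.py",
        "--images_dir",
        str(images_dir),
        "--output_dir",
        str(output_dir),
        "--batch_size",
        str(batch_size),
        "--num_workers",
        str(num_workers),
    ]
    if is_arena:
        command.append("--is_arena")
    return command
```

Pass the resulting list to `subprocess.run(command, check=True)` to launch extraction from Python.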

Extracting features for the Alexa Arena

If you want to extract features with the model we fine-tuned on the Alexa Arena, just add --is_arena to the above command. This will automatically download the fine-tuned checkpoint from our HF models repo and use the same settings as our Alexa Arena experiments.
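For reference, the full command with the flag appended (the placeholder paths stay the same as above):

```
python src/emma_perception/commands/extract_visual_features.py --images_dir <path_to_images> --output_dir <path to output dir> --is_arena
```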

Developer tooling