
Model Updating


Introduction

This page is for anyone looking to update the MaskRCNN model. Most of the code utilities written for this project were designed to be reused, so the process, while tedious, should be straightforward.

Disclaimer: Before retraining this model, you should evaluate whether this is the best option. Most likely you are doing so to recognize more food items. However, the current iteration of the model can recognize 35 different food items with fairly good accuracy. Extending this number meaningfully will require a large amount of work, and a self-supervised model like DINO may be a better choice. Proceed at your own risk.

Working backwards, what we want from an eventual food perception model is information about the location and classification of food items in a visual scene. At the time of writing, the two main ways to represent this are bounding boxes and semantic segmentations. We ultimately chose the latter for this task, but both are viable options. This document therefore explains the process for generating and predicting segmentations; if you wish to use bounding boxes instead, some of this information will be less relevant.

The crux of this problem, then, is getting training data for an eventual model. To create a model that generalizes effectively across a large number of food items, you will need a diverse set of pictures of the food items and their corresponding segmentation masks, on the order of 10^6. While collecting the images manually is possible, generating this many segmentations on your own is infeasible, and at the time of writing the cost estimates for something like an AWS MTurk job exceed $20,000. As such, you'll need some sort of tool to help you annotate your data. Our approach was a two-step combination: we first approximately segmented a scene using 3D properties and then used an unsupervised segmentation model to correct those segmentations. There is certainly more than one way to skin a cat, though, and by the time you read this there may well be better ways to solve this problem. At any rate, outlined below is the procedure one could use to modify the existing model using (roughly) identical procedures.

Data Collection

Food Items

First, you'll need a list of defined food items that you want the model to be able to recognize. For a list of currently recognizable food items, see id_to_food_mapping.json. A full dataset of these food items is available in this Google Drive folder. You may need to request access from someone in the EmPRISE Lab to view these files. Any food items you want to train the model with, you will need to purchase from a grocery store and have storage space (e.g., a large fridge) for them for the next portion.
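As a point of reference, the mapping can be loaded in Python as shown below. The exact key/value structure is an assumption based on the file name, so inspect the file to confirm:

import json

# Load the id-to-food-name mapping used by the model.
# The structure (class ids mapping to food names) is assumed from
# the file name; check the actual file contents.
with open("id_to_food_mapping.json") as f:
    id_to_food = json.load(f)

print(len(id_to_food), "recognizable food items")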

Bagfiles

When we initially conducted this project, the next phase involved running the robot arm through a sequence of predefined poses while recording with the rosbag record command. This was done with a randomized plate of food underneath the robot arm, which allowed us to capture RGB images of the food from various angles while also recording depth data, joint state data, and other useful data. One way to add new food items to the set is therefore to collect data using the same procedure. If you choose a different method, note that the following steps may have varying levels of success: the next code section explicitly relies on RGB, depth, joint state, and camera info data to generate the approximate segmentations. A quick way to check that a recorded bag contains what you need is sketched below.
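This is a minimal sketch using the rosbag Python API; the topic names here are placeholders, since the actual names depend on your camera driver and robot:

import rosbag

# Placeholder topic names -- substitute whatever your camera driver
# and robot actually publish.
REQUIRED_TOPICS = [
    "/camera/color/image_raw",
    "/camera/aligned_depth_to_color/image_raw",
    "/camera/color/camera_info",
    "/joint_states",
]

with rosbag.Bag("experiment.bag") as bag:
    recorded = bag.get_type_and_topic_info().topics.keys()
    for topic in REQUIRED_TOPICS:
        status = "ok" if topic in recorded else "MISSING"
        print(f"{topic}: {status}")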

Segmentations

One major contribution of this project is the annotate.py script, which creates approximate segmentations for an entire bagfile from an initial segmentation. For every bagfile collected above, you will therefore need to use some sort of segmentation tool to outline the food items. In our case we used a free trial of the website segments.ai, but the choice is yours. Note, though, that the code expects segmentations in the segments.ai format, so if you choose another tool you will have to modify the code accordingly.

Once you have initial segmentations, you can use annotate.py to generate approximate segmentation masks for the whole bagfile. This code is not very adaptable in its current state, so you should not expect it to run perfectly on your bagfiles out of the box. I advise you to familiarize yourself with the code before attempting to use it. One quick way to spot-check its output is sketched below.
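This is a minimal visualization sketch for sanity-checking generated masks; the file names, the mask encoding (a 2D array of integer class ids), and the image format are assumptions, so adapt the loading to whatever annotate.py actually writes:

import numpy as np
import matplotlib.pyplot as plt

# Assumed inputs: an RGB frame and a 2D integer mask of per-pixel
# class ids. Adjust to match annotate.py's actual output format.
image = plt.imread("frame_0000.png")
mask = np.load("mask_0000.npy")

plt.imshow(image)
plt.imshow(mask, alpha=0.5, cmap="tab20")  # semi-transparent overlay
plt.axis("off")
plt.title("Approximate segmentation overlay")
plt.show()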

While the above code does a great job of annotating scenes, the segmentations are merely approximate and in some cases need correction. This is where correct_masks.py comes into play: it is tailored to take the output generated by annotate.py and correct the masks. Once again, this code was made for our use case and might require modification depending on how you intend to use it.

For both of the above scripts, I advise you to look at the Code Documentation page of the wiki for a more in-depth look at how they actually work, since this page is just intended as an overview.

Modeling

Importantly, the output of correct_masks.py is a directory of .npz files which hold frame-by-frame information for the data. For each frame of an experiment, these .npz files contain the image, labels, boxes, and masks in the format described on the PyTorch MaskRCNN page. This was done with the intent of using the data with that model, so if you choose a different modeling architecture you will need to format your data accordingly.
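As a rough illustration, loading one such frame might look like the sketch below. The file name is hypothetical, the key names follow the description above, and the tensor dtypes follow the torchvision Mask R-CNN documentation:

import numpy as np
import torch

# Hypothetical file name; keys follow the description above.
data = np.load("frame_0000.npz")

image = torch.as_tensor(data["image"], dtype=torch.float32)
target = {
    "boxes": torch.as_tensor(data["boxes"], dtype=torch.float32),  # (N, 4), (x1, y1, x2, y2)
    "labels": torch.as_tensor(data["labels"], dtype=torch.int64),  # (N,)
    "masks": torch.as_tensor(data["masks"], dtype=torch.uint8),    # (N, H, W), binary
}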

All source code for training and evaluating the created models is in the src/maskrcnn directory of this repo. In short, we use a custom PyTorch Dataset to load the previously mentioned .npz examples; then, in a standard machine learning loop, we update the model (MaskRCNN in our case) and save the resulting model checkpoint whenever we reach a new low loss value. The skeleton of that setup looks roughly like the sketch below.
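This is a minimal sketch, not the repo's actual code: the Dataset, class count, and hyperparameters are assumptions, and the checkpointing mirrors the save-on-new-low-loss strategy described above.

import glob
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.models.detection import maskrcnn_resnet50_fpn

class FoodFrameDataset(Dataset):
    """Loads the per-frame .npz examples produced by correct_masks.py."""
    def __init__(self, npz_dir):
        self.paths = sorted(glob.glob(f"{npz_dir}/*.npz"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        data = np.load(self.paths[idx])
        # Assumes images are stored as (C, H, W) floats in [0, 1];
        # permute/scale here if yours are stored differently.
        image = torch.as_tensor(data["image"], dtype=torch.float32)
        target = {
            "boxes": torch.as_tensor(data["boxes"], dtype=torch.float32),
            "labels": torch.as_tensor(data["labels"], dtype=torch.int64),
            "masks": torch.as_tensor(data["masks"], dtype=torch.uint8),
        }
        return image, target

def collate(batch):
    # Detection models take lists of images/targets, not stacked tensors.
    return tuple(zip(*batch))

loader = DataLoader(FoodFrameDataset("npz_output"), batch_size=2,
                    shuffle=True, collate_fn=collate)

# 35 food classes + 1 background class (count assumed from this wiki).
model = maskrcnn_resnet50_fpn(num_classes=36)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

best_loss = float("inf")
model.train()
for epoch in range(10):
    for images, targets in loader:
        loss_dict = model(list(images), list(targets))  # per-head losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() < best_loss:
            best_loss = loss.item()
            torch.save(model.state_dict(), "best_model.pth")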

Once you get this .pth checkpoint, you're (basically) done! It can now be loaded with PyTorch to run prediction on new images containing food items, as sketched below. You may want to integrate it into an actual ROS package like this repo is, and you can then adapt detection code like what's shown here.
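Loading the checkpoint for inference might look like this sketch; the file name is hypothetical and the class count must match whatever you trained with:

import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Class count must match training; file name is hypothetical.
model = maskrcnn_resnet50_fpn(num_classes=36)
model.load_state_dict(torch.load("best_model.pth", map_location="cpu"))
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a real (C, H, W) RGB frame in [0, 1]
with torch.no_grad():
    predictions = model([image])

# Each prediction dict holds 'boxes', 'labels', 'scores', and 'masks'.
keep = predictions[0]["scores"] > 0.5
print(predictions[0]["labels"][keep])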

Help

The steps above can be tricky, so you might need help. The author of this project is Thomas (tjp93@cornell.edu), and you can reach out to him with questions. He is an alumnus of the lab, though, so Rajat may be a better resource.
