Commit

Updated the readme and fixed a 1.0.1 bug.

dbolya committed Oct 25, 2019
1 parent 5094a09 commit 866ecf4
Showing 3 changed files with 134 additions and 14 deletions.
56 changes: 44 additions & 12 deletions README.md
@@ -8,9 +8,14 @@
╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚═╝
```

A simple, fully convolutional model for real-time instance segmentation. This is the code for [our paper](https://arxiv.org/abs/1904.02689), and for the foreseeable future is still in development.
A simple, fully convolutional model for real-time instance segmentation. This is the code for [our paper](https://arxiv.org/abs/1904.02689).

Here's a look at our current results for our base model (33 fps on a Titan Xp and 29.8 mAP on COCO's `test-dev`):
#### ICCV Update! Check out the trailer here:
[![YOLACT ICCV trailer](https://img.youtube.com/vi/0pMfmo8qfpQ/0.jpg)](https://www.youtube.com/watch?v=0pMfmo8qfpQ)

Read [the changelog](CHANGELOG.md) for details on, well, what changed.

Some examples from our base model (33.5 fps on a Titan Xp and 29.8 mAP on COCO's `test-dev`):

![Example 0](data/yolact_example_0.png)

@@ -71,8 +76,8 @@ python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_j
```
## Qualitative Results on COCO
```Shell
# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --display
# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display
```
## Benchmarking on COCO
```Shell
@@ -82,24 +87,25 @@ python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --m
## Images
```Shell
# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder
```
## Video
```Shell
# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=my_video.mp4
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=0
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4
# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4
```
As you can tell, `eval.py` can do a ton of stuff. Run it with the `--help` flag to see everything it can do.
```Shell
@@ -108,7 +114,7 @@ python eval.py --help


# Training
By default, we Train on COCO. Make sure to download the entire dataset using the commands above.
By default, we train on COCO. Make sure to download the entire dataset using the commands above.
- To train, grab an ImageNet-pretrained model and put it in `./weights`.
- For Resnet101, download `resnet101_reducedfc.pth` from [here](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing).
- For Resnet50, download `resnet50-19c8e357.pth` from [here](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing).
@@ -130,6 +136,32 @@ python train.py --config=yolact_base_config --resume=weights/yolact_base_10_3210
python train.py --help
```

## Multi-GPU Support
YOLACT now supports multiple GPUs seamlessly during training:

- Before running any of the scripts, run: `export CUDA_VISIBLE_DEVICES=[gpus]`
  - Where you should replace [gpus] with a comma-separated list of the indices of the GPUs you want to use (e.g., 0,1,2,3).
  - You should still do this if you're only using 1 GPU.
  - You can check the indices of your GPUs with `nvidia-smi`.
- Then, simply set the batch size to `8*num_gpus` with the training commands above (see the example below). The training script will automatically scale the hyperparameters to the right values.
  - If you have memory to spare, you can increase the batch size further, but keep it a multiple of the number of GPUs you're using.
  - If you want to allocate a specific number of images to each GPU, use `--batch_alloc=[alloc]`, where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to `batch_size`.
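
For instance, training on four GPUs might look like this (a sketch, not a prescription: the GPU indices and exact batch sizes here are assumptions you should adapt to your hardware):
```Shell
# Use GPUs 0-3, so batch size = 8 * 4 = 32.
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py --config=yolact_base_config --batch_size=32

# Give GPU 0 fewer images if it's also driving your display (6+8+8+8 = 30).
python train.py --config=yolact_base_config --batch_size=30 --batch_alloc=6,8,8,8
```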

## Logging
YOLACT now logs training and validation information by default. You can disable this with `--no_log`. A guide on how to visualize these logs is coming soon, but for now you can look at `LogVisualizer` in `utils/logger.py` for help.
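
Until that guide exists, here's a minimal sketch for poking at a log file. It assumes the log is one JSON object per line and lands under `./logs/` — both assumptions; check `utils/logger.py` and your training output for the real path and schema:
```Python
import json

# Hypothetical log path; substitute whatever your training run actually wrote.
with open('logs/yolact_base.log', 'r') as f:
    entries = [json.loads(line) for line in f if line.strip()]

# Inspect the schema before trying to plot anything from it.
print('%d entries; first entry keys: %s' % (len(entries), sorted(entries[0].keys())))
```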

## Pascal SBD
We also include a config for training on Pascal SBD annotations (for rapid experimentation or comparing with other methods). To train on Pascal SBD, proceed with the following steps:
1. Download the dataset from [here](http://home.bharathh.info/pubs/codes/SBD/download.html). It's the first link in the top "Overview" section (and the file is called `benchmark.tgz`).
2. Extract the dataset somewhere. In the dataset there should be a folder called `dataset/img`. Create the directory `./data/sbd` (where `.` is YOLACT's root) and copy `dataset/img` to `./data/sbd/img`.
3. Download the COCO-style annotations from [here](https://drive.google.com/open?id=1yLVwtkRtNxyl0kxeMCtPXJsXFFyc_FHe).
4. Extract the annotations into `./data/sbd/`.
5. Now you can train using `--config=yolact_resnet50_pascal_config`. Check that config to see how to extend it to other models.

I'll automate all of this with a script soon, don't worry. Also, if you want the script I used to convert the annotations, it's in `./scripts/convert_sbd.py`, but you'll have to read through it to see how it works because I don't actually remember at this point.
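
Judging from the relative paths hard-coded at the top of that script (`./inst/`, `./img/`, `train.txt`), you'd run it from inside SBD's `dataset` folder — a guess from reading the code, not a documented interface:
```Shell
# inst/, img/, train.txt, and val.txt all live in SBD's dataset/ folder.
cd path/to/benchmark_RELEASE/dataset
python path/to/yolact/scripts/convert_sbd.py
# Writes pascal_sbd_train.json and pascal_sbd_val.json to the current directory.
```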

If you want to verify our results, you can download our `yolact_resnet50_pascal_config` weights from [here](https://drive.google.com/open?id=1ExrRSPVctHW8Nxrn0SofU1lVhK5Wn0_S). This model should get 72.3 mask AP_50 and 56.2 mask AP_70. Note that the "all" AP isn't the same as the "vol" AP reported in other papers for Pascal (they average the APs at IoU thresholds from `0.1` to `0.9` in increments of `0.1`, instead of COCO's `0.5` to `0.95` in increments of `0.05`).
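
To make the two averages concrete, here's a quick sketch (the per-threshold AP numbers are made up purely for illustration):
```Python
import numpy as np

# Made-up mask APs at IoU thresholds 0.10, 0.15, ..., 0.95,
# keyed by threshold in percent (illustration only).
aps = {t: ap for t, ap in zip(range(10, 100, 5), np.linspace(85.0, 2.0, 18))}

# COCO-style "all" AP: average over IoU thresholds 0.50:0.05:0.95.
coco_ap = np.mean([aps[t] for t in range(50, 100, 5)])

# Pascal-style "vol" AP: average over IoU thresholds 0.1:0.1:0.9.
vol_ap = np.mean([aps[t] for t in range(10, 100, 10)])

print('COCO-style "all" AP: %.1f, Pascal "vol" AP: %.1f' % (coco_ap, vol_ap))
```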

## Custom Datasets
You can also train on your own dataset by following these steps:
- Create a COCO-style Object Detection JSON annotation file for your dataset. The specification for this can be found [here](http://cocodataset.org/#format-data). Note that we don't use some fields, so the following may be omitted:
88 changes: 88 additions & 0 deletions scripts/convert_sbd.py
@@ -0,0 +1,88 @@
import scipy.io, scipy.ndimage
import os.path, json
import pycocotools.mask
import numpy as np

def mask2bbox(mask):
    # Tight (x, y, width, height) box around the nonzero pixels of a binary mask.
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return cmin, rmin, cmax - cmin, rmax - rmin



# Relative paths: the script expects to be run from a folder containing these.
inst_path = './inst/'
img_path = './img/'
img_name_fmt = '%s.jpg'
ann_name_fmt = '%s.mat'

image_id = 1
ann_id = 1

types = ['train', 'val']

for t in types:
    with open('%s.txt' % t, 'r') as f:
        names = f.read().strip().split('\n')

    images = []
    annotations = []

    for name in names:
        img_name = img_name_fmt % name

        ann_path = os.path.join(inst_path, ann_name_fmt % name)
        ann = scipy.io.loadmat(ann_path)['GTinst'][0][0]

        # GTinst packs the instance map in field 0 and per-instance classes in field 2.
        classes = [int(x[0]) for x in ann[2]]
        seg = ann[0]

        for idx in range(len(classes)):
            # Instances are numbered from 1 in the segmentation map.
            mask = (seg == (idx + 1)).astype(np.uint8)

            rle = pycocotools.mask.encode(np.asfortranarray(mask))
            rle['counts'] = rle['counts'].decode('ascii')

            annotations.append({
                'id': ann_id,
                'image_id': image_id,
                'category_id': classes[idx],
                'segmentation': rle,
                'area': float(mask.sum()),
                'bbox': [int(x) for x in mask2bbox(mask)],
                'iscrowd': 0
            })

            ann_id += 1

        # Only the image dimensions are needed for the COCO-style entry.
        img = scipy.ndimage.imread(os.path.join(img_path, img_name))

        images.append({
            'id': image_id,
            'width': img.shape[1],
            'height': img.shape[0],
            'file_name': img_name
        })

        image_id += 1

    info = {
        'year': 2012,
        'version': 1,
        'description': 'Pascal SBD',
    }

    categories = [{'id': x + 1} for x in range(20)]

    with open('pascal_sbd_%s.json' % t, 'w') as f:
        json.dump({
            'info': info,
            'images': images,
            'annotations': annotations,
            'licenses': {},
            'categories': categories
        }, f)

4 changes: 2 additions & 2 deletions yolact.py
@@ -408,8 +408,8 @@ def forward(self, convouts:List[torch.Tensor]):
                out.append(nn.functional.max_pool2d(out[-1], 1, stride=2))

        if self.relu_downsample_layers:
            for idx in range(cur_idx, len(out)):
                out[idx] = F.relu(out[idx], inplace=False)
            for idx in range(len(out) - cur_idx):
                out[idx] = F.relu(out[idx + cur_idx], inplace=False)

        return out

