Commit

Updated the readme and fixed a 1.0.1 bug.

dbolya committed Oct 25, 2019
1 parent 5094a09 commit 866ecf4
Showing 3 changed files with 134 additions and 14 deletions.
56 changes: 44 additions & 12 deletions README.md
@@ -8,9 +8,14 @@
╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚═╝
```

A simple, fully convolutional model for real-time instance segmentation. This is the code for [our paper](https://arxiv.org/abs/1904.02689), and for the foreseeable future is still in development.
A simple, fully convolutional model for real-time instance segmentation. This is the code for [our paper](https://arxiv.org/abs/1904.02689).

Here's a look at our current results for our base model (33 fps on a Titan Xp and 29.8 mAP on COCO's `test-dev`):
#### ICCV Update! Check out the trailer here:
[![YOLACT ICCV trailer](https://img.youtube.com/vi/0pMfmo8qfpQ/0.jpg)](https://www.youtube.com/watch?v=0pMfmo8qfpQ)

Read [the changelog](CHANGELOG.md) for details on, well, what changed.

Some examples from our base model (33.5 fps on a Titan Xp and 29.8 mAP on COCO's `test-dev`):

![Example 0](data/yolact_example_0.png)

@@ -71,8 +76,8 @@ python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_j
```
## Qualitative Results on COCO
```Shell
# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --display
# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.15.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --display
```
## Benchmarking on COCO
```Shell
@@ -82,24 +87,25 @@ python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --m
## Images
```Shell
# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --images=path/to/input/folder:path/to/output/folder
```
## Video
```Shell
# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=my_video.mp4
# If you want, use "--display_fps" to draw the FPS directly on the frame.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=0
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=0

# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4
# Process a video and save it to another file. This uses the same pipeline as the ones above now, so it's fast!
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.15 --top_k=15 --video_multiframe=4 --video=input_video.mp4:output_video.mp4
```
As you can tell, `eval.py` can do a ton of stuff. Run it with the `--help` flag to see everything it can do.
```Shell
@@ -108,7 +114,7 @@ python eval.py --help


# Training
By default, we Train on COCO. Make sure to download the entire dataset using the commands above.
By default, we train on COCO. Make sure to download the entire dataset using the commands above.
- To train, grab an ImageNet-pretrained model and put it in `./weights`.
- For Resnet101, download `resnet101_reducedfc.pth` from [here](https://drive.google.com/file/d/1tvqFPd4bJtakOlmn-uIA492g2qurRChj/view?usp=sharing).
- For Resnet50, download `resnet50-19c8e357.pth` from [here](https://drive.google.com/file/d/1Jy3yCdbatgXa5YYIdTCRrSV0S9V5g1rn/view?usp=sharing).
@@ -130,6 +136,32 @@ python train.py --config=yolact_base_config --resume=weights/yolact_base_10_3210
python train.py --help
```

## Multi-GPU Support
YOLACT now supports multiple GPUs seamlessly during training:

- Before running any of the scripts, run: `export CUDA_VISIBLE_DEVICES=[gpus]`
  - Where you should replace [gpus] with a comma-separated list of the indices of the GPUs you want to use (e.g., 0,1,2,3).
  - You should still do this if you're only using 1 GPU.
  - You can check the indices of your GPUs with `nvidia-smi`.
- Then, simply set the batch size to `8*num_gpus` with the training commands above (see the example below). The training script will automatically scale the hyperparameters to the right values.
  - If you have memory to spare, you can increase the batch size further, but keep it a multiple of the number of GPUs you're using.
  - If you want to allocate a specific number of images to each GPU, use `--batch_alloc=[alloc]`, where [alloc] is a comma-separated list containing the number of images on each GPU. This must sum to `batch_size`.
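
For instance, training on four GPUs might look like this (a sketch, not a prescription: the GPU indices and exact batch sizes here are assumptions you should adapt to your hardware):
```Shell
# Use GPUs 0-3, so batch size = 8 * 4 = 32.
export CUDA_VISIBLE_DEVICES=0,1,2,3
python train.py --config=yolact_base_config --batch_size=32

# Give GPU 0 fewer images if it's also driving your display (6+8+8+8 = 30).
python train.py --config=yolact_base_config --batch_size=30 --batch_alloc=6,8,8,8
```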

## Logging
YOLACT now logs training and validation information by default. You can disable this with `--no_log`. A guide on how to visualize these logs is coming soon, but for now you can look at `LogVisualizer` in `utils/logger.py` for help.
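
Until that guide exists, here's a minimal sketch for poking at a log file. It assumes the log is one JSON object per line and lands under `./logs/` — both assumptions; check `utils/logger.py` and your training output for the real path and schema:
```Python
import json

# Hypothetical log path; substitute whatever your training run actually wrote.
with open('logs/yolact_base.log', 'r') as f:
    entries = [json.loads(line) for line in f if line.strip()]

# Inspect the schema before trying to plot anything from it.
print('%d entries; first entry keys: %s' % (len(entries), sorted(entries[0].keys())))
```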

## Pascal SBD
We also include a config for training on Pascal SBD annotations (for rapid experimentation or comparing with other methods). To train on Pascal SBD, proceed with the following steps:
1. Download the dataset from [here](http://home.bharathh.info/pubs/codes/SBD/download.html). It's the first link in the top "Overview" section (and the file is called `benchmark.tgz`).
2. Extract the dataset somewhere. In the dataset there should be a folder called `dataset/img`. Create the directory `./data/sbd` (where `.` is YOLACT's root) and copy `dataset/img` to `./data/sbd/img`.
3. Download the COCO-style annotations from [here](https://drive.google.com/open?id=1yLVwtkRtNxyl0kxeMCtPXJsXFFyc_FHe).
4. Extract the annotations into `./data/sbd/`.
5. Now you can train using `--config=yolact_resnet50_pascal_config`. Check that config to see how to extend it to other models.

I'll automate all of this with a script soon, don't worry. Also, if you want the script I used to convert the annotations, it's in `./scripts/convert_sbd.py`, but you'll have to read through it to see how it works because I don't actually remember at this point.
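
Judging from the relative paths hard-coded at the top of that script (`./inst/`, `./img/`, `train.txt`), you'd run it from inside SBD's `dataset` folder — a guess from reading the code, not a documented interface:
```Shell
# inst/, img/, train.txt, and val.txt all live in SBD's dataset/ folder.
cd path/to/benchmark_RELEASE/dataset
python path/to/yolact/scripts/convert_sbd.py
# Writes pascal_sbd_train.json and pascal_sbd_val.json to the current directory.
```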

If you want to verify our results, you can download our `yolact_resnet50_pascal_config` weights from [here](https://drive.google.com/open?id=1ExrRSPVctHW8Nxrn0SofU1lVhK5Wn0_S). This model should get 72.3 mask AP_50 and 56.2 mask AP_70. Note that the "all" AP isn't the same as the "vol" AP reported in other papers for Pascal (they average the APs at IoU thresholds from `0.1` to `0.9` in increments of `0.1`, instead of COCO's `0.5` to `0.95` in increments of `0.05`).
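
To make the two averages concrete, here's a quick sketch (the per-threshold AP numbers are made up purely for illustration):
```Python
import numpy as np

# Made-up mask APs at IoU thresholds 0.10, 0.15, ..., 0.95,
# keyed by threshold in percent (illustration only).
aps = {t: ap for t, ap in zip(range(10, 100, 5), np.linspace(85.0, 2.0, 18))}

# COCO-style "all" AP: average over IoU thresholds 0.50:0.05:0.95.
coco_ap = np.mean([aps[t] for t in range(50, 100, 5)])

# Pascal-style "vol" AP: average over IoU thresholds 0.1:0.1:0.9.
vol_ap = np.mean([aps[t] for t in range(10, 100, 10)])

print('COCO-style "all" AP: %.1f, Pascal "vol" AP: %.1f' % (coco_ap, vol_ap))
```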

## Custom Datasets
You can also train on your own dataset by following these steps:
- Create a COCO-style Object Detection JSON annotation file for your dataset. The specification for this can be found [here](http://cocodataset.org/#format-data). Note that we don't use some fields, so the following may be omitted:
88 changes: 88 additions & 0 deletions scripts/convert_sbd.py
@@ -0,0 +1,88 @@
import scipy.io, scipy.ndimage
import os.path, json
import pycocotools.mask
import numpy as np

def mask2bbox(mask):
    # Tight (x, y, width, height) box around the nonzero pixels of a binary mask.
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    rmin, rmax = np.where(rows)[0][[0, -1]]
    cmin, cmax = np.where(cols)[0][[0, -1]]

    return cmin, rmin, cmax - cmin, rmax - rmin



# Relative paths: the script expects to be run from a folder containing these.
inst_path = './inst/'
img_path = './img/'
img_name_fmt = '%s.jpg'
ann_name_fmt = '%s.mat'

image_id = 1
ann_id = 1

types = ['train', 'val']

for t in types:
    with open('%s.txt' % t, 'r') as f:
        names = f.read().strip().split('\n')

    images = []
    annotations = []

    for name in names:
        img_name = img_name_fmt % name

        ann_path = os.path.join(inst_path, ann_name_fmt % name)
        ann = scipy.io.loadmat(ann_path)['GTinst'][0][0]

        # GTinst packs the instance map in field 0 and per-instance classes in field 2.
        classes = [int(x[0]) for x in ann[2]]
        seg = ann[0]

        for idx in range(len(classes)):
            # Instances are numbered from 1 in the segmentation map.
            mask = (seg == (idx + 1)).astype(np.uint8)

            rle = pycocotools.mask.encode(np.asfortranarray(mask))
            rle['counts'] = rle['counts'].decode('ascii')

            annotations.append({
                'id': ann_id,
                'image_id': image_id,
                'category_id': classes[idx],
                'segmentation': rle,
                'area': float(mask.sum()),
                'bbox': [int(x) for x in mask2bbox(mask)],
                'iscrowd': 0
            })

            ann_id += 1

        # Only the image dimensions are needed for the COCO-style entry.
        img = scipy.ndimage.imread(os.path.join(img_path, img_name))

        images.append({
            'id': image_id,
            'width': img.shape[1],
            'height': img.shape[0],
            'file_name': img_name
        })

        image_id += 1

    info = {
        'year': 2012,
        'version': 1,
        'description': 'Pascal SBD',
    }

    categories = [{'id': x + 1} for x in range(20)]

    with open('pascal_sbd_%s.json' % t, 'w') as f:
        json.dump({
            'info': info,
            'images': images,
            'annotations': annotations,
            'licenses': {},
            'categories': categories
        }, f)

4 changes: 2 additions & 2 deletions yolact.py
@@ -408,8 +408,8 @@ def forward(self, convouts:List[torch.Tensor]):
                out.append(nn.functional.max_pool2d(out[-1], 1, stride=2))

        if self.relu_downsample_layers:
            for idx in range(cur_idx, len(out)):
                out[idx] = F.relu(out[idx], inplace=False)
            for idx in range(len(out) - cur_idx):
                out[idx] = F.relu(out[idx + cur_idx], inplace=False)

        return out

