How to train with cityscapes dataset? #29

Closed
Flawless1202 opened this issue Oct 16, 2018 · 12 comments

Comments

@Flawless1202

I used the Python script convert_cityscapes_to_coco.py and successfully converted the Cityscapes dataset to COCO format. But when I modified the config file faster_rcnn_r50_fpn_1x.py and ran `python tools/train.py configs/faster_rcnn_r50_fpn_1x.py --gpus 2 --work_dir ./out --validate` to train, I got this error:

```
2018-10-16 11:05:01,156 - INFO - Distributed training: False
2018-10-16 11:05:01,600 - INFO - load model from: modelzoo://resnet50
2018-10-16 11:05:01,825 - WARNING - unexpected key in source state_dict: fc.weight, fc.bias

missing keys in source state_dict: layer3.4.bn1.num_batches_tracked, layer1.0.downsample.1.num_batches_tracked, layer3.5.bn2.num_batches_tracked, layer3.0.bn2.num_batches_tracked, layer4.1.bn3.num_batches_tracked, layer2.1.bn1.num_batches_tracked, layer3.4.bn3.num_batches_tracked, layer1.1.bn2.num_batches_tracked, layer3.3.bn3.num_batches_tracked, layer2.2.bn3.num_batches_tracked, layer3.1.bn1.num_batches_tracked, layer1.2.bn1.num_batches_tracked, layer4.0.bn1.num_batches_tracked, layer2.3.bn2.num_batches_tracked, layer3.1.bn3.num_batches_tracked, layer4.2.bn1.num_batches_tracked, layer3.4.bn2.num_batches_tracked, layer4.0.bn2.num_batches_tracked, layer4.2.bn3.num_batches_tracked, layer2.1.bn3.num_batches_tracked, layer2.0.bn3.num_batches_tracked, layer1.0.bn1.num_batches_tracked, layer1.0.bn3.num_batches_tracked, layer4.1.bn2.num_batches_tracked, layer3.0.bn1.num_batches_tracked, layer1.0.bn2.num_batches_tracked, layer3.2.bn1.num_batches_tracked, layer2.1.bn2.num_batches_tracked, layer4.1.bn1.num_batches_tracked, layer3.0.downsample.1.num_batches_tracked, layer1.2.bn3.num_batches_tracked, layer2.0.downsample.1.num_batches_tracked, layer2.3.bn3.num_batches_tracked, layer3.3.bn2.num_batches_tracked, layer1.1.bn1.num_batches_tracked, layer4.2.bn2.num_batches_tracked, layer3.3.bn1.num_batches_tracked, layer1.1.bn3.num_batches_tracked, layer2.0.bn1.num_batches_tracked, layer3.0.bn3.num_batches_tracked, layer2.3.bn1.num_batches_tracked, layer3.1.bn2.num_batches_tracked, layer4.0.downsample.1.num_batches_tracked, layer2.0.bn2.num_batches_tracked, layer2.2.bn2.num_batches_tracked, layer3.5.bn1.num_batches_tracked, layer3.2.bn3.num_batches_tracked, layer2.2.bn1.num_batches_tracked, layer3.2.bn2.num_batches_tracked, bn1.num_batches_tracked, layer3.5.bn3.num_batches_tracked, layer4.0.bn3.num_batches_tracked, layer1.2.bn2.num_batches_tracked

loading annotations into memory...
Done (t=4.59s)
creating index...
index created!
2018-10-16 11:05:09,360 - INFO - Start running, host: chenkai@Autodrive, work_dir: /home/chenkai/Documents/mmdetection/out
2018-10-16 11:05:09,360 - INFO - workflow: [('train', 1)], max: 12 epochs
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f2cce44c1d0>>
Traceback (most recent call last):
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 399, in __del__
    self._shutdown_workers()
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 378, in _shutdown_workers
    self.worker_result_queue.get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 493, in Client
    answer_challenge(c, authkey)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 737, in answer_challenge
    response = connection.recv_bytes(256) # reject large message
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "tools/train.py", line 82, in <module>
    main()
  File "tools/train.py", line 78, in main
    logger=logger)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 59, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 117, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmcv-0.2.0-py3.6.egg/mmcv/runner/runner.py", line 349, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmcv-0.2.0-py3.6.egg/mmcv/runner/runner.py", line 255, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/apis/train.py", line 37, in batch_processor
    losses = model(**data)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/detectors/base.py", line 79, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/detectors/two_stage.py", line 111, in forward_train
    self.train_cfg.rcnn)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/models/bbox_heads/bbox_head.py", line 73, in get_bbox_target
    target_stds=self.target_stds)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 25, in bbox_target
    target_stds=target_stds)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/utils/misc.py", line 24, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 62, in proposal_target_single
    labels, reg_num_classes)
  File "/home/chenkai/.virtualenvs/mmdetection_py36/lib/python3.6/site-packages/mmdet-0.5.0+f29f020-py3.6.egg/mmdet/core/bbox/bbox_target.py", line 75, in expand_target
    bbox_targets_expand[i, start:end] = bbox_targets[i, :]
RuntimeError: The expanded size of the tensor (0) must match the existing size (4) at non-singleton dimension 0
```

Could you help me solve this problem? @hellock
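
The failing line in the traceback, `bbox_targets_expand[i, start:end] = bbox_targets[i, :]` in `expand_target`, copies the 4 regression targets of each positive RoI into a per-class slot inside an array of width `4 * num_classes`. The thread never confirms the cause, but one plausible reading is that some training label is greater than or equal to the `num_classes` set in the modified config: the slot slice is then empty (size 0) while 4 values are assigned, which matches the message "expanded size of the tensor (0) must match the existing size (4)". The sketch below is a NumPy stand-in written for illustration, not mmdetection's code; `expand_target_sketch` and `find_out_of_range_labels` are hypothetical helpers. It assumes, as we understand mmdetection configs of that time, that `num_classes` counts the background class (81 for COCO) and that the dataset maps categories to contiguous labels 1..N, so the 8 instance classes kept by the converter would need `num_classes=9`.

```python
import numpy as np


def expand_target_sketch(bbox_targets, labels, num_classes):
    """Hypothetical NumPy re-creation of the per-class expansion step.

    Each positive RoI (label > 0) gets its 4 regression targets copied into
    columns [4 * label, 4 * label + 4) of a (num_rois, 4 * num_classes) array.
    """
    expanded = np.zeros((bbox_targets.shape[0], 4 * num_classes),
                        dtype=bbox_targets.dtype)
    for i in np.nonzero(labels > 0)[0]:
        start, end = labels[i] * 4, (labels[i] + 1) * 4
        # If labels[i] >= num_classes this slice is empty, so assigning 4
        # values fails (NumPy raises ValueError; PyTorch raised the
        # RuntimeError shown in the log above).
        expanded[i, start:end] = bbox_targets[i, :]
    return expanded


def find_out_of_range_labels(labels, num_classes):
    """Hypothetical check: labels that do not fit a head with num_classes
    outputs, where index 0 is the background class."""
    return sorted({int(lab) for lab in labels if lab >= num_classes})


targets = np.ones((3, 4), dtype=np.float32)
labels = np.array([0, 3, 8])  # 0 = background, 8 = the 8th foreground class

print(find_out_of_range_labels(labels, num_classes=8))  # -> [8]
print(find_out_of_range_labels(labels, num_classes=9))  # -> []
print(expand_target_sketch(targets, labels, num_classes=9).shape)  # -> (3, 36)
try:
    expand_target_sketch(targets, labels, num_classes=8)
except ValueError as err:
    print('ValueError:', err)
```

Running a check like `find_out_of_range_labels` over the labels produced from the converted annotation file is a cheap way to test this hypothesis before training.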

@hellock
Member

hellock commented Oct 16, 2018

We will try the Cityscapes dataset and post an update here.

@Flawless1202
Author

@hellock OK, thank you. Here are the two Python scripts I use:
my_convert_cityscapes_to_coco.py

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import argparse
import h5py
import json
import os
import scipy.misc
import sys
import numpy as np

import instances2dict as cs
# import mmcv

# from matplotlib import pyplot as plt


def parse_args():
    parser = argparse.ArgumentParser(description='Convert dataset')
    parser.add_argument(
        '--dataset', help="cocostuff, cityscapes_instance_only", default=None, type=str)
    parser.add_argument(
        '--outdir', help="output dir for json files", default=None, type=str)
    parser.add_argument(
        '--datadir', help="data dir for annotations to be converted",
        default=None, type=str)
    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit(1)
    return parser.parse_args()


def polys_to_boxes(polys):
    """Convert a list of polygons into an array of tight bounding boxes."""
    boxes_from_polys = np.zeros((len(polys), 4), dtype=np.float32)
    for i in range(len(polys)):
        poly = polys[i]
        x0 = min(min(p[::2]) for p in poly)
        x1 = max(max(p[::2]) for p in poly)
        y0 = min(min(p[1::2]) for p in poly)
        y1 = max(max(p[1::2]) for p in poly)
        boxes_from_polys[i, :] = [x0, y0, x1, y1]

    return boxes_from_polys


def xyxy_to_xywh(xyxy):
    """Convert [x1 y1 x2 y2] box format to [x1 y1 w h] format."""
    if isinstance(xyxy, (list, tuple)):
        # Single box given as a list of coordinates
        assert len(xyxy) == 4
        x1, y1 = xyxy[0], xyxy[1]
        w = xyxy[2] - x1 + 1
        h = xyxy[3] - y1 + 1
        return (x1, y1, w, h)
    elif isinstance(xyxy, np.ndarray):
        # Multiple boxes given as a 2D ndarray
        return np.hstack((xyxy[:, 0:2], xyxy[:, 2:4] - xyxy[:, 0:2] + 1))
    else:
        raise TypeError('Argument xyxy must be a list, tuple, or numpy array.')


def convert_coco_stuff_mat(data_dir, out_dir):
    """Convert to png and save json with path. This currently only contains
    the segmentation labels for objects+stuff in cocostuff - if we need to
    combine with other labels from original COCO that will be a TODO."""
    sets = ['train', 'val']
    categories = []
    json_name = 'coco_stuff_%s.json'
    ann_dict = {}
    for data_set in sets:
        file_list = os.path.join(data_dir, '%s.txt')
        images = []
        with open(file_list % data_set) as f:
            for img_id, img_name in enumerate(f):
                img_name = img_name.replace('coco', 'COCO').strip('\n')
                image = {}
                mat_file = os.path.join(
                    data_dir, 'annotations/%s.mat' % img_name)
                data = h5py.File(mat_file, 'r')
                labelMap = data.get('S')
                if len(categories) == 0:
                    labelNames = data.get('names')
                    for idx, n in enumerate(labelNames):
                        categories.append(
                            {"id": idx, "name": ''.join(chr(i) for i in data[
                                n[0]])})
                    ann_dict['categories'] = categories
                scipy.misc.imsave(
                    os.path.join(data_dir, img_name + '.png'), labelMap)
                image['width'] = labelMap.shape[0]
                image['height'] = labelMap.shape[1]
                image['file_name'] = img_name
                image['seg_file_name'] = img_name
                image['id'] = img_id
                images.append(image)
        ann_dict['images'] = images
        print("Num images: %s" % len(images))
        # json.dumps returns str, so open in text mode ('wb' fails on Python 3)
        with open(os.path.join(out_dir, json_name % data_set), 'w') as outfile:
            outfile.write(json.dumps(ann_dict))


# for Cityscapes: instance IDs >= 1000 encode labelID * 1000 + instance index
def getLabelID(instID):
    if instID < 1000:
        return instID
    else:
        return instID // 1000


def convert_cityscapes_instance_only(
        data_dir, out_dir):
    """Convert from cityscapes format to COCO instance seg format - polygons"""
    sets = [
        'gtFine_val',
        # 'gtFine_train',
        # 'gtFine_test',

        # 'gtCoarse_train',
        # 'gtCoarse_val',
        # 'gtCoarse_train_extra'
    ]
    ann_dirs = [
        'gtFine_trainvaltest/gtFine/val',
        # 'gtFine_trainvaltest/gtFine/train',
        # 'gtFine_trainvaltest/gtFine/test',

        # 'gtCoarse/train',
        # 'gtCoarse/train_extra',
        # 'gtCoarse/val'
    ]
    img_dirs = [
        'cityscapes2coco/val',
        # 'cityscapes2coco/train'
    ]
    json_name = 'instancesonly_filtered_%s.json'
    ends_in = '%s_polygons.json'
    img_id = 0
    ann_id = 0
    cat_id = 1
    category_dict = {}

    category_instancesonly = [
        'person',
        'rider',
        'car',
        'truck',
        'bus',
        'train',
        'motorcycle',
        'bicycle',
    ]

    for data_set, ann_dir in zip(sets, ann_dirs):
        print('Starting %s' % data_set)
        ann_dict = {}
        images = []
        annotations = []
        ann_dir = os.path.join(data_dir, ann_dir)
        for root, _, files in os.walk(ann_dir):
            for filename in files:
                if filename.endswith(ends_in % data_set.split('_')[0]):
                    if len(images) % 50 == 0:
                        print("Processed %s images, %s annotations" % (
                            len(images), len(annotations)))
                    with open(os.path.join(root, filename)) as f:
                        json_ann = json.load(f)
                    image = {}
                    image['id'] = img_id
                    img_id += 1

                    image['width'] = json_ann['imgWidth']
                    image['height'] = json_ann['imgHeight']
                    # e.g. 'frankfurt_000000_000294_gtFine_polygons.json' -> 'frankfurt_000000_000294_'
                    prefix = filename[:-len(ends_in % data_set.split('_')[0])]
                    image['file_name'] = prefix + 'leftImg8bit.png'
                    image['seg_file_name'] = prefix + '%s_instanceIds.png' % data_set.split('_')[0]
                    images.append(image)

                    fullname = os.path.abspath(os.path.join(root, image['seg_file_name']))
                    # print(fullname)
                    # print(image['file_name'], type(image['file_name']))
                    # img = mmcv.imread(os.path.join(img_dirs[0], image['file_name']))
                    # fig = plt.figure()
                    # ax = fig.add_subplot(111)
                    # plt.imshow(mmcv.bgr2rgb(img))

                    objects = cs.instances2dict_with_polygons([fullname], verbose=False)[fullname]

                    # count = 0
                    for object_cls in objects:

                        if object_cls not in category_instancesonly:
                            continue  # skip non-instance categories

                        for obj in objects[object_cls]:
                            if obj['contours'] == []:
                                print('Warning: empty contours.')
                                continue  # skip instances without any contour

                            len_p = [len(p) for p in obj['contours']]
                            if min(len_p) <= 4:
                                print('Warning: invalid contours.')
                                continue  # skip contours with fewer than 3 points

                            ann = {}
                            ann['id'] = ann_id
                            ann_id += 1
                            ann['image_id'] = image['id']
                            ann['segmentation'] = obj['contours']

                            if object_cls not in category_dict:
                                category_dict[object_cls] = cat_id
                                cat_id += 1
                            ann['category_id'] = category_dict[object_cls]
                            ann['iscrowd'] = 0
                            ann['area'] = obj['pixelCount']
                            ann['bbox'] = xyxy_to_xywh(
                                polys_to_boxes(
                                    [ann['segmentation']])).tolist()[0]
                            # x, y, w, h = ann['bbox']
                            # rect = plt.Rectangle((x, y), w, h, linewidth=1,edgecolor='r',facecolor='none')
                            # ax.add_patch(rect)

                            annotations.append(ann)

                            # count+=1
                    # print("total:{} objects.".format(count))
                    # plt.show()

        ann_dict['images'] = images
        categories = [{"id": category_dict[name], "name": name} for name in
                      category_dict]
        ann_dict['categories'] = categories
        ann_dict['annotations'] = annotations
        print("Num categories: %s" % len(categories))
        print("Num images: %s" % len(images))
        print("Num annotations: %s" % len(annotations))
        with open(os.path.join(out_dir, json_name % data_set), 'w') as outfile:
            outfile.write(json.dumps(ann_dict))


if __name__ == '__main__':
    args = parse_args()
    if args.dataset == "cityscapes_instance_only":
        convert_cityscapes_instance_only(args.datadir, args.outdir)
    elif args.dataset == "cocostuff":
        convert_coco_stuff_mat(args.datadir, args.outdir)
    else:
        print("Dataset not supported: %s" % args.dataset)
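
A quick way to check the box arithmetic in the converter above is to run its two helpers on a toy polygon. The functions below are trimmed copies of `polys_to_boxes` and `xyxy_to_xywh` from the script (only the NumPy-array path of `xyxy_to_xywh` is kept); the coordinates are made up for illustration. Note the `+ 1` in width and height: the script treats box corners as inclusive pixel indices, so a box from x = 10 to x = 30 gets width 21.

```python
import numpy as np


def polys_to_boxes(polys):
    """Tight [x0, y0, x1, y1] box per instance (copied from the converter)."""
    boxes = np.zeros((len(polys), 4), dtype=np.float32)
    for i, poly in enumerate(polys):
        # each poly is a list of flat [x0, y0, x1, y1, ...] contour lists
        x0 = min(min(p[::2]) for p in poly)
        x1 = max(max(p[::2]) for p in poly)
        y0 = min(min(p[1::2]) for p in poly)
        y1 = max(max(p[1::2]) for p in poly)
        boxes[i, :] = [x0, y0, x1, y1]
    return boxes


def xyxy_to_xywh(xyxy):
    """[x1, y1, x2, y2] -> [x1, y1, w, h], treating corners as inclusive."""
    return np.hstack((xyxy[:, 0:2], xyxy[:, 2:4] - xyxy[:, 0:2] + 1))


# One instance with a single rectangular contour
segmentation = [[10, 20, 30, 20, 30, 50, 10, 50]]
print(xyxy_to_xywh(polys_to_boxes([segmentation])).tolist()[0])
# -> [10.0, 20.0, 21.0, 31.0]

# One instance split into two contours (e.g. by an occluder): the box covers both
split = [[0, 0, 5, 0, 5, 5], [20, 20, 25, 20, 25, 25]]
print(xyxy_to_xywh(polys_to_boxes([split])).tolist()[0])
# -> [0.0, 0.0, 26.0, 26.0]
```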

instances2dict.py

#!/usr/bin/python
#
# Convert instances from png files to a dictionary
#

from __future__ import print_function, absolute_import, division
import os, sys

import numpy as np

# Cityscapes imports
from cityscapesscripts.evaluation.instance import *
from cityscapesscripts.helpers.csHelpers import *
import cv2


def instances2dict_with_polygons(imageFileList, verbose=False):
    # print(imageFileList)
    imgCount     = 0
    instanceDict = {}

    if not isinstance(imageFileList, list):
        imageFileList = [imageFileList]

    if verbose:
        print("Processing {} images...".format(len(imageFileList)))

    for imageFileName in imageFileList:
        # Load image
        img = Image.open(imageFileName)

        # Image as numpy array
        imgNp = np.array(img)

        # Initialize label categories
        instances = {}
        for label in labels:
            instances[label.name] = []

        # Loop through all instance ids in instance image
        for instanceId in np.unique(imgNp):
            if instanceId < 1000:
                continue

            instanceObj = Instance(imgNp, instanceId)
            instanceObj_dict = instanceObj.toDict()

            # instances[id2label[instanceObj.labelID].name].append(instanceObj.toDict())
            if id2label[instanceObj.labelID].hasInstances:
                mask = (imgNp == instanceId).astype(np.uint8)

                # print(cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE))

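                # cv2.findContours returns 3 values in OpenCV 3.x but 2 in OpenCV 4.x;
                # indexing with [-2] selects the contour list in both versions
                contour = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2]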

                polygons = [c.reshape(-1).tolist() for c in contour]
                instanceObj_dict['contours'] = polygons

            instances[id2label[instanceObj.labelID].name].append(instanceObj_dict)

        imgKey = os.path.abspath(imageFileName)
        instanceDict[imgKey] = instances
        imgCount += 1

        if verbose:
            print("\rImages Processed: {}".format(imgCount), end=' ')
            sys.stdout.flush()

    if verbose:
        print("")

    # print(instanceDict)
    return instanceDict

def main(argv):
    fileList = []
    if (len(argv) > 2):
        for arg in argv:
            if ("png" in arg):
                fileList.append(arg)
    instances2dict_with_polygons(fileList, True)

if __name__ == "__main__":
    main(sys.argv[1:])

The script instances2dict.py is a modified version of this.
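
Both scripts rely on how Cityscapes encodes instances in `*_instanceIds.png`: pixels of a separately annotated instance store `labelID * 1000 + index`, while values below 1000 are plain label IDs (for example stuff classes, or instance classes annotated only as a group). That is why `instances2dict_with_polygons` skips `instanceId < 1000` and `getLabelID` divides by 1000. The NumPy sketch below decodes a made-up 3x4 array the same way and computes a pixel count and a tight box per instance; `decode_instance_ids` is a hypothetical helper, not part of `cityscapesscripts`, and the label-ID table is assumed to match `cityscapesscripts/helpers/labels.py`.

```python
import numpy as np

# Cityscapes label IDs of the eight instance classes kept by the converter
# (assumed to match cityscapesscripts/helpers/labels.py)
INSTANCE_LABELS = {24: 'person', 25: 'rider', 26: 'car', 27: 'truck',
                   28: 'bus', 31: 'train', 32: 'motorcycle', 33: 'bicycle'}


def decode_instance_ids(inst_img):
    """Hypothetical helper: split an instanceIds array into per-instance records."""
    records = []
    for inst_id in np.unique(inst_img):
        if inst_id < 1000:
            continue  # plain label ID (stuff or group region), no instance
        label_id = int(inst_id) // 1000
        ys, xs = np.nonzero(inst_img == inst_id)
        records.append({
            'instance_id': int(inst_id),
            'label': INSTANCE_LABELS.get(label_id, 'label%d' % label_id),
            'pixel_count': int(xs.size),
            # tight box as inclusive [x0, y0, x1, y1] pixel indices
            'box': [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())],
        })
    return records


# Toy image: 7 = road (no instance), 24001 = a person, 26000 = a car
toy = np.array([[7, 7, 26000, 26000],
                [7, 24001, 26000, 26000],
                [7, 24001, 24001, 7]], dtype=np.int32)
for rec in decode_instance_ids(toy):
    print(rec)
```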

@Flawless1202
Author

@hellock Oh, I already have these two scripts working, but I could not get good results, maybe because I only use two 1080 Ti GPUs for training. I'm really looking forward to your results.

@OceanPang
Collaborator

OceanPang commented Oct 16, 2018

Hi @Flawless1202, if you got sub-optimal results and have already checked the annotations, please refer to #25: with two 1080 Ti GPUs you may need lr = 0.005.

We will post Cityscapes experiments as soon as we have time, but probably not this month.
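
The lr = 0.005 suggestion is consistent with the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch assumes the reference setting documented for mmdetection configs of that time, lr = 0.02 for 8 GPUs with 2 images per GPU (a batch of 16); `scaled_lr` is a hypothetical helper, not an mmdetection function.

```python
def scaled_lr(num_gpus, imgs_per_gpu, base_lr=0.02, base_batch_size=16):
    """Linear scaling rule: lr proportional to the total batch size.

    Assumes base_lr was tuned for base_batch_size (8 GPUs x 2 images).
    """
    return base_lr * (num_gpus * imgs_per_gpu) / base_batch_size


print(scaled_lr(num_gpus=2, imgs_per_gpu=2))  # two 1080 Ti, 2 img/GPU -> 0.005
print(scaled_lr(num_gpus=4, imgs_per_gpu=2))  # -> 0.01
```

In a config of that era the result would go into the optimizer's `lr` field, e.g. `optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)`.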

@Flawless1202
Copy link
Author

@OceanPang Oh, I forgot about that. Thanks a lot, I will try again.

@yinglang


Hi Flawless1202, have you trained on Cityscapes successfully? Would you mind sharing the COCO-style performance? I would really appreciate it.

@yinglang
Copy link

@Flawless1202

@Flawless1202
Author

@yinglang Hi, I have not trained on Cityscapes, but I trained on a Cityscapes-style dataset called Lost and Found. I don't think converting it to COCO style is a problem, and the performance was satisfactory.

@yinglang

ok, thanks @Flawless1202

@michaelisc
Contributor

michaelisc commented Sep 26, 2019

Cityscapes was added in #1037. Follow the instructions in the README.

@hellock @OceanPang I guess this can be closed then.

@abhigoku10

@michaelisc @hellock @Flawless1202 When I try to visualize the Cityscapes annotations, I get the result below. How can I accurately get the polygon areas of the Cityscapes dataset into the JSON file?
[screenshot]

@xvjiarui
Collaborator

Hi @abhigoku10
Are you referring to JSON files like gtFine/**/*gtFine_polygons.json?
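
For reference, a `*_gtFine_polygons.json` file stores `imgWidth`, `imgHeight` and a list of `objects`, each with a `label` and a `polygon` given as `[[x, y], ...]` vertices. The sketch below turns such objects into COCO-style fields, computing the area with the shoelace formula; `polygon_objects_to_coco` and `shoelace_area` are hypothetical helpers and the toy data is invented. As an assumption worth checking: these polygons can overlap where objects occlude each other, so areas computed from them may differ from the pixel counts derived from `*_instanceIds.png`, which is what the converter earlier in this thread uses. The box here is plain `max - min`, without the converter's `+ 1`.

```python
INSTANCE_CLASSES = {'person', 'rider', 'car', 'truck',
                    'bus', 'train', 'motorcycle', 'bicycle'}


def shoelace_area(points):
    """Area of a simple polygon given as [[x, y], ...] (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def polygon_objects_to_coco(ann, keep=INSTANCE_CLASSES):
    """Hypothetical helper: COCO-style fields for each kept polygon object."""
    records = []
    for obj in ann['objects']:
        if obj['label'] not in keep:
            continue  # e.g. 'road', or group labels such as 'cargroup'
        pts = obj['polygon']
        if len(pts) < 3:
            continue  # degenerate polygon
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        records.append({
            'label': obj['label'],
            'segmentation': [[c for p in pts for c in p]],  # flat x, y list
            'area': shoelace_area(pts),
            'bbox': [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        })
    return records


toy = {'imgWidth': 100, 'imgHeight': 50, 'objects': [
    {'label': 'road', 'polygon': [[0, 30], [99, 30], [99, 49], [0, 49]]},
    {'label': 'car', 'polygon': [[10, 10], [30, 10], [30, 25], [10, 25]]},
]}
print(polygon_objects_to_coco(toy))
```

With a real file, the dictionary would come from `json.load` on the opened `*_gtFine_polygons.json`.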

druzhkov-paul pushed a commit to druzhkov-paul/mmdetection that referenced this issue Jun 17, 2020
…oco_visualize_tool

added palette to imshow window and terminal for coco_visualize tool
FANGAreNotGnu pushed a commit to FANGAreNotGnu/mmdetection that referenced this issue Oct 23, 2023

* Addressed PR Comments

* allow uploading files

* exception handling

* reporter

* add train val split

* split reko datasets

* handle pipeerror

* wip

* rm unused

* advanced API

* rm print

* try error

* import ok

* wip cifar training ok

* advanced api wip

* add missing file

* call method

* current progress

* controller sample okay

* rl progress, controller sample okay

* rl cifar example training okay

* rm comment

* Skopt searcher (open-mmlab#4)

* Added skopt_searcher.py for the BayesOpt search routine, plus a unit test comparing this searcher against the RandomSampling searcher on a toy optimization problem. Remaining TODOs: 1) include a script to benchmark skopt_searcher against Hyperband on a real AutoGluon image-classification task. 2) get_config() may get stuck in an infinite while loop (for all searchers): there is no termination condition for the case where all possible configs have already been tried (this should be inherited from BaseSearcher, or the Scheduler should terminate automatically).

* edited BaseTask to allow skopt Bayesian optimization hyperparameter search via the additional searcher argument value 'bayesopt'. Added a train_cifar10.py example to compare random search with BayesOpt search (with default settings for all other flags in this script). Results are in a new table added to scripts/image_classification/README.md

* Rebased master into skopt. Cleaned up documentation/comments to be more presentable

* rl training

* viz

* rl controller running okay

* rl controller state dict ready

* add dependencies

* update fit etc

* test pipeline

* merge hackathon docs

* add nas progress

* reorganize folders

* working progress

* major features

* Hackathon version freeze (open-mmlab#11)

* add image classification notebook and update api doc

* address comments

* update

* update result

* add functionalities

* update tutorials

* update tutorial

* update

* update

* update mds

* update fit etc

* update fit hackathon

* update

* handle pipeerror

* update+

* add skopt

* add searcher and scheduler notebooks

* New version of image_classification_searcher.md

Has a couple of remaining TODOs. The biggest issue I see is that fit() in base_task.py cannot take any keyword arguments for constructing the Searcher.

* how to pass keyword args to searcher

removed all TODOs as well. This notebook's execution still needs to be tested with the new base_task.fit code that uses 'searcher_options' as a kwarg.

What is still missing from this tutorial is a description of the hyperparameter search space that is actually being searched; a curious user will probably want to know this.
I would add a short section right before "## Random hyperparameter search" to clarify this, for example:

By default, `image_classification.fit()` will search for hyperparameter values within the following search space:
# TODO: explicitly list the default search space.

* added searcher_options

* add lr scheduler

* update example

* Update image_classification_scheduler.md

* minor typo correction

dict definitions for searcher_options corrected

* fix own dataset

* update large dataset test

* fixed skopt bug to handle ValueError exception

* Update image_classification_scheduler.md

* Update image_classification_scheduler.md

* try to fix large data

* address comments

* test large data

* fix pipeline

* test predict batch

* fix

* test pipeline

* img name

* Freeze autogluon version

* fit running

* add utils

* fit running, pending final fit

* evaluate

* docs

* fix merge error

* fix merge error

* fix

* fix

* init method for autogluon object attr

* docs compiled

* documentation

* rm sub docs

* address merging error

* address some comments

* choice

* choice

* fix typo

* docs improvement

* address comments

* change list with choice

* rename

* fix typo

* address some comments