
my own data set #15

Closed
oneOfThePeople opened this issue May 24, 2017 · 13 comments

@oneOfThePeople

I am trying to train the network on my own dataset, and I get a warning like this:

data "gt_boxes" has a shape (1L, 174L, 5L), which is larger than already allocated shape (1L, 100L, 5L). Need to re-allocate. Consider putting default_bucket_key to be the bucket taking the largest input for better memory sharing

Does anyone know what this means?

@realwecan

Strange, I did not encounter this issue with my own data. What dataset are you using?

@HaozhiQi
Collaborator

This is a mechanism of MXNet: you need to specify a 'maximum' shape before the computational graph is built. Changing the '100' in these two lines https://github.com/msracver/FCIS/blob/master/fcis/train_end2end.py#L93-L94 to '200' or '500' will resolve the warning.
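
For illustration, the change amounts to raising the pre-allocated instance cap in the maximum input shapes. A minimal, self-contained sketch (the variable names and image sizes here are assumptions, not the exact FCIS source):

    MAX_GT_INSTANCES = 200  # was 100; must exceed the largest per-image instance count (174 in the warning above)
    batch_size, max_h, max_w = 1, 800, 1333  # illustrative values

    max_data_shape = [
        ('data', (batch_size, 3, max_h, max_w)),
        ('gt_boxes', (batch_size, MAX_GT_INSTANCES, 5)),            # [x1, y1, x2, y2, class]
        ('gt_masks', (batch_size, MAX_GT_INSTANCES, max_h, max_w)),
    ]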

@xiaxianxiaxian

Hi, how do you read your own data into the author's code framework? Do you convert it into the VOC (.mat file) or COCO (.json file) annotation format?

@realwecan

@xiaxianxiaxian Yes, that's what I have done.

@oneOfThePeople
Author

@oh233 Thank you.
@xiaxianxiaxian I converted my data to .mat files.
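
For reference, one way such a .mat label file can be written is with scipy.io.savemat; the field layout below follows the SBD format linked later in this thread, and exactly which fields FCIS reads is an assumption here:

    import numpy as np
    import scipy.io as sio

    seg = np.zeros((480, 640), dtype=np.uint8)  # H x W map of class ids, 0 = background
    seg[100:200, 150:300] = 1                   # toy object of class 1

    # nested dicts become MATLAB structs, e.g. GTcls.Segmentation
    sio.savemat('2008_000001.mat',
                {'GTcls': {'Segmentation': seg,
                           'CategoriesPresent': np.array([[1]], dtype=np.uint8)}})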

@ogail

ogail commented Jun 24, 2017

@oneOfThePeople Could you share a link or instructions for how you created the .mat files? I have ground-truth polygon points for each object in my dataset, and I want to convert them to be compatible with the Pascal VOC .mat format.

@oneOfThePeople
Author

I used this code, but I am not sure whether it will help for any other dataset.

@ziyu919

ziyu919 commented Aug 25, 2017

@oneOfThePeople Could you tell me how to create the .geojson files? I tried to use the code, but .geojson files are required as input.

@oneOfThePeople
Author

Creating GeoJSON and then .mat files does sound like a complicated route, but I used the GDAL library, if that helps.
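
For example, polygon vertices can be pulled out of a GeoJSON file with GDAL's OGR bindings. A minimal sketch, assuming simple polygons with a single outer ring (the file name is hypothetical):

    from osgeo import ogr

    ds = ogr.Open('labels.geojson')
    layer = ds.GetLayer()
    for feature in layer:
        geom = feature.GetGeometryRef()
        ring = geom.GetGeometryRef(0)  # outer ring of a simple polygon
        points = ring.GetPoints()      # list of (x, y[, z]) vertex tuples
        print(points)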

@ogail

ogail commented Aug 29, 2017

@ziyu919 What I ended up doing is changing the code to read my own dataset of serialized NumPy arrays, one for each object category and instance. It is much simpler and does not require any complicated processing.
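
A minimal sketch of that kind of layout (the file naming scheme is illustrative, not the actual loader code, which is not shown in this thread): one boolean mask per instance, written and read back with np.save/np.load.

    import numpy as np

    h, w = 480, 640
    inst_mask = np.zeros((h, w), dtype=bool)
    inst_mask[100:200, 150:300] = True  # toy instance region

    # one file per (image, category, instance) triple
    np.save('img0001_cls01_inst00.npy', inst_mask)

    # loading is symmetric
    loaded = np.load('img0001_cls01_inst00.npy')
    assert loaded.shape == (h, w)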

@scholltan

@ogail Would it be convenient to share the scripts, or part of them?

@ogail

ogail commented Aug 29, 2017

Imagine that your labeled data is in the form of polygon points, where each polygon surrounds an object of interest. Here is the code I had:

import os
from random import shuffle

import numpy as np
from skimage.draw import polygon as skipoly  # assumed import: takes (rows, cols) vertex arrays, returns fill indices


def create_pascal_voc_sbd_dataset(raw_learning_examples, out_dir, split_thresh=0.9, split_name='val.txt'):
    """
    Creates an SBD dataset for training instance segmentation networks. For more info check
    http://home.bharathh.info/pubs/codes/SBD/download.html
    :param raw_learning_examples: list of (image, xml_file) learning examples
    :param out_dir: the output dataset directory
    :param split_thresh: the fraction of the data used for training. Defaults to 90%.
    :param split_name: the name of the non-training output file. Defaults to val.txt.
    """

    # create the dataset directories
    layout_dir = os.path.join(out_dir, 'ImageSets', 'Main')
    images_dir = os.path.join(out_dir, 'img')
    inst_dir = os.path.join(out_dir, 'inst')
    cls_dir = os.path.join(out_dir, 'cls')

    for d in (out_dir, layout_dir, images_dir, inst_dir, cls_dir):
        if not os.path.exists(d):
            os.makedirs(d)

    # shuffle the provided dataset to avoid construction bias
    shuffle(raw_learning_examples)

    # create the instance segmentation dataset
    train_cutoff = int(len(raw_learning_examples) * split_thresh)
    for learning_example_count, (img, xml_file) in enumerate(raw_learning_examples):
        print('preparing learning example {}/{}'.format(learning_example_count + 1, len(raw_learning_examples)))
        label_filename, detection_objects = parse(xml_file)  # parse() is the author's own XML label reader (not shown)
        filename_prefix = label_filename.split('.')[0]
        img_filename = filename_prefix + '.jpg'
        w, h = img.size
        cls_mask = np.zeros((h, w))
        inst_mask = np.zeros((h, w))

        for obj_id, obj in enumerate(detection_objects, 1):
            x_points = []
            y_points = []
            for x, y in obj.poly:
                x_points.append(x)
                y_points.append(y)
            # rasterize the polygon once all of its vertices are collected;
            # the problem at hand has one class only, so hard-code it as 1
            rr, cc = skipoly(np.array(y_points), np.array(x_points))
            cls_mask[rr, cc] = 1
            inst_mask[rr, cc] = obj_id

        if learning_example_count < train_cutoff:
            layout_filename = 'train.txt'
        else:
            layout_filename = split_name

        # append the learning example to the proper layout file (train or validation)
        with open(os.path.join(layout_dir, layout_filename), 'a') as out_file:
            out_file.write(filename_prefix + '\n')

        # write the learning example image
        img.save(os.path.join(images_dir, img_filename))

        # write the instance mask to the inst folder
        np.save(os.path.join(inst_dir, filename_prefix), inst_mask)

        # write the class mask to the cls folder
        np.save(os.path.join(cls_dir, filename_prefix), cls_mask)
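
A hypothetical invocation, assuming PIL images paired with VOC-style XML label files (the file names here are assumptions, and the parse helper used above is the author's own):

    from PIL import Image

    examples = [(Image.open('img_0001.jpg'), 'img_0001.xml'),
                (Image.open('img_0002.jpg'), 'img_0002.xml')]
    create_pascal_voc_sbd_dataset(examples, out_dir='sbd_out', split_thresh=0.9)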

@ziyu919

ziyu919 commented Sep 4, 2017

@ogail Thanks!
