Custom dataset training notebook ? #152

jeromen7 · 2020-07-16T13:21:05Z

Hello,

I was wondering if someone managed to write a notebook for training DETR on a custom dataset.
I saw issue #9, but there are a lot of messages and nobody provided a complete solution for what I am looking for.
The Kaggle solution seems to work well, but I don't know how to generalize it to multiple classes (not only one, as for the wheat), or how to run it without the k-fold cross-validation, which is great but adds a lot of computation time.
Moreover, the README in this repo says that the dataset should be in COCO format with JSON annotation files, but the Kaggle solution never uses JSON files... So I don't know what dataset format DETR expects for training: CSV or JSON? xyxy or xywh boxes, or both?
To sum up, I am looking for a simple, well-structured notebook that works on a dataset split into 2 folders (train and validation), covering for example:

Imports and installations
Loading the model and configuration (setting values like num_queries, num_classes according to the object detection problem we want to solve)
Training the model (with the path to the dataset, and an example of a small dataset would be awesome to see the format)
Inference and, possibly, metrics generation to evaluate the model

I know I am asking a lot, but any assistance would be much appreciated!
Thank you all very much for your help 😄
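
For reference, a multi-class annotation file in COCO detection format can be sketched as below. This is only an illustration and is not taken from the thread: the file names, image sizes, category names and box values are made up, and the helper `num_classes_for_detr` is hypothetical. It follows the note in the DETR source (models/detr.py) that `num_classes` corresponds to the largest category id plus one rather than to the number of categories.

```python
import json

# Hypothetical two-class dataset in COCO detection format.
# COCO boxes are [x_min, y_min, width, height] in pixels.
coco = {
    "images": [
        {"id": 1, "file_name": "img_001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "img_002.jpg", "width": 800, "height": 600},
    ],
    "categories": [
        {"id": 1, "name": "class_a"},
        {"id": 2, "name": "class_b"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 80.0], "area": 4000.0, "iscrowd": 0},
        {"id": 2, "image_id": 2, "category_id": 2,
         "bbox": [10.0, 20.0, 30.0, 40.0], "area": 1200.0, "iscrowd": 0},
    ],
}


def num_classes_for_detr(coco_dict):
    """Largest category id plus one (hypothetical helper, see lead-in)."""
    return max(cat["id"] for cat in coco_dict["categories"]) + 1


# Serialized form, as it would be written to an annotation .json file.
annotation_json = json.dumps(coco, indent=2)
num_classes = num_classes_for_detr(coco)
```

With category ids 1 and 2, `num_classes` comes out as 3, because id 0 is left unused in this example.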

lessw2020 · 2020-07-20T17:58:01Z

Hi @jeromen7 - I've been working on one, but work has delayed it. Do you have a good example dataset in mind? That would help, as I can't use my work ones and haven't found a good one yet.
I was hoping to find a covid mask detector or similar dataset, or something pertinent. (One was proposed before for step detection, but that dataset was not set up well: the maker had manually blended in flipped images, making it impossible to separate val from training images.)

lessw2020 · 2020-07-20T18:02:44Z

Re: dataset format - I've been using COCO-format JSON with boxes in xyxy format (converted from COCO's native xywh). Some others have had luck with cxcy.
The DETR codebase was built around COCO JSON with a change to xyxy format for input - you can look at coco.py under /datasets to see how they did it.
Note that my half-complete Colab is here, if that helps add some context for you: https://github.com/lessw2020/training-detr
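
The box conversions mentioned above can be sketched in plain Python. This is a minimal illustration, not the repo's code: the function names are made up, and the second step (normalized center format) reflects what the repo's normalization transform appears to do before boxes reach the loss, so treat it as an assumption to check against datasets/transforms.py.

```python
def xywh_to_xyxy(box):
    """COCO [x_min, y_min, width, height] -> [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_normalized_cxcywh(box, img_w, img_h):
    """Corner box in pixels -> [center_x, center_y, width, height] in 0..1."""
    x0, y0, x1, y1 = box
    return [
        (x0 + x1) / 2 / img_w,
        (y0 + y1) / 2 / img_h,
        (x1 - x0) / img_w,
        (y1 - y0) / img_h,
    ]


# Made-up example box on a 640x480 image.
coco_box = [100.0, 120.0, 50.0, 80.0]
corners = xywh_to_xyxy(coco_box)
target = xyxy_to_normalized_cxcywh(corners, img_w=640, img_h=480)
```

Here `corners` is the xyxy form of the box and `target` is the same box expressed relative to the image size.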

woctezuma · 2020-07-20T18:09:02Z

> do you have a good example dataset in mind?

> Was hoping to find a covid mask detector or similar dataset or something pertinent.

What about one of these?

https://www.kaggle.com/andrewmvd/face-mask-detection

https://www.kaggle.com/mbkinaci/fruit-images-for-object-detection

https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset

jeromen7 · 2020-07-21T07:25:51Z

Thank you for your answer, @lessw2020!
Now that I know the appropriate format for my dataset, I will work on it.
If you need a dataset for the notebook, maybe this one will help you?
https://drive.google.com/drive/folders/1XUR4ci88ABahff3TOxoT9GuxbjP7NwCq?usp=sharing

Thank you again for taking some of your time to help us all! 🙏

lessw2020 · 2020-07-22T04:46:40Z

Great, thanks @woctezuma and @jeromen7.
I think the covid mask one Jeromen posted would be easiest to work with, as it is already in COCO format (whereas the Kaggle ones are CSV)... you can use either, but the CSV ones need an extra translation step.
@jeromen7 - I'm unclear, though, whether 'no mask' in this dataset means a person's face with no mask. There are some images of a building etc., and the model can't deal with an infinite 'no' class, so I just wanted to confirm what the no mask class means.
(Maybe it's obvious once I preview with bounding boxes, but I've only looked at the images.)
Anyway, let me get my upcoming work demo ready by Fri, and then get this nice covid mask dataset up and running in a notebook to show.
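
The "extra translation step" for CSV annotations could look roughly like the sketch below. The CSV layout is an assumption made for illustration (columns `image_id`, `width`, `height`, `bbox`, `label`, with `bbox` stored as an `[x, y, w, h]` string); real Kaggle files may differ. The function name `csv_to_coco`, the `.jpg` extension and the sample rows are all hypothetical.

```python
import ast
import csv
import io


def csv_to_coco(csv_text, class_names):
    """Convert CSV rows (assumed layout, see lead-in) to a COCO-style dict."""
    # Category ids start at 1, in the order the class names are given.
    cat_ids = {name: i + 1 for i, name in enumerate(class_names)}
    images = {}
    annotations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["image_id"]
        if name not in images:
            images[name] = {
                "id": len(images) + 1,
                "file_name": name + ".jpg",  # assumed extension
                "width": int(row["width"]),
                "height": int(row["height"]),
            }
        # bbox is assumed to be a string such as "[100, 120, 50, 80]".
        x, y, w, h = (float(v) for v in ast.literal_eval(row["bbox"]))
        annotations.append({
            "id": len(annotations) + 1,
            "image_id": images[name]["id"],
            "category_id": cat_ids[row["label"]],
            "bbox": [x, y, w, h],
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": list(images.values()),
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
        "annotations": annotations,
    }


# Made-up sample rows; the bbox field is quoted because it contains commas.
SAMPLE_CSV = '''image_id,width,height,bbox,label
a1,640,480,"[100, 120, 50, 80]",mask
a1,640,480,"[300, 200, 40, 60]",no_mask
b2,800,600,"[10, 20, 30, 40]",mask
'''

coco = csv_to_coco(SAMPLE_CSV, ["mask", "no_mask"])
```

The resulting dict can be written out with `json.dump` to get an annotation file for each split.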

jeromen7 · 2020-07-23T06:47:41Z

Yes, you are right @lessw2020: the 'no mask' class is for faces with no mask, and the 'mask' class is for faces with a mask.

bconsolvo · 2020-07-27T18:52:47Z

@lessw2020 Thank you for your work so far - I am looking forward to seeing the rest of the custom training dataset Colab notebook. I took the course on creating COCO datasets (https://www.udemy.com/course/creating-coco-datasets/), by Adam Kelly. Now that I have my own image data and annotations in the COCO format, I am looking to start training with DETR.

I haven't found anybody else who has posted an example of training with DETR on a custom dataset, yet.

woctezuma · 2020-07-27T19:44:16Z

> I haven't found anybody else who has posted an example of training with DETR on a custom dataset, yet.

You can finetune DETR either directly:

code (directions to edit some files): https://gist.github.com/mlk1337/651297e28199b4bb7907fc413c49f58f
reading material: Recommendations for training Detr on custom dataset? #9

Or with the "detectron2" wrapper:

code (similar to what you would want): https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5
reading material: https://github.com/facebookresearch/detr/tree/master/d2
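
One common ingredient of fine-tuning a pretrained detector on different classes is to drop the pretrained classification head so it can be re-created with the new number of classes. The sketch below illustrates that idea only and is not a summary of the linked gist or notebook. The helper name is hypothetical, the key prefix `class_embed.` is assumed from DETR's model definition, and a plain dict with placeholder strings stands in for a real checkpoint so the snippet runs without PyTorch.

```python
def drop_class_head(model_state):
    """Return a copy of a state dict without the classification-head entries."""
    return {
        key: value
        for key, value in model_state.items()
        if not key.startswith("class_embed.")
    }


# Stand-in for checkpoint["model"]; real values would be torch tensors.
fake_state = {
    "class_embed.weight": "pretrained head weight",
    "class_embed.bias": "pretrained head bias",
    "bbox_embed.layers.0.weight": "kept",
    "query_embed.weight": "kept",
}
trimmed = drop_class_head(fake_state)

# With PyTorch installed, the same idea would apply to a downloaded checkpoint:
#   checkpoint = torch.load("detr-r50.pth", map_location="cpu")  # hypothetical path
#   checkpoint["model"] = drop_class_head(checkpoint["model"])
#   torch.save(checkpoint, "detr-r50_no-class-head.pth")
```

Because the trimmed checkpoint no longer contains every parameter of the model, loading it would need `load_state_dict(..., strict=False)`; the new head then starts from random initialization.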

alcinos mentioned this issue on Jul 28, 2020: FineTuning with similar format dataset as coco but different classes #153 (closed)

NielsRogge mentioned this issue on Aug 25, 2021: DETR tutorials to use it on custom data :) #428 (open)
