DETR tutorials to use it on custom data :) #428
Comments
Hi, I've just run my notebook, it still works fine. That's weird. You can perhaps restart the runtime and run again? |
Hi @NielsRogge, I fixed the error above just by readjusting "max_steps". But training takes too long: 1 epoch takes over 7 hours, while the official source code only takes about 30 minutes per epoch. I adjusted the batch size, but it still takes over 7 hours. How can I solve this problem? |
Yes, it takes 30 minutes per epoch if you have 8 GPUs, as stated in the README. On a single GPU, it will take a bit longer. ;) |
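As a rough back-of-the-envelope check of these numbers, here is a minimal sketch that converts the 8-GPU figure into a single-GPU estimate. It assumes, optimistically, that throughput scales linearly with the number of GPUs and that the GPUs are comparable; the function name is hypothetical.

```python
def estimate_epoch_minutes(reference_minutes, reference_gpus, target_gpus):
    """Estimate epoch time on target_gpus, assuming ideal linear scaling."""
    if reference_gpus <= 0 or target_gpus <= 0:
        raise ValueError("GPU counts must be positive")
    # Total work expressed in GPU-minutes, then spread over the target GPUs.
    total_gpu_minutes = reference_minutes * reference_gpus
    return total_gpu_minutes / target_gpus


single_gpu_minutes = estimate_epoch_minutes(30, 8, 1)
print(f"{single_gpu_minutes / 60:.1f} hours per epoch")  # 4.0 hours per epoch
```

Under this assumption a single GPU of the same type needs about 4 hours per epoch, so 7 hours on a slower GPU is not implausible.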
It seems that the authors advise fine-tuning the model in order to apply it to a custom dataset. If we have a large enough dataset (>10k images), do we still need to fine-tune it? It's weird that we must fine-tune a DETR model, while a CNN model can be applied to COCO-format data without fine-tuning. |
Yes, I would fine-tune the whole model if your dataset is about 10k images. Only the class labels classifier would need to be trained from scratch. |
Do you know if it is possible to use DeepSpeed with DETR? It could help to speed up the training, couldn't it? |
Hi, DeepSpeed is mostly meant to fit very big models in one or more GPUs. Its use case is not to speed up training, iirc. |
So if I use a large batch size, could it be useful in this case? |
During training, is the CNN backbone trained at the same time as DETR or will pre-training be required? |
The CNN is trained at the same time as the encoder-decoder Transformer. However, one starts with a pre-trained CNN and a randomly initialized encoder-decoder Transformer. One also typically uses different learning rates for the CNN backbone and the Transformer. |
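A minimal sketch of the "different learning rate" idea, assuming backbone parameters can be recognized by the substring "backbone" in their names. The helper name is hypothetical, and the default values (1e-4 for the Transformer, 1e-5 for the backbone) are, to my knowledge, the defaults of the original DETR training script.

```python
def build_param_groups(named_parameters, lr=1e-4, lr_backbone=1e-5):
    """Split parameters into two groups with separate learning rates.

    named_parameters: iterable of (name, parameter) pairs,
    e.g. model.named_parameters() for a PyTorch module.
    Assumption: backbone parameters have "backbone" in their name.
    """
    backbone, rest = [], []
    for name, param in named_parameters:
        # Skip frozen parameters when the object exposes requires_grad.
        if not getattr(param, "requires_grad", True):
            continue
        (backbone if "backbone" in name else rest).append(param)
    return [
        {"params": rest, "lr": lr},
        {"params": backbone, "lr": lr_backbone},
    ]
```

With PyTorch, the returned list could be passed to an optimizer, e.g. `torch.optim.AdamW(build_param_groups(model.named_parameters()), weight_decay=1e-4)`.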
Hi, Replacing the backbone can be done as follows:
This will work out-of-the-box for convolutional backbones like ConvNeXt:
For a vision transformer-based backbone, I'd recommend using ViTDet:
|
Hi @NielsRogge, I have a question. |
Btw, I have a task in which I have to train DETR on a remapped COCO dataset containing only 2 classes (class 1: person; class 2: vehicle, corresponding to 3 classes in the original COCO: car, truck, bus). What should I do? |
@truong2710-cyber sure, you could do that. This is the conversion script used to rename the keys of the state dictionary from the original repo to the HF format, so you could also do it the other way around. In case you have 2 classes, then you can initialize the DETR model as follows:
This will reuse all existing layers of the DETR model pre-trained on COCO, and randomly initialize a new classification head. See also my notebook regarding fine-tuning. |
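The remapping of COCO annotations to the two classes described in the question could be sketched as follows. This is only an illustration: the function and constant names are hypothetical, the new ids are zero-based by assumption (0: person, 1: vehicle), and annotations of all other classes are dropped. The source ids are the standard COCO category ids (person=1, car=3, bus=6, truck=8).

```python
# Standard COCO category ids mapped to the new, zero-based ids (an assumption).
OLD_TO_NEW = {1: 0, 3: 1, 6: 1, 8: 1}
NEW_CATEGORIES = [{"id": 0, "name": "person"}, {"id": 1, "name": "vehicle"}]


def remap_coco(coco):
    """Return a copy of a COCO-style dict keeping only the remapped annotations."""
    annotations = [
        {**ann, "category_id": OLD_TO_NEW[ann["category_id"]]}
        for ann in coco["annotations"]
        if ann["category_id"] in OLD_TO_NEW
    ]
    # The input dict and its annotation dicts are left unmodified.
    return {**coco, "annotations": annotations, "categories": NEW_CATEGORIES}
```

The result can be written back to disk with `json.dump` and used like any other COCO annotation file.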
Thanks a lot @NielsRogge. If I have an A100 GPU, could you estimate how long it would take to fine-tune the model on my remapped COCO dataset? |
Have you managed to convert a COCO dataset to the Huggingface dataset format? If so, could you share the steps? |
I think Huggingface directly uses the COCO format, doesn't it? |
HF does not support datasets in the COCO format (see huggingface/transformers#25337); they use their own HF format. You have to write your own COCO-to-HF converter for COCO segmentation datasets. Googling around, I did not find one; I have yet to see a public script that loads instance segmentations in COCO format into the HF format. |
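A minimal sketch of what such a converter might look like, using only the standard library. The target layout (one record per image, with the objects stored as a dict of lists) is an assumption chosen for illustration, not an official HF specification, and the function name is hypothetical. Image loading is left out.

```python
from collections import defaultdict


def coco_to_records(coco):
    """Group COCO annotations by image into one record per image."""
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)

    records = []
    for img in coco["images"]:
        anns = anns_by_image.get(img["id"], [])  # images without objects get empty lists
        records.append({
            "image_id": img["id"],
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
            "objects": {
                "id": [a["id"] for a in anns],
                "bbox": [a["bbox"] for a in anns],  # COCO [x, y, width, height]
                "category": [a["category_id"] for a in anns],
                "area": [a.get("area") for a in anns],
                # Polygons or RLE masks are passed through unchanged, if present.
                "segmentation": [a.get("segmentation") for a in anns],
            },
        })
    return records
```

If the `datasets` library is installed, a list like this could then be passed to `datasets.Dataset.from_list(records)`, although heterogeneous segmentation entries (polygons versus RLE, or missing values) may need to be normalized first.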
I made this notebook to upload a COCO dataset to the hub. The dataset lives here: https://huggingface.co/datasets/nielsr/coco-panoptic-val2017 |
Could you please show me your training and validation loss curves over the epochs? |
Some months ago, I added DETR to HuggingFace Transformers 🤗 I've replaced the original torchvision backbones (ResNets) such that you can use any backbone available in the timm repository (like EfficientNets or MobileNets etc.) 🥳 The model is implemented using the same API as other models in HuggingFace like BERT (i.e. you have `DetrModel`, which is the encoder-decoder Transformer without any head on top, `DetrForObjectDetection`, which has the object detection and class labels classifier heads on top, and `DetrForSegmentation`, which adds the mask head on top).
The model weights are hosted on the HuggingFace hub. The documentation can be found here: https://huggingface.co/transformers/model_doc/detr.html
I've made 5 different notebooks, illustrating how to use DETR both for inference and training on custom data, both for object detection and panoptic segmentation. You can find them here: https://github.com/NielsRogge/Transformers-Tutorials
I've also made a notebook for evaluating the model on COCO. I hope this helps people use DETR easily!
Issues for which this might be helpful:
#421
#419
#378
#366
#341
#307
#205
#190
#179
#153
#152
#148
#141
#125
#111
#109
#89
#84
#53
#40
#28
#21
#14
#9
PS the most epic thing was Yann LeCun himself tweeting about this: https://twitter.com/ylecun/status/1405640394143113219