
DETR tutorials to use it on custom data :) #428

NielsRogge opened this issue Aug 25, 2021 · 21 comments

NielsRogge commented Aug 25, 2021

A few months ago, I added DETR to HuggingFace Transformers 🤗 I've replaced the original torchvision backbones (ResNets) so that you can use any backbone available in the timm repository (like EfficientNets, MobileNets, etc.) 🥳 The model is implemented using the same API as other HuggingFace models like BERT: DetrModel is the encoder-decoder Transformer without any head on top, DetrForObjectDetection adds the bounding box and class labels classifier heads on top, and DetrForSegmentation adds the mask head on top.

The model weights are hosted on the HuggingFace hub. The documentation can be found here: https://huggingface.co/transformers/model_doc/detr.html
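For example, loading the COCO-pretrained object detection checkpoint from the hub is a one-liner (a minimal sketch; facebook/detr-resnet-50 is the checkpoint also referenced later in this thread):

from transformers import DetrForObjectDetection

# Downloads the COCO-pretrained weights from the HuggingFace hub
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")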

I've made 5 different notebooks illustrating how to use DETR for both inference and training on custom data, covering both object detection and panoptic segmentation. You can find them here: https://github.com/NielsRogge/Transformers-Tutorials

I've also made a notebook for evaluating the model on COCO. I hope it helps people use DETR easily!
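For reference, COCO-style evaluation usually boils down to pycocotools' COCOeval; here's a minimal sketch, assuming you've exported the model's predictions in the COCO results format (the file names are placeholders):

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Ground truth annotations and model predictions (COCO result format)
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("predictions.json")

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()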

Issues for which this might be helpful:

#421
#419
#378
#366
#341
#307
#205
#190
#179
#153
#152
#148
#141
#125
#111
#109
#89
#84
#53
#40
#28
#21
#14
#9

PS: the most epic thing was Yann LeCun himself tweeting about this: https://twitter.com/ylecun/status/1405640394143113219

@ducvuuit

[screenshot of a training run stalled at 5%]
When I train on my dataset, at epoch 0 it always stops at 5% without any error; RAM and memory usage stay within limits during training.
Does anyone have the same problem as me?

@NielsRogge (Author)

Hi, I've just re-run my notebook and it still works fine. That's weird. Perhaps restart the runtime and run it again?


ducvuuit commented Sep 4, 2021

Hi @NielsRogge, I fixed the error above by readjusting "max_steps". But training takes too long: one epoch takes over 7 hours, while the official source code takes only about 30 minutes per epoch. I adjusted the batch size, but it still takes over 7 hours. How can I solve this problem?
Thank you in advance.

@NielsRogge (Author)

Yes, it takes 30 minutes per epoch if you have 8 GPUs, as stated in the README. On a single GPU, it will take a bit longer. ;)

@mytk2012

It seems that the authors advise people to fine-tune the model to apply it to a custom dataset. If we have enough data (>10k images), do we still need to fine-tune it? It's weird that we must fine-tune the DETR model, while a CNN model can be applied to COCO-format data without fine-tuning.

@NielsRogge (Author)

Yes, I would fine-tune the whole model if your dataset is about 10k images. Only the class labels classifier would need to be trained from scratch.


ver0z commented Nov 18, 2021

Do you know if it's possible to use DeepSpeed with DETR? It could help speed up the training, couldn't it?

@NielsRogge (Author)

Hi,

DeepSpeed is mostly meant to fit very big models on one or more GPUs. Its use case is not to speed up training, IIRC.


ver0z commented Nov 18, 2021

So if I use a large batch size, could it be useful in this case?

@ohhenrylee

During training, is the CNN backbone trained at the same time as DETR or will pre-training be required?

@NielsRogge (Author)

The CNN is trained at the same time as the encoder-decoder Transformer; however, one starts from a pre-trained CNN and a randomly initialized encoder-decoder Transformer. One also typically uses different learning rates for the CNN backbone and the Transformer.
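A minimal sketch of that two-learning-rate setup, assuming a DetrForObjectDetection model whose backbone parameter names contain "backbone" (the 1e-4/1e-5 values are the defaults from the DETR paper):

import torch
from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# Lower learning rate for the pre-trained CNN backbone,
# higher learning rate for the randomly initialized Transformer
backbone_params = [p for n, p in model.named_parameters() if "backbone" in n]
other_params = [p for n, p in model.named_parameters() if "backbone" not in n]

optimizer = torch.optim.AdamW(
    [
        {"params": backbone_params, "lr": 1e-5},
        {"params": other_params, "lr": 1e-4},
    ],
    weight_decay=1e-4,
)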


NielsRogge commented Oct 16, 2023

Hi,

Replacing the backbone can be done as follows:

from transformers import ConvNextConfig, DetrConfig, DetrForObjectDetection

# Expose feature maps from all four ConvNeXt stages and bypass timm
backbone_config = ConvNextConfig(out_features=["stage1", "stage2", "stage3", "stage4"])
config = DetrConfig(backbone_config=backbone_config, use_timm_backbone=False)

model = DetrForObjectDetection(config)

This will work out-of-the-box for convolutional backbones like ConvNeXt:

import torch

# Smoke test with a random batch of one 224x224 image
pixel_values = torch.randn(1, 3, 224, 224)
outputs = model(pixel_values)

For a vision transformer-based backbone, I'd recommend using ViTDet:

from transformers import VitDetConfig, DetrConfig, DetrForObjectDetection

backbone_config = VitDetConfig(out_features=["stage1", "stage2", "stage3", "stage4"])
config = DetrConfig(backbone_config=backbone_config, use_timm_backbone=False)

model = DetrForObjectDetection(config)


truong2710-cyber commented Dec 1, 2023

Hi @NielsRogge, I have a question.
If I trained a DETR model with your notebook (in HuggingFace), can I convert the checkpoint back to the format of the original DETR repo https://github.com/facebookresearch/detr?
Thanks.

@truong2710-cyber

Btw, I have a task in which I have to train DETR on a remapped COCO dataset containing only 2 classes (class 1: person; class 2: vehicle, corresponding to 3 classes in the original COCO: car, truck, bus). What should I do?

@NielsRogge (Author)

@truong2710-cyber sure, you could do that. This is the conversion script used to rename the keys of the state dictionary from the original repo to the HF format, so you could also do it the other way around.
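A rough sketch of the reverse direction (hf_to_original_key is a hypothetical helper you'd write by inverting the rename rules from that conversion script):

import torch

hf_state_dict = model.state_dict()

# hf_to_original_key is hypothetical: invert the rename rules from the
# HF conversion script to recover the original repo's key names
original_state_dict = {hf_to_original_key(k): v for k, v in hf_state_dict.items()}

# The original repo stores the weights under a "model" key in its checkpoints
torch.save({"model": original_state_dict}, "detr_original_format.pth")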

In case you have 2 classes, then you can initialize the DETR model as follows:

from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50", num_labels=2, ignore_mismatched_sizes=True)

This will reuse all existing layers of the DETR model pre-trained on COCO, and randomly initialize a new classification head. See also my notebook regarding fine-tuning.
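For the person/vehicle remapping itself, something like the following could work (a hedged sketch; the ids below are the standard COCO 2017 category ids, and the 0-indexed targets match num_labels=2):

# Standard COCO 2017 category ids: person=1, car=3, bus=6, truck=8
COCO_PERSON = 1
COCO_VEHICLES = {3, 6, 8}

def remap_category(coco_category_id):
    if coco_category_id == COCO_PERSON:
        return 0  # person
    if coco_category_id in COCO_VEHICLES:
        return 1  # vehicle
    return None  # drop annotations of all other classes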

@truong2710-cyber

Thanks a lot @NielsRogge. If I have an A100 GPU, could you estimate how long it would take to fine-tune the model on my remapped COCO dataset?

@SergiyShebotnov

> Btw, I have a task in which I have to train DETR on a remapped COCO dataset containing only 2 classes (class 1: person; class 2: vehicle, corresponding to 3 classes in the original COCO: car, truck, bus). What should I do?

Have you managed to convert a COCO dataset to the HuggingFace dataset format? If so, could you share the steps?

@truong2710-cyber

> Btw, I have a task in which I have to train DETR on a remapped COCO dataset containing only 2 classes (class 1: person; class 2: vehicle, corresponding to 3 classes in the original COCO: car, truck, bus). What should I do?

> Have you managed to convert a COCO dataset to the HuggingFace dataset format? If so, could you share the steps?

I think HuggingFace directly uses the COCO format, doesn't it?


SergiyShebotnov commented Dec 1, 2023

HF does not support datasets in the COCO format (see huggingface/transformers#25337); they use their own HF format. You have to write your own COCO-to-HF converter for COCO segmentation datasets.

Googling around, I did not find one; I have yet to see a public script that loads instance segmentations in COCO format into the HF format.
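A rough sketch of such a converter for detection-style annotations (the field names in the "objects" dict are assumptions, not an official schema):

import json
from datasets import Dataset

with open("annotations/instances_val2017.json") as f:
    coco = json.load(f)

# Group the COCO annotations by image id
anns_per_image = {}
for ann in coco["annotations"]:
    anns_per_image.setdefault(ann["image_id"], []).append(ann)

records = []
for img in coco["images"]:
    anns = anns_per_image.get(img["id"], [])
    records.append({
        "image_id": img["id"],
        "file_name": img["file_name"],
        "objects": {
            "bbox": [a["bbox"] for a in anns],  # COCO xywh boxes
            "category_id": [a["category_id"] for a in anns],
        },
    })

dataset = Dataset.from_list(records)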

@NielsRogge (Author)

I made this notebook to upload a COCO dataset to the hub.

The dataset lives here: https://huggingface.co/datasets/nielsr/coco-panoptic-val2017
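If you build a Dataset along the lines of the sketch above, pushing it to the hub is then a single call (the repo name is a placeholder):

dataset.push_to_hub("your-username/coco-remapped")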


Kilikia123 commented Feb 17, 2024

Please, can you show me your training and validation loss curves over the epochs?
