-
Notifications
You must be signed in to change notification settings - Fork 27.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Deformable DETR #17281
Add Deformable DETR #17281
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the PR wasn't necessarily in a state ready for review. Please make sure all docstrings are finished and code is generally cleaned up before asking reviewers to look.
src/transformers/models/deformable_detr/modeling_deformable_detr.py
Outdated
Show resolved
Hide resolved
src/transformers/models/deformable_detr/modeling_deformable_detr.py
Outdated
Show resolved
Hide resolved
src/transformers/models/deformable_detr/modeling_deformable_detr.py
Outdated
Show resolved
Hide resolved
b26d1f6
to
74ec0b5
Compare
src/transformers/models/deformable_detr/modeling_deformable_detr.py
Outdated
Show resolved
Hide resolved
Addressed most comments. I would like to have:
|
We have a |
6e224b8
to
7f5fb0a
Compare
@Narsil there's an issue with the pipeline tests, I added Also, CircleCI reports the following:
I might need some help with this. |
The generic tests will always run the model on CPU, so the best way is to discard this model from the test. Doing I would also add a slow GPU test that tries to use the pipeline directly if that's OK for the CI.
Does that make sense ? If it's hard to have a GPU test (not sure we ever call those anyway for pipelines, no @LysandreJik then we can do without but even if it's not auto tested there's value in creating the test IMO (it will run on local machines that try to run the test) |
As for the missing file, It's probably because the I don't really have good pointers for that since you seem to have added the correct line. The main advice would be to do |
8f1bfbe
to
ec61d72
Compare
OK, so looking at why the custom kernel fails to build:
This occurs quite often. The build is missing Try adding |
Additionally, if we start having custom cuda kernels that are enabled by default we must include |
so installing ninja did the trick of overcoming the initial hurdle. as commented above - if we make it work it should go into Now it's failing:
because CircleCI is cpu-only and doesn't have Basically your custom cuda kernel requires @ydshieh, do you by chance know if we are planning to get @NielsRogge, does this model work on CPU at all? i.e. is there a fallback to non-custom kernel in the absense of GPUs? If it is then the code should be modified to verify if there is a CUDA environment available and if it's not available not to load the custom kernel and everything will just work. |
The model only runs on GPU and requires the custom kernel. The authors do provide a CPU version here, but it's for "debugging and testing purposes only". |
The current CircleCI jobs use the docker image |
If it is not too much work to make running on both CPU/GPU work (considering the authors provide some implementation), I would advocate doing it - also mainly for "debugging and testing purposes only". |
Hmm I looked into the code, the problem is that their CPU version doesn't accept 2 arguments ( |
But we need: a. the modeling files not fail on b. the tests for such model should all be decorated with c. the testing will have to happen on our CI that has GPUs. which means no "real-time" testing. |
I've done this as seen here: NielsRogge@ec61d72. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not 100% sure investing time in a model that is only accessible on GPU is the best, as it restricts a lot the number of users that can play with it, and there won't be any regular tests or inference widget.
However this one is done, so let's finish this (just saying the above for the selection of future models we implement). The main problem for the tests is just the line
MSDA = load_cuda_kernels()
flagged above. It should be inside an if is_torch_cuda_available()
and the else branch should set the same object to None. Then all models should error at init if there is no GPU and the whole tests of those models should be decorated by the right require decorator.
src/transformers/models/deformable_detr/modeling_deformable_detr.py
Outdated
Show resolved
Hide resolved
Pinging @Narsil regarding excluding this model from the pipeline tests. |
Hi @NielsRogge , The best location to do this is in But the tests currently seem to be passing, so is this really necessary ? |
571a25f
to
191de2d
Compare
The documentation is not available anymore as the PR was closed or merged. |
PR is ready for review, by adding the model to the mappings this happens:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should resolve the errors in the pipeline tests.
@sgugger that didn't seem to fix the recursion error. |
bc68c55
to
48a26c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for polishing this PR!
Hi @NielsRogge . I am following the finetuning notebook for DETR object detection. You have mentioned that DeformableDETR follows mostly same API. But I noticed that model based on Also, for the Feature-Extractor, I am confused whether we should opt for as per documentation to use To add further, I was wondering if we could add in the augmentation that the original paper follows from the official Repo. I managed to add augmentation based on functions available in official repo for Deformable-DETR. But not sure of the correctness. |
What does this PR do?
This PR implements Deformable DETR, which improves the original DETR using a new "deformable attention" module.
This model requires a custom CUDA kernel (hence it can only be run on GPU). Other than that, the API is entirely the same as DETR.
Models are on the hub.