
Add Deformable DETR #17281

Merged (82 commits) on Sep 14, 2022
Conversation

NielsRogge
Contributor

@NielsRogge NielsRogge commented May 16, 2022

What does this PR do?

This PR implements Deformable DETR, which improves the original DETR using a new "deformable attention" module.

This model requires a custom CUDA kernel (hence it can only be run on GPU). Other than that, the API is entirely the same as DETR.

Models are on the hub.

@NielsRogge NielsRogge requested review from sgugger and LysandreJik May 16, 2022 14:12
Collaborator

@sgugger sgugger left a comment


It looks like the PR wasn't necessarily in a state ready for review. Please make sure all docstrings are finished and code is generally cleaned up before asking reviewers to look.

Review comments (resolved) on:
  • docs/source/en/index.mdx
  • setup.py
  • src/transformers/__init__.py
  • src/transformers/models/auto/configuration_auto.py
  • src/transformers/models/deformable_detr/test.py
@NielsRogge NielsRogge force-pushed the add_deformable_detr branch from b26d1f6 to 74ec0b5 Compare May 18, 2022 08:25
@NielsRogge
Contributor Author

NielsRogge commented May 18, 2022

Addressed most comments. I would like to have:

  • @Narsil reviewing the initialization of the model using the custom CUDA kernel
  • @LysandreJik (and possibly @Narsil) help me out regarding making the CI green for a model that only runs on GPU. Should we define a custom CI job for this particular model?
  • @NouamaneTazi taking care of the remaining comments regarding clearer variable names/docstrings, as he has a detailed understanding of this model.

@LysandreJik
Member

@LysandreJik (and possibly @Narsil) help me out regarding making the CI green for a model that only runs on GPU. Should we define a custom CI job for this particular model?

We have a require_torch_gpu decorator. Would it help in that case? We could add it to the model tester as a whole, if the model needs GPU to run.

@NielsRogge NielsRogge force-pushed the add_deformable_detr branch from 6e224b8 to 7f5fb0a Compare June 2, 2022 12:49
@NielsRogge
Contributor Author

NielsRogge commented Jun 2, 2022

@Narsil there's an issue with the pipeline tests, I added DeformableDetrForObjectDetection to the object detection mapping, but this model requires the custom CUDA kernel to be run.

Also, CircleCI reports the following:

Traceback (most recent call last):
  File "utils/check_repo.py", line 764, in <module>
    check_repo_quality()
  File "utils/check_repo.py", line 753, in check_repo_quality
    check_models_are_in_init()
  File "utils/check_repo.py", line 305, in check_models_are_in_init
    for module in get_model_modules():
  File "utils/check_repo.py", line 267, in get_model_modules
    modeling_module = getattr(model_module, submodule)
  File "/home/circleci/.local/lib/python3.7/site-packages/transformers/utils/import_utils.py", line 866, in __getattr__
    value = self._get_module(name)
  File "/home/circleci/.local/lib/python3.7/site-packages/transformers/utils/import_utils.py", line 883, in _get_module
    ) from e
RuntimeError: Failed to import transformers.models.deformable_detr.modeling_deformable_detr because of the following error (look up to see its traceback):
[Errno 2] No such file or directory: '/home/circleci/.local/lib/python3.7/site-packages/transformers/models/deformable_detr/custom_kernel/vision.cpp'

I might need some help with this.

@Narsil
Contributor

Narsil commented Jun 3, 2022

@Narsil there's an issue with the pipeline tests, I added DeformableDetrForObjectDetection to the object detection mapping, but this model requires the custom CUDA kernel to be run.

The generic tests will always run the model on CPU, so the best way is to discard this model from the test.

Doing if isinstance(pipeline.model, Deformable...): self.skipTest("This model requires a custom CUDA kernel and is NOT implemented for CPU") should be enough IMO (we know how to update later when needed).

I would also add a slow GPU test that tries to use the pipeline directly if that's OK for the CI.

@require_torch_gpu
@slow
def test_slow(self):
    pipe = pipeline(model="hf-internal-testing/....", device=0)
    out = pipe(....)
    self.assertEqual(out, {....})

Does that make sense? If it's hard to have a GPU test (I'm not sure we ever run those for pipelines anyway, no @LysandreJik?), then we can do without it. But even if it's not auto-tested, there's value in creating the test IMO: it will run on local machines that try to run the slow tests.
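The skip described above could be sketched as follows. This is a self-contained illustration, not the actual transformers test code: the harness method name run_pipeline_test and the stand-in model class are assumptions.

```python
import unittest

class DeformableDetrForObjectDetection:
    """Stand-in for the real transformers class so this sketch is self-contained."""

class ObjectDetectionPipelineTests(unittest.TestCase):
    def test_placeholder(self):
        # Placeholder so the TestCase can be instantiated directly.
        pass

    def run_pipeline_test(self, pipe, examples):
        # The generic pipeline tests always run on CPU, so skip the CUDA-only model.
        if isinstance(pipe.model, DeformableDetrForObjectDetection):
            self.skipTest(
                "This model requires a custom CUDA kernel and is NOT implemented for CPU"
            )
        # ...the regular pipeline assertions would follow here.
```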

@Narsil
Contributor

Narsil commented Jun 3, 2022

As for the missing file, it's probably because setup.py doesn't properly include the file when installing transformers.

I don't really have good pointers for that, since you seem to have added the correct line. The main advice would be to run
python -m build and look at the output to check that the proper .cpp, .h and .cuh files are included in the build folder. (Installing from source with pip install -e . won't surface this, as I think it always copies all the files, so you won't see how the built version fails; maybe it does, I am unsure.)
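One way to make the built distribution carry the kernel sources is a package_data entry in setup.py. A hypothetical fragment, not the PR's actual fix: the package path comes from the traceback above, and the glob patterns are assumptions.

```python
# Hypothetical setup.py fragment: ship the custom-kernel sources inside the
# wheel so the JIT build can find them after a regular pip install.
package_data = {
    "transformers.models.deformable_detr": [
        "custom_kernel/*.cpp",
        "custom_kernel/*.h",
        "custom_kernel/*.cuh",
    ],
}

# This dict would be passed as setup(..., package_data=package_data);
# an sdist additionally needs matching MANIFEST.in entries.
```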

@NielsRogge NielsRogge force-pushed the add_deformable_detr branch from 8f1bfbe to ec61d72 Compare June 16, 2022 15:04
@stas00
Contributor

stas00 commented Jun 16, 2022

OK, so looking at why the custom kernel fails to build:

_ ERROR collecting tests/models/deformable_detr/test_modeling_deformable_detr.py _
src/transformers/utils/import_utils.py:893: in _get_module
    return importlib.import_module("." + module_name, self.__name__)
/usr/local/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1006: in _gcd_import
    ???
<frozen importlib._bootstrap>:983: in _find_and_load
    ???
<frozen importlib._bootstrap>:967: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:677: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:728: in exec_module
    ???
<frozen importlib._bootstrap>:219: in _call_with_frames_removed
    ???
src/transformers/models/deformable_detr/modeling_deformable_detr.py:49: in <module>
    MSDA = load_cuda_kernels()
src/transformers/models/deformable_detr/load_custom.py:45: in load_cuda_kernels
    "-D__CUDA_NO_HALF2_OPERATORS__",
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1156: in load
    keep_intermediates=keep_intermediates)
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1367: in _jit_compile
    is_standalone=is_standalone)
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1438: in _write_ninja_file_and_build_library
    verify_ninja_availability()
../.local/lib/python3.7/site-packages/torch/utils/cpp_extension.py:1494: in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
E   RuntimeError: Ninja is required to load C++ extensions

This occurs quite often. The build is missing ninja.

Try adding pip install ninja to the CircleCI job workflow and see if it solves the problem. Please ping me if it doesn't.

@stas00
Contributor

stas00 commented Jun 16, 2022

Additionally, if we start having custom cuda kernels that are enabled by default we must include ninja in our main python dependencies in setup.py.

@stas00
Contributor

stas00 commented Jun 18, 2022

So installing ninja did the trick of overcoming the initial hurdle. As commented above, if we make this work it should go into setup.py's dependencies rather than the job file, but for now this is good enough while we figure out how to make it work.

Now it's failing:

E   OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

because CircleCI is CPU-only and doesn't have CUDA installed by default.

Basically, your custom CUDA kernel requires the CUDA toolkit to be installed in order to build. You don't need a GPU to build it, but the toolkit has to be there.

@ydshieh, do you by chance know if we are planning to get CUDA installed on CircleCI? It's easy to do via apt directly from NVIDIA with .deb packages, except it's not fast if it's reinstalled on every job run.

@NielsRogge, does this model work on CPU at all? I.e., is there a fallback to a non-custom kernel in the absence of GPUs? If there is, the code should be modified to check whether a CUDA environment is available and, if it isn't, not load the custom kernel; then everything will just work.

@NielsRogge
Contributor Author

The model only runs on GPU and requires the custom kernel. The authors do provide a CPU version here, but it's for "debugging and testing purposes only".

@ydshieh
Collaborator

ydshieh commented Jun 18, 2022

The current CircleCI jobs use the docker image circleci/python:3.7. If we decide to install cuda, I think we can build a custom docker image based on it.

@ydshieh
Collaborator

ydshieh commented Jun 18, 2022

If it is not too much work to make the model run on both CPU and GPU (considering the authors provide some implementation), I would advocate doing it, also mainly for "debugging and testing purposes only".

@NielsRogge
Contributor Author

If it is not too much work to make the model run on both CPU and GPU (considering the authors provide some implementation), I would advocate doing it, also mainly for "debugging and testing purposes only".

Hmm, I looked into the code; the problem is that their CPU version doesn't accept 2 arguments (level_start_index and im2col_step) which the CUDA version has and which are required for correct computation. Hence, I don't think it's possible to have a CPU version of it in the library. The authors also explicitly indicate that the layer isn't implemented on CPU.

@stas00
Contributor

stas00 commented Jun 18, 2022

  1. OK, so if the CPU version is not the same, then we won't be testing the actual modeling code, which is not a good idea. Let's stick to testing the actual GPU modeling code.

  2. You're setting a new precedent with this model, @NielsRogge, so we need to decide how to deal with such models. Let's bring @LysandreJik and @sgugger into this discussion; I wonder if we should discuss this in a separate RFC issue, since it will probably impact similar models in the future.

But we need:

a. the modeling files must not fail on import in an environment that lacks a CUDA install, so probably either use the earlier suggestion of moving the kernel loading into __init__ (less ideal) or use try/except and recover gracefully if the CUDA environment is not available.

b. the tests for such a model should all be decorated with @require_torch_gpu. It might be tricky with the common tests, so I wonder if decorating the whole test class with @require_torch_gpu would do the trick.

c. the testing will have to happen on our CI that has GPUs, which means no "real-time" testing.
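Item (b) above can be sketched with a class-level skip. The decorator below is a simplified stand-in, not the real require_torch_gpu from transformers.testing_utils, and it uses a hard-coded flag instead of torch.cuda.is_available() so the sketch runs anywhere:

```python
import unittest

CUDA_AVAILABLE = False  # would be torch.cuda.is_available() in transformers

def require_torch_gpu_like(test_item):
    # Simplified stand-in for transformers.testing_utils.require_torch_gpu:
    # skips a decorated test function, or every test of a decorated class.
    return unittest.skipUnless(CUDA_AVAILABLE, "test requires CUDA")(test_item)

@require_torch_gpu_like
class DeformableDetrModelTest(unittest.TestCase):
    def test_forward(self):
        self.assertTrue(True)  # the real test would run a forward pass on GPU
```

With the flag set to False, every test in the class is reported as skipped rather than failed, which is exactly the behavior wanted on a CPU-only CI.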

@NielsRogge
Contributor Author

b. the tests for such a model should all be decorated with @require_torch_gpu. It might be tricky with the common tests, so I wonder if decorating the whole test class with @require_torch_gpu would do the trick.

I've done this as seen here: NielsRogge@ec61d72.

Collaborator

@sgugger sgugger left a comment


I'm also not 100% sure that investing time in a model that is only usable on GPU is the best choice, as it greatly restricts the number of users that can play with it, and there won't be any regular tests or inference widget.

However, this one is done, so let's finish it (I'm just saying the above for the selection of future models we implement). The main problem for the tests is just the line

MSDA = load_cuda_kernels()

flagged above. It should be inside an if is_torch_cuda_available() block, with the else branch setting the same object to None. Then all models should error at init if there is no GPU, and the whole test suites of those models should be decorated with the right require decorator.

Review comment (resolved) on .circleci/config.yml
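The guarded load sgugger describes can be factored into a small helper. A minimal sketch with hypothetical names; in the actual code the loader wraps torch.utils.cpp_extension.load:

```python
def maybe_load_cuda_kernels(cuda_available, loader):
    """Return the compiled kernel module, or None when CUDA is unavailable or
    the JIT build fails, so importing the modeling file never crashes on a
    CPU-only host."""
    if not cuda_available:
        return None
    try:
        return loader()
    except (OSError, RuntimeError):
        return None

# In modeling_deformable_detr.py this would look roughly like
#     MSDA = maybe_load_cuda_kernels(torch.cuda.is_available(), load_cuda_kernels)
# with the model erroring at init time when MSDA is None.
```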
@NielsRogge
Contributor Author

Pinging @Narsil regarding excluding this model from the pipeline tests.

@Narsil
Contributor

Narsil commented Jun 27, 2022

Hi @NielsRogge,

The best location to do this is in tests/pipelines/test_pipelines_xxxx.py: simply add some logic to the get_test_pipeline function.

But the tests currently seem to be passing, so is this really necessary?

@NielsRogge NielsRogge force-pushed the add_deformable_detr branch from 571a25f to 191de2d Compare July 1, 2022 07:20
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 1, 2022

The documentation is not available anymore as the PR was closed or merged.

@NielsRogge
Contributor Author

The PR is ready for review. However, adding the model to the mappings causes this:

ERROR tests/pipelines/test_pipelines_feature_extraction.py - RecursionError: ...
ERROR tests/pipelines/test_pipelines_object_detection.py - RecursionError: ma...
!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!

Collaborator

@sgugger sgugger left a comment


This should resolve the errors in the pipeline tests.

Review comments (resolved) on:
  • tests/pipelines/test_pipelines_feature_extraction.py
  • tests/pipelines/test_pipelines_object_detection.py
@NielsRogge
Contributor Author

@sgugger that didn't seem to fix the recursion error.

@NielsRogge NielsRogge requested a review from sgugger September 13, 2022 15:31
Collaborator

@sgugger sgugger left a comment


LGTM, thanks for polishing this PR!

@robinnarsinghranabhat

Hi @NielsRogge. I am following the fine-tuning notebook for DETR object detection.

You mentioned that Deformable DETR follows mostly the same API, but I noticed that a model based on DeformableDetrForObjectDetection doesn't automatically add +1 to the number of classes.

Also, for the feature extractor, I am confused whether we should use AutoImageProcessor (as per the documentation) or DeformableDetrFeatureExtractor instead.

Furthermore, I was wondering if we could add the augmentation that the original paper follows from the official repo. I managed to add augmentation based on functions available in the official Deformable-DETR repo, but I'm not sure of its correctness.
