Why Batch inference is not support currently? #2644

Open

LovingThresh opened this issue Feb 24, 2023 · 12 comments
@LovingThresh

AssertionError: Batch inference is not support currently, as the image size might be different in a batch.

It is confusing.

@xiexinch
Collaborator

Hi @LovingThresh,
Since the validation or test set images may have different shapes, supporting batch inference at evaluation might affect the results.

@edsml-hmc122

> Hi @LovingThresh,
> Since the validation or test set images may have different shapes, supporting batch inference at evaluation might affect the results.

See #2965
It would be nice to at least have support for batched inference when all images have the same dimensions.

@louan1998

louan1998 commented Jun 25, 2023

I followed the official documentation to run the "unet_s5-d16_deeplabv3_4xb4-40k_chase-db1-128x128.py" config, and it reported the error "AssertionError: Batch inference is not support currently, as the image size might be different in a batch" after 4000 iterations. I then used the --resume option to continue training from the 4000-iteration checkpoint, and it reported the same error again at 8000 iterations. This dataset is supported by mmseg, and I saw that the images in it all have the same size. Why is that?

@edsml-hmc122

> I followed the official documentation to run the "unet_s5-d16_deeplabv3_4xb4-40k_chase-db1-128x128.py" config, and it reported the error "AssertionError: Batch inference is not support currently, as the image size might be different in a batch" after 4000 iterations. I then used the --resume option to continue training from the 4000-iteration checkpoint, and it reported the same error again at 8000 iterations. This dataset is supported by mmseg, and I saw that the images in it all have the same size. Why is that?

@louan1998 I think right now it doesn't matter what the image sizes actually are. MMSegmentation will reject the inference if batch_size != 1, even if the images are all the same size. It's just not implemented yet, from what I understand. :(
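
(For context, the check behind this error appears to be a plain assertion on the number of images in the predict-time batch rather than an actual shape comparison, which is why it fires even when all images are the same size. The sketch below is a paraphrase of that behaviour, not the exact MMSegmentation source; treat the function name and details as assumptions.)

```python
import torch

def preprocess_predict_batch(inputs: list) -> torch.Tensor:
    """Paraphrased sketch of the old predict-time behaviour (assumed, not the
    exact MMSegmentation code): any batch with more than one image is rejected
    outright, regardless of the actual image shapes."""
    assert len(inputs) == 1, (
        'Batch inference is not support currently, '
        'as the image size might be different in a batch')
    # With batch_size == 1, the single (C, H, W) image just gets a batch dim.
    return torch.stack(inputs, dim=0)
```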

@louan1998

louan1998 commented Jun 27, 2023

@edsml-hmc122 That's right! I noticed that too! If batch_size = 1 in val_dataloader, there is no problem! Anyway, batch_size doesn't play that big role in validation and testing, right?

@chenhuagg

@louan1998 How did you solve it in the end?

xiexinch added a commit that referenced this issue Jul 20, 2023
## Motivation

#3181
#2965
#2644
#1645
#1444
#1370
#125

## Modification

Remove the assertion at data_preprocessor
@louan1998

louan1998 commented Jul 30, 2023

> @louan1998 How did you solve it in the end?

Set the batch_size in the val_dataloader to 1:

```python
val_dataloader = dict(
    batch_size=1,
    num_workers=16,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        data_prefix=dict(
            img_path='images/validation',
            seg_map_path='annotations/validation'),
        pipeline=test_pipeline))
```

@edsml-hmc122

> @edsml-hmc122 That's right! I noticed that too! If batch_size = 1 in val_dataloader, there is no problem! Anyway, batch_size doesn't play that big role in validation and testing, right?

It plays a huge role if you have a lot of validation/testing data. The process is slowed down by a massive amount, and you are under-utilising your GPU. I hope the developers will be able to implement batched inference asap; it's the biggest downside of MMSegmentation, in my opinion, compared to other libraries. Also, MMDetection supports batched inference from what I understand, so maybe that code can be ported.

@xiexinch
Collaborator

Hi @edsml-hmc122,
I've removed this limitation; you might give it a try. If there are any problems, feel free to create an issue and we'll fix it asap.
https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/data_preprocessor.py#L135
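
(For readers following along: with the assertion gone, batching same-sized images at predict time essentially amounts to stacking the tensors. The sketch below only illustrates that idea under the assumption that every image in the batch shares one shape; it is not the code in data_preprocessor.py.)

```python
import torch

def stack_predict_batch(inputs: list) -> torch.Tensor:
    """Illustrative sketch (not the actual MMSegmentation change): stack a
    predict-time batch when every image has the same shape, otherwise fail
    with a clear message instead of a blanket batch_size == 1 assertion."""
    first_shape = inputs[0].shape
    if any(img.shape != first_shape for img in inputs):
        raise ValueError(
            'Images in a predict batch must share one shape; resize or pad '
            'them in the test pipeline first.')
    # (C, H, W) tensors -> (N, C, H, W) batch
    return torch.stack(inputs, dim=0)
```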

@edsml-hmc122

edsml-hmc122 commented Aug 10, 2023

> Hi @edsml-hmc122,
> I've removed this limitation; you might give it a try. If there are any problems, feel free to create an issue and we'll fix it asap.
> https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/data_preprocessor.py#L135

Thank you, I will try to test it when I have time! Also, it might be worth setting this issue to "open" again.

@edsml-hmc122

edsml-hmc122 commented Aug 20, 2023

@xiexinch

> Hi @edsml-hmc122, I've removed this limitation; you might give it a try. If there are any problems, feel free to create an issue and we'll fix it asap. https://github.com/open-mmlab/mmsegmentation/blob/main/mmseg/models/data_preprocessor.py#L135

Hi, I have done some testing.
In general, batched inference seems to be working: more GPU VRAM is used and the logger shows fewer total iterations (since batches are larger).

However, the larger I make the batch size, the lower the inference speed, so it doesn't lead to much acceleration from what I can tell.
The GPU usage is also very strange: it is mostly at 0% and spikes to 100% every 1-2 seconds.
So I think this is a good start, but it doesn't seem to really accelerate the inference process yet.

Tested on an RTX 3080: with a batch size of 20 it used 5637MiB / 10240MiB VRAM and took 1m22s for 1000 samples of size 512x512.
Edit: With a batch size of 10, it took only ~20s.

With a batch size of 1, the time was around 2m30s; I thought the speedup would be larger.

Please re-open this issue if you think it's worth investigating.

@xiexinch
Collaborator

> @xiexinch
>
> Hi, I have done some testing. In general, batched inference seems to be working: more GPU VRAM is used and the logger shows fewer total iterations (since batches are larger).
>
> However, the larger I make the batch size, the lower the inference speed, so it doesn't lead to much acceleration from what I can tell. The GPU usage is also very strange: it is mostly at 0% and spikes to 100% every 1-2 seconds. So I think this is a good start, but it doesn't seem to really accelerate the inference process yet.
>
> Tested on an RTX 3080: with a batch size of 20 it used 5637MiB / 10240MiB VRAM and took 1m22s for 1000 samples of size 512x512. Edit: With a batch size of 10, it took only ~20s.
>
> With a batch size of 1, the time was around 2m30s; I thought the speedup would be larger.
>
> Please re-open this issue if you think it's worth investigating.

Thanks for your feedback! We'll test this case and then find a better solution. Could you provide your config to us if it's available?

xiexinch reopened this Aug 21, 2023
nahidnazifi87 pushed a commit to nahidnazifi87/mmsegmentation_playground that referenced this issue Apr 5, 2024