
No factory functions for strided quantized tensors #74540

Open
ezyang opened this issue Mar 22, 2022 · 9 comments
Labels
feature - A request for a proper, new feature.
low priority - We're unlikely to get around to doing this in the near future.
oncall: quantization - Quantization support in PyTorch.
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module.

Comments

@ezyang
Contributor

ezyang commented Mar 22, 2022

🐛 Describe the bug

For non-quantized tensors, there are both empty and empty_strided factory functions. For quantized tensors, however, only the empty variants exist. This makes it difficult for quantized operators to properly preserve strides when they should (e.g., TensorIterator-style ops).
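A minimal sketch of the gap (untested; torch._empty_affine_quantized is the existing contiguous-only quantized factory):

import torch

# Regular tensors: both factory variants exist.
t = torch.empty((2, 3))
ts = torch.empty_strided((2, 3), (1, 2))  # column-major layout

# Quantized tensors: only a contiguous empty variant exists.
q = torch._empty_affine_quantized(
    (2, 3), scale=0.1, zero_point=0, dtype=torch.quint8
)
# There is no strided counterpart, so an op that wants to propagate
# its input's strides to a fresh quantized output has nothing to call.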

If this is affecting you, please comment here.

Versions

master

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo

@ezyang
Contributor Author

ezyang commented Mar 22, 2022

Note that #32867 (comment) means that not all strides are valid; the innermost dimension must be kept contiguous.
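A hypothetical validity check sketching that constraint (is_valid_quantized_stride is illustrative, not an existing API):

def is_valid_quantized_stride(size, stride):
    # Per the constraint above, the innermost (last) dimension of a
    # quantized tensor must stay contiguous, i.e. have stride 1,
    # because kernels assume densely packed values along it.
    return len(size) == 0 or stride[-1] == 1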

@ngimel
Collaborator

ngimel commented Mar 22, 2022

Quantized pointwise op support differs a lot from eager (e.g., arbitrary broadcasting is not supported), and these ops typically offload to a third-party library rather than rely on TensorIterator, so just enabling empty_strided won't solve this.
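For illustration, the eager quantized add already goes through a dedicated op that takes explicit output quantization parameters rather than TensorIterator (a sketch; the scale and zero_point values are arbitrary):

import torch

xq = torch.quantize_per_tensor(torch.randn(4, 4), 0.1, 0, torch.quint8)
yq = torch.quantize_per_tensor(torch.randn(4, 4), 0.1, 0, torch.quint8)

# Dispatches to the FBGEMM/QNNPACK backend; the caller supplies the
# output scale and zero point instead of TensorIterator allocating one.
zq = torch.ops.quantized.add(xq, yq, 0.2, 0)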

@VitalyFedyunin added the oncall: quantization and triaged labels Mar 25, 2022
@github-actions bot added this to Need Triage in Quantization Triage Mar 25, 2022
@VitalyFedyunin added the feature label Mar 25, 2022
@terrychenism added the low priority label Apr 6, 2022
@terrychenism
Contributor

Per discussion with @vkuzo, there is no good way to use TensorIterator on quantized tensors and preserve strides, so this issue is set as low priority.

@ericsorides

Is there any news on this front?
I am trying to quantize mobilenet_v3 and this error pops up when trying to run inference.

Thanks

@YonatanSimson

How would one convert an empty strided tensor to a strided tensor?

@Mofang-Shi

It seems that quint8 doesn't support torch.round().
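A hedged workaround sketch for float-only pointwise ops like this: round in float, then requantize with the input's original parameters (assumes per-tensor quantization):

import torch

xq = torch.quantize_per_tensor(torch.randn(3), 0.1, 0, torch.quint8)
# torch.round has no quantized kernel, so go through float instead.
rounded = torch.quantize_per_tensor(
    xq.dequantize().round(), xq.q_scale(), xq.q_zero_point(), xq.dtype
)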

@Lydaidai

Hello, I'm trying to quantize yolov8 and this error redirects me here. The error is located here:

class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
        expansion.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """'forward()' applies the YOLO FPN to input data."""
        y = self.cv2(self.cv1(x))
        x = x + y  ############### error here ###############
        return x

and the logs:
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: _run_module_as_main)

empty_strided not supported on quantized tensors yet see #74540

File "/usr/local/lib/python3.8/dist-packages/ultralytics/nn/modules/block.py", line 341, in forward
x=x+y
I tried replacing it with torch.add(x, y), but the error is still there. How can I fix it?

@Lydaidai

Update: I solved this using torch.nn.quantized.FloatFunctional():

class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and
        expansion.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2
        self.quant = nn.quantized.FloatFunctional()

    def forward(self, x):
        """'forward()' applies the YOLO FPN to input data."""
        return self.quant.add(x, self.cv2(self.cv1(x))) if self.add else self.cv2(self.cv1(x))

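My understanding of why this works (a sketch, assuming eager-mode static quantization): FloatFunctional wraps the add in a module with its own observer, so after convert() the call lowers to torch.ops.quantized.add with an observed output scale and zero point, instead of the plain aten::add path that tries to allocate its output via empty_strided. In a float model it behaves like an ordinary add:

import torch

ff = torch.nn.quantized.FloatFunctional()
# Ordinary float add here; after convert() this module becomes
# QFunctional and dispatches to torch.ops.quantized.add.
z = ff.add(torch.randn(2), torch.randn(2))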

@SalahuddinSSH

@ezyang Hi! I am using PyTorch 2.0.0 and I am also facing this issue:

2024-05-16T18:02:10,871 [WARN ] W-9000-q_1.0-stderr MODEL_LOG - 2024-05-16 18:02:10 - INFO - Backend received inference at: 1715871730
2024-05-16T18:02:10,899 [WARN ] W-9000-q_1.0-stderr MODEL_LOG - 2024-05-16 18:02:10 - ERROR - Inference failed: empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540
2024-05-16T18:02:10,899 [WARN ] W-9000-q_1.0-stderr MODEL_LOG - 2024-05-16 18:02:10 - ERROR - Error handling request: empty_strided not supported on quantized tensors yet see https://github.com/pytorch/pytorch/issues/74540
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG - 2024-05-16 18:02:10 - WARNING - Invoking custom service failed.
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG - Traceback (most recent call last):
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/Users/jussilopponen/miniconda3/envs/sshai/lib/python3.10/site-packages/ts/service.py", line 134, in predict
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     ret = self._entry_point(input_batch, self.context)
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/private/var/folders/f4/n4pf2dwx0lq5dz711gp49pdr0000gq/T/models/500ef664d22d4e5cbbaba1952a8c6ccb/model_handler.py", line 137, in handle
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     model_output = self.inference(processed_data)
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/private/var/folders/f4/n4pf2dwx0lq5dz711gp49pdr0000gq/T/models/500ef664d22d4e5cbbaba1952a8c6ccb/model_handler.py", line 117, in inference
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     outputs = self.model(**data)
2024-05-16T18:02:10,901 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/Users/jussilopponen/miniconda3/envs/sshai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     return self._call_impl(*args, **kwargs)
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/Users/jussilopponen/miniconda3/envs/sshai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
2024-05-16T18:02:10,902 [INFO ] W-9000-q_1.0 ACCESS_LOG - /127.0.0.1:58254 "POST /predictions/q HTTP/1.1" 503 139
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     return forward_call(*args, **kwargs)
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/private/var/folders/f4/n4pf2dwx0lq5dz711gp49pdr0000gq/T/models/500ef664d22d4e5cbbaba1952a8c6ccb/model.py", line 427, in forward
2024-05-16T18:02:10,902 [INFO ] W-9000-q_1.0 TS_METRICS - Requests5XX.Count:1.0|#Level:Host|#hostname:J3VY2LJ00G,timestamp:1715871730
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     hidden_states = self.transformer(tokens, start_pos)
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/Users/jussilopponen/miniconda3/envs/sshai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
2024-05-16T18:02:10,902 [DEBUG] W-9000-q_1.0 org.pytorch.serve.job.Job - Waiting time ns: 105197167, Inference time ns: 137218500
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     return self._call_impl(*args, **kwargs)
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/Users/jussilopponen/miniconda3/envs/sshai/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
2024-05-16T18:02:10,902 [DEBUG] W-9000-q_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     return forward_call(*args, **kwargs)
2024-05-16T18:02:10,902 [INFO ] W-9000-q_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 30
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -   File "/private/var/folders/f4/n4pf2dwx0lq5dz711gp49pdr0000gq/T/models/500ef664d22d4e5cbbaba1952a8c6ccb/model.py", line 349, in forward
2024-05-16T18:02:10,902 [WARN ] W-9000-q_1.0-stderr MODEL_LOG -     mask = mask.to(torch.float32).triu(diagonal=start_pos + 1).type_as(h)
