Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Overflow of 16-bit integer in Agg renderer causes PolyCollections to be drawn at incorrect locations #23826

Open
Exploder98 opened this issue Sep 8, 2022 · 6 comments

Comments

@Exploder98
Copy link

Bug summary

Because the pixel x coordinates of scanline cells are casted from 32-bit integers to 16-bit ones in Agg (this line), some polygons that are outside the drawing area may actually be drawn at an incorrect location if they are far enough from the viewport so that the 16-bit integer used for actual drawing overflows and the value happens to be inside the drawing area.

Code for reproduction

import matplotlib.pyplot as plt
import matplotlib
matplotlib.use('TkAgg')

# Create a 120x120 px figure
fig = plt.figure(figsize=(120, 120), dpi=1)
# Add axis that fills the whole figure
ax = fig.add_axes([0, 0, 1, 1])
ax.axis('off')
# Plot one fill_between at x, y = 0, width = 100 and height = 50
ax.fill_between([0, 100], 0, 50, color='r')
# Plot another fill_between at x = 65536, y = 50, width=100 and height = 50
# 65536 causes the 16-bit integer to overflow so that the coordinate becomes actually 0
ax.fill_between([2 ** 16, 2 ** 16 + 100], 50, 100, color='b')
# Set limits to ensure that one pixel in the rasterized image corresponds to one unit in the axis
ax.set_xlim(-10, 110)
ax.set_ylim(-10, 110)
# Agg has the bug, Pdf does not
plt.savefig('bug.png', dpi=1)
plt.savefig('nobug.pdf', dpi=1)
plt.show()

Actual outcome

The PolyCollections are both drawn in the image, even though they are 65536 units apart and the x axis goes from -10 to 110.
bug

Expected outcome

There should only be the red rectangle, as in this PDF.
nobug.pdf

Additional information

I debugged this quite a bit, and turns out it is a bit tricky to replicate this. The PolyCollections need to be able to fit in the axes completely so that the optimized draw_markers function is called. Then, inside Agg, transforms are applied so that pixel coordinates are calculated. These coordinates are fine, as they use 32-bit integers. However, when Agg draws the scanlines, it converts the coordinates to 16-bit integers. If the axis is zoomed in enough and everything lines up just right, some points that have very high positive or very high negative pixel coordinates may appear in the viewport because the int16 conversion causes the 16-bit coordinate to over- or underflow and to be inside the viewport. An obvious fix could be to use int32 instead of int16 while drawing, but I don't know if that's possible or if matplotlib is even the right place to report, as this bug seems to be in Agg code. The original figure where I discovered this had many fill_betweens, and when zoomed in, some fill_betweens from earlier or later in the data would appear.

Note that at least on my machine, the figure that is shown in the window might not have the bug present, but the saved PNG file should have it.

I checked also the main branch (version 3.6.0.dev3132+gf8cf0ee5e7), and it has the bug.

Operating system

Probably anything where Agg runs (verified on RHEL8 and Windows)

Matplotlib Version

3.5.3

Matplotlib Backend

QtAgg

Python version

3.9.13

Jupyter version

No response

Installation

conda

@oscargus
Copy link
Contributor

This seems indeed a bit dodgy!

Also, at least on my install, the window changes size, jumping back and forth, when hovering over the figure with TkAgg. On QtAgg, that does not happen, but the bug is present.

As far as I know, we rely on a since long forked version of Agg, so although it may also be present in other forks, it is worth changing in our one for sure. If you have a bug fix ready, it would be much appreciated!

@Exploder98
Copy link
Author

Also, at least on my install, the window changes size, jumping back and forth, when hovering over the figure with TkAgg.

Yes, this behaviour is also present on my install. I believe this is caused by the fact that when you hover over the figure, text showing the current x and y positions of the cursor shows up and Tk resizes the window to fit that text. Qt does not do this, so the window size stays constant.

If you have a bug fix ready, it would be much appreciated!

Unfortunately, I don't have such a fix ready. I also think that switching the variable to a 32-bit integer would only make the bug less apparent, since putting the blue rectangle 2 ** 32 units away would then overflow the 32-bit integer too (well, the overflow would happen already during the pixel conversion in that case). I think a "proper" fix would be to calculate beforehand which items should be drawn and then only draw those, but I don't know how I could do that.

@oscargus
Copy link
Contributor

Ahh, yes, you are correct about the positions causing the flicker.

And correct about the 32-bit overflow. But it will probably decrease the probability a bit. Considering when the code was written, I expect 16-bit multiplications to be relatively much cheaper than 32-bit ones at that time, although nowadays it shouldn't matter much. I also guess that having images with 64k pixels is nowadays a bit more common, so pushing it a bit could for sure help, although filtering it before drawing is clearly the better option in the long run.

@jklymak
Copy link
Member

jklymak commented Sep 12, 2022

Can you please test with dpi equal to 100. There are some strange roundoff problems with dpi=1

@Exploder98
Copy link
Author

The bug happens regardless of DPI. I chose dpi=1 to keep the example simple, since with any other DPI the figure dimensions need to be scaled (divided by the DPI) so that one axis unit equals one pixel in the final image. Here are the images with dpi=100 and dimensions scaled accordingly (120 / 100), and the blue rectangle is visible in the rasterized image but not in the PDF file:
bug
nobug.pdf

@tacaswell
Copy link
Member

https://github.com/matplotlib/matplotlib/security/code-scanning?query=pr%3A23812+tool%3ACodeQL+is%3Aclosed identifies a bunch of warnings in Agg that look like https://github.com/matplotlib/matplotlib/security/code-scanning/16 .

I have a small hope that adding local up-casts at all of the places it identified may fix this. As noted above, our version of Agg has already diverged from the other people who have vendored Agg so while not ideal, it is not the end of the world or a new step to apply local changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants