RenderRepaintBoundary.toImage() occasionally returns a blank image #43085
Comments
Ian, thank you very much for opening this new thread and taking the time to create this demo project. As you say, this is a critical issue that prevents you, me, and, it seems, many more people from upgrading to the current stable version. Hopefully the Google team can take care of this soon. Thank you in advance for looking into this.
Please ignore the flutter doctor result regarding CocoaPods. It was wiped out by the Catalina upgrade, but I have since fixed it.
@gisinator I really hope this issue is addressed ASAP. My app is already in production, and I am unable to ship updates that fix its outstanding issues.
@cbracken this is a pretty bad regression.
This also happened to me. Is there any solution?
Thanks for reporting this. This is not likely to get worked on by the team in the near future. We work through issues in priority order, starting with TODAY, then customer: blocker, then customer: critical. Once those have been burnt through, we'll be chasing issues further down the list. Aside from the above labels, we use thumbs-up reactions as a means of measuring priority, so if you're affected by this, adding a thumbs-up reaction will help move it up our list.
FWIW, if this is a regression, then bisecting to find where it was introduced would be massively helpful in helping the team produce a fix (or simply revert the offending commit).
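Bisecting means a binary search over the commit history for the first commit that shows the bug. A minimal Python sketch, assuming a linear history ordered oldest to newest, a known-good first commit, a known-bad last commit, and a hypothetical is_bad(commit) check (for example, build the engine at that commit and run the repro app). This is the same search that git bisect automates; each step halves the remaining range, so roughly log2(N) builds are needed for N commits.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is True.

    Assumes commits are ordered oldest -> newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression landed at mid or earlier
        else:
            lo = mid  # the regression landed after mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"commit{i}" for i in range(10)]
    # Hypothetical check: pretend the regression landed in commit6.
    print(first_bad_commit(history, lambda c: int(c[len("commit"):]) >= 6))  # commit6
```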
Bisected this to flutter/engine@78a8ca0
@jason-simmons and I were able to track down the issue using GAPID. In the trace, at line 22640 you can see we are doing a glClear before we render to our frame buffer, and at line 22649 we actually do the draw call. This is the case where everything worked, and the glReadPixels at line 22652 returns the correct output. At line 19931 we again perform a glClear to prepare for rendering to our frame buffer, and at line 20012 there is another glReadPixels, but between the glClear and the glReadPixels there is no draw call at all. That is why glReadPixels returns an all-black image. My diagnosis is that, for some reason, Skia is not issuing the glDrawRangeElements call. Perhaps it is detecting some internal error state and aborting the call? It's not clear from the OpenGL calls why it wouldn't perform the draw.
When drawing the image, Skia's The image draw succeeds when
Possibly this is a sign that the engine is doing something wrong related to how we share image textures between the IO thread's GrContext and the GPU thread's context?
@jason-simmons @gaaclarke thanks for looking into this issue.
This issue needs more love. @gaaclarke, do you think using v1.9.1+hotfix.6 with your commit (flutter/engine@78a8ca0) reverted would be too risky?
@ianpark Things have slowed down a bit now that we've tracked the issue down to Skia. We are working with them to get it triaged and addressed. Assuming the patch reverts cleanly, it should be safe. It is a performance PR, so hopefully the only downside would be performance. However, since it changes threading models, there is a risk that the code has already shifted its assumptions enough that it needs to run with the new threading model.
Linked Skia Bug: https://bugs.chromium.org/p/skia/issues/detail?id=9581
@gaaclarke thanks for the update. Sorry, I am getting desperate: I found that newer versions of some other dependencies fail to build with 1.7.8, so my app is becoming more and more isolated over time. I hope the Skia team can turn this around quickly.
Okay, I think the fix may need to happen in Flutter. Here's what's happening: cross-context images are designed so that they can only be used by one GrContext at a time. Internally, the image holds the actual GL texture. When the image is referenced by a draw with a particular GrContext, we wrap the GL texture in a temporary Skia texture object and remember the ID of the GrContext that's using it. Once that draw has finished and the Skia texture object is disposed of, we reset the ID on the cross-context image so that a different GrContext (or the same one, again) can start using it. This is necessary because some OpenGL texture state is tied to the texture object itself, so there's no safe way to use the same texture on two threads at the same time. The error that's happening suggests that the image is still in use by another GrContext when the raster snapshot is being made. The regressing change moved the raster snapshot logic to the IO thread, so we can assume that the GPU thread has the image in use. The simplest safe fix is to synchronize with the GPU thread when doing a raster snapshot, waiting for it to flush the current frame before trying to do anything.
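To make the ownership rule concrete, here is a toy Python model, not Skia's actual code. CrossContextImage, try_acquire, release and snapshot are hypothetical names; the GPU and IO threads are reduced to string IDs, and "synchronizing with the GPU thread" is modeled simply as the GPU context releasing the image before the snapshot runs. A draw from a second context while the first still holds the image is dropped, which is how a snapshot ends up all black.

```python
import warnings
from typing import Optional


class CrossContextImage:
    """Toy model of a texture that only one context may use at a time."""

    def __init__(self, pixels: bytes) -> None:
        self.pixels = pixels
        self.owner: Optional[str] = None  # ID of the context currently using it

    def try_acquire(self, context_id: str) -> bool:
        # Allowed if nobody holds the image, or the same context uses it again.
        if self.owner is None or self.owner == context_id:
            self.owner = context_id
            return True
        return False

    def release(self, context_id: str) -> None:
        # Called once that context's draw has finished, so another context may use it.
        if self.owner == context_id:
            self.owner = None


def snapshot(image: CrossContextImage, context_id: str) -> bytes:
    """Clear a target, then draw the image into it (like glClear, draw, glReadPixels)."""
    cleared = bytes(len(image.pixels))  # all zeros: an all-black result
    if not image.try_acquire(context_id):
        # The draw is dropped; this model at least warns about it.
        warnings.warn(f"image in use by {image.owner!r}; draw skipped")
        return cleared
    try:
        return image.pixels  # the draw succeeded, so the read-back sees the image
    finally:
        image.release(context_id)


if __name__ == "__main__":
    img = CrossContextImage(b"\x7f" * 16)
    img.try_acquire("gpu")        # GPU thread is mid-frame with the image
    print(snapshot(img, "io"))    # IO-thread snapshot collides: all zeros
    img.release("gpu")            # GPU thread flushes its frame
    print(snapshot(img, "io"))    # snapshot after synchronizing: real pixels
```

In the real engine the two uses run on different threads, so which one wins is decided by timing; waiting for the GPU thread's flush, as proposed above, removes that dependence on timing.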
Thanks @brianosman for looking into it; that makes sense. We can look into that, and it shouldn't be too difficult. Are you sure this isn't something we'd want at the Skia level? Basically, a lock around glReadPixels for reading and glBindFramebuffer for writing when it comes to cross-context images.
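The lock proposed here can be sketched as a wrapper that serializes every read and write of a shared texture. This is a minimal Python illustration of the mutual-exclusion idea only, assuming a bytearray stands in for the texture and two ordinary threads stand in for the IO and GPU threads. GuardedTexture and run_demo are hypothetical names; this is not how Skia or the engine is implemented.

```python
import threading
from typing import List


class GuardedTexture:
    """Hypothetical wrapper: one lock serializes all use of a shared texture."""

    def __init__(self, size: int) -> None:
        self._lock = threading.Lock()
        self._data = bytearray(size)

    def write(self, value: int) -> None:
        # Stand-in for binding the texture to a framebuffer and rendering into it.
        with self._lock:
            for i in range(len(self._data)):
                self._data[i] = value

    def read(self) -> bytes:
        # Stand-in for glReadPixels: copy the contents while holding the lock.
        with self._lock:
            return bytes(self._data)


def run_demo(iterations: int = 200, size: int = 4096) -> List[bytes]:
    tex = GuardedTexture(size)
    snapshots: List[bytes] = []

    def writer() -> None:
        for i in range(iterations):
            tex.write(255 if i % 2 else 0)  # alternate between two solid "frames"

    def reader() -> None:
        for _ in range(iterations):
            snapshots.append(tex.read())

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots


if __name__ == "__main__":
    snaps = run_demo()
    # Writes hold the lock for the whole frame, so no snapshot is half-written.
    print(all(len(set(s)) == 1 for s in snaps))
```

Later in the thread the contention turned out to be two concurrent reads of the same texture rather than a read racing a write, and whether such locking belongs in Skia was left open; the sketch only shows the general technique.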
@brianosman At the very least, we should make that no-op emit a warning message instead of silently drawing nothing.
@gaaclarke A Skia CL to warn in this situation has landed: https://skia.googlesource.com/skia/+/6e1d51a2c74ce26c499cb141920007d3dda11435
I think the same SkImage instance is being referred to by the SkPicture passed to Picture.toImage() and by an SkPicture in the layer tree that is rendered onscreen. If the first SkPicture is rasterized on the IO thread and the second SkPicture is rasterized on the GPU thread, then it's possible for that SkImage to be consumed by both threads concurrently. I don't think we can prevent that without moving Picture.toImage() rendering back to the GPU thread.
As far as I know, the only cross-context images are those explicitly created on the IO thread (for decoding large images that are assets in the application). It would have to be something that went through here: https://github.com/flutter/engine/blob/646b594d5e03e1873bbb021d0f4b2994777808de/lib/ui/painting/image_decoder.cc#L177
Here is my theory of what is happening and what the bug is. If "render to FBO" and "access for drawing" happen at the same time, we get the collision we were experiencing. My theory is that the SkPicture and the Texture are generated just for
Okay, I talked with @jason-simmons offline. My whole assumption is wrong. The image in contention is the actual decoded image created by the IO thread, not one that was created as a result of the This problem occurs because two different threads are trying to read from the same texture at the same time, not because one is writing to it while another is reading. Here is a Stack Overflow question about reading from shared objects and getting a crash, so this is definitely something that can be a problem depending on the OpenGL driver. Since Skia gives us no visibility into the usage of the texture (it is hidden down in SkPicture internals), there is no meaningful way for us to protect usage of the texture. I believe that Skia should implement the mutual exclusion, but it sounds like there is an objection to that which I don't understand yet. I'll follow up with Skia (@brianosman?). In the meantime, the only safe thing Flutter can do is to move That is a shame because executing
I concur that moving
Would moving back to the GPU thread allow an easier fix for #40990? This would just involve adding another endpoint to return the Picture on the GPU thread without proceeding to rasterization.
@gaaclarke was this meant to be closed?
@gaaclarke I really appreciate you diving deep into this problem with the Skia folks. I learnt quite a bit from your comments. It's a shame that your performance fix was reverted, but it will unblock many other devs and apps, so I am very happy. :) I now clearly understand the root cause, and I think reverting the code was the best move, not just to unblock users but also to avoid misleading fellow devs who will work in this area. I had also gotten the impression from your code that calling Skia's graphics rendering APIs from the IO thread was somehow safeguarded inside Skia. @tvolkert I think this issue should remain open until a stable release with this patch is available to the public; that would help prevent duplicate bug reports. Do you plan to ship another 1.9.1 stable release with this patch, or will it be part of the next version? @liyuqian happy to hear that my repro code was useful :) Thanks!
@tvolkert Yep, the merged PR fixes it.
@ianpark You're welcome. We close out issues once they are fixed on master.
@tvolkert could you briefly explain the release plan for the fix for this issue?
@tvolkert @gaaclarke any update on releasing the fix for this issue? I am happy to patch my local Flutter, but I am not sure how to build stable Flutter 1.9.1 with a patched engine that is 100% compatible with Flutter 1.9.1. Is there a good guide for doing that?
@ianpark the fix is on versions
@tvolkert thanks for the information. I was trying to find a way to apply the fix on top of the latest stable release, but I should probably just wait for now.
With this fix, can I get an image without any workarounds? Some people are using debugNeedsPaint to check whether the image is safe to capture, or simply adding a delay before capturing it.
May I ask if someone on the current stable v1.12.13+hotfix.5 could verify that this issue is resolved?
It appears to be working in stable v1.12.13+hotfix.6, but the same issue still persists with Google Maps.
I need to take a screenshot of my Google Maps widget, but I only get a black/blank image via Image.memory. Any help with this?
The only workaround that I am aware of at this time is to create your own "screenshot" plugin that captures the FlutterView. Of course, this requires you to manually crop the result to get only the area of the Google Maps widget in your final image. It is far from ideal, but to my knowledge it is the only option if you want to use Google Maps. Alternatively, you can use flutter maps, and your standard RepaintBoundary will work fine.
@benneca How? Please share the source code. I tried this: https://gist.github.com/slightfoot/8eeadd8028c373df87f3a47bd4a35e36 but it didn't work.
@klaszlo8207 take a look at this plugin; it is very similar to the approach I took.
Try calling RenderRepaintBoundary.toImage() twice.
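The "call it twice" workaround amounts to retrying the capture and keeping the first result that doesn't look blank. Here is a language-neutral sketch in Python. It assumes a hypothetical capture callable that wraps whatever produces the image bytes (in Flutter, the toImage() call plus encoding) and a hypothetical min_size threshold tuned per app. This works around the symptom; it does not fix the underlying race.

```python
from typing import Callable, Optional


def capture_with_retry(
    capture: Callable[[], bytes], min_size: int, attempts: int = 2
) -> Optional[bytes]:
    """Call capture() up to `attempts` times.

    Returns the first result that is at least `min_size` bytes long, or None
    if every attempt looked blank (suspiciously small).
    """
    for _ in range(attempts):
        data = capture()
        if len(data) >= min_size:
            return data
    return None


if __name__ == "__main__":
    # Fake capture that fails once (tiny output) and then succeeds.
    results = iter([b"\x00" * 10, b"\x01" * 5000])
    image = capture_with_retry(lambda: next(results), min_size=1000)
    print(None if image is None else len(image))  # 5000
```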
Problem
RenderRepaintBoundary.toImage() occasionally returns a blank image. It happens very often in my app when the app is busy and many other widgets are being rendered at the same time. I've created a package with a demo app that reproduces the problem; there it is much rarer, but still reproducible by repeating the test:
https://github.com/ianpark/flutter_capture_bug_demo
A previously closed issue that seems to describe the same problem:
#17687
Google team, please raise the priority of this issue:
This issue is a launch blocker for many other Flutter projects, including mine, and I am still using v1.7.8+hotfix.4, which is the last stable build where this bug does not exist. Any app that depends on this feature must not upgrade to the next stable build. However, if you upgrade your Mac to Catalina, you won't be able to run a prod build with v1.7.8+hotfix.4, and you cannot upgrade to v1.9.1x due to this issue. You also cannot downgrade to Mojave, so you get stuck. I really hope the Google team takes this issue seriously and finds a solution.
Steps to Reproduce
As this is a race condition, it does not reproduce in the demo app as often as it does in real apps. However, the demo app can still reproduce the problem, with around 10% of attempts failing on a real device.
When toImage() fails, the length of the returned bytes is exceptionally small, usually several thousand KB. When it captures partially, the result can be bigger. In the demo, a console makes it easy to tell when a failure happens, as [nth-try byte-size] is appended to the yellow console. There is also a failure % figure, which shows that the test result is actually worse on more constrained devices. I have never seen any failure with the prod build of this demo app, but I have seen the problem in my prod app. Please follow these steps to reproduce the problem:
1. Use Flutter 1.9.1+hotfix.2 or a later version.
2. Press the Load button and select a large image. 3MB should probably be enough, but it all depends on the device's performance and memory status. I can even reproduce it with much smaller images.
I intentionally didn't display the resulting image in the app, as that may obscure the real problem. Checking the byte length is clearly enough here.
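Why the byte length is a usable failure signal: image formats such as PNG compress pixel data with DEFLATE, and a uniform (blank) frame compresses to almost nothing compared with a detailed one. A quick Python check with zlib, which implements DEFLATE, assuming an RGBA buffer of a hypothetical size and using seeded random noise as a stand-in for a real photo (the demo's actual encoding and thresholds are not shown in this issue):

```python
import random
import zlib

WIDTH, HEIGHT, CHANNELS = 256, 256, 4  # hypothetical RGBA capture size


def compressed_size(pixels: bytes) -> int:
    # PNG stores filtered pixel rows DEFLATE-compressed; zlib.compress uses DEFLATE too.
    return len(zlib.compress(pixels))


blank = bytes(WIDTH * HEIGHT * CHANNELS)  # every byte zero: a blank, all-black frame
rng = random.Random(0)  # seeded so the run is repeatable
detailed = bytes(rng.getrandbits(8) for _ in range(WIDTH * HEIGHT * CHANNELS))

if __name__ == "__main__":
    print("blank:", compressed_size(blank))
    print("detailed:", compressed_size(detailed))
```

The blank buffer shrinks by orders of magnitude while the noisy one barely shrinks at all, which matches the report that failed captures are exceptionally small; a partial capture falls somewhere in between, which is why it can be bigger.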
Note that this problem can also be reproduced with the Loop/Stop buttons, which call the function in a loop. However, the chance is very low: it happens about once in a thousand times in my testing. So please use your finger :) This smells like the race condition is triggered by user interaction handling or animations.
Target Platform:
Android, iOS
Target OS version/browser:
macOS (reproduced on Sierra, Mojave, Catalina)
Devices:
Emulator: Google Pixel 3, Nexus5
Devices: Samsung Galaxy Note, iPhone7
Logs
Adding the previous audiences:
@tvolkert @aliyigitbireroglu @andreidiaconu @gisinator @benneca @hariprasadiit