Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RenderRepaintBoundary.toImage() occasionally returns a blank image #43085

Closed
ianpark opened this issue Oct 20, 2019 · 46 comments
Closed

RenderRepaintBoundary.toImage() occasionally returns a blank image #43085

ianpark opened this issue Oct 20, 2019 · 46 comments
Labels
a: images Loading, displaying, rendering images c: regression It was better in the past than it is now customer: crowd Affects or could affect many people, though not necessarily a specific customer. dependency: skia Skia team may need to help us engine flutter/engine repository. See also e: labels.

Comments

@ianpark
Copy link

ianpark commented Oct 20, 2019

Problem

RenderRepaintBoundary.toImage() occasionally returns blank image. It happens very often in my app when app is busy, and many other widgets are rendered together. In the demo app for reproducing the problem, it's much rare but still reproducible by repeating the test. I've created a package.

https://github.com/ianpark/flutter_capture_bug_demo

Previously closed issue that seems to be same problem:
#17687

Google team, please raise the priority for this issue:

This issue is a launch blocker of many other Flutter projects including mine, and I am still using v1.7.8+hotfix.4 which is the last stable build where this bug does not exist.

Any app depending on this feature must not upgrade to the next stable build and if you upgrade your Mac to Catalina, you won't be able to run prod build with v1.7.8+hotfix.4. However you cannot upgrade to v1.9.1x due to this issue. You also cannot downgrade to Mojave. So your will get stuck.

Really hope Google team take this issue seriously and find a solution.

Steps to Reproduce

As this is a race condition problem, it does not reproduce in the demo app as much as it does in the real apps. However my demo app still can reproduce the problem and around 10% fail on a real device.

When toImage() fails, the length of the returned bytes is exceptionally small. Usually several thousands KB. When it captures partially it could be bigger.

In the demo, there is a console that you can easily tell when the failure happens as [nth-try byte-size] will be appended to the yellow console. And also there is failure % which will tell you the test result is actually worse on more constrained devices. Never seen any failure with the prod build of this demo app, but seen the problem on my prod app.

demo_app_snap

Please follow this step to reproduce the problem.

  1. Clone https://github.com/ianpark/flutter_capture_bug_demo
  2. make sure you are on the stable channel and 1.9.1+hotfix.2 or other higher versions.
  3. Launch the app on an emulator or a real device in debugging mode (e.g. flutter run)
  4. Press Load button and select a large image. 3MB should be probably enough but it all depends on the device performance / memory status. I even can reproduce it with much smaller images.
  5. Press the blue save button multiple time until you see the fail% increase. Pressing the button too fast won't help because it's queued up and handled one by one. Just moderate speed is fine.

I intentionally didn't add displaying the result image in the app as it may blur the real problem. Checking the byte length is clearly enough here.

Note that this problem also can be reproduced by Loop/Stop buttons which is calling the function in a loop. However the chance is very low so it happens once in thousand time in my testing. So please use your finger :) And this smells like the race condition is triggered by user interaction handling or animations.

Target Platform:
Android, iOS

Target OS version/browser:
MacOS (reproduced on Sierra, Mohave, Catrina)

Devices:
Emulator: Google Pixel 3, Nexus5
Devices: Samsung Galaxy Note, iPhone7

Logs

While reproducing the problem, `flutter run --verbose` does not leave any extra log at all.
$ flutter analyze
Analyzing capture_error_demo...
No issues found! (ran in 1.5s)
$ flutter doctor -v
[✓] Flutter (Channel stable, v1.9.1+hotfix.2, on Mac OS X 10.15 19A602, locale ko-KR)
    • Flutter version 1.9.1+hotfix.2 at /Users/ian/Dev/flutter
    • Framework revision 2d2a1ffec9 (6 weeks ago), 2019-09-06 18:39:49 -0700
    • Engine revision b863200c37
    • Dart version 2.5.0

[✓] Android toolchain - develop for Android devices (Android SDK version 29.0.2)
    • Android SDK at /Users/ian/Library/Android/sdk
    • Android NDK location not configured (optional; useful for native profiling support)
    • Platform android-29, build-tools 29.0.2
    • Java binary at: /Applications/Android Studio.app/Contents/jre/jdk/Contents/Home/bin/java
    • Java version OpenJDK Runtime Environment (build 1.8.0_202-release-1483-b49-5587405)
    • All Android licenses accepted.

[!] Xcode - develop for iOS and macOS (Xcode 11.0)
    • Xcode at /Applications/Xcode.app/Contents/Developer
    • Xcode 11.0, Build version 11A420a
    ✗ CocoaPods not installed.
        CocoaPods is used to retrieve the iOS and macOS platform side's plugin code that responds to your plugin usage on the Dart side.
        Without CocoaPods, plugins will not work on iOS or macOS.
        For more info, see https://flutter.dev/platform-plugins
      To install:
        sudo gem install cocoapods
        pod setup

[!] Android Studio (version 3.5)
    • Android Studio at /Applications/Android Studio.app/Contents
    ✗ Flutter plugin not installed; this adds Flutter specific functionality.
    ✗ Dart plugin not installed; this adds Dart specific functionality.
    • Java version OpenJDK Runtime Environment (build 1.8.0_202-release-1483-b49-5587405)

Adding the previous audiences:
@tvolkert @aliyigitbireroglu @andreidiaconu @gisinator @benneca @hariprasadiit

@gisinator
Copy link

Ian, thank you very much for opening this new thread and taking the time to create this demo project. As you say, this is a critical issue that prevents you, me and as it seems many more people from upgrading to current stable version. Hopefully google team can take care of this soon, thank you guys in advance for looking into this.

@ianpark ianpark changed the title RenderRepaintBoundary.toImage() occasionally returns blank image RenderRepaintBoundary.toImage() occasionally returns a blank image Oct 20, 2019
@janmoppel janmoppel added a: images Loading, displaying, rendering images customer: crowd Affects or could affect many people, though not necessarily a specific customer. framework flutter/packages/flutter repository. See also f: labels. labels Oct 21, 2019
@ianpark
Copy link
Author

ianpark commented Oct 21, 2019

Please ignore the flutter doctor result regaring cocoapad. It's wiped out after Catalina upgrade but got it fixed.

@ianpark
Copy link
Author

ianpark commented Oct 21, 2019

@gisinator really hope this issue is addressed asap. My app is already in prod and unabled to update the outstanding issues.

@tvolkert
Copy link
Contributor

@cbracken this is a pretty bad regression.

@tvolkert tvolkert added engine flutter/engine repository. See also e: labels. c: regression It was better in the past than it is now and removed framework flutter/packages/flutter repository. See also f: labels. labels Oct 21, 2019
@SpiciedCrab
Copy link

This also happened to me, so is there any solution about it?

@tvolkert tvolkert added this to the Goals milestone Oct 23, 2019
@cbracken
Copy link
Member

cbracken commented Oct 23, 2019

Thanks for reporting this. This is not likely to get worked on by the team in the near future.

We work through issues in priority order starting with TODAY, then customer: blocker then customer: critical. Once those have been burnt through we'll be chasing issues further down the list.

Aside from the above labels, we use thumbs-up reactions as a means of measuring priority so if you're affected by this, adding thumbs-up reactions will help ensure this moves up our list.

@cbracken
Copy link
Member

FWIW -- if this is a regression, then bisecting where it was introduced would be massively helpful in helping the team produce a fix (or just revert the offending commit).

@xssxxssx
Copy link

This also happened to me, so is there any solution about it?

@jason-simmons
Copy link
Member

Bisected this to flutter/engine@78a8ca0

@gaaclarke

@gaaclarke
Copy link
Member

  • I tried to reproduce this on iPhone SE (thanks for the repro code). What I found was more like an 80% failure rate.
  • I put a breakpoint on OpenGL ES errors and there isn't one getting raised.
  • Changing the pixelRatio to 1.0 didn't seem to have an affect on the failure rate. If anything it made things worse.
  • Seen on Engine version 0d43469.
  • I tried to capture our render frame, that got broke at some point since since xcode waits forever for a "No capture boundary detected..." If we fix that it should be apparent what is happening.

@gaaclarke
Copy link
Member

Screen Shot 2019-10-25 at 2 53 22 PM

@jason-simmons and I were able to track down the issue using gapid.

If you look at line 22640 you can see we are doing a glClear before we render to our frame buffer, then in 22649 we actually do the draw call. This is the case where everything worked and the glReadPixels on line 22652 gets correct output.

If you look at line 19931 we are performing a glClear, again to prepare for rendering to our frame buffer. Then if you look at line 20012 you'll notice there is another glReadPixels, but between the glClear and the glReadPixels, there is no actual draw call. That is why when we do the glReadPixels, it returns an all black image.

My diagnosis is that for some reason Skia is not performing the glDrawRangeElements call. Perhaps it is detecting some internal error state and is aborting the call? It's not clear from the OpenGL calls why it wouldn't perform the draw.

@gaaclarke gaaclarke added the dependency: skia Skia team may need to help us label Oct 25, 2019
@jason-simmons
Copy link
Member

When drawing the image, Skia's draw_texture_producer calls GrTextureProducer::refTextureProxyForParams, which invokes SkImage_Lazy::lockTextureProxy.

The image draw succeeds when SkImage_Lazy can obtain a GrTextureProxy from GrProxyProvider::findOrCreateProxyByUniqueKey. If findOrCreateProxyByUniqueKey returns null, then the image draw fails silently, and the bitmap rendered in Flutter's MakeRasterSnapshot will be empty.

SkImage_Lazy::lockTextureProxy has several code paths for obtaining a texture if the GrProxyProvider cache lookup fails. One of these paths calls GrBackendTextureImageGenerator::onGenerateTexture, but that fails because of the fRefHelper->fBorrowingContextID != context->priv().contextID() check.

Possibly this is a sign that the engine is doing something wrong related to how we share image textures between the IO thread's GrContext and the GPU thread's context?

@ianpark
Copy link
Author

ianpark commented Oct 26, 2019

@jason-simmons @gaaclarke thanks for looking into this issue.

@ianpark
Copy link
Author

ianpark commented Oct 29, 2019

this issue needs more love. @gaaclarke do you think using v1.9.1+hotfix.6 with your commit(flutter/engine@78a8ca0) reverted would be too risky?

@gaaclarke
Copy link
Member

@ianpark Things are slowed down a bit since we've tracked down the issue for Skia. We are working with them to get it triaged and addressed.

Assuming the patch reverts cleanly it should be safe. It is a performance PR so hopefully the only downside would be performance. However, since it changes threading models there is a risk that the code has shifted its assumptions enough already to need to be run with the new threading model.

@gaaclarke
Copy link
Member

Linked Skia Bug: https://bugs.chromium.org/p/skia/issues/detail?id=9581

@ianpark
Copy link
Author

ianpark commented Oct 30, 2019

@gaaclarke thanks for the update. Sorry I am getting desperate as I found the news versions of some other dependencies fail to build with 1.7.8 hence the isolation level of my app gets increased time by time. Hope Skia team would turn around quickly.

@brianosman
Copy link

Okay, I think the fix may need to happen in Flutter. What's happening:

Cross-context images are designed so that they can only be used by one GrContext at a time. Internally, the image has the actual GL texture. When the image is referenced by a draw with a particular GrContext, we wrap the GL texture in a temporary Skia texture object, and remember the ID of the GrContext that's using it. Once that draw has finished and the Skia texture object is disposed of, we reset the ID on the cross-context image so that a different GrContext can start using it (or the same one, again). This is necessary, because some OpenGL texture state is tied to the texture object itself, so there's no safe way to be using the same texture on two threads at the same time.

The error that's happening suggests that the image is still in use by another GrContext when the raster snapshot is being made. The regressing change moved the raster snapshot logic to the IO thread, so we can assume that the GPU thread has the image in use. The simplest safe fix is to synchronize with the GPU thread when doing a raster snapshot, waiting for it to flush the current frame before trying to do anything.

@gaaclarke
Copy link
Member

Thanks @brianosman for looking into it, that makes sense. We can look into that and it shouldn't be too difficult.

Are you sure this isn't something we'd want at the Skia level? Basically a lock around glReadPixels for reading and glBindFrameBuffer for writing when it comes to cross context images.

@gaaclarke
Copy link
Member

@brianosman At the very least we should make that noop have a warning message, instead of just drawing nothing silently.

@brianosman
Copy link

@gaaclarke Skia CL to warn in this situation has landed: https://skia.googlesource.com/skia/+/6e1d51a2c74ce26c499cb141920007d3dda11435

@jason-simmons
Copy link
Member

I think the same SkImage instance is being referred to by the SkPicture passed to Picture.toImage() and by an SkPicture in the layer tree that is rendered onscreen.

If the first SkPicture is rasterized on the IO thread and the second SkPicture is rasterized on the GPU thread, then it's possible for that SkImage to be consumed by both threads concurrently. I don't think we can prevent that without moving Picture.toImage() rendering back to the GPU thread.

@brianosman
Copy link

As far as I know, the only cross-context images are those explicitly created on the IO thread (for decoding large images that are assets in the application). It would have to be something that went through here: https://github.com/flutter/engine/blob/646b594d5e03e1873bbb021d0f4b2994777808de/lib/ui/painting/image_decoder.cc#L177

@gaaclarke
Copy link
Member

toImage Race Condition

Here is my theory of what is happening and the bug. If "render to fbo" and "access for drawing" happen at the same time we get the collision we were experiencing. My theory is that the SkPicture and the Texture are generated just for toImage. If we wait for the GPU to finish rendering to that SkPicture we can safely read from it. Let me know if I'm missing something.

@gaaclarke
Copy link
Member

Okay, I talked with @jason-simmons offline. My whole assumption is wrong. The image in contention is the actual decoded image created by the IO thread, not one that was created as a result of the toImage logic.

This problem is because 2 different threads are trying to read from the same texture at the same time, not that one is trying to write to it while another is reading. Here is a stackoverflow question asking about reading from shared objects and getting a crash. So, it is definitely a thing that could be a problem depending on the opengl driver.

Since Skia gives us no visibility to the usage of the texture, it is hidden down in SkPicture internals, there is no meaningful way we can protect usage of the texture. I believe that Skia should implement the mutual exclusion but it sounds like there is an objection to that which I don't understand yet. I'll follow up with Skia, @brianosman?.

In the meantime the only safe thing Flutter can do is to move toImage back onto the GPU thread.

That is a shame because executing toImage on the GPU thread interferes with latency of the first frame because ShaderWarmUp uses toImage before we render the first frame. There are some benefits of performing it on the GPU thread though like more efficient shader caching (how good are we are predicting shaders as part of ShaderWarmUp is another question).

@liyuqian
Copy link
Contributor

I concur that moving toImage back to the GPU thread seems to be the best solution right now. Sorry for not catching this issue while we were reviewing flutter/engine#9813. For that, I really appreciate your reproduction app, @ianpark ! Let's put that into our golden test to safeguard similar issues in the future, @gaaclarke .

@cmkweber
Copy link

Would moving back to the GPU thread allow an easier fix to: #40990?

This would just involve adding another endpoint to return the Picture on the GPU thread without proceeding to rasterization.

@tvolkert
Copy link
Contributor

tvolkert commented Nov 1, 2019

@gaaclarke was this meant to be closed?

@ianpark
Copy link
Author

ianpark commented Nov 1, 2019

@gaaclarke really appreciate for diving deep into this problem with Skia guys. I learnt quite a bit from your comments. It's shame that your performance fix is reverted but it will unblock many other devs and apps so I am very happy. :)

I now clearly understand the root cause and think that reverting the code was the best move not just for unblocking the users but also to avoid misleading the fellow devs who will work on the similar area. Even I also got some sort of impression from your code that using the IO thread for calling a graphic rendering API of Skia is probably safeguarded by Skia's advanced algorithm or somehow else.

@tvolkert I think this issue should remain opened until another stable release with this patch is available to the public, that would help to prevent duplicated bug reports. Do you plan to release another stable release of 1.9.1 with this patch, or would it be part of the next version?

@liyuqian happy to hear that my repro code was useful :) thx!

@gaaclarke
Copy link
Member

@tvolkert Yep, the merged PR fixes it.

@gaaclarke
Copy link
Member

@ianpark Croeso. We close out issues once they are fixed on master.

@ianpark
Copy link
Author

ianpark commented Nov 6, 2019

@tvolkert could you briefly explain the release plan of the fix for this issue?

@ianpark
Copy link
Author

ianpark commented Nov 14, 2019

@tvolkert @gaaclarke any update on releasing the fix for this issue? I am happy to patch my local Flutter but not sure how to build the stable Flutter 1.9.1 with the patched engine that is 100% compatible with Flutter 1.9.1. Is there a good guideline for doing that?

@tvolkert
Copy link
Contributor

@ianpark the fix is on versions v1.10.16 and higher, which means it's currently on the dev channel. It should land on the beta channel in a few weeks and the stable channel a few weeks after that.

@ianpark
Copy link
Author

ianpark commented Nov 17, 2019

@tvolkert thanks for the information. I was trying to find a way to patch the fix on top of the latest stable release. Probably I should just wait for now.

@rockingdice
Copy link

Can I get an image by this fix without doing any workarounds anymore?
Like described in the SO question: https://stackoverflow.com/questions/57645037/unable-to-take-screenshot-in-flutter

They are using debugNeedsPaint to check if the image is safe to obtain, or just add a delay then to get the image.

@gisinator
Copy link

@ianpark the fix is on versions v1.10.16 and higher, which means it's currently on the dev channel. It should land on the beta channel in a few weeks and the stable channel a few weeks after that.

May I ask if someone with current stable v1.12.13+hotfix.5 could verify that this issue was resolved?

@benneca
Copy link

benneca commented Dec 16, 2019

@ianpark the fix is on versions v1.10.16 and higher, which means it's currently on the dev channel. It should land on the beta channel in a few weeks and the stable channel a few weeks after that.

May I ask if someone with current stable v1.12.13+hotfix.5 could verify that this issue was resolved?

It appears to be working in stable v1.12.13+hotfix.6. but this same issue still persists with google maps

@klaszlo8207
Copy link

I need to screenshot my GoogleMaps widget, but I get oly a black/blank image via Image.memory, any help in this?

@benneca
Copy link

benneca commented Jan 15, 2020

I need to screenshot my GoogleMaps widget, but I get oly a black/blank image via Image.memory, any help in this?

the only workaround that I am aware of at this time is to create your own "screenshot" plugin that captures the flutterView. of course, it requires that you manually crop the result to get only the area of the google map widget in your final image. It is far from ideal, but to my knowledge the only option if you want to use google maps. you can use flutter maps and your standard RepaintBoundary will work fine.

@klaszlo8207
Copy link

@benneca How? Please share the source code.

I tried this:

https://gist.github.com/slightfoot/8eeadd8028c373df87f3a47bd4a35e36

not worked

@benneca
Copy link

benneca commented Jan 16, 2020

@klaszlo8207 take a look at this plugin, this is very similar to the approach I took
https://pub.dev/packages/screenshot_and_share

@haifang12
Copy link

try to call RenderRepaintBoundary.toImage() twice

@github-actions
Copy link

github-actions bot commented Aug 8, 2021

This thread has been automatically locked since there has not been any recent activity after it was closed. If you are still experiencing a similar issue, please open a new bug, including the output of flutter doctor -v and a minimal reproduction of the issue.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
a: images Loading, displaying, rendering images c: regression It was better in the past than it is now customer: crowd Affects or could affect many people, though not necessarily a specific customer. dependency: skia Skia team may need to help us engine flutter/engine repository. See also e: labels.
Projects
None yet
Development

No branches or pull requests