☂️ Proposal: Make on-device testing awesome 💫 #148028

Open
matanlurey opened this issue May 9, 2024 · 14 comments
@matanlurey
Contributor

matanlurey commented May 9, 2024

Some examples of things @johnmccutchan and I would like to do, but can't today:

  • Background a Flutter app with platform views, force trimmed memory, and then take a screenshot of the result
  • Find and interact with native widgets beyond screenshots (tap, read semantics, etc)
  • Encourage our team to write and maintain more integration tests, because they are easy to write and fast to iterate on

We should do something about it. A couple of options on the table include:

  • Invest more in flutter_driver, keeping integration_test more or less where it is today
  • Invest more in integration_test, keeping flutter_driver more or less where it is today
  • Either of those, but actually deprecate and work towards removing the other framework
  • Something else entirely (maybe some community solution is best and we should invest more in that?)

Outside of Flutter, here are some popular integration test solutions for similar problem spaces:

Read more about the background of flutter_driver v integration_test

Background

Flutter ships with package:flutter_test and the accompanying command, flutter test, which runs a headless version of Flutter (called flutter_tester) and runs Dart-based unit/functional tests, called widget tests, in a fake environment where the passage of time is controlled by the tester, many extension points (like platform channels) are stubbed out, and rendering uses a software-based renderer that is ~mostly platform agnostic (and does not require a GPU, for example).

This workflow provides super fast lightweight tests that are suitable for testing widgets and compositions of widgets. It's possible to interact with the widget(s) under test, observe changes as a result, and even take screenshots and compare them for golden-file testing.
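For readers less familiar with this workflow, here is a minimal sketch of such a widget test. The widget tree and the test name are made up for illustration; the calls themselves (testWidgets, pumpWidget, tap, pump, find, expect) come from package:flutter_test.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  testWidgets('tapping the button increments the label',
      (WidgetTester tester) async {
    var count = 0;

    // Build a tiny, made-up widget tree inside the headless test environment.
    await tester.pumpWidget(
      MaterialApp(
        home: Scaffold(
          body: StatefulBuilder(
            builder: (BuildContext context, StateSetter setState) {
              return TextButton(
                onPressed: () => setState(() => count++),
                child: Text('Count: $count'),
              );
            },
          ),
        ),
      ),
    );
    expect(find.text('Count: 0'), findsOneWidget);

    // Interact with the widget, then explicitly pump one frame; in this
    // environment nothing advances unless the test asks for it.
    await tester.tap(find.byType(TextButton));
    await tester.pump();

    expect(find.text('Count: 1'), findsOneWidget);
  });
}
```

Saved under test/ in a Flutter project, this should run with flutter test and needs no device.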

Notably, this fake environment has the following limitations:

  • The test runs on a fake device [1], and cannot interact with plugins
  • The passage of time is tightly controlled by the developer, and doesn't always reflect the real interactions in production
  • Platform views do not show up, and cannot be interacted with (as there is no platform) and are missing in screenshots

Flutter also ships with two "integration test" packages, flutter_driver and integration_test, which unfortunately are in a state [2] of being neither completed nor deprecated. It would take a lot of words to describe the current state, so instead I'll focus on some key points:

Flutter Driver

Runs the test script on the host, using a different API (similar to ChromeDriver) than tests authored with flutter_test.

PROs:

  • Conceptually simple; a small limited RPC-like API "talks" to a Flutter app running on a device
  • Capable of interactions that require a host, e.g. running adb or forcing a Dart VM GC
  • Already supports functionality such as screenshots

CONs:

  • All interactions must be serializable, meaning Finders cannot be re-used across flutter_test and flutter_driver
  • All interactions and assertions happen over RPC, leading to additional latency and, in some cases, flakiness/synchronization issues
  • Can't (at least today) run on Firebase Testlab or systems that require a single bundled APK (or similar)
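To make the trade-offs above concrete, here is a rough sketch of a host-side flutter_driver script. The value keys 'counter' and 'increment' are hypothetical and would have to exist in the app under test, which is also assumed to call enableFlutterDriverExtension() before runApp.

```dart
import 'package:flutter_driver/flutter_driver.dart';
import 'package:test/test.dart';

void main() {
  late FlutterDriver driver;

  // The script runs on the host and connects to the app on the device.
  setUpAll(() async {
    driver = await FlutterDriver.connect();
  });

  tearDownAll(() async {
    await driver.close();
  });

  test('increments the counter', () async {
    // These finders are serializable descriptions sent over RPC, not the
    // Finder objects used by flutter_test. Both keys are hypothetical.
    final counter = find.byValueKey('counter');
    final button = find.byValueKey('increment');

    expect(await driver.getText(counter), '0');
    await driver.tap(button);
    expect(await driver.getText(counter), '1');

    // Screenshots come back to the host as encoded image bytes.
    final List<int> png = await driver.screenshot();
    expect(png, isNotEmpty);
  });
}
```

Every await here is a round trip between host and device, which is where the latency and synchronization concerns come from.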

Integration Test

PROs:

  • Uses largely the same API as flutter_test
  • Runs entirely in the same process as the Flutter application/under-test, without RPC or serialization
  • Can more easily use a combination of platform channels or FFI to easily "talk" to the native platform
  • Is supported by Firebase Testlab and similar systems that require a single bundled APK (or similar), and no driver script

CONs:

  • Is, from what I can tell, incomplete (it's not clear whether there is a specific reason we haven't finished migrating to it)
  • Structurally more difficult to interact with host-side tooling (e.g. adb, Dart VM, etc)
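For comparison, a minimal integration_test sketch. The widget tree is made up for illustration; apart from initializing the binding, the code is ordinary flutter_test.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

void main() {
  // Installs the on-device test binding; the rest is the flutter_test API.
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('shows a greeting', (WidgetTester tester) async {
    // A made-up widget tree; a real test would usually pump the app itself.
    await tester.pumpWidget(
      const MaterialApp(
        home: Scaffold(body: Center(child: Text('Hello'))),
      ),
    );
    await tester.pumpAndSettle();

    // Finders and matchers run in the same process as the widgets under
    // test, so nothing has to be serialized.
    expect(find.text('Hello'), findsOneWidget);
  });
}
```

Placed in an integration_test/ directory, a file like this is typically run with flutter test integration_test against a connected device or emulator.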

/cc @goderbauer @tugorez @jonahwilliams

Footnotes

  1. It is technically possible to flutter run a flutter_test and have it run on a real device; however many of the limitations remain.

  2. Google employees can also read the internal-only go/flutter-integration-testing.

matanlurey added the "t: flutter driver" ("flutter driver", flutter_drive, or a driver test) and "P1" (High-priority issues at the top of the work list) labels May 9, 2024
@matanlurey
Contributor Author

matanlurey commented May 9, 2024

I did a quick search in org:flutter:

Interestingly, integration_test is used 40 times in flutter/flutter, compared to 67 uses of flutter_driver.

@jonahwilliams
Member

integration_test uses flutter_driver though, so there isn't really an A or B. Some of the applications written using integration_test will still be doing a flutter_driver style test. There was also a "migration" attempt that updated a bunch of tests to use integration_test and in the process kneecapped the benchmark results.

@matanlurey
Contributor Author

matanlurey commented May 9, 2024

The question at hand is, where do we invest our time, and what do we tell our teams to use?

In other words, say we want to add support to talk to the native platform. Do we add that exclusively in flutter_driver? Do we add it exclusively to integration_test? Do we add it in a way where both can use the functionality? Assuming we have limited time/resources, what has the best chance of making our testing story better?

@jonahwilliams
Member

integration_test is just re-exporting parts of flutter_driver and flutter_test though

@johnmccutchan
Contributor

@jonahwilliams thanks for all this context. I was walking my dog tonight and I thought "I bet integration_test uses driver under the hood"; glad to have it confirmed without me having to ask :)

Let me try to reframe Matan's point (and correct me if I'm wrong Matan):

We need to decide what our public on-device testing API is. This may (will) be based on integration_test, flutter_driver, flutter_test, or some combination of them. We probably won't deviate much from the norm here when it comes to how widgets are tested, etc. I have a preference for easy porting of tests between unit (host) and integration (target) harnesses, but it's just a preference. I expect we will have to extend the existing APIs to allow for controlling the device (background, trim memory, enable wifi, yada yada). Maybe we should introduce the concept of a 'Device' to the test harness, and we can hang the platform-specific functionality off of platform-specific Device subclasses. I dunno, just thinking out loud at this point.
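As a purely hypothetical sketch of the 'Device' idea (none of these types exist in flutter_driver or integration_test today), the shape could be an abstract class with one subclass per platform. The Android version below assumes the host has adb on its path and maps each operation onto an adb shell command; the command runner is injected so the sketch can be exercised without a real device.

```dart
/// Runs a host command. Injected so the sketch does not depend on dart:io.
typedef CommandRunner = Future<void> Function(
    String executable, List<String> arguments);

/// Hypothetical device-control surface for a test harness.
abstract class Device {
  /// Sends the app under test to the background.
  Future<void> background();

  /// Asks the OS to deliver a memory-trim signal to the app.
  Future<void> trimMemory(String level);

  /// Turns wifi on.
  Future<void> enableWifi();
}

/// Hypothetical Android implementation built on adb shell commands.
class AndroidDevice implements Device {
  AndroidDevice(this.packageName, this._run);

  final String packageName;
  final CommandRunner _run;

  Future<void> _adbShell(List<String> arguments) =>
      _run('adb', <String>['shell', ...arguments]);

  // Pressing Home is one simple way to background the foreground app.
  @override
  Future<void> background() =>
      _adbShell(<String>['input', 'keyevent', 'KEYCODE_HOME']);

  // The level name (for example RUNNING_CRITICAL) is passed through as-is;
  // which levels the OS accepts depends on the current process state.
  @override
  Future<void> trimMemory(String level) =>
      _adbShell(<String>['am', 'send-trim-memory', packageName, level]);

  @override
  Future<void> enableWifi() => _adbShell(<String>['svc', 'wifi', 'enable']);
}
```

In a host-side script the runner could simply wrap Process.run from dart:io; an iOS subclass would need a different mechanism entirely, which is the point of hanging this off platform-specific subclasses.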

Additionally I think we should prioritize:

  • Ensure that we can write all of the example tests Matan talks about in the top comment. I don't want us to design everything ahead of time but design it as we actually write these kinds of tests.
  • On-device iteration time. Matan and I were talking about leaning heavily into 'hot restart' ('reload' would leak state and not re-run tests), and unless I change the native code of my app, there is no need to rebuild apk/whatever. I'm imagining that on-device in less than one second my app is restarted and tests are run again, that makes me excited to write tests.
  • User-friendly screenshot comparison flow that allows for implementations that integrate with skia gold, scuba, and just a regular old image differ.
  • Making sure that we, the Flutter team, dogfood this API internally, writing a bunch of regression tests for real engine bugs.

@jonahwilliams
Member

I think all of these are possible. Some of them even work today. The problem is they are not coherently presented, nor uniformly available.

For example, hot restarting flutter unit tests works! but you have to flutter run the test (which is at best .. lightly documented), and the result reporting doesn't work.

@johnmccutchan
Contributor

I'm glad to hear that a lot of the implementation bits are already available. And, yeah, I think this is mostly about uniform implementation and coherent APIs

@jmagman
Member

jmagman commented May 9, 2024

To link related work, @bkonyi wrote up Non-Dart Developer Tooling Capabilities Exploration, relevant sections:

@matanlurey
Contributor Author

Update: @goderbauer @jonahwilliams @matanlurey @johnmccutchan met today and chatted about this.

We talked about enhancing either Flutter Driver or integration_test (or both) for on-device testing, focusing on use cases beneficial to the engine team. Most of the discussion covered the capabilities and limitations of both tools, considering factors like interacting with widgets, reusing code, and synchronization.

The conclusion was to enhance flutter_driver (without breaking existing functionality) to support use cases like backgrounding a Flutter app, forcing trimmed memory, and taking screenshots, and we'd avoid any changes to integration_test at this time (though we had some cool ideas).

Next steps involve starting with scenario_app-style tests and reproducing real engine tests and regressions.

Concretely, I'll start working on a test suite in dev/integration_tests and, as needed, add new functionality to flutter_driver (https://github.com/flutter/flutter/tree/master/packages/flutter_driver), likely in a non-public API (src/experimental or similar) as we plan to iterate on real-world scenarios.

@mateuszwojtczak

Hi @matanlurey! That’s an excellent summary of the current solutions existing in the SDK.

Disclaimer: I’m one of the authors of Patrol - a framework made exactly to solve some of the issues you described.

I’d like to shed some light on what choices we had to make and how community feedback has driven us there.

  1. You can’t interact with native elements using integration_test

That was a main blocker for us, since almost every mobile app has some kind of a permission dialog or 3rd party SDK integration that is on the business critical path and you can’t just skip it. Also, to have real end-to-end device tests, we have to be able to interact with everything like a real user would.

Patrol introduces both convenience methods like enableWifi or pressHome, but also tap that simply enables you to interact with any native element on the screen like you would with 1st-party native tests but writing Dart code. (more examples here)

  2. Most device farms don’t support Flutter explicitly

This is related to how integration_test works. Most device farms can run UIAutomator tests on Android and XCUITest tests on iOS. If we can run as such a test and use Flutter testing inside it, then Flutter is just an implementation detail for those farms. There’s a lot of tooling for UI tests in the native world, and we as the Flutter community would like to be able to use that.

That’s what we do in Patrol - when you run patrol test we run native test execution (e.g. ./gradlew connectedAndroidTest) and from that side we run Flutter integration tests.

  3. integration_test sends all the test results at the end, which is not aligned with the test execution model on the native side

Because of that, tests are not run in isolation (which can increase flakiness) and, more importantly, if the 99th test crashes, you lose all the results, which can be very harsh (device farms are expensive). Also, every test reports a “<1s” test duration, because from a native perspective the tests return immediately.

That’s what we also kinda solved in Patrol, because of running the tests “from the native side”.

  4. flutter_test API is pretty low-level

This one is not as relevant, but I’d like to cover it as well since this is such a broad discussion. flutter_test is a great low-level API for interacting with the widget structure, but the community needs something high-level to avoid boilerplate and rewriting the same helper methods in every project.

This includes things like waiting for something to be interactive, finding first by default, simpler ancestor/descendant methods, etc.

We created the patrol_finders package, which just adds an opinionated layer for that. I believe this is a perfect balance: the low-level flutter_test is maintained by the Flutter team and the higher-level stuff is community-driven.

  5. Hot restart in tests

@johnmccutchan mentioned being able to iterate quickly with hot restart while writing integration tests. Please take a look at patrol develop command which basically lets you do that (also with writing native interactions) - more info here

Finally, we as Patrol team are very interested in what decisions are going to be made with both flutter_driver and integration_test, because:

  1. We spent a lot of time trying to solve those issues, so it's better to share the findings
  2. The decisions might break Patrol by design and it would be great to know that some time ahead.

@MarkOSullivan94
Contributor

MarkOSullivan94 commented May 21, 2024

I have two requests which would help make the end solution fully feature complete:

  1. Having both --reporter and --file-reporter exposed and fully functional [1]

This would be huge, as it would allow JSON reports to be generated and later parsed to create reports and GitHub comments in GitHub Actions workflows that highlight test results.

  2. Code coverage

This is available with flutter test but to the best of my knowledge not available in flutter drive
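To illustrate the first request, here is a minimal sketch of the kind of post-processing a JSON report enables. It assumes newline-delimited events shaped like those of the package:test JSON reporter (testStart events carrying a test with id and name, testDone events carrying testID, result, hidden and skipped); TestSummary and summarize are hypothetical names.

```dart
import 'dart:convert';

/// Hypothetical summary of one test run.
class TestSummary {
  int passed = 0;
  int failed = 0;
  int skipped = 0;
  final List<String> failures = <String>[];
}

/// Folds newline-delimited JSON reporter events into a [TestSummary].
TestSummary summarize(Iterable<String> lines) {
  final summary = TestSummary();
  final names = <int, String>{};

  for (final line in lines) {
    if (line.trim().isEmpty) continue;
    final Object? event = jsonDecode(line);
    if (event is! Map<String, dynamic>) continue;

    final type = event['type'];
    if (type == 'testStart') {
      // Remember the name so failures can be reported by name later.
      final test = event['test'] as Map<String, dynamic>;
      names[test['id'] as int] = test['name'] as String;
    } else if (type == 'testDone' && event['hidden'] != true) {
      // Hidden entries (such as suite loading) are not real tests.
      if (event['skipped'] == true) {
        summary.skipped++;
      } else if (event['result'] == 'success') {
        summary.passed++;
      } else {
        summary.failed++;
        summary.failures.add(names[event['testID']] ?? 'unknown test');
      }
    }
  }
  return summary;
}
```

A CI step could feed the report file's lines into summarize and render the failures list as a pull request comment.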

Footnotes

  1. Related issue https://github.com/flutter/flutter/issues/145499

@amrgetment

amrgetment commented May 29, 2024

I want to generate my testing report, so screenshot performance is important to me because I use it for my golden tests
In my Custom Report, I have a Golden Screen, an Original Screen, and the diff image between them with the result 0.00
I think that Flutter integration should be outside the Flutter repo so you can update frequently with no need to wait 3 months for a new Flutter release
image
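For context on what a diff result like 0.00 can mean, here is a minimal sketch of a pixel differ. It assumes the reported number is the fraction of differing pixels and that both images have already been decoded to raw RGBA bytes of the same size; diffRatio is a hypothetical helper, not part of any Flutter package.

```dart
import 'dart:typed_data';

/// Returns the fraction of pixels (0.0 to 1.0) that differ between two
/// same-sized RGBA buffers. A pixel counts as different when any channel
/// differs by more than [tolerance].
double diffRatio(Uint8List golden, Uint8List actual, {int tolerance = 0}) {
  if (golden.length != actual.length || golden.length % 4 != 0) {
    throw ArgumentError('Both buffers must hold same-sized RGBA data.');
  }
  final pixelCount = golden.length ~/ 4;
  if (pixelCount == 0) return 0.0;

  var differing = 0;
  for (var i = 0; i < golden.length; i += 4) {
    for (var channel = 0; channel < 4; channel++) {
      if ((golden[i + channel] - actual[i + channel]).abs() > tolerance) {
        differing++;
        break; // Count each pixel at most once.
      }
    }
  }
  return differing / pixelCount;
}
```

Since every pixel is visited, the cost grows with screenshot size, which is one reason screenshot performance matters for report generation.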

@MarkOSullivan94
Contributor

MarkOSullivan94 commented Jun 3, 2024

Requesting this to be considered as well, to help avoid flaky test results that might be caused by slower-than-usual network responses:

Add retry to testWidgets
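Separately from whether testWidgets gains such an option, a flaky step can be wrapped in a small helper. This is a sketch only: retry is a hypothetical function, and it re-runs a single asynchronous step rather than the whole test, so it does not reset any app state between attempts.

```dart
/// Runs [action] until it completes without throwing, giving up after
/// [maxAttempts] attempts and rethrowing the last failure.
Future<T> retry<T>(
  Future<T> Function() action, {
  int maxAttempts = 3,
  Duration delay = Duration.zero,
}) async {
  if (maxAttempts < 1) {
    throw ArgumentError.value(maxAttempts, 'maxAttempts', 'must be at least 1');
  }
  for (var attempt = 1;; attempt++) {
    try {
      return await action();
    } catch (_) {
      // Out of attempts: surface the original failure to the caller.
      if (attempt >= maxAttempts) rethrow;
      // Skip the timer entirely for a zero delay, so the helper does not
      // depend on timers firing in environments with a fake clock.
      if (delay > Duration.zero) {
        await Future<void>.delayed(delay);
      }
    }
  }
}
```

A non-zero delay relies on real timers, so it suits on-device tests better than the fake clock of a headless widget test.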

matanlurey removed the "blocked" (Issue is blocked by another issue) label Jun 4, 2024
@jiahaog
Member

jiahaog commented Jun 5, 2024

Adding some context here regarding package:integration_test. This is the main on-device testing framework that we are endorsing for customers in Google3, and for the past couple of years we have been directing customers away from Flutter Driver to this. Googlers can refer to go/flutter-integration-testing, which mentions that we should avoid using Flutter Driver, and use package:integration_test instead. (package:integration_test was previously named package:e2e and there may be some references to it.)

  • One of the largest advantages observed over the last couple of years in package:integration_test, compared to package:flutter_driver, is that the former is significantly less of a tech island and instead leverages the ecosystem of the host platform in a Flutter-agnostic way.
    • Internally package:integration_test is composed on top of Espresso, EarlGrey and Web build rules. We provide a small shim that translates the test results to what the host platform expects, but other than that, maintenance of this has been minimal. As the underlying platform teams add more support and features, we frequently gain them for free.
      • Outside Google3, tests can also be executed from the Android side and the iOS side, though I'm not sure if the documentation is still up to date.
    • Compare that with Flutter Driver, where we had to implement and maintain a significant amount of host-side orchestration logic to start an emulator, connect to the device, and run the test, across multiple platforms.
      • When performance tests were requested by a customer in the past, someone had to implement a completely bespoke stack to support them, instead of integrating with the existing support for other non-Flutter Android and iOS apps.
  • Integration Tests can run in release mode because it does not use the VM service. We previously worked with a customer that had users on low-end devices, and it was found that the service isolate contributes to a significant amount of overhead on those devices.
  • Also note that when running with flutter test integration_test, which should be the common flow, Flutter Driver is not used at all. package:integration_test has a direct dep on package:flutter_driver, but that code is not exercised during this flow.


8 participants