QGIS Quality Assurance methodology and infrastructure #180

Open
SrNetoChan opened this issue May 24, 2020 · 13 comments
Labels
Grant-2020 (QEP for 2020 Grant program), Type/QA (Quality assurance related)

Comments

@SrNetoChan
Member

SrNetoChan commented May 24, 2020

QGIS Enhancement: QGIS Quality Assurance methodology and infrastructure

Date 2020/05/23

Authors

Contact alexandre dot neto at cooperative dot net

Maintainer @SrNetoChan

Version QGIS 3.18

Summary

This QEP aims to create the necessary infrastructure and methodology to organize and encourage systematic testing before each QGIS release:

  • Set up a test management system to organize test cycles and to assign and track test execution (Kiwi TCMS);
  • Elaborate and document a testing methodology to help testers;
  • Resurrect the tester plugin and move it to the QGIS repositories; publish it in the official QGIS plugin repository;
  • Create an initial set of relevant test cases;
  • Organize and execute the initial test cases for the next releases (3.18, 3.20, 3.22).

Beyond the scope of this grant funding, we also plan to:

  • Help onboard new testers from the community;
  • Create more test cases;
  • Implement test cases as automated or semi-automated tests for the tester plugin.

Introduction

In recent years, the QGIS project has taken important steps towards improving QGIS stability. These include regular long-term and point releases; one-month feature freeze periods with funded bug squashing; larger unit test coverage; and continuous integration.

Unfortunately, one of the weak points has been the lack of sufficient user testing during the feature freeze period, which can lead to releases with too many unknown bugs. These bugs are only found once general users start using the new stable version.

Without an organized effort, it's hard to predict how many users will test each release candidate and which features they will test.

Also, in the current situation, it's possible that QGIS support service providers are already doing some internal QA, but without communication between them it's hard to avoid duplicating work.

Proposed Solution

This proposal aims to create the necessary infrastructure and methodology to organize and encourage systematic testing before each release.
With this work, we hope to set the foundations for having a shared testing effort.

1. Set up a test management system to organize test cycles, and to assign and track test execution

To encourage systematic testing and track its progress, we need a place to manage test cases, describe their steps, assign the tests as tasks to testers, and log the results. We are thinking about using Kiwi TCMS because:

  • It is open source;
  • It's a dedicated, feature-rich tool that supports test plans, test cases, test runs, and tester management;
  • It integrates with GitHub, for example to file bug tickets;
  • It can be self-hosted using Docker containers;
  • Alternatively, we can use its free hosted service for open source projects.
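
As an illustration of how the tracking side could be automated, the sketch below logs the results of an automated run into a Kiwi TCMS instance from a Python script. This is only a sketch: it assumes the tcms-api client package and a ~/.tcms.conf file holding the server URL and credentials, and the RPC method and field names are our reading of the Kiwi TCMS RPC documentation, to be verified against the version we actually deploy.

```python
# Minimal sketch (assumptions noted above): push results of an automated
# run into Kiwi TCMS using the tcms-api client (pip install tcms-api).
# Server URL and credentials are read from ~/.tcms.conf.
from tcms_api import TCMS

rpc = TCMS().exec  # authenticated RPC proxy

TEST_RUN_ID = 42  # hypothetical test run created for the 3.18 cycle

# List the test executions (test case instances) attached to that run.
executions = rpc.TestExecution.filter({"run": TEST_RUN_ID})
for execution in executions:
    print(execution["id"])

# Mark the first execution as passed; status names/ids must be checked
# against the TestExecutionStatus table of the deployed instance.
passed = rpc.TestExecutionStatus.filter({"name": "PASSED"})[0]
rpc.TestExecution.update(executions[0]["id"], {"status": passed["id"]})
```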

2. Elaborate and document a methodology to execute testing to help testers

We need a document that explains each step of the testing process, from creating new test cases to setting up a clean testing environment.

  • Instructions on how to prepare a clean testing environment using virtual machines and snapshots;
  • Instructions on how to execute test cases and report issues;
  • Instructions on how to create new test cases.

3. Resurrect the QGIS tester plugin and move it to the QGIS repositories; publish it in the official QGIS plugin repository

The QGIS tester plugin, originally created by the QGIS team at Boundless Spatial, allows running automated and semi-automated tests from within QGIS.

Tests are written in Python and can be installed into the tester plugin as plugins, or shipped inside an existing plugin (to test that plugin's functionality).

There are two types of tests:

  • Automated tests run without tester intervention; they mimic user interaction with the QGIS interface and compare the outputs with expected results. The plugin can run several tests at once and output a report with each test's result.
  • Semi-automated tests, where the user is guided by step-by-step instructions to perform a task or is simply asked to confirm some visual output.

There is already a set of automated and semi-automated tests that can be used as examples:
https://github.com/qcooperative/qgis-core-tests
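
To make the two test types more concrete, here is a rough sketch of how a test module for the tester plugin could look. The names used (qgistester.test.Test, addStep, functionalTests, isVerifyStep) are only illustrative, based on the pattern of the old Boundless plugin; the qgis-core-tests repository linked above shows the actual API.

```python
# Illustrative sketch only: the Test/addStep/functionalTests names follow
# the pattern of the old Boundless tester plugin and may differ in the
# resurrected plugin; see qgis-core-tests for real examples.
from qgis.core import QgsProject, QgsVectorLayer

from qgistester.test import Test  # assumed import path


def _load_sample_layer():
    layer = QgsVectorLayer("/path/to/sample/points.gpkg", "points", "ogr")
    assert layer.isValid(), "sample layer failed to load"
    QgsProject.instance().addMapLayer(layer)


def functionalTests():
    # Fully automated: runs without tester intervention and fails on error.
    automated = Test("Load a GeoPackage point layer")
    automated.addStep("Load the sample layer", _load_sample_layer)

    # Semi-automated: the tester is asked to confirm a visual result.
    semi = Test("Check default point symbology")
    semi.addStep("Load the sample layer", _load_sample_layer)
    semi.addStep("Do the points render with the default symbol in the map canvas?",
                 isVerifyStep=True)

    return [automated, semi]
```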

4. Create an initial set of relevant test cases

In software like QGIS, it's nearly impossible to test every single piece of functionality. Therefore, we need to be prudent and choose a realistic set of test cases. Besides being realistic, they need to be relevant to the overall stability of the software. We should aim for broadly used functionality with a high risk of issues or regressions (such as new features or refactored code), and focus on tests that are hard to cover with unit tests. For example:

  • Testing the installers;
  • Integration tests with other software like QGIS Server, PostGIS, etc.;
  • GUI-dependent tests;
  • Complex workflows.

5. Organize and execute the initial test cases for the next releases (3.18, 3.20, 3.22)

We propose to execute the tests ourselves, for a few releases, on Windows and Linux. During that time, we will try to attract more testers interested in helping with these platforms or in running the test cycles on other platforms.

Affected Files

NA

Performance Implications

NA

Further Considerations/Improvements

(optional)

Backwards Compatibility

NA

Votes

(required)

@alexbruy added the Grant-2020 (QEP for 2020 Grant program) label on May 24, 2020
@andreasneumann
Member

andreasneumann commented May 25, 2020

Great proposal.

Of all the grant proposals submitted this year (and many of them are excellent), in my personal opinion this is the most important one.

I think that, in the future, we should invest more funds in a better testing infrastructure and also fund the follow-up work necessary to actually do the testing.

@nyalldawson
Contributor

I'm curious if you could link to some recent bug reports you think would have been caught by this setup. I know in the past (years ago) we've had releases with really bad showstopper bugs on initial release, but I honestly can't think of any in recent years. All the bugs we get now are really quite involved, which makes me wonder if this setup would need thousands and thousands of user tests to actually have caught anything...?

@gioman

gioman commented May 26, 2020

> I'm curious if you could link to some recent bug reports you think would have been caught by this setup. I know in the past (years ago) we've had releases with really bad showstopper bugs on initial release, but I honestly can't think of any in recent years. All the bugs we get now are really quite involved, which makes me wonder if this setup would need thousands and thousands of user tests to actually have caught anything...?

@nyalldawson yes, you are completely right: thanks to the tests we already have (and their number grows every day) we have avoided the kind of major disasters that happened from time to time in the past, and QGIS is as strong as it has ever been. This proposal targets deep manual testing of new functionality, but also of some basic core functionality that should never fail/regress. A few recent examples (which seem not to have been caught by automated tests?):

qgis/QGIS#36689
qgis/QGIS#35671
qgis/QGIS#35927

Of course, anyone will always be able to argue that some functionality "is not important", but anyway the overall goal here is to not leave anything uncovered, one way or the other.

@nyalldawson
Contributor

Thanks @gioman!

Just thinking aloud here, please correct me if I'm wrong anywhere or have misinterpreted the proposal:

qgis/QGIS#36689
qgis/QGIS#35671

Looking at these two, they are better candidates for unit tests than for user-run tests.

Specifically, qgis/QGIS#36689 relates to a crash when a certain type of raster dataset is loaded -- this should be caught by unit tests instead. It's unlikely that a user test would help here, as the bug was only found when this data type was loaded following the introduction of the new provider. For a user test to have picked this up you'd be relying on the user test suite including a sample of this data type (and as soon as you added this user test, you'd pick up the bug immediately!)

Similarly, I find it extremely unlikely that qgis/QGIS#35671 would be helped by a user-run test. To trigger this you'd need to use the tool on a layer without z values present, saving to a provider which is strict about the presence/absence of z values. So, in order to catch this, we'd have needed:

  • a set of tests for all map digitizing tools (roughly 20 different tools)
  • testing these with all combinations of z / m value presence (4 combinations, so 20 x 4 = 80 tests)
  • testing these with all the different providers (~6 common providers with edit support, so 6 x 80 = 480 tests)

Let's be conservative and drop this to a best-case scenario of 60 tests before the regression is flagged. I.e. to ensure that we'd have caught this particular bug we'd have to run at least 60 tests in different combinations for every release. That's an extreme amount of volunteer power! Alternatively, a unit test would be more suitable to cover this particular case -- the one-time investment in writing the test means that there's no longer any chance of it slipping through, and no volunteer time is required.
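
For reference, the "one-time investment" style of test described here would look roughly like the sketch below. It is only a generic PyQGIS example of checking that a stray z value gets dropped before a feature is committed; the assertions for the actual qgis/QGIS#35671 fix would need to exercise the digitizing tool code itself.

```python
# Generic sketch of a PyQGIS regression test; not the actual test for
# qgis/QGIS#35671, just the shape such a one-off unit test would take.
from qgis.core import QgsFeature, QgsGeometry, QgsVectorLayer
from qgis.testing import start_app, unittest

start_app()  # headless QGIS application for the test run


class TestDigitizedZValues(unittest.TestCase):

    def test_z_value_dropped_for_2d_layer(self):
        # A 2D target layer, standing in for a provider that is strict
        # about the absence of z values.
        layer = QgsVectorLayer("Point?crs=epsg:4326", "points", "memory")
        self.assertTrue(layer.isValid())

        # Geometry as a digitizing tool might produce it, with a stray z.
        geom = QgsGeometry.fromWkt("PointZ (1 2 3)")
        self.assertTrue(geom.constGet().is3D())

        # Whatever conversion the fix applies must leave a plain 2D point.
        geom.get().dropZValue()
        self.assertFalse(geom.constGet().is3D())

        feature = QgsFeature(layer.fields())
        feature.setGeometry(geom)
        layer.startEditing()
        self.assertTrue(layer.addFeature(feature))
        self.assertTrue(layer.commitChanges())


if __name__ == "__main__":
    unittest.main()
```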

qgis/QGIS#35927

This is indeed a good candidate for user testing (i.e. performance regressions), thanks for pointing this one out. That said, I'm still skeptical that we have the capacity to flag regressions like this via user-run tests. In order to find this one you'd have to have a test which requires the user to load a huge table, then open the attribute table and trigger the interactions in a certain order. And then they'd have to time this and know the expected length of time for the task to complete (if not, they'd likely just think the slowness was expected). Off the top of my head, I'd estimate that running through a set of user tests covering the attribute table/form functionality in order to catch something like this would take at least 2 hours (please let me know if you disagree here!).

That's (at least) 2 hours for one component of QGIS for every release we do. It's not a big jump to estimate that a set of user tests giving decent coverage of the fundamentals of QGIS would require on the order of 100-200 hours of work per release. I just don't see us having the volunteer power to make this feasible. As much as I love the idea in principle, I think in reality we're better off spending the effort writing regression tests which are run automatically by the CI and focusing our efforts there...

@andreasneumann
Member

The examples provided by @gioman might not be the best examples, but I can assure you @nyalldawson that there are still numerous issues that aren't covered by unit tests. I remember, for example, quite a few issues in the attribute tables and forms that can only be detected through user testing (like putting selected features on top, relation reference issues, etc.).

@nyalldawson - if you insist, I could list several other issues I/we reported in the past that hadn't been covered by unit tests - and it would be hard to get them covered through such tests. Mainly in the areas of editing, node editing, snapping, forms, and the attribute table.

But I totally agree that efforts should be made to improve unit test coverage, where possible. And QGIS did improve a lot due to the increased test coverage.

As far as I know, both @gioman and @SrNetoChan were involved with user testing at Boundless and probably have quite some experience in this area.

I agree with @nyalldawson that the user testers should keep an open eye when they discover issues and assess whether the case they just discovered could be covered by a unit test.

@gioman

gioman commented May 27, 2020

> The examples provided by @gioman might not be the best examples

@andreasneumann no, in fact these were the first 3 that came to my mind without any search ;)

@nyalldawson
Contributor

> there are still numerous issues that aren't covered by unit tests

Oh, I totally agree with that!

> I remember, for example, quite a few issues in the attribute tables and forms that can only be detected through user testing (like putting selected features on top, relation reference issues, etc.)

Right -- but my concern is that in order for these to be tested, someone would have to first create a user-run test for them and then rely on users to run this test for every release. If there's no user-run test covering the particular set of circumstances required to trigger the issue then obviously it still won't get caught (just like if there's no unit test covering it).

And I'm concerned that in order for this set of user-run tests to be meaningful, they'd have to be absolutely mammoth. A large number of developers + users DO run nightly releases as their main releases, so we do quickly pick up regressions in basic QGIS functionality (such as if a menu option stops doing anything). Accordingly, these tests would need to cover all the uncommon user operations to be valuable -- and covering all these uncommon operations is such a ridiculously huge task that I question whether there's going to be any real-world benefit in the end.

If, after this is done, we end up with say a set of 200 tests covering things like:

  • load a shapefile, open the attribute table, zoom to full extent, select some features
  • load a GeoPackage, open the attribute table, zoom to full extent, select some features
  • load a PostGIS table, ...
  • load a SQL Server table, ...
  • load a raster, zoom to full extent, change symbol styling, identify a pixel, ...
  • load a PostGIS raster, zoom to full extent, etc. ...
  • load a WMS, etc. ...

You'd very quickly rack up 200 tests covering just these extremely basic tasks alone -- and the reality IS that users ARE daily testing nightly releases with these common tasks already. So we'd end up with no net benefit, yet a heap of extra maintenance effort for someone to formally run through the tests for each release.

That's why I estimated we'd need tests taking 100-200 hours per release in order for these extra user-run tests to have any real benefit in the end. And that's a HUGE time commitment!

Don't get me wrong: I'm all for greater testing and stability. But I just don't see how this approach can be effective for a project like QGIS. Sure, if we had 200 hours' worth of tests and paid staff to run through them every release, then there'd be no harm (and potentially a lot of benefit). But if we split that effort and spent a fraction of that time writing extra unit tests + documentation, we'd get a lot better value for the effort...

@elpaso

elpaso commented May 27, 2020

Having seen both sides (as an ex-Boundless team member and as a unit test writer), I see the pros and cons of both approaches. I totally share @nyalldawson's concerns about the amount of work required to create and run these semi-automated test cycles, but there are a few things that need to be considered:

  • unit tests will never cover 100% of use cases and code paths (this is not an excuse for not writing an int16 test for "Loading PostGIS raster with QGIS 3.12.3 crashes QGIS" qgis/QGIS#36689; I'm actually working on it)
  • test cycles can be almost fully automated (@SrNetoChan if you could attach a short animated gif showing the magic of the tester plugin in action, I believe this would make things clearer)
  • unit tests are not integration tests; test cycles tend to focus on the "bigger" picture and on workflows

That said, I also have concerns about the sustainability of such a big effort in the long run; I see a risk of lacking the resources to maintain the test cycles and to run them.

IMO, before we embark on this we should carefully assess these potential issues.

@andreasneumann
Member

Perhaps we need to define the areas where user testing makes the most sense. The whole editing section (node tool, construction tools, splitting, merging, etc.) is certainly an area where this would make sense. In our experience we also still have a lot of issues with forms and how they interact with constraints and PostgreSQL transaction mode. Frustrating things, where the user edits some features and at the end is greeted with a message that the features can't be saved to the DB, because somewhere a constraint is not met. Or you copy/paste a feature's geometry from a different layer and are immediately greeted with a message that the constraints are violated, before you have a chance to edit the attributes in the form.

If we restrict the user testing to certain areas as a start, I think we would add a lot of benefit without spending an awful lot of resources.

@andreasneumann
Member

@nyalldawson I can see your concern about this being a potential "bottomless pit" (just like the bug fixing, where >50% of our funds are currently spent), but I think it is worth a try. And with the former Boundless employees we have people with quite some experience and background in this area. I think we should give this a try and then, after some time, evaluate what went into it and what came out of it.

@SrNetoChan
Member Author

Hi @nyalldawson,

I fully understand your concerns. We do need to be pragmatic and assertive if we want this test cycle methodology and infrastructure to be useful, so I enjoy the discussion. If there's no buy-in from the community, then we should just forget about it.

This kind of "manual" testing should never replace unit tests; it should serve as a complement in areas that are harder or even impossible to cover with unit tests: things like human interaction with the interface, packaging, and integration tests where you need to connect to other services like PostGIS, GeoServer, and so on.

Like you predicted, even "only" those scenarios that cannot be covered by unit tests can reach hundreds of hours of testing, which may not be possible to run manually for every release. There are a few things we can do to make it more feasible:

  1. Using the tester plugin, we can create as many fully automated tests as possible. For those, the tester only needs to install two plugins, choose all the automatic tests, and click a button. Then, they can go grab some lunch and check the results later. Those tests can be run on any platform, and not only the OSes provided by CI, so we could catch some OS-specific problems.

[animated GIF: Peek 2020-05-27 16-26]

  2. Using the tester plugin, we can prepare semi-automated tests, where the plugin performs part of the actions for the tester. For example, the plugin can load a complex project, open a very complete print layout, and trigger some actions, then ask the tester questions: "do all the layers show in the main map?", "does the overview map show a different style?", "is the attribute table filtered so that it does not show elements of category X?", "when you turn on the atlas preview, can you see the Y state map?", "does it change when you press the next feature button?".

[animated GIF: Peek 2020-05-27 16-30]

  3. We can have a smaller set of "mandatory" tests for each release that focus on general usage and the installers: check that all components are working, that the core Processing providers can be used, that the Python Console works, that there are no error messages while loading, etc. - plus all the automated tests, of course.

  4. Then, have optional sets of test cases (test plans) that should be run only if there is an effective risk of regression. For example, if there was a lot of work on the attribute table for that release, then we should try to promote testing of that component to see if we catch something. If, on the other hand, no one touched the georeferencer plugin, testing it may not be a priority.

  5. Using the testing platform, we can also promote monkey testing for new features. Say you created an exciting new feature and would like people to test it on different platforms. You can create a simple test case with instructions and some data, and then ask testers to give it a try. People will try it and register any anomalies on their OS.

Although people do use nightly builds and do some random testing, it's not a coordinated effort; there is no way to know what has been tested and what hasn't.

I have no idea how well this will be accepted, but I think it opens the door to yet another set of non-coding activities that common users (or maybe power users) can perform to help the project, beyond documentation and translations. I know a few QGIS-PT folks that I am sure would like to participate.

@SrNetoChan
Member Author

One thing I forgot to explain about the tester plugin: those URLs shown at the beginning - we can have multiple ones if we want to run the same integration test against different endpoints/versions.

The tester plugin is a massive help for anyone who has a bunch of workflows saved as models and wants to make sure that they will work in a specific release.

@haubourg
Member

I really think acceptance/integration tests will catch a lot of global issues that unit tests can't catch, and from there we will be able to push for more unit tests.

So, +1 for me, though we need to settle on a solid, long-term funding solution for the brave hearts who will run those - just like we need to do for bug triaging and review.

I firmly think that budget growth should go, as a priority, to those recurrent tasks.
