
Content downloader tests #897

Merged: 15 commits into master, Oct 27, 2023
Conversation

dbnicholson
Member

⚠️ Another giant PR!

This is big, but the vast majority is test data and scaffolding for running the tasks worker. If you're interested in getting into the minutiae of navigating Kolibri, Django and pytest-django, that's great. However, the interesting part is the last commit. The first commit has an internal interface change, and the second commit changes an internal interface return value. The rest is all test stuff including a vast pile of test data.

I'd really like to land this so I can tackle #890 with at least some confidence.

Fixes: #778

As a convenience, the generated `remotecontentimport` task was setting
`node_ids` to an empty list if it wasn't specified. However, an empty
list and `None` are two different cases: an empty list means no node IDs
are to be imported, while `None` means all node IDs are to be imported.
Even though we're not actually downloading all nodes here, fix the
semantics and require that the thumbnail tasks explicitly request no
nodes. This will be used later in testing.
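The `None` versus empty list distinction described above can be sketched as follows (a hypothetical illustration; `nodes_to_import` and its arguments are invented names, not the plugin's actual API):

```python
# Illustrative sketch of the node_ids semantics: None means "import all
# nodes", while an empty list means "import no nodes". Names are invented
# for illustration, not taken from the plugin.
def nodes_to_import(all_node_ids, node_ids=None):
    """Resolve which node IDs an import task should fetch."""
    if node_ids is None:
        # None: import every node in the channel.
        return list(all_node_ids)
    # An explicit list (possibly empty): import exactly those nodes.
    return [n for n in all_node_ids if n in node_ids]
```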
This will be used in testing to check that a task was actually enqueued.
The storage hook depends on the `BackgroundTask` containing the current
job ID. However, it's possible the job will change state and the hook
will run before the job ID has been saved. That would prevent the hook
from properly synchronizing the updated job state. To prevent that, lock
the database while the task is being enqueued until the job ID is saved.
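The race being fixed can be illustrated with a minimal, runnable analogy in plain Python (this is not the plugin's actual code, which locks the database rather than an in-process lock; `FakeTask` and its methods are invented for illustration):

```python
import threading

# Analogy for the race described above: the storage hook must never observe
# the task before its job ID has been saved. Holding a lock across
# enqueue-and-save guarantees the hook sees either no enqueue at all or a
# fully saved job ID, never the in-between state.
class FakeTask:
    def __init__(self):
        self.job_id = None
        self._lock = threading.Lock()

    def enqueue(self, make_job_id):
        with self._lock:
            # Save the job ID before anyone else can inspect the task.
            self.job_id = make_job_id()

    def hook_read_job_id(self):
        with self._lock:
            # The hook either runs before enqueue or sees the saved ID.
            return self.job_id
```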
By default set the log level to INFO so there's useful information on
failures. Task handling is multithreaded, so add the thread name to help
debugging.
The logs won't be shown if the tests pass, but if they don't the
messages can be invaluable for debugging failures.
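The logging setup described above can be sketched with pytest's standard logging options (the option names are real pytest settings; the exact format string used by the PR is a guess):

```ini
# pytest.ini sketch: INFO-level logs, captured and shown only on failure.
# %(threadName)s identifies which worker thread emitted each message.
[pytest]
log_level = INFO
log_format = %(asctime)s %(threadName)s %(levelname)s %(name)s: %(message)s
```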
Learning Equality obviously has no use for the test suite constantly
telling it about an ephemeral deployment. More importantly, the ping
task can hang and interfere with our own tasks being scheduled.
In order to exercise the collection downloader, we need channel data.
The `create_channeldb_json.py` script and `db.json.template` file are
used to create JSON files representing fake channels. The generated JSON
files contain the data necessary to create sqlite channel databases with
sqlalchemy as well as the content data inlined. 18 fake channels have
been created and will be used in later commits.
The `create_contentdir` function takes the fake channel DB JSON files
and creates a content directory with the channel databases and content
files for testing. That's made available as the `contentdir` pytest
fixture.

A standalone script is provided as a convenience and for exercising the
functionality outside of the test suite.
This runs an HTTP server for the test content so that we can import
channels and content during tests just as if they were being imported
from Studio.
Provide test collections using all the packs in the current
endless-key-collections release. The fake packs have very regular
structures and use the fake channel data.
These exercise all of the API endpoints that don't interact with the
download manager.
This adds a few pytest fixtures for running channel and content import
jobs. Beyond creating a facility and a facility user with appropriate
permissions, there are two fixtures for handling interactions between
Django and SQLAlchemy. Some of this is lifted from Kolibri and/or depends
on Kolibri internals. We'll see how well they hold up over time.
While here, add a few log messages to aid when debugging failures.
@dbnicholson
Member Author

The downloader test was a bit flaky locally. After running it repeatedly with extensive logging, it seems that Kolibri's task worker gets hung occasionally. I couldn't figure out why that would happen, so I added pytest-retry to retry the test when it times out in either the foreground or background downloads. The 3rd commit attempts to handle one potential issue I thought of while debugging.

Collaborator

@dylanmccall left a comment


This is looking good to me, and it works on my system, albeit with some sporadic failures which I'm gathering are more related to problems we need to fix in the code being tested. I do have one suggestion with regards to making this a little less huge, but all seems good otherwise :)

Review thread on kolibri_explore_plugin/test/plugin.py (resolved)
@dbnicholson
Member Author

This is looking good to me, and it works on my system, albeit with some sporadic failures which I'm gathering are more related to problems we need to fix in the code being tested. I do have one suggestion with regards to making this a little less huge, but all seems good otherwise :)

I'm still getting the sporadic failures, too. I thought pytest-retry would fix that, but it actually doesn't do anything useful because it causes an error in the teardown of the failed test. I really wanted to get to the bottom of why Kolibri's worker stalls.

Collaborator

@manuq left a comment


This is impressive! Should we revert the retry as you guys have found that is not preventing the sporadic failures? From my side this is good to go and very welcome, even with those failures.

@dbnicholson
Member Author

This is impressive! Should we revert the retry as you guys have found that is not preventing the sporadic failures? From my side this is good to go and very welcome, even with those failures.

I think so, but let me play with it for a little bit longer. I was using pytest-flaky, but I think I should use pytest-rerunfailures, which has almost the same interface but is maintained by the pytest developers. I think that's more likely to work correctly.

@dbnicholson
Member Author

Yeah, pytest-rerunfailures seems to work correctly. Let me update the PR.

It seems that Kolibri's task worker can hang occasionally, so use
pytest-rerunfailures to mark tests that can be retried on failure.
This is a basic smoketest that the collection downloader can be run with
any collection specified. Unfortunately, Kolibri's task worker seems to
hang sometimes during either the foreground or background downloading.
Use pytest-rerunfailures's `flaky` mark to try again when that happens.

Fixes: #778
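The retry mechanism described above is pytest-rerunfailures' documented `flaky` mark (the rerun count here is an assumption; the test name is taken from the rerun output later in this conversation):

```python
import pytest

# pytest-rerunfailures: rerun this test up to 2 times if it fails, to paper
# over the occasional Kolibri task worker hang described above.
@pytest.mark.flaky(reruns=2)
def test_download_manager_clean():
    ...
```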
@dbnicholson
Member Author

If you're running this locally (@dylanmccall), you should pip uninstall pytest-retry since it provides the same flaky mark and I don't know how pytest will decide which one to use.

@dbnicholson
Copy link
Member Author

Great, the rerun worked this time:

kolibri_explore_plugin/test/test_collectionviews.py::test_download_manager_clean[athlete-0001] RERUN [ 63%]
kolibri_explore_plugin/test/test_collectionviews.py::test_download_manager_clean[athlete-0001] PASSED [ 63%]

I'm going to merge this now.

@dbnicholson merged commit 2a12060 into master on Oct 27, 2023
3 checks passed
@dbnicholson deleted the 778-downloader-tests branch on October 27, 2023 at 15:35
Development

Successfully merging this pull request may close these issues.

Add collection downloader tests
3 participants