feat: deduplicate shared URL downloads across test suites #338
dabrain34 wants to merge 2 commits into fluendo:master
Conversation
Introduce a centralized DownloadManager that ensures each URL is downloaded at most once, eliminating duplicate downloads both across test suites and within a single test suite.

- Add a DownloadManager class in utils.py with download-once caching and centralized archive cleanup.
- Refactor TestSuite.download() to use pre-downloaded archives from the manager across all three download paths.
- Use a thread pool to download concurrently, and make DownloadManager thread-safe so duplicate URLs are still fetched only once.
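A minimal sketch of the download-once idea (simplified and hypothetical; the real class in utils.py also handles checksums and archive cleanup):

```python
import os
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor


class DownloadManager:
    """Download each URL at most once, even under concurrent requests."""

    def __init__(self, out_dir: str, jobs: int = 4):
        self.out_dir = out_dir
        self._lock = threading.Lock()
        self._futures = {}  # url -> Future resolving to the local path
        self._pool = ThreadPoolExecutor(max_workers=jobs)

    def _fetch(self, url: str) -> str:
        os.makedirs(self.out_dir, exist_ok=True)
        local_path = os.path.join(self.out_dir, os.path.basename(url))
        urllib.request.urlretrieve(url, local_path)
        return local_path

    def _schedule(self, url: str):
        # Register each URL exactly once under the lock; later callers
        # get the same Future instead of triggering a second fetch.
        with self._lock:
            if url not in self._futures:
                self._futures[url] = self._pool.submit(self._fetch, url)
            return self._futures[url]

    def download(self, url: str) -> str:
        return self._schedule(url).result()

    def download_all(self, urls):
        # Schedule everything first so unique URLs download in parallel,
        # then wait for all results in order.
        futures = [self._schedule(url) for url in urls]
        return [future.result() for future in futures]
```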
@ylatuya ping

@rsanchez87 can you have a look at this PR as well? The idea is to speed up the build of Docker images containing all the test suites.
```python
)
# When archive_path is provided, the archive was already downloaded
# by the DownloadManager — skip directly to extraction.
if ctx.archive_path and os.path.exists(ctx.archive_path):
```
Shouldn't all the download logic be in the DownloadManager? I would expect _download_single_test_vector and _download_single_archive to use the download manager instead of utils.download, and let the download manager handle all the checks so that those are not duplicated.
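A rough sketch of that delegation (the helper bodies and attribute names here are hypothetical):

```python
# Both helpers hand the actual transfer to the DownloadManager instead
# of calling utils.download directly, so caching and verification live
# in one place.
def _download_single_archive(ctx, download_manager):
    # ctx.source_url and ctx.archive_path are illustrative names.
    ctx.archive_path = download_manager.download(ctx.source_url)


def _download_single_test_vector(tv, download_manager):
    tv.local_path = download_manager.download(tv.source)
```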
| f"Checksum mismatch for source file {os.path.basename(first_tv.source)}: {checksum} " | ||
| f"instead of '{first_tv.source_checksum}'" | ||
| # Verify existing file: clean up corrupt, skip if valid | ||
| skip_download = False |
All of this logic should be handled by the DownloadManager.
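For illustration, the verify-or-clean-up step could live inside the manager roughly like this (SHA-256 is an assumption; the actual checksum algorithm may differ):

```python
import hashlib
import os


def _is_valid_cached_file(path: str, expected_checksum: str) -> bool:
    # Reuse a previously downloaded file only if it hashes correctly;
    # remove a corrupt leftover so the manager fetches it again.
    if not os.path.exists(path):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() == expected_checksum:
        return True
    os.remove(path)
    return False
```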
```python
extract_all: bool = False,
keep_file: bool = False,
retries: int = 2,
download_manager: Optional["utils.DownloadManager"] = None,
```
I would not make the DownloadManager optional; it's simpler to maintain.
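In other words, something along these lines (a sketch; other parameters elided):

```python
# The manager becomes a required keyword argument, so the body needs
# no "manager is None" fallback path.
def download(
    url: str,
    extract_all: bool = False,
    keep_file: bool = False,
    retries: int = 2,
    *,
    download_manager: "utils.DownloadManager",
) -> None:
    ...
```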
```python
return (url, local_path)

max_workers = max(1, min(jobs, len(unique_source_list)))
with ThreadPoolExecutor(max_workers=max_workers) as dl_pool:
```
This should be handled by the DownloadManager: the API should support a list of URLs to download and let the DownloadManager handle all the parallel downloads.
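With a batch entry point like the download_all sketched earlier, the caller side would shrink to something like (illustrative):

```python
# The thread-pool plumbing moves into the manager; the caller just
# hands over the whole list (download_all is a hypothetical name).
paths = download_manager.download_all(unique_source_list)
```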
We use Pool from multiprocessing.
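For reference, the same fan-out with multiprocessing.Pool would look roughly like this (download_one is a hypothetical module-level worker, since Pool requires picklable callables; jobs and unique_source_list are the variables from the snippet above):

```python
import os
import urllib.request
from multiprocessing import Pool


def download_one(url):
    # Fetch one URL into the current directory and report the result.
    local_path = os.path.basename(url)
    urllib.request.urlretrieve(url, local_path)
    return (url, local_path)


max_workers = max(1, min(jobs, len(unique_source_list)))
with Pool(processes=max_workers) as dl_pool:
    results = dl_pool.map(download_one, unique_source_list)
```

Threads are usually enough for I/O-bound downloads, but processes sidestep the GIL if checksumming large archives dominates.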
```
@@ -328,7 +400,16 @@ def _callback_error(err: Any) -> None:

downloads = []
for tv in self.test_vectors.values():
```
This should go away if we are using the DownloadManager.
@dabrain34, tested ✔️ (regression tests as well ✔️). I'll test again once the requested changes by @ylatuya are implemented. Thanks!
|
Thanks for the test. Indeed, this is even better on slow connections, since we no longer re-download the AV1 zip file every time. I'm currently addressing the comments from @ylatuya; when this is ready I'll come back to you.
Introduce a centralized DownloadManager that ensures each URL is downloaded at most once, eliminating duplicate downloads both across test suites and within a single test suite.
This considerably speeds up the download of the AV1-ARGON* test suites, which previously re-downloaded the 6 GB archive for every test vector.
Fix #309