Fix flaky ModelManagerSuggestedTests: skip launch-time HF fetch under xctest#16
Draft
mimeding wants to merge 1 commit into
Draft
Fix flaky ModelManagerSuggestedTests: skip launch-time HF fetch under xctest#16mimeding wants to merge 1 commit into
mimeding wants to merge 1 commit into
Conversation
ModelManager.init kicks off an unstructured Task that calls loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization listing from Hugging Face and feeds the result through applyOsaurusOrgFetch. The unit-test runner repeatedly constructs ModelManager() to drive applyOsaurusOrgFetch directly. The background launch-time fetch races with those test calls — whichever finishes last wins, and the merge result is non-deterministic. That's the root cause of the flaky ModelManagerSuggestedTests failures seen across many of the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetched OnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.). Gate the launch-time fetch on a small isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process; production app launches still get the HF fetch exactly as before. This is a network call, so removing it under tests also has the side benefit of making the test suite work offline / on hermetic CI runners. Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Why this matters (business)
ModelManagerSuggestedTestshas been flaking in CI for a while. Looking at the recent CI runs across the new audit PRs (#2, #6, #7, #13, #14, #15) the same handful of test methods fail intermittently:applyOsaurusOrgFetch_dropsStaleAutoFetchedOnReapplyapplyOsaurusOrgFetch_addsNewEntriesAfterCuratedWhen CI is flaky, every red PR triggers a "is this me?" investigation, retry loops fill the queue, and people lose trust in the signal. The fix is one line of intent (gate a launch-time side effect on the test environment) and removes a per-run network call from the test suite.
What's wrong (technical)
ModelManager.init()kicks off an unstructuredTaskthat fetches the OsaurusAI organization listing from Hugging Face and merges the result intosuggestedModels:// Pull the OsaurusAI HF org listing once on launch so newly published // models surface in the Recommended tab without requiring a code push. Task { [weak self] in await self?.loadOsaurusAIOrgModels() }loadOsaurusAIOrgModels()eventually calls back into the sameapplyOsaurusOrgFetchthe tests are exercising directly:Test flow:
let manager = await MainActor.run { ModelManager() }— init schedules the background HF fetch.applyOsaurusOrgFetch(autoFetched: [stale]), thenapplyOsaurusOrgFetch(autoFetched: [kept]).loadOsaurusAIOrgModels()resolves and callsapplyOsaurusOrgFetch(autoFetched: <HF results>).suggestedModelslands last wins. If the HF task lands after the test'skeptcall, the test reads back HF data instead ofkeptand the#expect(after.contains { $0.id == kept.id })assertion fails.This is exactly the failure pattern the xcresult bundles show across the affected PRs.
Fix
Gate the launch-time fetch on a tiny
isRunningInTestEnvironmenthelper that checks for any ofXCTestConfigurationFilePath,XCTestBundlePath, orXCTestSessionIdentifierin the process environment. Those variables are only present inside an xctest host process. The production app launch path is byte-identical to before; only the test host skips the network fetch.Side benefit: removing a network call from the test suite means the suite works offline / on hermetic runners without depending on huggingface.co being reachable.
Verifying I'm reading the failure right
The xcresult bundles for the affected PR runs all show one of the two same-suite tests as the only failure, with messages like:
Same suite, no other changes between runs that pass and runs that fail in that suite, no relationship to the source changes on the affected PRs (which touched HTTP handler, ToolSearchService, Insights, plugin auth, etc. — nothing in
ModelManager).Changes
Test Plan
ModelManagerSuggestedTestsshould be green every time, on every PR, regardless of network reachability.XCTestConfigurationFilePathisn't set in a regular app process).Checklist
CONTRIBUTING.mdswiftc -frontend -parse; the real verification is the next batch of CI runs being green on this suite)