lukekarrys
Contributor

@lukekarrys lukekarrys commented Apr 24, 2024

Arborist CI has started failing on macos-latest now that those runners default to arm64 machines (aka Apple Silicon). I am able to reproduce the failures locally on a MacBook Pro M1.

After spending some time debugging the issue I believe it has to do with the timing of Node vs Link creation. I was able to bisect and find #5376 which removed the ability for nodes to possibly take longer to create than their link targets.

Going back to the commit before that PR, the flaky test passes locally for me; it starts failing with the first commit in that PR.

I'm just running the offending test in a loop and seeing if it fails, so it's not a perfect metric. But on the commits where it fails, it fails on at least 10% of runs. On the old commit I was able to run it 50x with no failures. Here's what I was running locally to observe failures:

COUNT="0"

while true; do
  COUNT=$((COUNT+1))
  echo "Start $COUNT"
  if ! npm test -w workspaces/arborist --ignore-scripts -- test/arborist/load-actual.js --no-coverage -Rtap --grep selflink; then
    echo "Failed on run $COUNT"
    exit 1
  fi
done

This is definitely an edge case, but one I would like to fix in the future. Disabling this test is a temporary measure to get CI green while we release and make more substantial changes that are hard to do with CI flaking.

We've had other issues with symlinks and I would feel much better knowing we have defined behavior in this specific case when tracking down future potential symlink bugs.

One fix that worked locally is iterating over node.target.children sequentially instead of in Promise.all, but that is probably only a side effect of the dep ordering in the test. A fix will have to account for any order of links and nodes taking different amounts of time.
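
To illustrate the difference between the two iteration styles mentioned above, here is a minimal sketch. It is not Arborist code: `loadChild`, `children`, and the delays are hypothetical stand-ins for loading child nodes that take uneven amounts of time.

```javascript
// Hypothetical stand-in for loading one child; the delay simulates uneven I/O.
const loadChild = async (name, delayMs, log) => {
  log.push(`start ${name}`)
  await new Promise((resolve) => setTimeout(resolve, delayMs))
  log.push(`end ${name}`)
}

// Hypothetical children: [name, simulated load time in ms].
const children = [['a', 20], ['b', 5]]

// Concurrent: every child starts before any of them finishes, so the
// completion order depends on how long each one takes.
const loadConcurrently = async () => {
  const log = []
  await Promise.all(children.map(([name, ms]) => loadChild(name, ms, log)))
  return log
}

// Sequential: each child finishes before the next one starts, so the
// order is fixed by iteration order alone.
const loadSequentially = async () => {
  const log = []
  for (const [name, ms] of children) {
    await loadChild(name, ms, log)
  }
  return log
}
```

The sequential version removes the interleaving, but it only hides an ordering bug if the problematic pair happens to be iterated in a safe order, which is why it is unlikely to be a general fix.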

@lukekarrys lukekarrys force-pushed the lk/arborist-selflink-test branch from 4edf804 to c1152e9 Compare April 24, 2024 16:47
@lukekarrys lukekarrys changed the title from "chore: disable selflink test" to "chore: disable selflink test on apple silicon" Apr 24, 2024
@wraithgar
Member

Is this really just test flakiness or does this represent a real bug/race condition that we have for folks?

@lukekarrys
Contributor Author

I think there is a real bug here. The only thing I'm unsure of is how much of an edge case this is. The flaky test uses a symlink back to itself, which I consider more of an edge case than if all symlinks could hit this race condition.

@liamcmitchell
Contributor

Looks like this was handled before 2db6c08

@liamcmitchell
Contributor

liamcmitchell commented Oct 5, 2025

This is a race between loading the link (node_modules/@scope/z/node_modules/glob) and its target node (node_modules/foo/node_modules/glob). On failure I see the following sequence:

  • node: loadFSNode(), no cached, await PackageJson.normalize(real)
  • link: loadFSNode(), no cached, await PackageJson.normalize(real) (reading same file in parallel)
  • link: await newLink(), target not found, a default target is created with no parent, cache.set(realpath, link.target), await loadFSTree(link.target)
  • node: newNode(), cache.set(path, node), overwriting the link target above which is still loading

So glob children are loaded, but into a duplicate node with no parent.

Ideally solved by caching node promises.
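
As a rough sketch of what caching node promises could look like (an assumption about the general shape, not Arborist's actual implementation): `cache`, `createNode`, `loadNode`, and `demo` below are hypothetical names, and the timer stands in for the awaited package.json read.

```javascript
// Hypothetical sketch: none of these names are Arborist internals.
const cache = new Map() // realpath -> Promise<node>

let created = 0

// Stand-in for the slow work (e.g. reading and normalizing package.json).
const createNode = async (path) => {
  await new Promise((resolve) => setTimeout(resolve, 10))
  created++
  return { path, children: new Map() }
}

// Store the promise synchronously, before any await, so a second caller
// that arrives while the first load is in flight reuses the same promise
// instead of creating (and later overwriting) a duplicate node.
const loadNode = (path) => {
  let pending = cache.get(path)
  if (!pending) {
    pending = createNode(path)
    cache.set(path, pending)
  }
  return pending
}

const demo = async () => {
  const real = 'node_modules/foo/node_modules/glob'
  // The node load and the link-target load start concurrently.
  const [fromNode, fromLink] = await Promise.all([loadNode(real), loadNode(real)])
  return { same: fromNode === fromLink, created }
}
```

Because the cache entry is written before the first await, there is no window in which two loaders both see "no cached" for the same realpath. A real fix would also need to decide what to do with a cached promise that rejects.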
