Fix for race condition loading management.cattle.io.cluster schema #5319

nwmac · 2022-03-08T08:31:00Z

This PR fixes the issue reported in both the above issues - there is a rare race condition that leads to the fail-whale page being displayed:

Tracing through the code, the problem is that middleware/authenticated.js on line 265, we dispatch calls to load the management and cluster stores in parallel. The management store will load all schemas (see store/index.js line 538). The cluster store assumes that the schema for the management cluster is avaialable (see store.js line 677).

There is a race condition where if the schemas are not loaded, the call to load the cluster will fail and you'll get bumped to the fail-whale page.

This PR fixes this by adding a wait into the loadCluster code - we don't want to load the schemas in multiple places - we assume that the schemas are in the process of being loaded and wait up to 15 seconds for them - after 10 seconds we log a message to the browser console and after 15 if we still don't have the schemas, we request them synchronously (this should not be needed, but is a fail-safe).

This is hard to test, but from the description above you should be able to understand how this can happen.

Developers can test by:

Commenting out line 673 in store/index.js - this essentially removes the fix in this PR. The line to comment out is:

    await waitForMgmtClusterSchemaAvailable({ getters, dispatch });

Now, change line 538 in store/index.js from:

      mgmtSchemas:    dispatch('management/loadSchemas', true),

to:

      mgmtSchemas:    (new Promise(resolve => setTimeout(resolve, 5000))).then(() => dispatch('management/loadSchemas', true)),

This will add a 5 second delay to the loading of schemas.

In the UI, refresh to the page for the local cluster, i.e. https://127.0.0.1:8005/c/local/explorer

You should see the fail-whale page.

Now uncomment line 673 in store/index.js to put back in the fix from this PR and try again and everything should load correctly.

richard-cox

I knew this rang a bell, I hit something similar before and fixed it via store/index.js ln 708

    const fetchProjects = async() => {
      let limit = 30000;
      const sleep = 100;

      while ( limit > 0 && !state.managementReady ) {
        await setTimeout(() => {}, sleep);
        limit -= sleep;
      }

      if ( getters['management/schemaFor'](MANAGEMENT.PROJECT) ) {
        return dispatch('management/findAll', projectArgs);
      }
    };

(#4650)

The state.managementReady flag is set once loadManegement completes.

I don't think we should have two approaches doing roughly the same thing in the same function. We could do one of....

change fetchProjects to use a more generic waitForMgmtClusterSchemaAvailable,
waitForMgmtClusterSchemaAvailable could block on managementReady and the fetchProjects removed (projects could just be fetched)

Pluses and minuses to both, am I'm happy to implement either?

nwmac · 2022-03-08T16:32:54Z

@richard-cox I think we should add waitForSchema to the store then - it can take the type to wait for and maybe an optional timeout (or we just choose 30 seconds, say).

It could return true or false to indicate if the schema was available - in the case of the fix in this PR, the code could then load all of the schemas if they were still unavailable at the end of the wait.

nwmac · 2022-03-08T16:33:11Z

@richard-cox can you do that refactor?

- This was going to apply to `fetchProjects` as well, however I realised it covers a different use case (the schema may not exist and that's fine) - The generic `<store>/waitForSchema` action might come in handy for other products (harvesters `loadVirtual` might have a similar issue, i think Epinio also).

Fix for race condition loading management.cattle.io.cluster schema

76e20be

nwmac requested a review from richard-cox March 8, 2022 08:31

nwmac self-assigned this Mar 8, 2022

nwmac added this to the v2.6.4 milestone Mar 8, 2022

richard-cox requested changes Mar 8, 2022

View reviewed changes

richard-cox approved these changes Mar 9, 2022

View reviewed changes

nwmac merged commit 7816ca4 into rancher:master Mar 9, 2022

This was referenced Apr 26, 2022

fix fetch management.cattle.io.cluster race error harvester/dashboard#303

Merged

fix fetch management.cattle.io.cluster race error #5715

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix for race condition loading management.cattle.io.cluster schema #5319

Fix for race condition loading management.cattle.io.cluster schema #5319

nwmac commented Mar 8, 2022 •

edited by zube bot

Loading

richard-cox left a comment

nwmac commented Mar 8, 2022

nwmac commented Mar 8, 2022

Fix for race condition loading management.cattle.io.cluster schema #5319

Fix for race condition loading management.cattle.io.cluster schema #5319

Conversation

nwmac commented Mar 8, 2022 • edited by zube bot Loading

richard-cox left a comment

Choose a reason for hiding this comment

nwmac commented Mar 8, 2022

nwmac commented Mar 8, 2022

nwmac commented Mar 8, 2022 •

edited by zube bot

Loading