Sudden interruption of @nx/next:build target on 18.0.5 #21895

Closed
2 of 4 tasks
khludenevav opened this issue Feb 20, 2024 · 6 comments


khludenevav commented Feb 20, 2024

Current Behavior

I'm using Nx with a custom task runner whose only job is to upload cache artifacts from .nx/cache to GCP.
We use Buildkite for CI. Months ago I noticed that some builds were invalid: when later, similar builds downloaded the cached artifacts, the artifacts were incomplete (they did not contain the expected site dist and were much smaller).
A couple of months later I figured out that this was caused by the cancel-running-intermediate-builds feature.

I build the site with Next.js through the @nx/next:build plugin. According to the logs, the Next.js build gets interrupted, but GCSRemoteCache.store is still called, and it uploads the incomplete artifacts.

The logs also suggest that even when a parent task fails, Nx still launches the dependent task, which seems very wrong.
image

image

I am probably implementing the task runner incorrectly. Could you point me to the proper way to handle termination?

Content of nx.json
"tasksRunnerOptions": {
    "default": {
      "runner": "./libs/nx-task-runner",
      "options": {
        "captureStderr": true
      }
    }
  },


...
    "build-nextjs-app": {
      "inputs": [
        "sharedGlobals",
        "globalStyles",
        "prjFiles",
        "prjExcludeSpecs",
        "prjExcludeStories",
        "^prjFiles",
        "^prjExcludeSpecs",
        "^prjExcludeStories",
        {
          "dependentTasksOutputFiles": "apps/*/generated/**/*",
          "transitive": true
        }
      ],
      "cache": true
    },
    "move-nextjs-assets": {
      "executor": "nx-plugin:move-nextjs-assets",
      "options": {
        "appDistRoot": "{workspaceRoot}/dist/{projectRoot}/dist"
      }
    },
Content of apps/dashboard/project.json
    "build-nextjs-app": {
      "executor": "@nx/next:build",
      "dependsOn": ["generate-redirects", "generate-spotlight", "^build"],
      "outputs": ["{options.outputPath}"],
      "defaultConfiguration": "production",
      "options": {
        "outputPath": "dist/{projectRoot}"
      }
    },
    "move-nextjs-assets": {
      "dependsOn": ["build-nextjs-app"],
      "options": {
        "assetsPath": "static-assets-dashboard"
      }
    },
Content of libs/nx-task-runner/lib/index.js
const { defaultTasksRunner } = require('@nx/workspace/src/tasks-runner/default-tasks-runner');
const { GCSRemoteCache } = require('./cache');
const { logger } = require('./logger');

const tasksRunner = (tasks, options, context) => {
  if (process.env.NX_REMOTE_CACHE_BUCKET) {
    logger.log('Using Google Cloud Storage remote cache.');

    options.remoteCache = new GCSRemoteCache(process.env.NX_REMOTE_CACHE_BUCKET);
  } else {
    logger.warn(
      'Missing NX_REMOTE_CACHE_BUCKET environment variable, skipping Google Cloud cache.',
    );
  }

  return defaultTasksRunner(tasks, options, context);
};
module.exports = tasksRunner;

Content of libs/nx-task-runner/lib/cache.js
const { promises: fs, statSync } = require('fs');
const path = require('path');
const { Storage } = require('@google-cloud/storage');
const { create: tarCreate, extract: tarExtract } = require('tar');
const { withFile: withTemporaryFile } = require('tmp-promise');
const { logger } = require('./logger');

async function logFileInfo(action, remoteFileName, tmpFile, gcsFile) {
  const metadataResponse = await gcsFile.getMetadata();
  const localTarSizeInBytes = statSync(tmpFile.path).size;
  const remoteTarSizeInBytes = parseInt(metadataResponse[0].size, 10);
  console.log(
    [
      `Successfully ${action} ${remoteFileName}`,
      `  Local size: ${localTarSizeInBytes} bytes`,
      `  Gcp metadata:`,
      `    Size: ${remoteTarSizeInBytes} bytes`,
      `    TimeCreated: ${metadataResponse[0].timeCreated}`,
    ].join('\n'),
  );
}

function getBrokenRemoteFileName(appName, hash) {
  return `broken_${appName}_${hash}.tar.gz`;
}
function getBrokenDebugRemoteFileName(appName, hash) {
  return `broken_${appName}_debug_${hash}.tar.gz`;
}

function getRemoteFileName(hash) {
  return `${hash}.tar.gz`;
}

function getAppName(terminalOutput) {
  // The output comes from dashboard:build:production
  if (/client\/apps\/dashboard\/\.env\.production/.test(terminalOutput)) {
    return 'dashboard';
  }
  if (/client\/apps\/docs\/\.env\.production/.test(terminalOutput)) {
    return 'docs';
  }
  if (/client\/apps\/maintenance\/\.env\.production/.test(terminalOutput)) {
    return 'maintenance';
  }
  return null;
}

class GCSRemoteCache {
  bucket;

  constructor(bucketName) {
    const storage = new Storage();
    this.bucket = storage.bucket(bucketName);
  }

  async retrieve(hash, cacheDirectory) {
    const remoteFileName = getRemoteFileName(hash);
    const file = this.bucket.file(remoteFileName);
    try {
      const [exists] = await file.exists();
      if (!exists) {
        // Cache miss: there is nothing to download for this hash.
        return false;
      }
    } catch (err) {
      logger.warn(
        `Failed to check if the file already exists in the Google Cloud Storage bucket (error below). Ignoring.`,
      );
      console.error(err);
      return false;
    }
    return withTemporaryFile(async tmpFile => {
      await file.download({
        destination: tmpFile.path,
      });

      await fs.mkdir(cacheDirectory, {
        recursive: true,
      });

      await tarExtract({
        file: tmpFile.path,
        cwd: cacheDirectory,
      });

      await logFileInfo('downloaded', remoteFileName, tmpFile, file);

      return true;
    }).catch(err => {
      logger.warn(
        'Failed to retrieve Nx cache from Google Cloud Storage bucket (error below). Ignoring.',
      );
      console.error(err);
      return false;
    });
  }

  async store(hash, cacheDirectory) {
    const remoteFileName = getRemoteFileName(hash);
    const file = this.bucket.file(remoteFileName);

    try {
      const [exists] = await file.exists();
      if (exists) {
        return true;
      }
    } catch (err) {
      logger.warn(
        `Failed to check if the file already exists in the Google Cloud Storage bucket (error below). Ignoring.`,
      );
      console.error(err);
      return false;
    }

    return withTemporaryFile(async tmpFile => {
      await tarCreate(
        {
          gzip: true,
          file: tmpFile.path,
          cwd: cacheDirectory,
        },
        [hash, `${hash}.commit`],
      );

      try {
        const terminalOutput = await fs.readFile(
          path.join(cacheDirectory, hash, 'terminalOutput'),
          {
            encoding: 'utf-8',
          },
        );

        const appName = getAppName(terminalOutput);
        if (appName) {
          // Check that the tar size is plausible.
          // If it is not, push the data to GCS under a different name together with debug info.
          const tarSizeInMb = statSync(tmpFile.path).size / 1024 / 1024;
          // The app archive should be 11.5-12.5 MB. Anything under 4 MB means the build is incomplete.
          if (tarSizeInMb < 4) {
            await this.bucket.upload(tmpFile.path, {
              destination: getBrokenRemoteFileName(appName, hash),
            });
            const debugInfo = {
              time: new Date().toISOString(),
              buildkiteBranch: process.env.BUILDKITE_BRANCH,
              buildkiteBuildId: process.env.BUILDKITE_BUILD_ID,
              buildkiteJobId: process.env.BUILDKITE_JOB_ID,
              buildkiteCommit: process.env.BUILDKITE_COMMIT,
              buildkiteCommand: process.env.BUILDKITE_COMMAND,
              buildkitePullRequest: process.env.BUILDKITE_PULL_REQUEST,
              buildkitePipelineName: process.env.BUILDKITE_PIPELINE_NAME,
              buildkitePipelineSlug: process.env.BUILDKITE_PIPELINE_SLUG,
              buildkiteStepId: process.env.BUILDKITE_STEP_ID,
              buildkiteStepKey: process.env.BUILDKITE_STEP_KEY,
              buildkiteTag: process.env.BUILDKITE_TAG,
              buildkiteTriggeredFromBuildId: process.env.BUILDKITE_TRIGGERED_FROM_BUILD_ID,
            };
            console.log('Writing and uploading debug file...');
            await withTemporaryFile(async tmpDebugFile => {
              await fs.writeFile(tmpDebugFile.path, JSON.stringify(debugInfo, null, 2), {
                encoding: 'utf-8',
              });
              await this.bucket.upload(tmpDebugFile.path, {
                destination: getBrokenDebugRemoteFileName(appName, hash),
              });
              console.log(
                `Debug file uploaded to GCS: ${getBrokenDebugRemoteFileName(appName, hash)}`,
              );
            });

            return false;
          }
        }
      } catch (e) {
        console.log(e);
        return false;
      }

      await this.bucket.upload(tmpFile.path, {
        destination: remoteFileName,
      });

      const file = this.bucket.file(remoteFileName);
      await logFileInfo('uploaded', remoteFileName, tmpFile, file);

      return true;
    }).catch(err => {
      logger.warn(
        'Failed to store Nx cache in Google Cloud Storage bucket (error below). Ignoring.',
      );
      console.error(err);
      return false;
    });
  }
}

module.exports = {
  GCSRemoteCache,
};
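
The size check inside `store` can be pulled out into a small pure function, which makes the decision easier to test in isolation. The following is a minimal sketch, not the code from the issue: `chooseUploadTarget` and its `minSizeInMb` parameter are hypothetical names, and the 4 MB default simply mirrors the threshold used in `store` above.

```javascript
const BYTES_PER_MB = 1024 * 1024;

// Hypothetical helper: decides under which name a tarball should be uploaded
// and whether it may be treated as a valid cache entry.
// appName is null when the task is not one of the known app builds.
function chooseUploadTarget(hash, appName, tarSizeInBytes, minSizeInMb = 4) {
  const tooSmall = appName !== null && tarSizeInBytes / BYTES_PER_MB < minSizeInMb;
  if (tooSmall) {
    // Same naming scheme as getBrokenRemoteFileName in cache.js.
    return { cacheable: false, destination: `broken_${appName}_${hash}.tar.gz` };
  }
  // Same naming scheme as getRemoteFileName in cache.js.
  return { cacheable: true, destination: `${hash}.tar.gz` };
}
```

With this shape, `store` would upload to `destination` and return `cacheable`, so a suspiciously small archive never lands under the plain hash name.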

Expected Behavior

GCSRemoteCache.store is not called when the plugin build is cancelled.
A target is not launched when a task it depends on has failed.

GitHub Repo

No response

Steps to Reproduce

To reproduce, it seems you need to run nx affected -t build -c production in Buildkite CI and then interrupt the build while the @nx/next:build plugin is running. You also need a custom task runner.

Nx Report

>  NX   Report complete - copy this into the issue template

   Node   : 20.10.0
   OS     : darwin-arm64
   pnpm   : 8.15.3
   
   nx (global)        : 18.0.4
   nx                 : 18.0.4
   @nx/js             : 18.0.4
   @nx/jest           : 18.0.4
   @nx/eslint         : 18.0.4
   @nx/workspace      : 18.0.4
   @nx/devkit         : 18.0.4
   @nx/eslint-plugin  : 18.0.4
   @nx/next           : 18.0.4
   @nx/plugin         : 18.0.4
   @nx/react          : 18.0.4
   @nx/storybook      : 18.0.4
   @nx/web            : 18.0.4
   typescript         : 5.3.3
   ---------------------------------------
   Local workspace plugins:
         nx-plugin

Failure Logs

No response

Package Manager Version

No response

Operating System

  • macOS
  • Linux
  • Windows
  • Other (Please specify)

Additional Information

No response

@AgentEnder AgentEnder added the scope: misc Misc issues label Feb 22, 2024

khludenevav commented May 3, 2024

Well, it seems that listening for signals on the process and short-circuiting store and retrieve solves the problem. But I still consider this a bug: why does Nx (or the Next.js plugin), knowing that it received SIGTERM, still try to write a corrupted cache?

async store(hash, cacheDirectory) {
  if (this.stopped) {
    return false;
  }
  ...
}

subscribeToSignals() {
  // Usually generated with Ctrl+C.
  process.once('SIGINT', () => {
    this.stopped = true;
  });
  // Sent to request a graceful shutdown of the app.
  process.once('SIGTERM', () => {
    this.stopped = true;
  });
  // Generated on Windows when the console window is closed.
  process.once('SIGHUP', () => {
    this.stopped = true;
  });
}
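
The same idea can be written as a wrapper, so that the stop flag lives outside the cache class. This is a minimal sketch under assumptions: `createStopFlag` and `guardRemoteCache` are hypothetical helpers, and the only thing assumed about the remote cache is that it exposes the `store` and `retrieve` methods shown earlier in this issue.

```javascript
// Hypothetical helper: records whether a termination signal was received.
// The emitter defaults to `process`, but any EventEmitter works, which helps in tests.
// Note: registering a listener for SIGINT or SIGTERM on `process` replaces
// Node's default behaviour of exiting on that signal.
function createStopFlag(emitter = process, signals = ['SIGINT', 'SIGTERM', 'SIGHUP']) {
  const state = { stopped: false, signal: null };
  for (const signal of signals) {
    emitter.once(signal, () => {
      state.stopped = true;
      state.signal = state.signal ?? signal; // keep the first signal seen
    });
  }
  return state;
}

// Hypothetical wrapper: store and retrieve return false without touching
// the underlying cache once the flag is set.
function guardRemoteCache(remoteCache, stopFlag) {
  return {
    async retrieve(hash, cacheDirectory) {
      if (stopFlag.stopped) return false;
      return remoteCache.retrieve(hash, cacheDirectory);
    },
    async store(hash, cacheDirectory) {
      if (stopFlag.stopped) return false;
      return remoteCache.store(hash, cacheDirectory);
    },
  };
}
```

In the task runner this could be used as `options.remoteCache = guardRemoteCache(new GCSRemoteCache(bucket), createStopFlag())`. It only helps when a signal actually reaches the process.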


khludenevav commented May 17, 2024

Well, the code above doesn't solve the problem. We are still experiencing sudden interruptions of the Next.js build through the plugin (no signals are sent).
The screenshot should show many more output lines:
image

Correct one:
image

The target config is:

"build-nextjs-app": {
      "inputs": [
        "sharedGlobals",
        "globalStyles",
        "prjFiles",
        "prjExcludeSpecs",
        "prjExcludeStories",
        "^prjFiles",
        "^prjExcludeSpecs",
        "^prjExcludeStories",
        {
          "dependentTasksOutputFiles": "apps/*/generated/**/*",
          "transitive": true
        }
      ],
      "cache": true,

      "executor": "@nx/next:build",
      "dependsOn": ["generate-redirects", "generate-spotlight", "^build"],
      "outputs": ["{options.outputPath}", "{projectRoot}/.next/routes-manifest.json"],
      "defaultConfiguration": "production",
      "options": {
        "outputPath": "dist/{projectRoot}"
      },
      "configurations": {
        "development": {},
        "production": {}
      }
    },

I hope Nx or the Nx Next.js plugin has already fixed this in later releases.

I'm going to wait for the fix in #23496, then update to v19 and check again.

@khludenevav khludenevav changed the title Nx loads invalid cache artifacts to GCP cloud using custom task-runner when buildkite build interrupted Sudden interruption of @nx/next:build target on 18.0.5 May 17, 2024
khludenevav commented

It seems this bug is related to issue #23013.
I definitely need to update.


khludenevav commented Jun 17, 2024

Closing the issue. We have a huge code base that used to run on Next.js + react-router, and we decided to migrate to the Next.js pages router. At some point Next.js started to fail silently on build while returning exit code zero. We reduced the load on Next.js by removing the transpilePackages option, which was required to transpile an external design-system package, and the build now seems stable and 25% faster. We are going to split the application into a couple of smaller ones.
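
Since the build could exit with code zero while producing incomplete output, one hedge is to check for expected files before treating the result as valid. This is a minimal sketch under assumptions: `findMissingOutputs` is a hypothetical helper, and the list of files to check is up to the caller (the routes-manifest.json path listed in the target's outputs earlier in this thread is one candidate).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: returns the expected files (relative to rootDir)
// that do not exist on disk. An empty array means nothing is missing.
function findMissingOutputs(rootDir, expectedFiles) {
  return expectedFiles.filter(relativePath => !fs.existsSync(path.join(rootDir, relativePath)));
}
```

A CI step or the `store` method could call this and refuse to upload the cache entry when the returned list is not empty.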

khludenevav commented

The issue with Next.js failing when using its router was fixed by switching to a parallel series hook in webpack.
image
To force Next.js to use the local webpack instead of its bundled copy, set the env variable NEXT_PRIVATE_LOCAL_WEBPACK=1 before calling next build.
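
One way to set the variable from a Node script is to pass it in the child process environment. This is a minimal sketch: `buildNextEnv` and `runNextBuild` are hypothetical helpers, and the default `npx next build` command is an assumption about how the build is launched (with Nx, the variable would instead be set before running the nx command).

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: copies the given environment and adds the flag
// mentioned above.
function buildNextEnv(baseEnv = process.env) {
  return { ...baseEnv, NEXT_PRIVATE_LOCAL_WEBPACK: '1' };
}

// Hypothetical helper: runs the build command in appDir and resolves with the
// exit code and signal. The command and args are parameters so that a
// different launcher can be substituted.
function runNextBuild(appDir, command = 'npx', args = ['next', 'build']) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, {
      cwd: appDir,
      env: buildNextEnv(),
      stdio: 'inherit',
    });
    child.on('error', reject);
    child.on('close', (code, signal) => resolve({ code, signal }));
  });
}
```

Resolving with both `code` and `signal` makes it possible to tell a killed build (`code` is null and `signal` is set) from one that exited on its own.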


github-actions bot commented Aug 9, 2024

This issue has been closed for more than 30 days. If this issue is still occurring, please open a new issue with more recent context.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 9, 2024