Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Injected dependencies are not recopied after a workspace project is rebuilt #4407

Open
octogonz opened this issue Mar 4, 2022 · 17 comments
Open

Comments

@octogonz
Copy link
Member

octogonz commented Mar 4, 2022

pnpm version: 6.32.2

The real world scenario at my work involves internal company NPM packages with a relationship like this:

A --depends-on--> B --peer-depends-on--> C

...where A and C are in the monorepo, but B is developed in a separate repo and gets installed from an Artifactory private registry. This is a common situation at a big company if most teams work in a monorepo, but we share some packages with other teams who have their own repo.

Code to reproduce the issue:

This is an isolated minimal repro. I substituted @react-navigation/core to avoid involving private packages.

  1. Clone this repository: https://github.com/octogonz/pnpm-issue-4407
  2. pnpm install

In this repro:

  • test-project depends on @react-navigation/core
  • @react-navigation/core has a peer dependency on "react": "*"
  • test-project also depends on "react": "workspace:*"
  • The local workspace project react has version 16.0.0 which should satisfy the peer dependency

Actual behavior:

Progress: resolved 10, reused 10, downloaded 0, added 10, done
 WARN  Issues with peer dependencies found
test-project
└─┬ @react-navigation/core
  └── ✕ missing peer react@"*"
Peer dependencies that should be installed:
  react@"*"

Expected behavior:

The installation should succeed. PNPM should satisfy react@* by creating a symlink to the local workspace package.

Additional information:

  • node -v prints: v14.17.4
  • Windows, macOS, or Linux?: Windows
@octogonz
Copy link
Member Author

octogonz commented Mar 4, 2022

More weirdness

After completing steps 1 and 2 above, try this:

  1. rm pnpm-lock.yaml
  2. rm -Rf node_modules (in the repository root, but not under the project folders)
  3. pnpm install

Actual behavior:

This time, the installation completes without any errors/warnings:

Scope: all 2 workspace projects
Packages: +10
++++++++++
Packages are hard linked from the content-addressable store to the virtual store.
  Content-addressable store is at: D:\.pnpm-store\v3
  Virtual store is at:             node_modules/.pnpm
Progress: resolved 10, reused 10, downloaded 0, added 10, done

Expected behavior:

If step 2 printed an error, then so should step 5. I would expect that deleting pnpm-lock.yaml is sufficient to reset the installation. (?)

The error does not reappear unless you delete BOTH pnpm-lock.yaml AND test-project/node_modules/**

@octogonz
Copy link
Member Author

octogonz commented Mar 4, 2022

We were able to work around this problem by using the dependenciesMeta.*.injected feature like this:

{
  "name": "test-project",
  "version": "0.0.0",
  "dependencies": {
    "@react-navigation/core": "^5.15.0",
    "react": "workspace:*"
  },
  "dependenciesMeta": {
    "react": { "inject": true }
  }
}

However this does not seem like a correct solution. The documentation for that feature sounds like it is written for a different problem and I didn't understand what is doing technically-- is it actually correctly satisfying the versions?

And practically speaking, it would be awkward to ask every developer to remember to do this for every peer dependency. If it's a correct solution, why can't PNPM do this automatically so that the installation behavior matches people's intuitive understanding of how you satisfy a peer dependency?

@zkochan
Copy link
Member

zkochan commented Mar 5, 2022

Yes, it is correct to use the injected feature in this case.

@octogonz
Copy link
Member Author

octogonz commented Mar 5, 2022

Hmmm... this worked in my test project (from the repro steps), but it's not working in our real repo.

The pnpm-issue-4407/test-project/node_modules/react folder is symlink pointing to the physical pnpm-issue-4407/react/ folder. 👍

Whereas when I use the real NPM packages, test-project\node_modules\@hbo\hbomax-core is a symlink pointing to C:\MyRepo\node_modules\.pnpm\local+C++MyRepo+hbomax-core\node_modules\@hbo\hbomax-core\ which contains a copy of the hbomax-core workspace project folder. Because it's a copy, it is missing the build outputs after the hbomax-core project has been compiled. 👎

I don't understand why "inject": true is producing different results in these two setups. Unfortunately I don't have time right now to narrow down the difference. (The packages are internal to our company, but if it helps I could privately share the generated pnpm-lock.yaml outputs.)

@zkochan
Copy link
Member

zkochan commented Mar 5, 2022

Because it's a copy, it is missing the build outputs after the hbomax-core project has been compiled.

pnpm syncs the contents of injected dependencies after running their "prepare" or "postinstall" scripts.

@octogonz
Copy link
Member Author

octogonz commented Mar 7, 2022

@zkochan I am confused.

Let's go back to my original example:

A --depends-on--> B --peer-depends-on--> C

A and C are workspace projects, whereas B is installed from the registry.

If A satisfies the peer by depending on "C": "workspace:*", then it seems that B's installation needs a direct symlink to the C:\MyRepo\C\ project folder. That way, whenever I recompile C, the latest C:\MyRepo\C\lib\*.js outputs should be available to B.

pnpm syncs the contents of injected dependencies after running their "prepare" or "postinstall" scripts.

The prepare and postinstall events happen during publishing/installation. Whereas the relevant event here is npm run build for project C. This seems to require a direct symlink. I don't see how it can work with an installation method that copies the C folder.

@octogonz
Copy link
Member Author

octogonz commented Mar 7, 2022

I was able to sort of solve it by rewriting the peerDependencies to be regular dependencies:

.pnpmfile.cjs

function readPackage(pkg, context) {
  for (let workspaceProject of [
    "@hbo/hbomax-core",
    "@hbo/hbomax-localization-policy",
    "@hbo/hbomax-localization-policy-config",
  ]) {
    if (pkg.peerDependencies && pkg.peerDependencies[workspaceProject]) {
      console.log("Rewrite " + pkg.name + ": " + workspaceProject);

      delete pkg.peerDependencies[workspaceProject];
      if (!pkg.dependencies) {
        pkg.dependencies = {};
      }
      pkg.dependencies[workspaceProject] = "workspace:*";
    }
  }
  return pkg;
}

module.exports = { hooks: { readPackage } };

So whereas the old inject: true approach produced file: relationships:

  /@hbo/hurley-global-policies/4.2.0_76e775d50468c213fc9eed97bf9a4978:
    peerDependencies:
      '@hbo/hbomax-core': ^4.0.0
      '@hbo/hbomax-localization-policy': ^2.0.0
      '@hbo/hbomax-localization-policy-config': ^1.0.0
    dependencies:
      '@hbo/hbomax-core': file:hbomax-core
      '@hbo/hbomax-localization-policy': file:hbomax-localization-policy
      '@hbo/hbomax-localization-policy-config': file:hbomax-localization-policy-config
    transitivePeerDependencies:
      - supports-color
    dev: false

...the .pnpmfile.cjs hack instead produces link: relationships:

  /@hbo/hurley-global-policies/4.2.0:
    dependencies:
      '@hbo/hbomax-core': link:hbomax-core
      '@hbo/hbomax-localization-policy': link:hbomax-localization-policy
      '@hbo/hbomax-localization-policy-config': link:hbomax-localization-policy-config
    transitivePeerDependencies:
      - supports-color
    dev: false

Maybe what I'm proposing is this:

When a peer dependency is satisfied using a "workspace:*" dependency, it should be installed using a direct link: to the workspace folder.

@dmichon-msft
Copy link
Contributor

I believe the best solution here would be to have some pnpm sync-injected or other command that, when executed, "publishes" the specified package (ideally mimicking exactly the output of a proper publish) into the node_modules folder at all locations where it is "injected". Then ensuring it is synced at the right stage of the build is simply the duty of the build orchestrator (Rush in this case) to run the command between building it and building projects that depend on it.

This would provide a general solution for multiple classes of problems:

  • External dependencies that declare peer dependencies on packages in the workspace (this issue)
  • Referring to a local workspace package with differing peer dependency configurations (what "injected: true" was made for, to the best of my knowledge)
  • Ensuring that local workspace projects can't access files from dependencies that they wouldn't have if they lived in an external repository (this problem comes up a lot and is my interest in going through the publish machinery as opposed to just cloning everything in the folder)

This also makes monorepo builds easily distributable because building in the monorepo looks no different than building a single package in isolation with a bunch of tarballs for its dependencies.

@octogonz octogonz changed the title A dependency on a workspace project fails to satisfy a peer dependency Injected dependencies are not recopied after a workspace project is rebuilt Nov 24, 2023
@octogonz
Copy link
Member Author

Update: We have put together a solution for a pnpm-sync command that can solve the problem of updating injected dependencies during the build. In summary, this command would be invoked whenever a workspace project is rebuilt, and it will copy the build outputs into the node_modules folder to sync up the injected installations.

@g-chao has implemented a proof-of-concept prototype here:

https://github.com/g-chao/pnpm-sync-prototype

The mechanics are a bit tricky to ensure the command is run at the right times (including the watch mode scenario). Our plan is to implement it first for Rush workspaces, since Rush imposes stricter constraints about how pnpm install is performed. Then if successful, we will propose to contribute this back as an integrated feature of PNPM.

Please share questions/suggestions about by commenting in this GitHub issue. Thanks!

@chengcyber @dmichon-msft @iclanton @zkochan

@octogonz
Copy link
Member Author

Update: The pnpm-sync project was moved to a new repository and is no longer a prototype:

https://github.com/tiktok/pnpm-sync

@runspired
Copy link

#6135 tried to add a sync to pnpm

related project: https://github.com/NullVoxPopuli/pnpm-sync-dependencies-meta-injected

@NullVoxPopuli
Copy link
Contributor

@octogonz did you try pnpm-sync-dependencies-meta-injected? 😅

Wondering what feature differences there could be?

@octogonz
Copy link
Member Author

octogonz commented Apr 25, 2024

did you try pnpm-sync-dependencies-meta-injected?

No, we were not aware of your project. Thanks for sharing this!

Wondering what feature differences there could be

Looking at the two projects, the differences seem to be:

  • pnpm-sync includes an API that is used to integrate with Rush, so commands like rush build and rushx build will automatically perform copying in a large monorepo, without having to explicitly add syncing steps to the affected projects

  • pnpm-sync fully supports Rush "subspaces", which allows a single PNPM workspace to install from multiple lockfiles. (We use it in a monorepo with hundreds of lockfiles and >1,000 projects.)

  • pnpm-sync currently performs copying after the library is built ("postbuild"), rather than before the consuming app is built ("prebuild"). The implementation works by preparing a JSON file describing the paths to be copied, enabling optimized incremental copying. This also avoids needing to detect somehow whether the library's output has changed, and also avoids the need for file locking to prevent race conditions. But there are some advantages to "prebuild" approach (noted in the RFC), so we may support that model as well.

  • pnpm-sync-dependencies-meta-injected seems to have a watch mode feature, which is not implemented yet for pnpm-sync

@octogonz
Copy link
Member Author

octogonz commented May 9, 2024

We've published some docs for how & why to use pnpm-sync with Rush:

https://rushjs.io/pages/advanced/injected_deps/

This summer we're rolling out this feature in TikTok's huge frontend monorepo. If it goes well, we'd like to propose to integrate the pnpm-sync functionality directly into official PNPM. @NullVoxPopuli maybe we can join forces and incorporate your work as well.

@NullVoxPopuli
Copy link
Contributor

Maybe. -- some question:

why have a .pnpm-sync file?

All knowledge is available in the package.jsons and nothing needs to be configable.
The whole implementation of pnpm-sync-dependencies-meta-injected is < 300 lines

_why not use pnpm's own utilities? _

For example, here: https://github.com/NullVoxPopuli/pnpm-sync-dependencies-meta-injected/blob/main/cli/src/index.js#L6

pnpm-sync currently performs copying after the library is built ("postbuild")

isn't this more a concern of your wrapper/monorepo tooling, rather than pnpm-sync itself? like you could do this in a rollup closeBundle, for example -- could do the same with pnpm-sync-dependencie-meta-injected

which allows a single PNPM workspace to install from multiple lockfiles.

Is this different from this config option: https://pnpm.io/npmrc#shared-workspace-lockfile
(which you could set to false to get separate lockfiles.

Additionally, if using resolve + manypkg, you get the correct paths to the packages within the monorepo regardless of how it's set up.

includes an API that is used to integrate with Rush,

what's that look like? is it different from just running the CLI? 😅

Thanks! sorry for ignorance!

@octogonz
Copy link
Member Author

Adding pnpm-sync and pnpm-sync-dependencies-meta-injected to the website docs: pnpm/pnpm.io#532

@octogonz
Copy link
Member Author

Maybe. -- some question:

Great questions! 🙂

why have a .pnpm-sync file?

All knowledge is available in the package.jsons and nothing needs to be configable.

This JSON file is needed because pnpm-sync copies after the injected library is rebuilt ("postbuild" strategy), rather than before each consuming project gets rebuilt ("prebuild" strategy). As mentioned above, there are tradeoffs for the two approaches:

  • prebuild: pnpm-sync-dependencies-meta-injected uses the prebuild approach, which is simpler to implement -- you need a mutex, but otherwise you don't need to consider other consumers at all, and you don't need a JSON file. However, prebuild has a downside of having to detect whether the library has changed, and this check must be performed in every possible case where a consumer might get rebuilt. It is potentially a lot of checks during the build, which can be slow if checking is not highly optimized. It also neglects cases that don't involve any build action, for example VS Code IntelliSense will be incorrect if syncing is not performed.

  • postbuild: The postbuild approach avoids these complexities, because we know exactly when the library gets rebuilt, and all the work can be performed at that moment. But postbuild has its own downsides such as preparing .pnpm-sync.json as you observed, and in particular how to handle partial installs.

My opinion is that their advantages depend on your situation, and ideally the tool should support both models.

The whole implementation of pnpm-sync-dependencies-meta-injected is < 300 lines

We are applying this in a large monorepo with potentially hundreds of projects getting injected, so our requirements are perhaps more complex.

_why not use pnpm's own utilities? _

For example, here: https://github.com/NullVoxPopuli/pnpm-sync-dependencies-meta-injected/blob/main/cli/src/index.js#L6

It's a good suggestion, as the goal is to ultimately contribute this to official PNPM. The current approach tried to minimize NPM dependencies to make it more attractive for RushJS to accept as a dependency.

pnpm-sync currently performs copying after the library is built ("postbuild")

isn't this more a concern of your wrapper/monorepo tooling, rather than pnpm-sync itself? like you could do this in a rollup closeBundle, for example -- could do the same with pnpm-sync-dependencie-meta-injected

No. See my explanation above. For serious use cases, the crucial requirements for this feature are something like:

  1. Providing a way to sync that is not too confusing or cumbersome for engineers. When your monorepo has hundreds or thousands of projects, it's not realistic for engineers to keep track of injected: true settings. Ideally, the build orchestration should automatically sync without engineers having to add anything to their package.json scripts.
  2. Syncing as fast as possible, to avoid slowing down install/build too much.
  3. No edge cases that forgot to sync. If the design leads to accidents where syncing doesn't occur in certain situations, that can produce bugs that are very difficult to reproduce. It can be a frustrating experience for engineers, and waste time of our on call support.

So I think these considerations are central to the design of the tool, not a usage detail.

which allows a single PNPM workspace to install from multiple lockfiles.

Is this different from this config option: https://pnpm.io/npmrc#shared-workspace-lockfile (which you could set to false to get separate lockfiles.

Suppose we have 1,000 packages in our monorepo:

  • With shared-workspace-lockfile=false there would be 1,000 lockfiles, which is extremely inefficient. We tried this actually, and it can take hours to complete pnpm install for the entire workspace.
  • With shared-workspace-lockfile=true there would be exactly 1 centralized lockfile, which is much more efficient, but can be very complicated for people to manage. Modifying package.json requires retesting hundreds of unrelated projects belonging to other teams.
  • With Rush subspaces, you might define 5 or 10 subspaces, and then group PNPM packages into subspaces. Each subspace gets its own lockfile, but as with shared-workspace-lockfile=false you can still use workspace: to symlink between them.

includes an API that is used to integrate with Rush,

what's that look like? is it different from just running the CLI? 😅

The API report is here: pnpm-sync-lib/etc/pnpm-sync-lib.api.md

For commands like rush build (Rush's equivalent of PNPM --recursive), the API is invoked by PnpmSyncCopyOperationPlugin.ts.

For rushx (Rush's equivalent of npm run or pnpm run), the API is invoked by RushXCommandLine.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Priority
Development

No branches or pull requests

5 participants