
RFC: Templates — Reusable packages to share dependencies and configuration #3

Closed · 13 commits

Conversation


@gluxon gluxon commented Mar 6, 2023

This is a proposal for a templating system for package.json files. The RFC does not propose a full templating system in which any package.json can be referenced to populate arbitrary portions of another. Templating is limited to specific functionality requested within the community.

Link to rendered text

Originally proposed by @zkochan in pnpm/pnpm#2713 (comment).

Previous discussions:


@gluxon gluxon left a comment


@zkochan There are a few differences in this RFC from what we discussed previously. I think the differences are helpful, but please feel free to disagree if that's not the case.


## Unresolved Questions and Bikeshedding

- In prior discussions, this feature was referred to as "_Environments_". The initial draft proposes "_Templates_" to make it clearer that this feature is simply a `package.json` authoring mechanism. Templates/environments are not themselves installed.

The proposal here ended up a bit different than the "environments" concept in Bit: https://github.com/teambit/envs

To achieve component isolation, Bit provides the extension with the Capsule APIs to create a separate development environment, detached from the original workspace and codebase. It then runs all operations (build, test, render, etc.) on the isolated environment to provide feedback.

While Bit environments tend to be fully featured, installed development environments, pnpm's goals can be accomplished by a simpler templating system that only affects the package.json file.

I'm not too partial to the existing "Templates" naming. We can rename back to "Environments" or a different name if that makes more sense.


A package can reference the template to populate different portions of its definition. The specific feature is determined by `pnpm.templates`: `extends`, `toolkit`, and `catalog`.

### Extends

@gluxon gluxon Mar 6, 2023


I'm actually not sure if we need extends. This is definitely the most complicated part, and it's possible toolkit and catalog are sufficient in the initial version.
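For context, a consuming package.json under this proposal might look roughly like the sketch below. The three feature names (`extends`, `toolkit`, `catalog`) come from the RFC text; the exact field shape and the package names are assumptions for illustration, not settled syntax.

```json
{
  "name": "@my-org/my-app",
  "version": "1.0.0",
  "pnpm": {
    "templates": {
      "extends": "@my-org/base-template@1.0.0",
      "toolkit": "@my-org/toolkit-template@2.0.0",
      "catalog": "@my-org/catalog-template@1.2.0"
    }
  }
}
```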


What I care mostly about is the possibility to share fields like: overrides, patchedDependencies, packageExtensions, peerDependencyRules.


My concerns center mostly around "scripts" and not having to use mrm or similar tools to keep those up to date across all packages/projects. That's been the largest motivator for looking at make again, and at tools such as https://moonrepo.dev/. In a basic product monorepo we'd have different base scripts for services, apps, and packages.

@gluxon gluxon marked this pull request as ready for review March 6, 2023 07:31

### Lockfile

No explicit changes will be made to the `pnpm-lock.yaml` file. Any `importers` entries referencing a template will have their rendered result saved to the lockfile.

If we don't save the template versions to the lockfile, we'll have to fetch all template metadata files in order to verify that the lockfile is up-to-date. Which might be OK. I don't know.

But also, currently we store the integrity checksum of every package. Should we store the integrity checksums of the template packages? What if someone hijacks a template package?


If we don't save the template versions to the lockfile, we'll have to fetch all template metadata files in order to verify that the lockfile is up-to-date. Which might be OK. I don't know.

I'm also a bit unsure. My thought was that the template will be saved to the pnpm content addressable store. (I'm realizing I should have elaborated on that in the RFC.) From there, loading the template subsequent times becomes a disk block I/O operation, much like reading part of pnpm-lock.yaml would.

But also, currently we store the integrity checksum of every package. Should we store the integrity checksums of the template packages? What if someone hijacks a template package?

This is a great point. We should absolutely store the integrity checksums of the template packages. Thank you for the catch. Will update the RFC.


Commit cb35007 now describes a new templates block.
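A hypothetical shape for that templates block, recording a resolved version plus an integrity checksum per template (every field name here is an assumption for illustration; the committed design in cb35007 may differ):

```yaml
# Hypothetical pnpm-lock.yaml excerpt; shape assumed for illustration
templates:
  '@my-org/catalog-template@1.2.0':
    integrity: sha512-(checksum of the template package tarball)
```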


zkochan commented Mar 6, 2023

Let's consider the following scenario. I want to share a patch to a package. How would I do it? I guess I can extend the patchedDependencies field but how would I share the patchfile itself?



We expect most monorepos to use this feature to keep dependency specifiers consistent between different in-repo packages.
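To sketch the idea (package and dependency names are illustrative, and the `catalog:` specifier syntax is an assumption based on later comments in this thread): a catalog template pins specifiers once, and each in-repo package refers back to it while still listing its dependencies explicitly.

```json
{
  "name": "@my-org/web-app",
  "pnpm": {
    "templates": {
      "catalog": "@my-org/catalog-template@1.2.0"
    }
  },
  "dependencies": {
    "react": "catalog:",
    "react-dom": "catalog:"
  }
}
```

Keeping the dependencies explicitly listed, rather than injecting them invisibly, lets IDEs and other tooling continue to read package.json, a point raised later in this thread.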


Let's add more details to the section about catalogs. If you remember, we discussed how only versions from dependencies and peerDependencies of the template will be used: https://github.com/orgs/pnpm/discussions/5974#discussioncomment-4783001


Ack! Will do. Agree this is important to solidify into the RFC.


Added some details to catalogs here: 24c69db

I reread the initial discussion linked 2 comments above, and I'm actually a bit curious whether we want to keep the "peerDependencies can only catalog peerDependencies" restriction. I'm not too opinionated and could go either way.

Just wondering if users might be surprised by the need to copy a specifier from dependencies to peerDependencies when they try to use it in the latter. How would we explain that restriction to someone asking in the future?


zkochan commented Mar 7, 2023

Let's also describe how this would work in a workspace. Will the projects in the workspace "inherit" templates from the root package.json? So would you be able to use one template for the whole monorepo?

Comment on lines +188 to +189
- Complex circular resolution during installation is avoided.
- Template loading is more performant. The requirement to declare all required templates up front enables fetching in a single parallelized step, rather than a waterfall fetch at each level of a theoretical template inheritance hierarchy.

I don't think these are issues. We will only allow referencing templates by exact versions. So when a template inherits another template, we will actually merge the parent template into the current one and publish the current template as a single independent template.


Agree most of the problems in this list go away if the parent template is rendered/merged into the current one before publishing.

- In that world, should we still allow single inheritance only, or introduce multiple inheritance to make it easier to reference templates from different authors? I was following the TypeScript 5.0 changes and noticed tsconfig.json files will allow multiple extends entries.
- The toolkit templates are difficult to model with inheritance since the render step moves dependencies to devDependencies. It's possible we don't want that to happen in a template. But that introduces complexity since the render/merge step now depends on whether the current package is itself a template.


I don't have objections to supporting multiple inheritance.


We will only allow referencing templates by exact versions.

I would likely want to use this as my-template@workspace:* instead of a specific version, within a workspace.


I think if the template is inside the workspace, then it is fine to allow this.


I had imagined templates being usable through my-template@workspace:* as well. Will explicitly mention that in the RFC after a few more discussions resolve.
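That usage might look like the following sketch (field shape assumed from earlier examples in this thread):

```json
{
  "name": "@my-org/some-package",
  "pnpm": {
    "templates": {
      "catalog": "my-template@workspace:*"
    }
  }
}
```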


gluxon commented Mar 7, 2023

Let's consider the following scenario. I want to share a patch to a package. How would I do it? I guess I can extend the patchedDependencies field but how would I share the patchfile itself?

This is a great question I hadn't thought about. I was hoping pnpm could only fetch the manifest metadata from the registry, but perhaps we do want to fetch the entire package and link it into the CAFS. Will think more about this.

Let's also describe how this would work in a workspace. Will the projects in the workspace "inherit" templates from the root package.json? So would you be able to use one template for the whole monorepo?

Oh I see. I was imagining projects in the workspace would not "inherit" templates from the root package.json, but I do see the utility of that now that you've mentioned it. It would be nice not to specify pnpm.templates.catalog everywhere.

We should think about something like this. I think one challenge will be keeping the semantics between inheriting a pnpm.templates config from the workspace root and pnpm.templates.extends clear.


zkochan commented Mar 7, 2023

Also, another potentially confusing thing. Currently, most of the fields in the "pnpm" section of package.json only work in the root package.json. I mean the likes of overrides, packageExtensions, patchedDependencies, etc. So if someone adds an extension to a non-root workspace project, and the extension has one of those fields, those fields will be ignored. This will definitely cause confusion, and we need to either throw an error in this case or print a warning.


gluxon commented Mar 12, 2023

Before iterating more, I could use help deciding between 2 different directions the RFC can go in.

The extends mechanism is quite powerful and can support a lot of behavior we've been discussing. For example, extends can be used to share pnpm settings (e.g. pnpm.packageExtensions, pnpm.overrides), authoring fields (e.g. author, repository), and replace template mechanisms like toolkit.

However, I'm concerned that it might be too powerful and introduce a new form of security vulnerability on the ecosystem. For example, a package might depend on an external toolkit-like package through extends. Suppose the dependent package intentionally omits its license field — this indicates that the author retains all copyright permissions. A malicious template might then add the license field and push it to all consumers in an update. While private/public accessibility modifiers #3 (comment) allow producers to restrict what fields are inherited, we'd need a filter on the consuming end to protect against this.

On a team, the problematic workflow might look like this:

  1. Contributor A updates all package.json files to extend suspicious-template@1.0.0.
  2. Contributor A audits the rendered package.json files, and doesn't see anything suspicious.
  3. Some time later, contributor B notices suspicious-template has an update and extends suspicious-template@1.0.1.
  4. The pull request from contributor B is approved and merged.

Another field a malicious template could add is scripts.postinstall, which may allow bypassing onlyBuiltDependencies/neverBuiltDependencies settings. That's a stretch today, but would be more concerning if onlyBuiltDependencies were to ever support wildcards.
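To illustrate the kind of template update being described, an entirely invented example of what a hijacked template release might ship:

```json
{
  "name": "suspicious-template",
  "version": "1.0.1",
  "license": "Proprietary",
  "scripts": {
    "postinstall": "node ./exfiltrate.js"
  }
}
```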

I think this leaves a few options:

  1. Ignore this problem and expect users of extends to audit each update to a template they're inheriting.
  2. Allow packages using extends to filter what fields they're expecting to inherit.
  3. Eliminate extends and provide more templating options for different use cases. This means keeping around toolkit, and adding new templates like scripts, authoring, pnpmSettings, etc.
  4. Introduce more templating options (e.g. catalog, scripts, authoring, pnpmSettings) for users that want more safety, but keep around extends for users that accept the risks.

I personally wouldn't trust any package asking me to add it to the extends list. Even if it came from a trusted author (e.g. Vercel), I'd be wary of that template getting compromised. I'm leaning towards option 3, which replaces extends with more controlled templating options that map to a specific purpose.

This means whenever pnpm supports new package.json fields, we'll need to be thoughtful about which templating option each fits into. This can be a benefit since it keeps pnpm maintainers mindful of options that may need special behavior to work in a template. For example, I had forgotten, before you helpfully reminded me, that patchedDependencies references a patch file on disk that would need to be packaged.


zkochan commented Mar 12, 2023

It is not more dangerous than depending on other open source dependencies. It is actually less dangerous because the template dependency will have no dependencies of its own, it will be used only locally, and it will be required only by exact versions.

We can list the fields that should be inherited but it won't be completely secure either. For instance, if we list devDependencies as inherited, in a new version the template may include a new devDependency that wasn't present in the past. If we include overrides, then in a new version the template may add an override that will replace a package with a completely different package. Same with inherited scripts. But nevertheless, I am fine if we add a list of allowed fields or an option to inherit all. By default we can inherit nothing.
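One possible shape for such an allowlist on the consuming side. The `inherit` field name is an invented placeholder; an omitted or empty list would inherit nothing by default, per the comment above.

```json
{
  "pnpm": {
    "templates": {
      "extends": "@my-org/base-template@1.0.0",
      "inherit": ["devDependencies", "scripts"]
    }
  }
}
```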

Eliminate extends and provide more templating options for different use cases. This means keeping around toolkit, and adding new templates like scripts, authoring, pnpmSettings, etc.

How is this different than providing an allowlist for the extend field?

cc @pnpm/collaborators


gluxon commented Mar 13, 2023

It is not more dangerous than depending on other open source dependencies. It is actually less dangerous because the template dependency will have no dependencies of its own, it will be used only locally, and it will be required only by exact versions.

We can list the fields that should be inherited but it won't be completely secure either. For instance, if we list devDependencies as inherited, in a new version the template may include a new devDependency that wasn't present in the past. If we include overrides, then in a new version the template may add an override that will replace a package with a completely different package. Same with inherited scripts.

I agree templates are safer in some ways. My comment was about the extends mechanism enabling security incidents in ways that simply adding a dependency or devDependency does not.

  1. Arbitrary code execution is easier: For example, a monorepo might set onlyBuiltDependencies to ["canvas"] thinking developers are now safe from arbitrary code execution on pnpm install. However, as you mentioned, overrides could be used to replace canvas with another dependency that has a malicious postinstall script. Alternatively patchedDependencies can allow code execution by modifying bins like typescript or eslint itself. This was a lot harder in the past because bin files users were already calling had to import the malicious code in a Node.js environment.
  2. Modifying the consuming package's metadata: The license example previously mentioned is not possible today. The funding field might be another field malicious templates would attempt to override.

But nevertheless, I am fine if we add a list of allowed fields or an option to inherit all. By default we can inherit nothing.

My security concerns are gone if we inherit nothing by default. 👍

Eliminate extends and provide more templating options for different use cases. This means keeping around toolkit, and adding new templates like scripts, authoring, pnpmSettings, etc.

How is this different than providing an allowlist for the extend field?

It's very similar. I think the differences are a matter of personal taste rather than technical. I'm comfortable moving forward with the allowlist option, but wanted to avoid that by grouping templates into different purposes:

- catalog — templates that share dependency specifiers
- toolkit — templates that share devDependencies
- authoring — templates that share author, license, funding, repository
- scripts — templates that share scripts
- compatibility — templates that can set pnpm.packageExtensions. (Alternative to the compatibility DB.)
- patches — templates that can set pnpm.packageExtensions, pnpm.overrides, pnpm.patchedDependencies, pnpm.peerDependencyRules, etc.

This is simply a different alternative that avoids having to add public/private modifiers in the template package and an allowlist for extends in the consuming package. We could probably think of this as a preset list for public and extends allowlist fields.

It also makes it clearer, when creating a template, what purpose it's meant for. I could see confusion after users forget to update their allowlist when updating a template. In the grouped templates model, users opt into the types of risky changes they're willing to accept rather than specific package.json fields. For example, risky dependency graph changes come with compatibility, and extra-risky full-blown file system editing comes with patches/scripts.


zkochan commented Mar 13, 2023

This is simply a different alternative that avoids having to add public/private modifiers in the template package and an allowlist for extends in the consuming package.

On one hand, this provides better user experience to the consumer, which won't have to list the allowed fields. On the other hand, it forces template authors to create more templates. What if there are multiple fields that should be changed by a template to fix an issue? For instance, a single template needs to add a new dev dep and a new script that runs the dev dep.

Actually, now that I gave it a thought, if we choose a generic template, I think it should be possible for the generic template to specify "required" fields. So if you use that template you "must" allow a given list of fields. Otherwise, there's a chance that the template will not work as expected.

@shellscape

The aforementioned security concerns really only matter for registry-based template packages. That doesn't really apply to templates located within workspace packages.

One thing I didn't see mentioned (please correct me if wrong) - How many templates can a single package contain, and how are they loaded from the package? Consider this ESLint config: https://github.com/shellscape/dot/blob/b850d274cafaecd5df368e37f0d3238bd1ca4f20/.eslintrc.js#L2

It extends my own config from eslint-config-shellscape in addition to a typescript export. This is a relatively straightforward method of specifying the config to use, but it relies on some kind of resolution there.

@kenrick95

The aforementioned security concerns really only matter for registry-based template packages. That doesn't really apply to templates located within workspace packages.

In that case, since the mentioned use cases are in a workspace scenario, could templates be restricted to only being declared as a workspace package (i.e. a package.json can only extend from a template declared in the same workspace)? This would help mitigate the security concern, I guess.


zkochan commented Mar 14, 2023

@kenrick95 you mean limiting templates to local templates only? That would make templates a lot less useful.


gluxon commented Mar 14, 2023

The afore mentioned security concerns really only matter for registry-based template packages. That doesn't really apply to templates located within workspace packages.

That's right. Something that hadn't been stated yet was that the allowlist of fields to inherit from a template isn't necessary in an entirely local workspace. We might want to discuss more, but we could reasonably default that to allow every field in that case.

One thing I didn't see mentioned (please correct me if wrong) - How many templates can a single package contain, and how are they loaded from the package?

Open to suggestions, but the current idea would be to have 1 template per package. The template would be the package's package.json itself. There was an open question of whether pnpm would be able to simply use the NPM registry's https://registry.npmjs.org/<name>/<version> endpoint for fast resolution of the template, or if pnpm needs to download and extract the package's archive to grab the package.json within.
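Under that model, a published catalog template would just be an ordinary package whose own package.json carries the shared content. The names and versions below are illustrative:

```json
{
  "name": "@my-org/catalog-template",
  "version": "1.2.0",
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0"
  }
}
```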


zkochan commented Mar 14, 2023

There was an open question of whether pnpm would be able to simply use the NPM registry's https://registry.npmjs.org/&lt;name&gt;/&lt;version&gt; endpoint for fast resolution of the template, or if pnpm needs to download and extract the package's archive to grab the package.json within.

It depends on whether we want to ship additional stuff with the template, like the patch files I mentioned. Overall, we can use the package.json from that endpoint, but we'll have to request the full metadata, not the abbreviated one.


gluxon commented Mar 14, 2023

On one hand, this provides better user experience to the consumer, which won't have to list the allowed fields. On the other hand, it forces template authors to create more templates.

It's probably ideal for authors to create more templates if they're intended for different purposes? I would worry about the other direction, where authors create overloaded templates that are used for multiple purposes: authoring, toolkit, catalog, etc. The large template would become more difficult to reason about over time.

What if there are multiple fields that should be changed by a template to fix an issue? For instance, a single template needs to add a new dev dep and a new script that runs the dev dep.

I was hoping we could enumerate the full set of package.json fields that would make sense to group into a template, but this may not be possible. I hadn't thought of the case where you'd want to add a new scripts entry after adding a new devDependencies entry. Good catch.


zkochan commented Mar 15, 2023

It's probably ideal for authors to create more templates if they're intended for different purposes?

Nothing stops them from doing that even with generic templates. But the learning curve is smaller and it's less work for us in designing different template types.

@zkochan zkochan mentioned this pull request Mar 26, 2023

zkochan commented Apr 1, 2023

I've got kind of a crazy idea. Doesn't a template with an override basically replace what peer dependencies are for? The only difference is that an override doesn't currently print a warning when it overrides a range to an incompatible range or version. But unlike peer dependencies, an override will never cause the same package to be duplicated in the dependency graph. So is it even a better approach?


Edit: actually, it is not even related to templates. It is just related to overrides.


rafaelliu commented Apr 1, 2023

I really like that templates are resolved on-the-fly, unlike peer deps or pnpm hooks, but I worry about how this would play with all other tooling. IDEs pick up dependencies to enable features (e.g. mocha), and so do libraries.

Could we use a meta-manifest instead? Something like `package.meta.json` as the "in-memory" version. Whenever pnpm uses the package manifest, it would reify it into a `package.json`, the "on-disk" version.

Only the meta file would need to be version-controlled, but there would be no harm in also committing package.json; in fact, that would make it more straightforward for non-pnpm users to use the package. Since pnpm would read package.json during pack/publish, triggering a reification, it would also just work. And the reified file could also act as a cache.

There could also be a CLI command to trigger it on-demand for custom builds, work out env incompatibilities, etc
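A sketch of the proposed flow, with invented content: the author edits only package.meta.json (below), and pnpm reifies it into a plain package.json that non-pnpm tooling can read.

```json
{
  "name": "@my-org/app",
  "version": "1.0.0",
  "pnpm": {
    "templates": {
      "toolkit": "@my-org/dev-tools@1.0.0"
    }
  }
}
```

pnpm would render this into a package.json with the toolkit's devDependencies expanded inline, regenerating it on install and on pack/publish.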


zkochan commented Apr 1, 2023

Actually my idea doesn't make sense because overrides will not work in a workspace, where different projects may have different versions of the peer dependencies installed.


gluxon commented Apr 1, 2023

I really like that templates are resolved on-the-fly, unlike peer deps or pnpm hooks, but I worry how this would play with all other toolings. IDEs pick up dependencies to enable features (e.g mocha) and libraries too.

This was one strong reason the catalog templates still require dependencies to be explicitly defined in the dependencies/devDependencies blocks. It allows IDEs to continue reading package.json files and know that a specific dependency is expected in node_modules.

You're right that dependencies defined through the inheritance/extends based templates would have difficulty interacting with IDEs that read package.json.

Could we use a meta-manifest instead? Something like `package.meta.json` as the "in-memory" version. Whenever pnpm uses the package manifest, it would reify it into a `package.json`, the "on-disk" version.

There's definitely merit to this idea. A few questions:

- I could see users not realizing package.meta.json is set up in a project and editing package.json directly. They would then be confused why their changes keep getting reset. Unfortunately it's not possible to comment in package.json, so we wouldn't be able to add a "This file is generated from package.meta.json" warning. Having users learn about package.meta.json after a potentially frustrating surprise isn't the best introduction to pnpm templates.
- If users were to edit non-package-manager-related fields like eslintConfig, a copy step to move that from package.meta.json to package.json would be necessary. This isn't the end of the world, but the existing RFC still allows package.json files to be directly modified without running commands to keep other files in sync.
- There's definitely a case for external tooling needing to read dependencies, but I think declaring them as catalog: versions solves that part? Are there other fields in package.json that IDEs commonly pick up?

I like the idea because it's the academically correct approach that allows full compatibility with existing tooling, but I think we're willing to make some compatibility tradeoffs for package.json authoring usability.


gluxon commented Apr 1, 2023

Actually my idea doesn't make sense because overrides will not work in a workspace, where different projects may have different versions of the peer dependencies installed.

I did want to comment that this is an awesome feature of pnpm. My team is in the middle of migrating a large monorepo with 4 million lines of TypeScript from Webpack 4 to Webpack 5, and this would be impossible to do incrementally without pnpm's peer dependencies resolution.


gluxon commented Apr 1, 2023

It's probably ideal for authors to create more templates if they're intended for different purposes?

Nothing stops them from doing that even with generic templates. But the learning curve is smaller and it's less work for us in designing different template types.

Right, it's still possible for developers to correctly split templates intended for different purposes with the generic option. If our solution's shape can provide guardrails that steer developers toward best practices out of the box, I'd prefer that.

If I can analogize this, I think this is similar to how it's possible to write goto logic that's maintainable, but it's too easy to write complicated spaghetti code such that modern programming languages prevent this footgun in the first place by disallowing it. goto lets developers do too much, and I have a similar concern with extends.


I think the choice between multiple limited template options and a singular powerful extends option (generic templates) comes down to whether we're okay with the solution surface area covering less than the initial problem surface area, as opposed to more.

[Diagrams: one showing the solution surface area smaller than the problem surface area ("less"), and one showing it larger ("more")]

If we were to add a solution that covers more than the surface area of the problems (extends), we risk having to maintain functionality that's not completely necessary. It's difficult to remove features or functionality once it's been added. On the other hand, we can always grow the solution's surface area later by adding additional templating types as use cases arise.

Given that, my current personal leaning is to remove extends from the initial version of the RFC and evaluate if it's necessary later down the line as users provide feedback.

The risk of this direction is if we introduce authoring, scripts, or patches template options but realize later that extends is still necessary. In that world, I don't think it'll be a waste that we added separate authoring, scripts, and patches mechanisms in addition to extends. These more limited options would still be valuable for users that prefer not to give a single template full permission to their package.json and deal with allowlists.

What if there are multiple fields that should be changed by a template to fix an issue? For instance, a single template needs to add a new dev dep and a new script that runs the dev dep.

A fix that needs to change multiple fields is still the case that's hard to account for without a generic extends-based template option.

For this specific example, I would bet that this won't be a common problem? I'd expect most developers to template scripts from a local template since these often end up being very specific to the monorepo.


All that said, I want to recognize that I'm starting to repeat points made in earlier comments. I've been thinking about this for 3 weeks and still prefer multiple limited templates since that's the safer option, but would want to avoid the discussion going in circles. If you have a strong preference @zkochan, I'm happy to update the RFC for it.

@zkochan
Member

zkochan commented Apr 2, 2023

I am OK with your suggestion.

If someone else has objections, speak up.

@gluxon
Member Author

gluxon commented Jul 5, 2023

I am not familiar with the RFC at npm. I was personally inspired by the environment components feature of Bit.

This part may have come from me. I attempted to do lots of research before writing this RFC. Given the problem of "sharing author, scripts, etc" #1 (comment), I found the parent package.json NPM RFC and linked to it in this RFC as prior art: npm/rfcs#165

The most important part of this RFC is the catalog template and the toolkit template (which Dominik suggests renaming). These are among the most requested features in the pnpm repository. I think overall the catalog template idea is well designed in this RFC and this is the part that I care most about.

I also think the catalog feature is the shining feature that I would advocate for in any monorepo with 20+ packages. We attempted to come up with something that solves as many use cases as possible, but I acknowledge that this RFC may have become too complicated as a result. I was hoping the different template flavors could be viewed/used independently. Open to splitting this RFC into multiple documents.

I am terse because @gluxon is the author of the RFC and I think he can explain himself, but I can join if a hot debate breaks out.

Thanks! I would say it's been a great team-effort/collaboration so far and would welcome your more elaborated thoughts if you do choose to jump in early at any point in the future.

@zkochan
Member

zkochan commented Jul 6, 2023

I'm not completely sure either. @zkochan, do you see a problem with simply making toolkit templates copy from devDependencies to devDependencies?

No, I see no problem naming it devDependencies.

Regarding generating a "tooling compatible" package.json file. I understand the cons of this approach but there are some benefits too. For instance, we could use a different format for it. We could create a pnpm-package.json5 or a pnpm-package.yaml file. The ability to write comments in the package manifest is a frequently requested feature.

There are some cool monorepo managing tools like turborepo that will stop working if we don't have all the necessary fields inside a "static" package.json. I don't think it is realistic to expect them to calculate the package.json files using templates.

I also think the catalog feature is the shining feature that I would advocate for in any monorepo with 20+ packages. We attempted to come up with something that solves as many use cases as possible, but I acknowledge that this RFC may have become too complicated as a result. I was hoping the different template flavors could be viewed/used independently. Open to splitting this RFC into multiple documents.

I think it would be a good idea to keep only the catalogs part in this RFC and move everything else to a new RFC.

@wesleytodd

I had a great chat with @zkochan in a twitter DM. I shared a bit of context on what brought me to this issue and where I was coming from. I plan on spending some time tomorrow morning writing up a bit more about where I think this RFC can be improved. I will also likely join the discord so that I can ask my questions in a more real-time way, rather than dropping into a project I haven't really participated in before and disrupting things.

@gluxon
Member Author

gluxon commented Jul 6, 2023

Responding to @wesleytodd #3 (comment), #3 (comment), #3 (comment)


Hey! I am sorry for the drop by comment, I don't use pnpm personally so don't really follow what y'all are doing here, but this one came to my attention and as someone who was involved in the linked npm rfc for parent package.json, I thought maybe my high level feedback might be helpful for y'all.

Having read RFC for parent package.json, I would find feedback from someone in that group very insightful. Thanks for dropping by! 👋

I will start by stating my understanding of the end user goal in case we are not aligned on that: enable "central management" of library and application developer workflows. Specifically this serves power users (prolific authors), folks who own many packages/apps (large projects), folks who steward large ecosystems of packages which work together, and lastly, the type of team I am on: platform teams. As someone who has worked in all of the above roles, I think this is important work because:

  1. Any feature you add for a power user is a "curb cut" for the rest of folks. Generally it will improve things for everyone, not just the target group. An example of this could be auto updating a central set of shared dev deps for projects. Enabled by this feature for the power user, but in the end improves the UX for all end users.

  2. Specialization of roles allows for better delegation of responsibilities to parties who know the most about the right things to do. For example a security team could publish overrides for teams who otherwise don't pay close attention to recommendations.

  3. A central control plane to deliver this type of change enables fast and efficient change rollouts. Efficiency of this sort means folks can focus on other important tasks.

I do think we're mostly aligned on the problems. I'm in agreement that some sort of solution here would be important. I similarly work on a "platform team" for my day to day.

I didn't share detailed thoughts on why I think it is too complicated, nor propose alternatives since it was a drive by comment. That said, if it is something you would like to hear I can write more. I just didn't want to waste your time (or mine) writing them up if you are committed to this direction and not looking for more detailed feedback.

I appreciate the thought behind this and I completely welcome your detailed feedback! I similarly would have asked for permission to give feedback before jumping into a different project's RFC discussions page. For those in the future, I think we're fairly open here at PNPM and usually prefer up-front feedback, but acknowledge that's an unknown until you've hopped in. 🙂


My feedback on this proposal is that it is a complicated way to approach a solution to the problem, and because of that negates a bunch of the goals above. An example of this is right above where @styfle asks about the naming of the fields. It might have a technical reason (@zkochan's response implies it does) but as a reader here I am still quite unclear on the reason and it just seems complicated. You will need to train all your users on this stuff, and if you get it wrong initially you end up with something confusing/insecure which is hard to fix (users being trained to delete their lockfiles as an example of the impact of getting things wrong). Overall, I really like the goals this is setting out to address but I think the RFC as it stands is not something I would want to use or ask users of my tools/products to use.

Just add extends like other tools do. You can bikeshed about how that applies to different fields, but at least people will understand it.

I'm extremely worried about introducing new classes of security vulnerabilities to the NPM ecosystem. In earlier drafts of the RFC, we had (simplified explanation) 2 flavors of templates:

  1. catalog
  2. extends

The security concerns around the now removed extends template flavor are elaborated on here: #3 (comment). I pushed hard for dividing the extends template into authoring, scripts, patches, etc in order to make it more clear what permissions the parent package.json has access to. So now we have a few different flavors each with their own "capabilities".

  1. catalog
  2. toolkit
  3. authoring
  4. scripts
  5. compatibility
  6. patches

Yes, I definitely acknowledge that's more complicated than simple inheritance. If I can try to convince folks why the delineation is valuable:

  • It overlaps much more closely with the known problems and use cases we've already heard feedback on: RFC: Templates — Reusable packages to share dependencies and configuration #3 (comment). I think it's important to orient solutions around use cases rather than hand out a hammer that can hit any nail; a solution that's too general will eventually be used to hit a nail that shouldn't be hit.
  • We're less likely to introduce ecosystem problems involving external templates of one flavor suddenly growing in scope outside of what it was intended for in a way that consuming package.json files don't want or expect.
  • It aligns with the object-oriented programming principle of composition over inheritance. The different templates can be "composed" together without difficult to change inheritance hierarchies. I recognize I'm referring to a programming principle, but believe the idea maps well here.
  • The "what fields does extends apply to" problem still exists. In either case, we'll need to document what fields are copied. I'd actually rather encourage users to proactively look up what fields are in effect. I think the split variations encourage that, while I think users would try to guess with a theoretical extends flavor and be surprised/unhappy later.

I had a great chat with @zkochan in a twitter DM. I shared a bit of context on what brought me to this issue and where I was coming from.

Awesome! 🙂

I plan on spending some time tomorrow morning to write up a bit more about where I think this RFC can be improved.

It sounds like many of us are aligning towards having only this RFC be about catalogs, and move the other parts that involve inheritance/extends into a different RFC. I'd welcome feedback either way, but just a heads up that this RFC may be refactored significantly tomorrow.

Thanks for your thoughts @wesleytodd! 😁

@gluxon
Member Author

gluxon commented Jul 6, 2023

I think it would be a good idea to keep only the catalogs part in this RFC and move everything else to a new RFC.

I'm not too opinionated on the exact logistics of how we break things up, but I'm leaning towards these actions:

  1. Move the catalog: proposal to RFC RFC: Catalogs — Shareable dependency version specifiers #1.
  2. Remove catalogs from this RFC and keep everything else.

The primary difference between catalogs in this RFC and workspace-consistent in RFC #1 is that catalogs are "shareable", which I still think is a very cool property with a lot of potential that @zkochan thought of.

@zkochan I think the actions above affect you the most as the primary reviewer. Does that sound okay?

Rationale

The actions above make most sense to me after revisiting the discussion in both RFCs.

  • The vast majority of the comments in this RFC pertain to templating for everything except catalogs. There was light catalog-specific discussion around the map/array syntax, but otherwise all concerns centered around either (1) whether we should have an extends flavor or (2) whether we should serialize package.json to disk. Those questions are still valid, and I think we should keep discussion of them going here rather than rewrite this RFC to make those discussions suddenly out of context.
  • The earlier RFC RFC: Catalogs — Shareable dependency version specifiers #1 discusses merge conflict resistance in depth, which is a big motivator for catalog:. We can add a simple note in the PR description that earlier versions of that RFC referred to workspace-consistent: instead of catalog:. I think that's fine since RFCs frequently see renames before ratification.

@zkochan
Member

zkochan commented Jul 6, 2023

@gluxon ok, sounds good

@dominikg

dominikg commented Jul 6, 2023

I still believe it is important to write the "static" package.json to disk.

Tools doing dependency analysis would be the ones I am most concerned about, but I am sure there are tools and workflows that parse other fields and get it wrong in subtle ways if something is missing or not aligned with what the template renders for the published version.

Having to hunt down bugs that were introduced by an opaque setup like this would quickly eat up any time savings you had and then some.

Writing the output to disk raises awareness because you don't have to run pack or another command to see the potential result, and avoids the issues with other tools.
If the main concern with it is that it increases (or rather does not reduce) the amount of merge conflicts, is there a way to tackle these differently?
pnpm has access to additional information from the lockfile and could look at both ends of the merge as well. So a pnpm resolve-merge-conflicts command would be able to do a better job than git alone.
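For illustration, a custom git merge driver along these lines could be wired up through `.gitattributes` and git config. The mechanism shown (merge drivers with the `%O`/`%A`/`%B` placeholders) is standard git; the `pnpm resolve-merge-conflicts` command itself is hypothetical, as proposed above:

```
# .gitattributes: route pnpm-lock.yaml to a custom merge driver
pnpm-lock.yaml merge=pnpm-lock

# .git/config (or set via `git config`). %O, %A, and %B are the
# ancestor, current, and other versions of the conflicting file;
# the driver must write the merged result back into %A.
[merge "pnpm-lock"]
	name = pnpm lockfile merge (hypothetical command)
	driver = pnpm resolve-merge-conflicts %O %A %B
```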

@zkochan
Member

zkochan commented Jul 6, 2023

I still believe it is important to write the "static" package.json to disk.

For stuff like "scripts", maybe. But for the catalog protocol, I am not so sure.

Tools doing dependency analysis would be the ones i am most concerned about

Like what tools? I don't think I use such tools at the moment, that is why I ask.

All the version information would still be available in a "static" state inside the lockfile.

@dominikg

dominikg commented Jul 6, 2023

From personal experience: vite-plugin-svelte reads the package dependencies to autogenerate extra vite config. There are other vite plugins doing similar things. While most of these work with packages in node_modules, they would fall apart if they did fs reads on workspace: packages.

And there are a lot of vulnerability scanner/auditor/corporate snakeoil things reading package.json directly.

@darcyclarke

Please don't use package.json as a "template" or a "consumer". package.json should be, at rest, the static result of any extension/templating tools (putting tooling-specific guardrails on doesn't help). Creating a net-new config/file for this type of work will also give you a lot of freedom & ensures end-users don't end up with dangerous partial states of their package.json if they didn't run x, y or z tool/command. Please also be verbose about what the file is & its purpose (i.e. pnpm-package-template.json5 vs. pnpm-package.json5 as @zkochan noted).

Lastly, & where I think I've seen the most discussion, if a template were ever going to also be a valid package.json (as it is currently spec'd), then the interface to consume/extend the values from that should be 1:1 with the names/values of the parent (aka. please don't add new terms like catalog & toolkit).

@zkochan
Member

zkochan commented Jul 6, 2023

Lastly, & where I think I've seen the most discussion, if a template were ever going to also be a valid package.json (as it is currently spec'd), then the interface to consume/extend the values from that should be 1:1 with the names/values of the parent (aka. please don't add new terms like catalog & toolkit).

How would you get the version spec of a dependency from a "template"? The current suggestion is to use a new catalog protocol. We can't just extend the dependencies field with all the dependencies of the template as we might not use all of those dependencies. With catalog we can pick just those that the project uses. And we can also put each one in the right dependencies field:

{
  "dependencies": {
    "is-odd": "catalog:",
    "is-even": "1.0.0"
  },
  "devDependencies": {
    "is-number": "catalog:"
  }
}
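For completeness, a hypothetical sketch of the producer side. The file name, the `catalog` field, and the versions below are illustrative assumptions only, since where catalog versions live is part of what this RFC is deciding:

```json
{
  "name": "@monorepo/my-catalog",
  "version": "1.0.0",
  "catalog": {
    "is-odd": "3.0.1",
    "is-number": "7.0.0"
  }
}
```

Under that sketch, a consumer's `"is-odd": "catalog:"` would resolve to `3.0.1` at install time.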

@wesleytodd

wesleytodd commented Jul 6, 2023

I do think we're mostly aligned on the problems.

Awesome @gluxon! I think one of the main things I was going to recommend for the RFC doc (and really any technical proposal) is to spend a bit more time in the write up on the problem statement. It really helps folks like me, who drop in and read it to understand what you are trying to do and not just make assumptions like I did.

For those in the future, I think we're fairly open here at PNPM and usually prefer up-front feedback, but acknowledge that's an unknown until you've hopped in. 🙂

This is awesome to hear and I felt the same from @zkochan in our DM, thanks!

I'm extremely worried about introducing new classes of security vulnerabilities to the NPM ecosystem.

I share your worry. It is actually one of the reasons I fear complexity in these types of things. It is critical for security that end users understand the system enough to make good decisions. When they do not understand it, they are likely to make mistakes which reduce security overall (for example blowing away package lock files instead of understanding how to fix them). So it was not exactly that I was sure this proposal introduced any clear issues, the fact that it was a bit hard to follow made me worried end users would also not be sure how it worked.

If I can try to convince folks why the delineation is valuable

FWIW, the stuff below this is stuff that I would have loved to read in the RFC itself. I can give specific feedback on these points in a way that the original RFC doc really didn't help much with. And honestly, I see in the comment you linked that you did address the "why not extends" but it was not included in the RFC doc. That is my bad for not reading it, but it makes it hard when the comment thread is so long, so maybe adding a section of the doc with links to key thread comments would help?

Given that, my current personal leaning is to remove extends from the initial version of the RFC and evaluate if it's necessary later down the line as users provide feedback.

Seeing that you addressed this, I now understand more about what you are thinking here. I still think that the complexity of the current proposal is too high, but I understand more how we got here.

I think most of the above commentary is about the RFC description and not the technical details of the RFC, so I will split that off for now from below which is my more technical responses.


How would that work with versions? I wouldn't want to inherit all dependencies from a parent package. We only need to include dependencies that are actually used in the code and pnpm cannot automatically detect that.

I think these are two separate concerns. We have always had the "extraneous dependency" problem and the reverse "implicit dependency" problem. There are tons of tools to help deal with this. I am not saying this is not a problem, just that letting that problem block a really ergonomic feature might not be in the best interest of the users.

I think a reasonable compromise here is to say "if you are extending a package.json you get everything in there" and then maybe you could introduce a new way of explicitly opting out of individual dependencies from there (just brainstorming: "lodash": false to exclude lodash from the referenced dependency list). But honestly I think it is more simple to start without that feature and if someone doesn't want the whole thing they need a new thing to extend from.

There are some cool monorepo managing tools like turborepo that will stop working if we will not have all the necessary fields inside a "static" package.json. I don't think it is realistic to expect them to calculate the package.json files using templates.

Not only is it not realistic, I think it is not desired. This was something that came up in the discussion in the npm rfc as well, and I think what sticks with me from that is that we need semantics which we all (tool maintainers) stick to with regards to package.json, but we have a bunch of ways to keep a flat package.json file while adding features to enhance the UX/DX.

Having to hunt down bugs that were introduced by an opaque setup like this

I am fully in support of @dominikg's comments above. I think finding a middle ground where the upkeep from a project maintainer is minimized but also doesn't affect the currently nice debugging ergonomics of a static package.json should be a top priority. And that specifically also applies to which dependencies get installed (considering that y'all are discussing breaking up the catalog part of this RFC).

Tools doing dependency analysis would be the ones i am most concerned about
Like what tools?

I assume this meant things like depcheck, but I have a bunch of things that couple to the static on disk package.json and not the lock file. There are a few reasons for this, but I am not sure all of them apply to pnpm like they have in the past for other package managers. Without a bit of a deep dive here I think this is just an open question of the impact.

Creating a net-new config/file for this type of work will also give you a lot of freedom & ensures end-users don't end up with dangerous partial states of their package.json if they didn't run x, y or z tool/command.

This is for sure one good way to augment package.json, but I am not sure a new file would remove the issue of "if they didn't run x tool/command". I think that is the risk of these types of features, and I think it was going to be the largest hurdle in the npm rfc as well. I am not sure I have a 100% solid answer on how to avoid this problem, but I think anything which makes "package.json more dynamic" will hit this problem and so we as a community of tool authors should be working toward a really good solution.

@darcyclarke

darcyclarke commented Jul 6, 2023

How would you get the version spec of a dependency from a "template"?

You'd load the configuration into the tool. No different than what will have to happen if you "extend" a package.json as is being discussed right now; except that creating a net-new file ensures end-users understand there's a new concept/API they are working with.

I'd imagine you could simplify the architecture/API here by just using env vars & walking/loading consumer modules which will contain whatever nuance they want.

Example

/root/.npmrc (or a JSON format)

[pnpm]
prefix="ws-"
version=...
author=...
license=...
deps=...
devDeps=...

/root/workspaces/foo/pnpm-package-template.js

const { prefix, version, author, license, deps, devDeps } = process.env.pnpm
module.exports = {
  name: `${prefix}foo`,
  version,
  author,
  license,
  "dependencies": {
    "is-odd": deps["is-odd"],
    "is-even": "1.0.0"
  },
  "devDependencies": {
    "is-number": devDeps["is-number"]
  }
}

/root/workspaces/foo/package.json

This is the static, generated output from "/root/workspaces/foo/pnpm-package-template.js"

The above is by no means prescriptive & you could change the config & consumer name/format etc..

If you were deadset on using package.json on the consumer side then you're going to have to create a whole host of extension mechanisms, which is why this RFC is so verbose today & will likely be hard to maintain the next time someone wants to extend/template any net-new prop.

@zkochan
Member

zkochan commented Jul 6, 2023

I think I like the solution suggested by Darcy. It is definitely better than extending everything and then excluding what is not needed. As we might end up excluding dozens of dependencies.

@zkochan
Member

zkochan commented Jul 6, 2023

We already have the .pnpmfile.cjs file, which has a few supported hooks. This could be a new hook in .pnpmfile.cjs. We'd have to have a preinstallation step which would install any dependencies required from .pnpmfile.cjs (this is a feature that has already been requested).

However, if this will be a js file, I am not sure how the pnpm add, pnpm remove, pnpm update commands will update the dependencies inside the js file.
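To make the hook idea concrete, here is a minimal sketch. The `readPackage` hook name does exist in .pnpmfile.cjs today; resolving a hypothetical `catalog:` specifier through it, and the hard-coded catalog object, are illustrative assumptions only:

```javascript
// .pnpmfile.cjs (sketch): fill in `catalog:` specifiers from a shared
// version table before pnpm resolves dependencies. The catalog source
// below is a made-up example; in practice it might be read from a file.
const catalog = {
  'is-odd': '3.0.1',
  'is-number': '7.0.0',
}

// Rewrite `catalog:` specifiers in the manifest to concrete versions.
function readPackage(pkg) {
  for (const field of ['dependencies', 'devDependencies']) {
    for (const [name, spec] of Object.entries(pkg[field] ?? {})) {
      if (spec === 'catalog:' && catalog[name]) {
        pkg[field][name] = catalog[name]
      }
    }
  }
  return pkg
}

module.exports = { hooks: { readPackage } }
```

This sidesteps the `pnpm add`/`pnpm remove` problem only partially: the consumer's package.json stays JSON, but the catalog source itself would still need machine-editable structure.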


EDIT

Wait a moment. This now starts to look the same as a tool that I have already created: https://github.com/pnpm/meta-updater

That tool works because it also uses info from package.json. So it reads the package.json, makes some changes to it and writes it back.

@darcyclarke

@zkochan I don't think you can get around the fact that pnpm will have to understand when it is modifying the deps or config of a templated/extended package/workspace. Any modification to the root package.json or config should likely trigger a walk to update the child workspaces. Running a command to update a specific workspace's deps will need to resolve where the dep/config should live & whether it's best to be nested or extended from root config & package.json regenerated. It does look like some of your work on meta-updater is relevant here.

@gluxon
Member Author

gluxon commented Jul 7, 2023

Following up to #3 (comment), the "Catalogs" concept has now been moved back to #1.

I'm appreciative of the recent engagement from @dominikg, @wesleytodd, @styfle, and @darcyclarke. If any of you have time, I would love all of your continued feedback on catalogs over in RFC #1.

It seems like we're aligned from the PNPM side that catalogs are the feature we're most excited about. Templates were an attempt to go a bit further and solve some other common PNPM feature requests, but I recognize we have some more thinking to do.


To briefly respond to everyone's comments since last time:

  • @darcyclarke I've incorporated your feedback around not consuming package.json files in the catalogs RFC.

  • @dominikg Let me get back to you on the concerns of the new catalog: version specifier protocol later today.

  • @wesleytodd Needing to explain rationale and motivations better was great feedback. You're right — I agree it could have been done much better. Thanks. I think the catalogs RFC may be going in a different direction than the generalized package.json definition sharing work you've previously been involved in, but if you have time to give feedback on whether the catalogs RFC makes sense from a motivations, rationale, or implementation standpoint, please do!

@gluxon
Member Author

gluxon commented Jul 9, 2023

Responding to @dominikg. Prior comments: #3 (comment), #3 (comment), #3 (comment), #3 (comment)


I still believe it is important to write the "static" package.json to disk.

Tools doing dependency analysis would be the ones i am most concerned about, but i am sure there are tools and workflows that parse other fields and get it wrong in subtle ways if something is missing or not aligned with what the template renders to the published version.

To recap, I asked in #3 (comment) if static package.json files written to disk were still necessary if we were to limit the proposal to a catalog: specifier and highly-pnpm specific settings like pnpm.patches.

I think it's worth noting that PNPM would not be the first package manager to introduce a new version specifier protocol.

  • The workspace: protocol was initially a Yarn 2 feature, which has since seen extensive usage. It has similar on-disk and publish time distinctions. Searching GitHub for workspace:* in package.json shows 132K results as of July 8th, 2023.
  • Looking over the package.json spec on npmjs.com, there's evidence of new specifiers being added over time. For example, Local Paths have a mention of "version 2.0" in terms of availability.
  • The current Yarn docs list many protocols. Among the ones not in npm, I see exec:, portal:, patch:.

Regarding dependency analysis tools parsing package.json and getting it "wrong", this can happen today. NPM and pnpm both support overrides with Yarn having the same feature called resolutions. The overrides feature means version specifiers in package.json can be a complete lie. Dependency analysis tools generally need to look at published package.json files or follow the node_modules resolution specification to be correct. (I do acknowledge this particular fact is fairly unfortunate and a loss for security. 😞)
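As a concrete illustration of that point (the package name and version below are made up), a root manifest like this forces the resolved version of `foo` regardless of what any package.json in the tree declares:

```json
{
  "pnpm": {
    "overrides": {
      "foo": "1.0.0"
    }
  }
}
```

A dependency analysis tool that only reads a nested package's `"foo": "^2.0.0"` would report the wrong version; it has to consult the lockfile or node_modules to be correct.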

You're right to be concerned about existing tools breaking. However, maintaining 100% compatibility with all third-party tools would severely limit any ability to improve developer experience. I think in the catalog: case,

  1. New version specifier protocols have well-established precedent, and
  2. There are other, more correct ways for dependency analysis tools to gain this information (given features like overrides already exist).

If the main concern with it is that it increases (or rather does not reduce) the amount of merge conflicts, is there a way to tackle these differently?

Definitely possible! I'm sure the community would be interested in other ideas you have to reduce merge conflicts. Another idea in the past was to break up parts of the pnpm-lock.yaml file (https://github.com/orgs/pnpm/discussions/4324), but that was outside of the scope of the package.json angle.

pnpm has access to additional information from the lockfile and could look at both ends of the merge as well. So a pnpm resolve-merge-conflicts command would be able to do a better job than git alone.

A lockfile merge driver has been explored in the past, but the main problem is that it's only a partial solution: GitHub (rightfully so, I believe) doesn't run custom merge drivers. This means merge conflicts will continue to slow down developers when they're ready to merge their pull requests.

To give more context, I do believe reducing merge conflicts is an important concern. PNPM has been chosen by many organizations for its ability to scale well to monorepos with thousands of packages. In those situations, package.json files and pnpm-lock.yaml merge conflict very frequently.


@dominikg If you have the time, I would appreciate any thoughts or alternatives you have around catalogs in #1. 🙂

@gluxon
Member Author

gluxon commented Jul 9, 2023

I've been going back to the drawing board over the last few days. First, I believe what's missing is an explanation of how catalog and toolkit are solving a somewhat uniquely pnpm concern.

Isolated Mode

I believe pnpm's semi-strict node_modules structure is its most innovative and best feature. Many folks in this conversation were heavily involved in bringing this feature to npm under the "isolated mode" name. (A big thank you to those folks.)

Isolated mode is fantastic. It means phantom dependencies, where packages use dependencies they forgot to declare in package.json, are caught. PNPM encourages its users to do the correct thing with its default isolated linker mode.

Declaring Dependencies: Do, or don't?

However, properly declaring dependencies across large monorepos with thousands of packages is actually surprisingly difficult, especially when there are hundreds of humans each working on their own feature. You end up with a dependency graph that's full of duplicates, package.json files that merge conflict with each other, and corrupt pnpm-lock.yaml files on the main branch.

Thus, pnpm has two groups of users that this RFC was attempting to make happy:

  1. People that want to properly declare their dependencies: Catalogs

    The catalog feature is aimed at solving the problems above for this group. Catalogs reduce both duplicates and package.json merge conflicts, and I'm hopeful it'll one day reduce the likelihood of corrupt pnpm-lock.yaml files after a bad merge.

  2. People that intentionally do not want to declare dependencies: Toolkits

    The toolkit template flavor was aimed towards appeasing these users by allowing them to declare one shared "toolkit" package. That shared toolkit would then bring its dependencies into the node_modules visibility of the parent package. This mimics peer dependencies in package managers that auto-install them, but peer dependencies are intended for a different purpose (plugins).

NOTE: In case it's relevant, I'm personally in the first camp. The bias in my writing might be showing a bit.

Focus

To recap, the features that are important to continue thinking about are catalogs and toolkits. My theory for why they're so frequently discussed in the pnpm repository is that both address side effects stemming from pnpm's default isolated linker mode.

The most important parts of this RFC are the catalog template and the toolkit template (which Dominik suggests renaming). These are among the most requested features in the pnpm repository.

Proposal for Next Steps

Looking forward, I propose we take any and all templating out of our solution space. This includes templating in-memory, at publish, and onto the disk.

Given that, let's look at each of the problems the template flavors attempted to address and discuss our approach to each.

Catalog

This has been moved back to #1. Mentions of templating package.json files have been removed, and the implementation now focuses on the new version specifier protocol.
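
For illustration, the direction in #1 looks roughly like the following (a sketch assuming the syntax proposed there; exact field names may still change). The workspace root defines the catalog:

```
# pnpm-workspace.yaml
catalog:
  react: ^18.2.0
```

and a workspace package references it through the new specifier protocol:

```
{
  "dependencies": {
    "react": "catalog:"
  }
}
```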

Authoring

The author, license, funding, repository, bugs, contributors fields do not change very often, so the need to synchronize them or keep them up to date doesn't come up as often as it does for dependencies.

I would personally vote that we stop attempting to solve this in pnpm. There are already many existing tools to synchronize these fields by editing package.json files. I don't see value in having pnpm add another way to do this.
There's also the LICENSE file issue @dominikg called out.

Scripts

I would also vote we stop attempting to solve this in pnpm. One of my favorite aspects of package.json scripts is that I can open them up and see what commands I can run. This property is already somewhat lost in this RFC.

Users that need to share a set of commands across multiple packages already have a simple workaround: create a devDependencies binary and execute subcommands off of it.

pnpm exec my-scripts <name>
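
A sketch of that workaround, with hypothetical package and command names: the shared package exposes a bin, and each consumer declares it as a devDependency.

```
// package.json of the shared "my-scripts" package (names hypothetical)
{
  "name": "my-scripts",
  "version": "1.0.0",
  "bin": {
    "my-scripts": "./cli.js"
  }
}

// package.json of a consuming package
{
  "devDependencies": {
    "my-scripts": "^1.0.0"
  },
  "scripts": {
    "lint": "my-scripts lint"
  }
}
```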

Compatibility and Patches

Let's make these explicit remote references rather than templated.

{
  "pnpm": {
    "packageExtensions:remote": [
        "@example/patches@0.1.0"
    ],
    "packageExtensions": {
      "react-redux": {
        "peerDependencies": {
          "react-dom": "*"
        }
      }
    }
  }
}

The @example/patches package would export a definition for pnpm to consume.

{
  "name": "@example/patches",
  "version": "0.1.0",
  "exports": {
    "pnpm-packageExtensions-v1": "packageExtensions.json"
  }
}
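
The referenced packageExtensions.json could then carry the same shape as today's inline field, e.g. (contents hypothetical):

```
{
  "react-redux": {
    "peerDependencies": {
      "react-dom": "*"
    }
  }
}
```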

I think I actually prefer this syntax over the existing RFC since it's more explicit.

Toolkit

This is the hard one, and why my comment goes on a detour about isolated mode. At the heart of this, what we're currently thinking of as toolkits is an isolated mode "escape hatch".

In package managers with a flat node_modules hierarchy (i.e. npm's default), every dependency is already a toolkit package. This is because all import(...) calls can resolve any dependency, no matter how deep in the dependency graph, under the traditionally flat node_modules hierarchy. (By "flat", I'm referring to a setup where there's generally one node_modules dir at the root of a monorepo.)
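
To make the phantom-dependency mechanics concrete, here's a toy model (not pnpm's actual resolver, and all paths are hypothetical) of Node's upward node_modules walk, showing why a hoisted layout lets a package import something it never declared while an isolated layout does not:

```javascript
// Toy model of Node's upward node_modules walk (POSIX-style paths for simplicity).
function resolvePackage(importerDir, pkgName, existingDirs) {
  let dir = importerDir;
  while (dir !== '' && dir !== '/') {
    const candidate = `${dir}/node_modules/${pkgName}`;
    if (existingDirs.has(candidate)) return candidate;
    dir = dir.split('/').slice(0, -1).join('/');
  }
  return null;
}

// Hoisted layout: lodash was pulled in by some transitive dependency but
// lands at the workspace root, so *any* package can import it (a phantom dep).
const hoisted = new Set(['/repo/node_modules/lodash']);

// Isolated layout (pnpm's default): lodash is only visible to the package
// that actually declared it, so the phantom import fails to resolve.
const isolated = new Set(['/repo/node_modules/.pnpm/lib@1.0.0/node_modules/lodash']);

console.log(resolvePackage('/repo/packages/app', 'lodash', hoisted)); // '/repo/node_modules/lodash'
console.log(resolvePackage('/repo/packages/app', 'lodash', isolated)); // null
```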

First, I'm hoping we can start a more thorough discussion on the pnpm side for why users want to selectively escape isolated mode for certain packages. It's possible we can learn a bit more. If the use case is valid, I think we need to consider a more formal concept for exposing subdependencies.

A similar formal concept for "exposing" parts of a package today would be exports:

{
  "exports": {
    ".": "./index.js",
    "./submodule.js": "./src/submodule.js"
  }
} 

You could imagine that if we absolutely had to solve this problem, one (really unpleasant) formalized concept would be:

{
  "dependencies": {
    "something-my-consumers-can-import": "0.0.0"
  },
  "dependenciesExports": [
    "something-my-consumers-can-import"
  ]
} 

But let's not do that. The above is to illustrate an alternative given no other options.

I would be curious to hear from folks that have worked on npm: How would you respond to folks asking for an isolated mode selective escape hatch?

@nathanhammond

nathanhammond commented Jul 10, 2023

👋 from Turborepo!

  • Dependencies: we resolve pnpm's (and npm's, and yarn's, and berry's) lockfile, not package.json. (This was the work of @chris-olszewski and he can provide color commentary if necessary.) We basically don't care what happens here as long as it gets serialized into the lockfile.
  • Scripts: we execute scripts via <packagemanager> run foo. That will continue to work separate from whatever design, but we need a static boolean "exists" or we have to attempt execution and detect failure at each node in a workspace graph. Ideally we would be able to statically determine this without separately implementing "pnpm's script lookup algorithm." We also invalidate on script command changes, so "boolean" here is a bit of a specificity decrease unless paired with a unique specifier.
  • Complete non-existence of package.json would break a billion tools, not just Turborepo. Lots of find-up of package.json exist in tools. I would recommend this as a non-goal for ecosystem compatibility reasons.

The term that I use to describe these goals is conformance. There are two general strategies for this:

  • conformance due to single source of configuration (see: XKCD 927)
  • conformance due to programmatic review of inputs

After years of experience in the infra space, I feel comfortable asserting that "conformance via single source of configuration" is not going to work outside of trivially-sized projects. Even if it is an option, people will fail to properly extend from the source of truth. 🤦‍♂️

Given that, these features would implicitly be "shortcut" ways for (human) authors to quickly assert conformance, or (less-complex) tooling to assert conformance. This RFC is essentially competing with external tool adoption for conformance.


Isolated mode: obviously correct, and weird to conflate with conformance. "partial isolated mode opt out" should be split from this conversation. I would not attempt to solve that using any of the strategies proposed in this RFC.

@dominikg

A lockfile driver has been explored in the past, but the main problem is that it's a partial solution since GitHub (I believe rightfully so), doesn't support lockfile merge drivers. This means merge conflicts will continue to slow down developers when they're ready to merge their pull requests.

I don't think that missing native support in GitHub is an argument against this feature. People using it can do so on their command line with native git and pnpm and push the result.
I assume repositories that host thousands of packages come with their own set of management scripts anyway, so adding opt-in features to pnpm that help maintain these and implement better automation for some tedious tasks is great, but these repositories are rare. Is there any information about the average number of packages in a pnpm monorepo? My gut says it's less than 10.

@jakeleventhal


To piggy back off of this, I came to this thread via https://github.com/orgs/pnpm/discussions/5974. I agree with @nathanhammond. I also think that this RFC might be too ambitious when just having a shared, root-defined package version to be reused would do most of the legwork here.

root package.json:

{
  "sharedPNPMDependencies": {
    "react": "18.12.0"
  }
}

some package's package.json:

{
  "dependencies": {
    "react": "workspace" // resolves to 18.12.0
  }
}

This would all resolve to a common pnpm-lock file but allow you to quickly pull out packages. The whole concept of packages of packages and imports/exports seems like way overkill in my opinion.

@gluxon

gluxon commented Jul 10, 2023

Responding to @nathanhammond and @jakeleventhal: #3 (comment), #3 (comment)


👋 Hello! I'm a big fan of everything folks at Vercel are working on. Thanks to you and your team for improving open source.

After years of experience in the infra space, I feel comfortable asserting that "conformance via single source of configuration" is not going to work outside of trivially-sized projects. Even if it is an option, people will fail to properly extend from the source of truth. 🤦‍♂️

@nathanhammond The "conformance" term is useful nomenclature here. Thanks for bringing it into this conversation.

I'm also leaning away from conformance of authoring fields and scripts: #3 (comment).

I think usage of the catalog to declare dependencies consistently might still be a useful conformance tool though. Namely because usages of a dependency in the default catalog can be enforced easily in pre-merge CI checks. In that case, people that "fail to properly extend from the source of truth" can be given a nudge towards the right direction. 🙂

Am I understanding the "conformance" idea correctly? Do you see concerns of "conformance via single source of configuration" for #1? RFC #1 (Catalogs) currently doesn't enforce usage of the catalog, but I think it's a useful followup addition.
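
As a sketch of what such a pre-merge nudge could look like (a hypothetical check, not part of RFC #1): flag any dependency that pins a concrete version even though the default catalog defines one.

```javascript
// Hypothetical pre-merge CI check: flag dependencies that pin a concrete
// version even though an entry for them exists in the default catalog.
function findCatalogViolations(pkg, catalog) {
  const violations = [];
  for (const field of ['dependencies', 'devDependencies']) {
    for (const [name, spec] of Object.entries(pkg[field] ?? {})) {
      if (name in catalog && spec !== 'catalog:') {
        violations.push(`${field}.${name}: expected "catalog:", got "${spec}"`);
      }
    }
  }
  return violations;
}

const catalog = { react: '^18.2.0' };
const pkg = { dependencies: { react: '18.1.0', 'left-pad': '^1.3.0' } };
console.log(findCatalogViolations(pkg, catalog));
// [ 'dependencies.react: expected "catalog:", got "18.1.0"' ]
```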

Isolated mode: obviously correct, and weird to conflate with conformance. "partial isolated mode opt out" should be split from this conversation. I would not attempt to solve that using any of the strategies proposed in this RFC.

Agree. We can start a new discussion on this elsewhere.

I also think that this RFC might be too ambitious when just having a shared, root-defined package version to be reused would do most of the legwork here.

@jakeleventhal Also agree! I think we're going to do what's essentially the suggestion above for catalogs in #1.

@gluxon

gluxon commented Jul 10, 2023

Responding to @dominikg #3 (comment)

A lockfile driver has been explored in the past, but the main problem is that it's a partial solution since GitHub (I believe rightfully so), doesn't support lockfile merge drivers. This means merge conflicts will continue to slow down developers when they're ready to merge their pull requests.

I don't think that missing native support in github is an argument against this feature. People using it can do so on their commandline with native git and pnpm and push the result.

I do agree that it's useful to have a merge conflict driver in addition to anything we do here. Apologies for the unclear wording in the earlier comment, but I would do both. Any formatting changes or data deduplication we can do to prevent merge conflicts in the first place is valuable. @zkochan and I have made prior improvements to reduce pnpm-lock.yaml merge conflicts, and that's been well-received every time.

My earlier argument was that a merge conflict driver doesn't completely replace the need for a catalog: version specifier protocol. I would agree the reverse is true.

For reference, I think there is a working merge conflict driver available today: https://github.com/pnpm/merge-driver
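
For context, a merge driver is opt-in per clone: it's registered in git config and attached to the lockfile via .gitattributes. A minimal sketch (the driver name and command below are illustrative; see pnpm/merge-driver for a real implementation):

```
# .gitattributes (committed to the repository)
pnpm-lock.yaml merge=pnpm-lock

# run once per clone to register the driver
git config merge.pnpm-lock.name "pnpm lockfile merge driver"
git config merge.pnpm-lock.driver "pnpm-merge-driver %A %O %B %P"
```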

I assume repositories that host thousands of packages come with their own set of management scripts anyways, so adding opt-in features to pnpm that help maintaining these and implement better automation for some tedious tasks is great, but these repositories are rare. Is there any information about the average number of packages in a pnpm monorepo? My gut says its less than 10.

I suspect you're correct. My gut is the average number of packages in a pnpm monorepo is <10. Similar to the earlier comment, I'd still be interested in helping repositories that host thousands of packages in a way that doesn't make it more difficult for the average case.

@gluxon

gluxon commented Jul 10, 2023

I'm appreciating all the feedback. The most contentious points folks have called out are around importing/exporting package.json files and templating. Can those comments be held back for now? My current proposal is to not do this: #3 (comment)

Waiting for @zkochan's thoughts around the suggested alternative solutions, which don't involve sharing package.json files.

@gluxon

gluxon commented Jul 10, 2023

Caught up with @zkochan in a DM. We're going to close this RFC. 🎉

Thanks to everyone who helped axe it. 🪓 Your feedback prevented a bad feature from making it onto the Internet. The world is a better place thanks to you. 🙇🏻‍♂️

Catalogs will continue in #1. All other features we were attempting were lower priority and won't be addressed for now.

@gluxon gluxon closed this Jul 10, 2023
@gluxon gluxon deleted the 0003-templates.md branch July 10, 2023 23:25
@zkochan

zkochan commented Jul 10, 2023

@gluxon thanks for working on this RFC. You did a good job both with the content and communicating with the reviewers. We have definitely learned a lot in the process.
