-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Javascript backend - run node package install once per workspace #20826
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Happy to see contributions to the JS backend.
- There's some unused fields / code that should be removed following the removal of the rule calls / package caches,
- There ought to be at least one test case that support the implementation. I suspect all tests pass because of how much heavy lifting the package managers already do. At least exposing the integrity file case as passing would be a good case to cover.
- Did you consider Install entire node project when creating node_modules digest, subset in sandboxes #19236? I'm fine with this change as is because it is a path forward, but I'm curious if you spent any thought on it.
This README is mainly for understanding the current backend functionality for PR review purposes. | ||
Plan to migrate much of this content to user docs. | ||
|
||
## Definitions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only skimmed through this README, LMK if you want it thoroughly checked.
### Dependencies between packages | ||
|
||
In the example above, for an installation of 'package-a' to also install 'package-c', the following things must in place: | ||
- Both packages must be in a workspace - i.e. the directories added to a common parent package.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed true, but I think the backend is fairly close to being able to serve as a workspace manager as well. Whether that is complexity pants is ready to take on: I doubt.
|
||
No import, package.json dep: | ||
- node_modules missing source code for 1st party deps, everything else present | ||
- Probably in this case, it would be nice to generate a warning, e.g. "Package-a source code has a dependency on package-b but package-b is not present in package-a's listed dependencies in package.json". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but I'd also expect linters to detect unused deps more effectively. There's also a plethora of plugin-style packages that hijack instead of introducing explicit exports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true - that would work.
|
||
If any of the package.json files in the workspace is modified, this causes a reinstall, as the process for 'Preparing configured default npm version.', (and also the subsequent installation process) has these files as an input. | ||
|
||
It may be possible to limit this by only including the package.json files for packages' dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not when you also install the entirety of the workspace each time.
If pants want to cache these results and also support the fields presence in package.json
, we'd need to parse and re-write relevant subsets of that file to the processes that require package.json
contents, as that is the "pants way".
src/python/pants/backend/javascript/subsystems/package-lock.json
Outdated
Show resolved
Hide resolved
Thanks for reviewing so quickly! It will probably take me a while to go back through everything.
Only briefly. I've just been trying to find a path forward mainly while trying to understand the backend at the same time. If I'm understanding that correctly, it's suggesting to, instead of running It seems to me there would be a couple of disadvantages:
|
1c5c8b7
to
fb8e41d
Compare
I've reworked the package integration test to cover the case of importing another workspace package, which triggered the integrity file conflict. In doing so, I found a bug, which held me up for a while. A newer version of pnpm (>8) than the default for node 16 (6.11.0) is needed for workspaces support. I found that the version specified on the subsystem options was not being used. Although it was being installed in a separate sandbox, the output was not captured, and on the process invocation for I've left a couple of comments on the fix for now - I'm just not sure if I've missed anything re. the original intent for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the new bug you've found is real, I would argue keeping it to a separate issue and pull request is cleaner. I can look into this.
To make your integration test pass I believe removing src/js
from the path will work as a temporary.
w.r.t to the pnpm version, I think maybe best we can do is bump the default? It is unfortunate, but I'm not keen on validating tool versions for features. I believe pants should aim to please instead of throwing errors that might not be relevant for a particular usecase. Users can use the nodejs
subsystem and its package managers without writing javascript, after all.
Edit:
Honestly, I think just bumping to nodejs v18 should do the trick.
@tobni I've taken a stab at updating the node version and removing However, I still have one more problem when the test is run without explicitly providing package manager versions (e.g. In the file layout scenario used in the test, the workspace root shouldn't contain a package.json (for pnpm and yarn), and not necessarily a package manager. To address this for now, I modified the logic in prepare_corepack_tool() to supply the package manager name, but seems undesirable as this would override the package.json spec if it was provided. However, in my testing it seems that if a version is provided in package.json, then it is used: i.e. @rule(desc="Preparing Corepack managed tool.")
async def prepare_corepack_tool(
request: CorepackToolRequest, environment: NodeJSProcessEnvironment, nodejs: NodeJS
) -> CorepackToolDigest:
version = request.version or nodejs.package_managers.get(request.tool) # <-- version here is that from package.json and fed into CLI args
if not version and request.input_digest == EMPTY_DIGEST:
raise ValueError(f"Could not determine tool version for {request.tool}.")
tool_spec = f"{request.tool}@{version}" if version else request.tool
tool_description = tool_spec if version else f"default {tool_spec} version" so this doesn't actually matter. This goes back to my earlier comment above on the same section of code:
Thoughts on what to do here? Aside from this, there are some failing tests, I think all related to package-lock files needing updating after the node updates. |
Did you try this? To clarify, the environment variable COREPACK_HOME is not correctly set when preparing the tool. This is why the
works.
Had a little look through and here's what I gather: I was fairly certain that corepack vendored their own releases of package managers so that e.g I think that desired behaviour here is to "just work" if we can and allow configurability that allows larger repos to continue to function. I.e the rules should be
I think what is actually missing is case 3, which would be different from the wip you've pushed to remote. And I agree, as you've spotted, there is no reason to provide a digest because the relevant bit (if present) to achieve 1 is already parsed earlier, so there's no need for corepack to do it again! I suspect the addition of Lastly: |
Thanks! Sorry have been flat out this week On 6 Jun 2024, at 20:08, Tobias Nilsson ***@***.***> wrote:
I'm going to address the corepack issues via #21020 to cut out the noise from the changes herein.
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>
|
@riisi Headsup that nodejs18 PR was just merged :) |
@tobni Updated this now following @krishnan-chandra 's changes. |
src/python/pants/backend/javascript/nodejs_project_environment.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks!
Fixes #19236 and fixes #19235.
I've been poking around at this backend for a while trying to understand it. I think I'm most of the way there...
I wanted to raise this PR now for feedback - I think it's fairly complete as is - but I'm not sure if I have any blind spots in both understanding the original design intent and how it's being used.
There is a lot more functionality than I expected going in - huge props to @tobni for that - it would be great to fix the remaining issues and get some docs out so it can be used more & we can get more feedback.
Current process
For each first party package, the backend runs 'npm install' on its first party dependency packages. This is a recursive process.
Each
npm install
command is executed in the context of that package, generating a 'node_modules' directory containing 1st and 3rd party deps (in separate directories). Source code (fromjavascript_sources
targets) is then copied in.The output of
node_modules
from any 1st party dependency installs is merged to formnode_modules
for the package.Problems
Per #19235, newer versions of package managers generate integrity files in each node_modules directories. These cause conflicts on merge.
Secondly, and more critically, it seems (I've not dug too much into this, but some past discussion here), that each
npm install
is effectively installing all transient deps.E.g. in the case of a tree like package-A -> package-B -> package-C -> package-D ... I think while the intent is for package-A install to only install direct dependents, it's installing all transitive, so there's a lot more IO than intended. Effectively, it is merging ((package-A + trans deps) + (package-B + trans deps) + (package-c + trans deps)...))
New approach
Perform one install for the package in the workspace context.
The install command is
npm install --workspace package-a
.The install command needs to have visibility of the root package.json, which contains the 'workspaces' configuration, as well as any dependent package.json. Currently this is all package.json in the project.
The output is a single node_modules directory at the project level (assuming default package manager config - hoisting options could be configured differently).
Advantages
Simpler and is similar to how the user would use the package manager natively to install in a mono-repo setup.
The backend is simpler to troubleshoot as there is only one sandbox per install.
We delegate more of the package resolution process to the package managers.
Performance should be theoretically better given the issue above - but I've not tested. If anyone has anything handy for testing, please let me know.
Disadvantages
We are not caching each module install within Pants, which I'm guessing was the original intent. But I'm not sure we're gaining much from this approach currently.