TS: Handle monorepos by rewriting package.json#2619
TS: Handle monorepos by rewriting package.json#2619semmle-qlci merged 23 commits intogithub:masterfrom
Conversation
| Path symlinkPath = rootNodeModules.resolve(name); | ||
| if (!Files.exists(symlinkPath)) { | ||
| Files.createDirectories(symlinkPath.getParent()); | ||
| Files.createSymbolicLink(symlinkPath, file.getParent()); |
There was a problem hiding this comment.
Is this going to work on Windows? StackOverflow suggests that one needs special permissions to create symbolic links on Windows. Not an issue for LGTM.com, of course, but still relevant for LGTM-E.
There was a problem hiding this comment.
That's a good question. I'm having some difficulty getting Java to run on my Windows VM today but I'll try to test it this week.
FWIW yarn workspaces use symlinks and yarn manages to create them just fine on Windows, so maybe can use that as a fallback.
There was a problem hiding this comment.
Turns out yarn workspaces use "junctions", a type of symlink on NTFS which doesn't require admin. The Java API doesn't have a method for creating them.
Before going any further with the symlink approach, I'll experiment a bit with providing in a virtual file system in the compiler host we pass to the TypeScript compiler.
Could you briefly explain how this is enforced? |
By having an undocumented environment variable you need to set ( |
542e338 to
7e8fb14
Compare
|
I've pushed a revised version that no longer relies on symlinks, and doesn't pollute the original source checkout. We now install dependencies in |
|
@erik-krogh do you think you could help review this? @max-schaefer I'd still be interested in your take on this, if you have time, though the PR has grown quite a bit now. |
erik-krogh
left a comment
There was a problem hiding this comment.
I have not tried to run your code, but your idea of hooking into the TypeScript compiler looks good.
(Your comments really helped me understand the code).
This review was mainly me trying to understand all the parts of your code.
I'll take a second look later.
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Outdated
Show resolved
Hide resolved
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Outdated
Show resolved
Hide resolved
Co-Authored-By: Erik Krogh Kristensen <erik-krogh@github.com>
erik-krogh
left a comment
There was a problem hiding this comment.
LGTM, but I still haven't run the code.
Before I approve: Could you post a few links to some monorepos, then I'll confirm that it works on them.
We should probably get another pair of eyes on this (@esbena?).
| let redirected = this.redirectModuleName(moduleName, containingFile, options); | ||
| if (redirected != null) return redirected; | ||
| return ts.resolveModuleName(moduleName, containingFile, options, host, resolutionCache).resolvedModule; |
There was a problem hiding this comment.
So, ts.resolveModuleName uses a cache, and this.redirectModuleName doesn't (sometimes).
I'm just wondering whether there is a reason for ts.resolveModuleName to use a cache beyond performance.
Because if there is, then we should probably add a similar cache.
From my own look into how resolveModuleNames is implemented in TypeScript it seems like the cache is used purely for performance.
There was a problem hiding this comment.
Yes, I'm pretty sure the cache is just for performance.
The cache makes sense, because it's a lot of work to locate a package in the general case. The node_modules folder can be in a bunch of places, and there's both the actual package and its companion in @types to look for, and TypeScript also have to parse the package.json file to look at its typings field, and I'm sure there's more.
In our case, the packageEntryPoints map makes it a lot easier though, as it mostly does what the cache would do anyway. We do have to a try a few file extensions when looking for the file on disk, which I suppose we could cache as well, but I'm not sure it matters.
I've been testing on these three, though it would be great if you can find more: |
max-schaefer
left a comment
There was a problem hiding this comment.
Fantastic work! Looks great overall. I have a few very minor suggestions and comments, but none of it is blocking.
| console.warn('TypeScript: reported ' + diagnostics.length + ' semantic errors.'); | ||
| for (let diagnostic of diagnostics) { | ||
| let text = diagnostic.messageText; | ||
| if (typeof text !== 'string') { |
There was a problem hiding this comment.
Would it be worth adding a text && check? I'm sure it won't happen in practice, but since all we do with text is to log it the console it doesn't seem worth crashing the extractor if it should happen somehow.
| import * as ts from "./typescript"; | ||
|
|
||
| /** | ||
| * Mapping from the source root to the virtual source root. |
There was a problem hiding this comment.
Perhaps say "real source root" here, since that's the terminology used below? Also, would it perhaps be worth briefly outlining what these two roots are and how they relate to each other? (Perhaps something like the comment on line 12 below.)
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Outdated
Show resolved
Hide resolved
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Outdated
Show resolved
Hide resolved
javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java
Outdated
Show resolved
Hide resolved
| propsToRemove.add(packageName); | ||
| } else { | ||
| // Remove file dependency on a package that don't exist in the checkout. | ||
| String dependecy = getChildAsString(dependencyObj, packageName); |
There was a problem hiding this comment.
| String dependecy = getChildAsString(dependencyObj, packageName); | |
| String dependency = getChildAsString(dependencyObj, packageName); |
| public static String getScratchDir() { | ||
| String env = Env.systemEnv().get(CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR_ENV_VAR); | ||
| if (env == null) { | ||
| throw new UserError(CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR_ENV_VAR + " must be set"); |
There was a problem hiding this comment.
Will this work on LGTM? I don't think it sets that environment variable, does it? However, LGTM_WORKSPACE looks like a suitable alternative and is used by a few of the other autobuilders.
There was a problem hiding this comment.
Looking at the other AutoBuild scripts, it seems that LGTM_WORKSPACE is set to whatever value is already in CODEQL_EXTRACTOR_LANG_SCRATCH_DIR.
There was a problem hiding this comment.
I'm not sure I understand. Do you mean that LGTM sets LGTM_WORKSPACE to CODEQL_EXTRACTOR_LANG_SCRATCH_DIR? But who sets the latter? LGTM doesn't use CodeQL.
| /** | ||
| * Extracts the package name from the prefix of an import string. | ||
| */ | ||
| const packageNameRex = /^(?:@[\w.-]+[/\\])?\w[\w.-]*(?=[/\\]|$)/; |
There was a problem hiding this comment.
Can import paths contain double slashes?
There was a problem hiding this comment.
Indeed they can and it seems TypeScript will happily resolve them. We should handle it now.
Co-Authored-By: Max Schaefer <54907921+max-schaefer@users.noreply.github.com>
|
NodeJS ran out of memory trying to extract But I got it to work on |
915bb0f to
3c1cbce
Compare
|
@erik-krogh could you review the last two commits?
Hm, that's unfortunate. The default memory limit of 1G might not be enough for this project when dependencies are included, but I doubt it's an issue on LGTM.com where the Node.js process has 4G. Maybe we should raise the default limit anyway, but that can be done in a separate PR. |
erik-krogh
left a comment
There was a problem hiding this comment.
@erik-krogh could you review the last two commits?
👍 from here.
NodeJS ran out of memory trying to extract angular/angular.
Hm, that's unfortunate. The default memory limit of 1G might not be enough for this project when dependencies are included, but I doubt it's an issue on LGTM.com where the Node.js process has 4G. Maybe we should raise the default limit anyway, but that can be done in a separate PR.
Could you do a local test that angular/angular works with 4GB before you merge?
Yep, works for me, installation+extraction took 19 minutes. |
Extends the dependency-installation support to handle monorepos, that is, repositories containing multiple packages. This also includes repos containing packages that haven't been published to NPM, such as our own
vscode-codeql.Update: it now works as follows:
SCRATCH_DIR, removing references to packages that exist in the same repo.yarn installfor each package.json file inSCRATCH_DIR.SCRATCH_DIRor to a package that exists in the same repo.Original (obsolete) PR description
It works as follows:For eachpackage.jsonfile with name, install a symlink to it in<source_root>/node_modules.Delete all mentions of the above package.json files fromdependenciesetc.Rewrite themainfield to./srcso TypeScript can find the contents of the package in terms of its original source files. Typicallymainrefers to a non-existing file such as./out/index.jswhich won't exist unless we run the full build. (Normally TypeScript would use the.d.tsfiles in theoutdirectory)Runyarn installfor eachpackage.jsonfile, as before.Extract TypeScript files.Restore originalpackage.jsonfiles. We don't clean upnode_modulesfolders as there isn't a concrete reason for doing so at the moment.It's a bit dirty to rewrite files on disk, but keep in mind that this is only intended to use on LGTM. Users of thecodeqlCLI should install dependencies themselves before extracting.