Skip to content

TS: Handle monorepos by rewriting package.json#2619

Merged
semmle-qlci merged 23 commits intogithub:masterfrom
asger-semmle:ts-monorepo-deps
Feb 5, 2020
Merged

TS: Handle monorepos by rewriting package.json#2619
semmle-qlci merged 23 commits intogithub:masterfrom
asger-semmle:ts-monorepo-deps

Conversation

@asgerf
Copy link
Contributor

@asgerf asgerf commented Jan 10, 2020

Extends the dependency-installation support to handle monorepos, that is, repositories containing multiple packages. This also includes repos containing packages that haven't been published to NPM, such as our own vscode-codeql.

Update: it now works as follows:

  • Rewrite package.json files and save them under SCRATCH_DIR, removing references to packages that exist in the same repo.
  • Run yarn install for each package.json file in SCRATCH_DIR.
  • Override hooks in the TypeScript compiler host to resolve packages in SCRATCH_DIR or to a package that exists in the same repo.

Original (obsolete) PR description

It works as follows:

  • For each package.json file with name, install a symlink to it in <source_root>/node_modules.
  • Delete all mentions of the above package.json files from dependencies etc.
  • Rewrite the main field to ./src so TypeScript can find the contents of the package in terms of its original source files. Typically main refers to a non-existing file such as ./out/index.js which won't exist unless we run the full build. (Normally TypeScript would use the .d.ts files in the out directory)
  • Run yarn install for each package.json file, as before.
  • Extract TypeScript files.
  • Restore original package.json files. We don't clean up node_modules folders as there isn't a concrete reason for doing so at the moment.

It's a bit dirty to rewrite files on disk, but keep in mind that this is only intended to use on LGTM. Users of the codeql CLI should install dependencies themselves before extracting.

@asgerf asgerf added JS WIP This is a work-in-progress, do not merge yet! labels Jan 10, 2020
Path symlinkPath = rootNodeModules.resolve(name);
if (!Files.exists(symlinkPath)) {
Files.createDirectories(symlinkPath.getParent());
Files.createSymbolicLink(symlinkPath, file.getParent());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to work on Windows? StackOverflow suggests that one needs special permissions to create symbolic links on Windows. Not an issue for LGTM.com, of course, but still relevant for LGTM-E.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. I'm having some difficulty getting Java to run on my Windows VM today but I'll try to test it this week.

FWIW yarn workspaces use symlinks and yarn manages to create them just fine on Windows, so maybe can use that as a fallback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out yarn workspaces use "junctions", a type of symlink on NTFS which doesn't require admin. The Java API doesn't have a method for creating them.

Before going any further with the symlink approach, I'll experiment a bit with providing in a virtual file system in the compiler host we pass to the TypeScript compiler.

@max-schaefer
Copy link
Contributor

but keep in mind that this is only intended to use on LGTM

Could you briefly explain how this is enforced?

@asgerf
Copy link
Contributor Author

asgerf commented Jan 13, 2020

but keep in mind that this is only intended to use on LGTM

Could you briefly explain how this is enforced?

By having an undocumented environment variable you need to set (LGTM_INDEX_TYPESCRIPT_INSTALL_DEPS=true)

@asgerf
Copy link
Contributor Author

asgerf commented Jan 23, 2020

I've pushed a revised version that no longer relies on symlinks, and doesn't pollute the original source checkout. We now install dependencies in SCRATCH_DIR and override various hooks in the TypeScript compiler host.

@asgerf
Copy link
Contributor Author

asgerf commented Jan 23, 2020

@erik-krogh do you think you could help review this?

@max-schaefer I'd still be interested in your take on this, if you have time, though the PR has grown quite a bit now.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tried to run your code, but your idea of hooking into the TypeScript compiler looks good.
(Your comments really helped me understand the code).

This review was mainly me trying to understand all the parts of your code.
I'll take a second look later.

@asgerf asgerf marked this pull request as ready for review January 24, 2020 10:15
@asgerf asgerf requested a review from a team as a code owner January 24, 2020 10:15
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I still haven't run the code.

Before I approve: Could you post a few links to some monorepos, then I'll confirm that it works on them.

We should probably get another pair of eyes on this (@esbena?).

Comment on lines +66 to +68
let redirected = this.redirectModuleName(moduleName, containingFile, options);
if (redirected != null) return redirected;
return ts.resolveModuleName(moduleName, containingFile, options, host, resolutionCache).resolvedModule;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, ts.resolveModuleName uses a cache, and this.redirectModuleName doesn't (sometimes).

I'm just wondering whether there is a reason for ts.resolveModuleName to use a cache beyond performance.
Because if there is, then we should probably add a similar cache.

From my own look into how resolveModuleNames is implemented in TypeScript it seems like the cache is used purely for performance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm pretty sure the cache is just for performance.

The cache makes sense, because it's a lot of work to locate a package in the general case. The node_modules folder can be in a bunch of places, and there's both the actual package and its companion in @types to look for, and TypeScript also have to parse the package.json file to look at its typings field, and I'm sure there's more.

In our case, the packageEntryPoints map makes it a lot easier though, as it mostly does what the cache would do anyway. We do have to a try a few file extensions when looking for the file on disk, which I suppose we could cache as well, but I'm not sure it matters.

@asgerf
Copy link
Contributor Author

asgerf commented Jan 28, 2020

Could you post a few links to some monorepos, then I'll confirm that it works on them.

I've been testing on these three, though it would be great if you can find more:

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work! Looks great overall. I have a few very minor suggestions and comments, but none of it is blocking.

console.warn('TypeScript: reported ' + diagnostics.length + ' semantic errors.');
for (let diagnostic of diagnostics) {
let text = diagnostic.messageText;
if (typeof text !== 'string') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth adding a text && check? I'm sure it won't happen in practice, but since all we do with text is to log it the console it doesn't seem worth crashing the extractor if it should happen somehow.

import * as ts from "./typescript";

/**
* Mapping from the source root to the virtual source root.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps say "real source root" here, since that's the terminology used below? Also, would it perhaps be worth briefly outlining what these two roots are and how they relate to each other? (Perhaps something like the comment on line 12 below.)

propsToRemove.add(packageName);
} else {
// Remove file dependency on a package that don't exist in the checkout.
String dependecy = getChildAsString(dependencyObj, packageName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String dependecy = getChildAsString(dependencyObj, packageName);
String dependency = getChildAsString(dependencyObj, packageName);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And uses below.)

public static String getScratchDir() {
String env = Env.systemEnv().get(CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR_ENV_VAR);
if (env == null) {
throw new UserError(CODEQL_EXTRACTOR_JAVASCRIPT_SCRATCH_DIR_ENV_VAR + " must be set");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work on LGTM? I don't think it sets that environment variable, does it? However, LGTM_WORKSPACE looks like a suitable alternative and is used by a few of the other autobuilders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the other AutoBuild scripts, it seems that LGTM_WORKSPACE is set to whatever value is already in CODEQL_EXTRACTOR_LANG_SCRATCH_DIR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. Do you mean that LGTM sets LGTM_WORKSPACE to CODEQL_EXTRACTOR_LANG_SCRATCH_DIR? But who sets the latter? LGTM doesn't use CodeQL.

/**
* Extracts the package name from the prefix of an import string.
*/
const packageNameRex = /^(?:@[\w.-]+[/\\])?\w[\w.-]*(?=[/\\]|$)/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can import paths contain double slashes?

Copy link
Contributor Author

@asgerf asgerf Jan 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed they can and it seems TypeScript will happily resolve them. We should handle it now.

max-schaefer
max-schaefer previously approved these changes Jan 30, 2020
@erik-krogh
Copy link
Contributor

NodeJS ran out of memory trying to extract angular/angular.

But I got it to work on github/vscode-codeql, and it seems to work, so 👍 from here.

@asgerf asgerf removed the WIP This is a work-in-progress, do not merge yet! label Feb 4, 2020
@asgerf
Copy link
Contributor Author

asgerf commented Feb 5, 2020

@erik-krogh could you review the last two commits?

NodeJS ran out of memory trying to extract angular/angular.

Hm, that's unfortunate. The default memory limit of 1G might not be enough for this project when dependencies are included, but I doubt it's an issue on LGTM.com where the Node.js process has 4G. Maybe we should raise the default limit anyway, but that can be done in a separate PR.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erik-krogh could you review the last two commits?

👍 from here.

NodeJS ran out of memory trying to extract angular/angular.

Hm, that's unfortunate. The default memory limit of 1G might not be enough for this project when dependencies are included, but I doubt it's an issue on LGTM.com where the Node.js process has 4G. Maybe we should raise the default limit anyway, but that can be done in a separate PR.

Could you do a local test that angular/angular works with 4GB before you merge?

@asgerf
Copy link
Contributor Author

asgerf commented Feb 5, 2020

Could you do a local test that angular/angular works with 4GB before you merge?

Yep, works for me, installation+extraction took 19 minutes.

@semmle-qlci semmle-qlci merged commit a5e183b into github:master Feb 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants