Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(compartment-mapper): Custom parser support #2304

Merged
merged 24 commits into from
Jun 3, 2024

Conversation

boneskull
Copy link
Collaborator

@boneskull boneskull commented May 28, 2024

Closes: #2303

Description

This PR adds custom parser support as discussed in #2303. To support the feature:

  • A new prop options was added to the PackagePolicy type.
  • Parsers now optionally accept compartmentDescriptor objects, so that parsers can enforce policy

Security Considerations

See #2303

Scaling Considerations

See #2303

Documentation Considerations

  • Add documentation for a new option parsers for e.g., importLocation
  • Add documentation for a new prop options in PackagePolicy

Testing Considerations

  • A round-trip test
  • A round-trip test w/ policy enforcement
  • Assert options is ignored by policy validator
  • Assert custom parsers cannot override builtin parsers

Compatibility Considerations

None

Upgrade Considerations

  • Update NEWS.md

@boneskull boneskull requested a review from naugtur May 28, 2024 22:39
@boneskull boneskull self-assigned this May 28, 2024
@boneskull boneskull requested a review from kriskowal May 28, 2024 22:39
@boneskull boneskull added the enhancement New feature or request label May 28, 2024
Copy link
Member

@kriskowal kriskowal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design direction seems fine. I like to do import, archive, and import-archive together for any feature that applies to all three, to make sure the design is sound. A scaffold test should verify that an application has the same behavior through import or archive + import-archive using custom parsers. Any commentary on why archive and import-archive should not support custom parsers would be welcome. Please request a review again when this PR is out of draft.

Thank you for sharing LiteralUnion. There are many places where I imagine this pattern should be applied. I welcome a comment from @turadg whether this should become integral to our style for enum literals.

packages/compartment-mapper/src/link.js Outdated Show resolved Hide resolved
packages/compartment-mapper/src/node-modules.js Outdated Show resolved Hide resolved
packages/compartment-mapper/src/node-modules.js Outdated Show resolved Hide resolved
packages/compartment-mapper/test/custom-parser.test.js Outdated Show resolved Hide resolved
packages/compartment-mapper/src/types.js Show resolved Hide resolved
packages/compartment-mapper/test/custom-parser.test.js Outdated Show resolved Hide resolved
packages/compartment-mapper/src/import.js Outdated Show resolved Hide resolved
packages/compartment-mapper/src/types.js Outdated Show resolved Hide resolved
@boneskull
Copy link
Collaborator Author

@kriskowal

I like to do import, archive, and import-archive together for any feature that applies to all three, to make sure the design is sound.

Is "import-archive" the "unarchive" to the "archive" case? If so, I'm assuming that the same custom parser would need to be provided to both invocations. What happens (or should happen) if this is not true? Would we need to do something like add the parser itself to the archive?

@kriskowal
Copy link
Member

Is "import-archive" the "unarchive" to the "archive" case?

Indeed.

If so, I'm assuming that the same custom parser would need to be provided to both invocations. What happens (or should happen) if this is not true? Would we need to do something like add the parser itself to the archive?

It’s good enough to require that the same custom parsers be present for makeArchive and parseArchive (and their analogues). I don’t care to fantasize about capturing the custom parser in the archive at this time, and that can be a different route if we do go that direction.

@kriskowal
Copy link
Member

Analogously, makeArchive (and writeArchive etc) has to receive compatible exits with parseArchive (and importArchive etc). Coördinating out of band is expected.

packages/compartment-mapper/src/link.js Outdated Show resolved Hide resolved
Comment on lines 360 to 381
/** @type {Record<string, Language>} */
const customLanguageForExtension = Object.create(null);
for (const { parser, extensions, language } of parsers) {
if (
language in parserForLanguage &&
parserForLanguage[language] !== parser
) {
throw new Error(`Parser for language ${q(language)} already defined`);
}
parserForLanguage[language] = parser;
for (const extension of extensions) {
if (
extension in customLanguageForExtension &&
customLanguageForExtension[extension] !== language
) {
throw new Error(
`Extension ${q(extension)} already assigned language ${q(customLanguageForExtension[extension])}`,
);
}
customLanguageForExtension[extension] = language;
}
}
Copy link
Member

@kriskowal kriskowal May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m beginning to suspect it’s better to thread separate “defaults for all compartments” options for parserForLanguage and languageForExtension than to merge them into parsers and then unmerge them here. In #2294, I would have to make @endo/compartment-mapper/import-parsers.js express an opinion about the language for each extension, and there is no clear opinion at that juncture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch captures my investigation and I am convinced that threading parserForLanguage and languageForExtension as separate options (instead of consolidated custom parsers) is robust with less ceremony and will compose better with changes I have coming for #400

boneskull/native-parser...kriskowal-native-parser

@boneskull boneskull requested a review from kriskowal May 30, 2024 23:12
@boneskull boneskull marked this pull request as ready for review May 30, 2024 23:13
@boneskull
Copy link
Collaborator Author

@kriskowal OK, this is ready for Proper Review. Please take a looksee

@boneskull
Copy link
Collaborator Author

will update NEWS.md as a last step

Comment on lines 45 to 54
const parserForLanguage = freeze(
/** @type {const} */ ({
mjs: parserArchiveMjs,
'pre-mjs-json': parserArchiveMjs,
cjs: parserArchiveCjs,
'pre-cjs-json': parserArchiveCjs,
json: parserJson,
text: parserText,
bytes: parserBytes,
}),
);
Copy link
Collaborator Author

@boneskull boneskull May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This (and the other one) shouldn't be mutated. Calling it const isn't enough 😄

UPDATE: There are at least three. I hope I didn't miss one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You caught them all.

/** @import {ParserForLanguage} from './types.js' */
/** @import {ReadFn} from './types.js' */
/** @import {ReadPowers} from './types.js' */
/** @import {SomeObject} from './types.js' */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, copilot

@@ -62,6 +62,12 @@ const q = JSON.stringify;
{
globals: ['a', {}],
},
{
options: {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care that it's possible to put unserializable junk in here? Should that be restricted via types or at runtime or both or neither?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to @naugtur

@boneskull
Copy link
Collaborator Author

boneskull commented May 30, 2024

@kriskowal Note: there's no explicit test for the archive use-case. I am not sure if it's needed?

/** @type {ParserForLanguage} */
const finalParserForLanguage = Object.create(null);

for (const [language, parser] of [
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that putting stuff in parserForLanguage seems to leak out of tests, so a) I stopped doing that, and b) I froze them.

finalLanguageForExtension[extension] = language;
}
for (const [extension, language] of entries(languageForExtension)) {
finalLanguageForExtension[extension] = language;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I should write anything back to the CompartmentDescriptor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is fine.

/** @type {Record<string, Language>} */
const finalLanguageForExtension = Object.create(null);
for (const [extension, language] of entries(customLanguageForExtension)) {
if (extension in languageForExtension) {
Copy link
Collaborator Author

@boneskull boneskull May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I be using has() instead of this? It's habit (because TypeScript seems to prefer it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they’re equivalent for Object.create(null), I’m inclined to overlook the question. Unless I’m wrong about them being equivalent. @erights for second opinion.

Copy link
Member

@kriskowal kriskowal May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In boneskull/native-parser...kriskowal-native-parser, I did some additional work to ensure that in behaves as I would expect. In the case of options where a devious user might provide an object with a prototype chain that includes enumerable properties, the difference between has and in would be observable, but that can be avoided by capturing options with freeze(create(null), option).

Copy link
Member

@kriskowal kriskowal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is close to ready.

I investigated changing the parsers option to separate parserForLanguage and languageForExtension options and I like the results. I offer these commits for your consideration. I would like to subsume it into this PR in some fashion or another and I am not concerned about preserving my authorship metadata. boneskull/native-parser...kriskowal-native-parser
We typically use a rebase and merge commit and preserve the narrative of the commit history. Let me know if it was your intention to squash+merge the PR or if you want me to review the narrative of the commit history.

finalLanguageForExtension[extension] = language;
}
for (const [extension, language] of entries(languageForExtension)) {
finalLanguageForExtension[extension] = language;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is fine.

packages/compartment-mapper/src/node-modules.js Outdated Show resolved Hide resolved
@@ -200,7 +205,7 @@ export const assertPackagePolicy = (allegedPackagePolicy, path, url) => {
* It also moonlights as a type guard.
*
* @param {unknown} allegedPolicy - Alleged `Policy` to test
* @returns {asserts allegedPolicy is import('./types.js').Policy|undefined}
* @returns {asserts allegedPolicy is SomePolicy|undefined}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a precedent for Some* types already? Can we consider Maybe* instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a "maybe" type. This playground illustrates why this is a thing.

If I were to explain this in a nutshell: when we use Policy without a type argument, it artificially narrows the type to Policy<void, void, void, unknown>. We may be able to safely assume this currently, but that is not guaranteed. Using SomePolicy prevents future headaches, essentially.

@@ -62,6 +62,12 @@ const q = JSON.stringify;
{
globals: ['a', {}],
},
{
options: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to @naugtur

Comment on lines 45 to 54
const parserForLanguage = freeze(
/** @type {const} */ ({
mjs: parserArchiveMjs,
'pre-mjs-json': parserArchiveMjs,
cjs: parserArchiveCjs,
'pre-cjs-json': parserArchiveCjs,
json: parserJson,
text: parserText,
bytes: parserBytes,
}),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You caught them all.

boneskull and others added 19 commits June 3, 2024 12:40
The reasoning behind this is that a custom parser may want to use the `CompartmentDescriptor` to do things like policy enforcement.
This swaps out the "native module" fixture for a "markdown" fixture. The test parser implementation is now a "markdown parser".

While the native module fixture was useful, it presents some practical portability issues.
Due to the type parameter defaults, these assertions _actually_ said, e.g.,

```ts
asserts value is Policy<void, void, void, void>
```

...which is not necessarily true.  This fixes the problem by adding `SomePolicy` and `SomePackagePolicy`.
This adds an optional property, `options`, to the `PackagePolicy` type.

Here, consumers can add any fields or metadata they need to the policy. Any values will pass through the policy validator _alive, and unspoiled_.

Endo should not read the contents of this property directly.  The default type of `unknown` means that access to the value must have an explicit type assertion; it is the responsibility of the consumer to provide a proper type for this field.
Changed `policy` prop to `SomePackagePolicy`.
Co-authored-by: Kris Kowal <kris@agoric.com>
Co-authored-by: Kris Kowal <kris@agoric.com>
- Adds tests for more situations
- Fixes prevention of overriding builtin parsers
- types for various options consolidated
@boneskull
Copy link
Collaborator Author

@kriskowal OK, I've cherry-picked your changes, then fixed a problem in bundle.js. This is ready for final review. Once approved, I'll squash and merge.

Co-authored-by: Kris Kowal <kris@agoric.com>
@kriskowal kriskowal changed the title feat(compartment-mapper): custom parser support feat(compartment-mapper): Custom parser support Jun 3, 2024
Copy link
Member

@kriskowal kriskowal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you! And thanks for offering to squash+merge. No notes on commit structure or messages.

@boneskull boneskull merged commit 43d867e into master Jun 3, 2024
17 checks passed
@boneskull boneskull deleted the boneskull/native-parser branch June 3, 2024 23:07
@boneskull boneskull mentioned this pull request Jun 5, 2024
2 tasks
boneskull added a commit that referenced this pull request Jun 6, 2024
## Description

This exposes `captureFromMap()` in `capture-lite.js`.

This function is similar to e.g., `makeArchiveFromMap()` in
`archive-lite.js`; but rather than creating a `.zip` archive, it simply
returns the fully-completed `CompartmentMapDescriptor`, `Sources`, and a
mapping of filename to compartment map name.

This information is needed for next-gen-lavamoat-node ("endomoat")'s
automatic policy generation.

Another commit disables the hardcoded check for parsers in the
compartment map validation functions (which are no longer necessary
after #2304).

### Questions

- Should this be split into two PRs?
- Should any of this be renamed?
- Internal functions were copy/pasted from `archive-lite.js` into
`capture-lite.js`. Should these be extracted into a shared module?

### Security Considerations

None that I'm aware of.

### Scaling Considerations

If anything, it may shave a few nanoseconds off of compartment map
validation.

### Documentation Considerations

Probably should be added to `NEWS.md`.

### Testing Considerations

- [ ] The compartment map validation is currently not tested in
isolation and probably should be (removing the parser-name assertion did
not cause a test to fail)
- [x] `captureFromMap()` needs some sort of basic round-trip test. I
think a snapshot of the return value may suffice?

### Compatibility Considerations

None

### Upgrade Considerations

None

---------

Co-authored-by: Kris Kowal <kris@agoric.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

custom parser support
4 participants