Support a user-prompt-less version of cargo vet certify #293

Open

tomrittervg opened this issue Aug 16, 2022 · 11 comments

@tomrittervg

In order for Updatebot to be able to update Rust crates (for crate owners who want Updatebot to do so, like wgpu), we can detect when certification is needed and then run `cargo vet certify package v1 v2`, but this command expects user input. We would like to run it in automation without user input. At a minimum, we would need a `--username` flag we could use to supply the user's name and email.

Mozilla-process-wise: after the bot performs the certification-that's-not-really-a-certification, the audits.toml file will become part of the patch submitted to Phabricator, and the user can approve their patch there. supply-chain-reviewers will also be flagged for review, and I think by policy we could require the user to add a comment noting they have actually reviewed the patch and attest to the certification that Updatebot wrote for them. (And Updatebot can add a comment to Phabricator telling them to do so.)

Related: #151

@bholley (Collaborator) commented Aug 16, 2022

I'm a bit concerned about bypassing the EULA in these situations, and would generally rather not have each bot implementer winging the UX. Perhaps we can also add a flag to print the same text that you get in the interactive screen, and then the bot could embed that as a comment on the bug when attaching the patch for review?

@tomrittervg (Author)

Would that also update the audits.toml file automatically?

@mystor (Collaborator) commented Aug 19, 2022

I'm spitballing a bit here, but what if we had a command like:

```
cargo vet automation bulk-audit --username= [--save-audits] [--force-update-imports]
```

This command would automatically generate audits for everything such that a `cargo vet` would pass, including importing new packages if needed and updating `audit-as-crates-io`, and would produce JSON output describing the audits that were performed, which the bot could use to present them to the user. Perhaps something like:

```
{
  "changes": [
    // if an import was updated, this is generated, and will list any criteria which were
    // changed upstream and need to be approved (may be none)
    { "type": "UpdateImport", "criteria_changes": [ ... ] },
    // If an exemption was removed as it is no longer necessary (e.g. due to an import),
    // it will be listed here.
    { "type": "ExemptionRemoved", "crate": "...", "version": "...", "criteria": [...] },
    // If a crate was automatically marked as audit-as-crates-io = false, it could also be listed
    { "type": "SetAuditAsCratesIo", "crate": "...", "audit-as-crates-io": false },
    // If an audit was added for a crate, it will be listed here.
    {
      "type": "FullAudit",
      "crate": "...",
      "version": "...",
      "criteria": [...],
      "dependency_criteria": ..., // we might infer this some day?
      "eulas": { "safe-to-deploy": ... },
      "inspect_url": "https://..."
    },
    {
      "type": "DeltaAudit",
      "crate": "...",
      "from_version": "...",
      "to_version": "...",
      "criteria": [...],
      "dependency_criteria": ...,
      "eulas": { "safe-to-deploy": ... },
      "inspect_url": "https://..."
    }
  ]
}
```
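As a rough illustration, a bot consuming this JSON could flatten it into reviewer-facing lines. This Python sketch uses only the schema proposed above; the change types, field names, and example payload are hypothetical, not a real cargo-vet interface:

```python
import json

def summarize_changes(blob):
    """Turn the proposed 'changes' list into one human-readable line each."""
    lines = []
    for change in json.loads(blob)["changes"]:
        kind = change["type"]
        if kind == "FullAudit":
            lines.append("audit %s %s for %s" % (
                change["crate"], change["version"], ", ".join(change["criteria"])))
        elif kind == "DeltaAudit":
            lines.append("audit %s %s -> %s for %s" % (
                change["crate"], change["from_version"], change["to_version"],
                ", ".join(change["criteria"])))
        elif kind == "ExemptionRemoved":
            lines.append("drop exemption for %s %s" % (
                change["crate"], change["version"]))
        else:
            lines.append(kind)  # UpdateImport, SetAuditAsCratesIo, ...
    return lines

# Hypothetical example payload in the shape sketched above.
example = json.dumps({"changes": [
    {"type": "DeltaAudit", "crate": "wgpu", "from_version": "0.13.0",
     "to_version": "0.13.1", "criteria": ["safe-to-deploy"]},
]})
summary = summarize_changes(example)
```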

One potential issue I see here is the lack of an ability to add notes to the given audits, which presumably would need to happen in response to the user, e.g. commenting on the Phabricator revision. Perhaps when `--save-audits` is passed to actually save the changes to audits.toml, bulk-audit needs to be fed input JSON with the actual audits to perform, including notes. You'd then generally run this twice: once to get the information to present to the user, and again with the notes and commentary from the user saying that they approve the given audits. This might be a bit weird because imports may change the second time around, but that could be mitigated (e.g. by updating the lockfile eagerly on the first run and running the second time with `--locked` so that we don't touch the network).
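The two-pass idea could look something like this Python sketch, where the user's notes are merged into the first run's JSON before it is fed back for the saving pass (field names follow the sketch above and are not a real cargo-vet schema):

```python
def attach_notes(changes, notes_by_crate):
    """Merge user-written notes into audit entries before the second pass."""
    merged = []
    for change in changes:
        change = dict(change)  # avoid mutating the first run's output
        note = notes_by_crate.get(change.get("crate"))
        if note is not None and change["type"] in ("FullAudit", "DeltaAudit"):
            change["notes"] = note
        merged.append(change)
    return merged

# First run's output (hypothetical shape), then the user's commentary.
first_pass = [{"type": "DeltaAudit", "crate": "wgpu",
               "from_version": "0.13.0", "to_version": "0.13.1",
               "criteria": ["safe-to-deploy"]}]
second_pass = attach_notes(first_pass, {"wgpu": "Reviewed the delta on Phabricator."})
```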

This is a bit sketchy, but perhaps we could add a new file to the store, like pending_audits.toml, which would never be checked into the actual repository but would contain audits which have been generated but not yet approved, along with a mode for cargo vet to run under which includes pending audits (otherwise, having pending audits would be a vet failure). This might theoretically allow us to expand this workflow to also be usable locally as another interactive way to update the tree and apply audits from the CLI in the future: sort of like a shareable expansion on vet's suggestions plus the fetch command cache. I'm not sure how much value it'd bring, though, so it's probably not worth adding unless it brings benefits to automation.

We'd presumably also report errors, like violations which can't be solved automatically, just as plain `cargo vet` does.

@bholley (Collaborator) commented Aug 19, 2022

So the fundamental issue here is that there are some interactive steps (reading the criteria, doing the audit, accepting the EULA, adding notes) that we'd like users to do. We have a command-line experience for this, but we'd also like to permit people to do it from the browser with a bot-based workflow.

Engineering each of these interactive steps as an explicit affordance in the bot seems sub-optimal on a number of fronts. As such, I wonder if we could do something like this:

(1) Automation invokes `cargo vet --bot`.
(2) cargo-vet emits a string-serialized data blob that captures the steps needed (generally: need audits X, Y, and Z).
(3) A cloud web service accepts the blob from (2) and mimics the interactive steps, generating a serialized output token.
(4) Automation posts a comment with a URL like https://web.service?input=BLOB.
(5) The user goes through the flow and pastes the output token back into the conversation with the bot.
(6) The bot passes the token to cargo-vet, which generates the patch and pushes it as an addendum to the original changeset.

The engineering of (3) is of course non-trivial, but perhaps more tractable than it first appears with the help of off-the-shelf wasm tooling.
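The blob and token in steps (2) through (6) could be any opaque, round-trippable encoding. A minimal Python sketch, using zlib and base64url purely as stand-ins for whatever serialization cargo-vet would actually pick:

```python
import base64
import json
import zlib

def encode_blob(state):
    """Serialize interactive-flow state into a URL-safe opaque token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")

def decode_blob(token):
    """Invert encode_blob."""
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(token)))

# Hypothetical state: the audits the bot determined are needed.
state = {"needed_audits": [{"crate": "wgpu", "delta": "0.13.0 -> 0.13.1"}]}
token = encode_blob(state)
url = "https://web.service?input=" + token  # step (4): posted as a comment
```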

@bholley (Collaborator) commented Aug 19, 2022

So I think the above might actually be quite doable. The basic idea would be to abstract cargo-vet's interactive logic into a tiny standalone library that operates on serialized inputs and outputs and uses hooks to interact with the user. The existing implementation would attach the hooks to the CLI as they are now, and then we'd build a simple JS interface that talks to a WASM-ified version of the library (using something like this) and hooks it up to buttons and textareas.

@mystor (Collaborator) commented Aug 19, 2022

This sounds roughly like my concept (#293 (comment)), but with a specific blessed web service which provides the UX flow rather than spitting the results out as JSON, and asking the bot to provide the UX.

If we decide to serialize the state directly in the URI, it probably limits our future options in the UI somewhat, as we probably can't serialize the entire audit universe to the UI. This would mean we couldn't e.g. refine audit suggestions as the user adds audits with custom dependency_criteria by re-running the resolver in the UI.

Perhaps we could ask bots to host a CORS-accessible document with the full serialized state of the audit universe (as generated by a cargo vet command) which would be passed as a query parameter to the web-based UI? The bot could theoretically also host a POST endpoint which the final audits could be sent to automatically to avoid manually ferrying the data between the bot and the user's browser (though obviously the endpoint would also need to perform some authorization checks on the user to make sure they're allowed to submit those audits)

If we're clever about it, we could potentially even offer this as a thing you can access locally, where we start an HTTP server as if we were a bot from within cargo vet and then load the web UI, which would submit the results back to our original localhost HTTP server.

All of this is probably out-of-scope for a proof of concept though :-)

@bholley (Collaborator) commented Aug 19, 2022

> This sounds roughly like my concept (#293 (comment)), but with a specific blessed web service which provides the UX flow rather than spitting the results out as JSON, and asking the bot to provide the UX.

Strictly speaking the web service wouldn't be blessed — you'd just want to interact with a wasmified version of the cargo-vet code whose version matched the one the bot was running in automation.

> If we decide to serialize the state directly in the URI, it probably limits our future options in the UI somewhat, as we probably can't serialize the entire audit universe to the UI. This would mean we couldn't e.g. refine audit suggestions as the user adds audits with custom dependency_criteria by re-running the resolver in the UI.

I would be comfortable leaving such refinements as out-of-scope for now — particularly because the bot workflow will mostly be about updating existing packages, where it's less likely that you'll be bringing in a new subtree for which you want a different policy. If people want to do fancy stuff they can drop down to the CLI.

@mystor (Collaborator) commented Aug 19, 2022

If we're not doing refinement work, I don't know how much of the core cargo-vet library we'd actually want or need to share with the UI; it might be simpler to instead write the UI in JavaScript directly, and serialize/deserialize through JSON and query parameters.

The main advantage I could see would be having a shared serde serialization definition; I'm not sure what else would benefit from being shared. Things like criteria descriptions, crate versions, and criteria selections probably need to be embedded in the query parameter anyway.

@bholley (Collaborator) commented Aug 20, 2022

I think the primary advantage of sharing the code is that we don't have to worry about keeping the implementations synchronized (so that they both behave consistently and interoperate correctly).

If the generic driver operates at the level of "display this text", "get text input", etc., then the JS code doesn't need to know anything cargo-vet-specific, like criteria or crate versions. If we can ensure that the wasm code is always same-revision with the code in automation, we don't need to worry about interoperability between versions, allowing the serialization to be totally opaque. And using the same code paths as regular cargo-vet (including the serialization) minimizes the chance that we'll accidentally break the less-frequently-used web flow.
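Sketching the hook idea in Python (class and hook names are invented for illustration): the shared flow logic calls only display/input hooks, so the CLI and a web front end would differ solely in how they implement those hooks.

```python
class CertifyFlow:
    """Shared interactive logic, driven entirely through two hooks."""

    def __init__(self, display, get_input):
        self.display = display      # hook: show text to the user
        self.get_input = get_input  # hook: collect a line of text

    def run(self, eula_text):
        self.display(eula_text)
        answer = self.get_input("Do you accept the above? (yes/no) ")
        return answer.strip().lower() == "yes"

# A CLI front end would pass print/input; a web front end would wire the
# same hooks to textareas and buttons. Here we drive it with canned answers:
shown = []
flow = CertifyFlow(shown.append, lambda prompt: "yes")
accepted = flow.run("EULA text goes here")
```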

@mystor (Collaborator) commented Aug 22, 2022

I threw together an (extremely rough) idea of what the UX could potentially look like for an app like this (https://band-organic-lion.glitch.me/; don't look at the JS, it's awful). Something like this wouldn't require us to load a wasm module with cargo-vet in it, and should be simple enough that we could potentially keep it within the cargo-vet repository and ship it as include_bytes! constants within the binary if we wanted.

Roughly for something like this we'd need to send down the following bits of information somehow (e.g. through a JSON blob):

  • For each crate being certified
    1. Name
    2. Version/delta being certified
    3. Default criteria to suggest certifying for
  • For each criterion it is possible to certify for
    1. Name
    2. Full description text
    3. Set of criteria which imply this criterion

The result could then be submitted back by wrapping everything in a `<form>` tag and POST-ing it to some endpoint; the submission would include the list of criteria being certified, notes, and a flag signifying that the user accepted the EULAs for each.
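A hypothetical blob carrying the information listed above might look like the following; the structure and field names here are invented for illustration, and the descriptions are elided:

```python
import json

# Per-crate entries and per-criterion metadata the UI would need.
blob = {
    "crates": [
        {"name": "wgpu",
         "delta": "0.13.0 -> 0.13.1",
         "suggested_criteria": ["safe-to-deploy"]},
    ],
    "criteria": [
        {"name": "safe-to-run", "description": "...", "implied_by": ["safe-to-deploy"]},
        {"name": "safe-to-deploy", "description": "...", "implied_by": []},
    ],
}

# The blob is plain data, so it survives a JSON round trip unchanged.
encoded = json.dumps(blob)
decoded = json.loads(encoded)
```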

I worry a bit about the original idea of passing the information down in query parameters specifically because of the need to serialize the descriptions for each criteria, which could be fairly long (e.g. safe-to-deploy's description is 991 characters long). It might be doable though.

This all doesn't provide any way to e.g. control dependency_criteria but that is substantially more complex, and probably warrants dropping down to the CLI for now.

@bholley (Collaborator) commented Aug 24, 2022

> I threw together an (extremely rough) idea of what the UX could potentially look like for an app like this (https://band-organic-lion.glitch.me/; don't look at the JS, it's awful). Something like this wouldn't require us to load a wasm module with cargo-vet in it, and should be simple enough that we could potentially keep it within the cargo-vet repository and ship it as include_bytes! constants within the binary if we wanted.

This seems like a fine approach.

> I worry a bit about the original idea of passing the information down in query parameters specifically because of the need to serialize the descriptions for each criteria, which could be fairly long (e.g. safe-to-deploy's description is 991 characters long). It might be doable though.

Looks like it brotli-compresses down to 377 bytes, which seems promising. Generally-speaking I'd prefer to wrap up the bidirectional payloads in an opaque bundle anyway.

> This all doesn't provide any way to e.g. control dependency_criteria but that is substantially more complex, and probably warrants dropping down to the CLI for now.

Agreed.
