Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand manifest validation for all settings #4377

Open
behnam opened this issue Aug 7, 2017 · 12 comments
Open

Expand manifest validation for all settings #4377

behnam opened this issue Aug 7, 2017 · 12 comments
Labels
A-diagnostics Area: Error and warning messages generated by Cargo itself. A-interacts-with-crates.io Area: interaction with registries A-new-subcommand Area: new subcommand C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix

Comments

@behnam
Copy link
Contributor

behnam commented Aug 7, 2017

There are some practical difficulties arising from the fact that manifest files are not always validated. I want to collect such cases, so we can improve the user experience, specially for newbies, by showing warnings about bad manifest entries, as soon as possible.

One important fact here is that we almost never want to throw an error on invalid entries, because it will affect forward-compatibility: would make many future changes non-backward-compatible.

Some Examples

I think almost every config item can have some sort of validation. This list only presents possible validation methods for different types of values.

  • package.keywords: Having invalid keywords strings only surface during packaging/publishing. See cargo publish should warn on invalid categories/keywords #4300 for details.

  • package.categories: This is a bit harder, because the list is maintained in crates.io repo. But, like any other index-related data, it can be fetched, cached, and checked against.

  • package.include, package.exclude, workspace.exlucde, workspace.members: These are configs with pattern-matching and warning on invalid patterns can help users track down packaging issues easier and faster, and make changes in those areas easier. (See Change Cargo include/exclude rules to gitignore patterns #4268)

  • package.readme and other file paths: Need to check the existence of the file, if linked. In addition, we can also warn on missing the config, if a best-guess file is present.

  • package.homepage and other URLs: Perform URL validation and check URLs against the newly-implemented blacklist. (blacklist is maintain in crates.io, so this would be another index-dependent check.)

  • features.<name> Cargo feature name validation inconsistent with crates.io #5554

  • Banning of wildcard dependencies --dry-run does not validate dependencies #5941

@behnam
Copy link
Contributor Author

behnam commented Aug 7, 2017

cc @alexcrichton, @carols10cents

behnam added a commit to behnam/rust-cargo that referenced this issue Aug 7, 2017
Background: rust-lang#4268

This diff takes us to **Stage 1.1** of the migration plan by allowing
glob patterns to include a leading slash, so that glob patterns can be
updated, if needed, to start with a slash, closer to the future behavior
with gitignore-like matching.

Why is this stage needed?

It's common to have `package.include` set like this:
```
include = ["src/**"]
```
In old interpretation, this would only include all files under the `src`
directory under the package root. With the new interpretation, this
would match any path with some directory called `src`, even if it's not
directly under the package root.

After this patch, package owners can start marking glob patters with a
leading slash to fix the warning thrown, if any.

One thing to notice here is that there are no extra matchings, but, if
a manifest has already a pattern with a leading slash, this would
silently start matching it with the paths. I believe this is fine, since
the old behavior would have been for the pattern to not match anything,
therefore the item was useless.

See also <rust-lang#4377> for suggestion
to throw warning on useless/invalid patterns in these fields.
@alexcrichton
Copy link
Member

Seems plausible to me! All this validation happens on crates.io though which I'd prefer to not duplicate, although we can perhaps do simple things like validate homepage is a URL locally for sure. (also agreed with warnings-for-now)

bors added a commit that referenced this issue Aug 8, 2017
[sources/path] Support leading slash in glob patterns

Background: #4268

This diff takes us to **Stage 1.1** of the migration plan by allowing
glob patterns to include a leading slash, so that glob patterns can be
updated, if needed, to start with a slash, closer to the future behavior
with gitignore-like matching.

Why is this stage needed?

It's common to have `package.include` set like this:
```
include = ["src/**"]
```
In old interpretation, this would only include all files under the `src`
directory under the package root. With the new interpretation, this
would match any path with some directory called `src`, even if it's not
directly under the package root.

After this patch, package owners can start marking glob patters with a
leading slash to fix the warning thrown, if any.

One thing to notice here is that there are no extra matchings, but, if
a manifest has already a pattern with a leading slash, this would
silently start matching it with the paths. I believe this is fine, since
the old behavior would have been for the pattern to not match anything,
therefore the item was useless.

See also <#4377> for suggestion
to throw warning on useless/invalid patterns in these fields.
@behnam
Copy link
Contributor Author

behnam commented Aug 8, 2017

Some more background and my rational for this proposal: The other day, for a new release for UNIC, I was updating a dozen of manifests, mostly their metadata. Since it's not vise to package/publish with dirty repository, I had to commit after each publish error, and try publishing again. This resulted in a handful of commits just catching all the small nits from the publish command, like "don't use space in the keywords list".

Agreed that we better have one source of truth. However, I also believe we can benefit a lot by moving this one source of truth to a library that can be used in client side (cargo CLI, clippy, etc), as well as server side (crates.io, etc). I call it the cargo-manifest crate, which I think should spin out of cargo proper (which hosts all the CLIs, etc). This would allow cargo, crates.io and third-party CLIs to share the same manifest read/validation logic, without the need to have cargo execution details imported as dependency.

What do you think?

@Boddlnagg
Copy link
Contributor

rust-lang/crates.io#7250 is also related to this (see point 2.)

bors added a commit that referenced this issue Aug 9, 2017
[sources/path] Support leading slash in glob patterns

Background: #4268

This diff takes us to **Stage 1.1** of the migration plan by allowing
glob patterns to include a leading slash, so that glob patterns can be
updated, if needed, to start with a slash, closer to the future behavior
with gitignore-like matching.

Why is this stage needed?

It's common to have `package.include` set like this:
```
include = ["src/**"]
```
In old interpretation, this would only include all files under the `src`
directory under the package root. With the new interpretation, this
would match any path with some directory called `src`, even if it's not
directly under the package root.

After this patch, package owners can start marking glob patters with a
leading slash to fix the warning thrown, if any.

One thing to notice here is that there are no extra matchings, but, if
a manifest has already a pattern with a leading slash, this would
silently start matching it with the paths. I believe this is fine, since
the old behavior would have been for the pattern to not match anything,
therefore the item was useless.

See also <#4377> for suggestion
to throw warning on useless/invalid patterns in these fields.
@carols10cents carols10cents added A-diagnostics Area: Error and warning messages generated by Cargo itself. A-interacts-with-crates.io Area: interaction with registries A-new-subcommand Area: new subcommand C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` labels Aug 27, 2017
@weihanglo weihanglo added the S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix label May 14, 2023
@Yatekii
Copy link

Yatekii commented May 31, 2023

This is a painpoint that happens often to me.

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

I would love to work on this.

@weihanglo
Copy link
Member

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip t-crates.io is a better place to have such a chat with crates.io team I believe.

@hi-rustin
Copy link
Member

The solution should come from a discussion with crates.io team. I don't have the answer to it, but Zulip t-crates.io is a better place to have such a chat with crates.io team I believe.

Would it be possible to implement a preflight like endpoint in crates.io to do the validation before the publish?

In our previous crates.io team meeting, we expressed concerns regarding the potential performance issues that could arise from introducing this validation to the publish API. The API is resource-intensive as it requires unzipping the tarball to retrieve metadata, which could also lead to the consumption of your rate limit.

Thus, we're inclined first to unify the validation rules of Cargo and crates.io. This would be a good initial step, and I am currently working on this.

See: rust-lang/crates.io#7379

@Yatekii
Copy link

Yatekii commented Nov 6, 2023

Even if the rules are unified, a divergence between crates.io and cargo can occur if the cargo version is outdated.

I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.

@epage
Copy link
Contributor

epage commented Nov 6, 2023

In going through cargo's triage (or talking to people about it), I remember seeing somewhere that some differences were intentional.

An alternative route to go, once we have #12235, is to have the crates-io specific validation rules in a crate and a cargo linter would expose lints for crates-io and "all registries" published packages that don't conform to those validation rules. During the times where crates-io is out of sync with someone's MSRV, they can allow certain lints.

@epage
Copy link
Contributor

epage commented Nov 6, 2023

I fail to see how uploading a bit of metadata via some simple API request is a problem. Admittedly I do not know how the API looks like at all.

For the publish workflow, crates.io has been shifting away from "trust the metadata" to "parse Cargo.toml". That is what is expensive. Yes, there are a lot of design decisions that can be made, with various trade offs, that might get us to having server-side validation.

For example,

  • we could have a dry-run today and they just validate the metadata but (1) they might now be checking more than we upload and (2) this can require them to duplicate a lot of logic
  • We could POST the Cargo.toml but that file is incomplete without the associated project (I do want to auto-fill in most fields that are implied from the file system)
  • If we post an entire .crate then we run in the server-side performance issues

@Yatekii
Copy link

Yatekii commented Nov 6, 2023

Yeah I see that sending just an extract might be error prone.

Also, upon further reflection, I think just syncing the two checkers or at least having a crates.io-oriented validation crate would be enough. For people that do not want to keep up with the release cadence of cargo it will fail, but for everyone else this is perfect :)

@epage
Copy link
Contributor

epage commented Jan 22, 2024

One thought on how to solve this

  • crates.io split their limits into a package for cargo to pull in
  • Cargo check the limits in a crates.io-specific set of lints (User control over cargo warnings #12235). These could even be deny by default as the user can workaround limit changes (global or per user) by making the lint an allow lint.

One question is when should these lints fire. With publish = true (which is default though we've talked about changing that in #6153), all registries are matched. At least with #12786, we have truly private packages, more easily. So do we emit them on all commands, even if they won't apply? Do we reserve this for package and publish but then people won't find out until its too late? Do we do this in cargo's linter that we've talked about? imo no real solid "this is the way" answer.

In the lint, we should

  • Mention making the package private as a way to disable
  • Narrow down the list of registries if they aren't using crates.io (only if other registries are configured)
  • Disable the lint if it doesn't apply
  • Reach out to crates.io about limit issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Error and warning messages generated by Cargo itself. A-interacts-with-crates.io Area: interaction with registries A-new-subcommand Area: new subcommand C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-blocked-external Status: ❌ blocked on something out of the direct control of the Cargo project, e.g., upstream fix
Projects
None yet
Development

No branches or pull requests

8 participants