Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Cargo.toml on publication to crates.io #4027

Closed
alexcrichton opened this issue May 10, 2017 · 5 comments · Fixed by #4030
Closed

Rewrite Cargo.toml on publication to crates.io #4027

alexcrichton opened this issue May 10, 2017 · 5 comments · Fixed by #4030

Comments

@alexcrichton
Copy link
Member

Right now when Cargo publishes a crate to crates.io it doesn't modify the manifest at all. This in turn means that literally checking out the manifest on crates.io and attempting to build it may fail! Notably:

  • path dependencies are listed, likely to nonexistent paths
  • The [workspace] key may be listed, pointing to a nonexistent path

This has long been ok in Cargo as it reinterprets dependencies internally, but many situations have arisen over time which make this just too difficult to work with:

  • When you've got a bunch of vendored sources lying nearby you may wish to edit them, and the most natural way to do that is to [replace] point at them, but the previously mentioned points can prevent this from working.
  • Cargobomb/crater test crates.io crates and frob the tomls to get them to work.
  • Packagers have asked in the past for a way to deal with this if they're packaging published code to crates.io.
@alexcrichton
Copy link
Member Author

cc @brson

@cuviper
Copy link
Member

cuviper commented May 11, 2017

/sub

I had thought that the way to solve this would be to just have a command-line option which ignores the parts that you're proposing to strip here. So you still wouldn't be able to do a bare cargo build, but you might be able to use cargo build --paint-my-bikeshed to ignore path-deps, workspaces, etc.

@alexcrichton
Copy link
Member Author

@cuviper I think that'd work in some situations, yeah, but it's not necessarily composable. The primary use case driving this, for example, is for projects like Gecko to use [replace] pointing at the vendored source code that they already have in the repository, editing it afterwards. In that sense there's no commands being executed in the directory, but rather just a typical path dependency.

I suppose we could add a feature like:

[dependencies]
foo = { path = 'vendor/foo', special-dep = true }

Although that'd be kinda unfortunate. My thinking is that in theory rewriting the manifest will work, it's just that in practice we're probably messing something up unfortunately.

@cuviper
Copy link
Member

cuviper commented May 12, 2017

Maybe you could just use a file in the crate root to indicate special handling? e.g. cargo-vendor already adds .cargo-ok and .cargo-checksum.json, so you could change behavior based on those.

@alexcrichton
Copy link
Member Author

It's possible, yeah. Similar to some of the discussion on #4030. The downside though is that any out-of-band marker is a stable interface to Cargo, wheras simply rewriting Cargo.toml doesn't expose any new stable API surface area in Cargo.

There's also the point in that linked comment about "what do paths default to?" Today it's crates.io but one could imagine that with a future implementation it may be different registry, perhaps context dependent on where you publish to (in which case a rewrite could come in handy). This out of band file could also contain the information, though.

alexcrichton added a commit to alexcrichton/cargo that referenced this issue May 17, 2017
This commit is an implementation of rewriting TOML manifests when we publish
them to the registry. The rationale for doing this is to provide a guarantee
that downloaded tarballs from crates.io can be built with `cargo build`
(literally). This in turn eases a number of other possible consumers of crates
from crates.io

* Vendored sources can now be more easily modified/checked as cargo build should
  work and they're standalone crates that suffice for `path` dependencies
* Tools like cargobomb/crater no longer need to edit the manifest and can
  instead perform regression testing on the literal tarballs they download
* Other systems such as packaging Rust code may be able to take advantage of
  this, but this is a less clear benefit.

Overall I'm hesitatnt about this, unfortunately. This is a silent translation
happening on *publish*, a rare operation, that's difficult to inspect before it
flies up to crates.io. I wrote a script to run this transformation over all
crates.io crates and found a surprisingly large number of discrepancies. The
transformation basically just downloaded all crates at all versions,
regenerated the manifest, and then tested if the two manifests were (in memory)
the same.

Unfortunately historical Cargo had a critical bug which I think made this
exercise not too useful. Cargo used to *not* recreate tarballs if one already
existed, which I believe led to situations such as:

1. `cargo publish`
2. Cargo generates an error about a dependency. This could be that there's a
   `version` not present in a `path` dependency, there could be a `git`
   dependency, etc.
3. Errors are fixed.
4. `cargo publish`
5. Publish is successful

In step 4 above historical Cargo *would not recreate the tarball*. This means
that the contents of the index (what was published) aren't guaranteed to match
with the tarball's `Cargo.toml`. When building from crates.io this is ok as the
index is the source of truth for dependency information, but it means that *any*
transformation to rewrite Cargo.toml is impossible to verify against all crates
on crates.io (due to historical bugs like these).

I strove to read as many errors as possible regardless, attempting to suss out
bugs in the implementation here. To further guard against surprises I've updated
the verification step of packaging to work "normally" in these sense that it's
not rewriting dependencies itself or changing summaries. I'm hoping that this
serves as a good last-ditch effort that what we're about to publish will indeed
build as expected when uploaded to crates.io

Overall I'm probably 70% confident in this change. I think it's necessary to
make progress, but I think there are going to be very painful bugs that arise
from this feature. I'm open to ideas to help weed out these bugs ahead of time!
I've done what I can but I fear it may not be entirely enough.

Closes rust-lang#4027
bors added a commit that referenced this issue May 18, 2017
Rewrite Cargo.toml when packaging crates

This commit is an implementation of rewriting TOML manifests when we publish
them to the registry. The rationale for doing this is to provide a guarantee
that downloaded tarballs from crates.io can be built with `cargo build`
(literally). This in turn eases a number of other possible consumers of crates
from crates.io

* Vendored sources can now be more easily modified/checked as cargo build should
  work and they're standalone crates that suffice for `path` dependencies
* Tools like cargobomb/crater no longer need to edit the manifest and can
  instead perform regression testing on the literal tarballs they download
* Other systems such as packaging Rust code may be able to take advantage of
  this, but this is a less clear benefit.

Overall I'm hesitatnt about this, unfortunately. This is a silent translation
happening on *publish*, a rare operation, that's difficult to inspect before it
flies up to crates.io. I wrote a script to run this transformation over all
crates.io crates and found a surprisingly large number of discrepancies. The
transformation basically just downloaded all crates at all versions,
regenerated the manifest, and then tested if the two manifests were (in memory)
the same.

Unfortunately historical Cargo had a critical bug which I think made this
exercise not too useful. Cargo used to *not* recreate tarballs if one already
existed, which I believe led to situations such as:

1. `cargo publish`
2. Cargo generates an error about a dependency. This could be that there's a
   `version` not present in a `path` dependency, there could be a `git`
   dependency, etc.
3. Errors are fixed.
4. `cargo publish`
5. Publish is successful

In step 4 above historical Cargo *would not recreate the tarball*. This means
that the contents of the index (what was published) aren't guaranteed to match
with the tarball's `Cargo.toml`. When building from crates.io this is ok as the
index is the source of truth for dependency information, but it means that *any*
transformation to rewrite Cargo.toml is impossible to verify against all crates
on crates.io (due to historical bugs like these).

I strove to read as many errors as possible regardless, attempting to suss out
bugs in the implementation here. To further guard against surprises I've updated
the verification step of packaging to work "normally" in these sense that it's
not rewriting dependencies itself or changing summaries. I'm hoping that this
serves as a good last-ditch effort that what we're about to publish will indeed
build as expected when uploaded to crates.io

Overall I'm probably 70% confident in this change. I think it's necessary to
make progress, but I think there are going to be very painful bugs that arise
from this feature. I'm open to ideas to help weed out these bugs ahead of time!
I've done what I can but I fear it may not be entirely enough.

Closes #4027
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants