Skip to content

Commit

Permalink
Rewrite Cargo.toml when packaging crates
Browse files Browse the repository at this point in the history
This commit is an implementation of rewriting TOML manifests when we publish
them to the registry. The rationale for doing this is to provide a guarantee
that downloaded tarballs from crates.io can be built with `cargo build`
(literally). This in turn eases a number of other possible consumers of crates
from crates.io

* Vendored sources can now be more easily modified/checked as cargo build should
  work and they're standalone crates that suffice for `path` dependencies
* Tools like cargobomb/crater no longer need to edit the manifest and can
  instead perform regression testing on the literal tarballs they download
* Other systems such as packaging Rust code may be able to take advantage of
  this, but this is a less clear benefit.

Overall I'm hesitatnt about this, unfortunately. This is a silent translation
happening on *publish*, a rare operation, that's difficult to inspect before it
flies up to crates.io. I wrote a script to run this transformation over all
crates.io crates and found a surprisingly large number of discrepancies. The
transformation basically just downloaded all crates at all versions,
regenerated the manifest, and then tested if the two manifests were (in memory)
the same.

Unfortunately historical Cargo had a critical bug which I think made this
exercise not too useful. Cargo used to *not* recreate tarballs if one already
existed, which I believe led to situations such as:

1. `cargo publish`
2. Cargo generates an error about a dependency. This could be that there's a
   `version` not present in a `path` dependency, there could be a `git`
   dependency, etc.
3. Errors are fixed.
4. `cargo publish`
5. Publish is successful

In step 4 above historical Cargo *would not recreate the tarball*. This means
that the contents of the index (what was published) aren't guaranteed to match
with the tarball's `Cargo.toml`. When building from crates.io this is ok as the
index is the source of truth for dependency information, but it means that *any*
transformation to rewrite Cargo.toml is impossible to verify against all crates
on crates.io (due to historical bugs like these).

I strove to read as many errors as possible regardless, attempting to suss out
bugs in the implementation here. To further guard against surprises I've updated
the verification step of packaging to work "normally" in these sense that it's
not rewriting dependencies itself or changing summaries. I'm hoping that this
serves as a good last-ditch effort that what we're about to publish will indeed
build as expected when uploaded to crates.io

Overall I'm probably 70% confident in this change. I think it's necessary to
make progress, but I think there are going to be very painful bugs that arise
from this feature. I'm open to ideas to help weed out these bugs ahead of time!
I've done what I can but I fear it may not be entirely enough.

Closes #4027
  • Loading branch information
alexcrichton committed May 17, 2017
1 parent 62acd6e commit 57db8bd
Show file tree
Hide file tree
Showing 8 changed files with 351 additions and 84 deletions.
10 changes: 8 additions & 2 deletions src/cargo/core/manifest.rs
Original file line number Diff line number Diff line change
@@ -1,20 +1,22 @@
use std::collections::HashMap;
use std::fmt;
use std::path::{PathBuf, Path};
use std::rc::Rc;

use semver::Version;
use serde::ser;

use core::{Dependency, PackageId, Summary, SourceId, PackageIdSpec};
use core::WorkspaceConfig;
use util::toml::TomlManifest;

pub enum EitherManifest {
Real(Manifest),
Virtual(VirtualManifest),
}

/// Contains all the information about a package, as loaded from a Cargo.toml.
#[derive(Clone, Debug)]
#[derive(Clone)]
pub struct Manifest {
summary: Summary,
targets: Vec<Target>,
Expand All @@ -27,6 +29,7 @@ pub struct Manifest {
publish: bool,
replace: Vec<(PackageIdSpec, Dependency)>,
workspace: WorkspaceConfig,
original: Rc<TomlManifest>,
}

#[derive(Clone, Debug)]
Expand Down Expand Up @@ -222,7 +225,8 @@ impl Manifest {
profiles: Profiles,
publish: bool,
replace: Vec<(PackageIdSpec, Dependency)>,
workspace: WorkspaceConfig) -> Manifest {
workspace: WorkspaceConfig,
original: Rc<TomlManifest>) -> Manifest {
Manifest {
summary: summary,
targets: targets,
Expand All @@ -235,6 +239,7 @@ impl Manifest {
publish: publish,
replace: replace,
workspace: workspace,
original: original,
}
}

Expand All @@ -251,6 +256,7 @@ impl Manifest {
pub fn profiles(&self) -> &Profiles { &self.profiles }
pub fn publish(&self) -> bool { self.publish }
pub fn replace(&self) -> &[(PackageIdSpec, Dependency)] { &self.replace }
pub fn original(&self) -> &TomlManifest { &self.original }
pub fn links(&self) -> Option<&str> {
self.links.as_ref().map(|s| &s[..])
}
Expand Down
23 changes: 22 additions & 1 deletion src/cargo/core/package.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ use std::path::{Path, PathBuf};

use semver::Version;
use serde::ser;
use toml;

use core::{Dependency, Manifest, PackageId, SourceId, Target};
use core::{Summary, SourceMap};
Expand All @@ -16,7 +17,7 @@ use util::{CargoResult, Config, LazyCell, ChainError, internal, human, lev_dista
///
/// A package is a `Cargo.toml` file plus all the files that are part of it.
// TODO: Is manifest_path a relic?
#[derive(Clone, Debug)]
#[derive(Clone)]
pub struct Package {
// The package's manifest
manifest: Manifest,
Expand Down Expand Up @@ -117,6 +118,26 @@ impl Package {
manifest_path: self.manifest_path,
}
}

pub fn to_registry_toml(&self) -> String {
let manifest = self.manifest().original().prepare_for_publish();
let toml = toml::to_string(&manifest).unwrap();
format!("\
# THIS FILE IS AUTOMATICALLY GENERATED BY CARGO\n\
#\n\
# When uploading crates to the registry Cargo will automatically\n\
# \"normalize\" Cargo.toml files for maximal compatibility\n\
# with all versions of Cargo and also rewrite `path` dependencies\n\
# to registry (e.g. crates.io) dependencies\n\
#\n\
# If you believe there's an error in this file please file an\n\
# issue against the rust-lang/cargo repository. If you're\n\
# editing this file be aware that the upstream Cargo.toml\n\
# will likely look very different (and much more reasonable)\n\
\n\
{}\
", toml)
}
}

impl fmt::Display for Package {
Expand Down
4 changes: 3 additions & 1 deletion src/cargo/core/workspace.rs
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,9 @@ impl<'cfg> Workspace<'cfg> {
///
/// This is currently only used in niche situations like `cargo install` or
/// `cargo package`.
pub fn ephemeral(package: Package, config: &'cfg Config, target_dir: Option<Filesystem>,
pub fn ephemeral(package: Package,
config: &'cfg Config,
target_dir: Option<Filesystem>,
require_optional_deps: bool) -> CargoResult<Workspace<'cfg>> {
let mut ws = Workspace {
config: config,
Expand Down
72 changes: 38 additions & 34 deletions src/cargo/ops/cargo_package.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ use std::sync::Arc;
use flate2::read::GzDecoder;
use flate2::{GzBuilder, Compression};
use git2;
use tar::{Archive, Builder, Header};
use tar::{Archive, Builder, Header, EntryType};

use core::{SourceId, Package, PackageId, Workspace, Source};
use core::{Package, Workspace, Source, SourceId};
use sources::PathSource;
use util::{self, CargoResult, human, internal, ChainError, Config, FileLock};
use ops::{self, DefaultExecutor};
Expand Down Expand Up @@ -202,9 +202,6 @@ fn tar(ws: &Workspace,
human(format!("non-utf8 path in source directory: {}",
relative.display()))
})?;
let mut file = File::open(file).chain_error(|| {
human(format!("failed to open for archiving: `{}`", file.display()))
})?;
config.shell().verbose(|shell| {
shell.status("Archiving", &relative)
})?;
Expand All @@ -230,18 +227,41 @@ fn tar(ws: &Workspace,
// unpack the selectors 0.4.0 crate on crates.io. Either that or take a
// look at rust-lang/cargo#2326
let mut header = Header::new_ustar();
let metadata = file.metadata().chain_error(|| {
human(format!("could not learn metadata for: `{}`", relative))
})?;
header.set_path(&path).chain_error(|| {
human(format!("failed to add to archive: `{}`", relative))
})?;
let mut file = File::open(file).chain_error(|| {
human(format!("failed to open for archiving: `{}`", file.display()))
})?;
let metadata = file.metadata().chain_error(|| {
human(format!("could not learn metadata for: `{}`", relative))
})?;
header.set_metadata(&metadata);
header.set_cksum();

ar.append(&header, &mut file).chain_error(|| {
internal(format!("could not archive source file `{}`", relative))
})?;
if relative == "Cargo.toml" {
let orig = Path::new(&path).with_file_name("Cargo.toml.orig");
header.set_path(&orig)?;
header.set_cksum();
ar.append(&header, &mut file).chain_error(|| {
internal(format!("could not archive source file `{}`", relative))
})?;

let mut header = Header::new_ustar();
let toml = pkg.to_registry_toml();
header.set_path(&path)?;
header.set_entry_type(EntryType::file());
header.set_mode(0o644);
header.set_size(toml.len() as u64);
header.set_cksum();
ar.append(&header, toml.as_bytes()).chain_error(|| {
internal(format!("could not archive source file `{}`", relative))
})?;
} else {
header.set_cksum();
ar.append(&header, &mut file).chain_error(|| {
internal(format!("could not archive source file `{}`", relative))
})?;
}
}
let encoder = ar.into_inner()?;
encoder.finish()?;
Expand All @@ -262,30 +282,14 @@ fn run_verify(ws: &Workspace, tar: &File, opts: &PackageOpts) -> CargoResult<()>
}
let mut archive = Archive::new(f);
archive.unpack(dst.parent().unwrap())?;
let manifest_path = dst.join("Cargo.toml");

// When packages are uploaded to a registry, all path dependencies are
// implicitly converted to registry dependencies, so we rewrite those
// dependencies here.
//
// We also make sure to point all paths at `dst` instead of the previous
// location that the package was originally read from. In locking the
// `SourceId` we're telling it that the corresponding `PathSource` will be
// considered updated and we won't actually read any packages.
let cratesio = SourceId::crates_io(config)?;
let precise = Some("locked".to_string());
let new_src = SourceId::for_path(&dst)?.with_precise(precise);
let new_pkgid = PackageId::new(pkg.name(), pkg.version(), &new_src)?;
let new_summary = pkg.summary().clone().map_dependencies(|d| {
if !d.source_id().is_path() { return d }
d.clone_inner().set_source_id(cratesio.clone()).into_dependency()
});
let mut new_manifest = pkg.manifest().clone();
new_manifest.set_summary(new_summary.override_id(new_pkgid));
let new_pkg = Package::new(new_manifest, &manifest_path);

// Now that we've rewritten all our path dependencies, compile it!
// Manufacture an ephemeral workspace to ensure that even if the top-level
// package has a workspace we can still build our new crate.
let id = SourceId::for_path(&dst)?;
let mut src = PathSource::new(&dst, &id, ws.config());
let new_pkg = src.root_package()?;
let ws = Workspace::ephemeral(new_pkg, config, None, true)?;

ops::compile_ws(&ws, None, &ops::CompileOptions {
config: config,
jobs: opts.jobs,
Expand Down
2 changes: 1 addition & 1 deletion src/cargo/ops/cargo_rustc/context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ use super::layout::Layout;
use super::links::Links;
use super::{Kind, Compilation, BuildConfig};

#[derive(Clone, Copy, Eq, PartialEq, Hash, Debug)]
#[derive(Clone, Copy, Eq, PartialEq, Hash)]
pub struct Unit<'a> {
pub pkg: &'a Package,
pub target: &'a Target,
Expand Down
Loading

0 comments on commit 57db8bd

Please sign in to comment.