Don't check out the crates.io index locally #4026

Merged
merged 1 commit into from May 12, 2017

7 participants
@alexcrichton
Member

alexcrichton commented May 10, 2017

This commit moves working with the crates.io index to operating on the git
object layers rather than actually literally checking out the index. This is
aimed at two different goals:

  • Improving the on-disk file size of the registry
  • Improving cloning times for the registry as the index doesn't need to be
    checked out

The on-disk size of my registry folder after a fresh checkout of the index went
from 124M to 48M, saving a good chunk of space! The entire operation took about
0.6s less on a Unix machine (out of 4.7s total for current Cargo). On Windows,
however, the clone operation went from 11s to 6.7s, a much larger improvement!

Closes #4015

@rust-highfive

rust-highfive commented May 10, 2017

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton

Member

alexcrichton commented May 10, 2017

@rust-highfive rust-highfive assigned matklad and unassigned brson May 10, 2017

src/cargo/sources/registry/remote.rs (outdated)
// Note that this `'static` lifetime here is actually a lie, it's actually a
// borrow into the `repo` object below. We're guaranteed, though, that if
// filled in `tree` will be destroyed first, so this should be ok.

@Mark-Simulacrum

Mark-Simulacrum May 10, 2017

Member

What guarantees that tree will be destroyed first? Unspecified drop order? In that case, maybe this is a good case for ManuallyDrop? Or is that not yet usable in cargo due to instability?

@matklad

matklad May 11, 2017

Member

@alexcrichton ping. Should there be a crate for this sort of stuff? https://github.com/Kimundi/owning-ref-rs perhaps?

@alexcrichton

alexcrichton May 11, 2017

Member

Oh sorry missed this! Yes the unspecified drop order is what guarantees this. Lots of projects are relying on this so I don't think it's necessary to go out of the way and use ManuallyDrop, and yeah I'd also prefer to keep Cargo on stable.

@matklad unfortunately that crate won't help as it's targeted at Rust pointers, whereas here it's all phantom lifetimes through libgit2 :(

@cuviper

cuviper May 11, 2017

Member

Since it's in an Option, you could explicitly take() it first in an impl Drop for RemoteRegistry.

@alexcrichton

alexcrichton May 11, 2017

Member

Indeed! I'll do that.
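
The `take()`-in-`Drop` suggestion above can be sketched as follows. This is a minimal illustration, not the code from this PR: `Repo`, `Tree`, and `Registry` are hypothetical stand-ins for `git2::Repository`, `git2::Tree`, and `RemoteRegistry`, and the shared log exists only to make the destruction order observable.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-ins for `git2::Repository` and `git2::Tree`; each records
// its own destruction in a shared log so the order can be observed.
struct Repo {
    log: Rc<RefCell<Vec<&'static str>>>,
}
struct Tree {
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Repo {
    fn drop(&mut self) {
        self.log.borrow_mut().push("repo");
    }
}
impl Drop for Tree {
    fn drop(&mut self) {
        self.log.borrow_mut().push("tree");
    }
}

// `repo` is deliberately declared first: without the `Drop` impl below,
// fields would be dropped in declaration order (repo, then tree).
#[allow(dead_code)]
struct Registry {
    repo: Repo,
    tree: Option<Tree>,
}

impl Drop for Registry {
    fn drop(&mut self) {
        // Destroy the dependent value explicitly; `repo` is dropped afterwards,
        // when the remaining fields are dropped.
        drop(self.tree.take());
    }
}

// Builds a registry, drops it, and returns the observed destruction order.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let registry = Registry {
        repo: Repo { log: log.clone() },
        tree: Some(Tree { log: log.clone() }),
    };
    drop(registry);
    let order = log.borrow().clone();
    order
}
```

Calling `drop_order()` yields `["tree", "repo"]`: the explicit `take()` makes the order independent of how the fields happen to be declared.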

@bors

Contributor

bors commented May 10, 2017

☔️ The latest upstream changes (presumably #4024) made this pull request unmergeable. Please resolve the merge conflicts.

src/cargo/sources/registry/index.rs (outdated)
// interpretation of each line here and older cargo will simply
// ignore the new lines.
let lines = contents.split(|b| *b == b'\n')
                    .filter_map(|b| str::from_utf8(b).ok())

@matklad

matklad May 11, 2017

Member

Ideally we should not swallow utf8-decoding errors here.

@matklad

matklad May 11, 2017

Member

Or are we planning to switch to a binary format some day?

@alexcrichton

alexcrichton May 11, 2017

Member

Nah, this was mostly inspired by discussion on the RFC about schema versioning. I have no plans to break this personally, but it seems reasonable to be somewhat defensive about future changes to the index, just for maximal flexibility in future Cargo implementations.

@matklad

matklad May 11, 2017

Member

I'm totally ok with self.parse_registry_package(line).ok(), it's only str::from_utf8(b).ok() that feels overly defensive.

@alexcrichton

alexcrichton May 11, 2017

Member

Sounds reasonable!
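
The distinction drawn in this thread (skip lines that fail to parse, but don't swallow UTF-8 decoding errors) could look roughly like the sketch below. `parse_line` is a hypothetical stand-in for `parse_registry_package`, and the `name version` line format is invented for the example; it is not the index's actual format.

```rust
use std::str;

// Hypothetical stand-in for `parse_registry_package`: accepts lines of the
// form `name version` and rejects anything else.
fn parse_line(line: &str) -> Result<(String, String), String> {
    let mut parts = line.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some(name), Some(vers), None) => Ok((name.to_string(), vers.to_string())),
        _ => Err(format!("unrecognized line: {:?}", line)),
    }
}

// Invalid UTF-8 is reported to the caller, while individual lines that fail
// to parse (including empty ones) are skipped.
fn parse_index(contents: &[u8]) -> Result<Vec<(String, String)>, str::Utf8Error> {
    let mut packages = Vec::new();
    for raw in contents.split(|b| *b == b'\n') {
        let line = str::from_utf8(raw)?; // propagate decoding errors
        if let Ok(package) = parse_line(line) {
            packages.push(package); // parse failures are ignored on purpose
        }
    }
    Ok(packages)
}
```

With this shape, a line in an unknown future format is ignored, whereas a file containing bytes such as `\xff` makes `parse_index` return an error.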

src/cargo/sources/registry/local.rs (outdated)
@@ -34,7 +35,11 @@ impl<'cfg> RegistryData for LocalRegistry<'cfg> {
&self.index_path
}
fn config(&self) -> CargoResult<Option<RegistryConfig>> {
fn load(&self, root: &Path, path: &str) -> CargoResult<Vec<u8>> {

@matklad

matklad May 11, 2017

Member

Why not path: &Path? We convert str to Path both for Local and Remote registry anyway. Those slashes format!("{}/{}/{}", &fs_name[0..2], &fs_name[2..4], fs_name) make me nervous :)

@matklad

matklad May 11, 2017

Member

And do we need root here? Can't we reconstruct it from index_path?

@alexcrichton

alexcrichton May 11, 2017

Member

Ah my thinking of passing in root is that index_path returns a Filesystem which is an "unlocked path", but here we've always got a locked path (locked elsewhere) so looking at a Path is proof of that.

I was originally unsure what would happen if we pass a \-separated path down to libgit2; I'm not sure whether it handles the slash differences internally. Only one way to find out!
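
For reference, the string-based path discussed in this thread can be sketched as below. `index_file` is a hypothetical helper name; it covers only the `format!` call quoted above, so names shorter than four characters (or non-ASCII names, where byte slicing could panic) simply return `None` here, which is an assumption of this sketch rather than how Cargo handles them.

```rust
// Hypothetical helper mirroring the `format!` call quoted in the review.
// The result always uses `/`, regardless of the host platform.
fn index_file(fs_name: &str) -> Option<String> {
    // Slicing by byte index is only safe for ASCII, and the quoted format
    // needs at least four characters.
    if fs_name.len() < 4 || !fs_name.is_ascii() {
        return None;
    }
    Some(format!("{}/{}/{}", &fs_name[0..2], &fs_name[2..4], fs_name))
}
```

For example, `index_file("serde")` gives `se/rd/serde` on every platform, whereas building the same path with `Path::join` would produce `\` separators on Windows.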

src/cargo/sources/registry/remote.rs (outdated)
// Note that this `'static` lifetime here is actually a lie, it's actually a
// borrow into the `repo` object below. We're guaranteed, though, that if
// filled in `tree` will be destroyed first, so this should be ok.
tree: LazyCell<RefCell<Option<git2::Tree<'static>>>>,

@matklad

matklad May 11, 2017

Member

Why do we need LazyCell on top of RefCell? LazyCell is useful to return &T and not Ref<T>, but we are returning a Ref anyway, so just RefCell<Option<git2::Tree<'static>>> should be enough levels of indirection...

@alexcrichton

alexcrichton May 11, 2017

Member

Hm excellent point!
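
The point about indirection can be illustrated with a small sketch: when callers receive a `Ref` anyway, a single `RefCell<Option<T>>` is enough to fill a value lazily. `Lazy` is a hypothetical type, `String` stands in for `git2::Tree<'static>`, and the `loads` counter is there only to show that initialization runs once.

```rust
use std::cell::{Cell, Ref, RefCell};

// Hypothetical cache standing in for the `tree` field; `String` replaces
// `git2::Tree<'static>` to keep the sketch dependency-free.
struct Lazy {
    value: RefCell<Option<String>>,
    loads: Cell<u32>, // counts how often the value was computed
}

impl Lazy {
    fn new() -> Lazy {
        Lazy { value: RefCell::new(None), loads: Cell::new(0) }
    }

    // Fills the slot on first use and hands out a `Ref` into it.
    fn get(&self) -> Ref<'_, String> {
        if self.value.borrow().is_none() {
            self.loads.set(self.loads.get() + 1);
            *self.value.borrow_mut() = Some(String::from("tree"));
        }
        Ref::map(self.value.borrow(), |slot| slot.as_ref().unwrap())
    }
}
```

Two consecutive `get()` calls return the same contents while `loads` stays at 1. The trade-off is that callers hold a `Ref` guard rather than a plain `&T`.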

@alexcrichton

Member

alexcrichton commented May 11, 2017

Pushed some updates

src/cargo/sources/registry/remote.rs (outdated)
    let handle = ops::http_handle(self.config)?;
    self.handle.fill(RefCell::new(handle)).ok().unwrap();
    Ok(self.handle.borrow().unwrap())
}

@matklad

matklad May 11, 2017

Member

Looks like easy and repo could use LazyCell::get_or_try_init instead of manually unwrapping things.

@alexcrichton

alexcrichton May 11, 2017

Member

Aha yes indeed! I thought I tried that but thanks for catching
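
As a rough illustration of what a fallible "get or initialize" helper does, here is a sketch written against the standard library's `std::cell::OnceCell` (assuming Rust 1.70 or newer) rather than the `LazyCell` type used in Cargo. The free function `get_or_try_init` below is written for this example; it is not the `LazyCell` method itself.

```rust
use std::cell::OnceCell;

// Returns the cached value, or runs `init` once and caches its result.
// If `init` fails, the cell stays empty and the error is passed through.
fn get_or_try_init<T, E, F>(cell: &OnceCell<T>, init: F) -> Result<&T, E>
where
    F: FnOnce() -> Result<T, E>,
{
    if let Some(value) = cell.get() {
        return Ok(value);
    }
    let value = init()?;
    // If the cell was filled in the meantime, the existing value wins.
    Ok(cell.get_or_init(|| value))
}
```

Compared with the `fill(...).ok().unwrap()` sequence in the snippet above, the caller gets one call that propagates errors with `?` and never observes a half-initialized cell.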

@alexcrichton

Member

alexcrichton commented May 11, 2017

Updated

@matklad

Member

matklad commented May 11, 2017

LGTM, though there was a seemingly legitimate failure on AppVeyor on the previous build.

@alexcrichton

Member

alexcrichton commented May 11, 2017

Bah looks like libgit2 cares about \ vs /
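
One way to deal with this, sketched under the assumption that the lookup path is relative and valid UTF-8, is to join the path components with `/` (the separator git uses inside trees) before handing the path to libgit2. `to_git_path` is a hypothetical helper written for this example and is not necessarily what the PR ended up doing.

```rust
use std::path::{Component, Path};

// Joins the normal components of a relative path with '/', independent of
// the host platform's separator. Returns `None` for absolute paths, `..`
// components, or components that are not valid UTF-8.
fn to_git_path(path: &Path) -> Option<String> {
    let mut parts = Vec::new();
    for component in path.components() {
        match component {
            Component::Normal(part) => parts.push(part.to_str()?),
            Component::CurDir => {} // `.` carries no information
            _ => return None,
        }
    }
    Some(parts.join("/"))
}
```

A path built with `Path::new("se").join("rd").join("serde")` becomes `se/rd/serde` whether the platform separator is `/` or `\`.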

@alexcrichton

Member

alexcrichton commented May 11, 2017

@bors: r=matklad

@bors

Contributor

bors commented May 11, 2017

📌 Commit b7414ce has been approved by matklad

@bors

Contributor

bors commented May 11, 2017

🔒 Merge conflict

@bors

Contributor

bors commented May 11, 2017

☔️ The latest upstream changes (presumably #4032) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton

Member

alexcrichton commented May 11, 2017

@bors: r=matklad

@bors

Contributor

bors commented May 11, 2017

📌 Commit 15cc376 has been approved by matklad

@bors

Contributor

bors commented May 11, 2017

⌛️ Testing commit 15cc376 with merge d8fa3eb...

bors added a commit that referenced this pull request May 11, 2017

Auto merge of #4026 - alexcrichton:bare-registry, r=matklad
Don't check out the crates.io index locally

@bors

Contributor

bors commented May 12, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: matklad
Pushing d8fa3eb to master...

@bors bors merged commit 15cc376 into rust-lang:master May 12, 2017

3 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
homu: Test successful

@alexcrichton alexcrichton deleted the alexcrichton:bare-registry branch May 12, 2017

nabijaczleweli added a commit to nabijaczleweli/cargo-update that referenced this pull request May 16, 2017

Read registry as a repository rather than a file tree
Allows for a seamless transition for when
rust-lang/cargo#4026 lands

Closes #32