Introduce cargo metadata subcommand #2196

Merged
merged 5 commits into from Jan 25, 2016


@matklad
Member

matklad commented Dec 5, 2015

Most of the work was done by @dan-t in #1225 and by @winger in #1434

Fixes #2193

I failed to properly rebase previous attempts so I just salvaged this from bits and pieces.

@alexcrichton are you sure that the default format should be TOML? I think that TOML is more suitable for humans, and JSON is better (at the moment at least) for tools. Maybe we should default to JSON?

@rust-highfive

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

features: Vec<String>,
no_default_features: bool)
-> CargoResult<(Resolve, Vec<Package>)> {
let mut source = try!(PathSource::for_path(manifest.parent().unwrap(), config));
Member Author

Is this unwrap safe?

Member

Yeah this should be fine
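
For readers wondering why the unwrap is considered fine: `Path::parent` only returns `None` for a path with no parent component, such as the filesystem root or an empty path. A manifest path that ends in a file name always has a parent, even if that parent is the empty relative path. A minimal sketch of the behaviour follows; the helper name `manifest_dir` is hypothetical and not part of cargo.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not cargo's API): returns the directory containing
/// a manifest, or `None` when the path has no parent component.
fn manifest_dir(manifest: &Path) -> Option<PathBuf> {
    // `parent()` yields `None` only for a root or an empty path, so a path
    // ending in `Cargo.toml` always produces `Some(..)`.
    manifest.parent().map(|dir| dir.to_path_buf())
}
```

Propagating the `None` case instead of unwrapping would cost one extra error branch, which is the trade-off being judged acceptable in the review.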

@matklad
Member Author

matklad commented Dec 5, 2015

Another question: where should we put the documentation for the format? Is the help message the right place?

bors added a commit that referenced this pull request Dec 8, 2015
This hides `SerializedDependency` from the general public, as requested [here](#1434 (comment)). It also hides `SerializedManifest`, which was (wrongly?) exposed.

This is required for #2196. I want to move in small steps this time, hence the separate PR.

Technically this breaks backwards compatibility, because `SerializedDependency` and `SerializedManifest` were public (`SerializedPackage` was private, however). Are such changes allowed in cargo?
@Turbo87
Member

Turbo87 commented Dec 8, 2015

@matklad how about inline code documentation in the relevant module file?

@matklad
Member Author

matklad commented Dec 9, 2015

> I'd love to see a more targeted purpose for this subcommand which can precisely answer what sort of data needs to be emitted on stdout.

Sure :) We want to use cargo metadata with a plugin for IntelliJ IDEA. IDEA-based IDEs are "smart": they build a syntax tree from source code and use it to perform "static analysis" (completion, navigation, refactorings, ...). To build such a tree across several files, we need to know some metadata about the project.

The most vital piece of information is the list of crate roots that belong to the project. Only if you know the crate roots (src/lib.rs, bin/whatever.rs, etc.) can you correctly map files to modules. This piece of information comes from the targets field of a package.

Note that a rust project can include several cargo subprojects (path dependencies), so we need this information too.

Another place where we need help from cargo is extern crate declarations. We need to resolve extern crate foo; to the source file containing foo. This information is available because a project includes dependencies and packages include all known crates.
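
To make the two IDE use cases concrete, here is a rough sketch of how a tool might consume the package list. The `Package` and `Target` structs are simplified, hypothetical mirrors of the serialized output, not cargo's real types, and the lookup assumes the crate name equals the package name, which does not always hold.

```rust
// Hypothetical, simplified mirror of the serialized output. Field names are
// assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Target {
    kind: String,     // e.g. "lib" or "bin"
    src_path: String, // path to the crate root
}

#[derive(Debug, Clone, PartialEq)]
struct Package {
    name: String,
    targets: Vec<Target>,
}

/// Collects every crate root across all known packages, in listing order.
fn crate_roots(packages: &[Package]) -> Vec<&str> {
    packages
        .iter()
        .flat_map(|p| p.targets.iter())
        .map(|t| t.src_path.as_str())
        .collect()
}

/// Finds the library crate root that `extern crate <name>;` would refer to.
/// Assumes the crate name matches the package name.
fn resolve_extern_crate<'a>(packages: &'a [Package], name: &str) -> Option<&'a str> {
    packages
        .iter()
        .filter(|p| p.name == name)
        .flat_map(|p| p.targets.iter())
        .find(|t| t.kind == "lib")
        .map(|t| t.src_path.as_str())
}
```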

We don't use features and kind information at the moment.

> For example there's quite a bit of information missing from the output

Missing output is OK; it will be easy to add more information later. Moreover, if we output too much, we may make changing the internal cargo structures harder, because the serialized output is currently tied to these structures.

> The output is also currently ambiguous in some cases (e.g. you can't connect a dependency to what it's depending on).

I would argue that this also falls into the "missing information" category. There are two pieces of data here: a flat list of known packages and a dependency DAG. At the moment, metadata outputs only the flat list of known packages. The dependencies key in the output lists only declared (that is, not resolved) dependencies. We can add a proper DAG serialization later. At the moment the only real DAG information that we need is which package is the main package in the project. The root key in the output provides this info.

> Source information for where a crate originally came from is not existent. For example you can't know if a crate came from git or from crates.io

This is an important piece of information, but it is not needed at least for our use case. It can be added later.

> The Target serialization is missing all of the flags, some of which are quite important (e.g. test which indicates an integration test)

Hm, target serialization does include kind. I am more worried that it also includes

"metadata": {
  "extra_filename": "-a129c63589d0f328",
  "metadata": "a129c63589d0f328"
}

Not sure what it is, but seems like it does not belong here :)

> SerializedDependency is quite old (and should go away in favor of Encodable on Dependency directly) and doesn't include any of the information about a platform target, what kind of dependency, etc.

SerializedDependency is removed. Missing information is again easy to add later.

> The manifest in a package is almost entirely omitted from the serialization here, but this may be ok.

What is the relation between a Package and a Manifest? The only extra bit of information that Package (and its serialized representation) has is a path to Cargo.toml.

@matklad
Member Author

matklad commented Dec 11, 2015

@alexcrichton 1.5 is 🎉 released 🎉 so any feedback on this? :)

Thinking of it more, I propose the following format:

{
  "packages": [
    {
      "name": "foo",
      "package_id": "some_name some_version some_source_id",
      //...
    },
    //...
  ],

  "graph": {
    "root": "some_name some_version some_source_id",
    "edges": [
      //...
    ]
  }
}

That is, we should include a PackageId to the representation of Package and store a graph of PackageIds separately. As I said, we can live with only root node in the graph for a start.
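
As a rough illustration of how a consumer could use this layout, the sketch below looks up the root package by its id. The struct and field names follow the proposal above; representing `edges` as `(from, to)` pairs of package ids is an assumption, since the proposal leaves the edge format open.

```rust
// Hypothetical types mirroring the proposed JSON layout.
struct Package {
    name: String,
    package_id: String,
}

struct Graph {
    root: String,
    // Assumption: each edge is a (from, to) pair of package ids.
    edges: Vec<(String, String)>,
}

struct Metadata {
    packages: Vec<Package>,
    graph: Graph,
}

/// Looks up a package by id, treating the id as an opaque string.
fn find_package<'a>(meta: &'a Metadata, package_id: &str) -> Option<&'a Package> {
    meta.packages.iter().find(|p| p.package_id == package_id)
}

/// Returns the main package of the project, as named by `graph.root`.
fn root_package(meta: &Metadata) -> Option<&Package> {
    find_package(meta, &meta.graph.root)
}
```

Because the lookup only compares ids for equality, it keeps working even if the internal structure of the id string changes.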

So we can remove this from the output.

A thing that bothers me a little is that PackageIds are supposed to be opaque identifiers, but they are serialized in a highly structured string.

@alexcrichton
Member

Thanks for pushing on this again @matklad! Currently I'm at a work week in Orlando, but I hope to digest more of this first thing next week!

flag_manifest_path: Option<String>,
flag_no_default_features: bool,
flag_output_format: OutputFormat,
flag_output_path: OutputTo,
Member

Perhaps this could just be Option<String>? That way a custom Decodable implementation isn't needed
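
A minimal sketch of what this suggestion could look like: the flag is decoded as a plain `Option<String>` and converted afterwards. The name `OutputTo` comes from the diff above, but its variants and the `output_target` helper are assumptions made for illustration.

```rust
use std::path::PathBuf;

// Assumed shape of the output destination; the real type in the PR may differ.
#[derive(Debug, PartialEq)]
enum OutputTo {
    Stdout,
    File(PathBuf),
}

/// Converts the optional flag value into a destination after decoding,
/// so no custom decoding logic is needed for the flag itself.
fn output_target(flag_output_path: Option<String>) -> OutputTo {
    match flag_output_path {
        Some(path) => OutputTo::File(PathBuf::from(path)),
        None => OutputTo::Stdout,
    }
}
```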

@alexcrichton
Member

Thanks @matklad! This is looking good to me. I think it's important to rely on the standard Encodable implementations as much as possible to ensure consistency among commands (e.g. Encodable for Package).

I've got a few bike-sheddy points that I'd like to discuss here:

  1. The actual data emitted as part of SerializedPackage is pretty limiting now. I think we may want to tighten it up a bit (e.g. use package ids instead of name/version), and we may want to expose a few standard features (like features).
  2. I think that we may also want to do an audit of other Encodable implementations transitively included as well. For example I'm not sure that Target and Dependency have suitable encodings, would be a good thing to check up on!
  3. The term metadata is interesting here when compared to read-manifest. The difference between the two commands is that metadata will give you the entire dependency graph whereas read-manifest will only give you the current package. That distinction seems fine (one fetches deps, one doesn't), but the naming between the two seems a little inconsistent.

Thoughts?


To respond to some of your points as well, I agree that limiting the amount of information being printed is a good idea. I'd just want to make sure that this is useful in a first draft rather than restricting it too much in terms of the information being printed. For example I'm ok with not emitting the dependency graph for now, that can always be added later.

@matklad
Member Author

matklad commented Dec 16, 2015

> (e.g. use package ids instead of name/version),

Hm, DependencyInner does not include a package id. It can be found in Resolve, which is not encodable. Let's serialize Resolve in the graph thing?

> For example I'm not sure that Target and Dependency have suitable encodings, would be a good thing to check up on!

Will look at the Encodables today.

> That distinction seems fine (one fetches deps, one doesn't), but the naming between the two seems a little inconsistent.

Does anybody use read-manifest in the wild? I dislike a bit that there are two machine-oriented commands in cargo. Maybe we should change from cargo metadata to cargo metadata --dependencies to allow for future extensions (like cargo metadata --manifest)?

@alexcrichton
Member

Ah yes Dependency intentionally does not include PackageId (because it's not a resolved dependency). To reconstruct the same graph that Cargo has this'll need to serialize Resolve. I think it'd be possible with a structure like:

{
    "packages": [ ... ],
    "resolve": {
        "root": "...",
        "packages": {
            "package1": ["dep1", "dep2"],
        }
    }
}

Where all the packages in resolve are listed as package ids. That way tools can use resolve to figure out what the precise structure is, and then they can use the package metadata information in the packages array to figure out things like source paths and such.
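
A sketch of how a tool might walk such a resolve section follows. The `Resolve` struct is a hypothetical in-memory form of the JSON above, with every key and dependency being a package id string; the traversal is an ordinary breadth-first search, not anything cargo itself provides.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical in-memory form of the proposed "resolve" object.
struct Resolve {
    root: String,
    // package id -> ids of its resolved dependencies
    packages: HashMap<String, Vec<String>>,
}

/// Returns every package id reachable from the root, in breadth-first
/// order with the root first. Each id appears at most once.
fn reachable(resolve: &Resolve) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    let mut order = Vec::new();

    seen.insert(resolve.root.as_str());
    queue.push_back(resolve.root.as_str());

    while let Some(id) = queue.pop_front() {
        order.push(id.to_string());
        if let Some(deps) = resolve.packages.get(id) {
            for dep in deps {
                // `insert` returns false for ids that were already visited.
                if seen.insert(dep.as_str()) {
                    queue.push_back(dep.as_str());
                }
            }
        }
    }
    order
}
```

Each returned id can then be matched against the top-level packages array to find source paths.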

@alexcrichton
Member

Also I'm not sure if there are too many users of read-manifest, but I'd be fine deprecating in favor of this command if they served the same purpose. I do feel like the modes of operation may be important to preserve, however.

@matklad
Member Author

matklad commented Dec 16, 2015

> Ah yes Dependency intentionally does not include PackageId (because it's not a resolved dependency).

Should we include both declared and resolved dependencies in the output? I'd say yes: this will help to detect issues like #2064

@matklad
Member Author

matklad commented Dec 16, 2015

@alexcrichton I'm a bit worried that Package and Manifest have almost identical representations:

https://github.com/rust-lang/cargo/blob/master/src/cargo/core/manifest.rs#L47-L52
https://github.com/rust-lang/cargo/blob/master/src/cargo/core/package.rs#L26-L32

If I add features, I need to add it to both places, which is troubling.

Given this comment I'm not even sure that we need both a Package and a Manifest ;)

@alexcrichton
Member

> Should we include both declared and resolved dependencies in the output?

I think we should, yeah, but in separate locations (to mirror what Cargo does)

> I'm a bit worried that Package and Manifest have almost identical representation:

An excellent point! I think it should be ok to actually just have the Encodable implementation for Package delegate to the manifest, or to just delete Encodable for Manifest and deal with it all in Package.

@matklad
Member Author

matklad commented Dec 17, 2015

> this'll need to serialize Resolve.

There is EncodableResolve, which is used for Cargo.lock, but the format is a bit different from what we want here.

In particular, there is no package_id in the package node; (version, name, source) are used instead. For deps, however, package_id is used.

I'd rather provide a separate EncodableResolve implementation for the purpose of metadata.

@alexcrichton
Member

Ah actually I think that's basically exactly what we want here, could you elaborate on what's missing from constructing it? (e.g. you've already got a Resolve itself). I think it may also be important to preserve the metadata section for future purposes.

I will say though idiomatically the implementation of Encodable should probably mirror what you've been migrating to elsewhere!

@matklad
Member Author

matklad commented Dec 17, 2015

The entry in the lock file looks like this:

[[package]]
name = "aho-corasick"
version = "0.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
dependencies = [
 "memchr 0.1.6 (registry+https://github.com/rust-lang/crates.io-index)",
]

Note that in dependencies we have a string (a package id). But in package itself we have only name, version, source, but no package id.

The package id string can be reconstructed from these three things, but it does not feel right.
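
For illustration, the reconstruction mentioned above would look roughly like this. The format is inferred from the dependency strings in the lock file entry; the function name is hypothetical, and the source-less case is an assumption based on how path packages appear without a source.

```rust
/// Hypothetical helper: rebuilds a lock-file style package id string from
/// the separate name, version and source fields of a `[[package]]` entry.
fn package_id_string(name: &str, version: &str, source: Option<&str>) -> String {
    match source {
        // Matches the form seen in `dependencies`, e.g.
        // "memchr 0.1.6 (registry+https://...)".
        Some(src) => format!("{} {} ({})", name, version, src),
        // Assumption: entries without a source are written as "name version".
        None => format!("{} {}", name, version),
    }
}
```

Every consumer would have to repeat this formatting rule, which is the reason it "does not feel right" compared with emitting the id directly.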

@matklad matklad force-pushed the metadata2 branch 2 times, most recently from 4757aca to f8d5a21 Compare January 25, 2016 14:06
@alexcrichton
Member

@bors: r+ b24bf7e

Looks good to me, thanks again @matklad!

@bors
Collaborator

bors commented Jan 25, 2016

⌛ Testing commit b24bf7e with merge 859c5d3...

bors added a commit that referenced this pull request Jan 25, 2016
@matklad
Member Author

matklad commented Jan 25, 2016

@alexcrichton what is the release schedule for cargo?

Am I correct that metadata will be available right away in nightlies? And that it will get to stable Rust only with 1.8.0?

@bors
Copy link
Collaborator

bors commented Jan 25, 2016

@bors bors merged commit b24bf7e into rust-lang:master Jan 25, 2016
@alexcrichton
Member

@matklad yeah this'll ride the normal release trains and will be available in Rust 1.8

@matklad
Member Author

matklad commented Jan 29, 2016

Ouch, @alexcrichton, I think we've missed one rather major issue here, this one

Especially given that there are two "implementations" of Encodable for package id.

https://github.com/matklad/cargo/blob/metadata2/src/cargo/core/resolver/encode.rs#L99

https://github.com/matklad/cargo/blob/metadata2/src/cargo/core/package_id.rs#L29

The ids in resolve and in packages are indeed different :( Here is an example

{
  "packages": [
    {
      "name": "geom",
      "version": "0.1.0",
      // This id is different 
      "id": "geom 0.1.0 (path+file:\/\/\/home\/matklad\/projects\/rustraytracer)",
      "source": null,
      ...
    },
    ...
  ],
  "resolve": {
    "root": {
      "name": "rustraytracer",
      "version": "0.1.0",
      "source": null,
      "dependencies": [
        // from this id :(
        "geom 0.1.0", 
        "rand 0.3.11 (registry+https:\/\/github.com\/rust-lang\/crates.io-index)",
        "regex 0.1.41 (registry+https:\/\/github.com\/rust-lang\/crates.io-index)",
        "rustc-serialize 0.3.16 (registry+https:\/\/github.com\/rust-lang\/crates.io-index)",
        "simple_parallel 0.3.0 (registry+https:\/\/github.com\/rust-lang\/crates.io-index)",
        "time 0.1.33 (registry+https:\/\/github.com\/rust-lang\/crates.io-index)",
        "utils 0.1.0"
      ]
    },
    ...
  },
}

In resolve, there is no source part in id for path dependencies. In packages, source is always included in id. We can fix it if we omit source from packages.
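
Until the two encodings agree, a consumer could in principle join the two sections by normalising ids on its side. The sketch below is only a hypothetical workaround, not the fix discussed in this thread: it drops a `(path+...)` suffix so that both forms compare equal, and it would be ambiguous if two path packages shared a name and version.

```rust
/// Hypothetical consumer-side normalisation: strips a trailing
/// " (path+...)" source from an id, leaving other ids untouched.
fn lockfile_style_id(id: &str) -> &str {
    match id.find(" (path+") {
        Some(idx) => &id[..idx],
        None => id,
    }
}

/// Compares an id from `packages` with an id from `resolve`,
/// ignoring path sources on either side.
fn same_package(package_id: &str, resolve_id: &str) -> bool {
    lockfile_style_id(package_id) == lockfile_style_id(resolve_id)
}
```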

@alexcrichton
Member

Oh dear that is indeed not good!

I can't seem to recall myself why there are two Encodable implementations for PackageId, but perhaps they can be merged (favoring the one from resolve)?

@matklad
Member Author

matklad commented Jan 29, 2016

> but perhaps they can be merged (favoring the one from resolve)?

Probably not. The one from resolve is "context sensitive". Look at this function: https://github.com/rust-lang/cargo/blob/master/src/cargo/core/resolver/encode.rs#L189. It creates an EncodablePackageId from PackageId.

Unfortunately it depends on the root package and because of this can't be reused or repeated in the general Encodable implementation for PackageId.

@alexcrichton
Member

Hm ok, so the requirements of resolve are indeed a little different. We don't want to emit filesystem paths to the lock file because otherwise they'd just oscillate over time as you migrate among machines.

That being said there aren't too many uses for encoding package ids, and in general it amounts to an assertion that path-based source ids are omitted where everything else is included.

It's probably fine for now to change the encodable implementation for package ids to ignore path sources and that way it'll match resolve (and resolve can use the same implementation)

@matklad
Member Author

matklad commented Jan 29, 2016

> It's probably fine for now to change the encodable implementation for package ids to ignore path sources and that way it'll match resolve (and resolve can use the same implementation)

I've tried to do it here and the result is unsatisfactory.

The crux of the problem is that PackageId should be both Encodable and Decodable. Without serializing source it is impossible to preserve Decodability. And without correct Decodable implementation, cargo-install tests fail.
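
The round-trip problem can be shown with a toy model. The types and string format below are simplified assumptions, not cargo's actual `PackageId` or its encoding: once the encoder drops path sources, the decoder has nothing to rebuild them from.

```rust
// Toy model; not cargo's real types.
#[derive(Debug, Clone, PartialEq)]
enum Source {
    Path(String),
    Registry(String),
}

#[derive(Debug, Clone, PartialEq)]
struct PackageId {
    name: String,
    version: String,
    source: Source,
}

/// Encodes an id the way the lock file does: path sources are omitted.
fn encode_lossy(id: &PackageId) -> String {
    match &id.source {
        Source::Path(_) => format!("{} {}", id.name, id.version),
        Source::Registry(url) => format!("{} {} (registry+{})", id.name, id.version, url),
    }
}

/// Decodes an id. Returns `None` when the source part is missing, because
/// the original path cannot be recovered from the string alone.
fn decode(s: &str) -> Option<PackageId> {
    let mut parts = s.splitn(3, ' ');
    let name = parts.next()?.to_string();
    let version = parts.next()?.to_string();
    let rest = parts.next()?; // absent for path packages
    let url = rest.strip_prefix("(registry+")?.strip_suffix(')')?;
    Some(PackageId {
        name,
        version,
        source: Source::Registry(url.to_string()),
    })
}
```

Registry ids survive the round trip, while path ids do not, which is the failure mode described above for anything that needs to decode package ids.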

EncodableResolve sidesteps the issue by retrieving the source from elsewhere: https://github.com/rust-lang/cargo/blob/master/src/cargo/core/resolver/encode.rs#L138

This makes me think again that maybe we should leave the lock-file resolve serialization to the lock file only, and instead provide a simpler and more natural generic serialization for resolve :)

And there is one more thing, SourceId.precise is serialized differently in two places, but this is a minor issue.

@alexcrichton
Member

Hm ok, I forgot that package ids were being encoded for cargo-install. For now let's go to what you were mentioning early on which is to not use the EncodableResolve implementation for resolve itself. The cargo metadata command can implement its own version of Encodable for that.

Sorry for the roundabout way to conclude that!

@matklad
Member Author

matklad commented Jan 29, 2016

> Sorry for the roundabout way to conclude that!

An experiment is a nice way to reach a (I hope so :) ) correct conclusion.

Here is a PR: #2331

Labels
relnotes Release-note worthy