Tracking issue for RFC 2963: rustdoc JSON backend #76578

Open
3 of 5 tasks
Tracked by #84
Manishearth opened this issue Sep 10, 2020 · 21 comments
Labels
A-rustdoc-json Area: Rustdoc JSON backend B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@Manishearth
Member

Manishearth commented Sep 10, 2020

RFC PR: rust-lang/rfcs#2963
RFC: https://rust-lang.github.io/rfcs/2963-rustdoc-json.html
Documentation: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc_json_types/
Issues: A-rustdoc-json Area: Rustdoc JSON backend

Todo:

@Manishearth Manishearth added T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Sep 10, 2020
@Manishearth
Member Author

cc @P1n3appl3 if you want to help fill in substeps for the tracking issue and/or help implement them

@aDotInTheVoid
Member

Implemented in #79539

@jyn514 jyn514 added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Dec 16, 2020
@aDotInTheVoid
Member

@rustbot modify labels: +A-rustdoc-json

@rustbot rustbot added the A-rustdoc-json Area: Rustdoc JSON backend label Feb 25, 2021
github-actions bot pushed a commit to rust-lang/glacier that referenced this issue Apr 9, 2021
=== stdout ===
=== stderr ===
error: the -Z unstable-options flag must be passed to enable --output-format for documentation generation (see rust-lang/rust#76578)

==============
@makspll

This comment was marked as off-topic.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 31, 2022
triagebot.yml: CC Enselic when rustdoc-json-types changes

Being the maintainer of [cargo-public-api](https://github.com/Enselic/cargo-public-api) which relies on [rustdoc JSON](rust-lang#76578) means I have high stakes in the rustdoc JSON format itself. Would be great if I could be pinged when the format is about to change.

I hope this is OK. Big thanks in advance.
@aDotInTheVoid
Member

Some blockers from the Zulip discussion:

  • Settle on a meta-format for format_version
    • Decide how to deal with nightly-only lang features
  • jsondoclint passes on core + friends (Rustdoc JSON: Invalid output for core #106435)
  • Ensure rustdoc-json-types is fully documented
  • Decide what to do with https://crates.io/crates/rustdoc-types
  • Robust cross-crate item lookup
  • Go over names again
  • Very carefully consider the impact of new language features, to ensure we can add them without breaking anyone. syn-2's model and release notes are helpful for this.

@Xuanwo
Contributor

Xuanwo commented Nov 22, 2023

A quick note for users who want to try this feature:

cargo +nightly rustdoc --lib -p <your-package> -- -Z unstable-options --output-format json
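
For tools that drive this from Rust rather than from a shell, the same invocation can be assembled with `std::process::Command`. This is only a sketch: the function name `rustdoc_json_command` is made up for illustration, and it builds the command without running it.

```rust
use std::process::Command;

/// Builds (but does not run) the cargo invocation quoted above.
/// `package` is whatever you would pass to `-p`.
/// The function name is hypothetical; only the flags come from the thread.
fn rustdoc_json_command(package: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["+nightly", "rustdoc", "--lib", "-p", package])
        // Everything after `--` is forwarded to rustdoc itself.
        .arg("--")
        .args(["-Z", "unstable-options", "--output-format", "json"]);
    cmd
}
```

Calling `.status()` on the returned value runs it; this needs a nightly toolchain, since `--output-format json` is still gated behind `-Z unstable-options`.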

@wfraser

wfraser commented Nov 22, 2023

I notice that the output from this doesn't include items in stdlib (just references to them), just like the old -Zsave-analysis feature. With -Zsave-analysis, you could get the stdlib analysis from the rust-analysis rustup component. Is there a way to generate it using rustdoc, and/or are there any plans to ship a similar rustup component to fill that gap?

@obi1kenobi
Contributor

I notice that the output from this doesn't include items in stdlib (just references to them), just like the old -Zsave-analysis feature. With -Zsave-analysis, you could get the stdlib analysis from the rust-analysis rustup component. Is there a way to generate it using rustdoc, and/or are there any plans to ship a similar rustup component to fill that gap?

There's a nightly-only component called rust-docs-json that includes rustdoc JSON for built-in crates like core, std, alloc, etc.

rustup component add rust-docs-json --toolchain nightly

Depending on the details of your use case, it's also possible that some of the machinery that powers cargo-semver-checks may be of use. For example, here's a playground query for "items with allowed lints defined in core." I maintain cargo-semver-checks and its underlying infrastructure, and would be happy to chat if this looks interesting — I'm also in the rust-lang Zulip if that's easier. This is subject matter I find professionally interesting, so I'd love to hear about what you're looking to build!

@wfraser

wfraser commented Nov 22, 2023

My use case is rsbrowse, a TUI interactive code browser. I implemented it using -Zsave-analysis (so currently it doesn't work, unless you can point it at a really old compiler), and the rustdoc JSON output seems like a perfect replacement for that.

I had initially tried replacing it with rust-analyzer's LSIF output, but that's primarily intended for going from source code text -> type info, whereas I really want the reverse, and it's not good for that.

@obi1kenobi
Contributor

Very cool! I hadn't heard of rsbrowse before. I think rsbrowse and cargo-semver-checks run into the exact same combination of issues: resolving and querying items, which are possibly across crate boundaries, and then pointing to specific bits of code as needed (whether to explore or to flag a semver issue). I'm hopeful that they might be able to share a solution as well, if you're interested!

I recently gave a talk on the architecture behind cargo-semver-checks which might be of interest: https://www.youtube.com/watch?v=Fqo8r4bInsk

In any case, I have quite a bit of experience with rustdoc JSON, including how to minimize the amount of work needed to stay up to date on rustdoc format versions which change rather frequently — on average, ~once per Rust release. There's also a limitation in rustdoc JSON at the moment that makes it somewhat challenging to resolve items across crate boundaries, but I think it's surmountable. Give me a ping if I can help, or if you're interested in building atop some of the same infrastructure that powers cargo-semver-checks and our playground!

@wfraser

wfraser commented Nov 28, 2023

I took a couple days and reimplemented rsbrowse's backend to use rustdoc json, and overall it is a perfect fit! I found it much easier using this than the RLS save-analysis data (for my use-case at least). Even implementing cross-crate lookup was easy: my program simply looks in the local crate's .index first, then if it's not found, in .paths, then uses the first component to figure out which other json file to look in, and then matches based on the full path. It works great, as long as you remember to always pass around a crate name with an item's ID :)
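
The lookup order described above can be sketched roughly as follows. This is not rsbrowse's actual code: `CrateDoc`, `Found`, and `resolve` are hypothetical names, the two maps are heavily simplified stand-ins for the `index` and `paths` sections of a parsed rustdoc JSON file, and "matches based on the full path" is interpreted here as searching the other crate's own `paths` map.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for one parsed rustdoc JSON file.
struct CrateDoc {
    /// Items defined in this crate: id -> item name.
    index: HashMap<String, String>,
    /// Path summaries, including items from other crates: id -> full path.
    paths: HashMap<String, Vec<String>>,
}

/// An id only means something together with the crate whose file it came from.
#[derive(Debug, PartialEq)]
struct Found<'a> {
    krate: &'a str,
    id: &'a str,
}

fn resolve<'a>(
    docs: &'a HashMap<String, CrateDoc>,
    krate: &'a str,
    id: &'a str,
) -> Option<Found<'a>> {
    let local = docs.get(krate)?;
    // 1. Is the item defined in the local crate?
    if local.index.contains_key(id) {
        return Some(Found { krate, id });
    }
    // 2. Otherwise fall back to the local path summary for this id.
    let path = local.paths.get(id)?;
    // 3. The first path component names the crate (and so the file) to open.
    let other_name = path.first()?;
    let other = docs.get(other_name)?;
    // 4. Match on the full path, keeping only ids that crate defines itself.
    other
        .paths
        .iter()
        .find(|(other_id, p)| *p == path && other.index.contains_key(*other_id))
        .map(|(other_id, _)| Found { krate: other_name.as_str(), id: other_id.as_str() })
}
```

Returning the crate name alongside the id reflects the advice above: an id taken from one file should never be looked up in another file on its own.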

One problem I've found so far is that the stdlib files seem to be missing things from their ".paths" map which are referenced from elsewhere in the json, which makes reliably figuring out full paths difficult, but this can be worked around.

EDIT: I just realized, it's probably because the stdlib files were generated without --document-private-items and --document-hidden-items being specified. When rsbrowse invokes rustdoc for the workspace crates, it passes these flags, so I never saw missing paths until I started loading the stdlib files. Maybe it would make sense to generate those files with all info included instead?

There also doesn't seem to be a way to get it to generate json for build-dependencies like syn. I'm calling cargo doc --workspace and passing a RUSTDOCFLAGS to enable JSON output, and it gets most things, but not build- or dev-dependencies. Maybe I'll have to resort to parsing cargo metadata and calling rustdoc manually?

I'm also not sure how it handles multiple versions of a crate in a dependency tree.

But overall, for me, this feature is like 95% perfect, and I can pretty easily work around the remaining rough edges. I'm very happy with the data format!

example errors showing missing paths in `core`
missing path for ItemId(CrateId { name: "core" }, Id("0:3244:262")) (Try)
missing path for ItemId(CrateId { name: "core" }, Id("0:7480:163")) (IntoIterator)
missing path for ItemId(CrateId { name: "core" }, Id("0:7480:163")) (IntoIterator)
missing path for ItemId(CrateId { name: "core" }, Id("0:7480:163")) (IntoIterator)
missing path for ItemId(CrateId { name: "core" }, Id("0:3249:143")) (FromResidual)
missing path for ItemId(CrateId { name: "core" }, Id("0:7443:19788")) (Product)
missing path for ItemId(CrateId { name: "core" }, Id("0:3249:143")) (FromResidual)
missing path for ItemId(CrateId { name: "core" }, Id("0:7439:19789")) (Sum)
missing path for ItemId(CrateId { name: "core" }, Id("0:7476:142")) (FromIterator)
missing path for ItemId(CrateId { name: "core" }, Id("0:3255:14999")) (Residual)

@makspll

makspll commented Dec 1, 2023

Agree with @wfraser. I've been doing a big rewrite of my code gen crate, and took the time to properly crawl the output for things I am interested in. My biggest issue so far is cross-crate lookups: my use case requires analyzing this JSON from multiple crates and then trying to match them up, specifically traits and their impls across crates.

Everything works fine inside a single local crate, but IDs are not stable across multiple crates (sometimes they seem to line up, but not always). This means that if I have an Item for which I only have the .paths reference, then to look it up across crates I need a tree of all of the paths I crawled in all other crates, and I look the path up in that tree. If there is a match, I can use it.

This lets me generate public import paths in code gen. It would be nice to have some sort of stable ID which works across crates, maybe a combination of crate name and the ID.
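
A minimal sketch of that idea, under stated assumptions: `GlobalId` and `PathIndex` are hypothetical names, the index is a flat hash map keyed by full path rather than a literal tree, and paths are assumed to be unique across the crates being crawled.

```rust
use std::collections::HashMap;

/// Hypothetical cross-crate key: crate name plus the crate-local id.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct GlobalId {
    krate: String,
    id: String,
}

/// Maps full item paths to the crate and id that define them.
#[derive(Default)]
struct PathIndex {
    by_path: HashMap<Vec<String>, GlobalId>,
}

impl PathIndex {
    /// Records `(id, path)` pairs for items that `krate` defines itself.
    fn add_crate(&mut self, krate: &str, local_paths: &[(&str, &[&str])]) {
        for (id, path) in local_paths {
            let key: Vec<String> = path.iter().map(|s| s.to_string()).collect();
            let value = GlobalId { krate: krate.to_string(), id: id.to_string() };
            self.by_path.insert(key, value);
        }
    }

    /// Looks up a path seen in another crate's `.paths` map.
    fn lookup(&self, path: &[&str]) -> Option<&GlobalId> {
        let key: Vec<String> = path.iter().map(|s| s.to_string()).collect();
        self.by_path.get(&key)
    }
}
```

Two crates can reuse the same local id without clashing here, because the crate name is part of the key that callers pass around.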

@obi1kenobi
Contributor

I'm also not sure how it handles multiple versions of a crate in a dependency tree.

AFAIK there's currently no guaranteed way to resolve cross-crate imports across multiple versions of the same crate. This is why cargo-semver-checks doesn't support cross-crate analysis at the moment.

I believe @LukeMathWalker was looking into adding a way to reliably look up items across crate boundaries. At the moment I can't seem to find the link to the thread where that was discussed, so I'm not sure what the status on that is.

@LukeMathWalker
Contributor

My strategy to overcome this limitation relies on rust-lang/compiler-team#635 (or some variations of it) landing.
We haven't yet found consensus on how to approach it, and I haven't recently had the time to do further research and create momentum for it.

@aDotInTheVoid
Member

See the first section of #106697 for what I want to eventually do on this. (And more broadly, don't bet on anything in that issue getting finished this year; it's been a lot.)

@orium
Member

orium commented Jan 9, 2024

It seems that inner items, such as struct fields or enum variants, are not included in crate.paths. This makes it particularly annoying to map an item id to an item path. For context, I'm the author of cargo-rdme, and this tool needs to go from intralink to item path (via item id).

For instance, if we have this code:

struct MyStruct {
    my_field: u64,
}

we will get this json (some fields omitted for brevity):

{
  "index": {
    "0:5:1777": {
      "id": "0:5:1777",
      "name": "my_field",
      "inner": { "struct_field": { "primitive": "u64" } }
    },
    "0:4:1776": {
      "id": "0:4:1776",
      "name": "MyStruct",
      "inner": { "struct": { "kind": { "plain": { "fields": [ "0:5:1777" ] } } } }
    }
  },
  "paths": {
    "0:4:1776": {
      "path": [ "foo", "MyStruct"],
      "kind": "struct"
    }
  },
  "format_version": 28
}

Note that there is no path for foo::MyStruct::my_field: it only shows up in crate.index. To figure out the path of my_field (id 0:5:1777) I have to traverse the index, and find out which struct has inner.struct.kind.plain.fields with id 0:5:1777. Then I have to get the path of the struct I found, in this case crate.paths["0:4:1776"], and append the field name to get foo::MyStruct::my_field.
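
The traversal just described might look roughly like this. It is a sketch over a simplified in-memory model, not the real `rustdoc_json_types` API: `Item` and `field_path` are hypothetical, and `fields` stands in for the ids found under `inner.struct.kind.plain.fields`.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified index entry.
struct Item {
    name: String,
    /// Ids listed under inner.struct.kind.plain.fields; empty for non-structs.
    fields: Vec<String>,
}

/// Reconstructs the path of a struct field by scanning the whole index.
fn field_path(
    index: &HashMap<String, Item>,
    paths: &HashMap<String, Vec<String>>,
    field_id: &str,
) -> Option<Vec<String>> {
    // Find the struct whose field list mentions `field_id`.
    let (parent_id, _) = index
        .iter()
        .find(|(_, item)| item.fields.iter().any(|f| f == field_id))?;
    // Take the struct's path and append the field's own name.
    let mut path = paths.get(parent_id)?.clone();
    path.push(index.get(field_id)?.name.clone());
    Some(path)
}
```

With the data from the example above, `field_path(&index, &paths, "0:5:1777")` yields `foo`, `MyStruct`, `my_field`. Each call is a linear scan of the index, which is part of what makes the current format awkward for this use case.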

Ideally we would have the field in crate.paths, just like the struct. That would make the format very easy to use. Another way (which is slightly less convenient for my use case) is to have a parent field in the item of my_field in the index, pointing at the parent item (MyStruct):

{
  "index": {
    "0:5:1777": {
      "id": "0:5:1777",
      "name": "my_field",
      "parent": "0:4:1776"
    },
    "0:4:1776": {
      "id": "0:4:1776",
      "name": "MyStruct"
    }
  }
}

That would also make it easy to go from field to the corresponding struct and then get the path of that struct.
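
Until something like that exists, a consumer can approximate the proposed `parent` field by inverting the struct-to-fields relation once after loading. The sketch below assumes the field lists have already been extracted into a plain map; `parent_map` is a hypothetical helper, not part of any rustdoc crate.

```rust
use std::collections::HashMap;

/// Inverts "struct id -> field ids" into "field id -> struct id" in one pass.
fn parent_map(fields_by_struct: &HashMap<String, Vec<String>>) -> HashMap<&str, &str> {
    let mut parents = HashMap::new();
    for (struct_id, field_ids) in fields_by_struct {
        for field_id in field_ids {
            parents.insert(field_id.as_str(), struct_id.as_str());
        }
    }
    parents
}
```

After this one-time pass, going from `0:5:1777` to `0:4:1776` is a single hash lookup instead of a scan over the index.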

Of course, in an ideal world, we would have both: inner item paths in crate.paths as well as a parent field in the index.

@obi1kenobi
Contributor

👋 I'm the maintainer of cargo-semver-checks and I've had to solve some related problems. I'm not a maintainer of rustdoc or any "official" Rust components, so this is just my personal 2 cents.

Unfortunately, I think that crate.paths is much more likely to be removed than improved. It has a number of issues, for example:

  • The set of paths for a given item can sometimes be infinite -- I've seen it in multiple real-world crates!
  • It's hard to pick a "single canonical path" to show in crate.paths, since that path might be private / from a foreign crate / a type alias of another item / another edge case.
  • Some crates are just humongous and already as is generate 300MB+ of rustdoc JSON. Adding fields to crate.paths would bloat those files even more, and not by a small amount.

This is inherent complexity in the domain, and I don't think rustdoc JSON will be able to handle it for us.
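
The comment does not say how an item ends up with infinitely many paths. One way it can happen, offered here only as an assumed illustration, is a module that re-exports itself under another name:

```rust
pub mod a {
    // Re-exports the enclosing module under a new name, so `a::b` is `a`,
    // `a::b::b` is `a` again, and so on without end.
    pub use super::a as b;

    pub struct S;
}

/// All three paths in the return type name the same struct.
pub fn same_type() -> (a::S, a::b::S, a::b::b::S) {
    (a::S, a::S, a::S)
}
```

Every additional `::b` segment is another valid path to `S`, so there is no finite list of paths to write down for it.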

The way cargo-semver-checks addresses this is by using a query engine that internally handles the necessary name and import path resolution, and lets us use a declarative query language where we get to just take those things for granted. Here's a playground link where you can see how we can look up structs with their import paths and fields. Obviously, you can look up items and their parents by ID as well.

I've already implemented and thoroughly tested all this logic, and I'd be happy to help you get started with it if you're open to trying it out!

P.S.: cargo-rdme is cool, TIL about it 🤩 Thanks for building it!

@orium
Member

orium commented Jan 9, 2024

Thanks @obi1kenobi for your answer. I wasn't aware the number of paths could explode and even be infinite. I also wasn't aware of the Trustfall query language: it seems that it might be useful for what I'm doing (thanks for creating that!).

Given this information, I want to change my suggestion to something a bit more selfish (because it more directly solves my problem). Since an Item has links, it could provide more useful information. Currently it simply gives out the item id of each link. But since mapping an item id to a canonical path seems to be a hard problem, I would argue it is reasonable for the item path, as used by rustdoc to create links, to be included in Item::links. Basically, instead of having

{
  "index": {
    "0:0:1779": {
      "links": {
        "MyStruct::my_field": "0:5:1777"
      }
    }
  }
}

we could have

{
  "index": {
    "0:0:1779": {
      "links": {
        "MyStruct::my_field": { "id": "0:5:1777", "path": ["foo", "MyStruct", "my_field"] }
      }
    }
  }
}

(An even more selfish suggestion would be for the html link to be part of Item::links, e.g. foo/struct.MyStruct.html#structfield.my_field, but that's probably too specific to be part of the rustdoc json output.)
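
If the path were available, deriving a link of that shape would be mechanical. The sketch below only reproduces the pattern of the one example given above and assumes it generalizes to other struct fields; `struct_field_href` is a hypothetical helper, not something rustdoc provides.

```rust
/// Builds "<modules>/struct.<Name>.html#structfield.<field>" from a struct's
/// full path and a field name. Returns None for an empty path.
fn struct_field_href(struct_path: &[&str], field: &str) -> Option<String> {
    // The last component is the struct name; the rest are crate and modules.
    let (name, modules) = struct_path.split_last()?;
    let mut href = String::new();
    for module in modules {
        href.push_str(module);
        href.push('/');
    }
    href.push_str(&format!("struct.{name}.html#structfield.{field}"));
    Some(href)
}
```

For the example in this comment, `struct_field_href(&["foo", "MyStruct"], "my_field")` gives `foo/struct.MyStruct.html#structfield.my_field`.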

@obi1kenobi
Contributor

(Again speaking in a purely personal capacity.)

Both suggestions have the effect of increasing the file size in order to denormalize the format. It's a denormalization because this info was already available elsewhere, and it's being duplicated for convenience.

It is my perception that both size increases and denormalization come with strong negative externalities, and are unlikely to happen. For example:

  • There are ongoing conversations about docs.rs hosting rustdoc JSON files in addition to the rustdoc HTML — this would be obviously useful for both our tools, right? But the bigger the JSON file size, the harder that's going to be to pull off and the more it's going to cost.
  • Past "duplicated for convenience" rustdoc JSON features (e.g. item inlining) were the source of many frustrating bugs. Getting rid of those bugs (often by eliminating the duplication) was a substantial focus of rustdoc JSON work over the last year and a half or so. Putting denormalizations back in is likely to cause many new such bugs, which will be frustrating for both us as users and for the rustdoc team.

I think it's in our best interest as rustdoc users to make sure rustdoc maintainers can focus on high-impact changes that unlock new capabilities for us. We can add ease-of-use denormalizations one abstraction layer above rustdoc JSON itself, for example via a layer like Trustfall and its rustdoc adapter.

If the links connection to other items would be useful to have, it would take less than 10min to expose it via Trustfall and make it available for querying like so:

query {
  Crate {
    item {
      ... on Struct {
        struct_id: id @output
        struct_name: name @output

        link {
          linked_item_type: __typename @output
          linked_item_id: id @output
          linked_item_name: name @output
        }
      }
    }
  }
}

I'd be happy to add it if it would be useful to you.

@obi1kenobi
Contributor

PR adding the above to the Trustfall schema for rustdoc: obi1kenobi/trustfall-rustdoc-adapter#308

Lmk what you think!

@bdbai
Contributor

bdbai commented May 24, 2024

Referring to Zulip discussion Where to find keyword entries in JSON rustdoc, currently it seems not possible to include keywords in the JSON rustdoc. I am not sure how hard it would be to implement this feature, but hopefully it will be considered before we stabilize it.
