Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON backend experimental impl #75114

Closed
wants to merge 20 commits into from
Closed

Conversation

P1n3appl3
Copy link
Contributor

This is for preliminary code review of the in progress implementation of RFC 2963.

@rust-highfive
Copy link
Collaborator

r? @GuillaumeGomez

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 3, 2020
@ollie27 ollie27 added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Aug 3, 2020
Copy link
Member

@tmandry tmandry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good! Left some comments.

src/librustdoc/json/mod.rs Show resolved Hide resolved
src/librustdoc/json/mod.rs Show resolved Hide resolved
Comment on lines 95 to 154
match item.inner.clone() {
StructItem(s) => s.fields.into_iter().for_each(|i| self.insert(i, cache)),
UnionItem(u) => u.fields.into_iter().for_each(|i| self.insert(i, cache)),
VariantItem(clean::Variant { kind: clean::VariantKind::Struct(v) }) => {
v.fields.into_iter().for_each(|i| self.insert(i, cache));
}
EnumItem(e) => e.variants.into_iter().for_each(|i| self.item(i, cache).unwrap()),
TraitItem(t) => t.items.into_iter().for_each(|i| self.insert(i, cache)),
ImplItem(i) => i.items.into_iter().for_each(|i| self.insert(i, cache)),
_ => {}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have this knowledge inside a method on ItemEnum itself, either returning an iterator of inner items or taking a closure.

Also, a problem with using a wildcard here is that a new kind of item with children could be added, and this match could be forgotten. So it's actually safer to list everything out.

fn after_krate(&mut self, krate: &clean::Crate, cache: &Cache) -> Result<(), Error> {
debug!("Done with crate");
let mut index = (*self.index).clone().into_inner();
let trait_items = cache.traits.iter().filter_map(|(id, trait_item)| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be extracted to a function instead of happening directly in after_krate

#[serde(rename = "trait")]
pub trait_: Option<Type>,
#[serde(rename = "for")]
pub for_: Type,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we'd ever want to create an index of Types so we could de-duplicate them. This could be an open question on the RFC to consider before stabilizing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some sense they're already partially de-duplicated because they refer to IDs for the structs/enums/traits instead of containing their definitions. It doesn't seem to me that another level of indirection is worth it in general, but it would be nice for the purpose of easily comparing deeply nested types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think experience will tell if this is worth it or not, but I suspect you're right.

Copy link

@betamos betamos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CL is great, essentially self-explanatory, thanks to the prior refactor. My comments are mainly minor docstrings and naming. Looking forward to the tests.

src/librustdoc/json/types.rs Show resolved Hide resolved

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct Item {
/// This can be used as a key to the `external_crates` map of [`Crate`][] to see which crate
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be? Perhaps just state that the crate_num here corresponds to Crate.id?

Also if not too late, perhaps using crate_id is a better name unless they number and id are different.

src/librustdoc/json/types.rs Outdated Show resolved Hide resolved
src/librustdoc/json/types.rs Outdated Show resolved Hide resolved
src/librustdoc/json/types.rs Outdated Show resolved Hide resolved

#[serde(rename_all = "snake_case")]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum GenericArgs {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below it would be helpful with the same level of docs as in the first part of the file. They are very helpful.

ResolvedPath {
name: String,
id: Id,
args: Box<Option<GenericArgs>>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Swap wrapping? Also, curious about the boxing, what's the purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's purely to satisfy the compiler because otherwise the Type enum is self referential due to GenericArgs containing other types.

pub blanket_impl: Option<Type>,
}

// TODO: this is currently broken because imports have the same ID as the module that contains
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way? Do imports not show up at all, is there a race? Perhaps imports could be omitted until this is resolved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep there was a race. Still working on fixing it, but since re-exports aren't going to be part of the first PR it's not blocking for a little bit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has since been removed. Is this still relevant?

src/librustdoc/json/conversions.rs Show resolved Hide resolved
@@ -1405,6 +1405,11 @@ impl Path {
pub fn last_name(&self) -> &str {
self.segments.last().expect("segments were empty").name.as_str()
}

pub fn whole_name(&self) -> String {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this contextually dependent? What's the difference towards a fully qualified path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be contextually dependent, it is just the fully qualified path.

Comment on lines 16 to 23
# local IDs have to be in `index`, external ones can sometimes be in `index` but otherwise have
# to be in `paths`
def valid_id(item_id):
return item_id in crate["index"] or item_id[0] != "0" and item_id in crate["paths"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really the bare minimum level of testing for the output. We just check that the blob doesn't contain any invalid IDs (ones that aren't present in index or paths. This doesn't handle the case where neither the ID nor the item itself end up in the output, or really anything about the structure of the output itself.

Originally I'd planned to write tests that simply compared the output to a known-good version for some minimal examples, but without writing a fuzzy differ to only look at the fields we care about that turns out to be quite difficult due to the large output size and potentially brittle to small changes in the compiler and stdlib.

@jyn514
Copy link
Member

jyn514 commented Aug 14, 2020

IMO I would rather merge this sooner rather than later and do the cleanup in later PRs. That would let people test things out on nightly - I expect any bugs in the implementation not to have too much impact on the design itself.

@P1n3appl3 the CI failures are saying to change TODO to FIXME: https://github.com/rust-lang/rust/pull/75114/files#diff-94e97fcd5e773cae8edd0781d7bd5411R443 seems like it could be a FIXME but https://github.com/rust-lang/rust/pull/75114/files#diff-94e97fcd5e773cae8edd0781d7bd5411R104 should be implemented before we merge I think.

I'm ok with leaving macros for a follow-up PR.

@tmandry
Copy link
Member

tmandry commented Aug 14, 2020

@GuillaumeGomez ^ what do you think?

@GuillaumeGomez
Copy link
Member

The code looks good, so I think we're getting close to a merge. However, I agree with @jyn514: please implement https://github.com/rust-lang/rust/pull/75114/files#diff-94e97fcd5e773cae8edd0781d7bd5411R104 before. :)

@GuillaumeGomez
Copy link
Member

Just thought: we need tests too. Probably a new test suite (I can add that if needed) to be sure that every need PR on this feature won't bring regressions alongside.

@bors
Copy link
Contributor

bors commented Aug 24, 2020

☔ The latest upstream changes (presumably #74489) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514
Copy link
Member

jyn514 commented Sep 1, 2020

To avoid having this perpetually in limbo, can we delay the test suite until stabilization? The best test suite will be people using it ;) and it's going to be at least as much work as this PR if not more.

@P1n3appl3 FYI @nmccarty has volunteered to finish up the implementation on this PR if you're busy with work/school/life.

@P1n3appl3
Copy link
Contributor Author

Sorry I should have updated here: I've been on vacation visiting family for the past 2 weeks and was mostly offline but I've got some time now before school ramps up so I'm planning to finish/fix up the implementation.

@robinkrahl has been testing out the backend in its current state and has found several bugs that I'm still working on (doc-hidden wasn't being respected, and there are issues where serde incorrectly deserializes fields due to using untagged enums). There are some smaller things that I can just open issues for once this is merged, but I want to get the big ones fixed so it's usable on release.

I also have been trying to write up tests and I agree that it's best left for another PR (especially given the size of this one already). I would love some help from @nmccarty on coming up with those as well as fixing the known bugs. I'm "Pineapple" on the rust discord or just email me and I can show you the remaining TODOs.

@GuillaumeGomez
Copy link
Member

To avoid having this perpetually in limbo, can we delay the test suite until stabilization?

I don't ask for an extensive list of tests, just at least one. Always useful to at least avoid regressions. Then more will be added once new bugs are fixed and so on.

@jyn514 jyn514 added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 15, 2020
@tmandry
Copy link
Member

tmandry commented Oct 14, 2020

The code in this PR has several tests and I think it's fine to merge as-is, once conflicts are resolved and failing tests are fixed.

@GuillaumeGomez What do you think?

@P1n3appl3 I think this is basically done. Are you able to push this over the finish line? I'd also be happy to rebase and fix it up.

@P1n3appl3
Copy link
Contributor Author

P1n3appl3 commented Oct 14, 2020

Yup @jyn514 already offered to help clean up some WIP bugfixes and minor changes that I haven't pushed yet so he can do the rebase i think. After that it should be good to merge.

Comment on lines +65 to +67
/// Whether this item is meant to be omitted from the generated documentation due to `#doc(hidden)`,
/// because it is private, or because it was inlined.
pub stripped: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer to omit stripped items altogether. I won't block on it because I really want to see this in nightly 😆 but it should be removed before stabilization; rustdoc should not have a stable way to introspect private items from other crates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--document-private-items is really useful for generating internal docs, and I think this feature should support it, even though it technically provides a way to do what you say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass --document-private-items, the items won't be stripped (and in fact you'll lose the ability to distinguish between private and public items). This is more like 'unconditionally document private items'.

@jyn514
Copy link
Member

jyn514 commented Oct 15, 2020

 Traceback (most recent call last):
  File "compare.py", line 106, in <module>
    main(sys.argv[1], sys.argv[2])
  File "compare.py", line 98, in main
    check_subset(expected_main, actual_main)
  File "compare.py", line 65, in check_subset
    _check_subset(expected_main["root"], actual_main["root"], [])
  File "compare.py", line 62, in _check_subset
    _check_subset(expected_index.get(expected, {}), actual_index.get(actual, {}), trace)
  File "compare.py", line 46, in _check_subset
    new_trace = trace.copy()
AttributeError: 'list' object has no attribute 'copy'

@jyn514
Copy link
Member

jyn514 commented Oct 15, 2020

I think you want copy.deepcopy(trace).

src/librustdoc/json/mod.rs Outdated Show resolved Hide resolved
This way it doesn't have to build rustc twice. Eventually it should
probably get its own suite, like rustdoc-js, but that can be fixed in a
follow-up.
| PrimitiveItem(_)
| AssocConstItem(_, _)
| AssocTypeItem(_, _)
| StrippedItem(_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct? StrippedItem always has an inner item: https://doc.rust-lang.org/nightly/nightly-rustc/rustdoc/clean/types/enum.ItemEnum.html#variant.StrippedItem

src/librustdoc/json/conversions.rs Show resolved Hide resolved
Comment on lines 76 to 78
fn from(deprecation: clean::Deprecation) -> Self {
Deprecation { since: deprecation.since, note: deprecation.note }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this exhaustive to avoid missing new fields.

Suggested change
fn from(deprecation: clean::Deprecation) -> Self {
Deprecation { since: deprecation.since, note: deprecation.note }
}
fn from(deprecation: clean::Deprecation) -> Self {
let clean::Deprecation { since, note } = deprecation;
Deprecation { since, note }
}

Inherited => Visibility::Default,
Crate => Visibility::Crate,
Restricted(did, path) => {
Visibility::Restricted { parent: did.into(), path: path.whole_name() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to have both the path and the ID? I think the path can be looked up by the id.

/// For the most part items are private by default. The exceptions are associated items of
/// public traits and variants of public enums.
Default,
Crate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does pub(crate) need to be treated specially? It's the same as pub(in crate), i.e. a restricted path.

pub blanket_impl: Option<Type>,
}

// TODO: this is currently broken because imports have the same ID as the module that contains
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has since been removed. Is this still relevant?

Copy link
Member

@jyn514 jyn514 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok if I change some of the output to differ from the eRFC? Some of this info really shouldn't be duplicated.

@@ -2314,7 +2314,7 @@ impl Clean<Vec<Item>> for doctree::Import<'_> {
name: None,
attrs: self.attrs.clean(cx),
source: self.span.clean(cx),
def_id: DefId::local(CRATE_DEF_INDEX),
def_id: cx.tcx.hir().local_def_id(self.id).to_def_id(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Comment on lines +200 to +224
impl From<clean::Struct> for Struct {
fn from(struct_: clean::Struct) -> Self {
let clean::Struct { struct_type, generics, fields, fields_stripped } = struct_;
Struct {
struct_type: struct_type.into(),
generics: generics.into(),
fields_stripped,
fields: fields.into_iter().map(|i| i.def_id.into()).collect(),
impls: Vec::new(), // Added in JsonRenderer::item
}
}
}

impl From<clean::Union> for Struct {
fn from(struct_: clean::Union) -> Self {
let clean::Union { struct_type, generics, fields, fields_stripped } = struct_;
Struct {
struct_type: struct_type.into(),
generics: generics.into(),
fields_stripped,
fields: fields.into_iter().map(|i| i.def_id.into()).collect(),
impls: Vec::new(), // Added in JsonRenderer::item
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These impls are exactly the same, which means that the JSON output doesn't distinguish between structs and unions. Is that intentional?

.iter()
.map(|(k, v)| {
(
k.as_u32(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having numeric keys seems annoying. Would it make more sense to instead have a list that you look up by index?

Concretely, currently the output looks like this:

{
    "1": {
      "name": "std",
      "html_root_url": "https://doc.rust-lang.org/nightly/"
    },
    "2": {
      "name": "core",
      "html_root_url": "https://doc.rust-lang.org/nightly/"
    },
  ...
}

and I'd prefer it to look like

[
    {
      "name": "std",
      "html_root_url": "https://doc.rust-lang.org/nightly/"
    },
   {
      "name": "core",
      "html_root_url": "https://doc.rust-lang.org/nightly/"
    }
  ...
]

src/librustdoc/json/mod.rs Outdated Show resolved Hide resolved
let mut index = (*self.index).clone().into_inner();
index.extend(self.get_trait_items(cache));
let output = types::Crate {
root: types::Id(String::from("0:0")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the point of this field?

},
source: source.into(),
visibility: visibility.into(),
docs: attrs.collapsed_doc_value().unwrap_or_default(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this instead be null if there are no docs?

let output = types::Crate {
root: types::Id(String::from("0:0")),
version: krate.version.clone(),
includes_private: cache.document_private,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be included? The user will know if they passed --document-private-items.

pub struct JsonRenderer {
/// A mapping of IDs that contains all local items for this crate which gets output as a top
/// level field of the JSON blob.
index: Rc<RefCell<FxHashMap<types::Id, types::Item>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a map makes it annoying to work with, and introduces duplicate information: now the ID is both a key and a field on the object. Instead, I think it makes sense to make it an array, and you have to look up the id in the object if you want it.

}

#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct Id(pub String);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this contains both the crate number and the item id. But the crate number is already part of Item. Do we need both? The Id is used to globally identify the item, so I think a better idea is to use a { crate_number, id } object, then remove the crate number from Item.

@jyn514
Copy link
Member

jyn514 commented Nov 8, 2020

Also, it might be better to name this crate_version to avoid confusing with format_version.

Doesn't need to be fixed here, but 'source' is ambiguous, it could also be the code itself. I would rather call it span, like rustc (and I'd like to make the same change in rustdoc).

I've implemented these changes.

Comment on lines +107 to +110
docs: Default::default(),
links: Default::default(),
attrs: Default::default(),
deprecation: Default::default(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right: it shows Drop as having no docs at all.

@jyn514
Copy link
Member

jyn514 commented Nov 8, 2020

It turns out that I broke the tests by moving them to rustdoc/, because now the makefile is being ignored and it just compiles structs.rs successfully instead of ever running the JSON backend. Rather than move them back to run-make-fulldeps, I think it makes more sense to have a full src/test/rustdoc-json folder which allows only building stage1 and not having to add rustdoc support to run-make.

@bors
Copy link
Contributor

bors commented Nov 15, 2020

☔ The latest upstream changes (presumably #79070) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@jyn514
Copy link
Member

jyn514 commented Nov 30, 2020

Closing this in favor of #79539; I still want to address the points I brought up, but we can do that after the initial impl lands on nightly.

@jyn514 jyn514 closed this Nov 30, 2020
m-ou-se added a commit to m-ou-se/rust that referenced this pull request Dec 1, 2020
…illaumeGomez

Rustdoc: JSON backend experimental impl, with new tests.

Based on rust-lang#75114 by `@P1n3appl3`

The first commit is all of rust-lang#75114, but squased to 1 commit, as that was much easier to rebase onto master.

The git history is a mess, but I think I'll edit it after review, so it's obvious whats new.

## Still to do

- [ ] Update docs.
- [ ] Add bless option to tests.
- [ ] Add test option for multiple files in same crate.
- [ ] Decide if the tests should check for json to be equal or subset.
- [ ] Go through the rest of the review for the original pr. (This is open because the test system is done(ish), but stuff like [not using a hashmap](rust-lang#75114 (comment)) and [using `CRATE_DEF_INDEX` ](rust-lang#75114 (comment)) hasn't)

I'm also sure how many of these we need to do before landing on nightly, as it would be nice to get this in tree, so it isn't effected by churn like rust-lang#79125, rust-lang#79041, rust-lang#79061

r? `@jyn514`
bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 2, 2020
Rustdoc: JSON backend experimental impl, with new tests.

Based on rust-lang#75114 by `@P1n3appl3`

The first commit is all of rust-lang#75114, but squased to 1 commit, as that was much easier to rebase onto master.

The git history is a mess, but I think I'll edit it after review, so it's obvious whats new.

## Still to do

- [ ] Update docs.
- [ ] Add bless option to tests.
- [ ] Add test option for multiple files in same crate.
- [ ] Decide if the tests should check for json to be equal or subset.
- [ ] Go through the rest of the review for the original pr. (This is open because the test system is done(ish), but stuff like [not using a hashmap](rust-lang#75114 (comment)) and [using `CRATE_DEF_INDEX` ](rust-lang#75114 (comment)) hasn't)

I'm also sure how many of these we need to do before landing on nightly, as it would be nice to get this in tree, so it isn't effected by churn like rust-lang#79125, rust-lang#79041, rust-lang#79061

r? `@jyn514`
@aDotInTheVoid
Copy link
Member

@rustbot modify labels: +A-rustdoc-json

@rustbot rustbot added the A-rustdoc-json Area: Rustdoc JSON backend label Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants