Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-generate IDs for all <dt> elements #5190

Merged
merged 11 commits into from
Feb 6, 2022
Merged

Auto-generate IDs for all <dt> elements #5190

merged 11 commits into from
Feb 6, 2022

Conversation

sideshowbarker
Copy link
Collaborator

@sideshowbarker sideshowbarker commented Jan 20, 2022

This change causes all dt elements in rendered HTML output from Markdown source to have ID values auto-generated based on their text content — in the same way all heading values in rendered HTML output have auto-generated IDs.

Related: mdn/content#12165 (comment) cc @wbamberg


This is an initial naïve proof-of-concept attempt at implementing ID generation for dt output — I’d be happy to further iterate on this patch as much as needed.

I’d also be happy to write up an additional patch for generating IDs in output from HTML sources (in addition to Markdown sources), if somebody can point me to where I’d need to patch — and if we think it’s actually important enough to have this for the HTML-source case, given that we have almost no HTML sources left in the repo, except in the mozilla/projects/nss and tools subtrees.

And I’m also happy to write accompanying tests.

This change causes all <dt> elements in rendered HTML output to have
ID values auto-generated based on their text content — in the same way
all heading values in rendered HTML output have auto-generated IDs.

Related: mdn/content#12165 (comment)
@schalkneethling
Copy link

Hey @sideshowbarker! Thanks for getting this started. We definitely want to do this and this already looks great. The only thing I would ask is to duplicate the function(slugify) you are using from KumaScript as close to the current code as possible, and use that instead of importing from KumaScript.

KumaScript is undergoing an overhaul and I would prefer we not depend on this function from KumaScript. Thanks a lot!

@wbamberg
Copy link
Collaborator

wbamberg commented Jan 27, 2022

Generating IDs for dt elements needs more than slugify, it also needs "uniquify" in case there are two dt elements with the same text content. We already do this for headings, I believe in injectSectionIDs().

In fact of course we have to modify the heading-id-generation as well, since heading IDs and dt IDs share the same namespace, so uniquifying has to work across both sets:

<h2>A thing</h2>

<dl>
  <dt>A thing</dt>
  <dd>...</dd>
</dl>

(This also means that generating IDs for dt elements, depending how it's done, could modify existing heading IDs and potentially break live samples. But we could presumably avoid this in the implementation, by having dt IDs always "defer to" heading IDs. But also, if the implementation is more complicated that way, we could probably figure out whether this would be a real problem and fix the live sample calls if it is).

I'm really unfamiliar with the Yari code (outside KumaScript) but it looks as if section IDs get injected in renderFromURL(), where we're getting the raw HTML from the Markdown using m2h, then calling injectSectionIDs() on it. It seems weird for this core rendering work to be happening in KumaScript, but 🤷.

So I'm not sure duplicating slugify is the way to go. It might be better to adapt injectSectionIDs() to work for dt elements as well as headings.

@sideshowbarker
Copy link
Collaborator Author

Generating IDs for dt elements needs more than slugify, it also needs "uniquify" in case there are two dt elements with the same text content. We already do this for headings, I believe in injectSectionIDs().

… It might be better to adapt injectSectionIDs() to work for dt elements as well as headings.

Indeed. I turns out to be better in pretty much every possible way. So I’ve now changed the patch here to add the handling in injectSectionIDs().

In hindsight, I should have done my homework right and figured out myself that injectSectionIDs() was the place to go, but I’m super appreciative you did — that you took the time to investigate and identify it as the proper place to add the handling.

(This also means that generating IDs for dt elements, depending how it's done, could modify existing heading IDs and potentially break live samples. But we could presumably avoid this in the implementation, by having dt IDs always "defer to" heading IDs.

My initial rewrite here doesn’t yet go so far as doing that. I’m happy to try to update it to do that — if we agree that we do in fact really want to make that be the behavior.

But also, if the implementation is more complicated that way

It’d be more complicated for sure. How much more complicated, I don’t yet — because I haven’t looked at the code in detail.

But the issue with it is that the IDs are generated in document order, and after the the current code generates an ID, I think it doesn’t keep state information about the association between that ID and the element it belongs to. So, if it generates an ID for some dt in some early part of the document, and then later on in the document, it finds a heading with the same text that dt has, then I guess it’ll need to find that dt (by the ID value), and then do a DOM operation to change the ID of that dt to make that the one with the generated number appended — and so that it can use the original ID value for the heading.

If the existing code there has access to the entire DOM for the document at that point, I think it’d be relatively uncomplicated to implement what I outlined above. But if that code doesn’t actually have access to the DOM for the whole document, then it would not be able to look up that dt element by its ID, and would not be able to do any DOM operations at all on that dt. So, figuring out how make something that works in that case would be relatively complicated.

we could probably figure out whether this would be a real problem and fix the live sample calls if it is).

Yeah, it might be easier to first test for any breakage of existing live-sample IDs.

@schalkneethling schalkneethling removed their assignment Jan 28, 2022
@schalkneethling
Copy link

@wbamberg, looks like you and @sideshowbarker has pretty much got this covered. I am going to approve the PR but, I wil leave merging to you when you both feel this is ready.

@wbamberg
Copy link
Collaborator

Wow, thanks Mike for coming up with a fix! I will take a look.

(This also means that generating IDs for dt elements, depending how it's done, could modify existing heading IDs and potentially break live samples. But we could presumably avoid this in the implementation, by having dt IDs always "defer to" heading IDs.

My initial rewrite here doesn’t yet go so far as doing that. I’m happy to try to update it to do that — if we agree that we do in fact really want to make that be the behavior.

But also, if the implementation is more complicated that way

It’d be more complicated for sure. How much more complicated, I don’t yet — because I haven’t looked at the code in detail.

But the issue with it is that the IDs are generated in document order, and after the the current code generates an ID, I think it doesn’t keep state information about the association between that ID and the element it belongs to. So, if it generates an ID for some dt in some early part of the document, and then later on in the document, it finds a heading with the same text that dt has, then I guess it’ll need to find that dt (by the ID value), and then do a DOM operation to change the ID of that dt to make that the one with the generated number appended — and so that it can use the original ID value for the heading.

If the existing code there has access to the entire DOM for the document at that point, I think it’d be relatively uncomplicated to implement what I outlined above. But if that code doesn’t actually have access to the DOM for the whole document, then it would not be able to look up that dt element by its ID, and would not be able to do any DOM operations at all on that dt. So, figuring out how make something that works in that case would be relatively complicated.

we could probably figure out whether this would be a real problem and fix the live sample calls if it is).

Yeah, it might be easier to first test for any breakage of existing live-sample IDs.

Having thought about this a bit more, it's unlikely to be a real problem. As far as I can tell it would need a page where:

  • heading and dt text clash AND
  • they clash in the right order for the heading text to change AND
  • the page contains a live sample using that particular heading

Given also that most live sample headings are going to contain spaces (since they are a description of an example) and dt IDs aren't, (by your algorithm in this patch) it's even less likely to happen.

@sideshowbarker
Copy link
Collaborator Author

Having thought about this a bit more, it's unlikely to be a real problem. As far as I can tell it would need a page where:

* heading and dt text clash AND
* they clash in the right order for the heading text to change AND
* the page contains a live sample using that particular heading

Given also that most live sample headings are going to contain spaces (since they are a description of an example) and dt IDs aren't, (by your algorithm in this patch) it's even less likely to happen.

OK, yeah — hadn’t thought about it in those terms, but indeed that all makes sense.

So it seems like the patch in this PR may be good to go — OK with you if I go ahead and merge it?

@wbamberg
Copy link
Collaborator

Having thought about this a bit more, it's unlikely to be a real problem. As far as I can tell it would need a page where:

* heading and dt text clash AND
* they clash in the right order for the heading text to change AND
* the page contains a live sample using that particular heading

Given also that most live sample headings are going to contain spaces (since they are a description of an example) and dt IDs aren't, (by your algorithm in this patch) it's even less likely to happen.

OK, yeah — hadn’t thought about it in those terms, but indeed that all makes sense.

So it seems like the patch in this PR may be good to go — OK with you if I go ahead and merge it?

Please give me a bit of time to look at it properly first. Sorry to be slow, but I want to be as sure as I can be, and I'm not at all familiar with it.

Copy link
Collaborator

@wbamberg wbamberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried this out and overall it seems to work pretty great. I had a couple of comments - the uniquify at least looks worth addressing, but the other one might not have a better solution.

I would love to have review from someone familiar with this code though, but I'm not sure who that would be.

kumascript/src/api/util.js Outdated Show resolved Hide resolved
kumascript/src/api/util.js Outdated Show resolved Hide resolved
@wbamberg
Copy link
Collaborator

wbamberg commented Feb 2, 2022

This seems to successfully uniquify headings and dt elements:

## I'm a thing

- i'm a thing
  - : also a thing
- I'm a thing
  - : also a thing
- Im a thing
  - : also ...

## i'm a thing

=>

<h2 id="im_a_thing"><a href="#im_a_thing" title="Permalink to I'm a thing">I'm a thing</a></h2>

<div>
<dl>
  <dt id="im_a_thing_2">i'm a thing</dt>
  <dd>
    <p>also a thing</p>
  </dd>
  <dt id="im_a_thing_3">I'm a thing</dt>
  <dd>
    <p>also a thing</p>
  </dd>
  <dt id="im_a_thing_4">Im a thing</dt>
  <dd>
    <p>also ...</p>
  </dd>
</dl>
</div>

<h2 id="im_a_thing_5"><a href="#im_a_thing_5" title="Permalink to i'm a thing">i'm a thing</a></h2>

Also we're getting much nicer ID values for dt elements when they contain internal markup like "Read only".

But, this regresses sidebars:

Screen Shot 2022-02-02 at 9 28 02 AM

I believe this is because it changes the case of the ID in <section id="Quick_links">, which is needed for the CSS to recognize that this should get sidebar styling.

It's pretty hairy stuff :(.

@sideshowbarker
Copy link
Collaborator Author

But, this regresses sidebars:

I believe this is because it changes the case of the ID in <section id="Quick_links">, which is needed for the CSS to recognize that this should get sidebar styling.

Yup — thanks for catching that. I’ve pushed an update with a fix.

Copy link

@schalkneethling schalkneethling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sideshowbarker for the PR and @wbamberg for the great review. 🚢

@schalkneethling
Copy link

@sideshowbarker I see the following error with the build:

/home/runner/work/yari/yari/client/build/en-us/docs/web/duplicate_ids
Error: MacroLiveSampleError within /home/runner/work/yari/yari/testing/content/files/en-us/web/embeddable/index.html, line 66 column 6 (unable to find any live code samples for "Meter" within /en-US/docs/Web/Embeddable)
    at buildDocument (/home/runner/work/yari/yari/build/index.js:346:17)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async buildDocumentInteractive (/home/runner/work/yari/yari/build/cli.js:50:29)
    at async buildDocuments (/home/runner/work/yari/yari/build/cli.js:130:9)
    at async Se._action (/home/runner/work/yari/yari/build/cli.js:343:60)
    at async Se.run (/home/runner/work/yari/yari/node_modules/@caporal/core/dist/index.js:1:27579)
    at async Te._run (/home/runner/work/yari/yari/node_modules/@caporal/core/dist/index.js:1:32257)

error: MacroLiveSampleError within /home/runner/work/yari/yari/testing/content/files/en-us/web/embeddable/index.html, line 66 column 6 (unable to find any live code samples for "Meter" within /en-US/docs/Web/Embeddable)

...includes fixing the lowercasing we added (the code added earlier
wasn’t actually assigning the lowercased string back to the id variable.)
@sideshowbarker
Copy link
Collaborator Author

error: MacroLiveSampleError within /home/runner/work/yari/yari/testing/content/files/en-us/web/embeddable/index.html, line 66 column 6 (unable to find any live code samples for "Meter" within /en-US/docs/Web/Embeddable)

Tested a fix and just now pushed an update with the change

sideshowbarker added a commit to mdn/content that referenced this pull request Feb 5, 2022
This changes takes some headings for examples that have live samples and
that are of this form:

> ### multiply

... and changes them to this form:

> ### Example using "multiply"

That ensures those headings get more-descriptive and more-helpful
anchors/fragments of the form foo#example_using_multiply — and it
also ensures that the live-sample behavior for those won’t break after
mdn/yari#5190 lands, and all dt elements have
generated IDs (in which case that foo#multiply anchor will take
readers to the actual definition for the `multiply` term, rather than
to an example).
sideshowbarker added a commit to mdn/content that referenced this pull request Feb 5, 2022
This changes takes some headings for examples that have live samples and
that are of this form:

> ### multiply

... and changes them to this form:

> ### Example using "multiply"

That ensures those headings get more-descriptive and more-helpful
anchors/fragments of the form foo#example_using_multiply — and it
also ensures that the live-sample behavior for those won’t break after
mdn/yari#5190 lands, and all dt elements have
generated IDs (in which case that foo#multiply anchor will take
readers to the actual definition for the `multiply` term, rather than
to an example).
estelle pushed a commit to mdn/content that referenced this pull request Feb 5, 2022
This changes takes some headings for examples that have live samples and
that are of this form:

> ### multiply

... and changes them to this form:

> ### Example using "multiply"

That ensures those headings get more-descriptive and more-helpful
anchors/fragments of the form foo#example_using_multiply — and it
also ensures that the live-sample behavior for those won’t break after
mdn/yari#5190 lands, and all dt elements have
generated IDs (in which case that foo#multiply anchor will take
readers to the actual definition for the `multiply` term, rather than
to an example).
testing/content/files/en-us/learn/css/css_layout/introduction/flex
otherwise fails without this change, with the following error:

> error: MacroLiveSampleError within
> /home/runner/work/yari/yari/testing/content/files/en-us/learn/css/css_layout/introduction/flex/index.html,
> line 20 column 4 (unable to find any live code samples for "flex_1"
> within /en-US/docs/Learn/CSS/CSS_layout/Introduction/Flex)

And we need make sure not to touch the IDs for class=bc-data divs,
because the system does case-sensitive matching on those.
@sideshowbarker
Copy link
Collaborator Author

OK, after making a few other fixes, CI is all green now ✅

Copy link

@schalkneethling schalkneethling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @sideshowbarker for sticking this one out. Much appreciated. 🎉

@schalkneethling schalkneethling merged commit f1aedef into mdn:main Feb 6, 2022
@sideshowbarker sideshowbarker deleted the sideshowbarker/dt-elements-add-generated-id branch February 6, 2022 12:57
@sideshowbarker
Copy link
Collaborator Author

Thanks, @sideshowbarker for sticking this one out. Much appreciated. 🎉

Thanks for you help and guidance — and @wbamberg too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves an existing feature.
Projects
Development

Successfully merging this pull request may close these issues.

3 participants