Include H3-H6 as individual search results and highlight the search term in the result. #624

samuelcolvin · 2015-06-09T17:23:02Z

If I search for a subsection heading (eg. h3) I would expect the subsection I searched for to appear in the search results, instead only it's parent h1 seems to appear.

Perhaps most easily demonstrated with an example: on help.tutorcruncher.com If I search for "credit note" (which is the text of an h3 heading: help.tutorcruncher.com/accounting/#credit-notes) I would expect to see "credit note" in the results. I guess as a heading would make most sense, but highlighted in the text of a result would be ok too.

Instead the parent h1 "Invoices" shows in results without "credit note" shown anywhere.

Perhaps we need to alter search.py > create_entry_for_section to recursively work right down the hierarchy of sections?

The text was updated successfully, but these errors were encountered:

d0ugal · 2015-06-09T21:16:50Z

It only includes h1 and h2 by design. This is the just like we only include h1 and h2 in the table of contents and thus made the same choice here to match.

I don't think there would be a problem with making the search more find grained, I can't really think of anything. I wrote this particular section of the search code in December I think, so my memory of the reasoning behind it is a little rusty :)

d0ugal · 2015-06-09T21:18:21Z

Would we want to include all of them from h1 - h6?

samuelcolvin · 2015-06-09T21:22:53Z

I don't see why not.

But the other option which might make more sense is to improve the sample shown in each search result. It would be great if the sample was adjusted to include the search result rather than just being the first paragraph of the section with the result in (I assume?), I guess it would also be useful if it highlighted the search result by making it bold.

Does that make sense?

d0ugal · 2015-06-09T21:28:30Z

Yeah, that makes sense. I feel like we should probably do both.

Adding H3-H6 is really trivial. So maybe we should do that first.

Improving the display of the results is somewhat trickier. I guess we need to update this code and probably find the sentence with the search term. This could be tricky if the word was only found due to tokenisation. Then highlight the search term in that sentence.

d0ugal · 2015-06-09T21:29:31Z

I've marked this for 0.15 to at least do the H3-H6 change. I'm not sure I have the capacity to improve the search results too - so, some help there would be good if anyone is interested.

samuelcolvin · 2015-06-09T21:33:21Z

looks very doable, and I'm keen to help but honestly I'm pretty busy right now, I'm not going to get round to it in June.

d0ugal · 2015-06-09T21:36:04Z

Sure :)

ProfYaffle · 2015-06-10T08:47:10Z

@d0ugal - "This is the just like we only include h1 and h2 in the table of contents" - is that true? I thought I'd observed more "we only include the first two levels of heading, whatever they are".

Case in point: simple hierarchy of headings:

... and then comment out (or just break...) one level in the middle:

d0ugal · 2015-06-10T08:50:35Z

We take the first two levels as Python-Markdown interprets them when it creates a table of contents. It is a bit wonky, and not an ideal approach. I'd be happy to see a better on me :)

samuelcolvin · 2015-06-10T08:52:44Z

I think the second case would be good in some situations, but the obvious solution is to make this a config variable so the default behaviour stays the same.

Do we generate search headings from the same object as the main toc/scroll-spy? I can see that people (me for a start) are going to want one setting heading depth in search and another for everything else.

ProfYaffle · 2015-06-10T08:55:26Z

@d0ugal, that's fine, I didn't know from your comment whether it was actually meant to be h1 and h2, and thus not working as intended. It's easy enough to live with the current behaviour.

The conversation above is probably related to roll-up headings as well, as limiting the heading styles in the ToC (or using more levels in your docs that aren't included) is just another way to limit ToC overload. I have less of an opinion on the search angle.

d0ugal · 2015-06-10T08:57:44Z

@samuelcolvin

Do we generate search headings from the same object as the main toc/scroll-spy? I can see that people (me for a start) are going to want one setting heading depth in search and another for everything else.

No, this is done in a different area as the search index needs the full document contents and not just the headers.

d0ugal · 2015-06-10T09:00:29Z

@ProfYaffle

I didn't know from your comment whether it was actually meant to be h1 and h2, and thus not working as intended. It's easy enough to live with the current behaviour.

I don't know either :) That bit of code predates my involvement with MkDocs. I do think we should revisit the TOC generation at some point. I think @waylan has stated that the current way isn't ideal. Basically, Python-Markdown creates the table of contents in HTML. We then parse that to create an object representing the table of contents for a page. It is a bit of a hack :)

waylan · 2015-06-10T14:31:24Z

Actually, the bits of MkDocs that I didn't like (at least regarding the TOC), was removed/fixed in a6e6eb1. As far as I'm concerned the TOC handling code in MkDocs is fine. However, that doesn't mean there isn't room for improvement.

Under the hood, Python-Markdown extracts the headers into a nested list of tokens (here) which it then converts into an HTML list, serializes as a string, and assigns to the toc attribute of the Markdown class. I suppose, ideally, the nested list (before converting to HTML) would be more useful to MkDocs, but it doesn't really make sense from Python-Markdown's perspective to preserve it (the extension's primary purpose is to insert the TOC into the document in HTML form; providing the TOC outside of that context is just a bonus). Granted, it is a little inefficient for MkDocs to have to parse that HTML back into a list. One option would be for MkDocs to not use the TOC extension, but to simply extract the headers itself and then it could do whatever worked best for MkDocs.

Either way, how should skipped/missing levels be handled? Consider this example:

# Header 1-1

### Header 3-1

### Header 3-2

If we require every level (even skipped levels to be part of the TOC list, how would that work?

* Header 1-1
    * [what goes here?]
        * Header 3-1
        * Header 3-2

I don't see a clear answer to what goes in the level 2 slot. It could be left blank, but a blank menu item would be bad, so that doesn't work. And if it is not blank, what would go there? Therefore, it is left out completely. The really tricky part would be if we had this instead:

# Header 1-1

### Header 3-1

## Header 2-1

How would that look?

* Header 1-1
    * [what goes here?]
        * Header 3-1
    * Header 2-1

Today, you get this:

* Header 1-1
    * Header 3-1
    * Header 2-1

While that is obviously "wrong," I think it is the "correct" result for Markdown to return. It clearly communicates to the document author that they have an error in their document that they need to fix. One of Markdown's "features" is that it never breaks the parser on bad input, it simply outputs unexpected formatting that clues the author in to the problem. In other words, it is good practice to never publish a Markdown document without first reviewing a rendered copy of it.

Interestingly, this Markdown Style Guide suggests that document authors should avoid skipping header levels in their documents. In other words, I don't think MkDocs needs to try to support something that is generally considered bad form. In fact, this Markdown Lint Tool (which is based on the previously mentioned style guide) will raise an error on skipped header levels.

P.S.: @d0ugal, it might be interesting to add that lint tool to your flake8 tests to check the Markdown source in the docs dir (as well as the README, etc.) to enforce good form in the docs. Some of them are pretty awful to read/edit right now. It is a Ruby project, but is does have a CLI so it should not be impossible to use from Tox/Travis (the Node.js clone does not have a CLI and I'm not aware of a Python clone outside of this, which is a little too special-purpose for general use).

d0ugal · 2015-06-27T14:00:16Z

The additional headers are now included individually in the search results by #644

d0ugal added Enhancement Easy labels Jun 9, 2015

d0ugal changed the title ~~Search not using sub headings~~ Include H3-H6 as individual search results and highlight the search term in the result. Jun 9, 2015

d0ugal added this to the 0.15.0 milestone Jun 9, 2015

d0ugal closed this as completed Jun 27, 2015

This was referenced Jun 27, 2015

Highlight text in search results #665

Closed

Add Markdown linting for MkDocs documentation #666

Closed

d0ugal mentioned this issue Aug 15, 2015

Inconsistent labelling of headers in toctree (h1+h2 vs h2+h3) #712

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Include H3-H6 as individual search results and highlight the search term in the result. #624

Include H3-H6 as individual search results and highlight the search term in the result. #624

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

d0ugal commented Jun 9, 2015

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

d0ugal commented Jun 9, 2015

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

ProfYaffle commented Jun 10, 2015

d0ugal commented Jun 10, 2015

samuelcolvin commented Jun 10, 2015

ProfYaffle commented Jun 10, 2015

d0ugal commented Jun 10, 2015

d0ugal commented Jun 10, 2015

waylan commented Jun 10, 2015

d0ugal commented Jun 27, 2015

Include H3-H6 as individual search results and highlight the search term in the result. #624

Include H3-H6 as individual search results and highlight the search term in the result. #624

Comments

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

d0ugal commented Jun 9, 2015

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

d0ugal commented Jun 9, 2015

samuelcolvin commented Jun 9, 2015

d0ugal commented Jun 9, 2015

ProfYaffle commented Jun 10, 2015

d0ugal commented Jun 10, 2015

samuelcolvin commented Jun 10, 2015

ProfYaffle commented Jun 10, 2015

d0ugal commented Jun 10, 2015

d0ugal commented Jun 10, 2015

waylan commented Jun 10, 2015

d0ugal commented Jun 27, 2015