Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include H3-H6 as individual search results and highlight the search term in the result. #624

Closed
samuelcolvin opened this issue Jun 9, 2015 · 15 comments

Comments

@samuelcolvin
Copy link
Contributor

If I search for a subsection heading (eg. h3) I would expect the subsection I searched for to appear in the search results, instead only it's parent h1 seems to appear.

Perhaps most easily demonstrated with an example: on help.tutorcruncher.com If I search for "credit note" (which is the text of an h3 heading: help.tutorcruncher.com/accounting/#credit-notes) I would expect to see "credit note" in the results. I guess as a heading would make most sense, but highlighted in the text of a result would be ok too.

Instead the parent h1 "Invoices" shows in results without "credit note" shown anywhere.

Perhaps we need to alter search.py > create_entry_for_section to recursively work right down the hierarchy of sections?

@d0ugal
Copy link
Member

d0ugal commented Jun 9, 2015

It only includes h1 and h2 by design. This is the just like we only include h1 and h2 in the table of contents and thus made the same choice here to match.

I don't think there would be a problem with making the search more find grained, I can't really think of anything. I wrote this particular section of the search code in December I think, so my memory of the reasoning behind it is a little rusty :)

@d0ugal
Copy link
Member

d0ugal commented Jun 9, 2015

Would we want to include all of them from h1 - h6?

@samuelcolvin
Copy link
Contributor Author

I don't see why not.

But the other option which might make more sense is to improve the sample shown in each search result. It would be great if the sample was adjusted to include the search result rather than just being the first paragraph of the section with the result in (I assume?), I guess it would also be useful if it highlighted the search result by making it bold.

Does that make sense?

@d0ugal d0ugal changed the title Search not using sub headings Include H3-H6 as individual search results and highlight the search term in the result. Jun 9, 2015
@d0ugal
Copy link
Member

d0ugal commented Jun 9, 2015

Yeah, that makes sense. I feel like we should probably do both.

Adding H3-H6 is really trivial. So maybe we should do that first.

Improving the display of the results is somewhat trickier. I guess we need to update this code and probably find the sentence with the search term. This could be tricky if the word was only found due to tokenisation. Then highlight the search term in that sentence.

@d0ugal d0ugal added this to the 0.15.0 milestone Jun 9, 2015
@d0ugal
Copy link
Member

d0ugal commented Jun 9, 2015

I've marked this for 0.15 to at least do the H3-H6 change. I'm not sure I have the capacity to improve the search results too - so, some help there would be good if anyone is interested.

@samuelcolvin
Copy link
Contributor Author

looks very doable, and I'm keen to help but honestly I'm pretty busy right now, I'm not going to get round to it in June.

@d0ugal
Copy link
Member

d0ugal commented Jun 9, 2015

Sure :)

@ProfYaffle
Copy link
Contributor

@d0ugal - "This is the just like we only include h1 and h2 in the table of contents" - is that true? I thought I'd observed more "we only include the first two levels of heading, whatever they are".

Case in point: simple hierarchy of headings:

image

... and then comment out (or just break...) one level in the middle:

image

@d0ugal
Copy link
Member

d0ugal commented Jun 10, 2015

We take the first two levels as Python-Markdown interprets them when it creates a table of contents. It is a bit wonky, and not an ideal approach. I'd be happy to see a better on me :)

@samuelcolvin
Copy link
Contributor Author

I think the second case would be good in some situations, but the obvious solution is to make this a config variable so the default behaviour stays the same.

Do we generate search headings from the same object as the main toc/scroll-spy? I can see that people (me for a start) are going to want one setting heading depth in search and another for everything else.

@ProfYaffle
Copy link
Contributor

@d0ugal, that's fine, I didn't know from your comment whether it was actually meant to be h1 and h2, and thus not working as intended. It's easy enough to live with the current behaviour.

The conversation above is probably related to roll-up headings as well, as limiting the heading styles in the ToC (or using more levels in your docs that aren't included) is just another way to limit ToC overload. I have less of an opinion on the search angle.

@d0ugal
Copy link
Member

d0ugal commented Jun 10, 2015

@samuelcolvin

Do we generate search headings from the same object as the main toc/scroll-spy? I can see that people (me for a start) are going to want one setting heading depth in search and another for everything else.

No, this is done in a different area as the search index needs the full document contents and not just the headers.

@d0ugal
Copy link
Member

d0ugal commented Jun 10, 2015

@ProfYaffle

I didn't know from your comment whether it was actually meant to be h1 and h2, and thus not working as intended. It's easy enough to live with the current behaviour.

I don't know either :) That bit of code predates my involvement with MkDocs. I do think we should revisit the TOC generation at some point. I think @waylan has stated that the current way isn't ideal. Basically, Python-Markdown creates the table of contents in HTML. We then parse that to create an object representing the table of contents for a page. It is a bit of a hack :)

@waylan
Copy link
Member

waylan commented Jun 10, 2015

Actually, the bits of MkDocs that I didn't like (at least regarding the TOC), was removed/fixed in a6e6eb1. As far as I'm concerned the TOC handling code in MkDocs is fine. However, that doesn't mean there isn't room for improvement.

Under the hood, Python-Markdown extracts the headers into a nested list of tokens (here) which it then converts into an HTML list, serializes as a string, and assigns to the toc attribute of the Markdown class. I suppose, ideally, the nested list (before converting to HTML) would be more useful to MkDocs, but it doesn't really make sense from Python-Markdown's perspective to preserve it (the extension's primary purpose is to insert the TOC into the document in HTML form; providing the TOC outside of that context is just a bonus). Granted, it is a little inefficient for MkDocs to have to parse that HTML back into a list. One option would be for MkDocs to not use the TOC extension, but to simply extract the headers itself and then it could do whatever worked best for MkDocs.

Either way, how should skipped/missing levels be handled? Consider this example:

# Header 1-1

### Header 3-1

### Header 3-2

If we require every level (even skipped levels to be part of the TOC list, how would that work?

* Header 1-1
    * [what goes here?]
        * Header 3-1
        * Header 3-2

I don't see a clear answer to what goes in the level 2 slot. It could be left blank, but a blank menu item would be bad, so that doesn't work. And if it is not blank, what would go there? Therefore, it is left out completely. The really tricky part would be if we had this instead:

# Header 1-1

### Header 3-1

## Header 2-1

How would that look?

* Header 1-1
    * [what goes here?]
        * Header 3-1
    * Header 2-1

Today, you get this:

* Header 1-1
    * Header 3-1
    * Header 2-1

While that is obviously "wrong," I think it is the "correct" result for Markdown to return. It clearly communicates to the document author that they have an error in their document that they need to fix. One of Markdown's "features" is that it never breaks the parser on bad input, it simply outputs unexpected formatting that clues the author in to the problem. In other words, it is good practice to never publish a Markdown document without first reviewing a rendered copy of it.

Interestingly, this Markdown Style Guide suggests that document authors should avoid skipping header levels in their documents. In other words, I don't think MkDocs needs to try to support something that is generally considered bad form. In fact, this Markdown Lint Tool (which is based on the previously mentioned style guide) will raise an error on skipped header levels.

P.S.: @d0ugal, it might be interesting to add that lint tool to your flake8 tests to check the Markdown source in the docs dir (as well as the README, etc.) to enforce good form in the docs. Some of them are pretty awful to read/edit right now. It is a Ruby project, but is does have a CLI so it should not be impossible to use from Tox/Travis (the Node.js clone does not have a CLI and I'm not aware of a Python clone outside of this, which is a little too special-purpose for general use).

@d0ugal
Copy link
Member

d0ugal commented Jun 27, 2015

The additional headers are now included individually in the search results by #644

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants