Sphinx-style permalinks filter #2743
Conversation
Permalinks are not supposed to change every time a HTML page is regenerated, right? But isn't that what's happening here because |
for h in ['h1', 'h2', 'h3', 'h4']: | ||
nodes = doc.findall('*//%s' % h) | ||
for node in nodes: | ||
new_node = lxml.html.fragment_fromstring('<a id="{0}" href="#{0}" class="headerlink" title="Permalink to this headline"> π</a>'.format(uuid.uuid4())) |
Kwpolska
May 5, 2017
Member
You’re doing UUIDs too much. I would:
- check if the header has an
id
already, and use that
- otherwise, create a slugified form of the header and use that, watching out for conflicts within the same page and appending numbers (example for Django)
You’re doing UUIDs too much. I would:
- check if the header has an
id
already, and use that - otherwise, create a slugified form of the header and use that, watching out for conflicts within the same page and appending numbers (example for Django)
ralsina
May 5, 2017
Author
Member
Honestly, I don't care about this enough to do the extra work.
Honestly, I don't care about this enough to do the extra work.
Kwpolska
May 5, 2017
Member
I can take over if you want.
I can take over if you want.
@felixfontein that is a very good point. So, the id generation here is useless. |
LGTM with @Kwpolska's latest fix |
I’m not done yet, I want to at least implement the blacklist (or perhaps whitelist) |
Since the filter cannot/doesn't have persistent state, the link generation algorithm cannot be changed anymore after merging this (or at least from the next release on). Otherwise this filter doesn't generate permalinks, but code-dependent links which will change as soon as this code is touched in a non-trivial way! So we really shouldn't merge this one too early! |
The link generation algorithm I have in mind will be pretty stable, so no worries. (The links being permanent also depend on user actions) |
@ralsina, @felixfontein: this is ready for review. Header/id deduplication will be implemented in another PR, because it needs to be done for all IDs, not only the few headers we touch here. |
LGTM, but kind of incomplete without that extra deduplication PR. Also, how permanent these permalinks will be depends a lot on the deduplication algorithm and what kind of page this is. (If it is an index where newer posts will appear before older ones, this will be a big problem.) |
xpath_list = ['*//div[@class="e-content entry-content"]//{hx}'] | ||
for xpath_expr in xpath_list: | ||
for hx in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']: | ||
nodes = doc.findall(xpath_expr.format(hx=hx)) |
felixfontein
May 14, 2017
Contributor
In case xpath_expr
does not contain hx
, wouldn't this add a permalink six times for every matching node?
In case xpath_expr
does not contain hx
, wouldn't this add a permalink six times for every matching node?
felixfontein
May 14, 2017
Contributor
(Maybe first put the formatted expressions into a set, then convert the set to a sorted list, and iterate over it. That should fix this problem.)
(Maybe first put the formatted expressions into a set, then convert the set to a sorted list, and iterate over it. That should fix this problem.)
Kwpolska
May 14, 2017
Member
Yes, it will. This sounds like an edge case, but are you aware of any good solutions for this?
Yes, it will. This sounds like an edge case, but are you aware of any good solutions for this?
Kwpolska
May 14, 2017
Member
Made it a set. There’s still the case of the same header being returned by multiple expressions, should I also work around that?
Made it a set. There’s still the case of the same header being returned by multiple expressions, should I also work around that?
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
This prevents permalinks being created where they shouldn’t be, eg. post or site titles. Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
The code is waiting on the
The deduplication algorithm will run primarily on indexes. We can discuss whether or not it should be changing links from the bottom-up in that PR. |
Bottom-up would be a good idea, I think. This will at least make the links permanent when |
+1 to bottom-up |
@ralsina: review welcome. @felixfontein: alright, done on |
LGTM other than the typo in the doc. I can't officially +1 because it's my PR :-) |
.headerlink { opacity: 0.1; margin-left: 0.2em; } | ||
.headerlink:hover { opacity: 1; text-decoration: none; } | ||
|
||
Additionally, you can provide a custom list of XPath expressions which should be used for finding headers (``{hx}}`` is replaced by headers h1 through h6). |
ralsina
May 14, 2017
Author
Member
Unbalanced braces
Unbalanced braces
Merging, thanks for the reviews. |
Pull Request Checklist
Description
A filter implementing Sphinx-style header links. Bonus: it works for any compiler, since it's done as HTML post-processing.