Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sphinx-style permalinks filter #2743

Merged
merged 6 commits into from May 14, 2017
Merged

Sphinx-style permalinks filter #2743

merged 6 commits into from May 14, 2017

Conversation

@ralsina
Copy link
Member

ralsina commented May 4, 2017

Pull Request Checklist

  • I’ve read the guidelines for contributing.
  • I updated AUTHORS.txt and CHANGES.txt (if the change is non-trivial) and documentation (if applicable).
  • I tested my changes.

Description

A filter implementing Sphinx-style header links. Bonus: it works for any compiler, since it's done as HTML post-processing.

@felixfontein
Copy link
Contributor

felixfontein commented May 5, 2017

Permalinks are not supposed to change every time a HTML page is regenerated, right? But isn't that what's happening here because uuid.uuid4() always generates a new value which was never generated before?

for h in ['h1', 'h2', 'h3', 'h4']:
nodes = doc.findall('*//%s' % h)
for node in nodes:
new_node = lxml.html.fragment_fromstring('<a id="{0}" href="#{0}" class="headerlink" title="Permalink to this headline">&nbsp;&pi;</a>'.format(uuid.uuid4()))

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska May 5, 2017

Member

You’re doing UUIDs too much. I would:

  • check if the header has an id already, and use that
  • otherwise, create a slugified form of the header and use that, watching out for conflicts within the same page and appending numbers (example for Django)

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska May 5, 2017

Member

Also, what @felixfontein said about permalinks being, um, permanent.

This comment has been minimized.

Copy link
@ralsina

ralsina May 5, 2017

Author Member

Honestly, I don't care about this enough to do the extra work.

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska May 5, 2017

Member

I can take over if you want.

This comment has been minimized.

Copy link
@ralsina

ralsina May 5, 2017

Author Member

@Kwpolska that would be great, thanks!

@ralsina
Copy link
Member Author

ralsina commented May 5, 2017

@felixfontein that is a very good point. So, the id generation here is useless.

@ralsina
Copy link
Member Author

ralsina commented May 8, 2017

LGTM with @Kwpolska's latest fix

@Kwpolska
Copy link
Member

Kwpolska commented May 8, 2017

I’m not done yet, I want to at least implement the blacklist (or perhaps whitelist)

@felixfontein
Copy link
Contributor

felixfontein commented May 8, 2017

Since the filter cannot/doesn't have persistent state, the link generation algorithm cannot be changed anymore after merging this (or at least from the next release on). Otherwise this filter doesn't generate permalinks, but code-dependent links which will change as soon as this code is touched in a non-trivial way! So we really shouldn't merge this one too early!

@Kwpolska
Copy link
Member

Kwpolska commented May 9, 2017

The link generation algorithm I have in mind will be pretty stable, so no worries. (The links being permanent also depend on user actions)

@Kwpolska Kwpolska force-pushed the add-title-permalinks branch from 25f737c to 7b6d73c May 14, 2017
@Kwpolska
Copy link
Member

Kwpolska commented May 14, 2017

@ralsina, @felixfontein: this is ready for review. Header/id deduplication will be implemented in another PR, because it needs to be done for all IDs, not only the few headers we touch here.

Copy link
Contributor

felixfontein left a comment

LGTM, but kind of incomplete without that extra deduplication PR. Also, how permanent these permalinks will be depends a lot on the deduplication algorithm and what kind of page this is. (If it is an index where newer posts will appear before older ones, this will be a big problem.)

xpath_list = ['*//div[@class="e-content entry-content"]//{hx}']
for xpath_expr in xpath_list:
for hx in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
nodes = doc.findall(xpath_expr.format(hx=hx))

This comment has been minimized.

Copy link
@felixfontein

felixfontein May 14, 2017

Contributor

In case xpath_expr does not contain hx, wouldn't this add a permalink six times for every matching node?

This comment has been minimized.

Copy link
@felixfontein

felixfontein May 14, 2017

Contributor

(Maybe first put the formatted expressions into a set, then convert the set to a sorted list, and iterate over it. That should fix this problem.)

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska May 14, 2017

Member

Yes, it will. This sounds like an edge case, but are you aware of any good solutions for this?

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska May 14, 2017

Member

Made it a set. There’s still the case of the same header being returned by multiple expressions, should I also work around that?

ralsina and others added 6 commits May 4, 2017
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
This prevents permalinks being created where they shouldn’t be, eg.
post or site titles.

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska Kwpolska force-pushed the add-title-permalinks branch from 3b748b0 to 1298a14 May 14, 2017
@Kwpolska
Copy link
Member

Kwpolska commented May 14, 2017

LGTM, but kind of incomplete without that extra deduplication PR.

The code is waiting on the deduplicate-headers branch until this lands.

Also, how permanent these permalinks will be depends a lot on the deduplication algorithm and what kind of page this is. (If it is an index where newer posts will appear before older ones, this will be a big problem.)

The deduplication algorithm will run primarily on indexes. We can discuss whether or not it should be changing links from the bottom-up in that PR.

@felixfontein
Copy link
Contributor

felixfontein commented May 14, 2017

Bottom-up would be a good idea, I think. This will at least make the links permanent when INDEXES_STATIC is True and you're not on the front page. If INDEXES_STATIC is False, all hope is lost anyway...

@ralsina
Copy link
Member Author

ralsina commented May 14, 2017

+1 to bottom-up

@Kwpolska
Copy link
Member

Kwpolska commented May 14, 2017

@ralsina: review welcome.

@felixfontein: alright, done on header-deduplication branch.

Copy link
Member Author

ralsina left a comment

LGTM other than the typo in the doc. I can't officially +1 because it's my PR :-)

.headerlink { opacity: 0.1; margin-left: 0.2em; }
.headerlink:hover { opacity: 1; text-decoration: none; }

Additionally, you can provide a custom list of XPath expressions which should be used for finding headers (``{hx}}`` is replaced by headers h1 through h6).

This comment has been minimized.

Copy link
@ralsina

ralsina May 14, 2017

Author Member

Unbalanced braces

@Kwpolska
Copy link
Member

Kwpolska commented May 14, 2017

Merging, thanks for the reviews.

@Kwpolska Kwpolska merged commit 3270735 into master May 14, 2017
5 checks passed
5 checks passed
codacy/pr Good work! A positive pull request.
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@Kwpolska Kwpolska deleted the add-title-permalinks branch May 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.