Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve links in Markdown text. Support mistune 2.x #992

Merged
merged 12 commits into from Apr 7, 2022

Conversation

dairiki
Copy link
Contributor

@dairiki dairiki commented Jan 25, 2022

Here's another giant PR. This one refactors lektor.markdown with two main goals:

  • Clean up and fix a number of bugs with link and image resolution in Markdown text.
  • Support using the latest version of mistune (==2.0.2), while also maintaining support for the previous major version of mistune (0.9.4).

Issue(s) Resolved

Fixes #313
Fixes #966
Fixes #950
Supercedes #944
Supercedes #685
Supercedes #470

Related Issues / Links

Doc PR: lektor/lektor-website#338

Description of Changes

  • Wrote at least one-line docstrings (for any new functions)
  • Added unit test(s) covering the changes (if testable)
  • Link to corresponding documentation pull request for getlektor.com

Link and image URL resolution via the Lektor DB

Previously any URLs in Markdown text which were not external links were interpreted as URL paths relative to either the Lektor site root (if the path starts with a /) or the page containing the markdown text (if the path does not start with a /). As discussed in #966, this causes issues, e.g.:

  • when linking to attachments, when alternatives are in use
  • when linking to pages or attachments that use a custom slug

This PR changes the default behavior so that an attempt is made to resolve links as Lektor db paths to a source object. If that resolution is successful, then the URL to that source object is used. If that Lektor db lookup fails, then fall back to interpreting the path as a URL path (as is done now.)

Note that this new behavior is exactly the same as is currently implemented by Lektor’s url jinja filter.

Since, in many cases, resolution via the DB produces the same result as just interpreting the path as a URL path, while this is a backward-incompatible change, most of the time it will not make a difference. And when it does, I think, that most of the time, the new behavior is what was probably wanted in the first place.

New resolve_links markdown field config option

This PR adds a new field configuration option for markdown fields. This option can take one of three values:

  • resolve_links = never
    Never resolve links to Lektor source objects. This is the old (current) behavior, equivalent to prefixing all paths with '!'.
  • resolve_links = when-possible
    This is the new default behavior. When possible, links are resolved to Lektor source objects, then the URL to those source objects is used. When the resolution to source object fails, the links are interpreted as URL paths.
  • resolve_links = always
    Resolve links to Lektor source objects, then use the URL of those source objects. If that resolution fails, an error is raised.

This option may be set in the field definition section of a model.ini file. E.g. to restore the legacy behavior to a specific body field of your pages, you could put in models/page.ini:

...
[fields.body]
label = Body
type = markdown
resolve_links = never
...

Cleanup and of SourceObject.url_to.

This PR refactors SourceObject.url_to. I think the new code is much more readable than the old version.

Change of behavior with absolute=True

It does make one behavior change. Previously if absolute=True was passed, the SourceObject.url_to would return an "absolute" URL path (one which starts with a /), but that path was relative to the site root configured in the project file (if any). The PR changes that so that a real absolute path is returned (relative to the root of the webserver). E.g., if in the project file a site URL was set:

[project]
url = http://example.org/some/path/

SourceObject.url_to("/", absolute=True) used to return "/". This PR changes things so that it returns "/some/path/".

It can be argued that the previous behavior was a bug. Consider writing — not unreasonably — in a template:

See <a href="{{ this.get('subpage').url_to(absolute=True) }}">this sub-page</a>.

If there is a [project]url set with a non-empty path, that link will be a broken one under the current behavior.

Addition of resolve and strict_resolve parameters

This PR also adds two new parameters to SourceObject.url_to: resolve, and strict_resolve.
The same new parameters are also added to Context.url_to and to the url jinja filter.

Passing resolve=False will prevent any attempt to resolve the path as a db path. This is equivalent to what happens currently if one prefixes the path with '!'.

Passing strict_resolve=True will cause an error to be raised if the path can not be resolved as a db path.

Support for lektor: URL scheme

As described in #966, SourceObject.url_to is modified to recognize the new lektor: URL scheme. Prefixing a path with lektor: is now essentially equivalent to passing the new strict_resolve=True parameter (see above).

With the addition of the strict_resolve parameter, the addition of the lektor: URL scheme is probably overkill. At this point I plan on removing it from this PR before merging have removed it from this PR.


Support for mistune 2.x

This also adds support for the latest version of mistune.

Note that it is unlikely that existing Lektor plugins that further customize Lektor's Markdown rendering will “just work” if running under the new version of mistune. (Most such plugins work by subclassing the mistune Renderer class. The API to that class has changed in the latest version of mistune. The arguments to the constructor of the mistune parser have changed as well.)

Lektor plugins that work now should continue to work if mistune is pinned to the old version (mistune<2.)

(A little more thought should probably be given to the issue of making life under multiple versions of mistune easy for plugins.)

@dairiki dairiki force-pushed the bug.markdown-link-resolution branch 4 times, most recently from 880a007 to 9f9eb00 Compare January 25, 2022 06:46
@dairiki dairiki force-pushed the bug.markdown-link-resolution branch 3 times, most recently from 6029aa4 to 9cc8700 Compare February 26, 2022 22:18
dairiki added a commit to dairiki/lektor-website that referenced this pull request Feb 27, 2022
@dairiki dairiki marked this pull request as ready for review February 27, 2022 01:03
This add a "lektor:" URL scheme, recognized by `SourceObject.url_to`,
which causes the URL path to be interpreted as a (possibly relative)
Lektor db-path.  This is very similar to how plain paths (with no
scheme or netloc) are handled currently, except that, if resolution
to a Lektor source object fails, an error will be raised, instead of
falling back to interpreting the path as a URL path.

It also introduces a slightly-backward-incompatible change in how
`SourceObject.url_to` interprets its `absolute` parameter.
Formerly, if `absolute` was truish, the "absolute" URL path relative
to the site root (configured by the `[project]url` setting in the
project file). This has now been changed to return an absolute URL
path (which includes any prefix defined by `[project]url`.)

It can be argued that the former behavior was a bug.  Suppose a
template, not unreasonably, contains

    See <a href="{{ this.get('child').url_to(absolute=True)
    }}">child</a>.

That link will be a broken one if `[project]url` is set to something
with a non-empty path (e.g. `url = http://example.org/site-prefix/`).
This allows Markdown Renderers to vary their behavior depending on
the configuration of the markdown field in the datamodel.
When alternatives are in use, Attachment.iter_source_filenames was
only including the metadata file for the attachment's alt. It was
not including the fallback metadata file.  This fixes that.
The new field option for markdown fields controls how relative urls
are resolved within the Markdown text.

`resolve_links = never`

Never resolve links to Lektor source objects.  This is equivalent to
the previous behavior, where ImprovedRenderer prefixed all relative
URLs with ``'!'`` before passing them to SourceObject.url_to.

`resolve_links = when-possible`

This is the new default. When possible, links are resolved to Lektor
source objects, then the URLs to those source objects are used.  If
resolution to source object fails, the link is interpreted as URL path.

`resolve_links = always`

Liek `resolve_links = when-possible`, except raises an error if the
link can not be resolved to a Lektor source object.
@dairiki dairiki force-pushed the bug.markdown-link-resolution branch from 9cc8700 to 3db533a Compare March 31, 2022 00:34
@dairiki
Copy link
Contributor Author

dairiki commented Mar 31, 2022

@yagebu @xlotlu (and any other interested parties):

I would like to merge this PR soon. However, since it does introduce a backward-incompatibility — namely that links in markdown text will be resolved via the Lektor DB rather than interpreted only as URLs (see the main PR description, above) — I want to double-check that there are no strong objections before doing so.

Also, would appreciate opinions on whether this is a significant enough change to warrant a major version bump (to 4.x) rather than just a minor version bump (to 3.4.x).

Copy link
Contributor

@yagebu yagebu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

I'd be happy to ship this with 3.4, since this is, where incompatible, probably closer to the expected/intended behaviour (as evidenced by the long chain of issues and PRs that this PR would finally close :))

@dairiki dairiki merged commit 9b8613d into lektor:master Apr 7, 2022
@dairiki dairiki deleted the bug.markdown-link-resolution branch April 7, 2022 01:42
dairiki added a commit to lektor/lektor-website that referenced this pull request Jul 28, 2022
dairiki added a commit to dairiki/lektor that referenced this pull request Feb 24, 2023
dairiki added a commit to dairiki/lektor that referenced this pull request Feb 24, 2023
@dairiki dairiki mentioned this pull request Feb 24, 2023
3 tasks
dairiki added a commit that referenced this pull request Feb 26, 2023
* refactor(markdown): allow rendering markdown without a current record

Though that if there is no current record, it will not be possible to
perform record-relative resolution of URLs for links and images.

* fix: markdown jinja filter (broken in PR #992)

This addresses #1100.
@dairiki dairiki mentioned this pull request Aug 2, 2023
dairiki added a commit to dairiki/lektor that referenced this pull request Sep 11, 2023
* refactor(markdown): allow rendering markdown without a current record

Though that if there is no current record, it will not be possible to
perform record-relative resolution of URLs for links and images.

* fix: markdown jinja filter (broken in PR lektor#992)

This addresses lektor#1100.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants