Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely rewrote shortcode parser. #2200

Merged
merged 15 commits into from Jan 4, 2016
Merged

Completely rewrote shortcode parser. #2200

merged 15 commits into from Jan 4, 2016

Conversation

@felixfontein
Copy link
Contributor

felixfontein commented Dec 28, 2015

This PR fixes some shortcomings in the current shortcode parsers:

  • {{% a "%}}" %}} is parsed as expected;
  • better error messages are produced for various malformed situations;
  • no more infinite loops in cases such as ==> {{% <==;
  • strange constructs such as {{% a "b"c"d" %}} aren't allowed anymore.
@felixfontein felixfontein mentioned this pull request Dec 28, 2015


def _format_position(data, pos):
"""Return position formatted as line/column."""

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska Dec 28, 2015

Member

Could use a note that this is for pretty error messages.

args.append(cname)

return name, (args, kwargs)
empty_string = data[:0] # same string type as data; to make Python 2 happy

This comment has been minimized.

Copy link
@Kwpolska

Kwpolska Dec 28, 2015

Member

I think '' will be enough in our case. By the way, this file didn’t have a from __future__ import unicode_literals import — I just fixed that.

This comment has been minimized.

Copy link
@felixfontein

felixfontein Dec 28, 2015

Author Contributor

This makes some test fail, since it always returns a unicode string. My final fix makes it return the same string type as the input was. But we could of course also fix the tests :)

Kwpolska added 2 commits Dec 28, 2015
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska
Copy link
Member

Kwpolska commented Dec 28, 2015

  • What does non-trivial mean?
  • How fast is it for large inputs?
@felixfontein

This comment has been minimized.

Copy link
Contributor

felixfontein commented on nikola/shortcodes.py in e2b09e9 Dec 28, 2015

How about also making sure the error message sounds good when filename is None? After all, filename is optional.

This comment has been minimized.

Copy link
Member Author

Kwpolska replied Dec 28, 2015

Fixed in f8cccda. (note that filename is likely not to be empty in most cases)

@felixfontein
Copy link
Contributor Author

felixfontein commented Dec 28, 2015

You can also replace non-trival with non-empty if you want.

About large inputs: since it uses str.find('{{%', pos) to find the start of the next shortcode, and only then starts going through the string character by character until the end of this shortcode, it should be pretty efficient. If you use no shortcodes, it should not be noticeably slower than the old code.

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska
Copy link
Member

Kwpolska commented Dec 28, 2015

Would you mind adding tests (real tests, not doctests!) for all the things this parser fixes?

@felixfontein
Copy link
Contributor Author

felixfontein commented Dec 28, 2015

I found some more bugs, improved some error messages, and added proper tests.

@Kwpolska
Copy link
Member

Kwpolska commented Dec 28, 2015

LGTM, anybody else?

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

LGTM too. Merging.

ralsina added a commit that referenced this pull request Jan 4, 2016
Completely rewrote shortcode parser.
@ralsina ralsina merged commit 7803f7a into master Jan 4, 2016
4 checks passed
4 checks passed
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@ralsina ralsina deleted the improving_shortcodes branch Jan 4, 2016
@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

@felixfontein how could we expand on this to make it handle both hugo shortcode cases?

  1. {{% %}} contents of the shortcode are passed as-is
  2. {{< >}} contents of the shortcode are passed through the markup compiler before replacing

/me is totally dumb about parsers and stuff. No computer-thingies college courses, see ;-)

@Kwpolska
Copy link
Member

Kwpolska commented Jan 4, 2016

Let’s not do < >. It’s tricky, because some things in the foodchain might replace it with &lt; and &gt;. We don’t need full feature parity with Hugo.

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

We could use a different marker.

On Mon, Jan 4, 2016 at 12:46 PM Chris Warrick notifications@github.com
wrote:

Let’s not do < >. It’s tricky, because some things in the foodchain might
replace it with < and >. We don’t need full feature parity with
Hugo.


Reply to this email directly or view it on GitHub
#2200 (comment).

@Kwpolska
Copy link
Member

Kwpolska commented Jan 4, 2016

Is % really not enough?

On 4 January 2016 at 20:23, Roberto Alsina notifications@github.com wrote:

We could use a different marker.

On Mon, Jan 4, 2016 at 12:46 PM Chris Warrick notifications@github.com
wrote:

Let’s not do < >. It’s tricky, because some things in the foodchain might
replace it with < and >. We don’t need full feature parity with
Hugo.


Reply to this email directly or view it on GitHub
#2200 (comment).


Reply to this email directly or view it on GitHub
#2200 (comment).

Chris Warrick https://chriswarrick.com/
PGP: 5EAAEA16

@felixfontein
Copy link
Contributor Author

felixfontein commented Jan 4, 2016

We might already have some problems, if you write something along {{% a "b<c" %}} it might be that the argument will be b&lt;c instead of b<c since shortcodes are applied to the transformed result.

It would be better to first replace the shortcodes with some unique IDs, then use markdown, and finally replace the markers by their transformed content. This would also make it trivial to implement support for {{< >}}: just apply these shortcodes before applying markdown.

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

The reason for "shortcodes with markup inside them" is pretty good. Imagine you want to make a paragraph be a bootstrap warning panel, or whatever. You could do this:

{{< warning-panel >}}
Some markdown or rest here
{{< /warning-panel >}}

Requiring the content of the shortcode not be processed by the compiler makes this usecase suck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.