New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use shortcodes in a more robust manner #2737
Changes from 22 commits
741fb2e
f5d6133
9118589
359c08e
c4f8a3f
284eb11
1ccbecf
92af8ca
d041bfa
813b330
5adfa75
3e4ad02
33abf6d
b9cf603
9b8c567
892d136
cd3ae74
5b73c6b
af11c7d
ce8682d
fbd5730
1296210
6b4c4ba
62c5645
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -27,14 +27,11 @@ | |
"""Support for Hugo-style shortcodes.""" | ||
|
||
from __future__ import unicode_literals | ||
from .utils import LOGGER | ||
import sys | ||
|
||
import uuid | ||
|
||
# Constants | ||
_TEXT = 1 | ||
_SHORTCODE_START = 2 | ||
_SHORTCODE_END = 3 | ||
from .utils import LOGGER | ||
import sys | ||
|
||
|
||
class ParsingError(Exception): | ||
|
@@ -83,11 +80,10 @@ def _skip_whitespace(data, pos, must_be_nontrivial=False): | |
|
||
def _skip_nonwhitespace(data, pos): | ||
"""Return first position not before pos which contains a non-whitespace character.""" | ||
while pos < len(data): | ||
if data[pos].isspace(): | ||
break | ||
pos += 1 | ||
return pos | ||
for i, x in enumerate(data[pos:]): | ||
if x.isspace(): | ||
return pos + i | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Now you are iterating over the rest of the string twice in case no space is found. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No I am not? Sorry, it's late and I don't see it :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I forgot to delete all that. Will fix. |
||
return len(data) | ||
|
||
|
||
def _parse_quoted_string(data, start): | ||
|
@@ -209,14 +205,66 @@ def _parse_shortcode_args(data, start, shortcode_name, start_pos): | |
raise ParsingError("Shortcode '{0}' starting at {1} is not terminated correctly with '%}}}}'!".format(shortcode_name, _format_position(data, start_pos))) | ||
|
||
|
||
def _new_sc_id(): | ||
return str('SHORTCODE{0}REPLACEMENT'.format(str(uuid.uuid4()).replace('-', ''))) | ||
|
||
|
||
def extract_shortcodes(data): | ||
""" | ||
Return data with replaced shortcodes, shortcodes. | ||
|
||
data is the original data, with the shortcodes replaced by UUIDs. | ||
|
||
a dictionary of shortcodes, where the keys are UUIDs and the values | ||
are the shortcodes themselves ready to process. | ||
""" | ||
shortcodes = {} | ||
splitted = _split_shortcodes(data) | ||
|
||
def extract_data_chunk(data): | ||
"""Take a list of splitted shortcodes and return a string and a tail. | ||
|
||
The string is data, the tail is ready for a new run of this same function. | ||
""" | ||
text = [] | ||
for i, token in enumerate(data): | ||
if token[0] == 'SHORTCODE_START': | ||
name = token[3] | ||
sc_id = _new_sc_id() | ||
text.append(sc_id) | ||
# See if this shortcode closes | ||
for j in range(i, len(data)): | ||
if data[j][0] == 'SHORTCODE_END' and data[j][3] == name: | ||
# Extract this chunk | ||
shortcodes[sc_id] = ''.join(t[1] for t in data[i:j + 1]) | ||
return ''.join(text), data[j + 1:] | ||
# Doesn't close | ||
shortcodes[sc_id] = token[1] | ||
return ''.join(text), data[i + 1:] | ||
elif token[0] == 'TEXT': | ||
text.append(token[1]) | ||
return ''.join(text), data[1:] | ||
elif token[0] == 'SHORTCODE_END': # This is malformed | ||
raise Exception('Closing unopened shortcode {}'.format(token[3])) | ||
|
||
text = [] | ||
tail = splitted | ||
while True: | ||
new_text, tail = extract_data_chunk(tail) | ||
text.append(new_text) | ||
if not tail: | ||
break | ||
return ''.join(text), shortcodes | ||
|
||
|
||
def _split_shortcodes(data): | ||
"""Given input data, splits it into a sequence of texts, shortcode starts and shortcode ends. | ||
|
||
Returns a list of tuples of the following forms: | ||
|
||
1. (_TEXT, text) | ||
2. (_SHORTCODE_START, text, start, name, args) | ||
3. (_SHORTCODE_END, text, start, name) | ||
1. ("TEXT", text) | ||
2. ("SHORTCODE_START", text, start, name, args) | ||
3. ("SHORTCODE_END", text, start, name) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do you do these replacements? Before, you'd get a runtime error in case you did a typo (because the symbol with the typo doesn't exist). Now you won't notice anything until something behaves strange, and then you have to some debugging to find out that you did a typo. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Because reading the parsed code made my head hurt. |
||
|
||
Here, text is the raw text represented by the token; start is the starting position in data | ||
of the token; name is the name of the shortcode; and args is a tuple (args, kw) as returned | ||
|
@@ -228,9 +276,9 @@ def _split_shortcodes(data): | |
# Search for shortcode start | ||
start = data.find('{{%', pos) | ||
if start < 0: | ||
result.append((_TEXT, data[pos:])) | ||
result.append(("TEXT", data[pos:])) | ||
break | ||
result.append((_TEXT, data[pos:start])) | ||
result.append(("TEXT", data[pos:start])) | ||
# Extract name | ||
name_start = _skip_whitespace(data, start + 3) | ||
name_end = _skip_nonwhitespace(data, name_start) | ||
|
@@ -246,13 +294,13 @@ def _split_shortcodes(data): | |
# Must be followed by '%}}' | ||
if pos > len(data) or data[end_start:pos] != '%}}': | ||
raise ParsingError("Syntax error: '{{{{% /{0}' must be followed by ' %}}}}' ({1})!".format(name, _format_position(data, end_start))) | ||
result.append((_SHORTCODE_END, data[start:pos], start, name)) | ||
result.append(("SHORTCODE_END", data[start:pos], start, name)) | ||
elif name == '%}}': | ||
raise ParsingError("Syntax error: '{{{{%' must be followed by shortcode name ({0})!".format(_format_position(data, start))) | ||
else: | ||
# This is an opening shortcode | ||
pos, args = _parse_shortcode_args(data, name_end, shortcode_name=name, start_pos=start) | ||
result.append((_SHORTCODE_START, data[start:pos], start, name, args)) | ||
result.append(("SHORTCODE_START", data[start:pos], start, name, args)) | ||
return result | ||
|
||
|
||
|
@@ -284,17 +332,17 @@ def apply_shortcodes(data, registry, site=None, filename=None, raise_exceptions= | |
pos = 0 | ||
while pos < len(sc_data): | ||
current = sc_data[pos] | ||
if current[0] == _TEXT: | ||
if current[0] == "TEXT": | ||
result.append(current[1]) | ||
pos += 1 | ||
elif current[0] == _SHORTCODE_END: | ||
elif current[0] == "SHORTCODE_END": | ||
raise ParsingError("Found shortcode ending '{{{{% /{0} %}}}}' which isn't closing a started shortcode ({1})!".format(current[3], _format_position(data, current[2]))) | ||
elif current[0] == _SHORTCODE_START: | ||
elif current[0] == "SHORTCODE_START": | ||
name = current[3] | ||
# Check if we can find corresponding ending | ||
found = None | ||
for p in range(pos + 1, len(sc_data)): | ||
if sc_data[p][0] == _SHORTCODE_END and sc_data[p][3] == name: | ||
if sc_data[p][0] == "SHORTCODE_END" and sc_data[p][3] == name: | ||
found = p | ||
break | ||
if found: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -74,3 +74,23 @@ def test_errors(self): | |
self.assertRaisesRegexp(shortcodes.ParsingError, "^Found shortcode ending '{{% / %}}' which isn't closing a started shortcode", shortcodes.apply_shortcodes, '{{% / %}}', self.fakesite.shortcode_registry, raise_exceptions=True) | ||
self.assertRaisesRegexp(shortcodes.ParsingError, "^Syntax error: '{{% /' must be followed by ' %}}'", shortcodes.apply_shortcodes, '{{% / a %}}', self.fakesite.shortcode_registry, raise_exceptions=True) | ||
self.assertRaisesRegexp(shortcodes.ParsingError, "^Shortcode '<==' starting at .* is not terminated correctly with '%}}'!", shortcodes.apply_shortcodes, '==> {{% <==', self.fakesite.shortcode_registry, raise_exceptions=True) | ||
|
||
|
||
@pytest.mark.parametrize("input, expected", [ | ||
('{{% foo %}}', (u'SC1', {u'SC1': u'{{% foo %}}'})), | ||
('{{% foo %}} bar {{% /foo %}}', (u'SC1', {u'SC1': u'{{% foo %}} bar {{% /foo %}}'})), | ||
('AAA{{% foo %}} bar {{% /foo %}}BBB', (u'AAASC1BBB', {u'SC1': u'{{% foo %}} bar {{% /foo %}}'})), | ||
('AAA{{% foo %}} {{% bar %}} {{% /foo %}}BBB', (u'AAASC1BBB', {u'SC1': u'{{% foo %}} {{% bar %}} {{% /foo %}}'})), | ||
('AAA{{% foo %}} {{% /bar %}} {{% /foo %}}BBB', (u'AAASC1BBB', {u'SC1': u'{{% foo %}} {{% /bar %}} {{% /foo %}}'})), | ||
('AAA{{% foo %}} {{% bar %}} quux {{% /bar %}} {{% /foo %}}BBB', (u'AAASC1BBB', {u'SC1': u'{{% foo %}} {{% bar %}} quux {{% /bar %}} {{% /foo %}}'})), | ||
('AAA{{% foo %}} BBB {{% bar %}} quux {{% /bar %}} CCC', (u'AAASC1 BBB SC2 CCC', {u'SC1': u'{{% foo %}}', u'SC2': u'{{% bar %}} quux {{% /bar %}}'})), | ||
]) | ||
def test_extract_shortcodes(input, expected, monkeypatch): | ||
|
||
i = iter('SC%d' % i for i in range(1, 100)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I should change this to use itertools.counter anyway |
||
if sys.version[0] < "3": | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if sys.version_info[0] < 3: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, that's what it was and it failed :-P There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, you did According to the documentation of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahhhh right. |
||
monkeypatch.setattr(shortcodes, '_new_sc_id', i.next) | ||
else: | ||
monkeypatch.setattr(shortcodes, '_new_sc_id', i.__next__) | ||
extracted = shortcodes.extract_shortcodes(input) | ||
assert extracted == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about other compilers? Also, I think it might be a slightly better-looking API to put
extract_shortcodes
in the Nikola object as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kwpolska Since the old API still works, later branches can port each one to the new API.
I did not put extract_shortcodes in the Nikola object mostly because there's nothing site-specific in it, and I am half convinced to use a real tokenizer in shortcodes.py which would make things awkward if things need refactoring.