Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use shortcodes in a more robust manner #2737

Merged
merged 24 commits into from May 10, 2017
Merged

Use shortcodes in a more robust manner #2737

merged 24 commits into from May 10, 2017

Conversation

@ralsina
Copy link
Member

@ralsina ralsina commented Apr 28, 2017

Pull Request Checklist

  • I’ve read the guidelines for contributing.
  • I updated AUTHORS.txt and CHANGES.txt (if the change is non-trivial) and documentation (if applicable).
  • I tested my changes.

Description

This changes how shortcodes are applied:

  • Before compiling the file, shortcodes are extracted and replaced by UUIDs
  • After compiling, shortcodes are processed and re-inserted where the UUID markers are

This avoids problems like rst mangling URLs and such.

Still needs work:

  • apply_shortcodes_2 is not a proper function name ;-)
  • All compilers can be switched to this approach gradually, but I would prefer if they all switched at once.

The older API can be kept for backwards compatibility since it has not been touched.

# Previous shortcode closes here
buffer.append(chunk)
sc_id = str(uuid.uuid4())
new_data.append(sc_id)

This comment has been minimized.

@Kwpolska

Kwpolska Apr 28, 2017
Member

The UUID might not be enough, perhaps add some special marker that it’s a shortcode. Also, I assume it’s guaranteed not to appear in the final cache/ file?

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

Also some compiler might mangle the - in the UUID. I'd remove the dashes to be on the safe side.
(And add some marker, like sc_id = 'SHORTCODE{0}REPLACEMENT'.format(uuid.uuid4().replace('-', '')).)

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

The UUID should be unique enough, considering there are more UUIDs than atoms in the universe. The stuff about dashes sounds good.

This comment has been minimized.

@Kwpolska

Kwpolska May 5, 2017
Member

UUIDs are unique within each other, but not in the world. If you just use an UUID, especially without dashes, you might conflict with, say, a checksum that is part of the user content (or any other 128+-bit integer in hex format — which is not so rare on security– or low-level-coding–oriented blogs).

@@ -116,6 +116,9 @@ def slugify_tag_name(self, name, lang):

def slugify_category_name(self, path, lang):
"""Slugify a category name."""
if not path: # Probably used link://category

This comment has been minimized.

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

I guess this wasn't intended to be part of this PR

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

Yeah, it's not.

if lang is None:
lang = utils.LocaleBorg().current_lang
return shortcodes.apply_shortcodes(data, self.shortcode_registry, self, filename, lang=lang, with_dependencies=with_dependencies, extra_context=extra_context)

def apply_shortcodes2(self, data, shortcodes, filename=None, lang=None, with_dependencies=False, extra_context=None):

This comment has been minimized.

@Kwpolska

Kwpolska Apr 28, 2017
Member

Eh, even the best people have functions named like this. apply_shortcodes_uuid?

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

Sounds good.

@@ -83,6 +80,9 @@ def _skip_whitespace(data, pos, must_be_nontrivial=False):

def _skip_nonwhitespace(data, pos):
"""Return first position not before pos which contains a non-whitespace character."""
for i, x in enumerate(data[pos:]):
if x.isspace():
return pos + i

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

Now you are iterating over the rest of the string twice in case no space is found.

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

No I am not? Sorry, it's late and I don't see it :-)

This comment has been minimized.

@felixfontein

felixfontein May 5, 2017
Contributor

The for loop tries to find the first non-whitespace. If it doesn't find anything, the while loop (which is still there below) does the same thing afterwards, and it also won't find anything, so finally len(data) will be returned. You can replace the while loop with return len(data).

This comment has been minimized.

@ralsina

ralsina May 7, 2017
Author Member

Ah, I forgot to delete all that. Will fix.

# Previous shortcode closes here
buffer.append(chunk)
sc_id = str(uuid.uuid4())
new_data.append(sc_id)

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

Also some compiler might mangle the - in the UUID. I'd remove the dashes to be on the safe side.
(And add some marker, like sc_id = 'SHORTCODE{0}REPLACEMENT'.format(uuid.uuid4().replace('-', '')).)

@@ -209,14 +209,58 @@ def _parse_shortcode_args(data, start, shortcode_name, start_pos):
raise ParsingError("Shortcode '{0}' starting at {1} is not terminated correctly with '%}}}}'!".format(shortcode_name, _format_position(data, start_pos)))


def _extract_shortcodes(data):

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

Why use a private function name? Isn't this function supposed to be used by plugins? It shouldn't start with an underscore IMO.

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

No, it's only meant to be used right here.

3. (_SHORTCODE_END, text, start, name)
1. ("TEXT", text)
2. ("SHORTCODE_START", text, start, name, args)
3. ("SHORTCODE_END", text, start, name)

This comment has been minimized.

@felixfontein

felixfontein Apr 29, 2017
Contributor

Why do you do these replacements? Before, you'd get a runtime error in case you did a typo (because the symbol with the typo doesn't exist). Now you won't notice anything until something behaves strange, and then you have to some debugging to find out that you did a typo.

This comment has been minimized.

@ralsina

ralsina May 4, 2017
Author Member

Because reading the parsed code made my head hurt.

@ralsina ralsina changed the title WIP: use shortcodes in a more robust manner Use shortcodes in a more robust manner May 4, 2017
@ralsina ralsina changed the title Use shortcodes in a more robust manner WIP: Use shortcodes in a more robust manner May 4, 2017
@ralsina ralsina changed the title WIP: Use shortcodes in a more robust manner Use shortcodes in a more robust manner May 4, 2017
@ralsina
Copy link
Member Author

@ralsina ralsina commented May 5, 2017

This has a bug when shortcodes are "nested". Specifically this is used in the manual, with the "raw" shortcode wrapping another shortcode that should not be processed.

The extract_shortcodes algorithm is wrong. I know how to fix it, it's going to take a bit longer.

@ralsina
Copy link
Member Author

@ralsina ralsina commented May 7, 2017

@Kwpolska @felixfontein if you could give this code another look, this is sort-of-done I think.

Copy link
Contributor

@felixfontein felixfontein left a comment

LGTM. I assume you tested this more than the default tests do it :)

tail = splitted
while True:
new_text, tail = extract_data_chunk(tail)
text += new_text

This comment has been minimized.

@felixfontein

felixfontein May 7, 2017
Contributor

Wouldn't it be better to make text an array of strings, and return ''.join(text) at the end?

This comment has been minimized.

@ralsina

ralsina May 7, 2017
Author Member

Sure.

a dictionary of shortcodes, where the keys are UUIDs and the values
are the shortcodes themselves ready to process.
>>> c = 0

This comment has been minimized.

@Kwpolska

Kwpolska May 7, 2017
Member

Doctests are not executed anymore (#2703), it might be better to make those real tests.

This comment has been minimized.

@ralsina

ralsina May 7, 2017
Author Member

Oh, well. I'll do that then.

@@ -81,14 +81,17 @@ def compile_string(self, data, source_path=None, is_two_file=True, post=None, la
'language_code': LEGAL_VALUES['DOCUTILS_LOCALES'].get(LocaleBorg().current_lang, 'en')
}

from nikola import shortcodes as sc

This comment has been minimized.

@Kwpolska

Kwpolska May 7, 2017
Member

What about other compilers? Also, I think it might be a slightly better-looking API to put extract_shortcodes in the Nikola object as well.

This comment has been minimized.

@ralsina

ralsina May 7, 2017
Author Member

@Kwpolska Since the old API still works, later branches can port each one to the new API.

I did not put extract_shortcodes in the Nikola object mostly because there's nothing site-specific in it, and I am half convinced to use a real tokenizer in shortcodes.py which would make things awkward if things need refactoring.

@ralsina
Copy link
Member Author

@ralsina ralsina commented May 7, 2017

@felixfontein the demo site has a bunch of shortcodes usage, and this passes the invariance test, so at least that much works :-)

Copy link
Member

@Kwpolska Kwpolska left a comment

LGTM!

def test_extract_shortcodes(input, expected, monkeypatch):

i = iter('SC%d' % i for i in range(1, 100))
if sys.version[0] < "3":

This comment has been minimized.

@Kwpolska

Kwpolska May 8, 2017
Member

if sys.version_info[0] < 3:

This comment has been minimized.

@ralsina

ralsina May 8, 2017
Author Member

Actually, that's what it was and it failed :-P

This comment has been minimized.

@felixfontein

felixfontein May 8, 2017
Contributor

No, you did sys.version[0] < 3.

According to the documentation of sys.version: "Do not extract version information out of it, rather, use version_info [...]" (https://docs.python.org/2/library/sys.html#sys.version_info, https://docs.python.org/3/library/sys.html#sys.version)

This comment has been minimized.

@ralsina

ralsina May 9, 2017
Author Member

Ahhhh right.

])
def test_extract_shortcodes(input, expected, monkeypatch):

i = iter('SC%d' % i for i in range(1, 100))

This comment has been minimized.

@ralsina

ralsina May 9, 2017
Author Member

I should change this to use itertools.counter anyway

@ralsina ralsina merged commit 77ddb6f into master May 10, 2017
5 checks passed
5 checks passed
codacy/pr Good work! A positive pull request.
Details
continuous-integration/appveyor/branch AppVeyor build succeeded
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@ralsina ralsina deleted the shortcodes2 branch May 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants