smart punctuation option? #119
Let me offer some more detail -- seems like this issue doesn't have much detail to act on. One of the features I thought would be interesting from using Jekyll is "smart" typography, which comes from the SmartyPants spec implemented as an extension in the RedCarpet markdown parser, as noted in Jekyll docs. To give some examples, that would mean turning text like this:
...into this:
By no means do I think this should be a default, but it would be interesting to add -- I love {en,em}-dashes in particular. :)
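To make the idea concrete, here is a minimal sketch of the kind of substitution "smart" typography performs. It is not RedCarpet's or pulldown's implementation: the function name `smarten` and the rule for guessing opening versus closing quotes are assumptions made for illustration, and runs of four or more hyphens are not handled the way the cmark extension handles them.

```rust
/// Simplified SmartyPants-style pass (illustration only, not pulldown-cmark's code).
fn smarten(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    // True at the start of the text and after whitespace or an opening bracket.
    // Used as a crude guess for whether a quote opens or closes.
    let mut opening = true;

    while let Some(c) = rest.chars().next() {
        // Check the longest pattern first so "---" is not read as "--" plus "-".
        let (replacement, consumed) = if rest.starts_with("---") {
            ('\u{2014}', 3) // em dash
        } else if rest.starts_with("--") {
            ('\u{2013}', 2) // en dash
        } else if rest.starts_with("...") {
            ('\u{2026}', 3) // ellipsis
        } else if c == '"' {
            (if opening { '\u{201C}' } else { '\u{201D}' }, 1)
        } else if c == '\'' {
            (if opening { '\u{2018}' } else { '\u{2019}' }, 1)
        } else {
            (c, c.len_utf8())
        };
        out.push(replacement);
        opening = c.is_whitespace() || c == '(' || c == '[';
        // All multi-character patterns are ASCII, so `consumed` is a valid byte offset.
        rest = &rest[consumed..];
    }
    out
}
```

Calling `smarten` on a string with straight quotes, double or triple hyphens, and three dots returns the same text with curly quotes, en/em dashes, and an ellipsis character. A real implementation also has to skip code spans and code blocks, which is why this belongs in the parser rather than in a post-processing step.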
@marcusklaas it looks like smart punctuation exists as an extension in the C reference parser, along with tests: https://github.com/commonmark/cmark/blob/master/test/smart_punct.txt
That's very good to know! This is a very good candidate for addition to pulldown as an extension. The tricky part is the implementation. Smart punctuation would add a number of special tokens, quotes and dashes, that we'd need to scan for. That could have a non-trivial performance impact, since the main character scan routine is the costliest part of the parse. This cost would even be apparent when the option is disabled, since many optimizations rely on this character set being static. Do you have ideas on how we could solve this efficiently, @raphlinus? I recall you mentioned this problem when we added table support and its pipe character to the "new" pulldown.
Ok, I think I've solved the perf issue. First, I added However, I expected to see no change if
diff --git a/parse_line_baseline b/parse_line_master
index 0b6d3c9..e4cc529 100644
--- a/parse_line_baseline
+++ b/parse_line_master
@@ -26,16 +26,18 @@
  mov byte ptr [rbp - 558], 1
  mov word ptr [rbp - 557], 0
  mov byte ptr [rbp - 555], 1
+ movabs rax, 1103806595072
+ mov qword ptr [rbp - 539], rax
  mov qword ptr [rbp - 554], 0
  mov qword ptr [rbp - 546], 0
- mov dword ptr [rbp - 539], 0
- mov byte ptr [rbp - 535], 1
- mov dword ptr [rbp - 534], 0
- mov byte ptr [rbp - 530], 1
- mov dword ptr [rbp - 529], 16777216
- mov qword ptr [rbp - 525], 0
- mov qword ptr [rbp - 517], 0
- mov word ptr [rbp - 509], 256
+ mov byte ptr [rbp - 531], 0
+ mov dword ptr [rbp - 530], 257
+ mov byte ptr [rbp - 526], 1
+ mov word ptr [rbp - 525], 0
+ mov byte ptr [rbp - 523], 1
+ mov qword ptr [rbp - 516], 0
+ mov qword ptr [rbp - 522], 0
+ mov byte ptr [rbp - 508], 1
  mov qword ptr [rbp - 485], 0
  mov qword ptr [rbp - 491], 0
  mov qword ptr [rbp - 499], 0
So I changed special_bytes to a static and for the case with no smart characters in crdt.md,
Yep. It's almost 2% faster even with the extra LUT entries, and 3.3% faster without them. SIMD wasn't impacted by the static change, probably because the
If you're worried about the perf of non-extended CommonMark mode, you can add one (100% predictable) branch at the start of parse_line to choose a different LUT when options are enabled. This won't destroy inlining; it will just change one pointer to a different static location. You could even use one LUT per permutation of options, though one that handles all of them would probably suffice. I will PR the static change in a moment. But otherwise, implement away!
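For readers following along, the static-LUT idea could look roughly like the sketch below. This is not the code from the PR: the names `build_lut`, `BASE_LUT`, `SMART_LUT`, and `next_special`, the `smart` flag, and the byte sets themselves are all hypothetical and chosen only to show the shape of the technique.

```rust
/// Build a 256-entry table marking which bytes need special handling.
/// Being a `const fn`, it runs at compile time when used in a `static`.
const fn build_lut(specials: &[u8]) -> [bool; 256] {
    let mut lut = [false; 256];
    let mut i = 0;
    while i < specials.len() {
        lut[specials[i] as usize] = true;
        i += 1;
    }
    lut
}

// Hypothetical byte sets; the real parser's sets differ.
// The tables live in static memory instead of being rebuilt on the stack per call.
static BASE_LUT: [bool; 256] = build_lut(b"\n\r*_&\\[]<!`");
static SMART_LUT: [bool; 256] = build_lut(b"\n\r*_&\\[]<!`'\"-.");

/// Find the next special byte at or after `start`.
/// One predictable branch picks the table; the scan loop is the same either way.
fn next_special(bytes: &[u8], start: usize, smart: bool) -> Option<usize> {
    let lut: &[bool; 256] = if smart { &SMART_LUT } else { &BASE_LUT };
    bytes
        .get(start..)? // out-of-range start yields None instead of panicking
        .iter()
        .position(|&b| lut[b as usize])
        .map(|offset| start + offset)
}
```

With the option off, the scan never stops at quotes, hyphens, or dots, so plain CommonMark input pays only for the single branch that selects the table.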
It was being recreated on the stack each time parse_line was called. This will make it faster when more special chars are added, and to swap out LUTs for different parse options without affecting the performance of the first pass when those options are not enabled. See pulldown-cmark#119 (comment)
Hello, is there any news on this issue?
Will this be implemented? It would be a really useful feature for me.
If you're like me and want a quick visual fix in lieu of a proper solution to this problem at the Markdown parser stage, you can try Smartquotes.js. You can see it in action at my website. I imagine that this is a decently common problem for Zola websites. While it means that your website now depends on JavaScript, it's not a huge deal since it still works just fine without JS and the script is pretty tiny. Just pop the JavaScript file in
<script src="smartquotes.js"></script>
<script>smartquotes()</script>
Edit: I now have my own fork that also supports turning
I've picked up this issue. In the spirit of @cormacrelf's work, I'm using the dynamic lookup table idea. I'm hoping to reuse much of the work he has already done, especially on smart quotes. Hopefully smart punctuation can make it into the 0.8 release of pulldown.
Smart punctuation landed on master!
Is there a way to parse with smart punctuation?