Refactor entities encoder #5
@@ -26,20 +26,14 @@ module Markd
      lit("\n") if @last_output != "\n"
    end

    def escape(text, preserve_entities = false)
I am not sure whether `preserve_entities` served any purpose. It was not covered by the specs, and the result from `gsub` would be the same anyway, because only the four special characters are actually replaced.
You are right.
168f751 to 4cf3726
4cf3726 to 852a393
This is looking ✨! I have left a comment about string concatenation. According to this benchmark, `+` is faster than `#{}`. Maybe revert it all back?
https://github.com/icyleaf/fast-crystal#concatenation-code
    high = chars.codepoint_at(0)
    low = chars.codepoint_at(0)
    codepoint = (high - 0xD800) * -0x400 + low - 0xDC00 + 0x10000

-   "&#x" + codepoint.to_s(16).upcase + ";"
+   "&#x#{codepoint.to_s(16).upcase};"
According to this benchmark, `+` is faster than `#{}` when concatenating strings. Maybe revert it all?
https://github.com/icyleaf/fast-crystal#concatenation-code
About the string interpolation: I don't think your benchmark is accurate, because it uses a constant and LLVM is probably applying some performance tweaks. I don't think it is worth sacrificing good coding style for such a small improvement. If you don't mind, I'd rather use interpolation, but I'm happy to change it if you'd prefer it that way.

Benchmark example:

    require "benchmark"

    def foo
      rand.to_s
    end

    Benchmark.ips do |bm|
      bm.report("+ single") { 10.times { "a" + foo } }
      bm.report("# single") { 10.times { "a#{foo}" } }
    end
This PR made everything 100 times slower (I thought it was #7, but it's this PR).
I would strongly suggest having a benchmark somewhere and, before merging a PR, checking whether performance gets better or worse.
It seems
Also this:

    return @@regex if @@regex.source != "^"
    @@regex = Regex.union(HTMLEntities::ENTITIES_MAPPINGS.values)

can easily be replaced with a constant...
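A minimal sketch of the constant-based approach suggested above, not the change that was actually made in the repository. The name `ENTITIES_REGEX` and the three-entry mapping table are hypothetical stand-ins; only `Regex.union` and `HTMLEntities::ENTITIES_MAPPINGS` come from the quoted code.

```crystal
# Hypothetical, heavily shortened stand-in for the real mapping table.
module HTMLEntities
  ENTITIES_MAPPINGS = {"amp" => "&", "lt" => "<", "gt" => ">"}
end

module Markd
  # Hypothetical constant name. The regex is built once from the mapping
  # values, so no "has it been built yet?" check (the `@@regex.source != "^"`
  # guard) is needed on each call.
  ENTITIES_REGEX = Regex.union(HTMLEntities::ENTITIES_MAPPINGS.values)
end

# Usage: the constant is used like any other Regex.
puts "a < b".gsub(Markd::ENTITIES_REGEX, "?")
```

A constant also makes it clear to readers that the pattern never changes at runtime, which the class-variable version does not convey.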
@asterite Thanks for the investigation. I was pretty sure I tested the performance impact before pushing, but I'm not certain that I did it on this PR... 🤔
I profiled it with Xcode's Instruments and it pointed right to that method. I didn't investigate it much more, though.
I'm using Crystal's README.md to benchmark this. Before this PR there was just a
Ah yes,
This PR renames the custom `HTML.escape` and `HTML.unescape` methods because their purpose is to encode and decode HTML entities, not to escape HTML special characters.

I also refactored and simplified some related code and removed unused constants in `Rule`.

When crystal-lang/crystal#4555 gets merged, the custom implementation of `Renderer#escape` should be replaced with the one from stdlib.