-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Typographer (Replacing ©, ®, ™, ... etc) #4
Conversation
9922931
to
7fa1847
Compare
Okay after adding another test (fixture) and cross-checking that one with the JS implementation as well, I'm now reasonably confident that this works as intended. :) |
src/plugins/extra/typographer.rs
Outdated
// Note: the regex docs state that multi-line regexes (with "(?m)" that | ||
// is) are not unicode-aware, meaning they *only* match '\n' as the | ||
// newline and not any other newline characters which... apparently | ||
// exist... Oh well... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in markdown-it here \n
is the only possible newline, other newlines will get normalized for simplicity
so I don't think any problem with newlines exists
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked this by inserting this test into markdown-it-typographer.rs
:
#[test]
fn can_handle_unicode_line_tabulation() {
run("--\u{000B}---", "<p>– —</p>");
}
This uses the line tabulation character which println!
does interpret as a newline (but preserving the horizontal indentation as expected).
The typography works fine, it produces "<p>–\u{b}—</p>\n"
and the same happens in the JS lib (I checked). Other newlines like \u{0085}
aka next line get completely ignored by JS, while the rust version would produce \u{85}
in the output.
So for the typography I will actually remove the comment since it works fine (matching these more exotic newlines works just as fine as matching $
or ^
) but I'm not sure what you mean by "other newlines will get normalized". Is that something that is planned, or a missing feature, or should it already work and it' somehow broken in these tests?
src/plugins/extra/typographer.rs
Outdated
]); | ||
|
||
static ref RARE_RE: Regex = Regex::new(r"\+-|\.\.|\?\?\?\?|!!!!|,,|--").unwrap(); | ||
static ref SCOPED_RE: Regex = Regex::new(r"\((C|c|TM|tm|R|r)\)").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case-insensitive flag in Rust looks something like this:
Regex::new(r"(?i)\((c|tm|r)\)").unwrap()
In your version, (Tm)
will not get replaced, which differs from JS.
But I guess that was intentional to simplify replace_abbreviation
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was actually not intentional; I can add a test for that, and thanks for the pointer towards (?i)
! :)
In theory, custom walk function will do the job (i.e. not using
Actually, as far as I remember, text nodes that shouldn't be replaced should have
People usually use replacements together with smartquotes. Since smartquotes aren't implemented, I don't think this feature is complete, thus don't enable it by default yet. |
I believe this is the best way to fix it: 547505e |
Codecov ReportBase: 95.13% // Head: 95.32% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #4 +/- ##
==========================================
+ Coverage 95.13% 95.32% +0.19%
==========================================
Files 58 54 -4
Lines 2302 2226 -76
==========================================
- Hits 2190 2122 -68
+ Misses 112 104 -8
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Thanks very much for the feedback, I'll look into it once I find a few quiet minutes. :)
Indeed, that's a good point. I'm already looking into porting the smartquotes next, although the algorithm used in JS is annoying to implement in rust due to borrow checking. But I'm sure I'll get there :D |
This will only apply to test fixtures when they get re-generated.
The test fixture is copied from markdown-it, with the rust code being just a copy of the tables file with modified path and plugin set. The core rule is to be filled; for now it is a no-op.
@rlidwka pointed out that this is how the JS implementation handles this. Confirmed by adding a unit test to the JS lib.
The `plus_minus` test works! The rest should be a walk in the park now. 😆
`cargo test -- --test fixtures_markdown_it_typographer_txt::ellipsis` # Conflicts: # src/plugins/extra/typographer.rs
# Conflicts: # src/plugins/extra/typographer.rs
# Conflicts: # src/plugins/extra/typographer.rs
7fa1847
to
1e6911c
Compare
This became quite a comprehensive rebase so that The linebreak question is a separate issue, if any. |
Just found time to add one last thing here; the way I had used |
This one turned out to be super difficult because we don't want to be making these replacements inside linkified URLs. Luckily, @rlidwka had the preceding commit ready to prevent that from happening. This fixes the last tests from the markdown-it fixture.
I had somehow missed that pattern when reading the docs on `once_cell` but now that I know about it, this is _the_ obvious way to do it. # Conflicts: # src/plugins/extra/typographer.rs
Thanks to @rlidwka who provided this piece in an act of kind teaching. :)
36c691c
to
fbc43e9
Compare
Following the comments in #5 I went ahead and made the changes you indicated. Thanks again for the feedback, this is a great learning experience for me! I appreciate that it takes a lot of time to do these reviews. ✅ |
This is a straight-up re-implementation of
lib/rules_core/replacements.js
from the JS library.There are still a few corner cases I'm not sure about, but this is progressed enough that I'd like feedback if anyone is willing to give. 🙂
I'm especially unsure about the
linkify.rs
modification. I needed this because the JS implementation checks for auto-linkified links, and doesn't touch the text in those. I tried multiple ways to do this.The "easiest" would have been to be able to walk up the tree of nodes, to determine whether one of my parents is
Linkified
. We don't have that though, right?The next attempt was to modify
walk_mut
so that I could pass a second closure that would be called when exiting the node. That way I could keep track of the linkification status, similar to how the JS lib does it. However, I never managed to get this past the borrow checker, because I was using a singlelinkification_level: i32
which I needed tomut
-pass to both the entry and exit closure. It might well be that there would be a trivial fix for this (I'm still quite new to Rust and my stdlib-foo is not great) but in any case that would mean a bunch of boiler-plate to insert into all fourwalk[_post][_mut]
methods and everywhere they're used.So my solution is to attach an empty
LinkifyMarker
struct to the nodes when they get auto-linkified. I then just check if the text node I'm dealing with has that marker, and ignore it if it's the case.Side note: the Error I'm encountering in #3 is trivial to fix with this branch: just add the
typographer
plugin to the parser during test, and it seems work!To do