Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smartquotes #5

Merged
merged 23 commits into from
Dec 15, 2022
Merged

Smartquotes #5

merged 23 commits into from
Dec 15, 2022

Conversation

black-puppydog
Copy link
Contributor

This is the follow-up to #4 that I mentioned before, so it includes all of those commits, plus "a few" more.

I should first say thanks again for reference implementation in JS, wouldn't have been able to do it without. 🙂

Of course this still needs linting and squashing, and I haven't run it through benchmarking at all. But at least I managed to avoid making string copies all over the place, so that's nice.
I had to add some lifetime annotations to the walk function to do that. Is that okay, or do you see a more elegant way to do this?

If you do want to crawl through the unsquashed history here you'll find that the IT WORKS commit reads a lot like the JS version. However, I found that to be too long, so I started breaking things out, and the result is actually fairly readable, if I say so myself. 😛

black-puppydog and others added 9 commits November 20, 2022 01:56
This will only apply to test fixtures when they get re-generated.
The test fixture is copied from markdown-it, with the rust code being
just a copy of the tables file with modified path and plugin set.

The core rule is to be filled; for now it is a no-op.
@rlidwka pointed out that this is how the JS implementation handles this.
Confirmed by adding a unit test to the JS lib.
The `plus_minus` test works!
The rest should be a walk in the park now. 😆
`cargo test -- --test fixtures_markdown_it_typographer_txt::ellipsis`

# Conflicts:
#	src/plugins/extra/typographer.rs
# Conflicts:
#	src/plugins/extra/typographer.rs
# Conflicts:
#	src/plugins/extra/typographer.rs
Copy link
Collaborator

@rlidwka rlidwka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't check smartquotes.rs implementation yet, typographer implementation looks good (maybe merge it separately)

tests/fixtures/testgen.js Outdated Show resolved Hide resolved
src/parser/node.rs Outdated Show resolved Hide resolved
src/plugins/extra/mod.rs Show resolved Hide resolved
src/plugins/extra/typographer.rs Outdated Show resolved Hide resolved
src/plugins/extra/typographer.rs Outdated Show resolved Hide resolved
src/plugins/extra/typographer.rs Outdated Show resolved Hide resolved
src/plugins/extra/smartquotes.rs Outdated Show resolved Hide resolved
src/plugins/extra/smartquotes.rs Show resolved Hide resolved
src/plugins/extra/smartquotes.rs Show resolved Hide resolved
src/plugins/extra/smartquotes.rs Outdated Show resolved Hide resolved
src/plugins/extra/smartquotes.rs Outdated Show resolved Hide resolved
>
{
fn run(root: &mut Node, _: &MarkdownIt) {
let text_tokens = all_text_tokens(root);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS has pretest that returns early if input has no quotes at all (to improve performance by skipping the heavy logic in most cases)

maybe all_text_tokens can also check if quotes (' or ") are found, and return None/empty vec as a shortcut

needs benchmarks though

//!
//! The solution proposed here is to first compute all the replacement
//! operations on a read-only flat view of the document, and _then_ to perform
//! all replacements in a single call to `root.walk_mut`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That solution is impressive. Also quite sad that it can't be done simpler.

Maybe the solution is to get rid of AST tree, as JS version doesn't have any. I tried to add AST as an improvement, but rust just isn't very good with tree manipulation (out of the box and without unsafe that is).

@black-puppydog
Copy link
Contributor Author

didn't check smartquotes.rs implementation yet, typographer implementation looks good (maybe merge it separately)

Indeed, I left the typographer PR #4 open because I figured it's kinda ready, while this one here definitely needs "some" cleaning. But this PR really depends on the typographer as well, so I based the smartquotes branch off off the typographer one.

Thank you so much for the feedback! I'll go over it when I'm a little less tired. Some of it might also go into #4 then, we'll see. :)

This one turned out to be super difficult because we don't want to be
making these replacements inside linkified URLs.
Luckily, @rlidwka had the preceding commit ready to prevent that from
happening.

This fixes the last tests from the markdown-it fixture.
I had somehow missed that pattern when reading the docs on `once_cell` but
now that I know about it, this is _the_ obvious way to do it.

# Conflicts:
#	src/plugins/extra/typographer.rs
Thanks to @rlidwka who provided this piece in an act of kind teaching. :)
@black-puppydog
Copy link
Contributor Author

Integrated the first batch of feedback, some still to do.

This still includes the commits from #4 but everything starting from Generate smartquote test cases (8947b04) belongs to this PR.

This very closely follows the JS implementation
This way the main loop actually becomes quite small :)

# Conflicts:
#	src/plugins/extra/smartquotes.rs

# Conflicts:
#	src/plugins/extra/smartquotes.rs
Give that we don't use nested loops any longer, it is now easier to use
a `for` loop.

# Conflicts:
#	src/plugins/extra/smartquotes.rs

# Conflicts:
#	src/plugins/extra/smartquotes.rs
These were just screaming to be rewritten a little.
They're still not terribly intuitive, but at least concise.

# Conflicts:
#	src/plugins/extra/smartquotes.rs
The regex solution is concise, but has its drawbacks:

1. We need to call `to_string` to use a regex.
2. Using the whole regex machinery for single characters seems like overkill.
3. There was already a perfectly usable function in the library. :)
Previously the quotes were implemented to be `const` generics, but there
was no interface exposed to actually use this conveniently.
These are typically used in conjunction, so now is probably the right
time to enable them by default.
@black-puppydog
Copy link
Contributor Author

Okay, I went over most of the feedback now.

I also went ahead and added the two modules (typographer and smartquotes) to the extras module by default. Together with an example. I used a second call just to keep the lines in the example short.

@black-puppydog black-puppydog marked this pull request as ready for review December 3, 2022 16:31
@black-puppydog
Copy link
Contributor Author

I have to say I don't actually understand what the deps CI checks for. I did run the same commands locally at some point and it failed in the same way. I can try to fix this, but not sure what I'm really looking for.

@rlidwka
Copy link
Collaborator

rlidwka commented Dec 15, 2022

I have to say I don't actually understand what the deps CI checks for.

It checks with lowest possible versions of dependencies to make sure lower semver bounds are correct.

rust-lang/cargo#5657

In short: forget about it, probably regex crate version needs to get bumped.

Is it good to merge as is, or did you want to add anything else? I'll merge if it is, and fix tests on my own.

@black-puppydog
Copy link
Contributor Author

For me it's good as is.
Looking forward to seeing what the fix ends up being :)

@rlidwka rlidwka merged commit 33320b1 into markdown-it-rust:master Dec 15, 2022
@rlidwka rlidwka temporarily deployed to github-pages December 15, 2022 09:23 — with GitHub Actions Inactive
@black-puppydog black-puppydog deleted the smartquotes branch December 15, 2022 09:25
@rlidwka
Copy link
Collaborator

rlidwka commented Dec 15, 2022

Looking forward to seeing what the fix ends up being :)

Old code compiled fine with regex 0.2.0, new code doesn't, that's all. - 741dcd4

Also CLI part was forgotten in PR above, so added it. - eb54590

Seems to be working, gonna release it as 0.5.0 sometime soon. Thanks!

@black-puppydog
Copy link
Contributor Author

Ah damn, I have never needed the CLI so yeah I totally forgot about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants