Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for custom width providers #128

Closed
wants to merge 4 commits into from

Conversation

Moxinilian
Copy link

@Moxinilian Moxinilian commented Apr 1, 2018

This is part of the ongoing effort to support #126
This PR offers a way to set custom (integer) widths for characters. It does not add support for floating point numbers yet.
This allows the user to pass to most functions a closure taking a char and returning its width as a usize.
I tried to limit API changes for simple tasks by leaving the fill and wrap functions intact in signature, and added wrap_dynamic and fill_dynamic that take the closure.
I was not really inspired with those names, if you want me to change them to something else I would be glad to do it.
I also chose to re-export unicode_width::UnicodeWidthChar as it will probably come in handy for people writing custom closures, avoiding them to depend on unicode_width.

Example:

let wrap = Wrapper::new(15);
let res = wrap.wrap_dynamic("Concurrency without data races.", |c| {
    if c == 'h' {
        c.width().unwrap_or(0) + 4
    } else {
        c.width().unwrap_or(0)
    }
});
assert_eq!(res, vec!["Concurrency", "without", "data races."]);

This example illustrates an hypothetical case where an h would actually take 5 columns instead of 1.
I included it as a test, with its equivalent for Wrapper::fill_dynamic.

Benchmark:

before
test fill_100 ... bench:         539 ns/iter (+/- 115)
test fill_200 ... bench:       1,188 ns/iter (+/- 141)
test fill_400 ... bench:       2,416 ns/iter (+/- 329)
test fill_800 ... bench:       5,349 ns/iter (+/- 974)
test wrap_100 ... bench:         639 ns/iter (+/- 111)
test wrap_200 ... bench:       1,397 ns/iter (+/- 206)
test wrap_400 ... bench:       2,729 ns/iter (+/- 405)
test wrap_800 ... bench:       5,575 ns/iter (+/- 664)

after
test fill_100 ... bench:         554 ns/iter (+/- 139)
test fill_200 ... bench:       1,199 ns/iter (+/- 145)
test fill_400 ... bench:       2,409 ns/iter (+/- 1,570)
test fill_800 ... bench:       5,142 ns/iter (+/- 658)
test wrap_100 ... bench:         632 ns/iter (+/- 113)
test wrap_200 ... bench:       1,430 ns/iter (+/- 221)
test wrap_400 ... bench:       2,732 ns/iter (+/- 323)
test wrap_800 ... bench:       5,417 ns/iter (+/- 456)

Zero-cost abstraction really is awesome.

@Moxinilian
Copy link
Author

Moxinilian commented Apr 1, 2018

By the way this crate definitely deserves a refactor, it's getting quite big and it's a bit hard to work on it when it's only one single big file full of documentation.

@mgeisler
Copy link
Owner

mgeisler commented Apr 6, 2018

Hey @Moxinilian, thanks for the PR. It's not been forgotten, I've just been busy lately...

I'm traveling this weekend, so it won't be before next week that I get proper time to look at this.

Just some short remarks: I love that you could do this via a generic function parameter --- zero-cost abstractions are awesome, as you say :-)

About measuring the characters: would it perhaps be better to measure words at a time?

With combining characters, I guess you don't really know the true displayed width of a word until you typeset the entire thing. Take emoticons as an extreme example since I'm typing this on my phone: I recently learned that a symbol like 👩‍👩‍👧‍👦 is created at rendering time --- it's really multiple Unicode characters with zero-width-joiners between them, something like 👧👧👨👨 next to each other. See https://emojipedia.org/emoji-zwj-sequences/.

So if you measure the width of this char by char, then you get the wrong result. I think other combining characters will have the same problem.

Finally, you're definitely right that the crate should be refactored by now... It was my first larger piece of Rust code, so I'm happy to learn how to structure it better!

@Moxinilian
Copy link
Author

Moxinilian commented Apr 6, 2018

You're right to bring this up because I thought about it and forgot to mention it.
I initially thought about arabic characters that can merge and change shape. So I wanted to have the closure take an &str instead. However, chars do not have the same memory representation as str in Rust. Therefore, casting one to the other requires to reallocate memory for every character, which can be expensive (at least more expensive than what is achieved now, so I did not dare do it).

But as I'm writing this I'm realizing that we could simply stop iterating over characters and make it split words. Even better, this approach would feel very natural in a functional programming style, which is in my opinion better in general, and unless you think otherwise, would really be appropriate. But then, what is a word? Where does it begin and where does it end? Is ending punctuation such as colons or interrogation marks part of a word or separate? And how would we handle hyphenation? It raises a lot of questions that I would struggle to answer.

Maybe we could try to disallow hyphenation on custom lengths and use spaces as the only word delimiters. But then that would make more conditions and increase the code complexity, while also probably making it quite inelegant.
Tell me what you feel is the best approach so I can try to implement it.

Feel free to also check out my concerns about support of floating points in issue #126.

@Moxinilian
Copy link
Author

I'm going to revert the changes and try to implement that instead, as there would be no point in merging this now that we're going down the word route.

@Moxinilian Moxinilian closed this Apr 7, 2018
@mgeisler
Copy link
Owner

mgeisler commented Apr 7, 2018

My comments about combining characters might have been distracting and unhelpful since it's the current code that suffers from those problems -- it already works char by char and does not account for combining characters at all.

Looking at the unicode-width crate, I see that it has code like:

impl UnicodeWidthStr for str {
    fn width(&self) -> usize {
        self.chars().map(|c| cw::width(c, false).unwrap_or(0)).fold(0, Add::add)
    }
}

This means that the char by char approach is perfectly fine when we use that crate to measure the column width of the monospace characters.

However, I like the idea of making this more generally applicable, so I would welcome a refactor in the direction of scanning for split points (breakable whitespace) and then measuring the width of these words on a per-word basis. The end result should be identical to what we do today, but it should be more adaptable.

@Moxinilian
Copy link
Author

Moxinilian commented Apr 7, 2018

Alright then. I now remember why I accepted it anyway: the reason I'm working on it is Piston, and piston-graphics seems to also be fine with computing width character by character.
In fact, I was trying to find good ways to rewrite the algorithm for str, and it really was clunky. So I believe it is acceptable for now.
However, my project will need to support emojis in a way. So I guess sooner or later I will have to go back to it...

@Moxinilian Moxinilian reopened this Apr 7, 2018
@Moxinilian
Copy link
Author

I'm going to read this document to verify easy solutions have not already been found.
http://unicode.org/reports/tr14/

@mgeisler
Copy link
Owner

mgeisler commented Apr 8, 2018

I'm thinking I'll make a 0.10.0 release now (after #129 got reported) and then we can merge this in and play a bit with the API and do refactorings and whatnot for a later 0.11.0 release. Hope that seems okay!

@mgeisler mgeisler mentioned this pull request May 22, 2018
@robinkrahl
Copy link
Contributor

I’m also interested in this feature to be able to use textwrap with PDF documents (in genpdf). What is the current status of this PR? Is there anything I can do to help?

@mgeisler
Copy link
Owner

Hi @robinkrahl, I think this PR has stalled, but I would still like to see this kind of feature implemented. @Moxinilian, if you're still interested in this, then please let us know!

I recently rewrote some of the inner parts of textwrap (#213), so I think this PR should be closed and then a new one could be opened to fit with the new API. Ideally, this change should be invisible at the surface-level of the API, so that one can continue to write

let lines = wrap("some text", 80);

Perhaps the WrapOptions trait could gain a width(&str) -> T method, where T can be usize, float, or even u16 as needed (I mention u16 since the terminal_size crate measures the width in u16, which makes me think that it's completely overkill to use usize all over).

Let me close this PR for now and then we can perhaps discuss more in #126.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants