Make word separators and splitters more flexible #402

robinkrahl · 2021-07-01T10:12:44Z

This PR makes the word separators and splitters more flexible so that users can use their own word types. It also adds an example that shows how to use textwrap with custom word types, namely with styled strings.

As this is only a draft, I did not add much documentation.

The basic changes are:

For word separation, introduce a new method that just returns the range of the detected words. Users can then apply their own logic for creating words from these ranges.
For word splitting, introduce a new Fragments iterator struct that yields the fragments for a word. It turned out to be easier to introduce a Splittable trait than to use a closure to perform the splitting, but this could also be changed. I used a new struct for the iterator because that makes it much easier to keep track of the lifetimes.

This patch adds the find_word_ranges method to the WordSeparator struct that makes it possible to find words within a string without using textwrap’s Word type. This is especially useful when using custom types for strings.
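To illustrate the idea, here is a minimal sketch of range-based word separation. It is not the code from this PR: the free function `find_word_ranges` below is a simplified stand-in that only splits on ASCII spaces, and `StyledWord` and `styled_words` are hypothetical names for a user-defined word type and its constructor.

```rust
use std::ops::Range;

/// Simplified stand-in for the new method (assumption: split on ASCII spaces
/// only). Each range covers a word plus the spaces that follow it.
fn find_word_ranges(line: &str) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut start = 0;
    let mut in_whitespace = false;
    for (idx, ch) in line.char_indices() {
        // A new word starts at the first non-space after a run of spaces.
        if in_whitespace && ch != ' ' {
            ranges.push(start..idx);
            start = idx;
        }
        in_whitespace = ch == ' ';
    }
    if start < line.len() {
        ranges.push(start..line.len());
    }
    ranges
}

/// Hypothetical user-defined word type carrying extra data (a style flag).
#[derive(Debug, PartialEq)]
struct StyledWord<'a> {
    text: &'a str,
    bold: bool,
}

/// Builds custom words from the ranges without going through textwrap's Word.
fn styled_words(line: &str, bold: bool) -> Vec<StyledWord<'_>> {
    find_word_ranges(line)
        .into_iter()
        .map(|range| StyledWord { text: &line[range], bold })
        .collect()
}
```

Because only ranges are returned, the caller decides how to slice the source and which extra data to attach to each word.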

This patch adds the word_splitters::Fragments struct that yields the fragments for a word. This makes it easier to reason over the lifetimes of the generated iterator and allows us to make it generic over the word type.

This patch introduces the new word_splitters::Splittable trait and makes word_splitters::Fragments generic over that trait. This allows library users to use their own fragment types and not only core::Word.
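As a rough sketch of how the trait and the iterator could fit together: the `as_ref()` and `split(range, keep_ending)` calls mirror the snippets shown in the review comments, but the `Output` associated type, the `new` constructor and the `Tagged` word type are assumptions made for illustration, not the definitions from this PR.

```rust
use std::ops::Range;

/// Assumed shape of the trait: a string-like value that can hand out a
/// sub-fragment for a byte range.
trait Splittable: AsRef<str> {
    type Output;
    /// `keep_ending` is true only for the last fragment of the word.
    fn split(&self, range: Range<usize>, keep_ending: bool) -> Self::Output;
}

/// Iterator over the fragments of one word, cut at the given byte offsets.
struct Fragments<W: Splittable> {
    word: W,
    split_points: std::vec::IntoIter<usize>,
    prev: usize,
}

impl<W: Splittable> Fragments<W> {
    fn new(word: W, split_points: Vec<usize>) -> Self {
        Fragments { word, split_points: split_points.into_iter(), prev: 0 }
    }
}

impl<W: Splittable> Iterator for Fragments<W> {
    type Item = W::Output;

    fn next(&mut self) -> Option<Self::Item> {
        // First yield one fragment per split point.
        if let Some(idx) = self.split_points.next() {
            let fragment = self.word.split(self.prev..idx, false);
            self.prev = idx;
            return Some(fragment);
        }
        // Then yield the remaining tail exactly once.
        let len = self.word.as_ref().len();
        if self.prev < len || self.prev == 0 {
            let fragment = self.word.split(self.prev..len, true);
            self.prev = len + 1; // marks the iterator as exhausted
            return Some(fragment);
        }
        None
    }
}

/// Hypothetical custom word type: a string slice plus a tag.
struct Tagged<'a> {
    text: &'a str,
    tag: u8,
}

impl AsRef<str> for Tagged<'_> {
    fn as_ref(&self) -> &str {
        self.text
    }
}

impl<'a> Splittable for Tagged<'a> {
    type Output = (&'a str, u8);

    fn split(&self, range: Range<usize>, _keep_ending: bool) -> Self::Output {
        (&self.text[range], self.tag)
    }
}
```

With a dedicated struct, the lifetime of the yielded fragments is tied to the word type rather than to a borrowed closure, which is what makes it straightforward to be generic over the word.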

robinkrahl · 2021-07-01T16:27:47Z

src/word_splitters.rs

+#[derive(Debug)]
+pub struct Fragments<W: Splittable, I: Iterator<Item = usize>> {
+    word: W,
+    split_points: I,


This type parameter is unnecessary, we can directly use std::vec::IntoIter<usize>.

mgeisler · 2021-07-03T20:30:10Z

Hey @robinkrahl, I just wanted to say that I think this looks really nice!

One thought: would it make sense to combine Splittable with Fragment and thus say that any type which is a Fragment should have a way of splitting itself? The trait could have a default implementation which doesn't do any splitting.
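A small sketch of what such a combined trait might look like. The trait below is hypothetical and much smaller than textwrap's real Fragment trait; `Image` and `Text` are made-up implementors that show the default and an override.

```rust
/// Hypothetical combined trait: measuring plus an optional way to split.
trait Fragment: Sized {
    fn width(&self) -> usize;

    /// Default implementation: no splitting, the fragment is the only piece.
    fn split(self, _split_points: &[usize]) -> Vec<Self> {
        vec![self]
    }
}

/// A fragment that cannot be split and relies on the default.
#[derive(Debug, PartialEq)]
struct Image {
    width: usize,
}

impl Fragment for Image {
    fn width(&self) -> usize {
        self.width
    }
}

/// A fragment that overrides the default and splits at byte offsets.
#[derive(Debug, PartialEq)]
struct Text(String);

impl Fragment for Text {
    fn width(&self) -> usize {
        self.0.chars().count()
    }

    fn split(self, split_points: &[usize]) -> Vec<Self> {
        let mut pieces = Vec::new();
        let mut prev = 0;
        for &idx in split_points {
            pieces.push(Text(self.0[prev..idx].to_string()));
            prev = idx;
        }
        pieces.push(Text(self.0[prev..].to_string()));
        pieces
    }
}
```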

So far, my thinking has been that fragments contain just the necessary information for us to be able to wrap them nicely with the two wrapping algorithms. That is, a fragment has a width and whitespace/penalty following it. However, since the proposed wrapping "pipeline" also involves finding and splitting the fragments (Words), perhaps it would be much nicer to combine the whole thing into one.

Just a quick idea... I'll give it a proper look tomorrow.

robinkrahl · 2021-07-04T06:23:38Z

Thanks for the update!

> One thought: would it make sense to combine `Splittable` with `Fragment` and thus say that any type which is a `Fragment` should have a way of splitting itself? The trait could have a default implementation which doesn't do any splitting.

Good point. I had a similar thought. The reason why I didn’t want to combine these two is that the width computation is not required for the word splitting, it might be expensive and might require more information than just the string and the style. Therefore, users might want to have separate types for a splittable word and a fragment with width calculations that can also be cached. (Even the current implementation for core::Word currently performs unnecessary width calculations for words that are split up later, though the cost is probably negligible.) In my use case, I have to look up the glyphs in the font data to calculate their widths. Therefore, my pipeline currently is: separate and split words using string slices (→ StyledWord), calculate their widths and copy the strings (→ StyledFragment), cache the result and then run the wrap algorithm. Of course this could also be implemented with a single struct, but it would have to keep some state. These are no big issues, so if you prefer a single trait, I could live with that too.
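A minimal sketch of that two-stage pipeline, to make the separation concrete. Everything here is an assumption for illustration: the field layout of `StyledWord` and `StyledFragment`, the `Measurer` cache, and `glyph_width`, which stands in for a real font lookup.

```rust
use std::collections::HashMap;

/// Stage 1: borrowed and unmeasured, cheap to create while splitting.
struct StyledWord<'a> {
    text: &'a str,
    bold: bool,
}

/// Stage 2: owned and measured, ready for the wrapping algorithm.
#[derive(Debug, Clone, PartialEq)]
struct StyledFragment {
    text: String,
    bold: bool,
    width: usize,
}

/// Hypothetical stand-in for an expensive font lookup:
/// bold glyphs are 2 units wide, regular glyphs 1 unit.
fn glyph_width(_ch: char, bold: bool) -> usize {
    if bold { 2 } else { 1 }
}

/// Measures words and caches the widths per (text, style) pair.
struct Measurer {
    cache: HashMap<(String, bool), usize>,
    lookups: usize,
}

impl Measurer {
    fn new() -> Self {
        Measurer { cache: HashMap::new(), lookups: 0 }
    }

    fn measure(&mut self, word: &StyledWord<'_>) -> StyledFragment {
        let key = (word.text.to_string(), word.bold);
        let cached = self.cache.get(&key).copied();
        let width = match cached {
            Some(width) => width,
            None => {
                // Only reached for text/style pairs not seen before.
                self.lookups += 1;
                let width: usize =
                    word.text.chars().map(|ch| glyph_width(ch, word.bold)).sum();
                self.cache.insert(key, width);
                width
            }
        };
        StyledFragment { text: word.text.to_string(), bold: word.bold, width }
    }
}
```

The unmeasured type never pays for a width computation, and the measured type is only produced for fragments that survive splitting.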

Configure required features for style example

robinkrahl · 2021-07-04T07:11:36Z

src/word_splitters.rs

+            },
+            penalty: if keep_ending {
+                self.word.penalty
+            } else if word.ends_with('-') {


This condition should be negated – fixed in the next commit.
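For reference, a sketch of the penalty selection with the condition negated. The diff hunk above is truncated, so the `"-"` and `""` branch values are an assumption based on how textwrap's existing word splitting inserts a hyphen; `fragment_penalty` is a hypothetical helper, not a function from this PR.

```rust
/// Chooses the penalty string for a fragment cut out of a word.
/// Assumption: a hyphen is inserted when the fragment does not already end
/// with one, and the last fragment keeps the word's own penalty.
fn fragment_penalty<'a>(fragment: &str, word_penalty: &'a str, keep_ending: bool) -> &'a str {
    if keep_ending {
        word_penalty
    } else if !fragment.ends_with('-') {
        "-"
    } else {
        ""
    }
}
```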

mgeisler · 2021-07-04T11:28:35Z

> Therefore, users might want to have separate types for a splittable word and a fragment with width calculations that can also be cached.

Yes, I can see why that could be useful for users of the library. On the other hand, I wrote a script to download every reverse dependency, and I cannot find any public usages of textwrap::core, except for Clap which uses textwrap::core::display_width. The more advanced functions seem unused except by my own examples (and perhaps genpdf at some point 😆).

Basically, we seem to have pretty free hands in how we structure the internals, while still making a 0.15.0 release a drop-in replacement for everybody. If two traits make your life easier, then I'm happy to support it.

> (Even the current implementation for core::Word currently performs unnecessary width calculations for words that are split up later, though the cost is probably negligible.)

You're right, and I remember toying with the idea of having an UnmeasuredWord struct back in the day. In the end I figured it was enough to have Word and measure the text twice. The difference in performance is about 7% on a quick test I just did, but given that it takes around 260 microseconds to wrap a 4800 character long line on my laptop, I think the performance is good enough, even for realtime wrapping in a GUI or similar.

> These are no big issues, so if you prefer a single trait, I could live with that too.

No, let's go with two since it seems to better model the information necessary at different stages. Would it perhaps make sense to have

SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)
MeasuredFragment: an abstract piece of text which knows how big it is and which we can word wrap (the existing Fragment trait)

I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

robinkrahl · 2021-07-04T11:50:24Z

> SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)

> MeasuredFragment: an abstract piece of text which knows how big it is and which we can word wrap (the existing Fragment trait)

> I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

Sounds good!

mgeisler · 2021-07-04T20:34:50Z

>> SplittableFragment: an abstract piece of text from which you can get a sub-fragment (your Splittable trait)

>> MeasuredFragment: an abstract piece of text which knows how big it is and which we can word wrap (the existing Fragment trait)

>> I'm thinking names like that might help to tie the two concepts together, while also being more descriptive. Please let me know what you think!

> Sounds good!

Okay, cool — let's do that independently of this PR.

mgeisler · 2021-07-05T15:31:51Z

src/word_splitters.rs

+        let len = self.word.as_ref().len();
+        if self.prev < len || self.prev == 0 {
+            let w = self.word.split(self.prev..len, true);
+            // TODO: shouldn’t this be just len?


Sorry, I should have added a comment to explain this: with just

self.prev = len;

you get an infinite loop when len == 0 since self.prev == 0 stays true after every call to next.

If you remove the +1 on master, you'll see that cargo test hangs.

I see, thanks. Will remove the TODO.
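The effect is easy to reproduce in isolation. The iterator below is a stripped-down illustration, not the code from this PR: `Tail` and its `sentinel` flag are hypothetical and exist only to compare `prev = len` with `prev = len + 1`.

```rust
/// Yields the part of `word` after `prev` once, then should stop.
struct Tail<'a> {
    word: &'a str,
    prev: usize,
    /// true: advance to len + 1; false: advance to len only.
    sentinel: bool,
}

impl<'a> Iterator for Tail<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let len = self.word.len();
        if self.prev < len || self.prev == 0 {
            let tail = &self.word[self.prev..len];
            // With len == 0, `prev = len` leaves prev at 0, so the
            // condition above stays true on every later call.
            self.prev = if self.sentinel { len + 1 } else { len };
            return Some(tail);
        }
        None
    }
}
```

For a non-empty word both variants terminate after one item; only the empty word needs the extra `+ 1` to move `prev` away from 0.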

mgeisler · 2021-07-05T15:35:38Z

examples/style.rs

+    }
+}
+
+impl textwrap::word_splitters::Splittable for StyledWord<'_> {


It's a bit of a shame that the example doesn't show why there are two traits :-) I love the example in itself, it's super great at demonstrating the concept of wrapping not-just-plain-text. However, it would be nice if it would exploit the two traits better.

Will you be having different structs in genpdf, one for Fragment and another for Splittable? If not, then I would prefer to keep the number of concepts low and add a split method to Fragment.

Ah, sorry, you already explained that you have a StyledWord struct for the unmeasured case and a StyledFragment for the pre-split and measured words.

Exactly, the distinction is especially relevant if the width computation is non-trivial, which is typically the case for scenarios other than the terminal. I could add an example that produces a PDF file, but I think that would be too complex to be useful as an example for textwrap. Maybe we can have an example that produces an SVG image?

You're right that a full-blown PDF seems unnecessary — could you instead pretend that you need two structs in the style example? I believe you're using len() on the strings, which is cheating ever so slightly :-)

It might look a bit arbitrary, but for educational purposes, I think we're allowed to exaggerate.

Yeah, we can do that.

robinkrahl added 4 commits July 1, 2021 10:13

Add WordSeparator::find_word_ranges method

e6b7c3d

Introduce word_splitters::Fragments struct

637809c

Make word_splitters::Fragments generic over new trait

14fa737

Add styled example

a105b03

robinkrahl force-pushed the generic branch from b898eb5 to a105b03 Compare July 1, 2021 10:13

fixup! Add styled example

e992b18

mgeisler added a commit that referenced this pull request Jul 6, 2021

Rename Fragment to MeasuredFragment

9d25e28

This name better matches the new `SplittableFragment` trait in #402.

mgeisler mentioned this pull request Jul 6, 2021

Rename Fragment to MeasuredFragment #406

Closed
