doc(inline/iter_strings_lossy): describe different behaviors #52

wongjiahau · 2023-10-11T02:03:15Z

Preface

When I copied the code from the terminal-inline example, I expected it to behave as shown in the README screenshot.

However, when I copied the code into my own project, it behaved differently: words were split on whitespace rather than on non-alphanumeric characters.

It took me a while to figure out that the culprit was the `unicode` feature flag, which I had not enabled in my project.
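The sketch below makes the two splitting behaviors concrete using only the standard library. The helpers `split_whitespace_runs` and `split_non_alphanumeric` are hypothetical approximations. They are not `similar`'s actual tokenizers. In particular, the crate's `unicode` path is documented only as tokenizing into "unicode words", and this simple version does not claim to reproduce it.

```rust
/// Hypothetical approximation of the behavior observed without the `unicode`
/// feature: alternating runs of whitespace and non-whitespace, so punctuation
/// stays attached to the neighboring word.
fn split_whitespace_runs(s: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start = 0;
    let mut prev_is_ws: Option<bool> = None;
    for (idx, c) in s.char_indices() {
        let is_ws = c.is_whitespace();
        // Start a new token whenever we cross a whitespace/non-whitespace boundary.
        if prev_is_ws == Some(!is_ws) {
            tokens.push(&s[start..idx]);
            start = idx;
        }
        prev_is_ws = Some(is_ws);
    }
    // Flush the final run (nothing to flush for an empty input).
    if start < s.len() {
        tokens.push(&s[start..]);
    }
    tokens
}

/// Hypothetical approximation of the behavior expected from the README:
/// runs of alphanumeric characters form one token, and every other
/// character (space, punctuation) becomes a token of its own.
fn split_non_alphanumeric(s: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut word_start: Option<usize> = None;
    for (idx, c) in s.char_indices() {
        if c.is_alphanumeric() {
            // Remember where the current alphanumeric run began.
            if word_start.is_none() {
                word_start = Some(idx);
            }
        } else {
            // Close any open word, then emit the separator by itself.
            if let Some(start) = word_start.take() {
                tokens.push(&s[start..idx]);
            }
            tokens.push(&s[idx..idx + c.len_utf8()]);
        }
    }
    // Flush a word that runs to the end of the input.
    if let Some(start) = word_start {
        tokens.push(&s[start..]);
    }
    tokens
}

fn main() {
    let line = "let x = foo.bar;";
    // prints ["let", " ", "x", " ", "=", " ", "foo.bar;"]
    println!("{:?}", split_whitespace_runs(line));
    // prints ["let", " ", "x", " ", "=", " ", "foo", ".", "bar", ";"]
    println!("{:?}", split_non_alphanumeric(line));
}
```

With whitespace runs, `foo.bar;` stays a single token, so an inline diff built on it can only emphasize that whole chunk. Splitting at non-alphanumeric characters allows a change to `bar` alone to be highlighted.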

Reference

similar/src/text/abstraction.rs

Lines 68 to 74 in 47f4625

    /// Tokenizes into unicode words.
    #[cfg(feature = "unicode")]
    fn tokenize_unicode_words(&self) -> Vec<&Self>;

    /// Tokenizes into unicode graphemes.
    #[cfg(feature = "unicode")]
    fn tokenize_graphemes(&self) -> Vec<&Self>;
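Because of the `#[cfg(feature = "unicode")]` attributes, the methods above are not compiled at all unless the feature is enabled. The following is a minimal sketch of that gating in a hypothetical standalone crate. It pairs a gated item with a `#[cfg(not(...))]` fallback and uses the standard `cfg!` macro to report the active mode. The function `word_mode` and its strings are illustrative only and do not come from `similar`.

```rust
// Hypothetical example crate: exactly one `word_mode` definition is compiled,
// depending on whether this crate was built with its own `unicode` feature.
#[cfg(feature = "unicode")]
fn word_mode() -> &'static str {
    "unicode words"
}

// Fallback compiled when the feature is off, so callers still build.
#[cfg(not(feature = "unicode"))]
fn word_mode() -> &'static str {
    "whitespace runs"
}

fn main() {
    // `cfg!` expands to a compile-time `true` or `false`, usable in normal code.
    let unicode_enabled = cfg!(feature = "unicode");
    println!("unicode feature enabled: {}", unicode_enabled);
    println!("word tokenization mode (illustrative): {}", word_mode());
}
```

`cfg!(feature = ...)` reflects the features of the crate being compiled, not those of its dependencies. A downstream project therefore cannot use it to detect whether `similar` was built with `unicode`. Checking the dependency's `features` list in `Cargo.toml` is the more reliable route.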

wongjiahau added 2 commits October 11, 2023 09:59

doc(inline/iter_strings_lossy): describe different behaviors

70115eb

doc(inline/iter_strings_lossy): corrected examples

31ecc2c

mitsuhiko approved these changes Oct 12, 2023

mitsuhiko merged commit 2b31f65 into mitsuhiko:main Oct 12, 2023
