Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: support breaking words containing hyphen - #6014

Merged
merged 11 commits into from Apr 20, 2023
Merged

Conversation

ad1992
Copy link
Member

@ad1992 ad1992 commented Dec 20, 2022

We currently treat words containing "-" as a single word but CSS wrapping algorithm allows breaking words containing "-" eg "non-profit" hence I have updated the algorithm to achieve the same behaviour

Screen.Recording.2022-12-20.at.1.19.46.PM.mov

@vercel
Copy link

vercel bot commented Dec 20, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
excalidraw ✅ Ready (Inspect) Visit Preview 💬 Add feedback Apr 19, 2023 0:39am
excalidraw-package-example ✅ Ready (Inspect) Visit Preview Apr 19, 2023 0:39am
1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
docs ⬜️ Ignored (Inspect) Visit Preview Apr 19, 2023 0:39am

src/element/textElement.ts Outdated Show resolved Hide resolved
if (words.length > 1) {
words.forEach((word, index) => {
if (index !== words.length - 1) {
words[index] = word += "-";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we keeping the -? E.g. she-wolf will be turned to she- wolf. Is it to keep the - to contribute to width calculation? Will not the extra spaces be similarly problematic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For splitting and joining we use the space delimeter.
There is no extra space which gets added since we are splitting and joining via space delimeter at the end. so she-wolf -> ["she-", "wolf"]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplify there could be multiple delimeters for splitting but uiltimately we need the list of words on which we run the wrap algo hence we split by that token (eg hyphen) -> join by space -> split by space so space is the final delimeter which we were and are using to split and join so tomorrow another token if we have to introduce -> we split by that token -> join by space -> finally split by space to have the right list of words

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just have a single parseTokens function for both?

const parseTokens = (text: string) => {
  const tokens = [];
  let buffer = '';

  for (let i = 0; i < text.length; i++) {
    const char = text[i];

    if (char === ' ' || char === '-') {
      if (char === '-') {
        buffer += char;
      }
      if (buffer.length > 0) {
        tokens.push(buffer);
        buffer = '';
      }
    } else {
      buffer += char;
    }
  }

  if (buffer.length > 0) {
    tokens.push(buffer);
  }

  return tokens;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make it clear

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the changes, let me know if its good now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say it's less readable because (especially if commented). And it'll be way faster. Also, the algo you're using isn't quite correct rn: it splits wolf- into ["wolf-", ""].

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly thats how split function behaves when the last char is the split delimeter.
I can though remove it, but that shouldn't cause any issue as its taken care in the algo, or do you mean you are seeing any issue ?
As per rewrite I still feel its making it complex / less readable specially for words with no hyphens at all => which is just split(" ")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly thats how split function behaves when the last char is the split delimeter.

It may not cause issues now, but may in the future. Best to make the parser behave correctly (and which will also be faster) from the start and avoid potential issues later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarise as discussed in chat - here is a quick perf testing - looks like my algo is faster than the for loop (Small is mine, long is for loop)
image

Tried with some random large testcase (prefilled in CSB),
If anyone wants to test it out https://6w38zw.csb.app/

As per the empty string getting added - right now it won't matter as adding empty string will not make any difference but I agree that we should remove the empty token (without compromising the readability) so algo is more accurate.

@ad1992 ad1992 merged commit 5ddb28d into master Apr 20, 2023
7 checks passed
@ad1992 ad1992 deleted the aakansha-wrap-text branch April 20, 2023 05:40
alswl pushed a commit to alswl/excalidraw that referenced this pull request Nov 15, 2023
* fix: support breaking words containing hyphen -

* fix

* add spec

* fix

* fix

* fix

* fix and add spec

* improve code and add more tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

2 participants