-
-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: support breaking words containing hyphen - #6014
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
|
d847c74
to
d74ce94
Compare
if (words.length > 1) { | ||
words.forEach((word, index) => { | ||
if (index !== words.length - 1) { | ||
words[index] = word += "-"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we keeping the -
? E.g. she-wolf
will be turned to she- wolf
. Is it to keep the -
to contribute to width calculation? Will not the extra spaces be similarly problematic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For splitting and joining we use the space delimeter.
There is no extra space which gets added since we are splitting and joining via space delimeter at the end. so she-wolf
-> ["she-", "wolf"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify there could be multiple delimeters for splitting but uiltimately we need the list of words on which we run the wrap algo hence we split by that token (eg hyphen) -> join by space -> split by space so space
is the final delimeter which we were and are using to split and join so tomorrow another token if we have to introduce -> we split by that token -> join by space -> finally split by space to have the right list of words
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just have a single parseTokens
function for both?
const parseTokens = (text: string) => {
const tokens = [];
let buffer = '';
for (let i = 0; i < text.length; i++) {
const char = text[i];
if (char === ' ' || char === '-') {
if (char === '-') {
buffer += char;
}
if (buffer.length > 0) {
tokens.push(buffer);
buffer = '';
}
} else {
buffer += char;
}
}
if (buffer.length > 0) {
tokens.push(buffer);
}
return tokens;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the changes, let me know if its good now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say it's less readable because (especially if commented). And it'll be way faster. Also, the algo you're using isn't quite correct rn: it splits wolf-
into ["wolf-", ""]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly thats how split
function behaves when the last char is the split delimeter.
I can though remove it, but that shouldn't cause any issue as its taken care in the algo, or do you mean you are seeing any issue ?
As per rewrite I still feel its making it complex / less readable specially for words with no hyphens at all => which is just split(" ")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly thats how split function behaves when the last char is the split delimeter.
It may not cause issues now, but may in the future. Best to make the parser behave correctly (and which will also be faster) from the start and avoid potential issues later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarise as discussed in chat - here is a quick perf testing - looks like my algo is faster than the for loop (Small is mine, long is for loop)
Tried with some random large testcase (prefilled in CSB),
If anyone wants to test it out https://6w38zw.csb.app/
As per the empty string getting added - right now it won't matter as adding empty string will not make any difference but I agree that we should remove the empty token (without compromising the readability) so algo is more accurate.
* fix: support breaking words containing hyphen - * fix * add spec * fix * fix * fix * fix and add spec * improve code and add more tests
We currently treat words containing "-" as a single word but CSS wrapping algorithm allows breaking words containing "-" eg "non-profit" hence I have updated the algorithm to achieve the same behaviour
Screen.Recording.2022-12-20.at.1.19.46.PM.mov