fix for FS#2676, inserting zero length spaces into long sequences of non...#165

Merged
Chris--S merged 2 commits into dokuwiki:master from Chris--S:FS#2676
Feb 3, 2013

Conversation

@Chris--S
Collaborator

...-breaking characters in diffs

Post-process the HTML content string returned by Diff->format to locate long, unbroken strings of characters. Examine those strings and insert zero-length (zl) spaces after certain characters (e.g. /#!,:;). Where several of these 'special' characters occur in a row, insert the zl space only after the last character in the sequence.

Also, don't modify content within HTML tags, and keep HTML entities together.
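
For readers who want to see the idea in miniature, here is a rough Python sketch of the approach described above. It is not the PHP code from this pull request. The function name `insert_soft_breaks`, the simplistic tag and entity patterns, and the use of U+200B as the zero-length space are all assumptions made for illustration. The threshold of 12 is the figure given later in this thread.

```python
import re

ZWSP = "\u200b"            # zero-width space (U+200B), assumed here
BREAK_AFTER = "/#!,:;"     # characters listed in the description
MIN_RUN = 12               # "long" threshold stated later in this thread

TAG = re.compile(r"(<[^>]*>)")               # simplistic tag matcher (assumption)
LONG_RUN = re.compile(r"\S{%d,}" % MIN_RUN)  # long run without whitespace
# Either an entity-like token (group 1) or a sequence of break characters.
PIECE = re.compile(r"(&#?\w+;)|[%s]+" % re.escape(BREAK_AFTER))


def _break_run(match):
    run = match.group(0)

    def mark(m):
        # Leave entities intact, and add nothing at the very end of the run.
        if m.group(1) or m.end() == len(run):
            return m.group(0)
        # Break only after the last character of the special sequence.
        return m.group(0) + ZWSP

    return PIECE.sub(mark, run)


def insert_soft_breaks(html):
    parts = TAG.split(html)  # odd indexes hold the tags themselves
    for i in range(0, len(parts), 2):
        parts[i] = LONG_RUN.sub(_break_run, parts[i])
    return "".join(parts)
```

In this sketch, `http://example.com/a/b` gains a break opportunity after `://` and after each later `/`. Text inside a tag such as `<a href="...">` is left untouched.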

@michitux
Collaborator

Should this be used in the diff mails, too? Or are (possibly mobile) mail clients better at that?

inc/html.php Outdated
Collaborator

I think that entities in the form &#xHEX; (where HEX is a hex value) are valid, too.

Collaborator Author

For simplicity, what do you think about changing to my later, simplified pattern?

&#?\w{1,4};

I don't think it's a good idea to make it overly complicated or accurate. I think it's OK to catch more than the set of valid HTML entities. That said, do any have more than 4 characters?

Collaborator

Yes, many. Have a look at http://htmlentities.com/html/entities/
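
To make the length question concrete, the following Python check runs the pattern quoted above against a few standard entities. The `LOOSE` variant and the helper `matched` are hypothetical, added only for comparison. Neither is taken from the pull request.

```python
import re

NARROW = re.compile(r"&#?\w{1,4};")  # pattern quoted in the comment above
LOOSE = re.compile(r"&#?\w+;")       # hypothetical relaxed variant, for comparison

samples = ["&amp;", "&nbsp;", "&#160;", "&hellip;", "&thetasym;", "&#x2026;"]


def matched(pattern, items):
    # Keep only the items that the pattern matches in full.
    return [s for s in items if pattern.fullmatch(s)]


narrow_hits = matched(NARROW, samples)
loose_hits = matched(LOOSE, samples)
```

With the four-character limit, `&hellip;`, `&thetasym;`, and the hex form `&#x2026;` are not recognised as entities. The relaxed variant accepts all six samples, but it also catches more strings that are not valid entities.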

@selfthinker
Collaborator

Although the original was about URLs because URLs are by far the longest string, I wonder if it also makes sense to do something about other potentially long strings. E.g. _ could be in long variable names or - could be in long product numbers...

@Chris--S
Collaborator Author

The fix is for long strings, long being 12 characters without a breaking character.

@selfthinker
Collaborator

Yes, I get that. But because the idea arose from URLs, we only looked at characters typical of URLs as places to break a string. That's why we didn't think of - or _.

@Chris--S
Collaborator Author

I don't think those two characters should be followed by zero-length spaces, as they tend to indicate full words. They could be followed by &shy; (is that necessary for '-'?).

Thinking out loud ... after the first pass, we could do a second pass over any strings that are still long and unbroken, looking for '-'. That would avoid breaking at '-' and '_' except when they occur in long strings without the other break characters.
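
As a thought experiment only, the second-pass idea could look something like the Python sketch below. It assumes the first pass has already inserted U+200B at the other break characters and that the input is plain text, so it has no tag or entity handling. Inserting a zero-width space after '-' and '_' is an assumption too, since the thread leaves the choice of character open. The name `second_pass` is hypothetical.

```python
import re

ZWSP = "\u200b"  # zero-width space, assumed to be what the first pass inserted
MIN_RUN = 12     # "long" threshold stated earlier in this thread

# Runs that are still long once whitespace and existing ZWSPs count as breaks.
STILL_LONG = re.compile("[^\\s%s]{%d,}" % (ZWSP, MIN_RUN))
FALLBACK = re.compile(r"[-_]+")  # fallback break characters


def second_pass(text):
    def fix(match):
        run = match.group(0)

        def mark(m):
            # No break opportunity is needed at the very end of the run.
            if m.end() == len(run):
                return m.group(0)
            return m.group(0) + ZWSP

        return FALLBACK.sub(mark, run)

    return STILL_LONG.sub(fix, text)
```

Short words such as `a-b` are left alone. Only runs like `very_long_variable_name`, which the first pass could not break, get extra break opportunities.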

@selfthinker
Collaborator

Using &shy; for - and _ would be fine with me as well. Maybe we should leave it as it is and only add other characters when we encounter problems with words that include them.

@splitbrain
Collaborator

I'd say let's keep it simple for now.

Would a &shy; add a hyphen when the browser wraps at it? If so, I wouldn't use it for diffs, as an additional character might be confusing.

Chris--S added a commit that referenced this pull request Feb 3, 2013
fix for FS#2676, inserting zero length spaces into long sequences of non...
@Chris--S Chris--S merged commit 1061759 into dokuwiki:master Feb 3, 2013
splitbrain added a commit that referenced this pull request Apr 9, 2020
4 participants