Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[expression] implement a wordwrap function (feature request #9412) #1181

Closed
wants to merge 20 commits into from

Conversation

nirvn
Copy link
Contributor

@nirvn nirvn commented Feb 16, 2014

(See http://hub.qgis.org/issues/9412 for background information)

This is a simple implementation of a wordwrap() function, very useful string function when it comes to improving auto-generated multi-line labels. The function takes three parameters: wordwrap(string,delimiter,minimum_characters_to_wrap).

Take this simple label example: "University of Technology", where a one line label might be too wide. Currently, the user can decide to wrap label on the space character, resulting in the following 3-line label:
University
of
Technology

The wordwrap function improves things as it avoids creating lines of too few characters (think 'of', 'the', 'and', etc.). The following expression wordwrap('University of Technology',' ',3) would result in the much improved 2-line label:
University
of Technology

I realize it must be way too late to be included in 2.2, yet I'd like to point out that it'd be a very nice function to offer to users sooner than later, and that the implementation is very simple and does not add any infrastructure to QGIS. It mimics the many other string functions that have been thoroughly tested.

Jury, I rest my case :)

@nirvn
Copy link
Contributor Author

nirvn commented Feb 16, 2014

@NathanW2 ping.

@wonder-sk
Copy link
Member

Few concerns :-)

  • no unit tests included (see tests/src/core/testqgsexpression.cpp)
  • no function docs included (see resources/function_help/)
  • doesn't it make more sense to specify max. characters per line (instead of min)?
  • does not take into account possible existing newlines in the input (may do word wrapping also when not necessary)
  • assumes that the delimiter is always one character long

@nirvn
Copy link
Contributor Author

nirvn commented Feb 16, 2014

@wonder-dk Thanks, those are all very sensible comments.

Re maximum vs minimum in this context, I prefer minimum. I'm open to be converted in believing otherwise but atm the cases I have in mind fit better with minimum.

Good catch re delimiter.

Regarding input string already containing newline, I had decided not to deal with this scenario failing to see when that'd be relevant. That said I could split input string on newline and add another loop if it's needed.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 16, 2014

Also thanks for the speedy reply.

@wonder-sk
Copy link
Member

Min/max: when speaking about word wrapping (e.g. in text editors), the task is to wrap the text when it would not fit on one line, having a maximum line length. In theory we could have both, for example if the value is negative, it means minimum length, otherwise it would be maximum length.

Input string with newline: the current algorithm can give strange looking results e.g. when formatting a piece of text with two paragraphs - the first line of the second paragraph would be cut incorrectly.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 17, 2014

@wonder-sk , I've reworked the function so it handles preexisting newline characters, and added a test in testqgsexpression.cpp. Is it going in the right direction?

Regarding the maximum vs minimum, I feel implementing a minimum number of characters is more appropriate than a number of maximum characters. A text editor word wrapping / reflow has to deal with a delimited width and an hard edge (i.e. the text box´s width). In our case, we cannot know the width (that would require knowing the font {family, size, etc.}) and aren't facing the hard edge either. Hence, I think it's much more practical to implement it the same way PHP does (see http://www.php.net/wordwrap).

As for the function help, I've added a wordwrap file into the proper directory, but it's not getting picked up when I do git status or commit. Will seek help on the QGIS channel.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 17, 2014

Attaching before-after images to illustrate the wordwrap-based label improvement.

This is a automated multiline label using the space character to wrap (no wordwrap):
without_wordwrap

This is an automated multiline label using the space character to wrap after a minimum of 5 characters (using wordwrap):
with_wordwrap

@wonder-sk
Copy link
Member

(btw. you have beautiful maps!)

With the test you are on the right track - but have you tried to run the test? From a look into the code it seems it will fail (you specified just one argument to wordrap). It would be good to add at least 1-2 other tests beyond the simplest case to demonstrate the functionality in case of more lines on input or when other parameters change - so if someone changes the behavior of wordwrap it is more likely that the test will fail.

Min/max: sorry I have to disagree again :-) I tried to search around a bit and can't really find any case where the implementation would use the minimum... PHP's wordwrap uses "length" parameter as a maximum, Python's textwrap uses "width" also as a maximum. For word wrap the maximum length simply makes more sense.

For the function help, make sure to re-run CMake after adding the help file, so that it is picked up by the build system.

One final thought: it would be probably enough to have just two parameters: wordwrap(text, length) - leaving out the delimiter. Other implementations implicitly assume that delimiter is any whitespace. At some point when we will support optional parameters, we could have further optional parameters like delimiter character (regexp) and break character.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 17, 2014

@wonder-sk regarding the need for a custom delimiter, it's for the function to be useful for non-Latin users. I for one have to deal with Indic-based Khmer scripts on a daily basis, and the word delimiter isn't a space, it's a specific zero-width space defined within the Khmer script Unicode range.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 17, 2014

@wonder-sk alright, you've convinced me :) as of the latest commit pushed, a positive length number will treat the number as maximum (*) number of characters before wrap, and a negative number will indicate the minimum number of characters to wrap.

I've fixed the one test I had introduced and added two more. It appears fine on my machine.

As for the delimiter parameter, as mentioned above, it's needed for strings that are not latin-based. More than happy to make it optional when the expression engine supports optional arguments.

(*) a line might still exceed the maximum number of characters if it doesn't find a delimiter within the maximum number of characters. This is to insure words are not split in half. For e.g., wordwrap("university of technology",' ',3) would produce:
university
of
technology

@NathanW2
Copy link
Member

As for the delimiter parameter, as mentioned above, it's needed for strings that are not latin-based. More than happy to make it optional when the expression engine supports optional arguments.

We have support for optional, well really it's variable arguments, by telling the function it takes -1 args. The function can now take as many args as you give it. Just make sure that it's documented that first two are mandatory and the other one is not.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 18, 2014

@NathanW2 , @wonder-sk , per discussion, I've made the delimiter argument optional.

@leyan
Copy link
Contributor

leyan commented Feb 18, 2014

I agree with nirvn that here the minimum is more useful that a maximum length. We are in a special case where the total width is not known (so no hard limit there), but is pretty narrow. A maximum length makes sense when the length of the words is short relative to the length of the line, so we have to group several words each line. However, this is not really the case in labels, the main issue is to deal with lines with too few characters. The example map given by nirvn where he makes sure short words are not on a line of their own in between long words seems much more useful in real situation.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 18, 2014

@leyan the implementation now offers both styles. A positive wrap value will behave like python & php textwrap / wordwrap while a negative value will allow for a "minimum to wrap" behavior.

@wonder-sk
Copy link
Member

Final notes:

  • the function should check also for max. number of arguments (3)
  • code style: the code should be formatted according to QGIS standards - use scripts/prepare_commit.sh
  • code style: opening curly braces should go to new line

Otherwise I think it looks good!

@nirvn
Copy link
Contributor Author

nirvn commented Feb 20, 2014

Alright, seems I've managed to mess up the pull request commits. Will clean this up in the next 24 hours, sorry.

@nirvn
Copy link
Contributor Author

nirvn commented Feb 21, 2014

Closing as I made a mess of this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants