Soft word wrapping #948

proskur1n · 2021-05-09T21:52:43Z

Hello, this pull request adds soft word wrapping support (like 'linebreak' in Vim) to vis. In addition, it defines an option to set the visual line length instead of always using the window width as the maximum line length. The latter was made trivial by my rewrite of view_addch.

Rationale

It is a lot of pain to write LaTeX and Markdown documents without proper word wrapping support in the editor. This option makes life much easier and is supported by nearly all other text editors. In fact, similar functionality was requested in #142, but in contrast to that proposal, this pull request only affects the way lines are displayed on screen and doesn't modify the underlying character data.

New options

breakat / brk specifies a list of characters after which the editor may decide to wrap the line before the window width is reached. The default value for this option is an empty string, which corresponds to the old behavior of vis: cells are just printed one after another until the window width is reached.
wrapcolumn / wc sets the maximum displayed line length and is especially useful on large monitors. The default is zero, which disables this feature. I wasn't sure what to name this option, but "wrapcolumn" sounds pretty good to me.

I have included a little screenshot with both of the new options enabled below.

:set wrapcolumn 80
:set breakat ' .!?'
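As a rough illustration of how the two options could combine, here is a minimal C sketch. The names `effective_text_width` and `first_row_length` are hypothetical and deliberately simplified (ASCII only, one row at a time); in the patch itself this logic lives in `view_max_text_width()` and `view_wrap_line()` in view.c. It assumes that a `wrapcolumn` wider than the window is simply clamped to the window width, which is consistent with the behaviour reported later in this thread.

```c
#include <string.h>

/* Hypothetical helper: wrapcolumn == 0 disables the option, and a value
 * wider than the window cannot take effect, so the window width wins. */
int effective_text_width(int window_width, int wrapcolumn) {
	if (wrapcolumn > 0 && wrapcolumn < window_width)
		return wrapcolumn;
	return window_width;
}

/* Hypothetical helper: how many characters of the ASCII string `line` stay
 * on the first visual row of `width` columns. Too-long lines break right
 * after the last character listed in `breakat` that still fits; with an
 * empty breakat (the default) or no candidate, this falls back to a hard
 * break at `width`, i.e. the old behaviour. */
int first_row_length(const char *line, int width, const char *breakat) {
	int len = (int)strlen(line);
	if (len <= width)
		return len;
	int brk = 0; /* 0: no break candidate seen yet */
	for (int i = 0; i < width; i++)
		if (strchr(breakat, line[i]) != NULL)
			brk = i + 1; /* break *after* this character */
	return brk > 0 ? brk : width;
}
```

With the settings above, a 200-column window gives `effective_text_width(200, 80) == 80`, and `first_row_length("hello world foo", 10, " .!?")` keeps `"hello "` (6 characters) on the first row, whereas an empty breakat hard-wraps after 10 characters.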

Todo

I have tested this feature with a couple of different text files and couldn't find any problems with my implementation. However, feel free to test this pull request on strange edge cases that might break something. My intention was not to change the default look of vis, but I had to rewrite the view_addch function in view.c, which may have caused some unintended side effects.
Right now, breakat only supports ASCII characters, but it shouldn't be too difficult to extend my code to handle arbitrary UTF-8 characters. In fact, Vim's implementation of breakat also only supports ASCII.
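Extending breakat to UTF-8 mostly means comparing whole encoded characters instead of single bytes. The sketch below shows one possible approach, not the code that was eventually merged: `utf8_seq_len` and `breakat_contains` are hypothetical names, and it assumes both the breakat string and the character under test are valid UTF-8 (malformed lead bytes are treated as one-byte characters).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c. Invalid lead
 * bytes are treated as single bytes so the scan below always advances. */
size_t utf8_seq_len(unsigned char c) {
	if (c < 0x80) return 1;
	if ((c & 0xE0) == 0xC0) return 2;
	if ((c & 0xF0) == 0xE0) return 3;
	if ((c & 0xF8) == 0xF0) return 4;
	return 1;
}

/* Does `breakat` (a UTF-8 string) list the character `ch` (`len` bytes)?
 * breakat is walked one whole character at a time, so e.g. U+3001 (E3 80 81)
 * never matches U+3002 (E3 80 82) just because they share leading bytes. */
bool breakat_contains(const char *breakat, const char *ch, size_t len) {
	const char *p = breakat;
	while (*p != '\0') {
		size_t n = utf8_seq_len((unsigned char)*p);
		/* never step past the terminator if a sequence is truncated */
		for (size_t k = 1; k < n; k++) {
			if (p[k] == '\0') {
				n = k;
				break;
			}
		}
		if (n == len && memcmp(p, ch, len) == 0)
			return true;
		p += n;
	}
	return false;
}
```

Matching whole sequences is what would let a breakat value made of CJK punctuation (for example `。、！？`) work the same way as the ASCII `' .!?'` above.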

ninewise

Got a bit confused by the removal of the default case, which is now the 'fallthrough' path. Looks good to me overall.

mcepl · 2021-08-22T17:28:40Z

Works for me; it could just be emphasized somewhere that these settings should go into a vis.events.WIN_OPEN handler, not (the more usual, I guess) vis.events.INIT one.

sh4r1k7 · 2022-06-01T09:13:05Z

Works great but I found one edge case:

The underline styling carries through the wrap gap. My guess is this should be easy to clear/restore while rendering the gaps, but it's not a big deal either way (I don't think your PR introduced this behaviour, but it is relevant).

erf · 2022-09-03T22:46:24Z

Anything stopping this from being merged? Seems like a good candidate.

whiteinge · 2022-09-23T05:47:55Z

I pulled this down locally and it works for me. This would be a nice addition. 👍

@proskur1n I'm curious if this technique could also be used to mimic nowrap. I often open structured data files (commonly JSON but many others as well) where line lengths are thousands of characters long but I only care to see the beginning of each line/record. (I don't need horizontal scrolling, I just want to toggle wrapping so I can see individual line numbers without the noise.)

I was hoping this PR might let me set wrapcolumn to a value larger than $COLUMNS as a hack for my use-case. That doesn't work which makes complete sense -- it's a weird idea to wrap text off-screen after all. That said, would it be possible to use the same technique in this PR to disable wrapping altogether and introduce a nowrap setting?

proskur1n · 2022-09-26T18:54:09Z

@erf I marked this pull request as a draft because there were some segfaults when scrolling through binary files. Those should be fixed by aa54c37 now. However, it may also have introduced other bugs :)

@whiteinge I don't think this is possible at the moment. As far as I know, if you simply skip some characters when rendering, the navigation will stop working properly. It's actually a miracle that soft wrapping works at all with so few changes to the code.

On a side note, I was thinking about dropping support for breakat and simply wrapping on isspace(...) instead (with a boolean :set option to enable it). Are there any cases when you would want to wrap the lines on something else instead?

rnpnr

Hi, thanks for the patch! I think this is pretty much ready to merge;
there are just a few points inline.

I was thinking about dropping support for breakat and simply wrapping on isspace(...)
Are there any cases when you would want to wrap the lines on something else instead?

Yes, sometimes I edit Japanese text in Vis and there are no spaces in
Japanese. Instead it's better to break on characters such as 。、！？
or other such wide characters. I'm sure there are other people editing
Japanese or similar languages.

I did test this use case and it works great!

rnpnr · 2023-07-27T02:58:16Z

view.c

@@ -4,6 +4,7 @@
 #include <ctype.h>
 #include <errno.h>
 #include <limits.h>
+#include <assert.h>


Please remove this and associated calls. assert() isn't used anywhere
else in vis and I would rather not introduce it.

I have removed them in my last commit. I don't understand your reasoning though, as asserts are useful for debugging and vis is already configured to remove them from release builds. I originally inserted those asserts there because they would have caught a weird segfault I had to debug earlier.

gdb can also catch segfaults. The problem with assert() is that it's
implemented as a macro and can have side effects when used incorrectly.
In general, its usage also makes me think the author doesn't understand
the code they have written. There was nothing wrong with your usage, but
I don't want to open that can of worms.
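To make the macro concern concrete: with NDEBUG defined (as in a typical release build), `assert(expr)` expands to `((void)0)`, so any side effect inside `expr` silently disappears. The functions below are a contrived illustration, not code from vis. `<assert.h>` is included twice on purpose, because the C standard redefines `assert` according to the current state of NDEBUG each time the header is included.

```c
/* Pretend this part is compiled as a release build. */
#undef NDEBUG
#define NDEBUG
#include <assert.h>

/* BUG (on purpose): the increment lives inside assert(), so it is compiled
 * out together with the check. Returns 0 here. */
static int release_build_count(void) {
	int n = 0;
	assert(++n > 0);
	return n;
}

/* Switch assertions back on for the rest of this file. */
#undef NDEBUG
#include <assert.h>

/* Same code with assertions enabled: ++n runs, so this returns 1. */
static int debug_build_count(void) {
	int n = 0;
	assert(++n > 0);
	return n;
}
```

Keeping assertion expressions free of side effects avoids this, but it is an easy rule to break without noticing, since debug builds behave correctly.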

rnpnr · 2023-07-27T03:08:34Z

view.c

-		for (int i = view->col + 1; i < view->width; i++)
-			view->line->cells[i] = view->cell_blank;
+static bool view_add_cell(View *view, const Cell *cell) {
+	/* at most one iteration most of the time */


Maybe:

Suggested change

/* at most one iteration most of the time */

/* one iteration most of the time */

It should be "at most", because this loop is only executed if the line has to be wrapped. In most cases, the while-loop body is not entered at all.

rnpnr · 2023-07-27T03:09:30Z

view.c

+	for (int i = 0; i < cell->width; ++i) {
+		assert(view->col < view->width);
+		if (i == 0) {
+			view->line->cells[view->col++] = *cell;


Can this first iteration go outside the loop like it was before?

Yes, I think so. To my understanding, valid cells cannot have width == 0.

cell_unused actually does have width == 0 but it is only used inside this function. So it should be fine. Leaving this code as-is would make it a little bit more robust though (no-op for invalid and empty cells).

proskur1n · 2023-07-27T21:22:55Z

Should we rename breakat to wrapat ? So that they are displayed next to each other in :help.

rnpnr

Sorry for being a stickler about that comment, but there were a lot of
changes that seemed somewhat arbitrary in the "Fix segfault on some binary
files" commit and that is one of them. I just want to make sure everything
there is correct before I merge this.

rnpnr · 2023-07-27T23:43:21Z

view.c

+static bool view_add_cell(View *view, const Cell *cell) {
+	/* at most one iteration most of the time. we have to use a while-loop
+	 * here because of some pathological edge cases where one unicode char
+	 * may be bigger than the (extremely small) terminal width. */


This still doesn't make sense to me. It can be either of "at most one
iteration" in which case the loop isn't needed OR "one iteration most
of the time" which you are suggesting is the case. But now the comment
makes it seem like a pointless endeavor because no one is using vis in
a terminal that is less than 1 (or even 10) character(s) wide.

There are three cases:

Zero iterations: We are somewhere in the middle of the line. We have enough space to insert the next character and don't need to wrap the current line.

One iteration: Our terminal is 80 characters wide. We are inserting the 81st character. The while-loop runs once and wraps the current line. We can now insert another 80 (single width) characters until next line wrap.

Edge case: We have a one-character-wide terminal window and try to insert a double-width Unicode character. In this case the while-loop runs once for every remaining line on the screen until no lines are left and we hit the return false branch. The editor obviously won't be able to render the text properly in this case, but it also won't crash. Calling view_wrap_line in a loop makes sure that the editor still does all the necessary housekeeping, like setting the line number, replacing characters with empty cells, etc.
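The three cases can be reproduced with a small toy model. `wraps_needed` is a hypothetical stand-in for the loop at the top of `view_add_cell()`: the screen is reduced to a column counter and a count of remaining rows, and resetting the column plays the role of `view_wrap_line()` (its real work of moving and blanking cells is omitted).

```c
/* rows_left: screen rows remaining below the current one.
 * Returns how many times view_wrap_line() would be called before a cell of
 * `cell_width` columns fits, or -1 if the screen runs out first (the
 * `return false` branch). The loop is bounded by rows_left + 1 iterations,
 * so unlike the earlier mutual recursion it always terminates. */
int wraps_needed(int col, int rows_left, int width, int cell_width) {
	int wraps = 0;
	while (col + cell_width > width) {
		/* stands in for view_wrap_line(): next row, column 0 */
		col = 0;
		wraps++;
		if (wraps > rows_left)
			return -1; /* view->line == NULL: stop, nothing is written */
	}
	return wraps;
}
```

`wraps_needed(10, 23, 80, 1)` is 0 (middle of a line), `wraps_needed(80, 23, 80, 1)` is 1 (the 81st character), and `wraps_needed(0, 2, 1, 2)` is -1 (a double-width character on a one-column terminal exhausts the screen).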

Sadly, I do not remember exactly what the problem was but it was something along those lines:

In my first version view_add_cell and view_wrap_line were implemented as mutually recursive functions. It didn't cause any problems in the most cases. However, sometimes after a line wrap there still wasn't enough space to insert the next character and the two functions were calling each other indefinitely. In aa54c37 I have removed the recursion entirely and added a while-loop to account for the edge case of an extremely small terminal window.

On second thought, I think it wasn't really the recursion itself causing the segfault, but rather a null pointer somewhere as a result of the mutual recursion. Anyway, no recursion -> no problems :)

recursion

Ahhh, I see now. I didn't catch that that was what you removed. If that's
the case, it was probably running out of stack space. Let me think about
how to reword it a bit more and I will update the comment before
merging.

Also that probably means that it wasn't an issue with your patch and
actually a preexisting issue. I'll make sure to keep it as a separate commit.

Also that probably means that it wasn't an issue with your patch

No, it was actually a problem I introduced in one of the first commits in this PR, not a preexisting issue. Neither function exists in master at all.

Feel free to reword my comments.

It doesn't segfault anymore after the recursion got removed, but there is still an out-of-bounds write without the while-loop in view->line->cells[view->col++] = cell_unused;

We cannot replace while with if here.

How about this instead:

 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
-	while (view->col + cell->width > view_max_text_width(view)) {
+	if (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)
 			return false;
 	}
 	view->line->width += cell->width;
 	view->line->len += cell->len;
-	view->line->cells[view->col++] = *cell;
-	for (int i = 1; i < cell->width; ++i) {
-		/* set cells of a character which uses multiple columns */
+	if (cell->width > 0)
+		view->line->cells[view->col++] = *cell;
+	/* set cells of a character which uses multiple columns */
+	for (int i = 1; i < cell->width; i++)
 		view->line->cells[view->col++] = cell_unused;
-	}
-
 	return true;
 }

Because for someone just reading over the code it's really not clear how
the while loop fixes anything. This diff accounts for the only case the
previous version of the for loop covered.

This doesn't fix the out-of-bounds write. When the terminal window is resized to be one character wide then view->line->cells will only have space for a single character.

This line will not produce an out-of-bounds write. view->col will be 0 after a resize. So this array access is valid.

if (cell->width > 0)
	view->line->cells[view->col++] = *cell;

This for-loop is the problem in this case. If cell->width is for example two then we will write to view->line->cells[1] which is out-of-bounds for an array with only 1 element.

for (int i = 1; i < cell->width; i++)
	view->line->cells[view->col++] = cell_unused;

We should just keep the while-loop instead of removing it for doubtful gains and introducing new bugs in the process...
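The out-of-bounds write can be checked with an equally small model of the proposed `if` variant. `if_variant_last_index` is hypothetical; it only tracks column indices and reports the highest index of `view->line->cells` that the variant would write.

```c
/* Toy model of the `if` variant: wrap at most once, then write the first
 * cell plus (cell_width - 1) cell_unused padding cells. Returns the highest
 * column index written (-1 if none); any value >= width is an
 * out-of-bounds write into a row that only has `width` cells. */
int if_variant_last_index(int col, int width, int cell_width) {
	if (col + cell_width > width)
		col = 0; /* a single view_wrap_line() call */
	int last = -1;
	if (cell_width > 0)
		last = col++;
	for (int i = 1; i < cell_width; i++)
		last = col++;
	return last;
}
```

`if_variant_last_index(79, 80, 2)` returns 1, which is fine, but `if_variant_last_index(0, 1, 2)` also returns 1, one past the end of a one-cell row. The while-loop instead keeps wrapping until no lines remain and never performs that write.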

Ok then the comment should explain this. Right now the code doesn't
make sense and the comment doesn't give any insight into what is going
on and just leads to confusion.

How about this:

 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
+	/* if the terminal is resized to a single (ASCII) char an out
+	 * of bounds write could be performed for a multiwide char.
+	 * this can be caught by iterating through the lines with
+	 * view_wrap_line() until no lines remain.
+	 */
 	while (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)

I am fine with that 👍. I would just also mention "at most one iteration most of the time"; it makes clear that this is an inexpensive loop.

rnpnr · 2023-07-27T23:59:38Z

Should we rename breakat to wrapat ? So that they are displayed next to each other in :help.

No I think breakat makes more sense and is consistent with other editors.

rnpnr

Everything looks good now. Give me a little bit to squash some commits
and reword some comments and I'll merge it.

Also I was just doing some quick research and it seems like not even
neovim supports arbitrary UTF-8 in the breakat option so great work!

this is controlled by the wrapcolumn/wc and breakat/brk options

related martanne#142: Word wrap and line breaks
related martanne#932: Vis for Prose?
related martanne#1092: Disabling line wrapping

rnpnr · 2023-07-28T18:52:13Z

The two commits I just pushed are what I will merge shortly. Below is
the diff between what I pushed and what was here before (for posterity):

diff --git a/view.c b/view.c
index ebdb876..0b7ce36 100644
--- a/view.c
+++ b/view.c
@@ -178,9 +178,9 @@ static int view_max_text_width(const View *view) {
 }
 
 static void view_wrap_line(View *view) {
+	Line *wrapped_line = view->line;
 	int col = view->col;
 	int wrapcol = (view->wrapcol > 0) ? view->wrapcol : view->col;
-	Line *wrapped_line = view->line;
 
 	view->line = view->line->next;
 	view->col = 0;
@@ -197,7 +197,7 @@ static void view_wrap_line(View *view) {
 		}
 	}
 
-	/* clear remaining of line */
+	/* clear remaining cells on line */
 	for (int i = wrapcol; i < view->width; ++i) {
 		if (i < col) {
 			wrapped_line->width -= wrapped_line->cells[i].width;
@@ -208,9 +208,11 @@ static void view_wrap_line(View *view) {
 }
 
 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
+	/* if the terminal is resized to a single (ASCII) char an out
+	 * of bounds write could be performed for a wide char. this can
+	 * be caught by iterating through the lines with view_wrap_line()
+	 * until no lines remain. usually 0 or 1 iterations.
+	 */
 	while (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)
@@ -220,11 +222,9 @@ static bool view_add_cell(View *view, const Cell *cell) {
 	view->line->width += cell->width;
 	view->line->len += cell->len;
 	view->line->cells[view->col++] = *cell;
-	for (int i = 1; i < cell->width; ++i) {
-		/* set cells of a character which uses multiple columns */
+	/* set cells of a character which uses multiple columns */
+	for (int i = 1; i < cell->width; i++)
 		view->line->cells[view->col++] = cell_unused;
-	}
-
 	return true;
 }
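For readers following the final diff, the sketch below restates what `view_wrap_line()` does, on plain `char` rows instead of `Cell`s. It is a simplified, hypothetical model (`wrap_row` is not a vis function): the loop that carries the partial word over to the next line is assumed from context, since those lines are not shown in the diff, and the width/len bookkeeping is omitted.

```c
/* Toy model of view_wrap_line(). `cur` holds `col` characters; `wrapcol` is
 * the column just after the last breakat character (0 = none seen, so hard
 * wrap at `col`). Characters in [wrapcol, col) move to the start of `next`,
 * and cur[wrapcol..width) is blanked ("clear remaining cells on line").
 * Returns the column on `next` where rendering continues. */
int wrap_row(char *cur, char *next, int width, int col, int wrapcol) {
	if (wrapcol <= 0)
		wrapcol = col;         /* no break candidate: hard wrap */
	int ncol = 0;
	for (int i = wrapcol; i < col; i++)
		next[ncol++] = cur[i]; /* carry the partial word over */
	for (int i = wrapcol; i < width; i++)
		cur[i] = ' ';          /* blank the tail of the wrapped line */
	return ncol;
}
```

For a 10-column row holding `"hello wor"` with a break candidate after the space (`wrapcol == 6`), `"wor"` moves to the next row and rendering continues at column 3 there; with `wrapcol == 0` nothing moves and the next row starts empty.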

ninewise approved these changes May 11, 2021

proskur1n marked this pull request as draft November 25, 2021 10:32

proskur1n mentioned this pull request Nov 25, 2021 (in #988, "vis crashes quite often on Ctrl-C", closed)

mcepl mentioned this pull request Dec 31, 2021 (in #932, "Vis for Prose?", closed)

proskur1n marked this pull request as ready for review September 26, 2022 18:56

rnpnr requested changes Jul 27, 2023

rnpnr reviewed Jul 27, 2023

rnpnr approved these changes Jul 28, 2023

proskur1n added 2 commits July 28, 2023 12:44

view: refactor view_addch (1a81e09)

view.c: add word wrapping (5d7d62c)

rnpnr force-pushed the wrap-words branch from 931658f to 5d7d62c July 28, 2023 18:49

rnpnr merged commit 5d7d62c into martanne:master Jul 28, 2023
24 checks passed
