Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soft word wrapping #948

Merged
merged 2 commits into from
Jul 28, 2023
Merged

Soft word wrapping #948

merged 2 commits into from
Jul 28, 2023

Conversation

proskur1n
Copy link
Contributor

@proskur1n proskur1n commented May 9, 2021

Hello, this pull request adds soft word wrapping (linebreak in vim) support to vis. In addition, it defines an option to change the visual line length instead of always using window width for the maximum line length. The latter was made trivial by my code rewrite.

Rationale

It is a lot of pain to write LaTeX and Markdown documents without proper word wrapping support in the editor. This option is something that makes your life much easier and is supported by nearly all other text editors. In fact, a similar functionality was requested by #142, but on the contrary to that proposal, this pull request only affects the way lines are displayed on screen and doesn't modify the underlined character data.

New options

  • breakat / brk specifies a list of characters after which the editor may decide to wrap the line before window width is reached. The default value for this option is an empty string which corresponds to the old behavior of vis. In this case, cells are just printed one after another until window width is reached.
  • wrapcolumn / wc affects the maximum displayed line length and is especially useful for large monitors. Zero is the default for this option and thus disables this feature. I didn't exactly know how to name this option, but "wrapcolumn" sounds pretty good to me.

I have included a little screenshot with both of the new options enabled below.

:set wrapcolumn 80
:set breakat ' .!?'

vis_word_wrap

Todo

  • I have tried this feature with a couple of different text files and couldn't see any problems with my implementation. However, feel free to test this pull request on some strange edge cases, which may cause something to break. Anyway, my intention was to not change the default look of vis, but I had to rewrite the view_addch function in view.c, which may have caused some unintentional side effects.
  • Right now, breakat only supports ASCII characters, but it shouldn't be too difficult to extend my code to handle any UTF-8 characters. In fact, vim implementation of breakat also only supports ASCII.

Copy link
Collaborator

@ninewise ninewise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a bit confused by the removal of the default case, which is now the 'fallthrough' path. Looks good to me over all.

@mcepl
Copy link
Contributor

mcepl commented Aug 22, 2021

Works for me, just it could be emphasized somewhere, that settings should go to vis.events.WIN_OPEN handler, not (more usual, I guess) vis.events.INIT one.

@proskur1n proskur1n marked this pull request as draft November 25, 2021 10:32
@mcepl mcepl mentioned this pull request Dec 31, 2021
@sh4r1k7
Copy link
Contributor

sh4r1k7 commented Jun 1, 2022

Works great but I found one edge case:

image

The underline styling carries through the wrap-gap. My guess is this should be easy to clear/restore while rendering the gaps, but it's not a big deal either way (I don't think your PR introduced this behaviour, but is relevant).

@erf
Copy link
Contributor

erf commented Sep 3, 2022

Anything stopping this from being merged? Seem like a good candidate

@whiteinge
Copy link

I pulled this down locally and it works for me. This would be a nice addition. 👍

@proskur1n I'm curious if this technique could also be used to mimic nowrap. I often open structured data files (commonly JSON but many others as well) where line lengths are thousands of characters long but I only care to see the beginning of each line/record. (I don't need horizontal scrolling, I just want to toggle wrapping so I can see individual line numbers without the noise.)

I was hoping this PR might let me set wrapcolumn to a value larger than $COLUMNS as a hack for my use-case. That doesn't work which makes complete sense -- it's a weird idea to wrap text off-screen after all. That said, would it be possible to use the same technique in this PR to disable wrapping altogether and introduce a nowrap setting?

@proskur1n
Copy link
Contributor Author

proskur1n commented Sep 26, 2022

@erf I marked this pull request as a draft because there were some segfaults when scrolling through binary files. Those should be fixed by aa54c37 now. However, it may also have introduced other bugs :)

@whiteinge I don't think this is possible at the moment. As far as I know, if you simply skip some characters when rendering, the navigation will stop working properly. It's actually a miracle that soft wrapping works at all with so few changes to the code.

On a side note, I was thinking about dropping support for breakat and simply wrapping on isspace(...) instead (With a boolean :set option to enable it). Are there any cases when you would want to wrap the lines on something else instead?

@proskur1n proskur1n marked this pull request as ready for review September 26, 2022 18:56
Copy link
Collaborator

@rnpnr rnpnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the patch! I think this is pretty much ready to merge
there are just a few points inline.

I was thinking about dropping support for breakat and simply wrapping on isspace(...)
Are there any cases when you would want to wrap the lines on something else instead?

Yes, sometimes I edit Japanese text in Vis and there are no spaces in
Japanese. Instead its better to break on characters such as 。、!?
or other such wide characters. I'm sure there are other people editing
Japanese or similar languages.

I did test this use case and it works great!

view.c Outdated
@@ -4,6 +4,7 @@
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <assert.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this and associated calls. assert() isn't used anywhere
else in vis and I would rather not introduce it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed them in my last commit. I don't understand your reasoning though, as asserts are useful for debugging and vis is already configured to remove them from release builds. I originally inserted those asserts there because they would have caught a weird segfault I had to debug earlier.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gdb can also catch segfaults. The problem with assert() is that its
implemented as a macro and can have side effects when used incorrectly.
In general its usage also makes me think the author doesn't understand
the code they have written. There was nothing wrong with your usage but
I don't want to open that can of worms.

view.c Outdated
for (int i = view->col + 1; i < view->width; i++)
view->line->cells[i] = view->cell_blank;
static bool view_add_cell(View *view, const Cell *cell) {
/* at most one iteration most of the time */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Suggested change
/* at most one iteration most of the time */
/* one iteration most of the time */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be "at most", because this loop is only executed if the line has to be wrapped. In most cases, the while-loop body is not entered at all.

view.c Outdated
for (int i = 0; i < cell->width; ++i) {
assert(view->col < view->width);
if (i == 0) {
view->line->cells[view->col++] = *cell;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this first iteration go outside the loop like it was before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i think so. To my understanding, valid cells cannot have width == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cell_unused actually does have width == 0 but it is only used inside this function. So it should be fine. Leaving this code as-is would make it a little bit more robust though (no-op for invalid and empty cells).

@proskur1n
Copy link
Contributor Author

Should we rename breakat to wrapat ? So that they are displayed next to each other in :help.

Copy link
Collaborator

@rnpnr rnpnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for being a stickler about that comment but there were a lot of
changes that seemed somewhat arbitrary in the "Fix segfault on some binary
files" commit and that is one of them. I just want to make everything
there is correct before I merge this.

view.c Outdated
static bool view_add_cell(View *view, const Cell *cell) {
/* at most one iteration most of the time. we have to use a while-loop
* here because of some pathological edge cases where one unicode char
* may be bigger than the (extremely small) terminal width. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't make sense to me. It can be either of "at most one
iteration" in which case the loop isn't needed OR "one iteration most
of the time" which you are suggesting is the case. But now the comment
makes it seem like a pointless endeavor because no one is using vis in
a terminal that is less than 1 (or even 10) character(s) wide.

Copy link
Contributor Author

@proskur1n proskur1n Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three cases:

  1. Zero iterations: We are somewhere in the middle of the line. We have enough space to insert the next character and don't need to wrap the current line.
  2. One iteration: Our terminal is 80 characters wide. We are inserting the 81st character. The while-loop runs once and wraps the current line. We can now insert another 80 (single width) characters until next line wrap.
  3. Edge case: We have a one character wide terminal window and try to insert a double unicode. In this case the while-loop runs for every remaining line on the screen until no lines are left and we hit the return false branch. The editor obviously wont be able to render the text properly in this case, but it also won't crash. Calls to view_wrap_line in a loop make sure that the editor still does all the necessary housekeeping like setting the line number, replacing characters with empty cells and etc.

Sadly, I do not remember exactly what the problem was but it was something along those lines:

In my first version view_add_cell and view_wrap_line were implemented as mutually recursive functions. It didn't cause any problems in the most cases. However, sometimes after a line wrap there still wasn't enough space to insert the next character and the two functions were calling each other indefinitely. In aa54c37 I have removed the recursion entirely and added a while-loop to account for the edge case of an extremely small terminal window.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I think it wasn't really the recursion itself causing the segfault, but rather a null pointer somewhere as a result of the mutual recursion. Anyway, no recursion -> no problems :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recursion

Ahhh, I see now. I didn't catch that that was what you removed. If thats
the case it was probably running out of stack space. Let me think about
how to reword it a bit more and I will update the comment it before
merging.

Also that probably means that it wasn't an issue with your patch and
actually a preexisting issue. I'll make sure to keep it as a separate commit.

Copy link
Contributor Author

@proskur1n proskur1n Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also that probably means that it wasn't an issue with your patch

No, it was actually a problem I introduced in one of the first commits in this PR and not an preexisting issue. Both functions don't exist in master at all.

Feel free to reword my comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't segfault anymore after the recursion got removed, but there is still an out-of-bounds write without the while-loop in view->line->cells[view->col++] = cell_unused;

We cannot replace while with if here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this instead:

 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
-	while (view->col + cell->width > view_max_text_width(view)) {
+	if (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)
 			return false;
 	}
 
 	view->line->width += cell->width;
 	view->line->len += cell->len;
-	view->line->cells[view->col++] = *cell;
-	for (int i = 1; i < cell->width; ++i) {
-		/* set cells of a character which uses multiple columns */
+	if (cell->width > 0)
+		view->line->cells[view->col++] = *cell;
+	/* set cells of a character which uses multiple columns */
+	for (int i = 1; i < cell->width; i++)
 		view->line->cells[view->col++] = cell_unused;
-	}
-
 	return true;
 }

Because for someone just reading over the code its really not clear how
the while loop fixes anything. This diff accounts for the only case the
previous version of the for loop covered.

Copy link
Contributor Author

@proskur1n proskur1n Jul 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fix the out-of-bounds write. When the terminal window is resized to be one character wide then view->line->cells will only have space for a single character.

This line will not produce an out-of-bounds write. view->col will be 0 after a resize. So this array access is valid.

if (cell->width > 0)
	view->line->cells[view->col++] = *cell;

This for-loop is the problem in this case. If cell->width is for example two then we will write to view->line->cells[1] which is out-of-bounds for an array with only 1 element.

for (int i = 1; i < cell->width; i++)
	view->line->cells[view->col++] = cell_unused;

We should just keep the while-loop instead of removing it for doubtful gains and introducing new bugs in the process...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok then the comment should explain this. Right now the code doesn't
make sense and the comment doesn't give any insight into what is going
on and just leads to confusion.

How about this:

 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
+	/* if the terminal is resized to a single (ASCII) char an out
+	 * of bounds write could be performed for a multiwide char.
+	 * this can be caught by iterating through the lines with
+	 * view_wrap_line() until no lines remain.
+	 */
 	while (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with that 👍 . I would just mention that "at most one iteration most of the time". It would make it clear that this is an inexpensive loop.

view.c Outdated
@@ -4,6 +4,7 @@
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <assert.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gdb can also catch segfaults. The problem with assert() is that its
implemented as a macro and can have side effects when used incorrectly.
In general its usage also makes me think the author doesn't understand
the code they have written. There was nothing wrong with your usage but
I don't want to open that can of worms.

@rnpnr
Copy link
Collaborator

rnpnr commented Jul 27, 2023

Should we rename breakat to wrapat ? So that they are displayed next to each other in :help.

No I think breakat makes more sense and is consistent with other editors.

Copy link
Collaborator

@rnpnr rnpnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good now. Give me a little bit to squash some commits
and reword some comments and I'll merge it.

Also I was just doing some quick research and it seems like not even
neovim supports arbitrary UTF-8 in the breakat option so great work!

this is contolled by the wrapcolumn/wc and breakat/brk options

related martanne#142: Word wrap and line breaks
related martanne#932: Vis for Prose?
related martanne#1092: Disabling line wrapping
@rnpnr
Copy link
Collaborator

rnpnr commented Jul 28, 2023

The two commits I just pushed are what I will merge shortly. Below is
the diff between what I pushed and what was here before (for posterity):

diff --git a/view.c b/view.c
index ebdb876..0b7ce36 100644
--- a/view.c
+++ b/view.c
@@ -178,9 +178,9 @@ static int view_max_text_width(const View *view) {
 }
 
 static void view_wrap_line(View *view) {
+	Line *wrapped_line = view->line;
 	int col = view->col;
 	int wrapcol = (view->wrapcol > 0) ? view->wrapcol : view->col;
-	Line *wrapped_line = view->line;
 
 	view->line = view->line->next;
 	view->col = 0;
@@ -197,7 +197,7 @@ static void view_wrap_line(View *view) {
 		}
 	}
 
-	/* clear remaining of line */
+	/* clear remaining cells on line */
 	for (int i = wrapcol; i < view->width; ++i) {
 		if (i < col) {
 			wrapped_line->width -= wrapped_line->cells[i].width;
@@ -208,9 +208,11 @@ static void view_wrap_line(View *view) {
 }
 
 static bool view_add_cell(View *view, const Cell *cell) {
-	/* at most one iteration most of the time. we have to use a while-loop
-	 * here because of some pathological edge cases where one unicode char
-	 * may be bigger than the (extremely small) terminal width. */
+	/* if the terminal is resized to a single (ASCII) char an out
+	 * of bounds write could be performed for a wide char. this can
+	 * be caught by iterating through the lines with view_wrap_line()
+	 * until no lines remain. usually 0 or 1 iterations.
+	 */
 	while (view->col + cell->width > view_max_text_width(view)) {
 		view_wrap_line(view);
 		if (!view->line)
@@ -220,11 +222,9 @@ static bool view_add_cell(View *view, const Cell *cell) {
 	view->line->width += cell->width;
 	view->line->len += cell->len;
 	view->line->cells[view->col++] = *cell;
-	for (int i = 1; i < cell->width; ++i) {
-		/* set cells of a character which uses multiple columns */
+	/* set cells of a character which uses multiple columns */
+	for (int i = 1; i < cell->width; i++)
 		view->line->cells[view->col++] = cell_unused;
-	}
-
 	return true;
 }

@rnpnr rnpnr merged commit 5d7d62c into martanne:master Jul 28, 2023
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants