-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Soft word wrapping #948
Soft word wrapping #948
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got a bit confused by the removal of the default case, which is now the 'fallthrough' path. Looks good to me over all.
Works for me, just it could be emphasized somewhere, that settings should go to |
Anything stopping this from being merged? Seem like a good candidate |
I pulled this down locally and it works for me. This would be a nice addition. 👍 @proskur1n I'm curious if this technique could also be used to mimic I was hoping this PR might let me set |
@erf I marked this pull request as a draft because there were some segfaults when scrolling through binary files. Those should be fixed by aa54c37 now. However, it may also have introduced other bugs :) @whiteinge I don't think this is possible at the moment. As far as I know, if you simply skip some characters when rendering, the navigation will stop working properly. It's actually a miracle that soft wrapping works at all with so few changes to the code. On a side note, I was thinking about dropping support for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the patch! I think this is pretty much ready to merge
there are just a few points inline.
I was thinking about dropping support for breakat and simply wrapping on isspace(...)
Are there any cases when you would want to wrap the lines on something else instead?
Yes, sometimes I edit Japanese text in Vis and there are no spaces in
Japanese. Instead its better to break on characters such as 。、!?
or other such wide characters. I'm sure there are other people editing
Japanese or similar languages.
I did test this use case and it works great!
view.c
Outdated
@@ -4,6 +4,7 @@ | |||
#include <ctype.h> | |||
#include <errno.h> | |||
#include <limits.h> | |||
#include <assert.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this and associated calls. assert()
isn't used anywhere
else in vis and I would rather not introduce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed them in my last commit. I don't understand your reasoning though, as asserts are useful for debugging and vis is already configured to remove them from release builds. I originally inserted those asserts there because they would have caught a weird segfault I had to debug earlier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gdb can also catch segfaults. The problem with assert()
is that its
implemented as a macro and can have side effects when used incorrectly.
In general its usage also makes me think the author doesn't understand
the code they have written. There was nothing wrong with your usage but
I don't want to open that can of worms.
view.c
Outdated
for (int i = view->col + 1; i < view->width; i++) | ||
view->line->cells[i] = view->cell_blank; | ||
static bool view_add_cell(View *view, const Cell *cell) { | ||
/* at most one iteration most of the time */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
/* at most one iteration most of the time */ | |
/* one iteration most of the time */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be "at most", because this loop is only executed if the line has to be wrapped. In most cases, the while-loop body is not entered at all.
view.c
Outdated
for (int i = 0; i < cell->width; ++i) { | ||
assert(view->col < view->width); | ||
if (i == 0) { | ||
view->line->cells[view->col++] = *cell; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this first iteration go outside the loop like it was before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i think so. To my understanding, valid cells cannot have width == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cell_unused
actually does have width == 0
but it is only used inside this function. So it should be fine. Leaving this code as-is would make it a little bit more robust though (no-op for invalid and empty cells).
Should we rename |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being a stickler about that comment but there were a lot of
changes that seemed somewhat arbitrary in the "Fix segfault on some binary
files" commit and that is one of them. I just want to make everything
there is correct before I merge this.
view.c
Outdated
static bool view_add_cell(View *view, const Cell *cell) { | ||
/* at most one iteration most of the time. we have to use a while-loop | ||
* here because of some pathological edge cases where one unicode char | ||
* may be bigger than the (extremely small) terminal width. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't make sense to me. It can be either of "at most one
iteration" in which case the loop isn't needed OR "one iteration most
of the time" which you are suggesting is the case. But now the comment
makes it seem like a pointless endeavor because no one is using vis in
a terminal that is less than 1 (or even 10) character(s) wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are three cases:
- Zero iterations: We are somewhere in the middle of the line. We have enough space to insert the next character and don't need to wrap the current line.
- One iteration: Our terminal is 80 characters wide. We are inserting the 81st character. The while-loop runs once and wraps the current line. We can now insert another 80 (single width) characters until next line wrap.
- Edge case: We have a one character wide terminal window and try to insert a double unicode. In this case the while-loop runs for every remaining line on the screen until no lines are left and we hit the
return false
branch. The editor obviously wont be able to render the text properly in this case, but it also won't crash. Calls to view_wrap_line in a loop make sure that the editor still does all the necessary housekeeping like setting the line number, replacing characters with empty cells and etc.
Sadly, I do not remember exactly what the problem was but it was something along those lines:
In my first version view_add_cell and view_wrap_line were implemented as mutually recursive functions. It didn't cause any problems in the most cases. However, sometimes after a line wrap there still wasn't enough space to insert the next character and the two functions were calling each other indefinitely. In aa54c37 I have removed the recursion entirely and added a while-loop to account for the edge case of an extremely small terminal window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I think it wasn't really the recursion itself causing the segfault, but rather a null pointer somewhere as a result of the mutual recursion. Anyway, no recursion -> no problems :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recursion
Ahhh, I see now. I didn't catch that that was what you removed. If thats
the case it was probably running out of stack space. Let me think about
how to reword it a bit more and I will update the comment it before
merging.
Also that probably means that it wasn't an issue with your patch and
actually a preexisting issue. I'll make sure to keep it as a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also that probably means that it wasn't an issue with your patch
No, it was actually a problem I introduced in one of the first commits in this PR and not an preexisting issue. Both functions don't exist in master at all.
Feel free to reword my comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't segfault anymore after the recursion got removed, but there is still an out-of-bounds write without the while-loop in view->line->cells[view->col++] = cell_unused;
We cannot replace while
with if
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this instead:
static bool view_add_cell(View *view, const Cell *cell) {
- /* at most one iteration most of the time. we have to use a while-loop
- * here because of some pathological edge cases where one unicode char
- * may be bigger than the (extremely small) terminal width. */
- while (view->col + cell->width > view_max_text_width(view)) {
+ if (view->col + cell->width > view_max_text_width(view)) {
view_wrap_line(view);
if (!view->line)
return false;
}
view->line->width += cell->width;
view->line->len += cell->len;
- view->line->cells[view->col++] = *cell;
- for (int i = 1; i < cell->width; ++i) {
- /* set cells of a character which uses multiple columns */
+ if (cell->width > 0)
+ view->line->cells[view->col++] = *cell;
+ /* set cells of a character which uses multiple columns */
+ for (int i = 1; i < cell->width; i++)
view->line->cells[view->col++] = cell_unused;
- }
-
return true;
}
Because for someone just reading over the code its really not clear how
the while loop fixes anything. This diff accounts for the only case the
previous version of the for
loop covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't fix the out-of-bounds write. When the terminal window is resized to be one character wide then view->line->cells
will only have space for a single character.
This line will not produce an out-of-bounds write. view->col
will be 0 after a resize. So this array access is valid.
if (cell->width > 0)
view->line->cells[view->col++] = *cell;
This for-loop is the problem in this case. If cell->width
is for example two then we will write to view->line->cells[1]
which is out-of-bounds for an array with only 1 element.
for (int i = 1; i < cell->width; i++)
view->line->cells[view->col++] = cell_unused;
We should just keep the while-loop instead of removing it for doubtful gains and introducing new bugs in the process...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok then the comment should explain this. Right now the code doesn't
make sense and the comment doesn't give any insight into what is going
on and just leads to confusion.
How about this:
static bool view_add_cell(View *view, const Cell *cell) {
- /* at most one iteration most of the time. we have to use a while-loop
- * here because of some pathological edge cases where one unicode char
- * may be bigger than the (extremely small) terminal width. */
+ /* if the terminal is resized to a single (ASCII) char an out
+ * of bounds write could be performed for a multiwide char.
+ * this can be caught by iterating through the lines with
+ * view_wrap_line() until no lines remain.
+ */
while (view->col + cell->width > view_max_text_width(view)) {
view_wrap_line(view);
if (!view->line)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with that 👍 . I would just mention that "at most one iteration most of the time". It would make it clear that this is an inexpensive loop.
view.c
Outdated
@@ -4,6 +4,7 @@ | |||
#include <ctype.h> | |||
#include <errno.h> | |||
#include <limits.h> | |||
#include <assert.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gdb can also catch segfaults. The problem with assert()
is that its
implemented as a macro and can have side effects when used incorrectly.
In general its usage also makes me think the author doesn't understand
the code they have written. There was nothing wrong with your usage but
I don't want to open that can of worms.
No I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good now. Give me a little bit to squash some commits
and reword some comments and I'll merge it.
Also I was just doing some quick research and it seems like not even
neovim supports arbitrary UTF-8 in the breakat
option so great work!
this is contolled by the wrapcolumn/wc and breakat/brk options related martanne#142: Word wrap and line breaks related martanne#932: Vis for Prose? related martanne#1092: Disabling line wrapping
The two commits I just pushed are what I will merge shortly. Below is diff --git a/view.c b/view.c
index ebdb876..0b7ce36 100644
--- a/view.c
+++ b/view.c
@@ -178,9 +178,9 @@ static int view_max_text_width(const View *view) {
}
static void view_wrap_line(View *view) {
+ Line *wrapped_line = view->line;
int col = view->col;
int wrapcol = (view->wrapcol > 0) ? view->wrapcol : view->col;
- Line *wrapped_line = view->line;
view->line = view->line->next;
view->col = 0;
@@ -197,7 +197,7 @@ static void view_wrap_line(View *view) {
}
}
- /* clear remaining of line */
+ /* clear remaining cells on line */
for (int i = wrapcol; i < view->width; ++i) {
if (i < col) {
wrapped_line->width -= wrapped_line->cells[i].width;
@@ -208,9 +208,11 @@ static void view_wrap_line(View *view) {
}
static bool view_add_cell(View *view, const Cell *cell) {
- /* at most one iteration most of the time. we have to use a while-loop
- * here because of some pathological edge cases where one unicode char
- * may be bigger than the (extremely small) terminal width. */
+ /* if the terminal is resized to a single (ASCII) char an out
+ * of bounds write could be performed for a wide char. this can
+ * be caught by iterating through the lines with view_wrap_line()
+ * until no lines remain. usually 0 or 1 iterations.
+ */
while (view->col + cell->width > view_max_text_width(view)) {
view_wrap_line(view);
if (!view->line)
@@ -220,11 +222,9 @@ static bool view_add_cell(View *view, const Cell *cell) {
view->line->width += cell->width;
view->line->len += cell->len;
view->line->cells[view->col++] = *cell;
- for (int i = 1; i < cell->width; ++i) {
- /* set cells of a character which uses multiple columns */
+ /* set cells of a character which uses multiple columns */
+ for (int i = 1; i < cell->width; i++)
view->line->cells[view->col++] = cell_unused;
- }
-
return true;
} |
Hello, this pull request adds soft word wrapping (
linebreak
in vim) support to vis. In addition, it defines an option to change the visual line length instead of always using window width for the maximum line length. The latter was made trivial by my code rewrite.Rationale
It is a lot of pain to write LaTeX and Markdown documents without proper word wrapping support in the editor. This option is something that makes your life much easier and is supported by nearly all other text editors. In fact, a similar functionality was requested by #142, but on the contrary to that proposal, this pull request only affects the way lines are displayed on screen and doesn't modify the underlined character data.
New options
breakat / brk
specifies a list of characters after which the editor may decide to wrap the line before window width is reached. The default value for this option is an empty string which corresponds to the old behavior of vis. In this case, cells are just printed one after another until window width is reached.wrapcolumn / wc
affects the maximum displayed line length and is especially useful for large monitors. Zero is the default for this option and thus disables this feature. I didn't exactly know how to name this option, but "wrapcolumn" sounds pretty good to me.I have included a little screenshot with both of the new options enabled below.
Todo
Right now,breakat
only supports ASCII characters, but it shouldn't be too difficult to extend my code to handle any UTF-8 characters. In fact, vim implementation ofbreakat
also only supports ASCII.