New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Represent Screen state as UTF-8 #7992

Merged
merged 2 commits into from Jun 13, 2018

Conversation

Projects
None yet
6 participants
@bfredl
Member

bfredl commented Feb 10, 2018

This simplifies how screen.c stores rendered chars in ScreenLines, by storing them as the UTF-8 which will anyway be encoded and sent to the UI layer. Note this PR does NOT change in any way how screen.c processes incoming UTF-8 data from buffers, cmdline, messages etc, any algorithm that operates on UCS-4 (like arabic shaping, treatment of non-printable chars) will still do so. Simplifications can surely be done later on, but has higher risk than this PR that only changes how the result of these algorithms are stored on the grid. The immediate purpose is rather to make it simpler to read from and process the screen state after it has been rendered to the grid:

  • The internal compositor introduced in #6619 will merge multiple grids and overlays to a single global grid.
  • It will be simpler to experiment with more efficient grid transmission modes (#7723 , #6459). screen_char for instance could just maintain the first and last invalid column in each row, and instead the UI layer will transmit these chunks on ui_flush by directly translating from the grid to some (opt-in) more compact representation.

@marvim marvim added the WIP label Feb 10, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch from 9a9dda0 to ea61f8e Feb 11, 2018

@bfredl

This comment has been minimized.

Member

bfredl commented Feb 11, 2018

Hmm, not all these assignments have coverage. For instance, we have no test that uses folded lines inside cmdline window.

@bfredl bfredl force-pushed the bfredl:mbscreen branch 3 times, most recently from 4e08af6 to 8e658cc Feb 14, 2018

@bfredl

This comment has been minimized.

Member

bfredl commented Feb 19, 2018

Question: currently this PR freezes the option maxcombine to always have the effect of the largest value 6. Is there a point in supporting lower values, other than lower memory consumption? (regarding which nvim TUI has always stored MAX_MCO cchars per cell and no one has complained...)

@justinmk

This comment has been minimized.

Member

justinmk commented Feb 19, 2018

'maxcombine' likely had the same motivation as 'encoding'. +1 for freezing & deprecating it.

@bfredl

This comment has been minimized.

Member

bfredl commented Feb 19, 2018

Away it goes then. Though unlike 'encoding' we should probably just ignore writing lower values, I don't think getting more combining chars than expected breaks fundamental exceptions the same way. (just to be clear: it only affects screen rendering, not what h and l etc considers to be a single char)

In the future we could consider implementing a NFG-like solution to support arbitrary graphemes without O(screensize) overhead (not necessarily real normalization, could just keep a mapping between UTF-8 grapheme strings and invented out-of-range codepoints)

@bfredl bfredl force-pushed the bfredl:mbscreen branch from b0159cd to fbd4d15 Feb 19, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch from fbd4d15 to d6e0980 Mar 2, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch 3 times, most recently from 367ee40 to d2c80ff Mar 3, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch from d2c80ff to 62f12dd Mar 24, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch 3 times, most recently from a0399b4 to 90174c0 Mar 31, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch from 90174c0 to 5811e18 Apr 2, 2018

@bfredl bfredl added this to the 0.3.1 milestone Apr 9, 2018

@bfredl bfredl added refactor ui labels Apr 9, 2018

typedef char_u schar_T;
typedef unsigned short sattr_T;
typedef char_u schar_T[MAX_MCO * 4 + 1];
typedef int16_t sattr_T;
/*
* The characters that are currently on the screen are kept in ScreenLines[].

This comment has been minimized.

@oni-link

oni-link Apr 18, 2018

Contributor

This comment block still contains references to ScreenLinesUC.

@@ -155,14 +155,7 @@ EXTERN char_u *LineWraps INIT(= NULL); /* line wraps to next line */
* ScreenLinesC[0][off] is only to be used when ScreenLinesUC[off] != 0.
* Note: These three are only allocated when enc_utf8 is set!
*/

This comment has been minimized.

@oni-link

oni-link Apr 18, 2018

Contributor

Comment block contains old references.

// ScreenLines[off] Contains a copy of the whole screen, as it is currently
// displayed
// ScreenAttrs[off] Contains the associated attributes.
// LineOffset[row] Contains the offset into ScreenLines*[] and ScreenAttrs[]

This comment has been minimized.

@oni-link

oni-link Apr 18, 2018

Contributor

ScreenLines*[] -> ScreenLines[]

// displayed
// ScreenAttrs[off] Contains the associated attributes.
// LineOffset[row] Contains the offset into ScreenLines*[] and ScreenAttrs[]
// for each line.

This comment has been minimized.

@oni-link

oni-link Apr 18, 2018

Contributor

Indentation is too small, probably because TABs were removed.

This comment has been minimized.

@bfredl

bfredl Apr 18, 2018

Member

These could probably be removed anyway, and moved to globals.h if needed.

if (clear_width > 0 && !rlflag) {
// blank out the rest of the line
while (col < clear_width && ScreenLines[off_to][0] == ' '
&& ScreenLines[off_to][1] == NUL

This comment has been minimized.

@oni-link

oni-link Apr 18, 2018

Contributor

Is the test for ' ' not enough to know that ascii is encoded here? Some composing characters could follow.

@justinmk justinmk modified the milestones: 0.3.1, 0.3.2 Jun 10, 2018

@bfredl bfredl force-pushed the bfredl:mbscreen branch 2 times, most recently from feec318 to 2441d41 Jun 10, 2018

@bfredl

This comment has been minimized.

Member

bfredl commented Jun 11, 2018

@justinmk Why 0.3.2 and not 0.3.1? The PR is done basically, and if merged soon, it will receive enough testing on master for 0.3.1.

@bfredl

This comment has been minimized.

Member

bfredl commented Jun 12, 2018

All tests pass, except for unrelated timer failure in QB. Internal docs are not perfect yet, but I aim to gradually improve them in following PRs. Running the new tests on master to confirm that behavior is consistent: #8534

@bfredl bfredl force-pushed the bfredl:mbscreen branch 2 times, most recently from 2263158 to 7dcd1b6 Jun 12, 2018

@teto

This comment has been minimized.

Contributor

teto commented Jun 13, 2018

if the failure .../nvim-deps/usr/share/lua/5.1/nvim/msgpack_rpc_stream.lua:84: invalid msgpack-rpc string is not related to the PR, I vote for merging. This is a long overdue simple simplification of screen.c

c = ScreenLinesUC[off];
else
c = ScreenLines[off];
c = mb_ptr2char(ScreenLines[off]);

This comment has been minimized.

@justinmk

justinmk Jun 13, 2018

Member

Why is it wrapped in mb_ptr2char() now?

This comment has been minimized.

@bfredl

bfredl Jun 13, 2018

Member

Because for back compat this function must return the char as a Number. (And thus ignore composing chars) Should be utf_ptr2char though.

* Returns the produced number of bytes.
*/
int utfc_char2bytes(int off, char_u *buf)
{

This comment has been minimized.

@justinmk

justinmk Jun 13, 2018

Member

IIUC now utfc_char2bytes is not needed because the screen already contains what would have been the result of that. So ui_puts can be called directly on the screen bytes.

Would appreciate calling this out in the commit message as a concrete example.

// ScreenLines[].
//
// update_screen() is the function that updates all windows and status lines.
// It is called form the main loop when must_redraw is non-zero. It may be

This comment has been minimized.

@justinmk

@bfredl bfredl force-pushed the bfredl:mbscreen branch 2 times, most recently from b0d0eec to 050f397 Jun 13, 2018

bfredl added some commits Feb 10, 2018

screen: use UTF-8 representation
Store text in ScreenLines as UTF-8, so it can be sent as-is to the UI
layer. `utfc_char2bytes(off,buf)` is removed, as `ScreenLines[off]` now
already contains this representation.

To recover the codepoints that the screen arrays previously contained, use
utfc_ptr2char (or utf_ptr2char to ignore composing chars).

NB: This commit does NOT change how screen.c processes incoming UTF-8 data
from buffers, cmdline, messages etc. Any algorithm that operates on UCS-4
(like arabic shaping, treatment of non-printable chars)
is left unchanged for now.
@bfredl

This comment has been minimized.

Member

bfredl commented Jun 13, 2018

@justinmk I added detailed commit message

@bfredl

This comment has been minimized.

Member

bfredl commented Jun 13, 2018

Failure in ui/incommand_spec

test/functional/ui/inccommand_spec.lua @ 2475
Failure message: ./test/functional/helpers.lua:312: 
retry() attempts: 2
./test/functional/ui/screen.lua:307: Row 8 did not match.
Expected:
|bar baz fox                   |
|bar foo ba^z                   |
|{15:~                             }|
|{15:~                             }|
|{15:~                             }|
|{15:~                             }|
|{11:[No Name] [+]                 }|
|*xxx                           |
|xxx                           |
|xxx                           |
|xxx                           |
|xxx                           |
|xxx                           |
|{10:term                          }|
|                              |
Actual:
|bar baz fox                   |
|bar foo ba^z                   |
|{15:~                             }|
|{15:~                             }|
|{15:~                             }|
|{15:~                             }|
|{11:[No Name] [+]                 }|
|*                              |
|                              |
|                              |
|                              |
|                              |
|                              |
|{10:term                          }|
|                              |

To print the expect() call that would assert the current screen state, use
screen:snapshot_util(). In case of non-deterministic failures, use
screen:redraw_debug() to show all intermediate screen states.  
stack traceback:
./test/functional/helpers.lua:312: in function 'retry'
test/functional/ui/inccommand_spec.lua:2476: in function <test/functional/ui/inccommand_spec.lua:2475>

As it is retry failure it is probably unrelated indeterministic failure (also I think I have seen it in other PRs)

@bfredl bfredl merged commit 463da84 into neovim:master Jun 13, 2018

4 of 5 checks passed

QuickBuild Build pr-7992 finished with status FAILED
Details
codecov/patch 94.88% of diff hit (target 81.29%)
Details
codecov/project 81.29% (+<.01%) compared to 315b7f8
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@KillTheMule

This comment has been minimized.

Contributor

KillTheMule commented Jun 13, 2018

Yeah that't the "terminal activity plus inccomand" test, see #8311. Seems like that did not fully fix it?

@justinmk

This comment has been minimized.

Member

justinmk commented Jun 13, 2018

Yes that test is a known issue.

justinmk added a commit that referenced this pull request Jul 18, 2018

NVIM v0.3.1
FEATURES:
07499a8 #8709 man.vim: C highlighting for EXAMPLES section
07f82ad #8699 TUI: urxvt: also send xterm focus-reporting seqs
40911e4 #8616 API: emit nvim_buf_lines_event from :terminal
c46997a #8546 fillchars: Add "eob" flag

FIXES:
74d19f6 #8576 startup: avoid blank stdin buffer if other files were opened
4874214 #8737 Only waitpid() for processes that we care about
cd6e7e8 #8743 Check all child processes for exit in SIGCHLD handler
c230ef2 #8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff
0ed8b12 #8681 transstr_buf: fix length comparison
d241f27 #8708 TUI: Fix standout mode
9afed40 #8698 man.vim: fix for mandoc
e889640 #8682 provider/node: npm --loglevel silent
1cbc830 #8613 API: nvim_win_set_cursor: set curswant
bf6048e #8628 checkhealth: Python: fix VIRTUAL_ENV check
3cc3506 #8528 checkhealth: node.js: also search yarn

CHANGES:
b751449 #8619 defaults: shortmess+=F
1248178 #8578 highlight: high-priority CursorLine if fg is set.
01570f1 #8726 terminal: handle &confirm and :confirm on unloading
56065bb #8721 screen: truncate showmode messages
bf2460e #7551 buffer: fix copying :setlocal options
c1c14fa #8520 Ex mode: always "improved" (gQ)
050f397 #7992 options: remove 'maxcombine` option (always 6)

INTERNAL:
463da84 #7992 screen: use UTF-8 representation

UtkarshMe added a commit to UtkarshMe/neovim that referenced this pull request Jul 28, 2018

NVIM v0.3.1
FEATURES:
07499a8 neovim#8709 man.vim: C highlighting for EXAMPLES section
07f82ad neovim#8699 TUI: urxvt: also send xterm focus-reporting seqs
40911e4 neovim#8616 API: emit nvim_buf_lines_event from :terminal
c46997a neovim#8546 fillchars: Add "eob" flag

FIXES:
74d19f6 neovim#8576 startup: avoid blank stdin buffer if other files were opened
4874214 neovim#8737 Only waitpid() for processes that we care about
cd6e7e8 neovim#8743 Check all child processes for exit in SIGCHLD handler
c230ef2 neovim#8746 channel.c: Prevent channel_destroy_early() from freeing uninitialized rpc stuff
0ed8b12 neovim#8681 transstr_buf: fix length comparison
d241f27 neovim#8708 TUI: Fix standout mode
9afed40 neovim#8698 man.vim: fix for mandoc
e889640 neovim#8682 provider/node: npm --loglevel silent
1cbc830 neovim#8613 API: nvim_win_set_cursor: set curswant
bf6048e neovim#8628 checkhealth: Python: fix VIRTUAL_ENV check
3cc3506 neovim#8528 checkhealth: node.js: also search yarn

CHANGES:
b751449 neovim#8619 defaults: shortmess+=F
1248178 neovim#8578 highlight: high-priority CursorLine if fg is set.
01570f1 neovim#8726 terminal: handle &confirm and :confirm on unloading
56065bb neovim#8721 screen: truncate showmode messages
bf2460e neovim#7551 buffer: fix copying :setlocal options
c1c14fa neovim#8520 Ex mode: always "improved" (gQ)
050f397 neovim#7992 options: remove 'maxcombine` option (always 6)

INTERNAL:
463da84 neovim#7992 screen: use UTF-8 representation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment