[RFC] Add proper expression parser #7234

Merged
justinmk merged 116 commits into neovim:master from ZyX-I:expression-parser on Dec 9, 2017

6 participants
@ZyX-I
Contributor

ZyX-I commented Sep 4, 2017

Tasks for the parser:

  • Be part of the normal expression evaluation process.
  • Be part of the highlighting routines for <C-r>=, etc.
  • Be used to provide smarter completion.
  • Lambdas should hold an AST and not a `return {expr}` string.
  • External API: an ASYNC function that yields the expression AST (it should be possible to make the function completely thread-safe, not only ASYNC, if I get rid of things like vim_isIDc).
  • A small isolated program which does AST parsing, likely even without printing, for fuzzers, or in case I ever have time to use tools like KLEE.

Minimal tasks that must be completed before the PR can be merged: highlighting and API.

Constraints:

  • Of course, it should not crash.
  • The parser should make it possible to parse, highlight and complete any byte sequence the user may input. Highlighting and completion should be sensible enough for any practical input (e.g. I may input <C-r>=(|) with | being the cursor position: this is not a valid expression, but it is what the parser will at some point receive for highlighting); the AST for invalid input should be just sensible enough to allow sensible highlighting and completion.
  • The parser should yield good error messages.
  • No recursion.
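One common way to meet the “No recursion” constraint is to keep operators and operands on explicit stacks (the classic shunting-yard approach) instead of recursing once per precedence level. The sketch below is a hypothetical illustration, not the parser from this PR: it only handles non-negative integers, binary `+`, `-`, `*` and parentheses (no unary operators), and the name `eval_no_recursion` and the fixed `MAX_DEPTH` limit are assumptions.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical limit: nesting is bounded by these arrays, not the C stack. */
#define MAX_DEPTH 64

/* '*' binds tighter than '+' and '-'. */
static int prec(char op) { return op == '*' ? 2 : 1; }

/* Pops two operands, applies op, pushes the result. */
static bool apply(long *vals, int *nvals, char op)
{
  if (*nvals < 2) return false;
  long rhs = vals[--*nvals];
  long lhs = vals[--*nvals];
  vals[(*nvals)++] = op == '+' ? lhs + rhs : op == '-' ? lhs - rhs : lhs * rhs;
  return true;
}

/* Returns true and stores the value on success, false on malformed input. */
bool eval_no_recursion(const char *s, long *result)
{
  long vals[MAX_DEPTH];
  char ops[MAX_DEPTH];
  int nvals = 0, nops = 0;
  for (; *s; s++) {
    if (isspace((unsigned char)*s)) continue;
    if (isdigit((unsigned char)*s)) {
      long v = 0;
      while (isdigit((unsigned char)*s)) v = v * 10 + (*s++ - '0');
      s--;  /* the for loop advances past the last digit */
      if (nvals == MAX_DEPTH) return false;
      vals[nvals++] = v;
    } else if (*s == '(') {
      if (nops == MAX_DEPTH) return false;
      ops[nops++] = '(';
    } else if (*s == ')') {
      while (nops > 0 && ops[nops - 1] != '(')
        if (!apply(vals, &nvals, ops[--nops])) return false;
      if (nops == 0) return false;  /* unbalanced ')' */
      nops--;                       /* drop the matching '(' */
    } else if (*s == '+' || *s == '-' || *s == '*') {
      /* Popping equal precedence first makes operators left-associative. */
      while (nops > 0 && ops[nops - 1] != '(' && prec(ops[nops - 1]) >= prec(*s))
        if (!apply(vals, &nvals, ops[--nops])) return false;
      if (nops == MAX_DEPTH) return false;
      ops[nops++] = *s;
    } else {
      return false;
    }
  }
  while (nops > 0) {
    if (ops[nops - 1] == '(') return false;  /* unbalanced '(' */
    if (!apply(vals, &nvals, ops[--nops])) return false;
  }
  if (nvals != 1) return false;
  *result = vals[0];
  return true;
}
```

Because depth is limited by the explicit arrays, pathological nesting produces an error instead of a stack overflow, which also fits the “it should not crash” constraint.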
@ZyX-I

Contributor

ZyX-I commented Sep 4, 2017

In its current state it is not going to pass any tests due to unhandled values in switch().

@marvim marvim added the WIP label Sep 4, 2017

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch 4 times, most recently from c5b5cc9 to 6a98fab Sep 4, 2017

@ZyX-I

Contributor

ZyX-I commented Sep 10, 2017

A minimal subset of the language is finished: supporting unary/binary plus and call/nesting parentheses alongside a single actual value type (register) makes me think I can proceed with extending the current parser code to accept the rest of the language instead of inventing something else. Next, in any order:

  1. checking whether I got operator precedence handling right by adding multiplication (other operators may wait until 3. is sorted out);
  2. the comma operator (it is simpler to express a multi-argument call as Call(func, Comma(Comma(arg1, arg2), arg3)), then reuse the comma code for list and dictionary literals with minimal modifications);
  3. handling {, the greatest ambiguity in VimL expressions: it is used for lambdas, dictionaries and curly-braces names at once, and in some positions any one of the listed entities may appear;
  4. actually getting completion working: my idea is that if completion is requested, the parser function is supplied with the text up to the cursor, and at EOC the parser sets up completion by analyzing what is on the AST stack and what the previous token was;
  5. the colon operator (either Ternary(cond, Colon()) or in dictionaries; same reasoning as for commas).
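The left-nested shape from item 2, Call(func, Comma(Comma(arg1, arg2), arg3)), falls out naturally from folding the arguments left to right. A minimal sketch, assuming hypothetical `Node`/`NodeKind` types and a caller-provided node pool (these are not the PR's actual AST structures); the recursive `render` exists only to print the result for checking and is not part of the folding.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds; not the PR's actual AST node types. */
typedef enum { kNodeName, kNodeComma, kNodeCall } NodeKind;

typedef struct Node {
  NodeKind kind;
  const char *name;          /* used only by kNodeName */
  struct Node *children[2];  /* left and right operand; right may be NULL */
} Node;

/* Builds Call(func, Comma(Comma(a1, a2), a3)) for args {a1, a2, a3}.
 * pool must have room for max(n, 1) nodes: n - 1 commas plus the call. */
static Node *make_call(Node *pool, Node *func, Node **args, size_t n)
{
  Node *acc = n > 0 ? args[0] : NULL;
  size_t used = 0;
  for (size_t i = 1; i < n; i++) {
    pool[used] = (Node){ .kind = kNodeComma, .children = { acc, args[i] } };
    acc = &pool[used++];
  }
  pool[used] = (Node){ .kind = kNodeCall, .children = { func, acc } };
  return &pool[used];
}

/* Appends s to the NUL-terminated buf without overflowing size bytes. */
static void append(char *buf, size_t size, const char *s)
{
  size_t len = strlen(buf);
  if (len + 1 < size) strncat(buf, s, size - len - 1);
}

/* Renders e.g. "Call(f, Comma(Comma(a, b), c))"; buf must start as "". */
static void render(const Node *node, char *buf, size_t size)
{
  if (node == NULL) { append(buf, size, "<none>"); return; }
  if (node->kind == kNodeName) { append(buf, size, node->name); return; }
  append(buf, size, node->kind == kNodeCall ? "Call(" : "Comma(");
  render(node->children[0], buf, size);
  append(buf, size, ", ");
  render(node->children[1], buf, size);
  append(buf, size, ")");
}
```

The same fold would serve list literals, which is the reuse item 2 describes; a single argument yields Call(f, a) with no Comma node at all.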

The rest of the parsing should be handled only after implementing the above.

(BTW, I think it may be a good idea, and relatively easy, to test the lexer with KLEE. The only thing I really do not like is that KLEE does not support LLVM 4.0 yet, and LLVM 3.x is not slotted and thus cannot be installed alongside 4.0, so KLEE could either only run in Docker on my machine or I would need to compile LLVM myself.)

@ZyX-I

Contributor

ZyX-I commented Sep 11, 2017

Managed to get KLEE to run some tests, though it has not found anything useful yet. At least I could make sure that the lexer always consumes something (i.e. it will not make the parser fall into an infinite loop) and does not crash. It should be more useful for the parser (once that is finished); also, turning code which compiles for KLEE into an executable suitable for fuzzing is much easier.
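The property checked here, that every lexer call consumes at least one byte, can also be written as a plain predicate that a fuzzer or an exhaustive loop could drive. This is a hypothetical sketch: `toy_next_token` stands in for the real lexer (viml_pexpr_next_token is not reproduced), and under KLEE the input buffer would instead be marked symbolic, for example with `klee_make_symbolic()`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy lexer: returns how many bytes the next token occupies. */
static size_t toy_next_token(const char *s, size_t len)
{
  size_t i = 0;
  if (len == 0) return 0;
  if (isspace((unsigned char)s[0])) {
    while (i < len && isspace((unsigned char)s[i])) i++;
  } else if (isalnum((unsigned char)s[0]) || s[0] == '_') {
    while (i < len && (isalnum((unsigned char)s[i]) || s[i] == '_')) i++;
  } else {
    i = 1;  /* any other byte, including NUL, is a one-byte token */
  }
  return i;
}

/* True if every call consumes at least one byte and never more than what
 * is left, which guarantees that a loop over the input terminates. */
bool lexer_always_progresses(const char *s, size_t len)
{
  size_t pos = 0;
  while (pos < len) {
    size_t n = toy_next_token(s + pos, len - pos);
    if (n == 0 || n > len - pos) return false;
    pos += n;
  }
  return true;
}
```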

@@ -81,6 +81,8 @@ foreach(subdir
event
eval
lua
viml

@bfredl

bfredl Sep 12, 2017

Member

What is the logic behind the eval / viml split? Is viml for "abstract" parsing/manipulation etc. while eval is for actual evaluation/execution? Or is it just "old" vs "new" code? Is it important enough to warrant the split?

@ZyX-I

ZyX-I Sep 12, 2017

Contributor

eval was intended for expression evaluation only. viml is for the whole of VimL, including Ex commands. The “expression evaluation” parts like parsing and execution are going to move to viml/ for sure; the core reason is that after all of VimL is ported to the new parser, the difference between expression evaluation and Ex command evaluation should be gone (both will use a new VM, whatever it will actually be).

What is left of eval is:

  1. typval_T and related value manipulation. It is more logical to move that next to the VM, to viml/ as well (just moving eval/typval* to viml/typval* once viml/ actually contains some executor; not much hassle).
  2. A big bunch of f_… functions. My idea is that the most logical choice is treating them just like the ex_… functions used for Ex commands: scattered around the codebase, near the domain these functions belong to. But that is too heavy a refactoring, so probably somebody will move them to a separate file (as part of #5081) where they will live from then on. The file may be moved later.
  3. Utilities for the things mentioned above: the currently existing executing parser, functions, etc.

So it is just “old” vs “new” code, with “new” not using “eval” because it is not only about expressions. I do not want expression parsing in eval/ because it is going to be fairly isolated from what is already there, while sharing some things with the Ex command parser (for which there is no preexisting directory in any case).

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch from 987bfa6 to a92378e Sep 12, 2017

@ZyX-I

Contributor

ZyX-I commented Sep 12, 2017

KLEE finally finished; no errors were found in the lexer. For some reason the run time was not reported automatically (zsh has a feature that makes it act as if a command was run under time … if it took more than $REPORTTIME seconds to finish, which I have set to 5, with some other exceptions which should not apply to my script), but the first file generated by the script (build/klee/a.bc) was created Sep 12 01:21 and the last one (build/klee/out/info) Sep 12 16:04: almost 15 hours.

@ZyX-I

Contributor

ZyX-I commented Sep 12, 2017

In total it generated 53 873 “test cases”, 221 MiB according to du -hs. The generated directory compresses very well: the attached tar.xz archive is only 1.3 MiB. The archive is attached in case somebody is interested in what KLEE generates, though it is not very interesting: lots of “test cases”, the LLVM assembly used (plus bitcode), three things that look like log files, some coverage data in text format (not very readable) and KLEE's own profiling data (rather top-level: the point was not to profile KLEE itself, after all).

klee.tar.xz.not.zip

@justinmk

Member

justinmk commented Sep 15, 2017

@ZyX-I do you plan to expose the AST to plugins somehow, e.g. as a dictionary? Or is that possible?

@ZyX-I

Contributor

ZyX-I commented Sep 17, 2017

I plan to expose it, but only the AST (not the lexer). In the future via two functions: one for parsing expressions exclusively, like eval() (it has some differences regarding newline handling), and another for parsing Ex commands.

And only via the API, relying on the VimL function generator to make it available to VimL. Plus the parser may be exposed via an ASYNC function.

@@ -452,6 +464,8 @@ LexExprToken viml_pexpr_next_token(ParserState *const pstate, const bool peek)
//
// Used highlighting groups and assumed linkage:
//
// NVimInternalError -> highlight as fg:red/bg:red

@oni-link

oni-link Sep 18, 2017

Contributor

Is this correct, fg and bg having the same color?

@ZyX-I

ZyX-I Sep 18, 2017

Contributor

As an internal error it should be something that makes the user report it. Given that I do not want to spawn disturbing error messages, an “invisible” but outstanding character (a solid red cell) is a good alternative.

@ZyX-I

ZyX-I Sep 18, 2017

Contributor

Note that this is also the only group which I did not bother to (plan to) link to some preexisting group.

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch 3 times, most recently from 32a84f9 to 9ccafa8 Sep 18, 2017

@ZyX-I

This comment has been minimized.

Contributor

ZyX-I commented Sep 24, 2017

Got curly braces working (dictionaries, curly-braces names in value position, lambdas, but not complex curly-braces-name setups like foo{"bar"}) and commas (minus multi-argument calls: they should work but are not tested yet); the operator priority code seems to work as well (minus the “no associativity” part). The most problematic parts of the parser should be coded now, though there are still challenges like highlighting string literals, floating-point numbers, the ternary operator or non-associative operators (and, of course, making it pass the linter, which will not be happy with a big function there).

And no completion code yet, though I have an idea of how to code it that needs only minor modifications to what is already there.

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch 2 times, most recently from 8e677d4 to 549be54 Sep 25, 2017

@oni-link

Contributor

oni-link commented Sep 27, 2017

Error while building with ninja:

CMake Error at /home/oni-link/git/neovim/cmake/RunXgettext.cmake:13 (message):
  xgettext failed to run correctly: /usr/bin/xgettext: Non-ASCII string at
  ../viml/parser/expressions.c:1274.

                     Please specify the source encoding through --from-code.

The message contains a non-ASCII apostrophe (’):

                ERROR_FROM_NODE_AND_MSG(
                    new_top_node,
                    _("E15: Don’t know what figure brace means: %.*s"));
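xgettext rejects the message because ’ (U+2019) is stored as the three UTF-8 bytes E2 80 99, and xgettext assumes ASCII sources unless told otherwise via its `--from-code` option. Either replacing ’ with an ASCII `'` or passing `--from-code=UTF-8` should avoid the error; which fix the PR actually adopted is not shown here. A small hypothetical helper (`first_non_ascii` is not an existing Neovim function) for locating such bytes in a message:

```c
#include <stddef.h>

/* Returns the byte offset of the first byte >= 0x80 in the NUL-terminated
 * string s, or -1 if s is pure ASCII. */
ptrdiff_t first_non_ascii(const char *s)
{
  for (const char *p = s; *p != '\0'; p++) {
    if ((unsigned char)*p >= 0x80) return p - s;
  }
  return -1;
}
```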

ZyX-I added some commits Sep 28, 2017

test/helpers: Add format_string and format_luav
First intended to provide %r functionality like in Python (and also support for 
%*.*s, but this was not checked), second adds nice table formatting for use in 
cases similar to screen:snapshot_util().

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch 3 times, most recently from 8c06e1d to c0f3304 Sep 28, 2017

@ZyX-I

Contributor

ZyX-I commented Oct 2, 2017

Got working: lambdas, dictionaries, curly-braces names, function calls and nesting parentheses, the ternary operator, comparisons, unary/binary plus, and operator associativity in general.

What is left specifically for the parser:

  • Dot handling.
  • String literals handling.
  • Finishing the rest of the “not interesting” operators.
  • Determining completion context.
  • Making all that pass the linter (it will not be happy to see that huge parser function).
  • KLEE check (going to be very time-consuming; currently rerunning it for the lexer).

ZyX-I added some commits Nov 29, 2017

keymap: Do not use vim_isIDc in keymap.c
Note: there are three changes to ascii_isident. Reverting the first two (in find_special_key and the first one in get_special_key_code) normally makes the new test fail with an empty &isident, but reverting the third does not. Hence adding `>` to &isident.

Ref vim/vim#2389.
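The motivation for dropping vim_isIDc, also mentioned in the task list at the top (making the parser thread-safe), is that its answer depends on the user-settable &isident option, while an ASCII-only check is a pure function of its argument. A hypothetical sketch of that contrast: `isident_table` models option-derived global state, and `sketch_ascii_isident` approximates the idea behind Neovim's ascii_isident rather than copying its definition.

```c
#include <stdbool.h>

/* Models state derived from the 'isident' option: `:set isident=...` can
 * change it at any time, so lookups into it are not pure. Hypothetical. */
static bool isident_table[256];

/* Option-dependent check, in the style of a vim_isIDc-like table lookup. */
static bool table_isident(int c)
{
  return c > 0 && c < 256 && isident_table[c];
}

/* Option-independent check: ASCII letters, digits and '_' only. */
static bool sketch_ascii_isident(int c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
         || (c >= '0' && c <= '9') || c == '_';
}
```

Changing the table flips the first answer but never the second, which is why the new test can run with an empty or unusual &isident.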

@ZyX-I ZyX-I force-pushed the ZyX-I:expression-parser branch from 277ecef to 0b4054e Nov 30, 2017

@ZyX-I

Contributor

ZyX-I commented Nov 30, 2017

Got this after building libnvim-test on Travis:

+asan_check /home/travis/build/neovim/neovim/build/log
+check_logs /home/travis/build/neovim/neovim/build/log '*san.*'
++find /home/travis/build/neovim/neovim/build/log -type f -name '*san.*'
+for log in '$(find "${1}" -type f -name "${2}")'
+sed -i /home/travis/build/neovim/neovim/build/log/ubsan.28775 -e '/Warning: noted but unhandled ioctl/d' -e '/could cause spurious value errors to appear/d' -e '/See README_MISSING_SYSCALL_OR_IOCTL for guidance/d'
+local err=
++find /home/travis/build/neovim/neovim/build/log -type f -name '*san.*' -size +0
+for log in '$(find "${1}" -type f -name "${2}" -size +0)'
+cat /home/travis/build/neovim/neovim/build/log/ubsan.28775
==28775==LeakSanitizer has encountered a fatal error.
==28775==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==28775==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

What is that?

In any case, I am restarting Travis on the assumption that the problem is temporary. The other CIs succeeded.

@ZyX-I

Contributor

ZyX-I commented Nov 30, 2017

With more verbose logging:

+cat /home/travis/build/neovim/neovim/build/log/ubsan.28765
==28765==AddressSanitizer: failed to intercept '__isoc99_printf'
==28765==AddressSanitizer: failed to intercept '__isoc99_sprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_snprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_fprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_vprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_vsprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_vsnprintf'
==28765==AddressSanitizer: failed to intercept '__isoc99_vfprintf'
==28765==AddressSanitizer: failed to intercept '__cxa_throw'
==28765==AddressSanitizer: libc interceptors initialized
|| `[0x10007fff8000, 0x7fffffffffff]` || HighMem    ||
|| `[0x02008fff7000, 0x10007fff7fff]` || HighShadow ||
|| `[0x00008fff7000, 0x02008fff6fff]` || ShadowGap  ||
|| `[0x00007fff8000, 0x00008fff6fff]` || LowShadow  ||
|| `[0x000000000000, 0x00007fff7fff]` || LowMem     ||
MemToShadow(shadow): 0x00008fff7000 0x000091ff6dff 0x004091ff6e00 0x02008fff6fff
redzone=16
max_redzone=2048
quarantine_size_mb=256M
thread_local_quarantine_size_kb=1024K
malloc_context_size=30
SHADOW_SCALE: 3
SHADOW_GRANULARITY: 8
SHADOW_OFFSET: 0x7fff8000
==28765==Installed the sigaction for signal 11
==28765==Installed the sigaction for signal 7
==28765==Installed the sigaction for signal 8
==28765==T0: stack [0x7ffdf1066000,0x7ffdf1866000) size 0x800000; local=0x7ffdf1864260
==28765==LeakSanitizer: Dynamic linker not found. TLS will not be handled correctly.
==28765==AddressSanitizer Init done
==28766==Could not attach to thread 28765 (errno 1).
==28766==Failed suspending threads.
==28765==LeakSanitizer has encountered a fatal error.
==28765==HINT: For debugging, try setting environment variable LSAN_OPTIONS=verbosity=1:log_threads=1
==28765==HINT: LeakSanitizer does not work under ptrace (strace, gdb, etc)

ZyX-I added some commits Dec 3, 2017

Merge branch 'master' into expression-parser
Hoping that this could fix the LSAN issue: no idea what it is talking about.
Revert "fix! set lsan options"
This reverts commit 6299332.
@ZyX-I

Contributor

ZyX-I commented Dec 3, 2017

The problem somehow solved itself.

@ZyX-I

Contributor

ZyX-I commented Dec 3, 2017

New build failure looks unrelated:

./test/functional/helpers.lua:268: 
retry() attempts: 2
./test/functional/ui/screen.lua:302: Row 2 did not match.
Expected:
  |{1: }                                                 |
  |*lost                                              |
  |gained                                            |
  |{4:~                                                 }|
  |{5:[No Name] [+]                                     }|
  |:                                                 |
  |{3:-- TERMINAL --}                                    |
Actual:
  |{1: }                                                 |
  |*gained                                            |
  |lost                                              |
  |gained                                            |
  |{5:[No Name] [+]                                     }|
  |:                                                 |
  |{3:-- TERMINAL --}                                    |

To print the expect() call that would assert the current screen state, use
screen:snapshot_util(). In case of non-deterministic failures, use
screen:redraw_debug() to show all intermediate screen states.  

stack traceback:
	./test/functional/helpers.lua:268: in function 'retry'
	...uild/neovim/neovim/test/functional/terminal/tui_spec.lua:300: in function <...uild/neovim/neovim/test/functional/terminal/tui_spec.lua:293>
@ZyX-I

Contributor

ZyX-I commented Dec 6, 2017

CI succeeded.

@justinmk

Member

justinmk commented Dec 6, 2017

FYI, the #7234 (comment) failure is my fault. I will make another change to the TUI startup and see if it continues; if it does, I will change the test to avoid the spurious failure.

@@ -10573,5 +10573,124 @@ This is not allowed when the textlock is active:
- closing a window or quitting Vim
- etc.

==============================================================================
13. Command-line expressions coloring *expr-coloring*

@justinmk

justinmk Dec 8, 2017

Member

@ZyX-I Is there a reason to call it "coloring" instead of "highlighting"?

Edit: I changed this in the merge commit.

@justinmk

Member

justinmk commented Dec 8, 2017

@ZyX-I what is your plan for command parsing or general VimL parsing? Will this be a new API function that accepts entire sources?

Is there a need for nvim_parse_*, or would it make sense to have just one API function, nvim_parse?

ptr++; \
} \
} while (0)
switch (pre) {

@oni-link

oni-link Dec 8, 2017

Contributor

The switch could be replaced with

vim_str2nr_dec: 
   PARSE_NUMBER(...);
   goto xxx;
vim_str2nr_bin: 
   PARSE_NUMBER(...);
   goto xxx;
vim_str2nr_oct:
   PARSE_NUMBER(...);
   goto xxx;
vim_str2nr_hex:
   PARSE_NUMBER(...);
xxx:

Edit: reordered labels in case `what` is invalid.
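A self-contained sketch of the label layout suggested above, assuming a hypothetical `parse_uint` that recognises 0x, 0b and leading-0 octal prefixes (this is not vim_str2nr's code, and it does no overflow checking): the prefix checks jump straight to a per-base label, every branch ends in one shared tail, and decimal is the fall-through default, so no switch is needed.

```c
#include <ctype.h>
#include <stdbool.h>

/* Parses an unsigned number; returns false on empty/invalid input or
 * trailing characters. Hypothetical example, no overflow detection. */
bool parse_uint(const char *s, unsigned long *out)
{
  const char *p = s;
  unsigned long n = 0;

  if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
    p += 2;
    goto parse_hex;
  }
  if (p[0] == '0' && (p[1] == 'b' || p[1] == 'B')) {
    p += 2;
    goto parse_bin;
  }
  if (p[0] == '0' && p[1] >= '0' && p[1] <= '7') {
    p += 1;
    goto parse_oct;
  }

  /* Decimal is the fall-through default, so it needs no label. */
  if (!isdigit((unsigned char)*p)) return false;
  while (isdigit((unsigned char)*p)) n = n * 10 + (unsigned long)(*p++ - '0');
  goto done;
parse_bin:
  if (*p != '0' && *p != '1') return false;
  while (*p == '0' || *p == '1') n = n * 2 + (unsigned long)(*p++ - '0');
  goto done;
parse_oct:
  while (*p >= '0' && *p <= '7') n = n * 8 + (unsigned long)(*p++ - '0');
  goto done;
parse_hex:
  if (!isxdigit((unsigned char)*p)) return false;
  while (isxdigit((unsigned char)*p)) {
    int c = (unsigned char)*p++;
    n = n * 16 + (unsigned long)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
  }
  /* Falls straight into the shared tail, like the last label above. */
done:
  if (*p != '\0') return false;  /* trailing characters */
  *out = n;
  return true;
}
```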

@ZyX-I

Contributor

ZyX-I commented Dec 8, 2017

@justinmk There are use cases for a separate expression parser in the API other than making it easier to test some things (I mean testing as a plugin developer; Neovim developers are always going to have unit tests available); you can see a list in the documentation of the new function. For the VimL parser I plan exactly an API function accepting the entire source, though internally it should be possible to save the state and use it later to proceed (needed for alternative syntax highlighting: if we have a parser and a highlighter in C code, why limit the highlighter to the command line only?).

@justinmk justinmk merged commit fbdc3ac into neovim:master Dec 9, 2017

4 checks passed

QuickBuild Build pr-7234 finished with status SUCCESSFUL
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
coverage/coveralls First build on master at 77.636%

@justinmk justinmk removed the RFC label Dec 9, 2017

justinmk added a commit that referenced this pull request Dec 9, 2017

@ZyX-I ZyX-I deleted the ZyX-I:expression-parser branch Dec 9, 2017

@justinmk justinmk modified the milestones: 0.2.4, 0.2.3 Mar 21, 2018

justinmk added a commit that referenced this pull request Jun 11, 2018

NVIM v0.3.0
FEATURES:
3cc7ebf #7234 built-in VimL expression parser
6a7c904 #4419 implement <Cmd> key to invoke command in any mode
b836328 #7679 'startup: treat stdin as text instead of commands'
58b210e :digraphs : highlight with hl-SpecialKey #2690
7a13611 #8276 'startup: Let `-s -` read from stdin'
1e71978 events: VimSuspend, VimResume #8280
1e7d5e8 #6272 'stdpath()'
f96d99a #8247 server: introduce --listen
e8c39f7 #8226 insert-mode: interpret unmapped META as ESC
98e7112 msg: do not scroll entire screen (#8088)
f72630b #8055 let negative 'writedelay' show all redraws
5d2dd2e win: has("wsl") on Windows Subsystem for Linux #7330
a4f6cec cmdline: CmdlineEnter and CmdlineLeave autocommands (#7422)
207b7ca #6844 channels: support buffered output and bytes sockets/stdio

API:
f85cbea #7917 API: buffer updates
418abfc #6743 API: list information about all channels/jobs.
36b2e3f #8375 API: nvim_get_commands
273d2cd #8329 API: Make nvim_set_option() update `:verbose set …`
8d40b36 #8371 API: more reliable/descriptive VimL errors
ebb1acb #8353 API: nvim_call_dict_function
9f994bb #8004 API: nvim_list_uis
3405704 #7520 API/UI: forward option updates to UIs
911b1e4 #7821 API: improve nvim_command_output

WINDOWS OS:
9cefd83 #8084, #8516 build/win: support MSVC
ee4e1fd win: Fix reading content from stdin (#8267)

TUI:
ffb8904 #8309 TUI: add support for mouse release events in urxvt
8d5a46e #8081 TUI: implement "standout" attribute
6071637 TUI: support TERM=konsole-256color
67848c0 #7653 TUI: report TUI info with -V3 ('verbose' >= 3)
3d0ee17 TUI/rxvt: enable focus-reporting
d109f56 #7640 TUI: 'term' option: reflect effective terminal behavior

FIXES:
ed6a113 #8273 'job-control: avoid kill-timer race'
4e02f1a #8107 'jobs: separate process-group'
451c48a terminal: flush vterm output buffer on pty output #8486
5d6732f :checkhealth fixes #8335
53f11dc #8218 'Fix errors reported by PVS'
d05712f inccommand: pause :terminal redraws (#8307)
51af911 inccommand: do not execute trailing commands #8256
84359a4 terminal: resize to the max dimensions (#8249)
d49c1dd #8228 Make vim_fgets() return the same values as in Vim
60e96a4 screen: winhl=Normal:Background should not override syntax (#8093)
0c59ac1 #5908 'shada: Also save numbered marks'
ba87a2c cscope: ignore EINTR while reading the prompt (#8079)
b1412dc #7971 ':terminal Enter/Leave should not increment jumplist'
3a5721e TUI: libtermkey: force CSI driver for mouse input #7948
6ff13d7 #7720 TUI: faster startup
1c6e956 #7862 TUI: fix resize-related segfaults
a58c909 #7676 TUI: always hide cursor when flushing, never flush buffers during unibilium output
303e1df #7624 TUI: disable BCE almost always
249bdb0 #7761 mark: Make sure that jumplist item will not have zero lnum
6f41ce0 #7704 macOS: Set $LANG based on the system locale
a043899 #7633 'Retry fgets on EINTR'

CHANGES:
ad60927 #8304 default to 'nofsync'
f3f1970 #8035 defaults: 'fillchars'
a6052c7 #7984 defaults: sidescroll=1
b69fa86 #7888 defaults: enable cscopeverbose
7c4bb23 defaults: do :filetype stuff unless explicitly "off"
2aa308c #5658 'Apply :lmap in macros'
8ce6393 terminal: Leave 'relativenumber' alone (#8360)
e46534b #4486 refactor: Remove maxmem, maxmemtot options
131aad9 win: defaults: 'shellcmdflag', 'shellxquote' #7343
c57d315 #8031 jobwait(): return -2 on interrupt also with timeout
6452831 clipboard: macOS: fallback to tmux if pbcopy is broken #7940
300d365 #7919 Make 'langnoremap' apply directly after a map
ada1956 #7880 'lua/executor: Remove lightuserdata'

INTERNAL:
de0a954 #7806 internal statistics for list impl
dee78a4 #7708 rewrite internal list impl
@justinmk

Member

justinmk commented on src/nvim/viml/parser/expressions.c in 9e72103 Oct 21, 2018

@ZyX-I can_be_ternary is not really used. Is it needed for future work, or can it be removed?

Contributor

ZyX-I replied Oct 21, 2018

I am not sure that it is not really used. But it either does serve its purpose right now or can be removed.
