
[RFC] Update to the experimental msgpack v5 branch and other msgpack-rpc improvements #1130

Merged
merged 25 commits into neovim:master from tarruda:msgpack-rpc-improvements on Sep 12, 2014

Conversation

@tarruda (Member) commented Aug 31, 2014

Msgpack v5 spec changes that will impact us:

  • RAW was renamed to STR, which is a byte array containing only utf-8 sequences. This is the reason why some msgpack libraries (e.g. node.js) fail to deserialize the API metadata.
  • BIN was added to represent arbitrary byte arrays. Since this is a new type code, only clients supporting v5 can understand it.
  • EXT was added to support custom API-specific types. This is great because we can use type codes for automatic client deserialization of Window, Buffer, etc. (no need to perform the metadata-signature-name-checking hack currently done in the python client). In particular, I intend to remove the typed arrays (BufferArray, WindowArray), which only exist as a hint mechanism for the python client. We can support typed arrays in static languages by adding more information to the API metadata (which is cleaner, since C doesn't support generics).

The msgpack-c branch supporting v5 is still beta, but I think it's best we start using it for two reasons:

  • Lots of libraries already support v5
  • Neovim is still in alpha state, so it's a good opportunity to help @redboltz test the branch before it's merged into mainline

Here's an overview of things that will be done in this PR:

  • Use the improved msgpack_unpacker_next function
  • Remove integer method ids (we only need one way of doing RPC)
  • Make method '0' also callable through the name "get_api_metadata"
  • Use msgpack EXT to serialize special integer types such as Buffer, Window, etc. (see the sketch after this list)
  • Remove typed arrays. With msgpack EXT we only need a generic array
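As a rough illustration of how the EXT-based serialization could look from a Python client's point of view (the type codes and the wrapper class below are invented for this sketch; the real codes would be taken from the API metadata returned by method '0'):

import msgpack

# Hypothetical EXT type codes, just for the sketch; the real values
# come from the API metadata.
BUFFER_EXT, WINDOW_EXT, TABPAGE_EXT = 0, 1, 2

class Buffer:
    def __init__(self, handle):
        self.handle = handle

def pack_special(obj):
    # Encode a Buffer as EXT(code, packed integer handle).
    if isinstance(obj, Buffer):
        return msgpack.ExtType(BUFFER_EXT, msgpack.packb(obj.handle))
    raise TypeError("unsupported type: %r" % (obj,))

def unpack_special(code, data):
    # Turn EXT payloads back into client-side wrapper objects.
    if code == BUFFER_EXT:
        return Buffer(msgpack.unpackb(data))
    return msgpack.ExtType(code, data)

packed = msgpack.packb(Buffer(3), default=pack_special)
print(msgpack.unpackb(packed, ext_hook=unpack_special).handle)  # 3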
@@ -91,8 +91,8 @@ if(USE_BUNDLED_MSGPACK)
       -DEXPECTED_MD5=${MSGPACK_MD5}
       -DTARGET=msgpack
       -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/DownloadAndExtractFile.cmake
-    CONFIGURE_COMMAND ${DEPS_BUILD_DIR}/src/msgpack/configure --disable-shared
-      --with-pic --prefix=${DEPS_INSTALL_DIR} CC=${CMAKE_C_COMPILER}
+    CONFIGURE_COMMAND cmake ${DEPS_BUILD_DIR}/src/msgpack -DMSGPACK_ENABLE_CXX=OFF -DCMAKE_INSTALL_PREFIX=${DEPS_INSTALL_DIR} -DCMAKE_C_COMPILER=${CMAKE_C_COMPILER}

@tarruda (Member) Aug 31, 2014

@jszakmeister is there a more idiomatic way of adding subprojects that also build with cmake?

@splinterofchaos (Member) commented Sep 1, 2014

Why do you use BIN over STR for strings?

I would prefer that if I requested a line from a buffer, neovim sent me a STR, but if I requested the metadata, neovim sent me a BIN to hint that I have to deserialize the data once more since it comes back in an encoded array.

@ZyX-I (Contributor) commented Sep 1, 2014

@splinterofchaos Buffer contents can include invalid UTF-8. What would Neovim do in the case of STR?

It is not guaranteed that a buffer contains valid UTF-8 sequences, or that it contains UTF-8 sequences at all (AFAIK the &encoding option was not dropped). So STR must not be used.
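To make the failure mode concrete, here is a small msgpack-python sketch showing why a STR payload that is not valid utf-8 breaks strict decoders (the raw= keyword belongs to newer msgpack-python releases; treat the snippet as illustrative):

import msgpack

# 0xa1 is a fixstr header of length 1; 0xff is not a valid utf-8 sequence.
payload = b"\xa1\xff"

try:
    # With raw=False, str payloads are decoded as utf-8, so this raises.
    msgpack.unpackb(payload, raw=False)
except UnicodeDecodeError as err:
    print("strict decoder rejects it:", err)

# Treating the same payload as opaque bytes always works.
print(msgpack.unpackb(payload, raw=True))  # b'\xff'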

@splinterofchaos (Member) commented Sep 1, 2014

Odd... encoding is a global option rather than buffer local. How do we know what encoding a file should be interpreted with--especially on the client end? Should a client be expected to handle all possible encodings?

It looks like if we wanted to give the client UTF-8 regardless of its internal encoding, we'd just have to send the data to the client through buf_write_bytes(), though I expect there's a good reason why we shouldn't.

Possibly related: #1008

@ZyX-I (Contributor) commented Sep 1, 2014


You are missing how Vim works. &encoding is the encoding of the internal representation. All text Vim works with has (or is considered to have) this encoding. When a buffer is read it is converted to &encoding, so there is no buffer-local &encoding.

This option is usually set once and forever. It has nothing to do with a buffer's own read (:e ++enc=..., &fileencodings) and write (&fileencoding) encodings or the terminal encoding: this is only the internal representation.

@equalsraf (Contributor) commented Sep 1, 2014

Assumption: Buffer contents are either UTF8 (i.e. &encoding) or something else which Neovim could not decode (i.e. binary/unsupported encoding)

Buffer related functions should use BIN since the data may not be UTF-8. Plugins would need to decode the bytearrays as UTF8 or treat it as binary depending on the buffer.

What is not clear to me is how Vim/plugins determine when the buffer is not in &encoding - fileencoding does not seem to matter, e.g. when I open a binary file I get utf8, or latin1 if the contents are invalid utf8.

Plugins will eventually detect an error when they attempt to convert to utf8, but it just seems redundant, since Neovim already had to read the file anyway.

@ZyX-I (Contributor) commented Sep 1, 2014


Buffer is in &encoding always (precisely it should always be treated as having &encoding). But it may contain bytes that are not valid in the given encoding.

I have no idea how these invalid bytes are represented when they come from a buffer read using some non-utf8 encoding that may have invalid bytes, though.

For binary files there is a special case: ++bin sets &binary and disables any encoding conversions, so this particular case is easily detected by checking the &l:binary option.

@equalsraf (Contributor) commented Sep 1, 2014

Quick testing with the python client: I've generated a small file filled with 0xFF. Vim considers the fileencoding to be latin1 and displays it as

ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ

The python code got this instead:

'\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xbf'

Now if we decode that as utf8 we actually get a python unicode string that makes sense

u'\xff\xff\xff\xff\xff\xff\xff\xff'

And if I send the data back the file is still filled with 0xFF.

So it seems Vim is actually escaping (not sure into what) the invalid binary sequences before passing them on; the client is getting valid utf8.

tarruda referenced this pull request in equalsraf/neovim Sep 1, 2014

Use msgpack 2.0 bin/str types
- In msgpack 2.0 the raw data type no longer exists; instead there is a
  distinction between strings (str) and binary arrays (bin). This commit
  changes all uses of (raw) to (str), with the exception of method 0, which
  returns (bin).
@ZyX-I (Contributor) commented Sep 1, 2014

Now if we decode that as utf8 we actually get a python unicode string that makes sense

Which makes perfect sense.

So it seems Vim is actually escaping (not sure into what) the invalid binary sequences before passing them on; the client is getting valid utf8.

You are wrong. 0xFF is valid in latin1. When Vim found invalid utf-8 it fell back to latin1 as described somewhere (start checking from :h 'fileencodings') and recoded latin1 to utf-8.

To get invalid 0xFF bytes you must open with ++enc=utf8. It is not escaping in any sense; it is conversion. If you had e.g. cp1251 in &fencs in place of latin1 you would have observed a different result.

(I am talking about Python2 Vim API though. I have not used Neovim Python client.)
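A quick Python illustration of the conversion being described (just to make the byte transformation concrete; this is not code from either client):

raw = b"\xff" * 4                # the file's actual bytes
text = raw.decode("latin1")      # the &fencs fallback: every byte is valid latin1
print(text)                      # ÿÿÿÿ
print(text.encode("utf-8"))      # b'\xc3\xbf\xc3\xbf\xc3\xbf\xc3\xbf', what the client received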

@equalsraf (Contributor) commented Sep 1, 2014

You are wrong. 0xFF is valid in latin1. When Vim found invalid utf-8 it fell back to latin1 as described somewhere (start checking from :h 'fileencodings') and recoded latin1 to utf-8.

I see, thanks. I'm getting an ILLEGAL BYTE message now. Some more testing:

For ++enc=utf8:

  • Neovim displays a question mark in the position of the invalid chars (?)
  • The python client gets a valid utf8 string (with a question mark in the location of the invalid byte)

But if I do ++enc=utf8 ++bad=keep then

  • Neovim displays <ff>
  • I do get an invalid sequence at the python client, as expected
@tarruda (Member) commented Sep 1, 2014

From what I understand about the STR type, it is just a hint mechanism used by msgpack, as if it were saying: "This byte array probably contains valid utf-8 sequences". So in a msgpack implementation for a language that uses utf-16 or utf-32 to represent strings, the byte array can be safely decoded into the native string type.

So if a byte array with the STR type doesn't contain a valid utf-8 string (assuming vim doesn't escape anything), the client would get a failure when decoding the data.

So it seems to me it's much safer to simply use the BIN type, which would work as expected in languages that use 8-bit clean strings (python2, lua...) and would be deserialized into "native byte array types" in languages that assume an encoding for their normal strings (Buffer for node.js, b'' for python3).

@equalsraf we can still get unicode strings in a python 3 client by first trying to decode as utf-8, falling back to bytestrings otherwise.
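A minimal sketch of that fallback for a Python 3 client (illustrative only; the real python client may handle this differently):

def decode_if_possible(value):
    # API strings arrive as byte arrays; try utf-8 first and fall back
    # to the raw bytes when the data is not valid utf-8.
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value
    return value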

@equalsraf (Contributor) commented Sep 1, 2014

So it seems to me it's much safer to simply use the BIN type, which would work as expected in languages that use 8-bit clean strings (python2, lua...) and would be deserialized into "native byte array types" in languages that assume an encoding for their normal strings (Buffer for node.js, b'' for python3).

I was just trying to figure out if there was already any vim setting that plugins could use to know in advance if it was safe to decode buffer contents as utf8.

This only applies to buffer contents, right? The other API methods should be returning utf8 strings. Can we have a Binary type (encoded as msgpack BIN) for this case, and use String (as STR) for the remainder of the API?

@tarruda (Member) commented Sep 1, 2014

Can we have a Binary type (encoded as msgpack BIN) for this case, and use String (as STR) for the remainder of the API?

Yes, this could be useful, but I'd like to leave String as the byte array type and create an extra UTF8String type for strings that will contain textual data. The reason is simple: utf-8 strings are a particular case of byte strings, so it makes more sense to assume a string can contain arbitrary data by default.

For reference, here are some pretty solid arguments by @mitsuhiko explaining why assuming everything is unicode (the python3 approach) is simply wrong. The article focuses on python 3, but it makes a strong point about why we shouldn't have a type named Binary.

@ZyX-I (Contributor) commented Sep 1, 2014


We have not dropped &encoding. So one must try to decode as &encoding, not as utf-8. It is also a bad idea to fall back to byte strings: if you want to be compatible, throw the usual decode error (Vim does exactly this, and that is why my APIs use byte strings). If you do not, use byte strings always and encourage programmers to deal with errors on their own. Falling back to bytes() only means that Python 3 clients will raise even less expected exceptions in some distant pieces of code than the current decode errors do when trying to use vim.current.buffer.name or buffer slices.
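A sketch of the approach described above, assuming the client can query the user's &encoding (the option-fetching call in the usage comment is illustrative, not a specific client API):

def decode_api_string(data, encoding):
    # Decode using the user's &encoding rather than assuming utf-8.
    # A decode failure here is a real error the plugin author should see,
    # not something to silently swallow by falling back to bytes().
    return data.decode(encoding)

# Usage sketch (names illustrative):
#   enc = get_option("encoding")          # e.g. "utf-8", "latin1", "cp1251"
#   name = decode_api_string(buffer_name_bytes, enc)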



@ZyX-I (Contributor) commented Sep 1, 2014


Wrong! Buffer names are not unicode. Environment variables are not unicode. Every single string from vim.eval is not unicode. I have read this article too, and can say that with languages like VimL, or software like Vim, it quickly becomes obvious once you try to think about possible edge cases.

So I see that STR is hardly ever useful. Usually you use either pure ASCII (e.g. for "string bitfields" like &cpoptions or the 2nd argument to feedkeys()) or a byte string that may or may not be valid utf-8.

@tarruda (Member) commented Sep 1, 2014

Wrong! Buffer names are not unicode. Environment variables are not unicode. Every single string from vim.eval is not unicode. I have read this article too and can say that with languages like VimL or software like Vim it quickly becomes obvious once you try to think about possible edge cases.

That also shows that @frsyuki's first instinct was right. While it may sometimes be useful to have a type code for utf-8, I think the correct approach would be to give BIN the same type code as RAW, so older msgpack-rpc servers would be compatible with newer msgpack clients.

So I see that STR is hardly ever useful. Usually you use either pure ASCII (e.g. for "string bitfields" like &cpoptions or the 2nd argument to feedkeys()) or a byte string that may or may not be valid utf-8.

If that's the case, we'll continue with String for everything and add a UTF8String type later if it's ever needed.

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from 478e328 to 073b28b Sep 1, 2014

@splinterofchaos (Member) commented Sep 1, 2014

I think the correct approach would be to give BIN the same type code as RAW so older msgpack-rpc servers would be compatible with newer msgpack clients.

STR has the same code as RAW. BIN is the new value the current clients won't understand.

msgpack/msgpack-c@d70c44b

I won't get into a discussion about encodings because I know next-to-nothing about it, so I have just a few ignorant thoughts to leave here and I won't try to defend them or claim I'm right.

The clients I've seen thus far have been either atheistic (everything is ASCII) or agnostic (no thought put into encoding). C++ has no UTF-8 support or any sort of encoding outside of locales. The closest it gets is std::wstring, using wchar_t, which, if I understand correctly, assumes that all unicode is UTF-16. The C++ version of msgpack-c will expect me to use std::vector<char> to receive BIN transmissions. Whether we call it STR or BIN, the data transmitted does not change.

I would prefer to think of STR as having a semantic meaning like "something the user wants to see" (even if displaying a binary file) rather than some technical UTF-8 encoding -- if reading this discussion has taught me anything, it's that you can't bank on UTF-8, ever, anyway. And similarly, BIN to me means "something to hide from the user".

@equalsraf (Contributor) commented Sep 1, 2014

Wrong! Buffer names are not unicode. Environment variables are not unicode. Every single string from vim.eval is not unicode.

True they are not, but Neovim is supposed to represent them internally as UTF8, and pass them on as such - and if that is not the case then either fail early and not pass them at all, or make amends (like using ? to replace invalid bytes). So even if the input was not valid UTF8, by the time it gets passed out it actually is.

My expectation was not for all data passed in the API to be encoding-aware; I'm fine with bytestrings, and painfully aware of path/shell-expansion issues that I don't want to go into. In some cases it seems Neovim already does some mangling to convert to &encoding, so I wanted some clarification to try to identify those cases.

That also shows that @frsyuki's first instinct was right. While it may sometimes be useful to have a type code for utf-8, I think the correct approach would be to give BIN the same type code as RAW so older msgpack-rpc servers would be compatible with newer msgpack clients.

Side note: he is right in that the full Unicode approach of python3 ran into a lot of weird issues (surrogates being at the top of my list). The problem with Python2 is the complete opposite: you never knew beforehand what you had in your hands when you got a string, and error detection only came too late, when you tried to encode it (and late error detection led to some very poor solutions).

@tarruda just leave String as RAW; the semantics sound confusing though, so maybe rename String to Binary/BinaryString.

@equalsraf (Contributor) commented Sep 1, 2014

I would prefer to think of STR as having a semantic meaning like "something the user wants to see" rather than some technical UTF-8 encoding, which if reading this discussion has taught me anything, it's that you can't bank on UTF-8, ever, anyway. And similarly, BIN to me means "something to hide from the user".

Yes, that would be it. A more specific definition would be: "this string can be printed in encoding X for sure; I have no idea about Y != X". Unicode representations may help in finding out whether the string can also be printed in encoding Y, but they don't do any miracles.

So for printing purposes it is not unusual to see some string mangling for unsupported chars (square chars, or question marks) - but if you want to write down a string you can't print, you have a problem: you need to represent it as binary (e.g. filenames have no encoding, and the shell expands filenames, which means the arguments your program receives can also be binary; same for environment variables).

@ZyX-I (Contributor) commented Sep 1, 2014

True they are not, but Neovim is supposed to represent them internally as UTF8, and pass them on as such - and if that is not the case then either fail early and not pass them at all, or make amends (like using ? to replace invalid bytes).

I would say it is supposed to represent them internally in any way it can pass to (f)open or similar functions. Which effectively means that bytes() is the only choice. Names are always treated as binary strings until they have to be displayed, at least on *nix.

Note: there is no ++fnameenc!


@myitcv (Member) commented Sep 2, 2014

👍 for an approach that considers the buffer as a byte array, and an API that reflects this treatment. Related to #1114

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from 90132bf to 6ea9f58 Sep 3, 2014

@tarruda tarruda changed the title from "[WIP] Update to the experimental msgpack 2.0 branch" to "[WIP] Update to the experimental msgpack 2.0 branch and other msgpack-rpc improvements" Sep 3, 2014

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from e5d6dde to ab5c0a3 Sep 3, 2014

equalsraf added a commit to equalsraf/neovim-qt that referenced this pull request Sep 3, 2014

Neovim *Strings* are now QByteArrays not QString
- Following a discussion on neovim/neovim#1130, several methods in the
  API cannot guarantee that the data encoding is actually &encoding; for
  now at least, all methods are considered to return byte arrays.
- ready() signal waits for &encoding
- new methods encode()/decode() to convert strings using &encoding

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from 6f67d03 to 002753f Sep 3, 2014

tarruda added some commits Sep 9, 2014

api/msgpack-rpc: Refactor msgpack_rpc_helpers.{c,h}
- Move helpers that are specific to API types to api/private/helpers.{c,h}
- Include headers with generated declarations
- Delete unused macros
main: Rename --embedded-mode and --api-msgpack-metadata options
--embedded-mode        -> --embed
--api-msgpack-metadata -> --api-info
wstream: Fix close/free
The current code was leading to an invalid free when the wstream was closed
api/msgpack-rpc: Implement `channel_close` and expose to vimscript
Simple function for closing a channel by id
provider: Major refactor
- Providers for features are now registered as a unit. For example, instead of
  calling `register_provider("clipboard_get")` and
  `register_provider("clipboard_set")`, clients call
  `register_provider("clipboard")` and nvim will assume it implements all
  methods of the "clipboard" feature
- Bootstrapping code was removed. With the `api_spawn` function exposed to
  vimscript, it's no longer necessary and will be handled by plugins
  distributed with nvim.
- Now the `has` function will return true if there's a live channel that
  has registered as a provider for the feature.
- 'initpython'/'initclipboard' options were removed
- A new API function was exposed: `vim_discover_features` which returns an
  object with information about pluggable features such as 'python' or
  'clipboard'
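For reference, registering a provider over the channel is an ordinary msgpack-rpc request; a hedged Python sketch of the frame (whether `register_provider` is invoked with exactly these parameters is an assumption):

import msgpack

# msgpack-rpc request frame: [type=0, msgid, method, params]
request = msgpack.packb([0, 1, "register_provider", ["clipboard"]])
# Write `request` to the channel connected to nvim (stdio or a socket);
# nvim answers with a response frame of the form [1, 1, error, result].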

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from f30db13 to 3f7162f Sep 12, 2014

" To load the python host a python 2 executable must be available
if exists('python_interpreter')
\ && executable(g:python_interpreter)
\ && match(system(g:python_interpreter.' --version'), "Python 2") == 0

@justinmk (Member) Sep 12, 2014

You could use this instead:

python -c "import sys; print(sys.version_info.major);"

That will print 2 or 3.
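For completeness, roughly what that check looks like when driven from another process (illustrative only):

import subprocess

out = subprocess.check_output(
    ["python", "-c", "import sys; print(sys.version_info.major)"])
is_python2 = out.strip() == b"2"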

@tarruda (Member) Sep 12, 2014

👍 cool

tarruda added some commits Sep 10, 2014

api: Implement `vim_report_error` function
This function is used to report errors caused by remote functions called by
channel_send_call
api/msgpack-rpc: Refactor metadata object construction
Instead of building all metadata from msgpack-gen.lua, we now merge the
generated part with manual information (such as types and features). The
metadata is accessible through the api method `vim_get_api_info`.

This was done to simplify the generator while also increasing flexibility (by
being able to add more metadata)
api metadata: Allow typed container information in api functions
Adapt gendeclarations.lua/msgpack-gen.lua to allow the `ArrayOf(...)` and
`DictionaryOf(...)` types in function headers. These are simple macros that
expand to Array and Dictionary respectively, but the information is kept in the
metadata object, which is useful for building clients in statically typed
languages.
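A sketch of how a client generator might consume those type annotations (the parsing rules are a guess based on the examples discussed later in this thread, not a formal grammar):

import re

def parse_type(name):
    # "ArrayOf(Integer)"    -> ("Array", "Integer", None)
    # "ArrayOf(Integer, 2)" -> ("Array", "Integer", 2)
    # anything else         -> (name, None, None)
    m = re.match(r"(ArrayOf|DictionaryOf)\((\w+)(?:,\s*(\d+))?\)", name)
    if not m:
        return (name, None, None)
    container = "Array" if m.group(1) == "ArrayOf" else "Dictionary"
    size = int(m.group(3)) if m.group(3) else None
    return (container, m.group(2), size)

print(parse_type("ArrayOf(Integer, 2)"))  # ('Array', 'Integer', 2)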

@tarruda tarruda force-pushed the tarruda:msgpack-rpc-improvements branch from 106724b to 2a67b84 Sep 12, 2014

@tarruda tarruda merged commit 2a67b84 into neovim:master Sep 12, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed

tarruda added a commit that referenced this pull request Sep 12, 2014

@myitcv (Member) commented Sep 15, 2014

What is receives_channel_id? Might have missed this before

This is actually private information used by the msgpack-gen.lua
script; I will remove it so it's not exposed publicly.

Thanks for confirming.

We did have BufferArray et al for container types - you talked about
removing that but also talked about adding this information back in some
way to the API meta data for languages like Go. Related to your latest
comment, can we see that in a couple of examples? This is an important
point to my mind. Is resolution on this matter part of this issue?

I have changed how this information is available; here's the new json
with the api metadata: http://pastebin.com/qkQiAVt9. As you can see, types can
be ArrayOf(...) or DictionaryOf(...), where the first parameter is the
expected type and the second (optional) is the exact number of entries in
the array/dictionary. You can parse that information in order to build a
more strongly typed client-side API.

Thanks again for sharing the output. Looks great.

A few questions (which can be moved to a new, more relevant issue if that
makes more sense?)

  • buffer_get_length - I assume this is the length in bytes? More generally,
    would it be an idea to expose common documentation via --api-info? Is there
    somewhere I can help make a start on this?
  • Can I suggest we rename buffer_get_slice to buffer_get_line_slice? Same
    for buffer_set_slice. Reason being I can see us adding byte-based slice
    methods in the future, and it also keeps things consistent with
    buffer_get_line etc.
  • Can I also suggest we rename buffer_insert to buffer_insert_lines? Same
    reasoning as before.
  • For anything that returns a string (array) is this assumed to be a UTF8
    string? Or is the string essentially an array of bytes (Binary in the
    msgpack spec) which should be interpreted according to the detected
    encoding (per earlier discussion)?
  • Do we want to impose limits on the amount of data transferred for a given
    call? I alluded to a buffer_get_byte_slice method above which might have a
    return type of Binary (per the msgpack spec). In this case we have an imposed
    limit of 2^32 (which is probably bigger than we would ever want to allow;
    we could force the caller into making calls with smaller slice sizes).
    Should we define limits on the neovim API for clarity?

Thoughts?

@tarruda tarruda referenced this pull request Sep 15, 2014

Open

shared p2p state #1180

@tarruda (Member) commented Sep 15, 2014

  • buffer_get_length - I assume this is the length in bytes? More generally,
    would it be an idea to expose common documentation via --api-info? Is there
    somewhere I can help make a start on this?

That can be done by modifying the generator scripts to extract documentation and embed it into the metadata. I can't do it right now, but if you want, here's what needs to be done:

  • Modify gendeclarations.lua to leave function comments in the prototype
  • Modify msgpack-gen.lua to extract the function comments from the prototype, and embed into a new field in each function metadata object

buffer_get_length - I assume this is the length in bytes? More generally,
would it be an idea to expose common documentation via --api-info? Is there
somewhere I can help make a start on this?

Actually that is the line count. I admit the name is misleading and we need to change it (I used this name mostly because len(buffer) in python is the line count).

Can I suggest we rename buffer_get_slice to buffer_get_line_slice. Same
for buffer_set_slice. Reason being I can see us adding a byte-based slice
methods in the future and also it keeps things consistent with
buffer_get_line etc
Can I also suggest we rename buffer_insert to buffer_insert_lines. Same
reasoning as before

👍 This is the time for breaking changes, so feel free to send a PR :)

For anything that returns a string (array) is this assumed to be a UTF8
string? Or is the string essentially an array of bytes (Binary in the
msgpack spec) which should be interpreted according to the detected
encoding (per earlier discussion)?

For now all API functions return/accept byte arrays; there's no concept of a unicode string. Though it's possible to perform decoding/encoding on the client if that is more convenient for a language.

Do we want to impose limits on the amount of data transferred for a given
call? I alluded to a buffer_get_byte_slice method above which might have a
return type of Binary (per MSGPACK spec). In this case we have an imposed
limit of 2^32 (which is probably bigger than we would every want to allow;
we could force the caller into making calls with smaller slice sizes).
Should we define limits on the neovim API for clarity?

I didn't give this much thought because transferring 2^32 bytes of data is certainly going to be slow and very memory intensive regardless of limits (I doubt any plugin will ever do that). But yes, the 32-bit limit could be used as a formal one.

You might wanna check out the new internal documentation for ideas regarding client implementation (:help nvim-intro).

@myitcv (Member) commented Sep 15, 2014

buffer_get_length - I assume this is the length in bytes? More
generally, would it be an idea to expose common documentation via
--api-info? Is there somewhere I can help make a start on this?

That can be done by modifying the generator scripts to extract
documentation and embed into the metadata. I can't do it right now but if
you want here's what need to be done:

Modify gendeclarations.lua to leave function comments in the prototype
Modify msgpack-gen.lua to extract the function comments from the
prototype, and embed into a new field in each function metadata object

OK I'll give this a go. Gives me an excuse to try and learn some Lua.

buffer_get_length - I assume this is the length in bytes? More generally,
would it be an idea to expose common documentation via --api-info? Is
there
somewhere I can help make a start on this?

Actually that is the line count. I admit the name is misleading and we
need to change(I used this name more because len(buffer) in python is the
line count)

As with the following issue I will create a PR when I'm back at a
computer-proper.

Can I suggest we rename buffer_get_slice to buffer_get_line_slice. Same
for buffer_set_slice. Reason being I can see us adding a byte-based slice
methods in the future and also it keeps things consistent with
buffer_get_line etc
Can I also suggest we rename buffer_insert to buffer_insert_lines. Same
reasoning as before

This is the time for breaking changes, so feel free to send a PR :)

Per my comment above. Very happy to do this.

For anything that returns a string (array) is this assumed to be a UTF8
string? Or is the string essentially an array of bytes (Binary in the
msgpack spec) which should be interpreted according to the detected
encoding (per earlier discussion)?

For now all API functions return/accept byte arrays, there's no concept
of an unicode string. Though it's possible to perform decoding/encoding on
the client if that is more convenient for a language.

In which case can I suggest we move to using the MSGPACK type Binary in
--api-info?

https://github.com/msgpack/msgpack/blob/master/spec.md#type-system

I think that more accurately reflects what we're passing around. As you
say, clients can decode as required.

I'll make this part of the PR. We can move discussion there.

Do we want to impose limits on the amount of data transferred for a given
call? I alluded to a buffer_get_byte_slice method above which might have
a
return type of Binary (per MSGPACK spec). In this case we have an imposed
limit of 2^32 (which is probably bigger than we would every want to
allow;
we could force the caller into making calls with smaller slice sizes).
Should we define limits on the neovim API for clarity?

I didn't give this much thought because transfering 2^32 bytes of data is
certainly going to be slow and very memory intensive regardless of limits(I
doubt any plugin ever does that). But yes, 32-bit could be used as a formal
limit

I'll make this a separate PR...along with documentation.

You might wanna check out the new internal documentation for ideas
regarding client implementation(:help nvim-intro)

Will definitely take a look.

Thanks

@jszakmeister (Member) commented on scripts/msgpack-gen.lua in cd2e46c Sep 16, 2014

This looks like a bug. The assert() will discard everything inside the parens, removing the call to msgpack_unpack_next in a release build. Previously, we checked the status code and aborted if it wasn't MSGPACK_UNPACK_SUCCESS. I believe that logic should be reinstated.

@tarruda (Member) replied Sep 16, 2014

👍 my bad
