[feature-request] emit line and column offsets in json output. #991

Open
skywind3000 opened this issue May 3, 2018 · 12 comments

Comments

@skywind3000

ALE (Asynchronous Lint Engine) is a popular plugin built for Vim 8 and has been widely used since Vim 8 was released.

It uses textlint and write-good to lint text files, and I think it is a pity that LanguageTool can't be used in ALE.

So I decided to write an ALE extension for LanguageTool. The extension must be written in VimScript and will parse the output of LanguageTool.

There are 3 output formats in LanguageTool:

  1. readable format (default)
  2. json (by --json)
  3. xml (by --api)

XML is the best, as it has all the information I want, but parsing XML in VimScript is too complex, or requires introducing a lot of dependencies. The human-readable format is missing some key elements, such as the text range (from which column to which column).

Parsing JSON is much easier in VimScript: Vim provides the json_decode()/json_encode() functions. But when I checked the actual JSON output from LanguageTool, I couldn't find keys like the fromx, fromy, tox and toy of the XML format. The JSON output only has an offset field, which is counted in characters.

Can we have the new fields fromx, fromy, tox and toy in the JSON output? If so, it will be much easier to integrate LanguageTool in VimScript.

As we know, parsing JSON is much easier than parsing XML in many programming languages. Including more information in the JSON format will help LanguageTool integrate with other languages.

Do you think it is reasonable to include line and column numbers in the JSON output?

@danielnaber
Member

I chose not to include the information for several reasons:

  • it is buggy, i.e. there are off-by-one issues
  • I don't see why character based information is not enough - this should include row/line. In the worst case, you could calculate row/line from that value, can't you?
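The calculation suggested here can be sketched as follows, in Python for brevity rather than VimScript. It is only a sketch: it assumes the offset is a 0-based count of characters from the start of the text and that lines are separated by LF, neither of which is confirmed in this thread, and the helper name `offset_to_line_col` is hypothetical.

```python
from typing import Tuple


def offset_to_line_col(text: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column) pair.

    Assumptions: lines are separated by LF, and the offset counts
    characters, not bytes.
    """
    if not 0 <= offset <= len(text):
        raise ValueError("offset is outside the text")
    # The number of line feeds before the offset gives the 0-based line index.
    line = text.count("\n", 0, offset) + 1
    # Index just after the previous line feed (0 when on the first line).
    line_start = text.rfind("\n", 0, offset) + 1
    return line, offset - line_start + 1
```

A client would call this once for the start of a match and once for its end. The result depends on the client seeing exactly the same text, including line endings, that LanguageTool saw.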

@skywind3000
Author

skywind3000 commented May 3, 2018

VimScript is not as powerful as Java. It stores strings as bytes, so I have to count UTF-8 bytes to calculate column and line numbers, which will be slow and unmaintainable. Manipulating the whole text buffer in VimScript is very inefficient, nearly impossible, and may require introducing a Python script.

This will bring more side effects and make LanguageTool difficult to use in Vim.

Counting the line/column numbers may have issues, but then it is a single bug in LanguageTool, and someday it could be fixed. Counting line/column separately in the many different applications that use LanguageTool may cause many different bugs.

Since the XML output has line and column fields, what about including line/column in the JSON output and adding a caution to the documentation?

@skywind3000
Author

skywind3000 commented May 11, 2018

It is really hard to calculate characters in VimScript:

LanguageTool counts \r\n as two characters, but VimScript doesn't know whether a line ends in \r\n or \n. Vim has a getbufline() API that returns lines from a buffer as a list, and the lines in the list don't contain any \n or \r, so I don't have enough information to figure out the actual line size in bytes.

So calculating line/column numbers is impossible.

Could you please reconsider?
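To illustrate why the line terminator matters, here is a minimal Python sketch that maps a character offset onto a list of lines stored without their terminators, as getbufline() returns them. It assumes the terminator is known from somewhere else (for example Vim's 'fileformat' option) and that every line uses the same one. The function name `locate` and its `eol` parameter are hypothetical.

```python
from typing import List, Tuple


def locate(lines: List[str], offset: int, eol: str = "\n") -> Tuple[int, int]:
    """Map a 0-based character offset to a 1-based (line, column) pair.

    `lines` holds each line without its terminator; `eol` is the terminator
    assumed to follow every line (CRLF counts as two characters).
    """
    if offset < 0:
        raise ValueError("offset must not be negative")
    remaining = offset
    for number, line in enumerate(lines, start=1):
        span = len(line) + len(eol)
        if remaining < span:
            # Offsets that land inside the terminator are clamped to the
            # column just past the last character of the line.
            return number, min(remaining, len(line)) + 1
        # Skip this line together with its terminator.
        remaining -= span
    raise ValueError("offset is past the end of the text")
```

With `lines = ["héllo", "wörld"]`, offset 7 is line 2, column 1 when `eol` is CRLF, but line 2, column 2 when `eol` is LF, so the same offset resolves differently depending on a detail the line list does not carry.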

@dpelle
Member

dpelle commented May 11, 2018

If it's buggy, is there any reason we can't fix it in LT?

My LanguageTool vim plugin [1] at...
https://github.com/dpelle/vim-LanguageTool
... uses the fromx, fromy, tox, toy values in LT's XML output and
it does not seem to be buggy. I recall that there were bugs
a long time ago but I don't see them anymore.

My plugin does not use the new asynchronous features of Vim 8, so
the new plugin from Linwei would be welcome.

@danielnaber
Member

I still feel this is more a limitation of vim - an editor that cannot deal with character-based position sounds like a missing feature to me.

@skywind3000
Author

skywind3000 commented May 11, 2018

Yes, it is a limitation of Vim. So could you please help Vim users?

@danielnaber
Member

Can't this be fixed in vim?

@dpelle
Member

dpelle commented May 11, 2018

I checked the JSON API at...
https://languagetool.org/http-api/swagger-ui/#!/default/post_check
... and it does not really specify what "offset" is in the response.
If I understand correctly, it's an offset in Unicode characters from
the beginning of the document. The first character would have
offset=0. Is my interpretation correct?

And even that would still be ambiguous:

> I still feel this is more a limitation of vim - an editor that cannot
> deal with character-based position sounds like a missing feature to me.

Vim has at least the function byteidx({expr}, {nr}), which
returns the byte index of the {nr}'th character in the string {expr}.
Combining characters are not counted separately (there is also
byteidxcomp(…) to treat combining characters separately).
My vim-LanguageTool plugin uses byteidx(...).

But I agree with @skywind3000
that having LT provide fromx/tox (etc.) would be more convenient.
The JSON API is meant to be used in various software
(vim, emacs, etc.) so it should be convenient to use.
If only XML has fromx/tox, then there is no incentive
to use the newer JSON format of LT. Users will likely
stick with the old XML format.
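For readers unfamiliar with byteidx(), the conversion it performs can be sketched in Python as below. This is an illustration, not Vim's implementation: the function name `char_to_byte_index` is hypothetical, UTF-8 is assumed, and because Python indexes strings by code point, combining characters are counted separately, which is closer to byteidxcomp() than to byteidx().

```python
def char_to_byte_index(text: str, char_index: int) -> int:
    """Return the UTF-8 byte index at which character `char_index` starts.

    Assumption: `char_index` is a 0-based count of Unicode code points.
    """
    if not 0 <= char_index <= len(text):
        raise ValueError("char_index is outside the text")
    # The byte index equals the encoded size of everything before it.
    return len(text[:char_index].encode("utf-8"))
```

Whether LanguageTool's offset counts combining characters separately is exactly the kind of detail that decides which of the two Vim functions matches it.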

@danielnaber
Member

You have to click on the "Model" link to get a more detailed description. I'm not sure about combining characters. I'd assume that invalid UTF-8 causes a Java exception, but I haven't tested it.

@vigoux

vigoux commented Aug 13, 2019

I ran across the same issue with offset not being really usable in VimScript. The following solution seems to work for me: it places the cursor at the start of the error.

" shellescape() protects file names that contain spaces or quotes
execute 'go ' . (byteidx(system('cat ' . shellescape(expand('%'))), offset) + 1)

From there you can get the starting line and column.
Gathering the end point is similar, but offset becomes offset + length - 1.

I don't know if this solves the issue, but it might help @skywind3000.

EDIT: I have an even better solution, not moving the cursor:

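" byteidx() is 0-based, while byte2line() and line2byte() count bytes from 1
let l:byte_index = byteidx(system('cat ' . shellescape(expand(myfile))), offset)
let l:line = byte2line(l:byte_index + 1)
" 1-based byte column within the line
let l:col = l:byte_index + 2 - line2byte(l:line)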
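The same idea can be applied on the client side to a whole response at once. The Python sketch below attaches fromy/fromx/toy/tox keys to every match, using the end point rule described above (offset + length - 1). It rests on several assumptions: the response is taken to be a JSON object with a `matches` list whose items carry `offset` and `length`, offsets are taken to count characters in LF-separated text, and the added keys are 1-based here as an arbitrary choice. The function names are hypothetical.

```python
import bisect
import json
from typing import List


def line_starts(text: str) -> List[int]:
    """Return the character offset at which each LF-separated line begins."""
    starts = [0]
    for index, char in enumerate(text):
        if char == "\n":
            starts.append(index + 1)
    return starts


def add_positions(text: str, response_json: str) -> dict:
    """Attach 1-based fromy/fromx/toy/tox keys to every match."""
    starts = line_starts(text)
    response = json.loads(response_json)
    for match in response["matches"]:
        first = match["offset"]
        last = first + match["length"] - 1  # offset of the last character
        for prefix, offset in (("from", first), ("to", last)):
            # Index of the last line start that is <= offset.
            row = bisect.bisect_right(starts, offset) - 1
            match[prefix + "y"] = row + 1
            match[prefix + "x"] = offset - starts[row] + 1
    return response
```

Precomputing the line starts keeps each lookup to a binary search, which matters when one check returns many matches.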

@jcs090218

Just want to say, same here in Emacs.

@Britaliope

Just to keep you updated, this issue is still blocking the Vim integration from working with LanguageTool 6.0 and higher. I don't know the status of this for Emacs.
