Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paragraph separator character causes TS Server to get into bad state #38078

Closed
mjbvz opened this issue Apr 20, 2020 · 10 comments
Closed

Paragraph separator character causes TS Server to get into bad state #38078

mjbvz opened this issue Apr 20, 2020 · 10 comments
Assignees
Labels
External Relates to another program, environment, or user action which we cannot control.

Comments

@mjbvz
Copy link
Contributor

mjbvz commented Apr 20, 2020

From microsoft/vscode#95685

TypeScript Version: 3.9.0-dev.20200420

Search Terms:

  • invisible character
  • paragraph
  • diagnostics / errors

Code
For the JS

var ftext = ["Und", "dann", "eines"];
var a = 1;

Note there is an invisible paragraph separator character after the opening[

  1. In the file, delete the space after the = on the second line.

Bug:
TS starts reporting bogus errors:

Screen Shot 2020-04-20 at 3 31 24 PM

It looks the paragraph character messes up the parsing and causes the error line numbers to be off by one

[Trace  - 22:31:41.88] <semantic> Event received: semanticDiag (0).
Data: {
    "file": "/Users/matb/projects/san/test.js",
    "diagnostics": [
        {
            "start": {
                "line": 2,
                "offset": 8
            },
            "end": {
                "line": 2,
                "offset": 12
            },
            "text": "Cannot find name 'dann'.",
            "code": 2304,
            "category": "error"
        },
        {
            "start": {
                "line": 2,
                "offset": 16
            },
            "end": {
                "line": 2,
                "offset": 21
            },
            "text": "Cannot find name 'eines'.",
            "code": 2304,
            "category": "error"
        }
    ]
}
@mjbvz mjbvz changed the title Paragraph separator character causes TS Server's to get into bad state Paragraph separator character causes TS Server to get into bad state Apr 20, 2020
@DanielRosenwasser DanielRosenwasser added the Bug A bug in TypeScript label Apr 20, 2020
@DanielRosenwasser DanielRosenwasser self-assigned this Apr 20, 2020
@DanielRosenwasser DanielRosenwasser added this to the TypeScript 3.9.1 milestone Apr 20, 2020
@DanielRosenwasser DanielRosenwasser removed the Bug A bug in TypeScript label Apr 21, 2020
@DanielRosenwasser DanielRosenwasser removed their assignment Apr 21, 2020
@DanielRosenwasser DanielRosenwasser removed this from the TypeScript 3.9.1 milestone Apr 21, 2020
@RyanCavanaugh RyanCavanaugh added the External Relates to another program, environment, or user action which we cannot control. label Apr 21, 2020
@DanielRosenwasser
Copy link
Member

In JavaScript, Paragraph Separator is a line terminator; however, in the edits described above, VS Code says that edits are occur on line 2.

From TypeScript's perspective, line 2 occurs immediately after the Paragraph Separator. Ultimately, editors need to account for each language's newline settings if they intent to use line/offset encodings for commands, so I don't think we can fix this on the TypeScript side.

@DanielRosenwasser
Copy link
Member

DanielRosenwasser commented Apr 21, 2020

See also:

@mjbvz
Copy link
Contributor Author

mjbvz commented Apr 21, 2020

Here's the edit we send to TSServer when deleting the space after the = in a = 1:

[Trace  - 23:20:02.22] <syntax> Sending request: updateOpen (12). Response expected: yes. Current queue length: 0
Arguments: {
    "changedFiles": [
        {
            "fileName": "/Users/matb/projects/san/test.ts",
            "textChanges": [
                {
                    "newText": "",
                    "start": {
                        "line": 2,
                        "offset": 8
                    },
                    "end": {
                        "line": 2,
                        "offset": 9
                    }
                }
            ]
        }
    ],
    "closedFiles": [],
    "openFiles": []

The indexes one based so they look correct to me.

@RyanCavanaugh @DanielRosenwasser Does this also match what you see?

@DanielRosenwasser
Copy link
Member

Yeah, that's what I'm seeing as well, but if I understand correctly the edit needs to happen at line 3 because the paragraph separator is a line terminator.

@mjbvz
Copy link
Contributor Author

mjbvz commented Apr 21, 2020

VS Code does not consider the character a line break. Looks like VS does though

@alexdima should be able to say if this is intentional or not. If you have this character in your code and it's not inside a string though, it's probably a bug

@typescript-bot
Copy link
Collaborator

This issue has been marked as 'External' and has seen no recent activity. It has been automatically closed for house-keeping purposes.

@alexdima
Copy link
Member

alexdima commented Apr 25, 2020

Currently, VS Code considers the following characters/sequences to be line terminators: CRLF, CR or LF.

I feel that this is a difficult topic to get wide agreement on, since we need to work with multiple programming languages, each one with its own definition of what a line terminator is. So, if a compiler creates a diagnostic at line 100, it appears that line number is defined according to the programming language's own specification.

From my reading of https://www.unicode.org/reports/tr14/tr14-32.html , Unicode defines a mandatory line break after the following:

  • LB4. Always break after hard line breaks (BK). In the BK class there are the following:
    • 000C - FORM FEED (FF)
    • 000B - LINE TABULATION (VT) - [OPTIONAL]
    • 2028 - LINE SEPARATOR (LS)
    • 2029 - PARAGRAPH SEPARATOR (PS)
  • LB5. Treat CR followed by LF, as well as CR, LF, and NL as hard line breaks:
    • 000Dx000A- CARRIAGE RETURN (CR) x LINE FEED (LF)
    • 000D - CARRIAGE RETURN (CR)
    • 000A - LINE FEED (LF)
    • 0085 - NEXT LINE (NEL)

Here I'm looking at the most popular languages used in VS Code for which I could find a specification or where the decision is not pushed out to specific implementations (looking at you, C++).

It looks like the JS spec Table 33 defines the following as line terminators. I suppose TS uses the same:
image

Interestingly, the C# spec A.1.1 Line terminators defines the same line terminators:
image

HTML however defines newlines as just CR, LF and CRLF:
image

Python also defines just CR, LF and CRLF:
image

PHP also defines just CR, LF and CRLF:
image

Java defines LF, CR and CRLF:
image

YAML section 3.1.4 Line Breaks uses the following:
image

To sum up:

char/seq Unicode JS C# HTML Python PHP Java YAML
CRLF
CR
LF
LS
PS
NEL
FF
VT

I am not sure what would be best, since it would be very difficult to make VS Code end-of-line sequence dependent on the file's language and changing things now might break language extensions which follow our definition.

@dbaeumer Is the end-of-line sequence something that is specified in the Language Server Protocol?

@alexdima
Copy link
Member

Actually, a straight forward way to tackle this would be for us to prompt users and ask for permission to "fix" their files. Most likely, LS, PS or NEL are inserted by accident in files by copy-pasting and they are unwanted there anyways. I've opened microsoft/vscode#96142 to track this.

@dbaeumer
Copy link
Member

@alexdima yes it is specified here: https://microsoft.github.io/language-server-protocol/specifications/specification-current/#textDocuments

@dbaeumer
Copy link
Member

The LSP support \r, \n and \r\n

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
External Relates to another program, environment, or user action which we cannot control.
Projects
None yet
Development

No branches or pull requests

6 participants