Unicode string containing "\000" is replaced by null character when being formatted. #2050

Closed
1 of 3 tasks
ds185357 opened this issue Jan 31, 2022 · 2 comments · Fixed by #2051

@ds185357
Contributor

Issue created from fantomas-online

Code

type NotNullNotZero =
    static member String() =
        Arb.Default.String()
        |> Arb.mapFilter (fun a -> if isNull a then "" else a) (fun t -> not (t.Contains("\000")))

Result

type NotNullNotZero =
    static member String() =
        Arb.Default.String()
        |> Arb.mapFilter (fun a -> if isNull a then "" else a) (fun t -> not (t.Contains("")))

Problem description

A string literal containing the explicit escape sequence "\000" has that sequence replaced by an actual null character during formatting. The null character itself breaks some text editors when the file is opened.

Extra information

  • The formatted result breaks my code.
  • The formatted result gives compiler warnings.
  • I or my company would be willing to help fix this.

Options

Fantomas version 4.6.0

Default Fantomas configuration with these extra settings:

[*.fs]
max_line_length=120
fsharp_space_before_colon=false
fsharp_strict_mode=false

Did you know that you can ignore files when formatting from fantomas-tool or the FAKE targets by using a .fantomasignore file?

@nojaf
Contributor

nojaf commented Jan 31, 2022

Hello,

Thank you for reporting this issue.
I noticed that the settings you mentioned are already part of the defaults, so adding them to an .editorconfig is unnecessary. For simplicity's sake, you might want to leave them out.

As for the issue, the string content is not being recognized as a trivia item.
In short, we cannot always trust the string content that we find in the AST created by the compiler.
So, we try and detect this using a regex:

let escapedCharacterRegex =
    System.Text.RegularExpressions.Regex("(\\\\(a|b|f|n|r|t|u|v|x|'|\\\"|\\\\))+")

Adding |0 there might solve your problem.
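
As a rough illustration of why that might help, here is a minimal sketch that runs the pattern above against the source text of the literal from the report, once as quoted and once with `0` added to the alternation. The names `escapedCharacterRegexWithZero` and `literalSource` are hypothetical, and the exact position of `|0` inside the group is an assumption; the actual fix may look different.

```fsharp
open System.Text.RegularExpressions

// Pattern as quoted above: a backslash followed by one of the listed escape characters.
let escapedCharacterRegex =
    Regex("(\\\\(a|b|f|n|r|t|u|v|x|'|\\\"|\\\\))+")

// Hypothetical variant with `0` added to the alternation (placement is an assumption).
let escapedCharacterRegexWithZero =
    Regex("(\\\\(a|b|f|n|r|t|u|v|x|'|\\\"|\\\\|0))+")

// The literal's text as it appears in the source file: backslash, 0, 0, 0.
// A verbatim string is used so the backslash is not interpreted as an escape here.
let literalSource = @"\000"

let matchesBefore = escapedCharacterRegex.IsMatch(literalSource)
let matchesAfter = escapedCharacterRegexWithZero.IsMatch(literalSource)

// prints "before: false, after: true"
printfn "before: %b, after: %b" matchesBefore matchesAfter
```

With the quoted pattern, the backslash in `\000` is not followed by any listed character, so the literal is not detected as containing an escape; with `0` in the alternation it is.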

Would you be interested in submitting a PR for this?

@ds185357
Contributor Author

I have created the PR, but I could use some guidance on how to create a better unit test for this fix. Thanks
