Updates for purescript compiler v0.12.0 #76

joncfoo · 2018-06-08T20:31:52Z

- updated packages to match packages compatible with compiler v0.12.0
- added psc-package.json tagged at v0.12.0
- updated function imports
- removed effect rows
- changed combinator associativity (see "Set combinator associativity to match (<|>) for 0.12 breaking changes", #74)

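For readers unfamiliar with the 0.12 migration, "updated function imports" and "removed effect rows" usually look like the sketch below. It is a generic illustration rather than code from this PR's diff: `greeting` and the `Main` module are hypothetical, and the commented-out lines show the pre-0.12 form from the old `purescript-eff` / `purescript-console` packages.

```purescript
module Main where

import Prelude

-- Before 0.12 these imports would have been:
--   import Control.Monad.Eff (Eff)
--   import Control.Monad.Eff.Console (CONSOLE, log)
import Effect (Effect)
import Effect.Console (log)

-- Hypothetical value, used only to give `main` something to print.
greeting :: String
greeting = "compiled against 0.12"

-- Before 0.12 the signature carried an effect row:
--   main :: forall e. Eff (console :: CONSOLE | e) Unit
-- With 0.12 libraries the row is gone and `Effect` takes a single argument.
main :: Effect Unit
main = log greeting
```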

thomashoneyman · 2018-06-13T22:49:48Z

Hey folks! Any updates on this one? Looks like the purescript-formatters update by @justinwoo requires this to be merged / released before it can be updated:

purescript-contrib/purescript-formatters#47

paf31 · 2018-06-15T22:52:45Z

There was some discussion around code units vs code points on string-parsers that is probably relevant here too.

joncfoo · 2018-06-15T22:54:12Z

> There was some discussion around code units vs code points on string-parsers that is probably relevant here too.

Is it purescript-contrib/purescript-string-parsers#43 ?

paf31 · 2018-06-15T22:55:57Z

Yes. Maybe this library could support both options, actually.

davezuch · 2018-06-16T20:36:57Z

Does that have to block this PR though?

thomashoneyman · 2018-06-18T19:10:01Z

While the concern isn't exactly the same, I'd like to quote my comment from another thread here just for visibility's sake:

> @paf31 Is it possible to release a new version of this library for 0.12 compatibility alone?
>
> It would make for an easier transition for folks like @justinwoo and my team and likely less stress for you if this (and libraries like parsing) were available in their current form for 0.12, and later, when you have time & at your convenience, another update could be made to break Event out to a separate package.
>
> Halogen updated this way and it's been quite nice; they get to relax and take their time working on the next big improvement to the library, but the rest of the dependent ecosystem can complete the transition to 0.12 without relying on forks.

thomashoneyman · 2018-06-18T21:12:11Z

Update on formatters: open PR relying on this branch, linking here in order to make sure to stay in sync if there are changes here or another update is done instead:

purescript-contrib/purescript-formatters#48

paf31 · 2018-06-18T21:15:14Z

I'm no longer maintainer of this library.

garyb · 2018-06-19T13:44:43Z

Thanks! I went with just making it a "0.12 compatibility" release for now. I think maybe we could support both code units and points though, by making modules under String / Token that allow the user to choose what they're interested in. Perhaps the default should be CodePoints, despite the insane overhead, since CodeUnits are kinda "wrong".
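To make the code unit vs code point distinction concrete, here is a minimal sketch using the two `length` functions from `purescript-strings`. The `burger` name and the `Main` module are illustrative only and are not part of this library.

```purescript
module Main where

import Prelude

import Data.String.CodePoints as CP
import Data.String.CodeUnits as CU
import Effect (Effect)
import Effect.Console (logShow)

-- U+1F354 lies outside the Basic Multilingual Plane, so UTF-16 stores it
-- as a surrogate pair, that is, two code units.
burger :: String
burger = "🍔"

main :: Effect Unit
main = do
  logShow (CU.length burger) -- 2: counts UTF-16 code units
  logShow (CP.length burger) -- 1: counts Unicode code points
```

A parser that consumes one code unit at a time can therefore stop in the middle of such a character, which is the sense in which code units are "wrong" here.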

Correctly handle UTF-16 surrogate pairs in `String`s.

We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment). Rather we are keeping most of the current API and making it work properly with astral Unicode.

We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`.

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s.

To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on GitHub has ever written another instance of `StringLike`.

https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it.

Add the `match` combinator.

Change the definition of `whiteSpace` and `skipSpaces` to use `Data.CodePoint.Unicode.isSpace`.

Move the character class parsers from the `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module.

All prior tests pass with no modifications. Add a few new tests.

Correctly handle UTF-16 surrogate pairs in `String`s.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it.

Breaking changes
================

Change the definition of `whiteSpace` and `skipSpaces` to use `Data.CodePoint.Unicode.isSpace`.

Move the character class parsers from the `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module.

To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type.

`StringLike` has no laws, and during the five years of its life, no-one on GitHub has ever written another instance of `StringLike`.

https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed.

We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment). Rather we are keeping most of the current API and making it work properly with astral Unicode.

We keep the `Char` parsers for backward compatibility.

We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to affect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms
runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms
runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```

Correctly handle UTF-16 surrogate pairs in `String`s.

We keep all of the API, but we change the primitive parsers so that instead of succeeding and incorrectly returning half of a surrogate pair, they will fail.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it.

Split dev dependencies into spago-dev.dhall.

Add benchmark suite.

Add astral UTF-16 test.

Breaking changes
================

Change the definition of `whiteSpace` and `skipSpaces` to use `Data.CodePoint.Unicode.isSpace`.

To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type.

`StringLike` has no laws, and during the five years of its life, no-one on GitHub has ever written another instance of `StringLike`.

https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed.

We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment). Rather we are keeping most of the current API and making it work properly with astral Unicode.

We keep the `Char` parsers for backward compatibility.

We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to affect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms
runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms
runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```
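The `anyChar` / `anyCodePoint` behaviour described in the commit message above can be sketched as follows. This assumes a release of the library that includes the Unicode change while the modules were still named `Text.Parsing.Parser` and `Text.Parsing.Parser.String`; later releases may use different module names. The `Main` module and the sample inputs are illustrative, and the comments restate what the commit message claims rather than output captured from a run.

```purescript
module Main where

import Prelude

import Data.Either (isRight)
import Effect (Effect)
import Effect.Console (logShow)
import Text.Parsing.Parser (runParser)
import Text.Parsing.Parser.String (anyChar, anyCodePoint)

main :: Effect Unit
main = do
  -- 'a' is in the Basic Multilingual Plane, so anyChar can return it
  -- as a single Char; per the commit message this should succeed.
  logShow (isRight (runParser "a" anyChar))
  -- "🍔" (U+1F354) is an astral character stored as a surrogate pair.
  -- Per the commit message, anyChar should now fail here instead of
  -- returning half of the pair.
  logShow (isRight (runParser "🍔" anyChar))
  -- anyCodePoint consumes the whole code point, so this should succeed.
  logShow (isRight (runParser "🍔" anyCodePoint))
```

Code that previously relied on `anyChar` to consume arbitrary input is the case the compiler will not flag, since the types are unchanged.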

Jonathan Curran added 2 commits June 8, 2018 14:27

Updates for purescript compiler v0.12.0

dcd08bb

- updated packages to match packages compatible with compiler v0.12.0
- added psc-package.json tagged at v0.12.0
- updated function imports
- removed effect rows
- changed combinator associativity (see #74)

Update package.json

15281cf

garyb merged commit cd1cc9c into purescript-contrib:master Jun 19, 2018

davezuch mentioned this pull request Jun 19, 2018

Update for compiler 0.12 purescript-contrib/purescript-formatters#48

Merged

thomashoneyman mentioned this pull request Jun 19, 2018

Set combinator associativity to match (<|>) for 0.12 breaking changes #74

Closed

joncfoo deleted the update-for-0.12.0 branch June 21, 2018 14:20

jamesdbrock mentioned this pull request Sep 22, 2021

Unicode correctness #119

Merged
