-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates for purescript compiler v0.12.0 #76
Conversation
joncfoo
commented
Jun 8, 2018
- updated packages to match packages compatible with compiler v0.12.0
- added psc-package.json tagged at v0.12.0
- updated function imports
- removed effect rows
- changed combinator associativity (see Set combinator associativity to match (<|>) for 0.12 breaking changes #74)
- updated packages to match packages compatible with compiler v0.12.0 - added psc-package.json tagged at v0.12.0 - updated function imports - removed effect rows - changed combinator associativity (see #74)
Hey folks! Any updates on this one? Looks like the |
There was some discussion around code units vs code points on |
|
Yes. Maybe this library could support both options, actually. |
Does that have to block this PR though? |
While the concern isn't exactly the same, I'd like to quote my comment from another thread here just for visibility's sake:
|
Update on formatters: open PR relying on this branch, linking here in order to make sure to stay in sync if there are changes here or another update is done instead: |
I'm no longer maintainer of this library. |
Thanks! I went with just making it a "0.12 compatibility" release for now. I think maybe we could support both code units and points though, by making modules under |
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. We are not quite making the default `CodePoint`, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. To make this library handle Unicode correctly, it is necessary to delete the `StringLike` class. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Add the `match` combinator. Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. All prior tests pass with no modifications. Add a few new tests.
Correctly handle UTF-16 surrogate pairs in `String`s. All prior tests pass with no modifications. Add a few new tests. Non-breaking changes ==================== Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. Add the `match` combinator. Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Breaking changes ================ Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62 Breaking changes which won’t be caught by the compiler ====================================================== Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`. `anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed. We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Benchmarks ========== For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to effect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t. Before ------ ``` runParser parse23 mean = 43.36 ms stddev = 6.75 ms min = 41.12 ms max = 124.65 ms runParser parseSkidoo mean = 22.53 ms stddev = 3.86 ms min = 21.40 ms max = 61.76 ms ``` After ----- ``` runParser parse23 mean = 42.90 ms stddev = 6.01 ms min = 40.97 ms max = 115.74 ms runParser parseSkidoo mean = 22.03 ms stddev = 2.79 ms min = 20.78 ms max = 53.34 ms ```
Correctly handle UTF-16 surrogate pairs in `String`s. All prior tests pass with no modifications. Add a few new tests. Non-breaking changes ==================== Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. Add the `match` combinator. Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Breaking changes ================ Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62 Breaking changes which won’t be caught by the compiler ====================================================== Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`. `anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed. We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Benchmarks ========== For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to effect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t. Before ------ ``` runParser parse23 mean = 43.36 ms stddev = 6.75 ms min = 41.12 ms max = 124.65 ms runParser parseSkidoo mean = 22.53 ms stddev = 3.86 ms min = 21.40 ms max = 61.76 ms ``` After ----- ``` runParser parse23 mean = 42.90 ms stddev = 6.01 ms min = 40.97 ms max = 115.74 ms runParser parseSkidoo mean = 22.03 ms stddev = 2.79 ms min = 20.78 ms max = 53.34 ms ```
Correctly handle UTF-16 surrogate pairs in `String`s. All prior tests pass with no modifications. Add a few new tests. Non-breaking changes ==================== Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. Add the `match` combinator. Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Breaking changes ================ Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. Move the character class parsers from `Text.Parsing.Parser.Token` module into the `Text.Parsing.Parser.String` module. To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62 Breaking changes which won’t be caught by the compiler ====================================================== Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`. `anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed. We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Benchmarks ========== For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to effect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t. Before ------ ``` runParser parse23 mean = 43.36 ms stddev = 6.75 ms min = 41.12 ms max = 124.65 ms runParser parseSkidoo mean = 22.53 ms stddev = 3.86 ms min = 21.40 ms max = 61.76 ms ``` After ----- ``` runParser parse23 mean = 42.90 ms stddev = 6.01 ms min = 40.97 ms max = 115.74 ms runParser parseSkidoo mean = 22.03 ms stddev = 2.79 ms min = 20.78 ms max = 53.34 ms ```
Correctly handle UTF-16 surrogate pairs in `String`s. We keep all of the API, but we change the primitive parsers so that instead of succeeding and incorrectly returning half of a surrogate pair, they will fail. All prior tests pass with no modifications. Add a few new tests. Non-breaking changes ==================== Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. Add the `match` combinator. Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Split dev dependencies into spago-dev.dhall. Add benchmark suite. Add astral UTF-16 test. Breaking changes ================ Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62 Breaking changes which won’t be caught by the compiler ====================================================== Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`. `anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed. We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Benchmarks ========== For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to effect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t. Before ------ ``` runParser parse23 mean = 43.36 ms stddev = 6.75 ms min = 41.12 ms max = 124.65 ms runParser parseSkidoo mean = 22.53 ms stddev = 3.86 ms min = 21.40 ms max = 61.76 ms ``` After ----- ``` runParser parse23 mean = 42.90 ms stddev = 6.01 ms min = 40.97 ms max = 115.74 ms runParser parseSkidoo mean = 22.03 ms stddev = 2.79 ms min = 20.78 ms max = 53.34 ms ```
Correctly handle UTF-16 surrogate pairs in `String`s. We keep all of the API, but we change the primitive parsers so that instead of succeeding and incorrectly returning half of a surrogate pair, they will fail. All prior tests pass with no modifications. Add a few new tests. Non-breaking changes ==================== Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing `CodePoint`s. Add the `match` combinator. Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't export it. Split dev dependencies into spago-dev.dhall. Add benchmark suite. Add astral UTF-16 test. Breaking changes ================ Change the definition of `whiteSpace` and `skipSpaces` to `Data.CodePoint.Unicode.isSpace`. To make this library handle Unicode correctly, it is necessary to either alter the `StringLike` class or delete it. We decided to delete it. The `String` module will now operate only on inputs of the concrete `String` type. `StringLike` has no laws, and during the five years of its life, no-one on Github has ever written another instance of `StringLike`. https://github.com/search?l=&q=StringLike+language%3APureScript&type=code The last time someone tried to alter `StringLike`, this is what happened: purescript-contrib#62 Breaking changes which won’t be caught by the compiler ====================================================== Fundamentally, we change the way we consume the next input character from `Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`. `anyChar` will no longer always succeed. It will only succeed on a Basic Multilingual Plane character. The new parser `anyCodePoint` will always succeed. We are not quite “making the default `CodePoint`”, as was discussed in purescript-contrib#76 (comment) . Rather we are keeping most of the current API and making it work properly with astral Unicode. We keep the `Char` parsers for backward compatibility. We also keep the `Char` parsers for ergonomic reasons. For example the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`. This parser is usually called with a literal like `char 'a'`. It would be annoying to call this parser with `char (codePointFromChar 'a')`. Benchmarks ========== For Unicode correctness, we're now consuming characters with `Data.String.CodePoints.uncons` instead of `Data.String.CodeUnits.uncons`. If that were going to effect performance, then the effect would show up in the `runParser parse23` benchmark, but it doesn’t. Before ------ ``` runParser parse23 mean = 43.36 ms stddev = 6.75 ms min = 41.12 ms max = 124.65 ms runParser parseSkidoo mean = 22.53 ms stddev = 3.86 ms min = 21.40 ms max = 61.76 ms ``` After ----- ``` runParser parse23 mean = 42.90 ms stddev = 6.01 ms min = 40.97 ms max = 115.74 ms runParser parseSkidoo mean = 22.03 ms stddev = 2.79 ms min = 20.78 ms max = 53.34 ms ```