Updates for purescript compiler v0.12.0 #76

Merged 2 commits on Jun 19, 2018

Conversation

joncfoo
Contributor

@joncfoo joncfoo commented Jun 8, 2018

Jonathan Curran added 2 commits June 8, 2018 14:27
- updated packages to match packages compatible with compiler v0.12.0
- added psc-package.json tagged at v0.12.0
- updated function imports
- removed effect rows
- changed combinator associativity (see #74)
@thomashoneyman
Contributor

thomashoneyman commented Jun 13, 2018

Hey folks! Any updates on this one? Looks like the purescript-formatters update by @justinwoo requires this to be merged / released before it can be updated:

purescript-contrib/purescript-formatters#47

@paf31
Contributor

paf31 commented Jun 15, 2018

There was some discussion around code units vs code points on string-parsers that is probably relevant here too.

@joncfoo
Contributor Author

joncfoo commented Jun 15, 2018

> There was some discussion around code units vs code points on string-parsers that is probably relevant here too.

Is it purescript-contrib/purescript-string-parsers#43 ?

@paf31
Contributor

paf31 commented Jun 15, 2018

Yes. Maybe this library could support both options, actually.

@davezuch

Does that have to block this PR though?

@thomashoneyman
Contributor

While the concern isn't exactly the same, I'd like to quote my comment from another thread here just for visibility's sake:

> @paf31 Is it possible to release a new version of this library for 0.12 compatibility alone?
>
> It would make for an easier transition for folks like @justinwoo and my team and likely less stress for you if this (and libraries like parsing) were available in their current form for 0.12, and later, when you have time & at your convenience, another update could be made to break Event out to a separate package.
>
> Halogen updated this way and it's been quite nice; they get to relax and take their time working on the next big improvement to the library, but the rest of the dependent ecosystem can complete the transition to 0.12 without relying on forks.

@thomashoneyman
Contributor

Update on formatters: open PR relying on this branch, linking here in order to make sure to stay in sync if there are changes here or another update is done instead:

purescript-contrib/purescript-formatters#48

@paf31
Contributor

paf31 commented Jun 18, 2018

I'm no longer maintainer of this library.

@garyb garyb merged commit cd1cc9c into purescript-contrib:master Jun 19, 2018
@garyb
Member

garyb commented Jun 19, 2018

Thanks! I went with just making it a "0.12 compatibility" release for now. I think maybe we could support both code units and points though, by making modules under String / Token that allow the user to choose what they're interested in. Perhaps the default should be CodePoints, despite the insane overhead, since CodeUnits are kinda "wrong".
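
To make the code-unit versus code-point distinction concrete, here is a minimal sketch using `Data.String.CodeUnits` and `Data.String.CodePoints` from `purescript-strings`. The module name and the names `sample`, `unitCount`, and `pointCount` are made up for illustration; this is not code from the PR.

```purescript
module Example.CodeUnitsVsCodePoints where

import Prelude

import Data.String.CodePoints as CP
import Data.String.CodeUnits as CU

-- One astral code point (U+1D4D0) followed by an ASCII letter.
-- The two literals are joined with <> so the hex escape cannot run on.
sample :: String
sample = "\x1D4D0" <> "z"

-- Counts UTF-16 code units: the astral character is a surrogate pair.
unitCount :: Int
unitCount = CU.length sample

-- Counts Unicode code points: the astral character is a single item.
pointCount :: Int
pointCount = CP.length sample
```

Here `unitCount` is 3 and `pointCount` is 2, because U+1D4D0 lies outside the Basic Multilingual Plane and is stored as two code units.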

@joncfoo joncfoo deleted the update-for-0.12.0 branch June 21, 2018 14:20
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Sep 22, 2021
Correctly handle UTF-16 surrogate pairs in `String`s.

We are not quite making the default `CodePoint`, as was discussed in
purescript-contrib#76 (comment).
Rather, we are keeping most of the current API and making it work
properly with astral Unicode.

We keep the `Char` parsers for ergonomic reasons. For example, take
the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`.
This parser is usually called with a literal like `char 'a'`. It would
be annoying to call this parser with `char (codePointFromChar 'a')`.
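
The ergonomic cost mentioned above can be sketched outside the parser library. In the following, `matchesCodePoint` and `matchesChar` are hypothetical helpers invented for this illustration; only `codePointFromChar` and `CodePoint` come from `Data.String.CodePoints`.

```purescript
module Example.CharErgonomics where

import Prelude

import Data.String.CodePoints (CodePoint, codePointFromChar)

-- Hypothetical CodePoint-based predicate: callers must build a CodePoint.
matchesCodePoint :: CodePoint -> CodePoint -> Boolean
matchesCodePoint expected actual = expected == actual

-- Hypothetical Char-based wrapper: callers can pass a plain literal like 'a'.
matchesChar :: Char -> CodePoint -> Boolean
matchesChar = matchesCodePoint <<< codePointFromChar

-- With only the CodePoint version, every call site needs the conversion.
verbose :: Boolean
verbose = matchesCodePoint (codePointFromChar 'a') (codePointFromChar 'a')

-- With the Char version, the literal is enough on the expected side.
concise :: Boolean
concise = matchesChar 'a' (codePointFromChar 'a')
```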

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing
`CodePoint`s.

To make this library handle Unicode correctly, it is necessary to
delete the `StringLike` class. `StringLike` has no laws, and during the
five years of its life, no one on GitHub has ever written another
instance of `StringLike`.
https://github.com/search?l=&q=StringLike+language%3APureScript&type=code

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't
export it.

Add the `match` combinator.

Change `whiteSpace` and `skipSpaces` so that whitespace is defined by
`Data.CodePoint.Unicode.isSpace`.

Move the character class parsers from `Text.Parsing.Parser.Token` module into
the `Text.Parsing.Parser.String` module.

All prior tests pass with no modifications. Add a few new tests.
@jamesdbrock jamesdbrock mentioned this pull request Sep 22, 2021
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Sep 23, 2021
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Sep 24, 2021
Correctly handle UTF-16 surrogate pairs in `String`s.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing
`CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't
export it.

Breaking changes
================

Change `whiteSpace` and `skipSpaces` so that whitespace is defined by
`Data.CodePoint.Unicode.isSpace`.

Move the character class parsers from `Text.Parsing.Parser.Token` module into
the `Text.Parsing.Parser.String` module.

To make this library handle Unicode correctly, it is necessary to
either alter the `StringLike` class or delete it.
We decided to delete it. The `String` module will now operate only
on inputs of the concrete `String` type.
`StringLike` has no laws, and during the five years of its life,
no one on GitHub has ever written another instance of `StringLike`.
https://github.com/search?l=&q=StringLike+language%3APureScript&type=code
The last time someone tried to alter `StringLike`, this is what
happened:
purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from
`Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic
Multilingual Plane character. The new parser `anyCodePoint` will always succeed.
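
A minimal sketch of why this change is invisible to the compiler, assuming only `purescript-strings` and `purescript-enums`. The names `firstCodeUnit`, `firstCodePoint`, and `unconsBmpChar` are hypothetical and are not the library's implementation; they only illustrate how a `Char` result can fail on a non-BMP character while a `CodePoint` result cannot.

```purescript
module Example.Uncons where

import Prelude

import Data.Char (fromCharCode, toCharCode)
import Data.Enum (fromEnum)
import Data.Maybe (Maybe(..))
import Data.String.CodePoints as CP
import Data.String.CodeUnits as CU

-- Numeric value of the first UTF-16 code unit. For an astral character
-- this is only the high surrogate, i.e. half of the character.
firstCodeUnit :: String -> Maybe Int
firstCodeUnit s = (toCharCode <<< _.head) <$> CU.uncons s

-- Numeric value of the first full code point.
firstCodePoint :: String -> Maybe Int
firstCodePoint s = (fromEnum <<< _.head) <$> CP.uncons s

-- Hypothetical Char-returning step built on code points: it yields
-- Nothing when the next code point does not fit in a single Char.
unconsBmpChar :: String -> Maybe { head :: Char, tail :: String }
unconsBmpChar s = do
  { head, tail } <- CP.uncons s
  let n = fromEnum head
  c <- if n <= 0xFFFF then fromCharCode n else Nothing
  pure { head: c, tail }
```

For the input `"\x1D4D0"`, `firstCodeUnit` gives `Just 0xD835` (a lone high surrogate), `firstCodePoint` gives `Just 0x1D4D0`, and `unconsBmpChar` gives `Nothing`. All three type-check the same way for BMP and astral input, which is why the behavior change only shows up at runtime.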

We are not quite “making the default `CodePoint`”, as was discussed in
purescript-contrib#76 (comment).
Rather, we are keeping most of the current API and making it work
properly with astral Unicode.

We keep the `Char` parsers for backward compatibility.
We also keep the `Char` parsers for ergonomic reasons. For example, take
the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`.
This parser is usually called with a literal like `char 'a'`. It would
be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with
`Data.String.CodePoints.uncons` instead of
`Data.String.CodeUnits.uncons`. If that were going to affect
performance, then the effect would show up in the `runParser parse23`
benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms

runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms

runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Sep 28, 2021
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Sep 29, 2021
Correctly handle UTF-16 surrogate pairs in `String`s.
We keep all of the API, but we change the primitive parsers so that instead
of succeeding and incorrectly returning half of a surrogate pair, they will fail.

All prior tests pass with no modifications. Add a few new tests.

Non-breaking changes
====================

Add primitive parsers `anyCodePoint` and `satisfyCodePoint` for parsing
`CodePoint`s.

Add the `match` combinator.

Move `updatePosString` to the `Text.Parsing.Parser.String` module and don't
export it.

Split dev dependencies into spago-dev.dhall.

Add benchmark suite.

Add astral UTF-16 test.

Breaking changes
================

Change `whiteSpace` and `skipSpaces` so that whitespace is defined by
`Data.CodePoint.Unicode.isSpace`.
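
As a rough illustration of what a whitespace definition based on `Data.CodePoint.Unicode.isSpace` looks like on plain strings, here is a small sketch. `skipLeadingSpace` is a hypothetical helper written for this example; it is not the library's `skipSpaces` parser, and it assumes the `purescript-unicode` and `purescript-strings` packages.

```purescript
module Example.UnicodeSpace where

import Data.CodePoint.Unicode (isSpace)
import Data.String.CodePoints as CP

-- Hypothetical helper: drop leading code points that Unicode classifies
-- as whitespace, walking the string by code point rather than code unit.
skipLeadingSpace :: String -> String
skipLeadingSpace = CP.dropWhile isSpace
```

Because the predicate works on `CodePoint`s, non-ASCII space characters are expected to be skipped as well, which is presumably why the commit message lists this as a breaking change.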

To make this library handle Unicode correctly, it is necessary to
either alter the `StringLike` class or delete it.
We decided to delete it. The `String` module will now operate only
on inputs of the concrete `String` type.
`StringLike` has no laws, and during the five years of its life,
no one on GitHub has ever written another instance of `StringLike`.
https://github.com/search?l=&q=StringLike+language%3APureScript&type=code
The last time someone tried to alter `StringLike`, this is what
happened:
purescript-contrib#62

Breaking changes which won’t be caught by the compiler
======================================================

Fundamentally, we change the way we consume the next input character from
`Data.String.CodeUnits.uncons` to `Data.String.CodePoints.uncons`.

`anyChar` will no longer always succeed. It will only succeed on a Basic
Multilingual Plane character. The new parser `anyCodePoint` will always succeed.

We are not quite “making the default `CodePoint`”, as was discussed in
purescript-contrib#76 (comment).
Rather, we are keeping most of the current API and making it work
properly with astral Unicode.

We keep the `Char` parsers for backward compatibility.
We also keep the `Char` parsers for ergonomic reasons. For example, take
the parser `char :: forall s m. Monad m => Char -> ParserT s m Char`.
This parser is usually called with a literal like `char 'a'`. It would
be annoying to call this parser with `char (codePointFromChar 'a')`.

Benchmarks
==========

For Unicode correctness, we're now consuming characters with
`Data.String.CodePoints.uncons` instead of
`Data.String.CodeUnits.uncons`. If that were going to affect
performance, then the effect would show up in the `runParser parse23`
benchmark, but it doesn’t.

Before
------

```
runParser parse23
mean   = 43.36 ms
stddev = 6.75 ms
min    = 41.12 ms
max    = 124.65 ms

runParser parseSkidoo
mean   = 22.53 ms
stddev = 3.86 ms
min    = 21.40 ms
max    = 61.76 ms
```

After
-----

```
runParser parse23
mean   = 42.90 ms
stddev = 6.01 ms
min    = 40.97 ms
max    = 115.74 ms

runParser parseSkidoo
mean   = 22.03 ms
stddev = 2.79 ms
min    = 20.78 ms
max    = 53.34 ms
```
jamesdbrock added a commit to jamesdbrock/purescript-parsing that referenced this pull request Oct 6, 2021