Feature: `sconcat` and `stimes`. #580

kindaro · 2024-04-12T07:17:28Z

Resolve #288.

There are two commits here.

Commit № 1 adds two new benchmarks in the Pure section: one for `sconcat` and one for `stimes`.
Commit № 2 adds specialized implementations of `sconcat` and `stimes` to the `Semigroup` instance for strict `Text`.
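
For context, here is a minimal sketch of what a specialized `sconcat` can look like next to folding `<>` pairwise. The names `sconcatText` and `sconcatPairwise` are hypothetical, and this is not necessarily the code in the commit. It only assumes `Data.Text.concat`, which, as far as I know, sizes the result once instead of re-copying intermediate results at every append.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NonEmpty
import qualified Data.Text as T

-- Hypothetical specialized version: hand the whole list to T.concat.
sconcatText :: NonEmpty T.Text -> T.Text
sconcatText = T.concat . NonEmpty.toList

-- For comparison: combine the elements one append at a time.
sconcatPairwise :: NonEmpty T.Text -> T.Text
sconcatPairwise = foldr1 T.append
```

Both functions should return the same text; only the amount of copying differs.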

The benchmarks can be run like so:

first_commit_hash='180645e'
pattern='$2 == "Pure" && ($4 == "sconcat" || $4 == "stimes") && $5 != "LazyText"'
git checkout "$first_commit_hash" &&
    cabal run text-benchmarks -- --pattern "$pattern" --timeout 10s --csv benchmarks.csv
git switch feature-sconcat-stimes &&
    cabal run text-benchmarks -- --pattern "$pattern" --timeout 10s --baseline benchmarks.csv --fail-if-slower 110

This will take a few minutes. You should see better times almost everywhere; only tiny.sconcat will show worse times.

This is the report as seen on my machine:

All
  Pure
    tiny
      sconcat
        Text: FAIL
          38.6 ns ± 2.3 ns, 148% more than baseline
          Use -p '(($2=="Pure"&&($4=="sconcat"||$4=="stimes"))&&$5!="LazyText")&&/tiny.sconcat.Text/' to rerun this test only.
      stimes
        Text: OK
          70.7 ns ± 2.9 ns, 83% less than baseline
    ascii-small
      sconcat
        Text: OK
          18.6 μs ± 148 ns, 99% less than baseline
      stimes
        Text: OK
          125  μs ±  10 μs, 59% less than baseline
    ascii
      sconcat
        Text: OK
          28.7 ms ± 2.4 ms
      stimes
        Text: OK
          63.3 ms ± 5.6 ms, 78% less than baseline
    english
      sconcat
        Text: OK
          1.07 ms ± 7.4 μs
      stimes
        Text: OK
          4.40 ms ± 373 μs, 69% less than baseline
    russian
      sconcat
        Text: OK
          2.45 μs ± 224 ns, 95% less than baseline
      stimes
        Text: OK
          15.9 μs ± 736 ns, 62% less than baseline
    japanese
      sconcat
        Text: OK
          4.17 μs ± 175 ns, 97% less than baseline
      stimes
        Text: OK
          15.7 μs ± 1.4 μs, 63% less than baseline

kindaro · 2024-04-12T07:52:07Z

Tada, all checks have passed!

Bodigrim

Thanks for writing benchmarks!

Bodigrim · 2024-04-12T21:40:54Z

src/Data/Text.hs

@@ -361,6 +361,8 @@ instance Read Text where
 -- | @since 1.2.2.0
 instance Semigroup Text where
    (<>) = append
+    stimes = replicate . P.fromIntegral


I'm somewhat cautious about potential overflow in fromIntegral. Let's throw an error if Text is non-empty and n does not fit into Int, same as base does for ByteArray: https://hackage.haskell.org/package/base-4.19.1.0/docs/src/Data.Array.Byte.html#stimesPolymorphic

(We do not need to check that len * n fits into Int, because this is validated by Data.Text.replicate itself)

I do not follow why we should evaluate to the empty text when given a negative number but evaluate to an error when given a number that is bigger than maxBound ∷ Int, but your wish is my command.

However, we are not going to check the input in the same way as base because we should stay abstract. The way integers are represented is an implementation detail that has changed in the past and may change in the future. Supporting all versions of GHC we test with would need a lot of CPP without significant improvement in performance. So, I shall do an equality check instead of matching on the constructors of Integer.
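
A rough sketch of such a representation-agnostic check, for illustration only. The helper name `toIntChecked` is hypothetical and this is not necessarily the code that ends up in the branch. It converts to `Int` and then compares both values as `Integer`, so a wrapped-around conversion is detected without touching the constructors of `Integer`.

```haskell
-- Hypothetical helper: convert any Integral value to Int,
-- or return Nothing when the value does not fit.
toIntChecked :: Integral a => a -> Maybe Int
toIntChecked n
  | toInteger asInt == toInteger n = Just asInt
  | otherwise = Nothing
  where
    -- fromIntegral silently wraps around when n is out of range.
    asInt = fromIntegral n :: Int
```

Comparing through `toInteger` rather than converting the `Int` back to the source type also covers unsigned types such as `Word`, where a round trip through `fromIntegral` would wrap back to the original value.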

> I do not follow why we should evaluate to the empty text when given a negative number but evaluate to an error when given a number that is bigger than maxBound ∷ Int, but your wish is my command.

Ah, I did not remember this peculiarity of replicate.

I'd be in favor of stimes throwing an error on negative arguments, even if replicate does not mind swallowing them silently. @Lysxia how do you feel about it?

Sounds good to me.

This is the behaviour on lists:

ghci> replicate (-1) [1..10]
[]

ghci> import Data.Semigroup
ghci> stimes (-1) [1.. 10]
*** Exception: stimes: [], negative multiplier

So, on lists, stimes is not defined for negative numbers even though replicate is. We can do the same. Let me patch it up.

@Bodigrim So why would you be in favour of stimes throwing an error on negative arguments?

@Lysxia So why does this sound good to you?

I should like to have the underlying reasoning recorded for posterity.

Actually, this is a shocking revelation: the property checks of text generally compare the behaviour of functions on Text to their counterparts on String, but such a check for stimes is missing for both strict and lazy Text. It would be consistent to add such a check for stimes for both strict and lazy Text and adjust the definitions as needed for it to pass. @Bodigrim Should I add a check for lazy Text and make it pass, or should I only add a check for strict Text?

Asking to "replicate a string n times" with negative n is nonsense. There must be an error in the definition of n, so throwing an exception lets users be aware of that error and fix it.

The default definition of stimes already throws an exception for n <= 0. People haven't complained about it. Extending the definition for n = 0 is reasonable for a monoid.

There could still be a case made in favor of making stimes less partial and more similar to replicate (I think "replicate a string n times" is nonsense as a sentence in natural language, but I don't have a strong argument that code must follow natural language). Until someone makes a good case for extending stimes, throwing an exception for negative arguments is forward-compatible: we can extend the function later (it would only break code that catches the exception, a fishy thing to do). If we made it total now and changed our minds later, that would be a more breaking change.

You can add the stimes test for strict Text now and add lazy stimes in another PR (small PR = good!)
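
To make the semantics discussed in this thread concrete, here is a sketch under stated assumptions: `stimesText` and `agreesWithString` are hypothetical standalone names, the error messages are made up, and this is not the code merged into text. Negative multipliers throw, an empty text is returned as is, a multiplier that does not fit into `Int` throws, and everything else goes to `Data.Text.replicate`.

```haskell
import Data.Semigroup (stimes)
import qualified Data.Text as T

-- Hypothetical standalone version of the behaviour discussed above.
stimesText :: Integral a => a -> T.Text -> T.Text
stimesText n t
  | n < 0 = error "stimesText: negative multiplier"
  | T.null t = T.empty
  | toInteger asInt /= toInteger n =
      error "stimesText: multiplier does not fit into an Int"
  | otherwise = T.replicate asInt t
  where
    asInt = fromIntegral n :: Int

-- Hypothetical check in the spirit of the existing property tests:
-- for non-negative n the result should match stimes on String.
agreesWithString :: Int -> String -> Bool
agreesWithString n s = stimesText n (T.pack s) == T.pack (stimes n s)
```

`agreesWithString` is only meaningful for `n >= 0`, since both sides throw on negative multipliers.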

src/Data/Text.hs

The constructors of `Integer` and the module they are exported from all changed between GHC 8 and 9.

kindaro · 2024-04-17T05:35:56Z

src/Data/Text.hs

+    -- | Beware: this function will evaluate to error if the given number does
+    -- not fit into an @Int@.


Sadly, it turns out that Haddock does not see comments on instance methods. I looked at the documentation generated by cabal haddock and this comment is not rendered. This also seems to be confirmed on the Internet.

https://stackoverflow.com/questions/17758681/haddock-documentation-for-instance-functions-with-quirks-replaced-by-default-cl

Comments on instance methods haddock#123

I am going to move this comment to the instance.
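
Roughly, the move looks like the sketch below. The type `Name` is a hypothetical stand-in used only to keep the example self-contained, and the comment wording is illustrative. The point is the placement: the documentation comment sits on the instance declaration, and the method carries only an ordinary comment.

```haskell
import Data.Semigroup (stimes)

-- Hypothetical stand-in type, not part of text.
newtype Name = Name String
  deriving (Eq, Show)

-- | Documentation placed here is attached to the instance itself.
instance Semigroup Name where
  Name a <> Name b = Name (a ++ b)

  -- An ordinary comment on the method; per the observation above,
  -- a documentation comment in this position was not rendered.
  stimes n (Name a) = Name (stimes n a)
```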

kindaro added 2 commits April 12, 2024 13:36

Add benchmarks for semigroup methods.

180645e

Add specialized implementation of semigroup methods.

bf0c343

Bodigrim reviewed Apr 12, 2024

kindaro force-pushed the feature-sconcat-stimes branch from 2a4b0fc to 9deed15 Compare April 13, 2024 09:14

Check that stimes works right in corner cases.

40ff68d

kindaro force-pushed the feature-sconcat-stimes branch from b653fe4 to b425505 Compare April 13, 2024 09:41

Bodigrim reviewed Apr 14, 2024

kindaro added 2 commits April 16, 2024 15:13

Make sure stimes works right in corner cases.

3c475cb

Be abstract of the implementation of Integer.

c314ce2

kindaro force-pushed the feature-sconcat-stimes branch from b425505 to c314ce2 Compare April 16, 2024 08:14

Lysxia approved these changes Apr 16, 2024

kindaro commented Apr 17, 2024

kindaro marked this pull request as draft April 17, 2024 05:37
