
Create benchmark #167

Closed
ivanperez-keera opened this issue Aug 30, 2020 · 5 comments
@ivanperez-keera (Owner)

One of the main reasons why Yampa has changed very little, and why so many PRs take so long to integrate, is that we try to make sure that we do not seriously impact performance or break the API when we merge changes.

We need a series of benchmarks for Yampa that allow us to 1) measure performance, and 2) compare performance, both against prior versions and, potentially, against other FRP implementations. Of course, if those implementations are very different, such a comparison may be really hard and may need to be addressed separately.

I'm creating this issue to document this line of work and to have a way to track progress in this direction.

This is also relevant for: ivanperez-keera/dunai#233
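
To make this concrete, below is a minimal sketch of the kind of benchmark this could start with: criterion driving a few Yampa signal functions through a fixed input stream via embed. The runSF helper, group names, signal functions, and sample counts are made up for illustration; they are not the actual benchmark.

import Control.Arrow  (arr, (>>>))
import Criterion.Main (bench, bgroup, defaultMain, nf)
import FRP.Yampa      (SF, embed, integral)

-- Run a signal function over n samples spaced 0.1 seconds apart and
-- return the last output, forcing the whole network to be evaluated.
runSF :: SF Double Double -> Int -> Double
runSF sf n = last (embed sf (0.0, replicate n (0.1, Just 1.0)))

main :: IO ()
main = defaultMain
  [ bgroup "yampa"
      [ bench "arr"      $ nf (runSF (arr (* 2)))              100000
      , bench "integral" $ nf (runSF integral)                 100000
      , bench "composed" $ nf (runSF (arr (* 2) >>> integral)) 100000
      ]
  ]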

ivanperez-keera added a commit that referenced this issue Aug 30, 2020
One of the main reasons for Yampa to have changed very little, and for
so many PRs to take so long to integrate, is trying to make sure that we
do not seriously impact performance or break the API when we do that.

This commit implements a very, very basic benchmark that is integrated
in the cabal file.
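
For reference, "integrated in the cabal file" means a benchmark stanza roughly like the one below; the stanza name, paths, and dependency list are illustrative, not copied from the actual commit. With such a stanza, cabal bench (or cabal v1-bench, as used later in this thread) builds and runs the benchmark as part of the normal package workflow.

benchmark yampa-benchmarks
  type:             exitcode-stdio-1.0
  main-is:          Bench.hs
  hs-source-dirs:   benchmarks
  build-depends:    base, Yampa, criterion
  default-language: Haskell2010
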
@RiugaBachi

I am interested in this, but I have several concerns that I feel should be clarified before proceeding further:

  1. At this point in time, would you say it is fair to evaluate the AFRP scene from a practical standpoint, to narrow down the implementations worth including in this benchmark? I personally felt it was rather pointless to write my old benchmark suite against netwire and varying, which, in all honesty, no one really uses anymore. In particular, I believe their unmaintained nature is problematic for the development of the benchmark suite; see, for example, the unlawful ArrowApply instance in varying that I discovered last year.

  2. Should we try to standardize AFRP, at least for benchmarking purposes, via GHC Backpack? I have seen this approach taken in Patrick's monadic effects benchmarks, but I had a negative experience with its semi-restrictive model when forking it to include benchmarks against effet. In fairness, that was caused by the heavy use of Template Haskell, which is a non-problem as far as AFRP is concerned.

  3. I am going to assume that bearriver and Yampa will both definitely make it into the benchmark suite. These two implementations expose a practically identical interface; can we assume that the fairest comparison between the two would be namespace substitution, while preserving the overall form of each expression as much as possible (see the sketch after this list)? (Similar to Introduce version bounds for dependencies #233 / Integration Performance.)

  4. Granted, for implementations such as rhine and tegami (which I plan to publish next month), there is some functionality that will be hard to replicate fairly in other implementations. I can think of signal networks with resampling and parallelization in the case of rhine, and Coyoneda-generalized subnetting (i.e. signal function branching) in tegami. I am not saying we cannot hack bearriver or Yampa to do the same, just that such implementations would not be particularly idiomatic or practical to use in regular application code, and may require access to Internal modules to define new primitive signal functions. At some point we may also have to come to terms with the fact that, should we pursue this endeavor, the performance discrepancy will likely be so large that we will end up sacrificing accuracy (low iteration counts) or facing insanely long benchmarking times, a la Introduce version bounds for dependencies #233 / Integration Performance.
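
Regarding the namespace substitution in point 3, here is a sketch of one way to do it, assuming a CPP flag (the flag name is made up) selects which package provides the vocabulary. bearriver's SF carries an extra monad parameter, which a local alias can hide; real benchmarks would also need a matching driver on each side, so this only illustrates the idea.

{-# LANGUAGE CPP #-}

import Control.Arrow (arr, (>>>))

#ifdef USE_BEARRIVER
import Data.Functor.Identity (Identity)
import FRP.BearRiver         (SF, integral)
type Sig a b = SF Identity a b
#else
import FRP.Yampa             (SF, integral)
type Sig a b = SF a b
#endif

-- The expression under test is identical under both configurations;
-- only the imports above change.
scaledIntegral :: Sig Double Double
scaledIntegral = arr (* 2) >>> integral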

@ivanperez-keera (Owner, Author) commented Aug 31, 2020

  1. I would not complicate this more than necessary. The last time I explored the FRP arena to come up with a good comparison, I found more than 50 implementations. The problem is that it's too easy to write a new FRP implementation, and people do not seem to understand that we should be striving to remove implementations (not just in FRP but in all areas), not add more. So, we'll start with Yampa first, move to the immediate neighbours, and take it from there.

  2. I don't know what GHC Backpack is. I don't see how it could influence the benchmarks. Unless the cost of doing this is extremely low and the benefit for this task really high, I don't want to open that door right now.

  3. Makes sense.

  4. Yes and no. You can write in dunai everything you can write in rhine; you will just not have the extra type-level safety. But the expectation is that dunai should be more performant than rhine and never less, unless, for a specific construct, the types give the compiler so much information that it can optimize better. Unlikely in this case, but theoretically possible. I don't know what tegami is or how it will differ and, at this point, I'm afraid to ask (<insert Chris Pratt meme here>).

As a general approach, not just to this task but to all of them, I would go step by step. The reality is that we are all really busy, and it pays in the long run to strategize this as a series of ridiculously small steps in the right direction, trying to complete each step before going on to the next one. Small steps give us the ability to complete things and put them completely out of our minds, and a sense of accomplishment for each task completed.

So, first, let's create a very small benchmark for Yampa. Then, let's see what we can say about different Yampa constructs. Then, let's see how we could automate that analysis (if we can), or at least facilitate it. And so on. It'll be long, but it'll be attainable. If we try to aim too high from the beginning, we will likely never finish and, even if we do, we will have come up with the ultimate, perfect FRP benchmark that nobody will ever complete. (And for what it's worth, we have tried this at least twice before with Yampa and twice with dunai, and that's precisely what happened in all cases.)

ivanperez-keera added a commit that referenced this issue Sep 18, 2021

@ivanperez-keera (Owner, Author)

I know this has been a long-standing issue, so I'm just writing to let people know that there is progress going on.

I currently have a small benchmark, together with a way of comparing its results to those of a previous run. Hopefully, this will give us a measure of how much better or worse a change is, and help us start making decisions based on real data. We can later improve the quality of those decisions by improving the benchmarks in terms of coverage, depth, and reliability.

ivanperez-keera added a commit that referenced this issue Mar 13, 2022
ivanperez-keera removed this from the Yampa (+1) - Performance milestone Apr 7, 2022
ivanperez-keera added a commit that referenced this issue Apr 1, 2023

ivanperez-keera added a commit that referenced this issue Apr 8, 2023

ivanperez-keera added a commit that referenced this issue May 8, 2023
ivanperez-keera self-assigned this Aug 5, 2023
ivanperez-keera added this to the 0.14.4 milestone Aug 5, 2023
ivanperez-keera added a commit that referenced this issue Aug 5, 2023
ivanperez-keera added a commit that referenced this issue Aug 5, 2023
This commit implements a basic benchmark that is integrated in the Yampa
package. The goal is to have an initial benchmark infrastructure that
can be used to evaluate the performance of Yampa, and compare between
different versions, and to make evidence-based decisions about when new
features should be incorporated into Yampa.

[ci skip]
@ivanperez-keera (Owner, Author)

I have just put a branch with an initial benchmark: https://github.com/ivanperez-keera/Yampa/tree/develop-performance

It has two commits: the first one is for the release; the second one is just for documentation and is not meant to be included in the official release (at least, not as is).

My intention is to merge the first one as part of Yampa 0.14.4, which will be published two days from now.

This change does not address everything we've discussed so far. It is a single benchmark, meant to put some infrastructure in place and get things rolling.

I want benchmarks to become part of the normal process of releasing Yampa and evaluating new contributions. To me, it is VERY important that, whatever mechanism we put in place to create and run benchmarks, it be automated. If it takes a lot of steps to run the benchmarks, we will not use them continuously, or ever.

Please discuss. If there's an immediate gap that can be addressed quickly, please comment. If you see a problem with the current version, please comment as well.

Performance evaluation and benchmarking will be a big enough effort that we will not be able to get it completely right the first time. Perfect is the enemy of good.

Tagging @RiugaBachi @turion for awareness. Ideas from this proposal may also be used with bearriver and/or dunai.

ivanperez-keera added a commit that referenced this issue Aug 7, 2023
ivanperez-keera added a commit that referenced this issue Aug 7, 2023
Benchmarks are useless unless people actually use them to evaluate their
proposed solutions.

This commit modifies the README to document that benchmarks are already
part of the Yampa package.

@ivanperez-keera (Owner, Author)

I'm about to merge this change. I will not merge the evaluation scripts; I will simply post them here for now. To evaluate a specific commit, do the following:

#!/bin/bash
# Record the environment and run the benchmarks, storing all results in
# the directory given as the first argument.
DESTINATION="$1"
mkdir -p "$DESTINATION"
git log -1                    > "$DESTINATION/git-log"
git diff HEAD                 > "$DESTINATION/git-diff"
cabal v1-exec -- ghc-pkg list > "$DESTINATION/packages"
ghc --version                 > "$DESTINATION/ghc-version"
cabal --version               > "$DESTINATION/cabal-version"
cabal v1-bench --benchmark-option="$DESTINATION"
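
Assuming the script above is saved as evaluate.sh and run from the root of the repository (the script name and results directory below are just examples), an evaluation looks like this; the last line of the script passes the destination directory to the benchmark executable, presumably so that its criterion CSV output ends up next to the recorded environment information:

./evaluate.sh results/$(git rev-parse --short HEAD)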

To compare two evaluations, you can use the following Haskell program:

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString.Lazy   as BL
import           Data.Csv               (FromNamedRecord (parseNamedRecord),
                                         decodeByName, (.:))
import           Data.List              (transpose)
import qualified Data.Map               as M
import qualified Data.Vector            as V
import           System.Environment     (getArgs)
import           Text.PrettyPrint.Boxes

main :: IO ()
main = do

  -- Parse arguments
  -- f1 is the CSV file produced with the new version
  -- f2 is the CSV file produced with the old version
  (f1:f2:v:_) <- getArgs

  -- Parse results. The result of decoding the file is an Either, and the Right
  -- case has a pair (Header, Vector Result). We use snd to keep the Vector.
  csvData1 <- fmap snd <$> decodeByName <$> BL.readFile f1
  csvData2 <- fmap snd <$> decodeByName <$> BL.readFile f2

  -- Obtain the list of violations and correct comparisons by comparing the
  -- data obtained from two CSV files using an auxiliary comparison function
  let correct    = fst <$> result
      violations = snd <$> result

      result = compareResults comparisonFunc <$> csvData1 <*> csvData2

      -- v is the third argument when executing the program. It acts as
      -- a threshold of how much better/worse the second version must be
      -- than the first.
      --
      -- 1 means the old version must be slower than or equal to the new one.
      -- 2 means the new version must be at least twice as fast as the old one.
      -- 0.9 means the new version may be slower than the old one, by up to a
      -- factor of 1/0.9 (roughly 11%).
      comparisonFunc x y = x * (read v) <= y

  -- Print results
  putStrLn "Correct"
  printResults correct

  putStrLn "Violations"
  printResults violations

-- | Compare two CSV databases, and produce the correct results and the
-- violations.
compareResults :: (Double -> Double -> Bool)  -- ^ Comparison function
               -> V.Vector Result             -- ^ Data from first file.
               -> V.Vector Result             -- ^ Data from second file.
               -> (M.Map String Comparison, M.Map String Comparison)
compareResults prop rows1 rows2 =
    M.partition (compareDurations prop) combinedCriterionData

  where

    combinedCriterionData = M.unionWith mergeComparisons map1 map2

    -- Turn the result data (a vector) into a map
    map1 = resultsToMapWith mkComparisonFst rows1
    map2 = resultsToMapWith mkComparisonSnd rows2

printResults :: Either String (M.Map String Comparison) -> IO ()
printResults (Left s)  = putStrLn $ "Error: " ++ s
printResults (Right m) = putStrLn table
  where
    table = render $ hsep 3 left $ map (vcat left . map text)
          $ transpose cols
    cols :: [[String]]
    cols = ["Benchmark", "Alt 1", "Alt 2", "Factor (alt2 / alt1)"]
         : [ [ benchmark, alt1, alt2, factor ]
           | (benchmark, comp) <- M.assocs m
           , let alt1 :: String
                 alt1 = maybe "(nothing)" show $ comparisonDuration1 comp
           , let alt2 :: String
                 alt2 = maybe "(nothing)" show $ comparisonDuration2 comp
           , let factor =
                   case (comparisonDuration1 comp, comparisonDuration2 comp) of
                     (Just v1, Just v2) -> show (v2/v1)
                     _                  -> "N/A"
           ]

-- * Comparisons

-- | Comparison entry
data Comparison = Comparison
    { comparisonDuration1 :: Maybe Double
    , comparisonDuration2 :: Maybe Double
    }
  deriving Show

-- | Constructor with 1st comparison value only.
mkComparisonFst :: Double -> Comparison
mkComparisonFst v = Comparison (Just v) Nothing

-- | Constructor with 2nd comparison value only.
mkComparisonSnd :: Double -> Comparison
mkComparisonSnd v = Comparison Nothing (Just v)

-- | Merge the first duration from one comparison with the second duration
-- from another comparison.
mergeComparisons :: Comparison -> Comparison -> Comparison
mergeComparisons c1 c2 =
  Comparison (comparisonDuration1 c1) (comparisonDuration2 c2)

-- | A comparison succeeds if both values exist and they satisfy the given
-- comparison property.
compareDurations :: (Double -> Double -> Bool) -> Comparison -> Bool
compareDurations prop (Comparison (Just d1) (Just d2)) = prop d1 d2
compareDurations _ _ = False

-- * Criterion

-- | Datatype representing a row of results from Criterion.
data Result = Result
  { name     :: !String
  , mean     :: !Double
  , meanLB   :: !Double
  , meanUB   :: !Double
  , stddev   :: !Double
  , stddevLB :: !Double
  , stddevUB :: !Double
  }

-- | Instance to parse a result from a named CSV row.
instance FromNamedRecord Result where
  parseNamedRecord r = Result <$> r .: "Name"
                              <*> r .: "Mean"
                              <*> r .: "MeanLB"
                              <*> r .: "MeanUB"
                              <*> r .: "Stddev"
                              <*> r .: "StddevLB"
                              <*> r .: "StddevUB"

-- | Build a map of comparisons from a vector of results read from the CSV
-- file. We use this auxiliary type so we can use Map union to merge results
-- from two files.
resultsToMapWith :: (Double -> Comparison)
                 -> V.Vector Result
                 -> M.Map String Comparison
resultsToMapWith f = vectorToMap . V.map (\r -> (name r, f (mean r)))

-- * Auxiliary

-- | Turn a vector into a map
vectorToMap :: Ord key
            => V.Vector (key, value)
            -> M.Map key value
vectorToMap vec =
  V.foldl (\myMap (key, value) -> M.insert key value myMap) M.empty vec
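
Assuming the program above is saved as Compare.hs and the two benchmark runs produced criterion CSV files (the file names below are illustrative), and that cassava, boxes, vector, containers, and bytestring are available, it can be compiled and run as follows. The third argument is the threshold described in the comments, so with 1 any benchmark where the new run is slower than the old one shows up under Violations:

ghc -O2 Compare.hs -o compare-benchmarks
./compare-benchmarks new/results.csv old/results.csv 1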
