
bearriver: Integration Performance #233

Open
RiugaBachi opened this issue Jul 6, 2020 · 4 comments

Comments


RiugaBachi commented Jul 6, 2020

I wrote a whole benchmark suite half a year ago, but could not bring myself to open an issue here about all the performance problems I've encountered with bearriver. With more time during this coronavirus situation, however, I wouldn't mind tackling the issues one at a time. I'm bringing up my double-integration subgroup first, as it is the most trivial of the bunch.

Freshly tested on GHC 8.8.3 with -O2 -threaded -rtsopts. My hypothesis: monad optimizations may not play well with an actuator that constantly checks for termination based on the SF output.

benchmarking dIntegral/pure
time                 961.8 ms   (481.3 ms .. 1.241 s)
                     0.973 R²   (0.911 R² .. 1.000 R²)
mean                 1.179 s    (1.046 s .. 1.425 s)
std dev              241.7 ms   (2.511 ms .. 305.0 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking dIntegral/plain
time                 1.383 s    (NaN s .. 1.992 s)
                     0.969 R²   (0.890 R² .. 1.000 R²)
mean                 1.541 s    (1.384 s .. 1.637 s)
std dev              151.0 ms   (64.06 ms .. 194.7 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking dIntegral/yampa
time                 1.361 s    (1.084 s .. 1.681 s)
                     0.993 R²   (0.977 R² .. 1.000 R²)
mean                 1.526 s    (1.444 s .. 1.607 s)
std dev              96.23 ms   (77.06 ms .. 111.0 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking dIntegral/bearriver
time                 66.62 s    (65.92 s .. 67.96 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 66.23 s    (66.10 s .. 66.46 s)
std dev              218.4 ms   (15.54 ms .. 270.2 ms)
variance introduced by outliers: 19% (moderately inflated)

Relevant code:

{-# LANGUAGE GeneralizedNewtypeDeriving, TupleSections, TypeApplications #-}

-- Extensions and imports (added for completeness; module names assume
-- the mtl, time, Yampa, and bearriver packages):
import Control.Arrow ((>>>), (&&&))
import Control.Monad (foldM, void)
import Control.Monad.State
  (StateT, MonadState, MonadTrans, MonadIO, evalStateT, get, put, liftIO)
import Data.Time.Clock (UTCTime, getCurrentTime, diffUTCTime)
import qualified FRP.Yampa as Y
import qualified FRP.BearRiver as BR

newtype DTimeGen m a
  = DTimeGen { unwrapDTGen :: StateT UTCTime m a }
  deriving (Functor, Applicative, Monad, MonadTrans, MonadState UTCTime, MonadIO)

runDTGen :: MonadIO m => DTimeGen m a -> m a
runDTGen g = liftIO getCurrentTime >>= evalStateT (unwrapDTGen g)

get' :: StateT UTCTime IO UTCTime
get' = get
put' :: UTCTime -> StateT UTCTime IO ()
put' = put
gct :: StateT UTCTime IO UTCTime
gct = liftIO getCurrentTime
comp :: UTCTime -> UTCTime -> Double
comp pt nt = realToFrac (diffUTCTime nt pt) / (1/60.0)

dt :: DTimeGen IO Double
dt = DTimeGen $ do 
  pt <- get'
  nt <- gct
  let dt' = comp pt nt
  put' nt
  return dt'

--

dIntSteps :: Int
dIntSteps = 3000000

dIntegralPure :: IO ()
dIntegralPure = void $ runDTGen $ foldM dIntgr (0::Double, 0) (replicate dIntSteps 1)
  where
    dIntgr ~(s1, s2) i = do
      dt' <- dt
      let i1 = s1 + (i * dt')
      let i2 = s2 + (i1 * dt')
      pure (i1, i2)
dIntegralPlain :: IO ()
dIntegralPlain = void $ runDTGen $
  -- plainReactimate, plainIntegral, and plainSteps are from my own "Plain"
  -- implementation (not shown; see the note below).
  plainReactimate ((1::Double,) <$> dt) (pure . (<dIntSteps) . snd) $
    (plainIntegral 0 >>> plainIntegral 0) &&& plainSteps 1

dIntegralYampa :: IO ()
dIntegralYampa = runDTGen $
  Y.reactimate (pure (0::Double)) (const $ (,Just 1) <$> dt) (const $ pure . (>=dIntSteps) . snd) $
    (Y.imIntegral 0 >>> Y.imIntegral 0) &&& Y.sscan (\x _ -> x + 1) 0

dIntegralBR :: IO ()
dIntegralBR = runDTGen $
  BR.reactimate (pure (0::Double)) (const $ (,Just 1) <$> dt) (const $ pure . (>=dIntSteps) . snd) $
    (BR.integral >>> BR.integral) &&& BR.count @Int

I don't wish to discuss my definition of Plain yet, as it is basically on par with Yampa's performance in this test, making it irrelevant here.

@ivanperez-keera
Owner

I'm looking into this. Thanks!

@ivanperez-keera
Owner

@RiugaBachi I was experimenting a bit over the weekend. Here's some thoughts:

  • The Yampa definition does not free memory until the end (this may just be due to your expression and not a leak in Yampa). The dunai version consumes only kilobytes of memory and has a flat memory footprint.

  • Turning MSFs into a newtype in Dunai helps (noticeably).

  • Enabling -O2 when compiling both dunai and bearriver also helps (noticeably).

  • Expanding the definition of arrM makes a big difference.

  • These were just the ones that were immediately obvious. There's plenty of opportunities for optimization in dunai and in bearriver.

  • In order to do this properly, we probably want to sit down and come up with an acceptable infrastructure for these benchmarks. Something that helps us keep track of 1) compilation flags for libraries, 2) compilation flags for examples, 3) the code compiled, 4) execution flags, and 5) generated profiling information. @mathandley and I started something in this area but never finished. Maybe it's time to resume this?
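The newtype and arrM points above can be illustrated with a self-contained sketch. This is a simplified stand-in for dunai's MSF, not its actual code:

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- A minimal monadic stream function. As a newtype it has no runtime
-- wrapper, so MSF/unMSF compile away entirely.
newtype MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }

-- arrM as a directly recursive closure: the same node is reused on every
-- step instead of being rebuilt, which gives GHC more room to inline.
arrM :: Monad m => (a -> m b) -> MSF m a b
arrM f = go
  where
    go = MSF $ \a -> do
      b <- f a
      pure (b, go)

-- Run an MSF over a list of inputs, collecting the outputs.
embed :: Monad m => MSF m a b -> [a] -> m [b]
embed _  []     = pure []
embed sf (a:as) = do
  (b, sf') <- unMSF sf a
  (b :) <$> embed sf' as

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- A stateful stream function that keeps a running total of its inputs.
  let total = arrM $ \x -> modifyIORef' ref (+ x) >> readIORef ref
  outs <- embed total [1, 2, 3]
  print outs  -- running totals: [1,3,6]
```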

The above cuts the execution time by more than half. Note, however, that profiling with just one SF and profiling with an actual game are two completely different beasts. @mathandley and I also saw big wins for Yampa with just a few SFs, but with whole applications bearriver won by a lot.

This seems like a lot of work, but it's unavoidable for any FRP implementation in the long run (you can come up with a better construct for a particular problem, but it will be lacking for other problems, so you'd need to do this for it anyway). My expectation is that, if we have an acceptable battery of benchmarks and a way to benchmark that helps us keep track of the results, we'll be able to profile and optimize both dunai and bearriver really, really well.
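The bookkeeping items from the bullet list could be pictured as a per-run record along these lines (the type, field names, and sample values are hypothetical, not existing tooling):

```haskell
-- Hypothetical record of what benchmark infrastructure might store per
-- run, mirroring the five items listed in the comment above.
data BenchmarkRun = BenchmarkRun
  { libraryFlags   :: [String]   -- 1) compilation flags for libraries
  , exampleFlags   :: [String]   -- 2) compilation flags for examples
  , sourceRevision :: String     -- 3) identifies the code compiled
  , executionFlags :: [String]   -- 4) execution (RTS) flags
  , profileOutput  :: FilePath   -- 5) generated profiling information
  } deriving Show

main :: IO ()
main = print BenchmarkRun
  { libraryFlags   = ["-O2"]
  , exampleFlags   = ["-O2", "-threaded", "-rtsopts"]
  , sourceRevision = "abc1234"   -- placeholder commit hash
  , executionFlags = ["+RTS", "-s", "-RTS"]
  , profileOutput  = "dIntegral.prof"
  }
```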

@ivanperez-keera
Owner

I have opened a new issue on Yampa to start working on this and track progress. Due to the obvious connection between the two frameworks, I expect this to help us 1) compare dunai and Yampa, and 2) apply similar ideas to dunai so that we can improve its performance as well.

See: ivanperez-keera/Yampa#167

@ivanperez-keera
Owner

I know this has been a long-standing issue, so I'm writing to let people know that progress is being made.

I currently have a small benchmark, together with a way of comparing its results to those of the prior run. I've done this for Yampa, but the results should translate. Hopefully, this will give us a measure of how much better or worse a change is, and help us start making decisions based on real data. We can later improve the quality of those decisions by improving the benchmarks' coverage, depth, and reliability.
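As a rough illustration of the kind of measure this enables (the helper below is a hypothetical sketch, not the actual comparison tooling; the numbers are the dIntegral mean times from the first comment):

```haskell
-- Percentage change of a new mean time relative to a baseline; a
-- hypothetical helper, not the actual comparison tooling.
percentChange :: Double -> Double -> Double
percentChange baseline new = (new - baseline) / baseline * 100

main :: IO ()
main = do
  -- Mean times in seconds for dIntegral from the first comment.
  let yampa     = 1.526
      bearriver = 66.23
  putStrLn $ "bearriver vs. yampa: +"
          ++ show (percentChange yampa bearriver) ++ "%"
```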

@ivanperez-keera ivanperez-keera changed the title Integration Performance bearriver: Integration Performance May 7, 2022