
bearriver: Integration Performance #233

Open
RiugaBachi opened this issue Jul 6, 2020 · 4 comments

Comments


RiugaBachi commented Jul 6, 2020

I wrote a whole benchmark suite half a year ago, but could not bring myself to open an issue here about all the performance problems I've encountered with bearriver. With more time during this coronavirus situation, however, I wouldn't mind tackling the issues one at a time. I'm bringing up my double-integration subgroup first, as it is the most trivial of the bunch.

Freshly tested on GHC 8.8.3 with -O2 -threaded -rtsopts. My hypothesis: monad optimizations may not play well with an actuator that constantly checks for termination based on the SF output.

benchmarking dIntegral/pure
time                 961.8 ms   (481.3 ms .. 1.241 s)
                     0.973 R²   (0.911 R² .. 1.000 R²)
mean                 1.179 s    (1.046 s .. 1.425 s)
std dev              241.7 ms   (2.511 ms .. 305.0 ms)
variance introduced by outliers: 48% (moderately inflated)

benchmarking dIntegral/plain
time                 1.383 s    (NaN s .. 1.992 s)
                     0.969 R²   (0.890 R² .. 1.000 R²)
mean                 1.541 s    (1.384 s .. 1.637 s)
std dev              151.0 ms   (64.06 ms .. 194.7 ms)
variance introduced by outliers: 22% (moderately inflated)

benchmarking dIntegral/yampa
time                 1.361 s    (1.084 s .. 1.681 s)
                     0.993 R²   (0.977 R² .. 1.000 R²)
mean                 1.526 s    (1.444 s .. 1.607 s)
std dev              96.23 ms   (77.06 ms .. 111.0 ms)
variance introduced by outliers: 19% (moderately inflated)

benchmarking dIntegral/bearriver
time                 66.62 s    (65.92 s .. 67.96 s)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 66.23 s    (66.10 s .. 66.46 s)
std dev              218.4 ms   (15.54 ms .. 270.2 ms)
variance introduced by outliers: 19% (moderately inflated)

Relevant code:

{-# LANGUAGE GeneralizedNewtypeDeriving, TupleSections, TypeApplications #-}

-- Extensions and imports (added for completeness; module names assume
-- the mtl, time, Yampa, and bearriver packages):
import Control.Arrow ((>>>), (&&&))
import Control.Monad (foldM, void)
import Control.Monad.State
  (StateT, MonadState, MonadTrans, MonadIO, evalStateT, get, put, liftIO)
import Data.Time.Clock (UTCTime, getCurrentTime, diffUTCTime)
import qualified FRP.Yampa as Y
import qualified FRP.BearRiver as BR

newtype DTimeGen m a
  = DTimeGen { unwrapDTGen :: StateT UTCTime m a }
  deriving (Functor, Applicative, Monad, MonadTrans, MonadState UTCTime, MonadIO)

runDTGen :: MonadIO m => DTimeGen m a -> m a
runDTGen g = liftIO getCurrentTime >>= evalStateT (unwrapDTGen g)

get' :: StateT UTCTime IO UTCTime
get' = get
put' :: UTCTime -> StateT UTCTime IO ()
put' = put
gct :: StateT UTCTime IO UTCTime
gct = liftIO getCurrentTime
comp :: UTCTime -> UTCTime -> Double
comp pt nt = realToFrac (diffUTCTime nt pt) / (1/60.0)

dt :: DTimeGen IO Double
dt = DTimeGen $ do 
  pt <- get'
  nt <- gct
  let dt' = comp pt nt
  put' nt
  return dt'

--

dIntSteps :: Int
dIntSteps = 3000000

dIntegralPure :: IO ()
dIntegralPure = void $ runDTGen $ foldM dIntgr (0::Double, 0) (replicate dIntSteps 1)
  where
    dIntgr ~(s1, s2) i = do
      dt' <- dt
      let i1 = s1 + (i * dt')
      let i2 = s2 + (i1 * dt')
      pure (i1, i2)
dIntegralPlain :: IO ()
dIntegralPlain = void $ runDTGen $
  -- plainReactimate, plainIntegral, and plainSteps are from my own "Plain"
  -- implementation (not shown; see the note below).
  plainReactimate ((1::Double,) <$> dt) (pure . (<dIntSteps) . snd) $
    (plainIntegral 0 >>> plainIntegral 0) &&& plainSteps 1

dIntegralYampa :: IO ()
dIntegralYampa = runDTGen $
  Y.reactimate (pure (0::Double)) (const $ (,Just 1) <$> dt) (const $ pure . (>=dIntSteps) . snd) $
    (Y.imIntegral 0 >>> Y.imIntegral 0) &&& Y.sscan (\x _ -> x + 1) 0

dIntegralBR :: IO ()
dIntegralBR = runDTGen $
  BR.reactimate (pure (0::Double)) (const $ (,Just 1) <$> dt) (const $ pure . (>=dIntSteps) . snd) $
    (BR.integral >>> BR.integral) &&& BR.count @Int

I don't wish to discuss my definition of Plain yet, as it is basically on par with Yampa's performance in this test, making it irrelevant here.

@ivanperez-keera
Owner

I'm looking into this. Thanks!

@ivanperez-keera
Owner

@RiugaBachi I was experimenting a bit over the weekend. Here's some thoughts:

  • The Yampa definition does not free memory until the end (this may just be due to your expression and not a leak in Yampa). The dunai version consumes only kilobytes of memory and has a flat memory footprint.

  • Turning MSFs into a newtype in Dunai helps (noticeably).

  • Enabling -O2 when compiling both dunai and bearriver also helps (noticeably).

  • Expanding the definition of arrM makes a big difference.

  • These were just the ones that were immediately obvious. There's plenty of opportunities for optimization in dunai and in bearriver.

  • In order to do this properly, we probably want to sit down and come up with an acceptable infrastructure for these benchmarks. Something that helps us keep track of 1) compilation flags for libraries, 2) compilation flags for examples, 3) the code compiled, 4) execution flags, and 5) generated profiling information. @mathandley and I started something in this area but never finished. Maybe it's time to resume this?
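The newtype and arrM points above can be illustrated with a self-contained sketch. This is a simplified stand-in for dunai's MSF, not its actual code:

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- A minimal monadic stream function. As a newtype it has no runtime
-- wrapper, so MSF/unMSF compile away entirely.
newtype MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }

-- arrM as a directly recursive closure: the same node is reused on every
-- step instead of being rebuilt, which gives GHC more room to inline.
arrM :: Monad m => (a -> m b) -> MSF m a b
arrM f = go
  where
    go = MSF $ \a -> do
      b <- f a
      pure (b, go)

-- Run an MSF over a list of inputs, collecting the outputs.
embed :: Monad m => MSF m a b -> [a] -> m [b]
embed _  []     = pure []
embed sf (a:as) = do
  (b, sf') <- unMSF sf a
  (b :) <$> embed sf' as

main :: IO ()
main = do
  ref <- newIORef (0 :: Int)
  -- A stateful stream function that keeps a running total of its inputs.
  let total = arrM $ \x -> modifyIORef' ref (+ x) >> readIORef ref
  outs <- embed total [1, 2, 3]
  print outs  -- running totals: [1,3,6]
```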

The above cuts the execution time by more than half. Note, however, that profiling with just one SF and profiling with an actual game are two completely different beasts. @mathandley and I also saw big wins for Yampa with just a few SFs, but with whole applications bearriver won by a lot.

This seems like a lot of work, but it's unavoidable for any FRP implementation in the long run (you can come up with a better construct for a particular problem, but it will be lacking for other problems, so you'd need to do this for it anyway). My expectation is that, if we have an acceptable battery of benchmarks and a way to benchmark that helps us keep track of the results, we'll be able to profile and optimize both dunai and bearriver really, really well.
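The bookkeeping items from the bullet list could be pictured as a per-run record along these lines (the type, field names, and sample values are hypothetical, not existing tooling):

```haskell
-- Hypothetical record of what benchmark infrastructure might store per
-- run, mirroring the five items listed in the comment above.
data BenchmarkRun = BenchmarkRun
  { libraryFlags   :: [String]   -- 1) compilation flags for libraries
  , exampleFlags   :: [String]   -- 2) compilation flags for examples
  , sourceRevision :: String     -- 3) identifies the code compiled
  , executionFlags :: [String]   -- 4) execution (RTS) flags
  , profileOutput  :: FilePath   -- 5) generated profiling information
  } deriving Show

main :: IO ()
main = print BenchmarkRun
  { libraryFlags   = ["-O2"]
  , exampleFlags   = ["-O2", "-threaded", "-rtsopts"]
  , sourceRevision = "abc1234"   -- placeholder commit hash
  , executionFlags = ["+RTS", "-s", "-RTS"]
  , profileOutput  = "dIntegral.prof"
  }
```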

@ivanperez-keera
Owner

I have opened a new issue on Yampa to start working on this and track progress. Due to the obvious connection between the two frameworks, I expect this to help us 1) compare dunai and Yampa, and 2) apply similar ideas to dunai so that we can improve its performance as well.

See: ivanperez-keera/Yampa#167

@ivanperez-keera
Owner

I know this has been a long-standing issue, so I'm writing to let people know that progress is being made.

I currently have a small benchmark, together with a way of comparing its results to those of the prior run. I've done this for Yampa, but the results should translate. Hopefully, this will give us a measure of how much better or worse a change is, and help us start making decisions based on real data. We can later improve the quality of those decisions by improving the benchmarks' coverage, depth, and reliability.
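As a rough illustration of the kind of measure this enables (the helper below is a hypothetical sketch, not the actual comparison tooling; the numbers are the dIntegral mean times from the first comment):

```haskell
-- Percentage change of a new mean time relative to a baseline; a
-- hypothetical helper, not the actual comparison tooling.
percentChange :: Double -> Double -> Double
percentChange baseline new = (new - baseline) / baseline * 100

main :: IO ()
main = do
  -- Mean times in seconds for dIntegral from the first comment.
  let yampa     = 1.526
      bearriver = 66.23
  putStrLn $ "bearriver vs. yampa: +"
          ++ show (percentChange yampa bearriver) ++ "%"
```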

@ivanperez-keera ivanperez-keera changed the title Integration Performance bearriver: Integration Performance May 7, 2022