space leaks on executeMany #39

Open
s9gf4ult opened this Issue Sep 23, 2012 · 11 comments

Contributor

s9gf4ult commented Sep 23, 2012

Hello.
I am trying to use postgresql-simple to store many rows of the same kind in the database.
When my function looks like this

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
  executeMany c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)" cndls
  return ()

My program leaks. I mean that it constructs a great many Candle values, consuming memory, and only then reduces them and executes the query.

When I do this

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = do
  mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
  return ()

It works smoothly, inserting each candle one by one, but not very fast. I think that is because of the many separate execs.

Candle looks like this

data Period = FixedSecs !Int64 -- ^ Fixed time period defined by seconds count (i.e. hour is 3600 seconds)
            | Month           -- ^ Month time period
            | Year            -- ^ Year time period
            deriving (Eq, Show)

data Candle = Candle {candleOpenCost :: ! Double  -- ^ Opening cost of candle
                     ,candleCloseCost :: ! Double -- ^ Closing cost of candle
                     ,candleMinCost :: ! Double   -- ^ Minimal cost in the candle
                     ,candleMaxCost :: ! Double   -- ^ Maximal cost in the candle
                     ,candleVolume :: ! Double    -- ^ Volume of all deals in candle
                     ,candleTime :: !UTCTime       -- ^ Start time of the candle
                     ,candlePeriod :: !Period      -- ^ Candle time period
                     }
            deriving (Eq, Show)

And it has a ToRow instance

instance ToRow Candle where
  toRow (Candle {candleOpenCost = oc,
                 candleCloseCost = cc,
                 candleMinCost = minc,
                 candleMaxCost = maxc,
                 candleVolume = v,
                 candleTime = tm,
                 candlePeriod = per}) = [toField oc,
                                         toField cc,
                                         toField minc,
                                         toField maxc,
                                         toField v,
                                         toField tm] ++ (ptofield per)
    where
      ptofield (FixedSecs s) = [toField ("seconds" :: String),
                                toField s]
      ptofield Month = [toField ("month" :: String),
                        toField (0 :: Int64)]
      ptofield Year = [toField ("year" :: String),
                       toField (0 :: Int64)]

Candles are simply generated at random with this instance

instance Random Candle where
  random g = (Candle {candleOpenCost = oc,
                      candleCloseCost = cc,
                      candleMinCost = mc,
                      candleMaxCost = mac,
                      candleVolume = v,
                      candleTime = t,
                      candlePeriod = p}, ng)
    where
      (oc, g1) = random g
      (cc, g2) = random g1
      (mc, g3) = random g2
      (mac, g4) = random g3
      (v, g5) = random g4
      (t, g6) = random g5
      (p, ng) = random g6
  randomR (Candle {candleOpenCost = oc1,
                   candleCloseCost = cc1,
.....

which is quite lazy.
Here is all the code:
https://github.com/s9gf4ult/hadan/blob/master/post.hs
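One possible middle ground between a single large executeMany and one execute per row is to send the rows in fixed-size batches. The sketch below is not part of postgresql-simple: `inChunks` is a hypothetical helper that takes the batch action as a parameter, so it runs with only `base`. Whether batching actually lowers the memory peak described in this issue is an assumption that has not been tested here.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Hypothetical helper: run a batch action over a list in fixed-size chunks.
-- Only one chunk has to be materialized per call to the action, provided the
-- caller does not keep a reference to the whole input list elsewhere.
inChunks :: Monad m => Int -> ([a] -> m ()) -> [a] -> m ()
inChunks size act = go
  where
    go [] = return ()
    go xs = do
      -- Guard against a non-positive chunk size, which would loop forever.
      let (chunk, rest) = splitAt (max 1 size) xs
      act chunk
      go rest

-- Small demonstration that records the size of each batch.
main :: IO ()
main = do
  sizes <- newIORef []
  inChunks 4 (\chunk -> modifyIORef' sizes (length chunk :)) [1 .. 10 :: Int]
  readIORef sizes >>= print . reverse  -- prints [4,4,2]
```

With postgresql-simple, the batch action could be something like `\chunk -> void (executeMany c q chunk)`, wrapped in `withTransaction c`, where `q` is the insert query and the chunk size (for example 1000) is an arbitrary choice to tune.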

Owner

lpsmith commented Sep 23, 2012

Well, the space leak does need to be investigated and fixed, but you can make your original code run much faster by running it in a single transaction instead of one transaction per insert, as you are doing now.

postSaveCandles :: Connection -> [Candle] -> IO ()
postSaveCandles c cndls = withTransaction c $ do
  mapM_ (execute c "insert into candles (open, close, minc, maxc, volume, timec, periodtype, periodsecs) values (?,?,?,?,?,?,?,?)") cndls
  return ()

Contributor

s9gf4ult commented Sep 23, 2012

But I actually do

dbMakeCandles count = do
  con <- connect $ ConnectInfo "127.0.0.1" 5432 "test" "test" "test"
  cndls <- genCandles
  withTransaction con $ postSaveCandles con $ take count $ map (\x -> x `deepseq` x) cndls

I will try to write more tests to narrow this problem.

Owner

lpsmith commented Sep 24, 2012

Ok, what do you mean by a "space leak"? Have you tried running the heap profiler on your program?

Unfortunately, Haskell's memory consumption is not, in general, a local property. How much space a function uses is not only a question of how the function is written, but also how the result is consumed. Just be aware of that issue when trying to isolate this issue.
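A small illustration of that non-locality, unrelated to postgresql-simple itself: the same lazily produced list can need linear or roughly constant space depending only on how it is consumed. The function names below are made up for this sketch, and the space behaviour described in the comments assumes nothing else holds a reference to the list.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- | Two traversals: the whole list must stay in memory after 'sum' has
-- walked it, because 'length' still needs it.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- | One traversal with a strict accumulator: each list cell can be collected
-- as soon as it has been consumed. An empty list yields NaN.
meanOnePass :: [Double] -> Double
meanOnePass = finish . foldl' step (0, 0 :: Int)
  where
    step (!s, !n) x = (s + x, n + 1)
    finish (s, n) = s / fromIntegral n

main :: IO ()
main = do
  let n = 1000000 :: Int
  print (meanTwoPass (map fromIntegral [1 .. n]))
  print (meanOnePass (map fromIntegral [1 .. n]))
```

Running each variant separately with `+RTS -s` is one way to compare their maximum residency.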

Contributor

s9gf4ult commented Sep 25, 2012

> Have you tried running the heap profiler on your program

Yes. I am not sure I understand what I got from the profiling, as I am not very familiar with heap profiling in Haskell, but there is a triangular peak while the program runs. It generates a huge amount of some data, then folds it down.

> Just be aware of that issue when trying to isolate this issue.

Ok

Owner

lpsmith commented Sep 26, 2012

Out of curiosity, have you tried compiling postgresql-simple with heap profiling enabled?

This may not actually be a space leak, though it's entirely possible that executeMany is temporarily using significantly more memory than it should. Though if it uses more than O(n) memory, I would classify that as a sort of space leak.

What I would suggest is writing a test program that exhibits the issue. I see that you are using UTCTime, so the problem may be in my re-worked time printers as well. In your test program, I would try to use the same mix of field types you use in this code. And don't worry too much about trying to locate the problem, I'm happy to help with that.

Contributor

s9gf4ult commented Oct 1, 2012

I have written some tests

https://github.com/s9gf4ult/teststorage

There are two executables, GenerateTest and PostTest. The first generates the given number of objects and exits; the second runs "executeMany" on the given number of storables and then selects them back. Both executables take one parameter, the number of storables to operate on; the default is 10000 if the parameter is omitted. For PostTest to work you need "test" as the user, password and database name on localhost, or you can change them manually on line 14 of PostTest.hs.

Inserts consume 5 times more memory than selects. I think this is because Haskell does not release the memory for intermediate data until the database query has completely executed. This is possibly not a space leak but some kind of ... what?

s9gf4ult closed this Oct 1, 2012

s9gf4ult reopened this Oct 1, 2012

Contributor

s9gf4ult commented Oct 1, 2012

I accidentally closed this issue ..

Owner

lpsmith commented May 8, 2013

By the way, did you ever get this resolved? The link to your code is no longer available...

Owner

lpsmith commented May 8, 2013

Oh, I was looking at the link in your original post; thanks for the test cases!

Owner

lpsmith commented Jul 8, 2013

Ok, I did take a brief look at your test case; the one thing that stands out to me is that your insert appears to take quadratic time: doubling the number of elements in the test case approximately quadruples the amount of time it takes to make the insert.

And I did verify that the problem is inside postgresql-simple itself. It only appears to be a problem when strings need to be escaped via libpq... my first attempt to replicate this outside of your test case was not successful because I only used integers which do not get escaped via libpq.

In any case, here's a stripped down test case that exhibits the behavior:

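{-# LANGUAGE OverloadedStrings, BangPatterns #-}

import Database.PostgreSQL.Simple
import System.Environment

main :: IO ()
main = do
   -- Number of rows to format, taken from the first command-line argument
   (nstr:_) <- getArgs
   let n = read nstr :: Int
   c <- connectPostgreSQL ""
   -- Only build (and force) the query string; the insert is never executed
   !q <- formatMany c "insert into foo values (?,?)" [ (i,show i) | i <- [1..n] ]
   close c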

I then compiled this file and then timed it using the unix time command. I don't know if this is related to the memory consumption problems you are experiencing, but I figure there is a good chance it is.
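The doubling experiment described above can also be scripted instead of run by hand. The following is only a sketch under stated assumptions: `growthRatios`, `cpuSeconds` and `quadraticWork` are hypothetical names, and `quadraticWork` is a deliberately quadratic stand-in workload (left-nested list appends), not the postgresql-simple code path. With input sizes that double, ratios near 2 suggest linear time and ratios near 4 suggest quadratic time.

```haskell
import Control.Exception (evaluate)
import Control.Monad (forM)
import System.CPUTime (getCPUTime)

-- | Ratio between each measurement and the one before it.
growthRatios :: [Double] -> [Double]
growthRatios ts = zipWith (/) (drop 1 ts) ts

-- | CPU seconds spent running an action (coarse, but enough to see a trend).
cpuSeconds :: IO a -> IO Double
cpuSeconds act = do
  start <- getCPUTime           -- picoseconds
  _ <- act
  end <- getCPUTime
  return (fromIntegral (end - start) / 1e12)

-- | Stand-in workload: left-nested (++) re-copies the accumulated list on
-- every step, so forcing the result takes time quadratic in n.
quadraticWork :: Int -> IO Int
quadraticWork n = evaluate (length (foldl (\acc i -> acc ++ [i]) [] [1 .. n]))

main :: IO ()
main = do
  let sizes = [4000, 8000, 16000]
  times <- forM sizes (cpuSeconds . quadraticWork)
  print (zip sizes times)
  print (growthRatios times)
```

To measure the real case, `quadraticWork` could be replaced by an action that calls `formatMany` with `n` rows, as in the test case above. CPU time only covers the client process, which fits here because the suspected cost is in client-side escaping.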

lpsmith added a commit that referenced this issue Jul 8, 2013

Add s9gf4ult as a contributor
Bug #39 is definitely worthy of being added to CONTRIBUTORS

Owner

lpsmith commented Jul 9, 2013

Ok, I fixed postgresql-libpq's binding to PQescapeStringConn in bug #70, and it appears to be pretty close to linear time now. Could you upgrade to postgresql-libpq-0.8.2.3 and see if this fixed your space problem to your satisfaction?
