Memory issue in with-git-revisions #9

Closed
woudshoo opened this Issue May 5, 2012 · 7 comments

Comments

Projects
None yet
2 participants
Contributor

woudshoo commented May 5, 2012

Dear Russell,

I found a small problem in with-git-revisions. The revwalker is never freed.
This leads to problems in my use case because it will keep pack files open and
after a about 10 times with opening / revwalking and closing a repository I hit
the 256 max open file limit I have.

The fix is trivial, changing the end of the with-git-revisions to

         (unwind-protect
      (revision-walker)
       (%git-revwalk-free revwalker))))))))

I have this added in my master branch. However this includes a few more changes
too. For example I have added functions for finding parents of commits etc.
If you want to include those too that would be great.
Otherwise I can apply a patch of the with-git-revsions issue to your branch and make
a clean pull request.

Kind regards,
Wim Oudshoorn

Owner

russell commented May 7, 2012

cool, thanks. I have merged your master into mine. i might see if i can add some kinda lsof test (it might only work on linux) but i should be able to check if there are any files left open, I am quite worried about leakage.

Also, i unfortunately had to revert the #! line back, seems that OSX will pass each arg to env but on Linux it tries to interpret it as one command.

russell closed this May 7, 2012

Contributor

woudshoo commented May 7, 2012

Russell Sim
reply@reply.github.com
writes:

cool, thanks. I have merged your master into mine. i might see if i
can add some kinda lsof test (it might only work on linux) but i
should be able to check if there are any files left open, I am quite
worried about leakage.

Testing might be tricky. I noticed it first when my interactive tests
on a relatively big repository started to fail with file open errors.
(This showed up because (my) MacOSX the default max open file limit is
256.)

I have tried to make a test case for you out of it. However, it seems
to only happen when reading from 'pack' files. I did not manage to
create a repository with a pack file. (and maybe onlye when multiple
pack files are read.)

I am also a bit worried about leakage, but now less than before.
Maybe tomorrow I will go over the cl-git.lisp with a toothcomb and
see if I spot potential leaks.

Also, i unfortunately had to revert the #! line back, seems that OSX
will pass each arg to env but on Linux it tries to interpret it as one
command.

Yes, I saw. Sorry or putting it in. For me sbcl is not installed in
/usr/bin and I thought that the standard trick was using env, so I tried
it and it worked for me, not realizing that it doesn't work on linux.

BTW, are you also targetting windows? I can test it on windows too.

Wim Oudshoorn.

Owner

russell commented May 8, 2012

On 8 May 2012 01:35, Willem Rein Oudshoorn
reply@reply.github.com
wrote:

Russell Sim
reply@reply.github.com
writes:

cool, thanks. I have merged your master into mine. i might see if i
can add some kinda lsof test (it might only work on linux) but i
should be able to check if there are any files left open, I am quite
worried about leakage.

Testing might be tricky.  I noticed it first when my interactive tests
on a relatively big repository started to fail with file open errors.
(This showed up because (my) MacOSX the default max open file limit is
256.)

I am thinking that I can write a check that can be run after some
other test, Were the file handles linked to a repository? Once you
freed the repository struct did they get cleaned up?

I have tried to make a test case for you out of it.  However, it seems
to only happen when reading from 'pack' files.  I did not manage to
create a repository with a pack file.  (and maybe onlye when multiple
pack files are read.)

I guess we will have to wait until we have pack support before we can
really test this.

I am also a bit worried about leakage, but now less than before.
Maybe tomorrow I will go over the cl-git.lisp with a toothcomb and
see if I spot potential leaks.

Do you know of any way to check the currently allocated memory in
lisp? File handles will be an easy way to spot leaks, but it would be
good to have a better way.

Also, i unfortunately had to revert the #! line back, seems that OSX
will pass each arg to env but on Linux it tries to interpret it as one
command.

Yes, I saw.  Sorry or putting it in.  For me sbcl is not installed in
/usr/bin and I thought that the standard trick was using env, so I tried
it and it worked for me, not realizing that it doesn't work on linux.

DW, I thought it would have worked too.

BTW, are you also targetting windows?  I can test it on windows too.

Not really, I don't have access to a windows machine. Unless you are
planning on using the library on windows I wouldn't worry too much
about it.

Thanks,
Russell

Contributor

woudshoo commented May 8, 2012

Russell Sim
reply@reply.github.com
writes:

On 8 May 2012 01:35, Willem Rein Oudshoorn
reply@reply.github.com
wrote:

Russell Sim
reply@reply.github.com
writes:

cool, thanks. I have merged your master into mine. i might see if i
can add some kinda lsof test (it might only work on linux) but i
should be able to check if there are any files left open, I am quite
worried about leakage.

Testing might be tricky.  I noticed it first when my interactive tests
on a relatively big repository started to fail with file open errors.
(This showed up because (my) MacOSX the default max open file limit is
256.)

I am thinking that I can write a check that can be run after some
other test, Were the file handles linked to a repository? Once you
freed the repository struct did they get cleaned up?

No that was the problem. When the revwalker was not freed,
even after closing the repository lsof showed open filehandles
to the pack files. After doing this multiple times the same
pack file would have 8 open handles to it (and than it crashed
because I ran out of filehandled).

I have tried to make a test case for you out of it.  However, it seems
to only happen when reading from 'pack' files.  I did not manage to
create a repository with a pack file.  (and maybe onlye when multiple
pack files are read.)

I guess we will have to wait until we have pack support before we can
really test this.

Yes, I think so, at the moment it seems to work ok for me.

I am also a bit worried about leakage, but now less than before.
Maybe tomorrow I will go over the cl-git.lisp with a toothcomb and
see if I spot potential leaks.

Do you know of any way to check the currently allocated memory in
lisp? File handles will be an easy way to spot leaks, but it would be
good to have a better way.

I only know of the lisp function (room), make sure to run the garbage
collector first with (gc :full t). However, to make use of (room)
in an automated way you have to rebind the standard out stream and
parse the result. Also I am not sure how 'stable' the results of room are.

Note that I found at leas one other memory leak in the code
for git-commit-create function. The git-signature's that are
optionally created, are never freed.

I would like to do the same thing to git-signature as I did to
the OIDs. Use the cffi type translation functions.
The advantagae is that it also get rid of the with-git-signature-return
macro.

This required a decision of how to present the signature data in lisp.
I do not have very strong opinions. One option is to preset it as a
property list, e.g:

(:author "Wim" :e-mail "wim@computer" :time @local-time-time-object)

With automatic conversions which will add sensible defaults like now, so
you could specify () for all defaults or (:author "wim") etc.

Now there are some things I am not sure about:

1 - is the property list format appropriate?
2 - e-mail or email?
3 - Instead of using local-time object it could also be a standard
universal-time value.

I would implement the following covnersions

1 - From libgit2's signature data to the property list above
2 - From a full or partially filled property list to the c equivalent
3 - A pass through conversion so if you supply a libgit2 c-signature
it will be left intact (and NOT freed afterwards).

Oh and this time I promise I will update the tests and run them until
they pass.

Let me know if this sounds good to you.

Wim Oudshoorn.

Owner

russell commented May 8, 2012

Willem Rein Oudshoorn
reply@reply.github.com
writes:

No that was the problem. When the revwalker was not freed,
even after closing the repository lsof showed open filehandles
to the pack files. After doing this multiple times the same
pack file would have 8 open handles to it (and than it crashed
because I ran out of filehandled).

Cool, I have pushed an extra post test check that will try and spot open
files. I'm not sure how this will go on the mac if you have any
problems let me know.

I only know of the lisp function (room), make sure to run the garbage
collector first with (gc :full t). However, to make use of (room)
in an automated way you have to rebind the standard out stream and
parse the result. Also I am not sure how 'stable' the results of room
are.

I might just leave this for the moment.

Note that I found at leas one other memory leak in the code
for git-commit-create function. The git-signature's that are
optionally created, are never freed.

I would like to do the same thing to git-signature as I did to
the OIDs. Use the cffi type translation functions.
The advantagae is that it also get rid of the with-git-signature-return
macro.

This required a decision of how to present the signature data in lisp.
I do not have very strong opinions. One option is to preset it as a
property list, e.g:

(:author "Wim" :e-mail "wim@computer" :time @local-time-time-object)

With automatic conversions which will add sensible defaults like now, so
you could specify () for all defaults or (:author "wim") etc.

Now there are some things I am not sure about:

1 - is the property list format appropriate?
2 - e-mail or email?
3 - Instead of using local-time object it could also be a standard
universal-time value.

I would implement the following covnersions

1 - From libgit2's signature data to the property list above
2 - From a full or partially filled property list to the c equivalent
3 - A pass through conversion so if you supply a libgit2 c-signature
it will be left intact (and NOT freed afterwards).

Oh and this time I promise I will update the tests and run them until
they pass.

Let me know if this sounds good to you.

It all looks good to me, the property list format looks good, I think
perhaps email is the better keyword.

The time object is an interesting one, from what I can tell the git
commits seem to store a timezone, as well so we can either remove the
local-time lib and find the timezone by some other means or use the
local-time object to get the timezone.

(multiple-value-bind 
    (ns ss mm hh day month year day-of-week 
     daylight-saving-time-p timezone-offset 
     timezone-abbreviation)
    (local-time:decode-timestamp (local-time:now)) 
   timezone-offset)

There might be a better way, what are your thoughts?

Cheers,
Russell

Contributor

woudshoo commented May 10, 2012

Russell Sim
reply@reply.github.com
writes:

(multiple-value-bind 
    (ns ss mm hh day month year day-of-week 
     daylight-saving-time-p timezone-offset 
     timezone-abbreviation)
    (local-time:decode-timestamp (local-time:now)) 
   timezone-offset)

There might be a better way, what are your thoughts?

I have send you a pull request and used, I think all of your
suggestions. I favored keeping the local-time library because
for me it prints nicely the timestamps and it has a standard conversion
to the unix-timestamp format that is needed.

Now an additional question. With the type translations some of the
functions become quite simple. For example:

  (cffi:defcfun ("git_commit_author" %git-commit-author)
      %git-signature
    (commit :pointer))

  ...
  (defun git-commit-author (commit)
      "Given a commit return the commit author's signature."
     (%git-commit-author commit))

This can be simplified to:

  (cffi:defcfun ("git_commit_author" git-commit-author)
      %git-signature
      "Given a commit return the commit author's signature."
    (commit :pointer))

I think this would be nice because it uses the cffi mechanisms and
removes some code. But are you attached to the %git-commit-author
and later wrapping it?

If not I will simplify a few of these functions in a later commit.

In addition, I see there is code in the doc folder.
I wouldn't mind adding some documntation, but what tool do you use to
create the documentation and where and which format for the
documentation do you use?
Any pointer is appreciated.

Wim Oudshoorn.

Owner

russell commented May 11, 2012

Willem Rein Oudshoorn
reply@reply.github.com
writes:

Russell Sim
reply@reply.github.com
writes:

(multiple-value-bind 
    (ns ss mm hh day month year day-of-week 
     daylight-saving-time-p timezone-offset 
     timezone-abbreviation)
    (local-time:decode-timestamp (local-time:now)) 
   timezone-offset)

There might be a better way, what are your thoughts?

I have send you a pull request and used, I think all of your
suggestions. I favored keeping the local-time library because
for me it prints nicely the timestamps and it has a standard conversion
to the unix-timestamp format that is needed.

Now an additional question. With the type translations some of the
functions become quite simple. For example:

  (cffi:defcfun ("git_commit_author" %git-commit-author)
      %git-signature
    (commit :pointer))

  ...
  (defun git-commit-author (commit)
      "Given a commit return the commit author's signature."
     (%git-commit-author commit))

This can be simplified to:

  (cffi:defcfun ("git_commit_author" git-commit-author)
      %git-signature
      "Given a commit return the commit author's signature."
    (commit :pointer))

I think this would be nice because it uses the cffi mechanisms and
removes some code. But are you attached to the %git-commit-author
and later wrapping it?

If not I will simplify a few of these functions in a later commit.

I am not attached to it. I agree completely. Using CFFI translations
is better the only reason I didn't take advantage of the CFFI
translations was I started learning lisp when I started the
documentation was a bit thin so I couldn't get my head around what was
going on. Now you have done a few I think that I would have more
success translating some of my own.

If possible, I think that it would be perfectly valid to rewrite some of
the libgit2 stuff in lisp, if it is sensible to do so.

In addition, I see there is code in the doc folder.
I wouldn't mind adding some documntation, but what tool do you use to
create the documentation and where and which format for the
documentation do you use?
Any pointer is appreciated.

Ah, I'm glad you asked, it is all generated with a tool called sphinx,
it's a python tool that supports generating documentation for many
languages. I wrote a lisp domain (plugin), but it isn't in the best
state, I have been updating it but I hacked it to get it to work for the
initial documentation runs. I'll let you know when it's done.
Hopefully I'll have something for you after the weekend. I'll also
add documentation about how to build the docs and I'll try and make any
scripts OSX compliant, but that might take a few attempts.

I did try and look around for some documentation tools in lisp before is
wrote the sphinx plugin, but I couldn't find any that were heavily
used. It seemed that everyone was rolling their own.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment