Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leaks in Zipper & Unzipper #16

Closed
oranja opened this issue Jun 5, 2017 · 9 comments
Closed

Memory leaks in Zipper & Unzipper #16

oranja opened this issue Jun 5, 2017 · 9 comments

Comments

@oranja
Copy link

oranja commented Jun 5, 2017

Hi,

First of all, thanks for this library.
I need some pretty straightforward zip file manipulation (take an existing zip, add a file from memory stream, close), and this helps me do it with 3 simple and straightforward code lines that read exactly that story (ctor/open, add, close).

While testing for any memory leaks in my program (with valgrind), I found out that I have some small memory leaks any time I do this process, and I pinned it down to Zipper.
I fixed the code enough that my Zipper use case does not leak anymore, and the test doesn't show any other leaks in Zipper.
However the valgrind test shows leaks in Unzipper too. I made some changes there, but not enough to fix all leaks apparently.

If you want to take a look at the changes I made, take a look at oranja/zipper, but note that (a) it's not a complete solution, and (b) I also made changes to use the latest minizip code from nmoinvaz. This is why I didn't create a PR yet.

@sebastiandev
Copy link
Owner

sebastiandev commented Jun 5, 2017

@oranja thanks for fixing this, i'm glad that you've found zipper helpful.

I do have some objections here. The initialization fixes look good, but i personally don't like having pointers around, and that was the main reason for methods to extract references instead of receiving pointers. So, I would prefer to keep it that way.

Regarding the other changes on the code, some related to types and date handling, Are you sure those wont break in other platforms? this code was changed in order to run on multiple environments, linux, iOS, windows, embedded systems, etc. I would only add the changes related to the memory leak fix, unless we are 100% sure the others doesn't break compilation, since we don't have zipper configured on travis yet (I would love if someone with experience configuring travis for c++ could help here)

@fbergmann what do you think about this?

Anyway, if you agree, please go ahead and make a PR with the memory leak only fixes.

Thanks again!

@oranja
Copy link
Author

oranja commented Jun 5, 2017

I personally don't like working with raw pointers either,
but I don't like forcing the compiler's hand into accepting code that shouldn't pass - even more.

The way I see it, by taking a reference to an external object we're taking a risk, and the least sensible thing to do would be to hide this risk. References always seem safer, but here I don't think it's the case.
Maybe it would work if the library requested a shared_ptr or a weak_ptr, or explicitly took ownership of the stream with a unique_ptr, but these are out of the question since #8. Also, it would force the client to allocate his/her buffer or stream on the heap (might be a good idea, but shouldn't be forced).

Alternatively, and maybe even more ideally, streams and buffers can be asked by the library only when they are really needed, so the API would look more like:

Zipper::Zipper(istream &input_stream) {
    // init in-memory zip from stream - exactly how it works now
    // but don't keep the stream after that.
}

Zipper::add(istream& input_stream) {
    // compress and add to in-memory zip structure
}

Zipper::write(ostream &output_stream) {
   // write the entire in-memory zip structure to the given output stream
}

Zipper::close() {
  // clear the in-memory data. hopefully the data was written somewhere earlier.
}

File operations can be easy with a few convience methods that create streams from files and even complete directories.

Anyway, this is a complete overhaul of the API, and naturally it's a tougher decision and change to make.
The leaks can be fixed while keeping the current code, and I will submit a PR if I find the time.

As for the timestamp handling:
It's a flaw of the format that it stores timestamps as MSDOS timestamps, meaning no timezone information and maximal resolution of 2 seconds (it's either 16:40:22 or 16:40:24, never 16:40:23). With extensions, however, (which are either supported or ignored, so shouldn't hurt any specific implementation), it is possible to store proper timestamps with higher resolution, which are future-proof and are either UTC or specify their offset from UTC.
Hence I believe that some proper timestamp struct should be used as widely as possible, and only converted to MSDOS/FAT timestamps when writing or reading such fields in the zip structure.
I couldn't fight the urge to remove the redundency and complication of timestamps as they are now in the code, but I didn't test this nearly enough, and I don't suggest that you should take my changes blindly.
I want to write some tests so I can make sure that proposed code handles timestamps as least as good as the old code, and with compliance to other standard zip handlers - like WinZip or what have you.
But again, this takes time, and I'm stretching my deadlines already.

@Lecrapouille
Copy link
Collaborator

@oranja, @sebastiandev

I'm following this thread because I also confirm memory leaks and found interesting the talk about "references vs pointers as members":

std::iostream *pios = new ...
std::shared_ptr<std::iostream> sios(pios);
sios->xxx();
  • You did not talk the case with a thread a user can remove an element of the std::vector &buffer while the zipper lib was iterating on it. Whatever using references, pointers or smart pointers will produce a crash. You have to use mutex (but not cool).

  • As explained, the main problem changing references for smart pointers, is that modifying public methods in a library (or API ...) is forbidden without changing its version (implying tagging your git repo and creating a new branch). Else users will have their software not compiling when updating their lib.

  • I like oranja's API suggestion: its looks simple (and thread safe ?). I would suggest to add an empty constructor. Zipper::Zipper() and a method Zipper::open(std::string const& zipname, ...) This will avoid calling a new in the potential case you do not know yet the zip filename to open.

  • I do not know well how to this lib, but cannot we just copy the buffer passed inside the constructor (simply) ?

  • An overkill solution will be to use observer patterns, and get std::iostram or std::vector be changed to an Obversable class. When an Observable is destroyed will prevent the Observer to stop using the reference. But here again that will change api.

PS:

(I would love if someone with experience configuring travis for c++ could help here)

I may help if you still want CI. zipper works in Travis for ubuntu and OS X. Concerning appveyor, I did not make progress.

@sebastiandev
Copy link
Owner

@oranja @Lecrapouille Guys, first of all, thanks for taking the time to contribute and improve the library.
I've departed from c++ daily coding so I'm a bit outdated. I've seen some questions about the library, and some other requests like the ones you are making and got me thinking. @fbergmann made awesome contributions to make zipper run in many platforms, but that comes with some limitations.
So i'm thinking that maybe we can for the library, have a c++17 version that its pure c++17 and the current one which is a more embeddable and portable version. (sort of like python2 and python3 libs)
The new fork may look the original version, making use of fancy new c++ features, make the code base a lot smaller, and here since we'll start from scratch, we can actually think about making these interface changes you are proposing.
Im in favour of simplicity and explicitness, so whatever is simpler and we let the user know about it so he can take the necessary actions. If examples for common scenarios are added, then I think that would be enough.

What do you think? If I see two thumbs up, ill create a new issue for the new fork where we can discuss the implementation details/interface changes

Thanks!

@Lecrapouille
Copy link
Collaborator

@sebastiandev

@fbergmann made awesome contributions to make zipper run in many platforms, ... So i'm thinking that maybe we can for the library, have a c++17 version that its pure c++17 ...

I'm not expert in c++17 but not all compilers are c++17 ready, so you'll limit yourself to some compiler / architecture. Look: http://en.cppreference.com/w/cpp/compiler_support

the new fork may look the original version, making use of fancy new c++ features, ... ill create a new issue for the new fork where we can discuss the implementation details/interface changes

Yes !! In addition we are drifting to another subject than memory leaks.

@oranja
Copy link
Author

oranja commented Sep 24, 2017

My main argument against having a fork at this point is that I don't think we have enough coding force here to maintain two forks. Besides, IMHO, the current code base is not far enough from good shape that it justifies starting from scratch over fixing a few issues.

Now, it's a real mystery how C++ leads the industry all these years without even the most basic path manipulation utility, but I think first priority should be to get the current non-C++17 version to work well.
It should be doable. There are a few hotspots where we need to normalize the paths, and the rest is pretty much taken care of. We can take guidance and even whole code segments from Boost::Filesystem's sources (with the appropriate attribution and license) to make sure we get it right.

I would consider C++11, though. It is already supported by the major compilers enough for our needs - mainly smart pointers.

Perhaps a separate issue dedicated to this discussion would be best, as suggested.

p.s.
Sorry for never submitting the corrected PR as suggested.
I am both working hard to finish my current project at work and looking for a new work place (brushing up on CS topics, problem solving and interviews). Hoping to get back to it soon.

@sebastiandev
Copy link
Owner

Although i haven't seen the thumbs up 😃 , I've created #24 to continue the discussion

@Lecrapouille
Copy link
Collaborator

Lecrapouille commented Sep 25, 2017

I can take a moment for solving memory links based on Oranja's work. I made a quick look at both code and now I understand better why oranja wanted use pointers. In current zipper code, in constructors there is a mixture between references to external objects and references to allocated internal members, making impossible to call delete (without additional informations). Look:

Zipper::Zipper(const std::string& zipname) : m_obuffer(*(new std::stringstream())) ...

versus:

Zipper::Zipper(std::iostream& buffer) : m_obuffer(buffer) ...

With the 1st constructor: we have to call delete in the destructor to release memory, but this will crash with the second constructor. For the moment, I did look in deep the code so I don't know if we can make a simple copy of std::vector buffer or std::iostream& buffer inside class members.

PS: std::iostream cannot be copied or moved :(

@Lecrapouille
Copy link
Collaborator

I may found a fix. I will create a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants