Skip to content

Conversation

@mgautierfr
Copy link
Contributor

No description provided.

The blob is mostly a pointer to a char array.
We must keep a reference on the `bytes` to be sure that the data is not
gc while we are using it.
(Assuming the python ZimBlob is not gc either)
If we already have a ZimBlob we don't need to get the data a second time.
Reuse the one we have.

More that a performance point, this is needed to avoid to loose the
reference to the existing blob (and so gc internal data).
The `*_cy_call_fct` will be called by the C++ code in a different thread.
To avoid race condition, we need to acquire the GIL before doing python
operation.

However, those function will also be called from the same thread
(in addArticle). To avoid deadlock we need to release the GIL when we
enter C++ function.
This ZimArticleWrapper that contains ZimArticle.

We create the wrapper "on the fly" when we need it.
The wrapper will increment the ref to ZimArticle to be sure that
ZimArticle is alive as long as the wrapper is.
Even if the ZimArticle is not referenced anymore in the python world.
If the ZimArticle is not referenced in the python world after the call
to add_article[1], the only reference to the zimArticle is the one the
wrapper has.

The wrapper is "stored" in a std::shared_ptr. When libzim ends using it,
it will decrement the reference in the std::shared_ptr, and if it comes
to 0, it will deleted the Wrapper, which in turn, will decrement the
ref to the article, and potentially makes the ZimArticle to be deleted.

However, everything is done in a different thread, so we must protect
the decrement by acquiring the GIL first.

[1] As this :
```
for i in range(N):
    article = ZimArticle(...)
    zimCreator.add_article(article)
```
None cannot be use to create a ZimBlob.
@mgautierfr
Copy link
Contributor Author

Also fix #12 and #13.

@pirate pirate requested a review from jdcaballerov April 23, 2020 01:30
Copy link
Contributor

@jdcaballerov jdcaballerov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great !

@jdcaballerov jdcaballerov self-requested a review April 23, 2020 04:08
@jdcaballerov
Copy link
Contributor

jdcaballerov commented Apr 23, 2020

@mgautierfr I think we also need to acquire the GIL at the helper (shared by all calls). I tested with this and works great with 100,000 arts compressed and indexed. I was still getting some SIGSEV before this.

e15448e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

finalize() segfaults if missing metadata ZimCreator must delete the c_creator when it is deallocated. ZimBlob must keep a reference on the content.

4 participants