Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed CreatorData destructor #666

Merged
merged 3 commits into from
Feb 16, 2022
Merged

Conversation

veloman-yunkan
Copy link
Collaborator

Fixes openzim/zim-tools#289

Clusters owned by CreatorData are used in the worker threads. If the CreatorData destructor is executed as a result of stack unwinding caused by an exception raised before CreatorData::quitAllThreads() is called from Creator::finishZimCreation(), then the worker threads are still running and should be stopped by the CreatorData::quitAllThreads() call from CreatorData::~CreatorData(). However, that must happen before the clusters are destroyed.

@codecov
Copy link

codecov bot commented Feb 13, 2022

Codecov Report

Merging #666 (34fc632) into master (a12d14a) will increase coverage by 0.13%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #666      +/-   ##
==========================================
+ Coverage   84.78%   84.92%   +0.13%     
==========================================
  Files          98       98              
  Lines        4225     4231       +6     
  Branches     1890     1892       +2     
==========================================
+ Hits         3582     3593      +11     
+ Misses        642      637       -5     
  Partials        1        1              
Impacted Files Coverage Δ
src/writer/creator.cpp 86.66% <100.00%> (+1.55%) ⬆️
src/writer/xapianHandler.cpp 98.21% <0.00%> (+1.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a12d14a...34fc632. Read the comment docs.

@kelson42
Copy link
Contributor

kelson42 commented Feb 13, 2022

@veloman-yunkan Do we have an easy way to test this scenario? Maybe at zimwriterfs level?

I have also tried this patch with zimwriterfs and now it seems to leave the process too early. I don't get my ZIM file, the .zim.tmp file is left and zimwritefs exits with 1. I have sent to you the tarball of HTML data per private chat).

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan Do we have an easy way to test this scenario? Maybe at zimwriterfs level?

@kelson42 I have added a new unit-test targeting that scenario here in libzim

I have also tried this patch with zimwriterfs and now it seems to leave the process too early. I don't get my ZIM file, the .zim.tmp file is left and zimwritefs exits with 1. I have sent to you the tarball of HTML data per private chat).

What is the desired behaviour? Are you concerned that the .zim.tmp file was not removed, or you'd like zimwriterfs to report but otherwise ignore any errors and keep creating the ZIM file?

@kelson42
Copy link
Contributor

I have also tried this patch with zimwriterfs and now it seems to leave the process too early. I don't get my ZIM file, the .zim.tmp file is left and zimwritefs exits with 1. I have sent to you the tarball of HTML data per private chat).

What is the desired behaviour? Are you concerned that the .zim.tmp file was not removed, or you'd like zimwriterfs to report but otherwise ignore any errors and keep creating the ZIM file?

I'm a bit concerned to open the Pandora box, but let's go:

$ zimwriterfs --welcome index.html --illustration=icon.png --language=en --title=TheMathPage --description="TheMathPage by Lawrence Spector" --creator="Lawrence
Spector" --publisher="Kiwix" --verbose /home/kelson/Downloads/themathpage.com/ themathpage.zim 
Visiting directory /home/kelson/Downloads/themathpage.com
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig/trig_img
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig/TRIG_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig/Trig_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig/Trig_img
Visiting directory /home/kelson/Downloads/themathpage.com/aTrig/trig_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/geo_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/geoProblems
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/geo_img
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/GeoProblems
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/Geo_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/Geo_img
Visiting directory /home/kelson/Downloads/themathpage.com/aBookI/Geo_Img
Visiting directory /home/kelson/Downloads/themathpage.com/aCalc
Visiting directory /home/kelson/Downloads/themathpage.com/aCalc/calc_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/aCalc/calc_img
Visiting directory /home/kelson/Downloads/themathpage.com/aCalc/Calc_IMG
Visiting directory /home/kelson/Downloads/themathpage.com/Arith
Visiting directory /home/kelson/Downloads/themathpage.com/Arith/Ar_imgD
T:0; A:1000; RA:0; CA:119; UA:881; C:0; CC:0; UC:0; WC:0
Visiting directory /home/kelson/Downloads/themathpage.com/Arith/Ar_Pr
Visiting directory /home/kelson/Downloads/themathpage.com/Arith/Ar_img
zimwriterfs: 
  • I don't have a definitive opinion if the process should stop or not because it is not clear what cause it to stop here... but if there is one small incongruity in a HTML file, I don't think it should stop (but it should be reported).

@veloman-yunkan
Copy link
Collaborator Author

In this particular case, the absence of information about the error is caused by an empty message string in zim::FS::openFile():

libzim/src/fs_unix.cpp

Lines 80 to 87 in c0486c3

FD FS::openFile(path_t filepath)
{
int fd = open(filepath.c_str(), O_RDONLY);
if (fd == -1) {
throw std::runtime_error("");
}
return FD(fd);
}

@kelson42
Copy link
Contributor

@veloman-yunkan OK... the dev was not well inspired here ;) We should at least fix this IMO. Something like "Unable to open , file does not exist or is not readable."

@kelson42
Copy link
Contributor

@veloman-yunkan Now at least, with #667, on the top, the process does not crash and we have a meaningfull error!

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan Now at least, with #667, on the top, the process does not crash and we have a meaningfull error!

@kelson42 Though the error was still a little strange. #668 made it more elaborate, highlighting that the problem is with too many open files.

@veloman-yunkan veloman-yunkan changed the title Made CreatorData destructor exception-safe Fixed CreatorData destructor Feb 13, 2022
@kelson42
Copy link
Contributor

@veloman-yunkan I have merged #668 and with increasing the max limit of files open with ulimit, I get the proper ZIM. Thank you very much for helping solving what looks like to be a multi-level problem.

So now I see the following questions/bugs left:

  • Why it seems that recent zimwriterfs is totally broken and we have not been able to detect it with the CI?
  • Why zimwriterfs needs now so many filedescriptors that the user has to tweak its setup?

Maybe you have already answers or we should open tickets? Let me know your opinion please.

The new unit test tries to provide some evidence that interrupted
creation of a ZIM file (when `zim::Creator` is destroyed without
`zim::Creator::finishZimCreation()` having been called) doesn't end up
similar to openzim/zim-tools#289.

So far this is not the case, that's why the new unit-test fails with a
crash.
Clusters owned by `CreatorData` are used in the worker threads. If the
`CreatorData` destructor is executed as a result of stack unwinding
caused by an exception raised before `CreatorData::quitAllThreads()` is
called from `Creator::finishZimCreation()`, then the worker threads are
still running and should be stopped by the `CreatorData::quitAllThreads()`
call from `CreatorData::~CreatorData()`. However, that must happen
*before* the clusters are destroyed.
@veloman-yunkan
Copy link
Collaborator Author

That PR (#603) only addressed a successful ZIM creation flow. I have now added to this PR a commit that removes the ".zim.tmp" temporary file in case of a failed flow.

@veloman-yunkan
Copy link
Collaborator Author

So now I see the following questions/bugs left:

  • Why it seems that recent zimwriterfs is totally broken and we have not been able to detect it with the CI?

The test-suite of zimwriterfs is quite small. There are only 3 basic tests and you can judge from their names about the functional coverage of the test-sutie:

TEST(ZimCreatorFSTest, MinimalZim)
TEST(ZimCreatorFSTest, SymlinkShouldCreateRedirectEntry)
TEST(ZimCreatorFSTest, ThrowsErrorIfDirectoryNotExist)
  • Why zimwriterfs needs now so many filedescriptors that the user has to tweak its setup?

Maybe you have already answers or we should open tickets? Let me know your opinion please.

I don't have a ready answer to this question. I don't think that it will be difficult to find out but let's better do it under a different ticket.

@mgautierfr
Copy link
Collaborator

The failing CI on Macos is not related to this change.
We have to understand why we use so many fd, but I agree with @veloman-yunkan, it should be made in another issue/pr.

I'm merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault with zimwriterfs
3 participants