This repository has been archived by the owner on May 14, 2024. It is now read-only.

Keras not multi-processing safe ? #8

Open
PaulMuadDib opened this issue Sep 11, 2018 · 2 comments

Comments

@PaulMuadDib

PaulMuadDib commented Sep 11, 2018

Hi there,
I have successfully run the sleep scorer training part (AutoSleepScorerDev) and generated new cnn & rnn weights:

  • they are slightly larger than the weights offered for direct download (386,662 KB for the CNN weights I generated vs 386,652 KB for the provided cnn.hdf5; 18,368 KB for the RNN weights I generated vs 14,212 KB for the provided rnn.hdf5).
  • for AutoSleepScorer only (not the training part), Keras prediction triggers multiple runtime errors, seemingly due to multiprocessing (I tried to disable it in the code in SleepLoader.py, tools.py, etc.). The errors are not fatal: it runs to the end and outputs a predicted hypnogram, though not alongside the ground-truth one. The error looks like this:

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

I use Python 3.6.6, tensorflow 1.10.0, tensorflow-gpu 1.10.0, Keras 2.2.2. Have you experienced such runtime errors?
Take care,
Jean-François Baure
PS) Thanks so much skjerns, your code is very interesting.

@skjerns
Owner

skjerns commented Sep 14, 2018

I'm impressed you were able to get the training going!

Different weight sizes:
This can have various reasons. The Keras and TensorFlow versions you are using are much newer than the ones I used. Some format specifications might have changed, especially with all the API shuffling in TensorFlow. Furthermore, it is quite possible that I experimented with different settings after creating the weights, and those changes may have ended up in the repository. The best way to check for errors is to print the summary of the model and compare it with the specifications given in the thesis of the project. If they match, then the size difference comes from a specification that changed.
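A minimal sketch of that comparison, under stated assumptions: with Keras, `model.summary()` prints the layer table directly, and the helper below collects per-layer parameter counts so two models can be diffed. The helper names `layer_param_counts` and `diff_param_counts` are hypothetical, not part of this repository. They only rely on each layer exposing `.name` and `.count_params()`, which Keras layers do. Auto-generated layer names can differ between runs, so a name mismatch alone does not prove the architectures differ.

```python
def layer_param_counts(model):
    """Map each layer name to its number of parameters.

    Works with any object exposing `.layers`, where each layer has
    `.name` and `.count_params()` (Keras models satisfy this).
    """
    return {layer.name: layer.count_params() for layer in model.layers}


def diff_param_counts(mine, reference):
    """Return {layer_name: (mine, reference)} for layers whose counts differ.

    A layer missing on one side is reported with None on that side.
    """
    names = sorted(set(mine) | set(reference))
    return {
        name: (mine.get(name), reference.get(name))
        for name in names
        if mine.get(name) != reference.get(name)
    }
```

With two models built and loaded (how the weights are loaded depends on whether the .hdf5 files hold a full model or weights only, which is not confirmed here), `diff_param_counts(layer_param_counts(my_model), layer_param_counts(reference_model))` returns an empty dict when the per-layer parameter counts agree.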

multiprocessing
Which OS are you using? Multiprocessing is notorious for behaving differently across operating systems. However, as far as I remember, I am not using any multiprocessing myself (mainly due to the incompatibility of multiprocessing and tensorflow-gpu). I assume that this is a Keras issue, or a more general problem.
Have you enclosed all statements in an if __name__ == '__main__': clause? This is most often the problem when having issues with multiprocessing, especially on Windows. See here for more: https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing
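A minimal sketch of the guard, independent of this repository (the `square` worker and `main` function are illustrative names only). On Windows, child processes are started by re-importing the main module, so any process-spawning code that runs at import time gets re-executed in every child, which produces the RuntimeError quoted above. Keeping that code behind the guard avoids this.

```python
import multiprocessing as mp


def square(x):
    """Toy worker; defined at module level so child processes can import it."""
    return x * x


def main():
    # Everything that starts processes lives here, not at import time.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))


if __name__ == '__main__':
    # Only has an effect in frozen Windows executables; harmless otherwise.
    mp.freeze_support()
    main()
```

For a script that calls into Keras, the same pattern would mean moving the top-level prediction calls into a function such as `main()` and invoking it only inside the guard.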

@PaulMuadDib
Author

PaulMuadDib commented Sep 14, 2018

I discovered the power of a GPU thanks to you, Simon: after a first run of nearly 2 days on my i7 CPU, it's 15 times faster! (16 GB of RAM is barely sufficient.) You are right, I am on Windows 10. This weekend (I do this in my free time) I will add an if __name__ == '__main__': guard in the main module to see if it prevents this uncontrolled recursive creation of subprocesses. Thanks for your insights!
