Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Topographica instances fail to initialize in parallel #595

Open
jlstevens opened this issue Oct 12, 2014 · 22 comments
Open

Topographica instances fail to initialize in parallel #595

jlstevens opened this issue Oct 12, 2014 · 22 comments

Comments

@jlstevens
Copy link
Member

Using Lancet, I am launching four concurrent topographica instances at a time. Sporadically this fails when the optimized components are being compiled - I get some sort of complaint regarding a truncated .o file in weave.

I do not have the exact error message to hand as my last attempt was successful, with all four instances running fine - as I mentioned, this problem is sporadic.

@ceball
Copy link
Member

ceball commented Oct 12, 2014

We've had problems in the past with weave not expecting multiple simultaneous compilations, and although our particular problem was addressed there appear to be at least two subsequent bug reports on scipy related to simultaneous compilations:
scipy/scipy#1895
scipy/scipy#2902

@philippjfr
Copy link
Member

What's the status for Cython on DICE? Maybe we can finally switch over to that implementation.

@jlstevens
Copy link
Member Author

My temporary hack is to force a delay between each call to subprocess.Popen in Lancet. It is ugly and should be unnecessary but maybe this will no longer be an issue once we switch over to Cython...

@ceball
Copy link
Member

ceball commented Oct 12, 2014

What's the status for Cython on DICE? Maybe we can finally switch over to that implementation.

There's a trial environment:

[14-10-12-20:39][hebb]: dice-pyenv CNV
    You have now entered the CNV python environment.

    Most python-based commands will work differently while you can
     see "(CNV)" in your prompt.  You can leave the environment
     by closing this shell (for example by typing "exit").

(CNV)[14-10-12-20:39][hebb]: python
Python 2.7.8 (default, Sep 24 2014, 13:04:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cython
>>> cython.__version__
'0.21'

I'm happy to test and/or work on your cython stuff, if that's possible?

@philippjfr
Copy link
Member

That's great news. The cython implementation is a complete drop in replacement for the old weave based code. It's currently in topo.optimized. So all you should have to do is import your response and learning functions from there. Jim did want me to rename it from topo.optimized to topo.cythonopt, so if we're actually going to make the switch I'll rename it tomorrow.

@jlstevens
Copy link
Member Author

Personally, I am against renaming optimized. In my opinion, we should simply move over to Cython and discourage/replace all the weave stuff.

@philippjfr
Copy link
Member

Just tried working with the Cython implementation and found out the functions don't seem to pickle correctly. Will have to look into it and get back to you.

@jbednar
Copy link
Member

jbednar commented Oct 13, 2014

I too have noticed problems when launching simultaneous processes with weave (most recently when benchmarking trout.inf). I think that my workaround was to launch a single process running the same code, letting that run until after weave was done generating its code files, and only then launching multiple simultaneous processes. Obviously that's clunky and error-prone, though.

It would be good to use Cython only. However, such a directory can't be called topo.optimized, because the design currently requires that only Cython-related code can ever go into that directory, which is not only misleading right now (while there is weave-optimized code elsewhere in Topographica), but is also misleading for the future (as Cython is very unlikely to be the last word on optimization in Python). If topo.optimized were to be the home for all such future optimizations, that name would be fine, but instead simply importing topo.optimized pulls in a dependency on Cython, which is not appropriate for a generic container of all types of optimization. So topo.optimized must be named something that indicates clearly that that it is not a container for all optimizations, but simply those that use Cython, regardless of the status of weave. Just like topo.tkgui is not all possible GUI code, but only the Tk version (with a topo.qtgui a possibility someday), topo.cythonopt (or whatever less ugly name someone can suggest) is only the Cython-optimized code, not PyPy code, numexpr code, or CUDA code. Topographica users care a lot about optimization, and Cython is only the best solution we have right now, not the best we'll ever have!

@jlstevens
Copy link
Member Author

Ok. Why not call it topo.pyx if the folder is mostly about interfacing with Cython's .pyx files? Anyone who knows anything at all about Cython will know that this directory contains cython optimizations.

@jbednar
Copy link
Member

jbednar commented Oct 14, 2014

I don't like naming directories using periods, particularly not when there is a recognizable file extension after the period (very confusing!). We don't even have any with underscores, so topo_pyx would look strange. Yet topopyx is hard to parse. What about just pyx? It's already in topo/, so topo is redundant anyway.

@jlstevens
Copy link
Member Author

Just pyx is what I meant - hence my suggestion topo.pyx (the pyx directory would live in topo)

@philippjfr
Copy link
Member

Alright, I'll rename it to topo.pyx and try to fix the snapshotting issues tomorrow and then hopefully we can switch all the models over to cython.

@jbednar
Copy link
Member

jbednar commented Oct 14, 2014

Ah, topo.pyx is fine then.

@jlstevens
Copy link
Member Author

To get back to the original reason for this issue - do we think that Cython will work better than weave in this instance?

I assume is the .so file exists already, nothing will be compiled. However, if no .so file exists and multiple instances are launched at once, won't there be a bunch of cython process trying to compile the same file at the same time?

@ceball
Copy link
Member

ceball commented Nov 2, 2014

Yes, your original question has been hijacked :) I don't know exactly what the problem is you're seeing while using weave (though it seems likely to be multiple simultaneous compilations), and I agree that if multiple cython instances simultaneously try to compile the same file there could be a problem...but to me, it doesn't seem worthwhile trying to address the problem specifically for weave. I guess Philipp will have a better idea about cython, but maybe we should just wait and see at what point (in the cython processing chain) the problem occurs.

I think I tried a variety of things in the past, like set my HOME environment variable (or whatever environment variable(s) controlled the weave compilation and intermediate folders - I can't remember) to point to a folder on the local disk (if one topographica instance per machine), or to be different for each instance (if multiple instances per machine). Or like Jim said, I'd do something to make sure the compilation had already happened before launching my desired simulations (you just have to remember what things might trigger a re-compilation, e.g. you might change a python data type and get a recompilation).

@dabliss
Copy link

dabliss commented May 7, 2015

any updates on this? has using cython instead of weave been helpful? have any of @ceball's recommendations for getting around the weave issue proven useful?

@jbednar
Copy link
Member

jbednar commented May 10, 2015

ceball's suggestions do usually work for this. I don't think weave has fixed this problem, and as no one is working on weave that I know of, I doubt it will be fixed. Topographica does have cython versions of its weave components, but they are significantly slower, unfortunately.

@philippjfr
Copy link
Member

Topographica does have cython versions of its weave components, but they are significantly slower, unfortunately.

Is this true? When I implemented and profiled them, they came out slightly faster if anything. Are you talking about the sparse implementation, because we also have a Cython wrapper around the weave implementation, which is what I'm talking about. We just haven't gotten around to setting up the setup.py correctly for that implementation to work properly.

@jlstevens
Copy link
Member Author

We really should switch to cython as default, leaving weave as a fallback or for the occasional optional thing that still need it (e.g some of the color related stuff). Eventually, we need to get rid of weave anyway as it is essentially a dead project now.

@jbednar
Copy link
Member

jbednar commented May 10, 2015

I'm talking about the sparse components, which is what's usable right now and has been fully benchmarked by Ignotas. It would be great to make the dense Cython components ready for production use, and to similarly document their performance. And if it's really as fast, then getting rid of weave would be great!

@jlstevens
Copy link
Member Author

I think we should have Cython working in setup.py correctly to make it easy to use first. That way, it will be easy to use and also make it easier to convince you that it is production ready. :-)

@dabliss
Copy link

dabliss commented May 10, 2015

thanks for the reply, @jbednar. i encountered the same issue you all did when i was using weave in brian2. switching to cython solved the problem for me, as it does not have this race condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants