Current 'dev' simstudy using coder + cluster = broken? #66

Open
asoplata opened this issue Oct 12, 2015 · 5 comments

@asoplata
Collaborator

So, in trying to map out the control flow of the program, I was looking at this scenario: using the coder in a batch job on the cluster. However, it kept completely crashing (as in, MATLAB itself was crashing), and when I debugged it, it gave me a

"cp cannot stat /$PATHTODNSIM/dnsim/$PATHTODNSIM/dnsim/odefun/default_8/odefun_20151009_<whatever>_mex.mexa64: no such file"

error. I did some digging, and now I'm really confused, because I'm not sure how the code currently in simstudy can properly run in this situation (more in a sec). I also think it's possible that the system/stdout error it was getting was enough (because it was a system error) to tell MATLAB to just halt entirely, but idk much about the interface between MATLAB and the OS.
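On the question of how MATLAB sees a failed shell command, here is a minimal sketch, assuming a Unix shell with `cp` available. The paths and the `demo:copyFailed` warning identifier are hypothetical and not taken from simstudy.m. It only shows that `system` and `copyfile` hand back a status that can be checked; it does not explain the crash described above.

```matlab
% Hypothetical source path that does not exist, to force the copy to fail.
src = '/nonexistent_dir_for_demo/odefun_example_mex.mexa64';
dst = tempdir;

% system returns the shell exit status rather than raising a MATLAB error.
[status, cmdout] = system(sprintf('cp "%s" "%s"', src, dst));
if status ~= 0
    warning('demo:copyFailed', 'cp failed (status %d): %s', status, strtrim(cmdout));
end

% copyfile reports failure through its outputs when they are requested.
[ok, msg] = copyfile(src, dst);
if ~ok
    warning('demo:copyFailed', 'copyfile failed: %s', msg);
end
```

Checking the status this way would at least surface the bad path as a warning in the batch log instead of leaving only the shell message.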

@jsherfey @kupiqu Do either of you run simulations, successfully, using the current dev version of the code, with the coder activated, on the cluster?

(All of this applies to both the mex-file and the m-file treatment in simstudy.m.) Currently, in the try block that starts on Line 63 of simstudy https://github.com/jsherfey/dnsim/blob/dev/matlab/functions/simstudy.m#L63 , "filemex" contains the full path up to and including odefun_subdir, plus the actual mexfile filename. On Line 230, the code tries to copy "fullfile(cwd,filemex)" somewhere (cwd is set to pwd on Line 180), but this concatenates the current working directory directly onto the entire filepath of the mexfile, creating the false mega-filepath in the error above. Maybe this was affected by the July changes (there were a lot) to "dnsimulator.m", but that would mean there were big changes to what "odefun_subdir" represents, which I think is unlikely (but possible).
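To illustrate the concatenation described above, here is a minimal sketch, assuming Unix-style paths. The values of `cwd` and `filemex` are made up for the example, and the guard is only one possible approach, not what simstudy.m does.

```matlab
% Hypothetical paths for illustration only; not taken from simstudy.m.
cwd     = '/home/user/dnsim';
filemex = '/home/user/dnsim/odefun/default_8/odefun_example_mex.mexa64';

% Joining an already-absolute path onto cwd repeats the directory prefix.
doubled = fullfile(cwd, filemex);

% One possible guard: only prepend cwd when filemex is relative.
% (Unix-style check; Windows drive letters would need separate handling.)
if strncmp(filemex, '/', 1)
    src = filemex;
else
    src = fullfile(cwd, filemex);
end

fprintf('unguarded: %s\nguarded:   %s\n', doubled, src);
```

If "filemex" is always absolute at that point, dropping the `fullfile(cwd, ...)` wrapper entirely would be the simpler change; the guard is only useful if both relative and absolute values can reach the copy step.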

Maybe this scenario can still work, given that your code and your rootdir are both in the right place, but I don't know what that combination is. It's certainly not mine, and I've tried using dnsim code directly from my personal account, with both a pwd rootdir and a /projectnb rootdir. But going through the filemex code, I just can't see how it would ever work. Is this the version of using the coder on the cluster that you use? Does it work for anyone? I think Salva doesn't use simstudy, but don't you use it, Jason? If not, it's possible that using the coder on the cluster is broken this way...

If not, can either of you test this scenario, using the dev version of the code?

@kupiqu
Collaborator

kupiqu commented Oct 12, 2015

This change was introduced by me following the logic that we talked about recently. It works for me both locally as well as in the cluster, but again I didn't try simstudy.

BTW, "whatever" refers to HHMMSS_MS_jidJOBID#

@kupiqu
Collaborator

kupiqu commented Oct 12, 2015

Austin, could you try to run a runsim simulation directly and report the result? That would help to know whether this is something entirely in the context of simstudy, or alternatively, whether dnsimulator itself is failing in some conditions...

@asoplata
Collaborator Author

I didn't test runsim but instead only simstudy, as I've never run runsim directly -- but I will try.

@kupiqu
Collaborator

kupiqu commented Oct 12, 2015

I know, but I think it may help to locate the root of the problem: whether it is the logic introduced in dnsimulator, or how it is used by simstudy, or both...

@asoplata
Collaborator Author

I actually don't know what I would need to change to get a runsim simulation, using coder, that dispatches the job to the cluster automatically -- if I just pass all the same arguments to runsim, will runsim alone properly configure the coder and properly send the job to a cluster batch?
