Jobs indicated as running never actually start. #42
Comments
There is substantial overhead in starting jobs up on SGE (about a minute), so even when it says "running", that may not actually be true. GridMap is intended to be used for tasks that will take at least a few minutes to run, because otherwise the overhead is not in any way worth it. The example is kind of a bad one, because the calculations are so fast that all you'll notice is the overhead. If you let it run for like 5 minutes and it still doesn't finish, then there's probably a real issue.

As for JobMonitor, if you want more info you can either set the logging level to DEBUG (which will give you a ton of information), or run

If you want to know more about how things work, check out this detailed rundown on the wiki.

I'm well aware that the documentation for GridMap could use some work (see #39), but I actually no longer actively use gridmap because I've changed jobs and now work at a company that doesn't use SGE (or any DRMAA-compatible grid). If you want to help out with documentation or by tackling any of the open issues, please make a PR. Thanks for offering!
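For anyone who wants to try the DEBUG route mentioned above, here is a minimal sketch using only Python's standard logging module. It assumes GridMap's modules log through the standard logging hierarchy, which is not confirmed in this thread, and the helper name enable_debug_logging is made up for illustration.

```python
import logging


def enable_debug_logging():
    """Turn on DEBUG output for every logger that inherits from the root logger."""
    # basicConfig only installs a handler if the root logger has none yet.
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    root = logging.getLogger()
    # Set the level explicitly, so this also works when a handler already exists.
    root.setLevel(logging.DEBUG)
    return root


if __name__ == "__main__":
    enable_debug_logging()
    logging.getLogger("example").debug("debug logging is on")
```

Call this before submitting any jobs, so that messages emitted during submission are not filtered out.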
Thanks for your reply.
If I can get this to work on our clusters I'll gladly contribute to documentation as I go along and figure things out. If this works for what I'm trying to
The |
I am hitting the exact same issue. Was this ever fixed? Thanks
Found the issue for my case, leaving some traces in case anyone else comes here: I am using an SGE grid. Checking the job status after the jobs finished showed:
It turned out that the default temp_dir (defined as /scratch/ in gridmap.conf) exists but is inaccessible in my case. This error is not caught by _append_job_to_session in job.py.
I am not sure what the intended way of overriding gridmap.conf default values is.
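A quick way to rule out the temp directory problem described above is to check the directory by hand before submitting anything. The sketch below is standard-library Python only and does not touch GridMap. The helper name check_temp_dir is hypothetical, and /scratch/ is simply the default path quoted in the comment above.

```python
import os
import tempfile


def check_temp_dir(path):
    """Return None if `path` is usable as a scratch directory, else a reason string."""
    if not os.path.isdir(path):
        return "%s does not exist or is not a directory" % path
    if not os.access(path, os.R_OK | os.W_OK | os.X_OK):
        return "%s exists but is not readable/writable by this user" % path
    try:
        # Permission bits can be misleading on network filesystems,
        # so actually try to create (and automatically delete) a file.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return "could not create a file in %s: %s" % (path, exc)
    return None


if __name__ == "__main__":
    problem = check_temp_dir("/scratch/")
    print(problem or "temp dir looks usable")
```

Since the jobs themselves run on the cluster, it may be worth running the same check on a compute node as well as on the submit host.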
Running into the same issue as people above me.
The code never reaches the
Hello!
I really like your project but I'm having trouble running your example code in examples\manual.py. When I run it I get the promising output:
The output of qstat also looks fine:
As you can see, the jobs are indicated as (r)unning.
The problem, however, is that the jobs never actually seem to finish. That is odd, since the calculation takes about 10 seconds when run locally, as expected given that the function sleep_walk(10) is being called. I then modified your example to skip the sleep function and write out a file called test.txt, but nothing ever happens.

Which brings me to my second question: how do I use the JobMonitor feature? I didn't gather much information from your documentation, I'm afraid.
Any help is much appreciated. Also if there is any way I can contribute please let me know.
Kai