slurm_job_terminate core dump #26

Closed
laparn opened this issue Jan 4, 2013 · 4 comments
@laparn

laparn commented Jan 4, 2013

Hello all,

I am trying slurm_terminate_job with Slurm 2.5. Here is what I am doing:

import pyslurm
jobs=pyslurm.job()
dod=jobs.get()
dod
{174: {u'comment': None, u'time_limit': 26L, u'cnode_cnt': None, u'alloc_node': u'D3550', u'features': [], u'eligible_time': 1357308185, u'contiguous': False, u'resv_id': None, u'ramdisk_image': None, u'block_id': None, u'sockets_per_node': 65534, u'req_switch': 0L, u'resv_name': None, u'licenses': {}, u'qos': None, u'submit_time': 1357308185, u'mloader_image': None, u'num_cpus': 1L, u'conn_type': (None, 'None'), u'show_flags': 0, u'user_id': 1001L, u'network': None, u'restart_cnt': 0, u'work_dir': u'/home/arnaud/src/slurmjob', u'pn_min_tmp_disk': 0L, u'max_nodes': 0L, u'job_state': (1, 'RUNNING'), u'assoc_id': 0L, u'exit_code': 0L, u'num_nodes': 1L, u'priority': 4294901721L, u'batch_script': None, u'boards_per_node': 0, u'ntasks_per_socket': 65535, u'batch_flag': 0, u'derived_ec': 0L, u'nodes': None, u'preempt_time': 0, u'pn_min_cpus': 1, u'nice': 10000, u'ntasks_per_node': 0, u'linux_image': None, u'altered': None, u'sockets_per_board': 0, u'alloc_sid': 10014L, u'start_time': 1357308185, u'pre_sus_time': 0, u'ionodes': None, u'state_reason': (0, 'None'), u'pn_min_memory': 0L, u'rotate': False, u'reboot': None, u'blrts_image': None, u'shared': 0, u'time_min': 10L, u'wait4switch': 0L, u'ntasks_per_core': 65535, u'wckey': None, u'account': None, u'requeue': True, u'name': u'wait-arg.sh', u'req_nodes': [], u'gres': [], u'suspend_time': 0, u'partition': 'debug', u'cores_per_socket': 65534, u'batch_host': u'D3550', u'dependency': None, u'max_cpus': 0L, u'state_desc': None, u'command': u'/home/arnaud/src/slurmjob/./wait-arg.sh', u'end_time': 1357309745, u'cpus_per_task': 1, u'resize_time': 0, u'group_id': 1001L, u'exc_nodes': [], u'threads_per_core': 65534}}
pyslurm.pyslurm.slurm_terminate_job(174)
Segmentation fault (core dumped)

This is strange, because notify, kill, suspend and resume all work perfectly in the same setup.
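
When reproducing a crash like this, it can help to run the failing call in a child interpreter so that the segmentation fault does not take down the interactive session. The following is only a minimal sketch, not something from the original report: the helper names `run_isolated` and `describe` are hypothetical, it assumes Python 3 on a POSIX system, and the job id 174 and the `pyslurm.pyslurm.slurm_terminate_job` call are simply copied from the session above.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


def describe(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode < 0:
        # On POSIX a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Assumption: job id 174 and the call are taken from the session above.
    snippet = "import pyslurm; pyslurm.pyslurm.slurm_terminate_job(174)"
    print(describe(run_isolated(snippet)))
```

If the child is killed by the crash, the parent keeps running and can report which signal was delivered. If pyslurm is not installed, the child just exits with a non-zero status.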

@gingergeeks
Member

Another one for us to investigate. What platform/cluster are you running this on?

Thanks for testing things out!

Mark

@gingergeeks
Member

I've been able to reproduce this, and I'm going through the Slurm source to trace the issue.

@gingergeeks
Member

I wrote a very small C program that calls the same function for a job step (slurm-2.5.0), and it also dumped core. Using gdb I get:

Program terminated with signal 11, Segmentation fault.
#0  0x00007fef6075631a in select_g_select_jobinfo_pack (jobinfo=, buffer=0x618578,
    protocol_version=6400) at node_select.c:1010
1010                    pack32(*(ops[plugin_id].plugin_id), buffer);

So it looks and smells like a Slurm issue. I will test slurm-2.5.1 as well and raise a ticket or chat with Danny and Moe.

@laparn
Author

laparn commented Feb 4, 2013

Nice. Thanks for having traced it.
