External Match doesn't work for me #135
Comments
I have been digging into this off and on, in between coding and other work, since I first saw your post, and I'm afraid I can't say definitively that I know what's wrong. But I'll share my efforts so far, and hopefully we can get this fixed for you anyway!

First off, you didn't mention the specific version of NHC you're using. Everything in this post is based on the current dev branch here on GitHub (specifically, commit dc10825); I did not test prior releases.

In order to get a clear view of what was going on, I started out by running

### test.conf
* || declare -p NHC_MCHECK_DELIM NHC_MCHECK_COMMAND
* || set | fgrep NHC_MCHECK_
@gpu@ || echo "GPU node"
!@gpu@ || echo "Not a GPU node"

For expedience, I opted to put the external match settings on the command line, at least initially, using essentially the same settings you provided above (save the partition name, of course), and I ran

nhc -avl - -c test.conf HOSTNAME=some-gpu-node NHC_MCHECK_DELIM[0]=@ NHC_MCHECK_COMMAND[0]='"sinfo -hp %m --format=\"%n\" | fgrep -qw %h"'
nhc -avl - -c test.conf HOSTNAME=some-nongpu-node NHC_MCHECK_DELIM[0]=@ NHC_MCHECK_COMMAND[0]='"sinfo -hp %m --format=\"%n\" | fgrep -qw %h"'

The above settings/commands work perfectly, so I'd be curious to hear whether or not they work on your system.

You may have noticed the single- and the double-quoting of the

Unfortunately, when trying to move the external match settings from the command line to the global config in

So I was able to get it to work reliably using exactly these two lines:

NHC_MCHECK_DELIM=( [0]=@ )
NHC_MCHECK_COMMAND=( [0]='sinfo -hp %m --format="%n" | fgrep -qw %h' )

Can you try exactly those settings and see if they work for you?
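As a side note on the quoting in those two lines: the command is wrapped in single quotes so that the inner double quotes around %n are stored literally. The following is a minimal sketch, assuming plain bash and nothing NHC-specific, of what bash actually stores for a single-quoted element versus a double-quoted one with unescaped inner quotes. The array names `single` and `double` are made up for illustration, and nothing here runs sinfo or nhc. It only shows the stored strings; it does not by itself explain the failure reported in this issue.

```shell
#!/usr/bin/env bash
# Sketch: compare what bash stores for two quoting styles.
# The array names are hypothetical; the strings are only stored and printed.

# Single-quoted: the inner double quotes are kept literally.
single=( [0]='sinfo -hp %m --format="%n" | fgrep -qw %h' )

# Double-quoted with unescaped inner double quotes: the first quoted string
# ends right after "--format=", %n is appended unquoted, and a second quoted
# string follows. Bash joins the three pieces and drops the quote characters.
double=( [0]="sinfo -hp %m --format="%n" | fgrep -qw %h" )

printf 'single: %s\n' "${single[0]}"
printf 'double: %s\n' "${double[0]}"
```

Run under bash, this prints `single: sinfo -hp %m --format="%n" | fgrep -qw %h` and `double: sinfo -hp %m --format=%n | fgrep -qw %h`, so only the single-quoted form preserves the quotes around %n.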
Thanks a lot!
Awesome! I'm very glad to hear you got it working. :) I'll go ahead and close this, but let me know if you run into any other issues!
Hello,
I'm trying to group NHC checks per partition instead of per node.
I have some nodes in a non-standard partition and will need to run different checks on them. Unfortunately, they follow the same naming convention as the standard nodes, so wildcards cannot be used.
I learned about the external match option here: https://github.com/mej/nhc/blob/9c4a38c0c9f48f92005c9120ca88145c33841dac/scripts/common.nhc#LL296
I've tried to add the following to /etc/sysconfig/nhc:
NHC_MCHECK_DELIM=( [0]="@" )
NHC_MCHECK_COMMAND=(
[0]="sinfo -p %m --format="%n" | grep -v HOSTNAMES | fgrep -w %h"
)
and in "nhc.conf" I use @sra@ as "sra" is the partition name.
However, this doesn't seem to be working. Checking the logs, "mcheck_external()" never seems to be called. Instead, NHC seems to be trying to match it as a glob:
">[{L2/S0/D5/R1}@common.nhc:290:mcheck_glob()]> dbg 'Glob match check: hl-codon-113-01 does not match @sra@'"
Any hints on how to make this work?
Thanks
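For readers following along, the commands in this thread rely on two placeholders: judging from the usage here, %m appears to stand for the match string between the delimiters (the partition name, e.g. sra) and %h for the node's hostname. Below is a minimal sketch of that kind of substitution in plain bash. `expand_match_command` is a hypothetical helper written for illustration, not NHC's actual code, and the placeholder meanings are an assumption drawn from this thread.

```shell
#!/usr/bin/env bash
# Sketch only: expand_match_command is a hypothetical helper, not NHC's code.
# Assumption: %m stands for the match string and %h for the hostname.
expand_match_command() {
    local template=$1 match=$2 host=$3
    local cmd=${template//\%m/$match}   # replace every literal %m
    cmd=${cmd//\%h/$host}               # replace every literal %h
    printf '%s\n' "$cmd"
}

# Template taken from the thread; match string and hostname from the report.
template='sinfo -hp %m --format="%n" | fgrep -qw %h'
expand_match_command "$template" sra hl-codon-113-01
```

With these inputs the sketch prints `sinfo -hp sra --format="%n" | fgrep -qw hl-codon-113-01`, which is the command one would expect to succeed only when the host is listed in the sra partition.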