GTM healthcheck issue #29
Comments
Yeah, this is still an issue. The healthcheck was improved, but it still doesn't send a GTM-compatible payload. Everything should work fine, though. A patch to improve the healthcheck to not have this error would be gratefully received, if you can find a simple, one-line way to do it without changing lots of things (otherwise, I'd just leave it as-is, even though it's not ideal). |
OK, if I come up with anything I will let you know. It's a minor issue and doesn't affect anything running. Just thought I would let you know. |
Okay, thanks. Yes, it shouldn't affect anything. But I would love to resolve this properly. Perhaps we can see what initial bytes are sent over the wire to advertise a valid payload, and send them without any further data? i.e. by simulating whatever the Coordinators and Datanodes send in their hello-type message. |
OK I will have a look at this. |
I've tried nc -lkv "${PG_HOST}" 6666 and other host options. Not getting any output though. Do you have any ideas what I should try? I've tried Wireshark but that needs a GUI. I've looked at nmap, but from my understanding it uses netcat anyway. I really would like to get this fixed, but I'm not really a networking expert. Ideally I would like to post something on one of 2ndQuadrant's lists, but I haven't had responses in the past, so I'm not sure if I'm asking questions in the right place. |
Haven't had chance to look into this much, yet, but I think maybe it's
So, I suppose it's the
So, I guess it's possible to get the info by following that struct, or perhaps seeing if there's a test case somewhere that calls and checks it. Alternatively (and possibly easier), it might be possible to set up something to log incoming traffic, but that would assume a single-stage handshake, which might well not be the case. Or indeed to sniff the traffic as you were looking at. Or, I suppose there's the option to use pgxc_monitor directly—but this seems very heavy, to me, especially as the images no longer contain pgxc_ctl. I'll try to circle back round to this at some point, when I get a bit more time. :) |
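For the traffic-logging idea, one thing that may help is dumping whatever arrives as hex, since a binary handshake is easy to miss in a terminal. This is only a sketch: the helper name hex_bytes is made up for illustration, and the netcat invocation in the comment is an assumption, because listen flags differ between netcat variants.

```shell
# Hypothetical helper: print every byte read from stdin as two hex digits.
# -An drops the offset column, -tx1 prints single bytes, -v keeps repeated lines.
hex_bytes() {
  od -An -tx1 -v
}

# Possible usage (assumption: a netcat that accepts "-l -p PORT"):
#   nc -l -p 6666 | hex_bytes
```

As noted, this would only show the first message a client sends, so it is of limited use if the handshake has more than one stage.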
Previously, although the healthcheck succeeded and everything seemed to work, the GTM logged the error

    Expecting a startup message, but received �

Fix by reverse-engineering the minimal startup packet for the GTM, using tcpdump from the nicolaka/netshoot image, with a command like

    docker run -it --rm --net container:e0f3eec77071 nicolaka/netshoot \
        tcpdump -X -i lo
Using
Stripping the header and null-padding appropriately to not cause GTM errors (such as OOM), a valid connection is: echo -n -e "\x41\x00\x00\x00\x50\x64\x61\x74\x61\x5f\x31\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | nc -w 1 "${PG_HOST}" "${PG_PORT}" Replacing name printf "%b" "\x41\x00\x00\x00\x50\x5f\x68\x65\x61\x6c\x74\x68\x63\x68\x65\x63\x6b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00" | nc -w 1 "${PG_HOST}" "${PG_PORT}" Please see #33 for a working implementation. Please could you kindly try running it locally (I haven't built it into an image), and see if it solves the problem for you? Thanks. |
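The two payloads above differ only in the node name, so the packet could be generated rather than written out by hand. The following is a minimal sketch, assuming the layout suggested by those bytes: a 0x41 ('A') byte, the four bytes 00 00 00 50, then the name padded with nulls. The helper name gtm_startup_packet and the default total of 81 bytes (one leading byte plus the 0x50 = 80 bytes that the next four bytes appear to announce) are assumptions read off the capture, not checked against the GTM source. It also relies on head -c, which is widely available but not guaranteed everywhere.

```shell
# Hypothetical helper: write a GTM-style startup packet for NAME to stdout.
#   $1  node name to advertise (e.g. _healthcheck)
#   $2  total packet length in bytes (assumed default: 81)
gtm_startup_packet() {
  name=$1
  total=${2:-81}
  # 5 bytes are taken by 'A' plus the 4-byte field; the rest is name + nulls.
  pad=$(( total - 5 - ${#name} ))
  # Require at least one trailing null so the name stays terminated.
  [ "$pad" -ge 1 ] || return 1
  # Octal escapes: \101 = 0x41 ('A'), \120 = 0x50.
  printf '\101\000\000\000\120%s' "$name"
  head -c "$pad" /dev/zero
}

# Possible usage, mirroring the commands above:
#   gtm_startup_packet _healthcheck | nc -w 1 "${PG_HOST}" "${PG_PORT}"
```

Keeping the name as a parameter avoids hand-editing the hex string, at the cost of hiding the exact bytes that go over the wire.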
Hi, |
Hi @JuliuszJ. I think that's in the same contrib set as
Thank you @tiredpixel for the quick response.
It seems that the PG-XL team moved pgxc_ctl from contrib to src/bin; pgxc_monitor was left as a separate contrib.
I am asking because magic scares me ;)
The doc says: "If the target node is running, it exits with exit code zero. If not, it exits with a non-zero exit code. "
No SSH, no setup, simple command line. Thank you |
@JuliuszJ, interesting, thanks; I didn't realise that. Let me take another look at it; I, too, am dubious of needless magic—but equally, I don't want to introduce some whole new piece because of it. But if it's as you say, it might well be suitable. I'll try to find some time in a bit, and run some tests. |
Previously, although the healthcheck succeeded and everything seemed to work, the GTM logged the error Expecting a startup message, but received � Fix by replacing netcat with pgxc_monitor to check GTM health. Many thanks to @sstubbs for motivating me to fix this, and to @JuliuszJ for the suggestion to use pgxc_monitor instead of magic.
That's much better, thank you @JuliuszJ! I didn't realise it would be so easy. I've replaced my magic with pgxc_monitor; it seems to work fine. |
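For anyone wanting to try the same approach by hand, a healthcheck along these lines might look like the sketch below. It relies only on the documented behaviour quoted earlier (exit code zero when the target is running). The function name check_gtm and the PGXC_MONITOR override are hypothetical. The -Z gtm, -h and -p options are given as I understand them from the pgxc_monitor documentation, so please check them against your version. This is not necessarily the exact command used in the image.

```shell
# Hypothetical wrapper: succeed only if the GTM at HOST:PORT answers.
#   $1  GTM host
#   $2  GTM port
# PGXC_MONITOR may point at a different binary (assumption, handy for testing).
check_gtm() {
  host=$1
  port=$2
  "${PGXC_MONITOR:-pgxc_monitor}" -Z gtm -h "$host" -p "$port"
}

# Possible usage:
#   check_gtm "${PG_HOST}" "${PG_PORT}" || exit 1
```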
Seems fine to me. This will be included in the next release. |
I seem to be getting this error on the GTM.
Expecting a startup message, but received �
I wonder if it's related to this.
#15
Both inserting into and querying both coordinators are working, though.
I will try and create another cluster and see if this issue is still there.