
v0.3.5 nil-pointer panic in ecdhSession.BroadcastPublicKey on rolling restart #158

@nicleerocks

Description

Summary

mpcium v0.3.5 nodes panic with a nil pointer dereference in crypto/ecdh.(*PublicKey).Bytes during ECDH session bootstrap when one peer restarts mid-handshake. The cluster auto-recovers (the Docker --restart policy and the ECDH retrigger logic eventually converge), so the failure is self-healing, but it's worth fixing because:

  • Each panic adds ~10-15s before the node rejoins quorum.
  • During a planned rolling restart, the panic+recover cycle effectively forces a slower roll than the operator anticipates.
  • The panics reach stderr as panic: runtime error, which gets noisy in log shipping.

Reproduction

3-node cluster on mpcium:v0.3.5 (built from the upstream Dockerfile, distroless image). Sequential docker restart mpcium-nodeN for N=0,1,2 with a ~30s gap between each.

Repro hit ~50% of restarts in a 6-restart window (3 nodes × 2 cycles during a config change).
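The restart sequence above, as a runnable sketch (container names are taken from the description; the actual docker restart and sleep calls are commented out so the script can be dry-run):

```shell
#!/bin/sh
# Rolling-restart repro sketch: 2 cycles x 3 nodes, ~30s apart.
restarts=0
for cycle in 1 2; do
  for n in 0 1 2; do
    echo "cycle $cycle: restart mpcium-node$n"
    # docker restart "mpcium-node$n"   # the actual command used in the repro
    # sleep 30                         # ~30s gap between restarts
    restarts=$((restarts + 1))
  done
done
echo "total restarts: $restarts"
```

With both commented lines enabled this reproduces the 6-restart window in which ~50% of restarts hit the panic.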

Stack trace

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xcd5ebf]

goroutine 29 [running]:
crypto/ecdh.(*PublicKey).Bytes(...)
	/usr/local/go/src/crypto/ecdh/ecdh.go:72
github.com/fystack/mpcium/pkg/mpc.(*ecdhSession).BroadcastPublicKey(0xc00001c800)
	/src/pkg/mpc/key_exchange_session.go:155 +0x3f
github.com/fystack/mpcium/pkg/mpc.(*registry).triggerECDHExchange(0xc00017c340)
	/src/pkg/mpc/registry.go:164 +0x44
created by github.com/fystack/mpcium/pkg/mpc.(*registry).registerReadyPairs in goroutine 52
	/src/pkg/mpc/registry.go:123 +0x278

Likely cause (speculative)

registerReadyPairs spawns goroutines that call triggerECDHExchange, which in turn calls BroadcastPublicKey. If a peer restarts after registerReadyPairs schedules the broadcast goroutine but before BroadcastPublicKey reads the peer's PublicKey from whatever shared state holds it, the PublicKey is nil, and crypto/ecdh.(*PublicKey).Bytes doesn't tolerate a nil receiver.

A nil check at key_exchange_session.go:155 (or wherever the receiver is dereferenced) would convert the panic into a clean error returned to the registry, which could then retry the broadcast on the next ECDH retrigger pass instead of crashing the process.
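A minimal sketch of that guard, assuming the session reads the peer key as a *crypto/ecdh.PublicKey (broadcastPublicKey and errPeerKeyUnavailable are hypothetical stand-ins, not mpcium's actual internals):

```go
package main

import (
	"crypto/ecdh"
	"errors"
	"fmt"
)

// errPeerKeyUnavailable signals the caller to retry on the next ECDH
// retrigger pass instead of crashing the process.
var errPeerKeyUnavailable = errors.New("peer public key not yet available")

// broadcastPublicKey is a hypothetical stand-in for the guarded part of
// (*ecdhSession).BroadcastPublicKey: it checks the receiver that
// crypto/ecdh.(*PublicKey).Bytes would otherwise nil-dereference.
func broadcastPublicKey(peerKey *ecdh.PublicKey) ([]byte, error) {
	if peerKey == nil {
		return nil, errPeerKeyUnavailable
	}
	return peerKey.Bytes(), nil
}

func main() {
	// Simulate the race: the peer restarted before republishing its key,
	// so the shared state still yields nil.
	var peerKey *ecdh.PublicKey
	if _, err := broadcastPublicKey(peerKey); err != nil {
		fmt.Println("recoverable:", err) // instead of a SIGSEGV panic
	}
}
```

The same pattern applies anywhere else the registry dereferences peer state that a restart can invalidate.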

Environment

  • mpcium v0.3.5 (confirmed with mpcium-cli version)
  • distroless image built from upstream Dockerfile
  • 3 nodes co-located on a single VPS (will move to separate hosts in a future op — not relevant to this bug)
  • NATS v2.x, Consul 1.15.4

Happy to dig further if useful.
