Skip to content

Commit

Permalink
Fix rare ERR_IPC_CHANNEL_CLOSED bug in cluster mode (#244)
Browse files Browse the repository at this point in the history
* Fix rare ERR_IPC_CHANNEL_CLOSED bug in cluster mode

A cluster worker can still be in the list of workers even if its IPC channel has closed (e.g. when the worker exits abruptly). Check that its IPC channel is open before sending a message.

* Remove a Node.js pre-v6 compatibility bit
  • Loading branch information
zbjornson authored and siimon committed Apr 2, 2019
1 parent 66f45d9 commit 5d21477
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 15 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,9 @@ project adheres to [Semantic Versioning](http://semver.org/).

### Changed

- Check that cluster worker is still connected before attempting to query it for
metrics. (#244)

### Added

- Added a `remove()` method on each metric type, based on [Prometheus "Writing Client Libraries" section on labels](https://prometheus.io/docs/instrumenting/writing_clientlibs/#labels)
Expand Down
32 changes: 17 additions & 15 deletions lib/cluster.js
Original file line number Diff line number Diff line change
Expand Up @@ -38,8 +38,6 @@ class AggregatorRegistry extends Registry {
const requestId = requestCtr++;

return new Promise((resolve, reject) => {
const nWorkers = Object.keys(cluster.workers).length;

function done(err, result) {
// Don't resolve/reject the promise if a callback is provided
if (typeof callback === 'function') {
Expand All @@ -50,13 +48,9 @@ class AggregatorRegistry extends Registry {
}
}

if (nWorkers === 0) {
return process.nextTick(() => done(null, ''));
}

const request = {
responses: [],
pending: nWorkers,
pending: 0,
done,
errorTimeout: setTimeout(() => {
request.failed = true;
Expand All @@ -71,7 +65,21 @@ class AggregatorRegistry extends Registry {
type: GET_METRICS_REQ,
requestId
};
for (const id in cluster.workers) cluster.workers[id].send(message);

for (const id in cluster.workers) {
// If the worker exits abruptly, it may still be in the workers
// list but not able to communicate.
if (cluster.workers[id].isConnected()) {
cluster.workers[id].send(message);
request.pending++;
}
}

if (request.pending === 0) {
// No workers were up
clearTimeout(request.errorTimeout);
process.nextTick(() => done(null, ''));
}
});
}

Expand Down Expand Up @@ -147,13 +155,7 @@ function addListeners() {

if (cluster.isMaster) {
// Listen for worker responses to requests for local metrics
cluster.on('message', function(worker, message) {
if (arguments.length === 2) {
// pre-Node.js v6.0
message = worker;
worker = undefined;
}

cluster.on('message', (worker, message) => {
if (message.type === GET_METRICS_RES) {
const request = requests.get(message.requestId);
message.metrics.forEach(registry => request.responses.push(registry));
Expand Down

0 comments on commit 5d21477

Please sign in to comment.