Mac OS is crashing when debugger stops on a breakpoint #47

Closed
dekelev opened this issue Jan 13, 2020 · 5 comments

Comments

@dekelev

dekelev commented Jan 13, 2020

I've encountered an issue on multiple Mac laptops running the latest macOS Catalina.

The scenario: two processes run Feathers services with feathers-distributed, and the first process is stopped on a debugger breakpoint. While it is paused, the second process keeps sending hello messages on the channels. This causes "socket stress" and leads to a TCP zero-window condition. A short while later, the OS crashes and reboots (apparently an Apple bug).

The solution I've come up with is to run the second process as a fork of the first and send it a heartbeat message every second. When the fork detects the absence of heartbeat messages for more than 2 seconds, it stops all channels opened by feathers-distributed, and it resumes them when a heartbeat is received again.
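The heartbeat detection described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code actually used: `createHeartbeatWatchdog`, `watchParent`, and the `'heartbeat'` message value are hypothetical names, and the second process is assumed to be started with Node's `child_process.fork` so that an IPC channel to the parent exists.

```javascript
// Hypothetical helper: tracks heartbeats and reports when they stop or resume.
function createHeartbeatWatchdog({ timeoutMs = 2000, onLost, onRestored }) {
  let lastBeat = Date.now();
  let lost = false;

  return {
    // Call whenever a heartbeat message arrives from the parent process.
    beat(now = Date.now()) {
      lastBeat = now;
      if (lost) {
        lost = false;
        onRestored();
      }
    },
    // Call periodically; fires onLost once when the parent has been silent too long.
    check(now = Date.now()) {
      if (!lost && now - lastBeat > timeoutMs) {
        lost = true;
        onLost();
      }
    },
    isLost: () => lost
  };
}

// Wiring inside the forked process. stopChannels/startChannels are the
// functions from the gist in this issue, passed in by the caller.
function watchParent(app, { stopChannels, startChannels }) {
  const watchdog = createHeartbeatWatchdog({
    timeoutMs: 2000,
    onLost: () => stopChannels(app),
    onRestored: () => startChannels(app)
  });
  // Assumed message format: the parent sends the plain string 'heartbeat'.
  process.on('message', msg => {
    if (msg === 'heartbeat') watchdog.beat();
  });
  const timer = setInterval(() => watchdog.check(), 500);
  timer.unref(); // do not keep the process alive just for the watchdog
  return watchdog;
}
```

On the parent side, something like `setInterval(() => child.send('heartbeat'), 1000)` would be enough: while the parent is paused on a breakpoint its event loop is frozen, so the heartbeats stop on their own.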

I'm wondering if this is a known issue, because it is easily reproduced locally with either Redis or broadcast, though the workaround only works with Redis. With broadcast, the process fails after resuming the channels because the socket has been closed.

This is the gist of stopping/resuming all channels:

// Stop discovery on every channel opened by feathers-distributed.
const stopChannels = app => {
  // App-level channels
  stop(app.serviceSubscriber);
  stop(app.servicePublisher);

  // Per-service channels
  for (const service of Object.values(app.services)) {
    stop(service.requester);
    stop(service.responder);
    stop(service.serviceEventsSubscriber);
    stop(service.serviceEventsPublisher);
  }
};

// Resume discovery on the same set of channels.
const startChannels = app => {
  start(app.serviceSubscriber);
  start(app.servicePublisher);

  for (const service of Object.values(app.services)) {
    start(service.requester);
    start(service.responder);
    start(service.serviceEventsSubscriber);
    start(service.serviceEventsPublisher);
  }
};

// Not every app or service has every channel, so skip the missing ones.
const stop = channel => {
  if (channel) {
    channel.discovery.stop();
  }
};

const start = channel => {
  if (channel) {
    channel.discovery.start();
  }
};

We could integrate this into the library, since the application should not have to manage the list of open channels.

@dekelev dekelev changed the title Mac OSX is crashing when debugger stops on a breakpoint Mac OS is crashing when debugger stops on a breakpoint Jan 13, 2020
@claustres
Member

We experienced some network performance issues with the cote defaults, so the module ships with different default options.

It might be interesting to have methods to stop/start event distribution, but the fact that it is restricted to Redis can be a problem. Moreover, this design decision can't be made only because a specific debugger or a specific OS version is crashing; we might need more use cases IMHO.

@dekelev
Author

dekelev commented Jan 13, 2020

Locally, I only work with Redis, so I haven't spent much time figuring out if the broadcast can benefit from a similar solution.

I'm using the default feathers-distributed options and did notice that cote was shipping with much lower defaults.

Even though the hello messages are only sent every 10 seconds, the number of services we have (~150) makes it unusable when stopping on a breakpoint. The crash usually happens within 10 to 60 seconds, and it always takes down the OS.

@claustres
Member

claustres commented Jan 13, 2020

We also observed this kind of problem because we have about 100 services. However, it seems pretty strange to me that you hit this problem after only a few seconds. As far as I understand, each service sends a hello message every 10s by default, so we get 600 messages after 60s, not enough to crash an OS!
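For reference, the estimate above can be written out as a small calculation. This is a rough sketch that assumes exactly one hello message per service per interval; `helloMessages` is a hypothetical helper, and it ignores that each service may own several channels (requester, responder, events subscriber and publisher, as in the gist earlier in this issue), so real traffic could be higher.

```javascript
// Rough count of hello messages: one per service per full interval elapsed.
function helloMessages(services, elapsedSec, intervalSec = 10) {
  return services * Math.floor(elapsedSec / intervalSec);
}

console.log(helloMessages(100, 60));  // 600
console.log(helloMessages(150, 60));  // 900
console.log(helloMessages(150, 120)); // 1800
```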

Anyway, we will ask about this on the cote Slack.

@dekelev
Author

dekelev commented Jan 13, 2020

Thanks. It usually takes a minute or two to crash the OS with ~150 services (without Feathers events publishing), and for a developer, staying paused on a breakpoint for more than a minute is not a rare use case.

@claustres
Member

Closing in favor of #48.
