Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deadlock on event_del_nolock_() #1654

Open
legale opened this issue May 13, 2024 · 2 comments
Open

deadlock on event_del_nolock_() #1654

legale opened this issue May 13, 2024 · 2 comments

Comments

@legale
Copy link

legale commented May 13, 2024

aarch64 system kernel 5.4.164
libevent2 master fc9bfd2

I got flickering bug with deadlock on event_del_nolock_()

bt:

image

stuck on the libmosquito loop event EV_PERSIST callback.

This is callback

void mqtt_loop_callback(int fd, short event, void *arg) {
  // syslog2(LOG_INFO, "%s\n", __func__);
  int ret;

  struct timespec ts;
  ts.tv_sec = 0;
  ts.tv_nsec = 100000000; // 100ms

  int ret_misc = 0, ret_read = 0, ret_write = 0;

  fd_set read_fds, write_fds;

  FD_ZERO(&read_fds);
  FD_ZERO(&write_fds);

  FD_SET(fd, &read_fds);

  ret = locksafe_ms(mqtt_lock, 2000);
  if (ret){
    // syslog2(LOG_INFO, "%s END\n", __func__);
    return;
  }

  if (mosquitto_want_write(mosq)) {
    FD_SET(fd, &write_fds);
  }
  unlocksafe(mqtt_lock);

  ret = pselect(fd + 1, &read_fds, &write_fds, NULL, &ts, NULL);
  if (ret == -1) {
    syslog2(LOG_ERR, "pselect failed: returned: %d\n", ret);
    return;
  }

  ret = locksafe_ms(mqtt_lock, 1000);
  if (ret){
    // syslog2(LOG_INFO, "%s END\n", __func__);
    return;
  }

  if (FD_ISSET(fd, &read_fds)) {
    ret_read = mosquitto_loop_read(mosq, 1);
  }

  if (FD_ISSET(fd, &write_fds)) {
    ret_write = mosquitto_loop_write(mosq, 10);
  }

  mosquitto_loop_misc(mosq);
  unlocksafe(mqtt_lock);
  // syslog2(LOG_INFO, "%s END\n", __func__);
}

This is init function

static int setup_mqtt_loop_event(agent_config_t *ac) {
  syslog2(LOG_INFO, "%s\n", __func__);
  if (locksafe_ms(ac_events_lock, 1000)) return 1;
  if (locksafe_ms(mqtt_lock, 1000)) {
    unlocksafe(ac_lock);
    return 1;
  }

  ac->events.mqtt_loop.fd = mosquitto_socket(mosq);
  if (ac->events.mqtt_loop.fd == -1) {
    syslog2(LOG_ERR, "get mqtt fd failed\n");
    unlocksafe(mqtt_lock);
    unlocksafe(ac_events_lock);
    return 1;
  }

  if (event_initialized(&ac->events.mqtt_loop.ev)) {
    if (event_del(&ac->events.mqtt_loop.ev))
      syslog2(LOG_ERR, "unable to delete prev. mqtt_loop event\n");
  }

  event_assign(&ac->events.mqtt_loop.ev, ac->events.base, ac->events.mqtt_loop.fd, EV_READ | EV_WRITE | EV_PERSIST, mqtt_loop_callback, (void *)ac);
  event_add(&ac->events.mqtt_loop.ev, NULL);
  unlocksafe(mqtt_lock);
  unlocksafe(ac_events_lock);
  return 0;
}
@azat
Copy link
Member

azat commented May 18, 2024

Looks like lines does not match.

Anyway it is hard to say something based on provided samples. But you can try to compile with TSan at least, and see what it will report about deadlocks. Also if you have threads, please make that you enabled pthread support (evthread_use_pthreads())

@liudongmiao
Copy link
Contributor

@legale I suggest you don't reuse the ac->events.mqtt_loop.ev. use event_new, then use event_free, avoid use event_del and event_assign.

You add an event for EV_READ and EV_WRITE, and select in the callback. Please note, most of the time, the sock can be writable, so the callback would be called, then I see you do a select in the callback.

And I suggest you use different events like ev_write and ev_read, then disable ev_write if there is nothing to write. Otherwise, you use the same event for both ev_read and ev_write, you need do select in the callback, then why do you use libevent? (That's the way of bufferevent_socket.) If the underlying socket is udp, you can use other magic way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants