ksmbd: rewrite stop_sessions() with restartable iteration

charsyam · gregkh · commit 9a4a4c98cef4 · 2026-05-14T15:31:03.000+02:00
commit c444139 upstream. stop_sessions() walks conn_list with hash_for_each() and, for every entry, drops conn_list_lock across the transport ->shutdown() call before re-acquiring the read lock to continue the loop. The hash walk relies on cross-iteration state (the current bucket and the hlist position), which is not preserved across unlock/relock: if another thread performs a list mutation during the unlocked window, the ongoing iteration becomes unreliable and can re-visit connections that have already been handled or skip connections that have not. The outer `if (!hash_empty(conn_list)) goto again;` retry masks the symptom in the common case but does not address the unsafe iteration itself. Reframe the loop so it never relies on iterator state across unlock/relock. Under conn_list_lock held for read, pick the first connection whose ->shutdown() has not yet been issued by this path, pin it by taking an extra reference, record that fact on the connection and mark it EXITING while still inside the locked walk, then drop the lock. Then call ->shutdown() outside the lock, drop the pin (freeing the connection if the handler already released its reference), and restart from the top. Use a new per-connection flag, conn->stop_called, as the "shutdown issued from stop_sessions()" marker rather than reusing the status state. ksmbd_conn_set_exiting() is also invoked by ksmbd_sessions_deregister() on sibling channels of a multichannel session without issuing a transport shutdown, so treating KSMBD_SESS_EXITING as "already handled here" would skip connections that still need shutdown() to wake their handler out of recv(), leaving the outer retry waiting indefinitely for the hash to drain. stop_sessions() is serialised by init_lock in ksmbd_conn_transport_destroy(), so writing stop_called under the read lock has no other writer. Set EXITING inside the locked walk so the selection, the stop_called marker, and the status transition all happen together, and guard against regressing a connection that has already advanced to KSMBD_SESS_RELEASING on its own (for example, if the handler exited its receive loop for an unrelated reason between teardown steps). When the pin drop is the last put, release the transport and pair ida_destroy(&target->async_ida) with the ida_init() done in ksmbd_conn_alloc(), so stop_sessions() retiring a connection on its own does not leak the xarray backing of the embedded async_ida. The outer retry with msleep() is kept to wait for handler threads to reach ksmbd_conn_free() and drain the hash. Observed with an instrumented build that logs one line per visit and widens the unlocked window before ->shutdown() by 200 ms, under five concurrent cifs mounts (nosharesock, one connection each): * Current code: the same connection address is revisited many times during a single stop_sessions() call and ->shutdown() is invoked well beyond the number of live connections before the hash finally drains. * Rewritten code: each live connection produces exactly one ->shutdown() call; the function returns as soon as the hash is empty. Functional teardown via `ksmbd.control --shutdown` with the same five mounts completes cleanly on the rewritten path. Performance is observably unchanged. Tearing down N concurrent nosharesock cifs connections with `ksmbd.control --shutdown` + `rmmod ksmbd` takes essentially the same wall time before and after the rewrite: N before after 10 4.93s 5.34s 30 7.34s 7.03s 50 7.31s 7.01s (3-run avg: 7.04s vs 7.25s) 100 6.98s 6.78s 200 6.77s 6.89s and the number of ->shutdown() calls equals the number of live connections on both paths when the race is not widened. The teardown is dominated by the msleep(100)-based outer retry waiting for handler threads to run ksmbd_conn_free(), not by the iteration itself; the restartable loop's worst-case O(N^2) visit cost is in the microseconds even at N=200 and sits far below the msleep(100) granularity. Applied alone on top of ksmbd-for-next-next, this patch does not introduce a new leak site. Under the same reproducer (10x concurrent-holders + ss -K + ksmbd.control --shutdown + rmmod), the tree still shows the pre-existing per-connection transport leak count that arises when the last refcount drop lands in one of ksmbd_conn_r_count_dec(), __free_opinfo() or session_fd_check() - all of which end with a bare kfree() today. kmemleak backtraces for the unreferenced objects point into the TCP accept path (sk_clone -> inet_csk_clone_lock, sock_alloc_inode) and none involve stop_sessions(). Plugging those bare-kfree sites is the responsibility of the follow-up patch. Fixes: e2f3448 ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/fs/smb/server/connection.c b/fs/smb/server/connection.c
@@ -535,24 +535,54 @@ int ksmbd_conn_transport_init(void)
 
 static void stop_sessions(void)
 {
-	struct ksmbd_conn *conn;
+	struct ksmbd_conn *conn, *target;
 	struct ksmbd_transport *t;
+	bool any;
 	int bkt;
 
+	/*
+	 * Serialised via init_lock; no concurrent stop_sessions() can
+	 * touch conn->stop_called, so writing it under the read lock is
+	 * safe.
+	 */
 again:
+	target = NULL;
+	any = false;
 	down_read(&conn_list_lock);
 	hash_for_each(conn_list, bkt, conn, hlist) {
-		t = conn->transport;
-		ksmbd_conn_set_exiting(conn);
-		if (t->ops->shutdown) {
-			up_read(&conn_list_lock);
+		any = true;
+		if (conn->stop_called)
+			continue;
+		atomic_inc(&conn->refcnt);
+		conn->stop_called = true;
+		/*
+		 * Mark the connection EXITING while still holding the
+		 * read lock so the selection and the status transition
+		 * happen together.  Do not regress a connection that has
+		 * already advanced to RELEASING on its own (e.g. the
+		 * handler exited its receive loop for an unrelated
+		 * reason).
+		 */
+		if (READ_ONCE(conn->status) != KSMBD_SESS_RELEASING)
+			ksmbd_conn_set_exiting(conn);
+		target = conn;
+		break;
+	}
+	up_read(&conn_list_lock);
+
+	if (target) {
+		t = target->transport;
+		if (t->ops->shutdown)
 			t->ops->shutdown(t);
-			down_read(&conn_list_lock);
+		if (atomic_dec_and_test(&target->refcnt)) {
+			ida_destroy(&target->async_ida);
+			t->ops->free_transport(t);
+			kfree(target);
 		}
+		goto again;
 	}
-	up_read(&conn_list_lock);
 
-	if (!hash_empty(conn_list)) {
+	if (any) {
 		msleep(100);
 		goto again;
 	}
diff --git a/fs/smb/server/connection.h b/fs/smb/server/connection.h
@@ -49,6 +49,7 @@ struct ksmbd_conn {
 	struct mutex			srv_mutex;
 	int				status;
 	unsigned int			cli_cap;
+	bool				stop_called;
 	union {
 		__be32			inet_addr;
 #if IS_ENABLED(CONFIG_IPV6)