
Slowness due to blocked threads in Channel.getChannel call#887

Merged: mwiede merged 6 commits into mwiede:master from DavidTavoularis:master on Sep 17, 2025

Conversation

@DavidTavoularis
Contributor

No description provided.

@DavidTavoularis DavidTavoularis changed the title Slowness due to blocked threads in Channel.getChannel call #885 Slowness due to blocked threads in Channel.getChannel call Aug 21, 2025
@DavidTavoularis
Contributor Author

Summary of changes made:

  1. Replaced Vector with ConcurrentHashMap for the channel pool - eliminates synchronization bottlenecks
  2. Changed from static int index to AtomicInteger - ensures thread-safe ID generation
  3. Optimized channel lookup - Direct HashMap access O(1) instead of linear search O(n)
  4. Updated del method - Changed from del(Channel c) to del(int id) for efficiency
  5. Maintained OpenSSH compatibility - Preserved the INT_MAX capping logic in a thread-safe manner using compare-and-set
  6. Updated ChannelDirectTCPIP - Fixed the API change for the new del(int id) signature

The implementation now provides better performance and thread safety for concurrent channel operations while maintaining backward compatibility with the OpenSSH workaround.
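The scheme described in the summary can be sketched roughly as follows. This is a hypothetical, condensed illustration (field and method names are mine, not jsch's actual code), showing the ConcurrentHashMap pool, the AtomicInteger id generation with the OpenSSH-style INT_MAX cap done via compare-and-set, and the O(1) lookup and removal:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    // Hypothetical pool sketch; names are illustrative, not jsch's.
    private static final Map<Integer, Object> pool = new ConcurrentHashMap<>();
    private static final AtomicInteger index = new AtomicInteger(0);

    // Thread-safe id generation preserving the OpenSSH-style INT_MAX cap via CAS.
    static int nextId() {
        int current, next;
        do {
            current = index.get();
            next = (current + 1) & Integer.MAX_VALUE; // wraps at INT_MAX, never negative
        } while (!index.compareAndSet(current, next));
        return current;
    }

    public static void main(String[] args) {
        int a = nextId();
        int b = nextId();
        pool.put(a, "channelA");            // O(1) insert instead of Vector.addElement
        System.out.println(b - a);          // ids are consecutive
        System.out.println(pool.get(a));    // O(1) lookup instead of a linear search
        pool.remove(a);                     // O(1) removal, i.e. del(int id)
        System.out.println(pool.get(a));
    }
}
```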

Contributor

@norrisjeremy left a comment


The strategy being proposed here could potentially result in memory leaks due to stranded Channels.

With the old strategy, while likely slow as you have pointed out, there was no chance for a Channel not to be added to the pool, even if two Channels somehow shared the same id value.

With the strategy proposed here, however, if two Channels ever ended up sharing the same id value, the later-created one would overwrite the earlier one's position in the pool and thus cause a memory leak (since the earlier-created Channel would no longer be present in the pool).

int currentIndex, nextIndex;
do {
    currentIndex = index.get();
    nextIndex = (currentIndex + 1) & Integer.MAX_VALUE;
} while (!index.compareAndSet(currentIndex, nextIndex));
id = currentIndex;
pool.put(id, this);


To illustrate what I mean: suppose at time X, the id value for a particular Channel A was computed as 5.
Then suppose at some later time Y, the id value for a different Channel B also happens to be computed as 5 (since the AtomicInteger index value cycled through enough values to roll over and begin to be recycled).
The pool.put(id, this) call here, which puts Channel B into the pool under the key 5, overwrites the previous entry with key 5 that points at the earlier Channel A.
And thus we have a memory leak, since Channel A is no longer being maintained in the pool.
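The overwrite hazard described above can be demonstrated in a few lines, using a plain HashMap for brevity (a ConcurrentHashMap behaves the same way for put on a duplicate key):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        Map<Integer, String> pool = new HashMap<>();
        pool.put(5, "Channel A");           // time X: Channel A registered under id 5
        pool.put(5, "Channel B");           // time Y: id rolled over, same key reused
        System.out.println(pool.size());    // only one entry survives
        System.out.println(pool.get(5));    // the earlier Channel A is now unreachable
    }
}
```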

@DavidTavoularis
Contributor Author

Thank you very much for identifying this problem:

  • With ConcurrentHashMap, if two channels get the same ID (after ~2 billion channel creations and ID rollover), the second channel would overwrite the first in the pool
  • The overwritten channel becomes orphaned - still consuming resources but unreachable for cleanup
  • Original Vector approach, while slow, never lost channels even with ID collisions

Here is solution 1:

while (true) {
    int candidateId = index.getAndUpdate(i -> (i + 1) & Integer.MAX_VALUE);
    if (pool.putIfAbsent(candidateId, this) == null) {
        id = candidateId;
        break;
    }
}

Here is solution 2:

private static AtomicLong index = new AtomicLong(0);

Channel() {
    while (true) {
        int candidateId = (int) (index.getAndIncrement() % (Integer.MAX_VALUE + 1L));
        if (pool.putIfAbsent(candidateId, this) == null) {
            id = candidateId;
            break;
        }
    }
}

@norrisjeremy I prefer solution 1; what about you?

@norrisjeremy
Contributor

Hi @DavidTavoularis,

I'm honestly not a fan of either solution, because each artificially restricts the reuse of specific id values.
I'm working on an alternate solution to see if it would be possible to remedy the issue you have reported by simply using a ReadWriteLock.
I'll report back once I've had some time to perform more testing.

Thanks,
Jeremy

@DavidTavoularis
Contributor Author

Hi @norrisjeremy thank you for suggesting to use ReadWriteLock.

I just updated the pull request with these new changes:

  • ReadWriteLock for concurrent access control
  • ArrayList instead of Vector (no need for Vector's synchronization)
  • Read lock for lookups - allows multiple concurrent reads
  • Write lock for modifications - exclusive access when adding/removing

Benefits:

  1. Better concurrency - Multiple threads can search channels simultaneously
  2. No ID collision issues - IDs can be freely reused as channels are removed
  3. No memory leaks - Every channel is tracked in the list
  4. Cleaner code - Using modern ArrayList with explicit locking instead of legacy Vector
  5. Maintains original behavior - ID overflow handling preserved exactly

Key operations:

  • getChannel() - Uses read lock, allows concurrent lookups
  • Channel() constructor - Uses write lock for adding to pool
  • del() - Uses write lock for removal
  • disconnect() - Uses read lock to collect channels, then disconnects outside the lock

This solution provides thread safety without the artificial restrictions of collision detection, while improving performance for concurrent read operations.
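The key operations above can be condensed into a sketch like the following. This is illustrative only (the real Channel class stores actual Channel objects and has more state); it shows the read-lock/write-lock split and the disconnect pattern of snapshotting under the read lock and acting on the copy outside it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class Main {
    private static final List<String> pool = new ArrayList<>();
    private static final ReadWriteLock poolLock = new ReentrantReadWriteLock();

    // Constructor path: exclusive write lock while adding to the pool.
    static void add(String c) {
        Lock w = poolLock.writeLock();
        w.lock();
        try { pool.add(c); } finally { w.unlock(); }
    }

    // del path: exclusive write lock while removing.
    static void del(String c) {
        Lock w = poolLock.writeLock();
        w.lock();
        try { pool.remove(c); } finally { w.unlock(); }
    }

    // getChannel path: shared read lock, so many threads may search concurrently.
    static boolean contains(String c) {
        Lock r = poolLock.readLock();
        r.lock();
        try { return pool.contains(c); } finally { r.unlock(); }
    }

    // disconnect path: snapshot under the read lock, disconnect outside the lock.
    static List<String> snapshot() {
        Lock r = poolLock.readLock();
        r.lock();
        try { return new ArrayList<>(pool); } finally { r.unlock(); }
    }

    public static void main(String[] args) {
        add("c1");
        add("c2");
        System.out.println(contains("c1"));
        del("c1");
        System.out.println(contains("c1"));
        System.out.println(snapshot().size());
    }
}
```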

@norrisjeremy
Contributor

Hi @DavidTavoularis,

See #888 for my current draft proposal.

Thanks,
Jeremy

@norrisjeremy
Contributor

FYI, I still need to think through the memory usage ramifications of changing the pool structure from a Vector to either an ArrayList as you have proposed or a HashMap as I have done in my draft PR.

@DavidTavoularis
Contributor Author

@norrisjeremy I like your approach; here are my comments regarding your pull request:

  1. Redundant ID check in getChannel:
    if (c.id == id && c.session == session)
    Since you're already looking in pool.get(id), the c.id == id check is redundant. Should be:
    if (c.session == session)

  2. SonarQube: unused import - Collection is imported but never used.
    The other SonarQube finding, "Remove this assignment of "index"", can be marked as "OK" (no change compared to before).

  3. Should you put pool & poolLock as final ?

  4. While this new implementation handles collisions, the Set approach adds small memory overhead of maintaining Sets even for single channels, so should you replace:
    pool.computeIfAbsent(id, k -> new HashSet<>()).add(this);
    by:
    pool.computeIfAbsent(id, k -> new HashSet<>(1)).add(this);
    as 99.99999% of the time, the HashSet size will be 1 ?

Best Regards
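For reference, the Set-per-id pool discussed in item 4 could look roughly like this. This is a hypothetical sketch under my own naming, not the exact code in #888; it shows how an id collision keeps both channels reachable and how removal drops empty sets:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
    // Collisions on id keep every channel reachable instead of overwriting.
    private static final Map<Integer, Set<String>> pool = new HashMap<>();

    static void add(int id, String channel) {
        // Sized for the common case of a single channel per id.
        pool.computeIfAbsent(id, k -> new HashSet<>(1)).add(channel);
    }

    static void del(int id, String channel) {
        Set<String> set = pool.get(id);
        if (set != null) {
            set.remove(channel);
            if (set.isEmpty()) {
                pool.remove(id); // drop empty sets so the map does not grow unbounded
            }
        }
    }

    public static void main(String[] args) {
        add(5, "Channel A");
        add(5, "Channel B"); // id collision: both channels are retained
        System.out.println(pool.get(5).size());
        del(5, "Channel A");
        System.out.println(pool.get(5).size());
    }
}
```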

@DavidTavoularis
Contributor Author

@norrisjeremy Let's estimate the memory usage for 10000 Channels.

Original Vector (or ArrayList) approach = O(N):

  • Vector object: ~24 bytes overhead
  • Array storage: 10000 × 8 bytes (object references) = 80,000 bytes
  • Total: ~80,024 bytes

Map<Integer, Set> approach = O(1):

  • HashMap object: ~48 bytes overhead
  • Entry array: 10000 × 32 bytes (Entry objects) = 320,000 bytes
  • HashSet objects: 10000 × 48 bytes = 480,000 bytes
  • HashSet arrays: 10000 × 4 bytes (capacity=1, single element) = 40,000 bytes
  • Total: ~888,048 bytes

Another approach would be to have 2 maps (a regular one, and one only for collisions) = O(1):
Map<Integer, Channel> singleChannelPool = new HashMap<>();
Map<Integer, Set> collisionPool = new HashMap<>(); // empty 99.999% of the time

  • HashMap object: ~48 bytes overhead
  • Entry array: 10,000 × 32 bytes (Entry objects) = 320,000 bytes
  • Subtotal: ~320,048 bytes
    This last approach would add unnecessary code complexity...

80KB vs 890KB vs 320KB

I agree that it requires more thinking.
David

@norrisjeremy
Contributor

(Quoting the four review comments above.)

Hi @DavidTavoularis,

I've implemented 1, 2 & 4 as you suggested.
As for item 3, I don't see any reason to further increase the size of the Channel object by permanently storing additional properties pointing at the read & write locks derived from the ReadWriteLock.

@DavidTavoularis
Contributor Author

@norrisjeremy Thanks, I agree with you for item 3

@norrisjeremy
Contributor

(Quoting the memory-usage estimate above.)

Yes, that is what I'm worried about and thus want to think further on.

Comment on lines +52 to +53
private static final Lock readLock = poolLock.readLock();
private static final Lock writeLock = poolLock.writeLock();
Contributor

I would omit these two properties and simply derive the read & write locks in the methods that need to use them like I do in #888.

@DavidTavoularis
Contributor Author

@norrisjeremy Regarding the local ephemeral port range (typically ~28,000-65,000 ports): is it true that each Channel uses a different local ephemeral port?

If so, then the maximum number of simultaneous Channels is about 50,000, but I don't believe anyone would use more than 10,000 simultaneous Channels; a more typical figure would be 1,000 simultaneous Channels for high-performance systems (this is my use case).
Using Map<Integer, Set> would increase memory usage by 800KB for 10,000 Channels and by only 80KB for 1,000 Channels.

@norrisjeremy
Contributor

(Quoting the question above.)

The id values are only locally significant to a particular session.
You could have thousands of sessions, and in each session have thousands of channels.

@DavidTavoularis
Contributor Author

I believe there is usually a limit of 10 channels per SSH session (cf. MaxSessions 10 in /etc/ssh/sshd_config).

-        channels[count++] = c;
+        channels.add(c);
        }
      } catch (Exception e) {
Contributor

I'm not sure why this inner try block that catches exceptions & ignores them exists.
I wonder if we should just drop it?

Contributor Author

Agreed; I dropped it and pushed to the PR.

@DavidTavoularis
Contributor Author

Main Achievement: Replaced slow synchronized blocks with ReadWriteLock for better thread concurrency.

Key Changes:

  • Migrated from Vector + synchronized(pool) to ArrayList + ReentrantReadWriteLock
  • Read operations (getChannel, disconnect) use read locks - allow concurrent execution
  • Write operations (constructor, del) use write locks - require exclusive access
  • Removed unnecessary exception handling in disconnect() method

Benefits: Multiple threads can now lookup channels concurrently instead of being serialized. Write operations still require exclusive access for data consistency. Same memory usage (~80KB for 10K channels) with significantly better scalability for read-heavy SSH workloads.

Compatibility: All APIs unchanged, maintains OpenSSH compatibility and existing behavior.

-  private static Vector<Channel> pool = new Vector<>();
+  private static final ReadWriteLock poolLock = new ReentrantReadWriteLock();
   private static int index = 0;
+  private static List<Channel> pool = new ArrayList<>();
Contributor

I think pool could be marked final?

Contributor

@norrisjeremy left a comment


I think going with this more conservative approach is safer than the alternate approach I explored in #888, so I think we should go with this.

@DavidTavoularis
Contributor Author

I think going with this more conservative approach is safer than the alternate approach I explored in #888, so I think we should go with this.

agreed

@norrisjeremy
Contributor

Hi @DavidTavoularis,

So I have started exploring a different approach to fixing this issue; see #889.
Basically, it removes the global pool from the Channel class and instead maintains a per-Session List of all Channels linked to that particular Session.
This would increase memory use per Session (since each Session object now maintains an ArrayList and a ReentrantReadWriteLock), but the performance gain may be worth it.

Thanks,
Jeremy

@norrisjeremy
Contributor

@mwiede I think what we should do is merge this PR (#887), and then I will rebase my PR (#889) on top of the changes in this, and then I will take #889 out of draft status and you can then merge it.
That will give us the ability to easily rollback the changes in #889 in case we discover any sort of regression.

@mwiede mwiede merged commit 6995f49 into mwiede:master Sep 17, 2025
8 checks passed