Conversation

NoahStapp
Contributor

No description provided.

@NoahStapp NoahStapp requested a review from a team as a code owner January 5, 2024 20:15
@NoahStapp NoahStapp requested review from caseyclements and removed request for a team January 5, 2024 20:15
@ShaneHarvey ShaneHarvey self-requested a review January 8, 2024 19:27
timeout = _POLL_TIMEOUT
readable = conn.socket_checker.select(sock, read=True, timeout=timeout)
if conn.cancel_context.cancelled:
    raise _OperationCancelled("operation cancelled")
Member

This block is under-indented one level now.

Member

This block still looks under-indented. Is that intentional?

Contributor Author

The original code has the following if statements all at the same level:

if context.cancelled:
	...
if readable:
	...
if timed_out:
	...

This change keeps that consistency, but removes the initial if context: check above, which might be causing the confusion.
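For illustration, the polling structure being discussed can be sketched standalone: a short select timeout per cycle, with the cancellation, readability, and timeout checks at the same level as described above. The function name, the `cancelled` callable, and the constant are stand-ins here, not the PR's internals.

```python
import select
import time

_POLL_TIMEOUT = 0.5  # illustrative value, not the driver's constant


def wait_for_read(sock, cancelled, deadline=None):
    """Poll `sock` for readability, checking for cancellation each cycle.

    `cancelled` is a zero-arg callable standing in for the PR's
    conn.cancel_context.cancelled flag.
    """
    while True:
        timeout = _POLL_TIMEOUT
        if deadline is not None:
            timeout = max(min(timeout, deadline - time.monotonic()), 0)
        readable, _, _ = select.select([sock], [], [], timeout)
        # The three checks below sit at the same level, matching the
        # structure described in the comment above.
        if cancelled():
            raise RuntimeError("operation cancelled")
        if readable:
            return
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError("connection timed out")
```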

@ShaneHarvey ShaneHarvey (Member) Jan 17, 2024

Thanks for explaining. GitHub's diff still shows this as changed for me even when I set the ignore-whitespace option. Must be a bug on their end.

@ShaneHarvey ShaneHarvey (Member) left a comment

Could you run a benchmark with:

  • 100 threads all running a short-running op (e.g. ping)
  • 100 threads all running a long-running op (e.g. find_one with a sleep)

As well as the EVG benchmark? I'm wondering if there's a cost to adding this polling loop to every connection.

@@ -874,15 +874,23 @@ class PoolClearedEvent(_PoolEvent):
:param address: The address (host, port) pair of the server this Pool is
attempting to connect to.
:param service_id: The service_id this command was sent to, or ``None``.
- `service_id`: The service_id this command was sent to, or ``None``.
- '__interrupt_in_use_connections': True if all active connections were interrupted by the Pool during clearing.
Member

remove the leading __. Also is this really the name? It seems comically long. Can we think of a better one?

Contributor Author

We could shorten it to __interrupt_connections, but then we lose the information that only connections in use are interrupted.

@NoahStapp NoahStapp (Contributor Author) Jan 16, 2024

Not sure if we follow the spec this closely, but the spec explicitly calls this flag interruptInUseConnections.

Member

Let's rename it interrupt_connections throughout, even on the monitoring events.

Contributor Author

The spec tests actually explicitly check for the existence of interruptInUseConnections in the event, so renaming this breaks the tests.

Member

What we can do in that situation is update the test runner to map the field name. It looks like the test runner ignores the interruptInUseConnections field and only checks for hasServiceId:

elif name == "poolClearedEvent":
    self.test.assertIsInstance(actual, PoolClearedEvent)
    self.assertHasServiceId(spec, actual)
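A sketch of what that field-name mapping could look like; `FakePoolClearedEvent` and its `interrupt_connections` attribute are hypothetical stand-ins for the real monitoring event class after the rename discussed above.

```python
class FakePoolClearedEvent:
    """Hypothetical stand-in for monitoring.PoolClearedEvent after the rename."""

    def __init__(self, service_id=None, interrupt_connections=False):
        self.service_id = service_id
        self.interrupt_connections = interrupt_connections


def check_pool_cleared(spec, actual):
    # Map the spec's camelCase field onto the snake_case event attribute.
    expected = spec.get("interruptInUseConnections")
    if expected is not None:
        assert actual.interrupt_connections == expected, (
            "expected interruptInUseConnections=%r" % expected
        )
```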


sock = conn.conn
timed_out = False
# Check if the socket is already closed
if sock.fileno() == -1:
Member

When would the socket be closed? The comment could be improved by explaining that case (i.e. the when/why, not the how).

Contributor Author

We have at least one test that explicitly closes the socket itself rather than the connection. Sounds good, I'll add a comment.
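The check in question relies on standard socket behavior: once a socket is closed, CPython reports its file descriptor as -1. A minimal demonstration:

```python
import socket

# A socket closed out from under the connection (e.g. by a test that
# closes the raw socket directly rather than the Connection) reports a
# file descriptor of -1, which is what `sock.fileno() == -1` detects.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
open_fd = sock.fileno()
sock.close()
closed_fd = sock.fileno()
```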

@NoahStapp (Contributor Author)

Could you run a benchmark with:

  • 100 threads all running short running op (eg ping)
  • 100 threads all running a long running op (eg find one with a sleep)

As well as the EVG benchmark? I'm wondering if there's a cost to adding this polling loop to every connection.

EVG benchmark results: https://performance-analyzer.server-tig.prod.corp.mongodb.com/perf-analyzer-viz?comparison_id=9802c9f9-659e-46b3-97ac-c268f34a581c&selected_tab=data-table&percent_filter=0%7C%7C100&z_filter=0%7C%7C10

A slight decrease for some operations, a slight improvement or no change for others.

I ran a very simple local benchmark that created 100 threads and executed either a ping or a find_one with a sleep on each, timing the runtime and averaging out the result over all the threads. The runtime over 100 iterations of this benchmark is essentially unchanged when compared to master.
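A hypothetical reconstruction of that local benchmark; the real script isn't in the thread, so `do_op` here stands in for the ping or find_one call.

```python
import threading
import time

N_THREADS = 100


def bench(do_op, ops_per_thread=100):
    """Run `do_op` repeatedly on N_THREADS threads and return the mean
    per-thread runtime in seconds."""
    durations = []
    lock = threading.Lock()

    def worker():
        start = time.perf_counter()
        for _ in range(ops_per_thread):
            do_op()
        elapsed = time.perf_counter() - start
        with lock:
            durations.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(durations) / len(durations)
```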

@NoahStapp NoahStapp requested a review from ShaneHarvey January 17, 2024 18:45
@caseyclements caseyclements (Contributor) left a comment

Good to go. Thank you. I learned a lot doing this review!



@ShaneHarvey
Member

ShaneHarvey commented Jan 17, 2024

Looking at the perf results, it does appear this causes a ~5% performance decrease. Locally with 100 threads I see a decrease of somewhere between 5-10%:

$ git:(master) time python bench-100-threads.py 
python bench-100-threads.py  6.61s user 8.06s system 129% cpu 11.320 total
$ git:(master) time python bench-100-threads.py 
python bench-100-threads.py  6.65s user 8.13s system 130% cpu 11.297 total
$ git:(master) time python bench-100-threads.py 
python bench-100-threads.py  6.86s user 8.52s system 129% cpu 11.842 total

vs

$ git:(NoahStapp-PYTHON-3175) time python bench-100-threads.py 
python bench-100-threads.py  7.18s user 8.52s system 127% cpu 12.323 total
$ git:(NoahStapp-PYTHON-3175) time python bench-100-threads.py 
python bench-100-threads.py  7.26s user 8.88s system 130% cpu 12.320 total
$ git:(NoahStapp-PYTHON-3175) time python bench-100-threads.py 
python bench-100-threads.py  7.34s user 9.02s system 132% cpu 12.335 total

With a single-threaded benchmark the total execution time is about the same, but the user and system CPU time goes up by ~5%:

$ git:(master) time python bench-single-thread.py
1.80s user 0.34s system 50% cpu 4.259 total
$ git:(master) time python bench-single-thread.py
1.81s user 0.34s system 50% cpu 4.267 total
$ git:(master) time python bench-single-thread.py
1.85s user 0.36s system 51% cpu 4.277 total

VS

$ git:(NoahStapp-PYTHON-3175) time python bench-single-thread.py
1.99s user 0.41s system 56% cpu 4.277 total
$ git:(NoahStapp-PYTHON-3175) time python bench-single-thread.py
1.97s user 0.42s system 56% cpu 4.239 total
$ git:(NoahStapp-PYTHON-3175) time python bench-single-thread.py
1.97s user 0.41s system 55% cpu 4.254 total

Here's the single-threaded benchmark:

from pymongo import MongoClient
client = MongoClient()
N_ITERATIONS = 50000

def main():
    for _ in range(N_ITERATIONS):
        client.admin.command('ping')

if __name__ == "__main__":
    main()

I don't think this should hold up this PR but it would be good to circle back and see how we can improve. Could you open a new ticket?

@NoahStapp NoahStapp requested a review from ShaneHarvey January 18, 2024 20:23
@ShaneHarvey ShaneHarvey (Member) left a comment

Still need to update the unified runner to check interruptInUseConnections. It looks like the test runner ignores the interruptInUseConnections field and only checks for hasServiceId:

elif name == "poolClearedEvent":
    self.test.assertIsInstance(actual, PoolClearedEvent)
    self.assertHasServiceId(spec, actual)

@NoahStapp NoahStapp requested a review from ShaneHarvey January 19, 2024 00:31
@ShaneHarvey ShaneHarvey (Member) left a comment

LGTM!

@NoahStapp NoahStapp merged commit c4e4bd6 into mongodb:master Jan 19, 2024