Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYTHON-4533 Convert essential tests to async for Async PyMongo beta release #1724

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

NoahStapp
Copy link
Contributor

Needs #1718 to be merged first.

@NoahStapp NoahStapp changed the title PYTHON-4533 Convert test_client to async PYTHON-4533 Convert essential tests to async for Async PyMongo beta release Jul 1, 2024
@NoahStapp NoahStapp marked this pull request as draft July 1, 2024 23:38
@NoahStapp NoahStapp removed the request for review from caseyclements July 1, 2024 23:38
@NoahStapp NoahStapp marked this pull request as ready for review July 3, 2024 18:11
@blink1073 blink1073 requested review from blink1073 and removed request for Jibola July 10, 2024 15:14
@@ -1517,6 +1521,9 @@ async def close(self) -> None:
# TODO: PYTHON-1921 Encrypted MongoClients cannot be re-opened.
await self._encrypter.close()

async def aclose(self) -> None:
await self.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public API should have a docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need aclose() when we already have close()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contextlib expects that asynchronous context managers use aclose. We could simplify by just setting aclose = close instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case should we remove "close()"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's better, yeah.

pymongo/synchronous/client_session.py Show resolved Hide resolved

client_options = client_context.default_client_options.copy()
client_options.update(kwargs)

super().__init__(*args, **client_options)

@classmethod
def get_async_mock_client(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be get_sync_mock_client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, using get_mock_client seems better.

@@ -2332,7 +2398,7 @@ class TestClientPool(MockClientTest):
@client_context.require_connection
def test_rs_client_does_not_maintain_pool_to_arbiters(self):
listener = CMAPListener()
c = MockClient(
c = MockClient.get_async_mock_client(
standalones=[],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be get_sync_mock_client?

f"{f.__name__} sent wrong $clusterTime with {event.command_name}",
)

# class TestCausalConsistency(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops yes good catch.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone else feel there's a bit too much going on in this one PR to review effectively? Can you split it up into smaller and more manageable pieces?

@@ -861,6 +861,10 @@ def __init__(
# This will be used later if we fork.
AsyncMongoClient._clients[self._topology._topology_id] = self

async def connect(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Connect" might not be the best name here since we're not actually connecting. I know we have the "connect=True" argument but I can't help but feel like "open()" would be more appropriate. If a user actually wanted to connect I would expect them to run something like "db.command('ping')". What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit odd to me for the synchronous API to have a connect=True option but the equivalent async option is called open instead even though they do the same thing. Consistency of naming makes more sense to me here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I think the docstring just needs to explain the concept more clearly.

@NoahStapp
Copy link
Contributor Author

Does anyone else feel there's a bit too much going on in this one PR to review effectively? Can you split it up into smaller and more manageable pieces?

Good point, let me try.

@@ -860,6 +860,10 @@ def __init__(
# This will be used later if we fork.
MongoClient._clients[self._topology._topology_id] = self

def connect(self) -> None:
"""Explicitly connect synchronously."""
self._get_topology()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid adding connect() and aclose() to the sync client.

Copy link
Contributor Author

@NoahStapp NoahStapp Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we rename them to be private for the synchronous client? Like aconnect -> _aconnect and aclose -> _aclose? There isn't a clean way of having the synchro process just not convert specific methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this?:

_IS_SYNC = False
class AsyncMongoClient:
    if not _IS_SYNC:
        async def aclose(self):
            pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better solution for aclose is to just rename it to close when it's synchronized. That way all async code will use aclose and all sync code will use close without any extra work. For aconnect what about this:

async def aconnect(self):
	if _IS_SYNC:
		pass
	else:
		...

Copy link
Member

@ShaneHarvey ShaneHarvey Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aclose -> close SGTM. For aconnect() I would prefer the approach I outlined above so that the method is never added to the sync client.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually. renaming aconnect -> _aconnect is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchro process will still convert the aconnect method within the if not _IS_SYNC block and add it to the sync client source. Is that acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with any approach as long as aconnect() or connect() isn't a public method on MongoClient.

with warnings.catch_warnings():
# Ignore warning that ping is always routed to primary even
# if client's read preference isn't PRIMARY.
warnings.simplefilter("ignore", UserWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never seen this warning. What is it referring to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This already exists in our test suite:

# Ignore warning that ping is always routed to primary even

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some digging and that warning was removed in https://jira.mongodb.org/browse/PYTHON-814 in PyMongo 3.0. Could you remove the catch_warnings block?

sys.setswitchinterval(interval)


def lazy_client_trial(reset, target, test, get_client):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this even make sense to test in async? I would expect an async client to only be useable from a single thread since an I/O loop only runs on a single thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't being tested at all in async, not sure why I moved it here. Putting back into test/utils.py.

@@ -389,7 +392,7 @@ def test_transaction_direct_connection(self):
with client.start_session() as s, s.start_transaction():
res = f(*args, session=s) # type:ignore[operator]
if isinstance(res, (CommandCursor, Cursor)):
list(res)
res.to_list()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a docstring and changelog entry for to_list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list(cursor) still works on synchronous cursors, but asynchronous cursors require the use of Cursor.to_list() instead. This change is purely so our tests can be synchronized smoothly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the only issue is that to_list() is undocumented.

@@ -598,7 +601,10 @@ def _inherit_option(self, name: str, val: _T) -> _T:

async def with_transaction(
self,
callback: Callable[[AsyncClientSession], _T],
callback: Union[
Callable[[AsyncClientSession], Coroutine[Any, Any, _T]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Coroutine here or Awaitable? Which is a better fit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've used Coroutine consistently so far, Awaitable is a less-specific type hint that allows several other objects besides asynchronous functions.

callback: Callable[[AsyncClientSession], _T],
callback: Union[
Callable[[AsyncClientSession], Coroutine[Any, Any, _T]],
Callable[[AsyncClientSession], _T],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are there two callables here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An artifact of the reasoning below, removed.

@@ -693,7 +699,10 @@ async def callback(session, custom_arg, custom_kwarg=None):
read_concern, write_concern, read_preference, max_commit_time_ms
)
try:
ret = callback(self)
if not _IS_SYNC and iscoroutinefunction(callback):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check IS_SYNC or iscoroutinefunction here? Shouldn't it just be:

ret = await callback(self)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to be more lenient and allow both async and sync callbacks, but it's better to require all asynchronous callbacks for the Async API.

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Halfway through! Left some comments for now.

Comment on lines +603 to +606
callback: Union[
Callable[[ClientSession], _T],
Callable[[ClientSession], _T],
],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the type definition is the same twice here. Is that needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is an artifact of supporting both kinds of callbacks in the async API. Will be resolved in the separate PR for test_transaction

@@ -1512,6 +1516,9 @@ def close(self) -> None:
# TODO: PYTHON-1921 Encrypted MongoClients cannot be re-opened.
self._encrypter.close()

def aclose(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the comment from @ShaneHarvey before. We can remove close

def require_sync(self, func):
"""Run a test only if using the synchronous API."""
return self._require(
lambda: _IS_SYNC, "This test only works with the synchronous API", func=func
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since _IS_SYNC is defined as true, would the lambda ever fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is generated from test/asynchronous/__init__.py, it always passes for a synchronous test, as intended.


@classmethod
def setUpClass(cls):
if _IS_SYNC:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment. How does/may _IS_SYNC get redefined when tests are running?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_IS_SYNC is always true for a synchronous file and always false for an asynchronous file.


def drop_collections(db: Database):
# Drop all non-system collections in this database.
for coll in db.list_collection_names(filter={"name": {"$regex": r"^(?!system\.)"}}):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to filter the non system collection names? I had assumed those couldn't be dropped.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ancient code, probably it's to avoid attempting to drop a system collection and getting an OperationFailure because it's not allowed.

t.start()

for t in threads:
t.join(60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

60 seconds is quite a bit of time. Have we been doing 60 historically?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, 60 is just a fail safe in case something goes wrong.

for _i in range(NTRIALS):
reset(collection)
lazy_client = get_client()
lazy_collection = lazy_client.pymongo_test.test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: do we need to redefine the lazy_client each time?

Copy link
Member

@ShaneHarvey ShaneHarvey Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the point of repeating the test is to run with a new client each time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants