Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(yajl2_c) TypeError when reading from an aiofiles file #32

Closed
MKuranowski opened this issue Aug 1, 2020 · 5 comments
Closed

(yajl2_c) TypeError when reading from an aiofiles file #32

MKuranowski opened this issue Aug 1, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@MKuranowski
Copy link

MKuranowski commented Aug 1, 2020

Running Python 3.8.2 (on Linux), ijson 3.1.post0, backend yajl2_c, aiofiles 0.5.0.

When trying to asynchronously get items from a simple json file, a TypeError: _get_read() missing 1 required positional argument: 'f' is raised.

Code:

import aiofiles
import asyncio
import ijson

ijson_backend = ijson.get_backend("yajl2_c")

async def main():
    async with aiofiles.open("test.json", "r", encoding="utf-8") as buff:
        async for i in ijson_backend.items_async(buff, "item"):
            print(i)

asyncio.run(main())

test.json:

["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

Output:

Traceback (most recent call last):
  File "test_ijson.py", line 12, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "ignore_test_ijson.py", line 9, in main
    async for t in ijson_backend.items_async(buff, "item"):
TypeError: _get_read() missing 1 required positional argument: 'f'

I would expect normal operation that would print all items from the test.json file.


Synchronous code works correctly:

import ijson

ijson_backend = ijson.get_backend("yajl2_c")


def noasync_main():
    with open("test.json", "r", encoding="utf-8") as buff:
        for i in ijson_backend.items(buff, "item"):
            print(i)


noasync_main()

Output:

1
2
3
4
5
6
7
8
9
10

Update: the yajl2_cffi backend works correctly:

Code:

import aiofiles
import asyncio
import ijson

ijson_backend = ijson.get_backend("yajl2_cffi")

async def main():
    async with aiofiles.open("test.json", "r", encoding="utf-8") as buff:
        async for i in ijson_backend.items_async(buff, "item"):
            print(i)

asyncio.run(main())

Output:

1
2
3
4
5
6
7
8
9
10
@rtobar
Copy link

rtobar commented Aug 1, 2020

@MKuranowski thanks for reporting. I can locally reproduce too, which is good (in a way!), but oddly it didn't happen with the unit tests. I'll have to study a bit closer what's going on.

@rtobar rtobar added the bug Something isn't working label Aug 1, 2020
rtobar added a commit that referenced this issue Aug 1, 2020
Our C code was not ready to deal with suspended executions of
utils35._get_read, as the logic assumed that once an awaitable was
created to get the read function, it would yield the result (via
StopIteration) after the first next() invocation. This broken logic
meant that when our asynchronous iterable was called again, it would
invoke _get_read() again with no arguments, resulting in an exception
being thrown.

In our unit tests our dummy async read functions don't suspend their
execution, which means that when calling the _get_read(f) function from
C we have always gotten away with this bug.

This patch fixes this problem, correctly iterating as long as necessary
on this first awaitable, rather than only once.

This is the original problem reported in #32.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar
Copy link

rtobar commented Aug 1, 2020

Found the problems (there were two actually), both of course in the yajl2_c backend only:

  • There was an issue if the first call to await f.read() resulted on a yield to the scheduler. This wasn't caught by our unit tests because the simple class we use for testing doesn't yield execution when read is called. After adding an await asyncio.sleep(0) in there the unit tests starting failing in the same way your test script does. This is now fixed, and I pushed the fix to the master branch.
  • After fixing that, a second problem came up. This one has to do with the fact that aiofiles implements the read operation (and most of them, I think) using @types.coroutines decorations rather than async def function definitions. The ijson C code wasn't prepared for that case (they are intrinsically different at this level), and I'll need to have a closer inspection yet into how to properly support both. I have something that seems to be working, but is still not fully stable, so I'd rather not push that fix yet.

I don't think I'll be able to tackle this second until early next week though. In the meanwhile you can workaround this by defining your own thin wrapper around aiofiles.open so it defines an async def read coroutine, that should work.

rtobar added a commit that referenced this issue Aug 2, 2020
Awaitable objects that don't have an __await__ method are probably
generators that have been marked with @types.coroutine, and therefore we
should be able to use them directly instead. This check needs only to be
done when invoking the read() function, as we know our _get_read
function is a native coroutine.

What we are *not* yet considering are C-defined python types that have
an tp_as_async.am_await field defined. For the time being we are
assuming those are rare, but it might bite us in the future.

This is the second problem found during the investigation of #32.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
@rtobar
Copy link

rtobar commented Aug 2, 2020

I just pushed the last set of changes to the master branch that fix the second problem (correctly dealing with @type.coroutine awaitable read functions in the C backend), plus a new set of unit tests that ensure this will not creep up again. @MKuranowski please give this a try to confirm that things are working as intended.

A new release will follow probably tomorrow, time permitting.

@rtobar rtobar closed this as completed Aug 2, 2020
@rtobar
Copy link

rtobar commented Aug 4, 2020

A new 3.1.1 release of ijson including these fixes is now available on PyPI.

@MKuranowski
Copy link
Author

Thank you very much, everything's working as intended :)

rtobar added a commit that referenced this issue Nov 24, 2020
Most of python awaitables are native coroutines (async def functions),
but the definition of an awaitable includes also generator functions
decorated with the @types.coroutine function, named generated-based
coroutines.

This small difference has bitten us already in the past (see #32 for
example), and while doing other work I realized we hadn't seen the end
of it. In particular, the automatic detection of the file object given
to the main entry point functions of ijson (basic_parse, parse, kvitems
and items) didn't recognize file objects with a generator-based read
coroutine as asynchronous file objects, and therefore incorrectly routed
them to be parsed using the synchronous family of functions, which of
course fails.

This commit adds detection of file objects with generator-base read()
coroutines on the main entry point functions of all backends, correctly
routing them through the corresponding *_async function.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
rtobar added a commit that referenced this issue Nov 24, 2020
Most of python awaitables are native coroutines (async def functions),
but the definition of an awaitable includes also generator functions
decorated with the @types.coroutine function, named generated-based
coroutines.

This small difference has bitten us already in the past (see #32 for
example), and while doing other work I realized we hadn't seen the end
of it. In particular, the automatic detection of the file object given
to the main entry point functions of ijson (basic_parse, parse, kvitems
and items) didn't recognize file objects with a generator-based read
coroutine as asynchronous file objects, and therefore incorrectly routed
them to be parsed using the synchronous family of functions, which of
course fails.

This commit adds detection of file objects with generator-base read()
coroutines on the main entry point functions of all backends, correctly
routing them through the corresponding *_async function.

Signed-off-by: Rodrigo Tobar <rtobar@icrar.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants