Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage local storage inherit behavior #84

Closed
AmatanHead opened this issue Oct 10, 2017 · 18 comments
Closed

Manage local storage inherit behavior #84

AmatanHead opened this issue Oct 10, 2017 · 18 comments
Assignees
Labels
bug Describes a bug in the system.
Milestone

Comments

@AmatanHead
Copy link
Contributor

Hello!

Is there a reason to do task.task_local = get_local() or {} instead of just task.task_local = {}? Because now if you spawn a task inside of another task, you get the same local context for both tasks ant this may cause very strange errors.

@AmatanHead
Copy link
Contributor Author

I propose removing this logic (at least I don't see any reason to keep it).

@fantix
Copy link
Member

fantix commented Oct 11, 2017

This was intentional, because I'd like to make context shared in such case:

async def main():
    get_local()['val'] = 123
    await loop.create_task(sub())

async def sub():
    print(get_local().get('val'))  # 123

And yes as you mentioned, this caused side effect that even if the sub task is never awaited, it shares the same context:

async def main():
    get_local()['val'] = 123
    loop.create_task(sub())

async def sub():
    print(get_local().get('val'))  # still 123

There are cases where not all sub calls are plain coroutines but tasks which need the same context (asyncio.wait_for for example), and it is a less hacky way to achieve so at task creation period. I've tried to pass the context to a sub task only when it is joined into current task, with a lot of hacks in Task._step(), which have problems with non-default Task implementations.

So before PEP-550, I suppose we'll need to make decision on whether to share context at task creation, or not to share context between tasks at all.

Also if possible, please share a bit more about the strange errors you met, which might be helpful for the decision.

@fantix fantix added the question A community question, closed when inactive. label Oct 11, 2017
@fantix fantix self-assigned this Oct 11, 2017
@fantix fantix added this to the v0.5.x milestone Oct 11, 2017
@AmatanHead
Copy link
Contributor Author

AmatanHead commented Oct 11, 2017

I always thought of the create_task / ensure_future as a way to spawn a completely detached task (mostly because you can usually replace constructions like await loop.create_task(sub()) with simple await sub()). That's why such inheritance surprised me.

As for my case. In my app, I have a bunch of background processes, each of them creates a transaction and queries a database once per few seconds. Each process is a coroutine with an infinite loop inside it:

async def process():
    while True:
        await do_some_job()
        await asyncio.sleep(5)

All of those tasks start in the init procedure (init is async because I want to keep all initialisation, both sync and async, in the same place):

async def init(loop):
    await setup_database_pool_and_other_stuff()
    ...
    loop.create_task(process())
    ...

The init function called via loop.run_until_complete(init()).

As you can see, all background tasks share the same context. This sometimes causes 'wrong release order' and 'connection closed in the middle of operation' errors.

An easy solution would be to create a function which enables or disables inheritance:

await loop.create_task(inherit_context(sub()))

The implementation might look like this:

class inherit_context:
    def __init__(self, coro=None):
        self._coro = coro
        self._local = get_local()

    async def __await__(self):
        # An example implementation. We should think whether we want to update
        # or to replace local context completely.
        get_local().update(self._local)
        return await self._coro

@fantix
Copy link
Member

fantix commented Oct 11, 2017

I see, it is a critical issue in your case then. Thanks for the suggestion, which inspired me to try something, one sec I'll test and update.

fantix added a commit that referenced this issue Oct 11, 2017
@fantix fantix mentioned this issue Oct 11, 2017
@fantix fantix added bug Describes a bug in the system. and removed question A community question, closed when inactive. labels Oct 11, 2017
@fantix fantix changed the title Why inherit task locals? Spawned tasks shouldn't inherit task local storage from creator task if not joined Oct 11, 2017
@smagafurov
Copy link

Some proposal:

May be you can provide ability to explicitly disable inheritance
But implicitly (by default) we need to keep inheritance
Because we need this to work:

async with db.transaction():
    await gather(sub1(), sub2())

sub1 and sub2 create tasks inside gather and must share transaction by default

if we need separate tasks - we can do it explicitly

async with db.transaction():
    await gather(drop_context(sub1()), drop_context(sub2()))

@AmatanHead
Copy link
Contributor Author

sub1 and sub2 create tasks inside gather and must share transaction by default

This sounds reasonable. I've also realised that cancellation signal would be sent to any subtask that was spawned in the task that's being cancelled and there's a shield function to protect subtasks from cancellation. So this seems to be ok to inherit locals by default.

fantix added a commit that referenced this issue Oct 11, 2017
@fantix
Copy link
Member

fantix commented Oct 11, 2017

@smagafurov That makes sense, thanks for the proposal. Feature added on task object, please see above commit. Should it also accept coroutines/awaitables and create non-inheriting tasks internally? That would require additional parameter "loop".

@fantix
Copy link
Member

fantix commented Oct 11, 2017

Fixed in e8f6d85. Example:

from gino.local import reset_local

async def main():
    await reset_local(sub())
    # or
    await loop.create_task(sub()).with_local_reset()

fantix added a commit that referenced this issue Oct 12, 2017
@fantix
Copy link
Member

fantix commented Oct 12, 2017

Updated, thanks for @AmatanHead on this! Example now:

from gino import get_local(), reset_local, is_local_root

async def main():
    # spawn detached
    reset_local(sub())

    # spawn attached
    await loop.create_task(sub())

    # context is always inherited even without joining
    loop.create_task(sub())

async def sub():
    # you can dynamically reset local to empty dict
    if is_local_root():
        reset_local()

    # but it works only once when current local is inherited
    get_local()['val'] = 123
    reset_local()  # this won't clear val=123

@fantix fantix changed the title Spawned tasks shouldn't inherit task local storage from creator task if not joined Manage local storage inherit behavior Oct 12, 2017
@smagafurov
Copy link

May be we should change public API terminology from local context to transaction
reset_local -> separate_transaction
Correct semantic for user is "I want separate transaction for this task" but not "I want separate local storage for this task"
User should not know about local storage inside at all

When pep 550 released, we change our inner realization of separate_transaction to execution context usage

But semantic "I want separate transaction for this task" is not changed

@fantix
Copy link
Member

fantix commented Oct 12, 2017

Ah right, makes perfect sense!

@smagafurov
Copy link

smagafurov commented Oct 12, 2017

IMHO separate_task is good name for this

separate_transaction may assume that we start separate transaction for this task but we don't

@fantix
Copy link
Member

fantix commented Oct 13, 2017

True, db.transaction(reuse=False) does exactly what "separate transaction" indicates.

separate_task stands in somewhere between the two - it is not about transactions, not about local storages. I'm not quite a native English speaker, to "separate task" is a bit confusing to me - what is actually separated, when we could still do await separate_task(task).

Though, I believe your attempt was on the right direction (much appreciated!), which inspired me to check how local storage is used and what this function really affects. The local storage is the main part behind the scene of db.acquire(reuse=True) or db.transaction(reuse=True), which is, to reuse current connection if there is one in the context. By resetting the local storage, we simply created a new context and thus prevented previous connection from being reused in the given coroutine. Therefore I propose replacing reset_local with forget_connection - meaning previous connection is simply forgot in the given task. Works both as transparent task wrapper, and inline action without parameter:

from gino import forget_connection, is_root_connection

async def main():
    res = await forget_connection(sub())  # make sub forget current connection if any
    loop.create_task(sub_inline())

async def sub_inline():
    forget_connection()  # try to forget inherited connection
    new_conn = await db.acquire()  # then we get brand new connection, even with reuse=True
    print(new_conn.is_root)  # True

Meanwhile, I'll deprecate enable_task_local, disable_task_local and get_local on gino level, and add enable_connection_reuse and disable_connection_reuse in replace, while keeping methods as they are on gino.local level in case task local as a separate module might be useful. Also I'll expose LazyConnection._root is self as is_root as shown in above example. Please feel free to raise better ideas or it will be so.

@AmatanHead
Copy link
Contributor Author

Therefore I propose replacing reset_local with forget_connection

I disagree. Task local storage can be used for more than just storing a connection stack. For example, I store logging information, sentry context, a reference to the current request there. By replacing reset_local with forget_connection you'll force users to implement their own task local storages. They'll than have to use their own methods to forget theirs local storage and they'll have to be compatible with gino's implementation. There also be issues with double-patching task factory and so on.

User should not know about local storage inside at all.

Why shouldn't they? I mean, you have to understand the tool you are using. And this is critically important when talking about local storage.

An example: will transaction be inherited if we use loop.call_soon(f)? The answer is no, because f will be called outside of any task. On the other hand, when using loop.create_task(f()), transaction will be inherited.

We'll do a disservice to our users by hiding these implementation details.

When pep 550 released, we change our inner realization of separate_transaction to execution context usage

When pep-550 released, we indeed can hide our implementation, because pep-550 proposes a uniform way to manage local contexts and (ideally) every python programmer will know how it works and how to avoid problems described above.

fantix added a commit that referenced this issue Oct 15, 2017
fantix added a commit that referenced this issue Oct 15, 2017
@fantix
Copy link
Member

fantix commented Oct 15, 2017

I'm kinda with @AmatanHead on this. Please see commit 820990c - gino/local.py itself is still a self-contained module which is relatively independent to gino itself. And how GINO "renames" the locals API looks a bit embarrassing - it is only a renaming and does nothing else, that's why this commit is not included in PR #85. I think perhaps it's better to move local.py into a new project completely, and make it an optional dependency in GINO. I dislike hundreds of dependencies too (manually at@ NPM here), but maybe this one is worth it. After PEP-550 released, users could simply stop using local.py while everything is still the same because GINO should then internally switch to use PEP-550 with a higher priority. (Of course local.py should also adapt PEP-550 to its API so that it is also okay not to drop local.py immediately after PEP-550).

@fantix
Copy link
Member

fantix commented Oct 15, 2017

Thanks everyone for your precious input! It had been off topic a bit, so I simply merged #85 as it is, closed this issue and created #89. Please kindly let me know if you have further ideas about the renaming or creating new package there. Much appreciated!

@Norfolks
Copy link

Just for people who came here with such error and want to make independent context for create_task. I took some time for me, before I got this.

event_loop.call_soon(event_loop.create_task, coroutine,
                             context=contextvars.Context())

@fantix
Copy link
Member

fantix commented Oct 23, 2019

@Norfolks thanks for the info! See also here for Python < 3.7: #572 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Describes a bug in the system.
Projects
None yet
Development

No branches or pull requests

4 participants