[Python] Fix duplicate event on drain callback #3119

gtopper · 2024-01-14T06:47:42Z

Also:

* Exit the Python wrapper following a termination signal.
* Await a coroutine if one is returned by the drain or termination callback (relates to nuclio/nuclio-sdk-py#60, "[Platform] Return result of drain and termination callbacks"). This unblocks ML-4421, because it allows drain and termination callbacks to call explicit_ack() without having to start a separate thread. E.g.:

    async def drain_callback():
        for qualified_offset in context.qualified_offsets.values():
            await context.platform.explicit_ack(qualified_offset)

instead of

    def drain_callback():
        for qualified_offset in context.qualified_offsets.values():
            def commit():
                asyncio.run(context.platform.explicit_ack(qualified_offset))
            thread = threading.Thread(target=commit)
            thread.start()
            thread.join()

[NUC-119](https://jira.iguazeng.com/browse/NUC-119)
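
For readers unfamiliar with the pattern, here is a minimal sketch of "await a coroutine if returned by the callback". It is not the wrapper's actual code: `run_callback` is a hypothetical helper name, and `asyncio.sleep(0)` merely stands in for `await context.platform.explicit_ack(...)`. It shows one way to support both plain and `async def` callbacks through a single call site.

```python
import asyncio
import inspect


async def run_callback(callback):
    """Call a user callback and await its result if it is awaitable.

    Hypothetical helper for illustration; not the wrapper's real function.
    """
    if callback is None:
        return None
    result = callback()
    # An `async def` callback returns a coroutine object when called;
    # a plain function returns its value (usually None) directly.
    if inspect.isawaitable(result):
        result = await result
    return result


calls = []


def sync_drain_callback():
    calls.append("sync")


async def async_drain_callback():
    await asyncio.sleep(0)  # stand-in for awaiting explicit_ack()
    calls.append("async")


async def main():
    await run_callback(sync_drain_callback)
    await run_callback(async_drain_callback)


asyncio.run(main())
print(calls)  # ['sync', 'async']
```

Checking the returned value rather than the callback itself means a plain function that happens to return a coroutine is handled as well.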

TomerShor

Looking really good!
I'm not convinced this will fix all of the event duplication issues. As we discussed, other race conditions can still cause duplicates, e.g. the drain has completed but an event is already on its way to the runtime. That case should be handled in a WIP fix by @rokatyy.

Other than that, was this tested somehow?
See our drain/termination tests, and add a test case with an async termination handler.

pkg/processor/runtime/python/py/_nuclio_wrapper.py

gtopper

I'm pretty sure it does fix the race condition, because the callback is only ever called within the finally clause.

I tested it using the repro function attached to NUC-119.

Edit: actually, I was able to simplify the repro function, because it's no longer necessary to commit the offsets from another thread:

    import asyncio

    from nuclio_sdk import QualifiedOffset


    def init_context(context):
        async def drain_callback():
            context.logger.info("Drain callback called")
            # Commit the last offset seen for every (topic, shard) pair.
            for qualified_offset in context.qualified_offsets.values():
                context.logger.info(f"111 Committing offset: worker={context.worker_id}, topic={qualified_offset.topic}, partition={qualified_offset.partition}, offset={qualified_offset.offset}")
                await context.platform.explicit_ack(qualified_offset)

            # Reset the per-worker state once everything has been committed.
            context.qualified_offsets = {}
            context.shard_dict = {}
            context.logger.info("Drain callback done")

        context.qualified_offsets = {}
        context.platform.set_drain_callback(drain_callback)
        context.shard_dict = {}


    async def handler(context, event):
        # Log only the first event received on each shard.
        if event.shard_id not in context.shard_dict:
            context.shard_dict[event.shard_id] = True
            context.logger.info(f"111 First event: worker={context.worker_id}, topic={event.path}, shard={event.shard_id}, offset={event.offset}")
        # Remember the latest offset per (topic, shard) for the drain callback to commit.
        context.qualified_offsets[(event.path, event.shard_id)] = QualifiedOffset(event.path, event.shard_id, event.offset)
        await asyncio.sleep(0.01)
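
The argument that the callback "is only ever called within the finally clause" can be illustrated with a small standalone sketch. Everything here is hypothetical (`Worker`, `request_drain`, `process_event` are illustrative names, not the wrapper's API), and it does not cover a signal that arrives while the worker is idle. It assumes the signal only sets a flag and that the callback runs after the handler has finished, so the drain cannot interleave with a handler suspended at an `await`.

```python
import asyncio


class Worker:
    """Hypothetical stand-in for the wrapper's per-worker state."""

    def __init__(self, drain_callback, log):
        self.drain_callback = drain_callback
        self.drain_requested = False
        self.log = log

    def request_drain(self):
        # The "signal handler" only records the request; it never runs
        # the user callback itself.
        self.drain_requested = True

    async def process_event(self, offset):
        try:
            self.log.append(f"start {offset}")
            await asyncio.sleep(0.01)  # context switch inside the handler
            self.log.append(f"end {offset}")
        finally:
            # The callback runs only here, after the handler is done.
            if self.drain_requested:
                self.drain_requested = False
                await self.drain_callback()


async def main():
    log = []

    async def drain_callback():
        log.append("drain")

    worker = Worker(drain_callback, log)
    task = asyncio.create_task(worker.process_event(7))
    await asyncio.sleep(0)   # give the handler a chance to start
    worker.request_drain()   # the "signal" arrives while the handler is suspended
    await task
    return log


log = asyncio.run(main())
print(log)  # ['start 7', 'end 7', 'drain']
```

In this sketch the drain is always logged after the event's "end" entry, which is the ordering the fix relies on.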

pkg/processor/runtime/python/py/_nuclio_wrapper.py

rokatyy

This PR does fix the "event duplication" issue, because it doesn't allow draining to happen on a context switch. But we definitely still need to cover the other issue, where events are processed after draining.

I'm not sure I understand the purpose of adding a return statement to the handlers. @gtopper, could you please clarify the flow in which it is useful?

gtopper · 2024-01-14T11:45:31Z

@rokatyy, please see the second bullet in the description regarding your question. It allows for the user callback(s) to be coroutines. Without it, the coroutines won't be awaited (in the finally block).
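
A standalone sketch of why the return statement matters. The names `invoke_discarding`, `invoke_returning`, and `wrapper_finally_block` are hypothetical and only mimic the shape of the flow: a function that calls an `async def` callback without returning its result gives the caller nothing to await, so the callback body never runs.

```python
import asyncio
import inspect

acked = []


async def drain_callback():
    await asyncio.sleep(0)  # stand-in for awaiting explicit_ack()
    acked.append("offset-1")


def invoke_discarding(callback):
    # Hypothetical "before" shape: the coroutine object is created and dropped.
    callback()


def invoke_returning(callback):
    # Hypothetical "after" shape: the coroutine object reaches the caller.
    return callback()


async def wrapper_finally_block(invoke):
    result = invoke(drain_callback)
    if inspect.isawaitable(result):
        await result


async def main():
    await wrapper_finally_block(invoke_discarding)
    after_discarding = list(acked)
    await wrapper_finally_block(invoke_returning)
    after_returning = list(acked)
    return after_discarding, after_returning


after_discarding, after_returning = asyncio.run(main())
print(after_discarding, after_returning)  # [] ['offset-1']
```

In the discarding case Python also emits a "coroutine ... was never awaited" RuntimeWarning when the dropped coroutine object is garbage-collected.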

pkg/processor/runtime/python/py/_nuclio_wrapper.py

TomerShor

Last minor comment.

pkg/processor/runtime/python/py/_nuclio_wrapper.py

Co-authored-by: TomerShor <90552140+TomerShor@users.noreply.github.com>

TomerShor

🚀

[Python] Fix duplicate event on drain callback

10ff41f

Also:

* Exit python wrapper following a termination signal.
* Await a coroutine if returned by drain or termination callback (relates to nuclio/nuclio-sdk-py#60).

[NUC-119](https://jira.iguazeng.com/browse/NUC-119)

github-actions bot added the runtime/python label Jan 14, 2024

TomerShor requested changes Jan 14, 2024

pkg/processor/runtime/python/py/_nuclio_wrapper.py Outdated

pkg/processor/runtime/python/py/_nuclio_wrapper.py

pkg/processor/runtime/python/py/_nuclio_wrapper.py

gtopper commented Jan 14, 2024

pkg/processor/runtime/python/py/_nuclio_wrapper.py Outdated

Add comments

224be52

rokatyy reviewed Jan 14, 2024

gtopper requested a review from TomerShor January 14, 2024 11:42

Extract constants

3fbef51

rokatyy reviewed Jan 14, 2024

pkg/processor/runtime/python/py/_nuclio_wrapper.py Outdated

This was referenced Jan 14, 2024

[Processor] Stop processing events on draining and send signal to continue processing #3120

Merged

[Platform] Return result of drain and termination callbacks nuclio/nuclio-sdk-py#60

Merged

Fix

efb679c

gtopper requested a review from rokatyy January 14, 2024 14:06

TomerShor requested changes Jan 14, 2024

pkg/processor/runtime/python/py/_nuclio_wrapper.py Outdated

Change f-string to normal string

c10d592

Co-authored-by: TomerShor <90552140+TomerShor@users.noreply.github.com>

gtopper requested a review from TomerShor January 15, 2024 02:57

rokatyy approved these changes Jan 15, 2024

TomerShor approved these changes Jan 15, 2024

TomerShor merged commit 5100ad3 into nuclio:development Jan 15, 2024
11 checks passed
