
The redis cache lock reduces task parallelism and causes unnecessary latency in execution #265

Closed
mski-iksm opened this issue Jan 14, 2022 · 3 comments


@mski-iksm
Contributor

Problem

Task cache locking prevents cache conflicts even when multiple gokart tasks run in parallel. However, if several tasks running in parallel depend on the same task, their read processes may contend for the lock, and parallelism may be compromised.

The figure below shows a case where the problem occurs.
Suppose apps A, B, and C are running on different nodes and each tries to do a read() of the same task. Although the three apps run in parallel, the task lock causes congestion.

[figure: apps A, B, and C serialized by the task lock]
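
To make the problem concrete, here is a minimal sketch (not gokart's actual implementation) of the current behaviour as I understand it, assuming a redis-py lock that is held for the entire read: any two readers of the same task serialize even though neither is writing.

```python
import redis

client = redis.Redis(host="localhost", port=6379)  # hypothetical connection


def load_with_lock(task_lock_key: str, load_fn):
    # The lock is held for the whole _load(), so apps B and C must wait for
    # app A's read to finish even though nobody is writing the cache.
    with client.lock(task_lock_key, timeout=600):
        return load_fn()
```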

How to fix the problem

Use different procedures for acquiring the lock when writing, reading, and deleting the cache, as illustrated in the figures below.

[figure: proposed lock procedures for writing, reading, and deleting the cache]

_remove() is the method that deletes the cache; it is called when the task's rerun parameter is set to True.

As described in cases 4 and 5 below, rerun is prone to problems when combined with the cache lock feature. Therefore, we would like to emit a warning when rerun is set while the Redis cache lock is enabled.
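
A rough sketch of the proposed procedures, assuming redis-py locks; the wrapper and key names are illustrative, not gokart's real API.

```python
import redis

client = redis.Redis(host="localhost", port=6379)  # hypothetical connection


def wrap_dump(task_lock_key: str, exists_fn, dump_fn):
    # Writing: hold the lock for the whole dump, but skip _dump() if the
    # cache already exists (see case 3 below).
    with client.lock(task_lock_key, timeout=600):
        if not exists_fn():
            dump_fn()


def wrap_load(task_lock_key: str, load_fn):
    # Reading: acquire the lock only for a moment to make sure no writer is
    # in progress, release it immediately, then load without holding the lock
    # (see case 2 below).
    lock = client.lock(task_lock_key, timeout=600)
    lock.acquire()
    lock.release()
    return load_fn()


def wrap_remove(task_lock_key: str, remove_fn):
    # Removing (triggered by rerun): hold the lock for the whole removal.
    with client.lock(task_lock_key, timeout=600):
        remove_fn()
```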

Why it works

Case 1: When multiple writing tasks happen at the same time

Suppose apps A and B are both writing tasks.

In such a case, one of the tasks will wait until the other completes its writing process to prevent cache collisions.

[figure: case 1 — two writers serialized by the lock]
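
A hypothetical demo of case 1, reusing wrap_dump from the sketch above and assuming a local Redis is running: the two dumps never overlap because they share one lock key.

```python
import threading
import time


def make_dump(name: str):
    def _dump():
        print(f"{name}: dumping...")
        time.sleep(1)  # pretend the dump is slow
        print(f"{name}: done")
    return _dump


# exists_fn always returns False here so that both apps really try to write.
threads = [
    threading.Thread(target=wrap_dump, args=("lock:TaskX", lambda: False, make_dump("app A"))),
    threading.Thread(target=wrap_dump, args=("lock:TaskX", lambda: False, make_dump("app B"))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```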

Case 2: When a writing task and reading tasks happen

Suppose app A is a writing task and apps B and C are reading tasks. Consider the case where apps B and C try to read the cache at almost the same time while app A is executing a write operation.

In this case, apps B and C will wait for app A to complete writing. This is because, before _load(), there is a step that acquires the lock for a moment and releases it immediately.
Since apps B and C do not hold the lock for the whole time they are reading, but release it immediately, each can start its _load() without waiting for the other's _load() to complete. This is useful when _load() takes a long time.

[figure: case 2 — readers wait for the writer, then load in parallel]
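
A matching hypothetical demo for case 2, reusing wrap_load from the sketch above with a local Redis: both readers touch the lock briefly and then run their slow loads concurrently, so the total time is close to one load rather than two.

```python
import threading
import time


def make_load(name: str):
    def _load():
        print(f"{name}: loading...")
        time.sleep(1)  # pretend the load is slow
        print(f"{name}: done")
    return _load


start = time.time()
threads = [
    threading.Thread(target=wrap_load, args=("lock:TaskX", make_load("app B"))),
    threading.Thread(target=wrap_load, args=("lock:TaskX", make_load("app C"))),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"elapsed: {time.time() - start:.1f}s")  # roughly 1s, since the loads overlap
```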

Case 3: When a writing task happens after a reading task

In contrast to the previous case, consider the situation where a writing app B starts while a reading app A is already running.

In this case, app B acquires the lock while app A's _load() is being executed. However, app B checks whether the cache exists before writing and skips _dump() if the cache already exists. Therefore, app B's writing process can complete without overwriting or deleting the cache that app A is currently reading.

[figure: case 3 — the writer's exist check skips _dump() while the reader loads]

Case 4: When a removing task and a reading task happen

Suppose app A is a removing task and app B is a reading task.

In this case, app B's processing starts after app A's _remove() has completed and the lock is released. However, when app B then tries to read the file, the cache has already been removed, so an error occurs.

This is the first reason why rerun, which triggers _remove(), does not work well with the cache lock.

[figure: case 4 — the reader fails because the cache was already removed]

Case 5: When a removing task happens while another task is reading

In contrast to the previous case, consider the case where app B executes _remove() while app A is executing _load().

In this case, app A's _load() may fail, depending on the file system, because app B removes the cache while the load is in progress.

This is the second reason why rerun, which triggers _remove(), does not work well with the cache lock.

[figure: case 5 — the cache is removed while the reader's _load() is in progress]
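
For cases 4 and 5, one possible shape of the proposed warning (an assumption about how it could look, not actual gokart code): emit it whenever rerun=True is combined with the Redis cache lock.

```python
import warnings


def check_rerun_with_cache_lock(rerun: bool, redis_lock_enabled: bool) -> None:
    # rerun=True deletes the cache via _remove(); combined with the Redis
    # cache lock this can break concurrent readers (cases 4 and 5 above).
    if rerun and redis_lock_enabled:
        warnings.warn(
            "rerun=True together with the Redis cache lock may remove a cache "
            "that another app is about to read or is currently reading."
        )
```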

@mski-iksm
Contributor Author

I'm thinking of mitigating the above problem with the measures described in “How to fix the problem”. What do you think? Please review.

@Hi-king @hirosassa @e-mon @ujiuji1259 @kitagry

@hirosassa
Collaborator

@mski-iksm Thanks for taking the time to write up such an exhaustive set of implementation ideas!
Looks great to me! Please go ahead and create a PR!

@hirosassa
Collaborator

Thanks @mski-iksm
