
Parallel execution of multiple unrelated statements written sequentially #78507

Open

Susmit-A opened this issue May 31, 2022 · 1 comment

Labels
enhancement: Not as big of a feature, but technically not a bug. Should be easy to fix
triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


Susmit-A commented May 31, 2022

🚀 The feature, motivation and pitch

Often, I write statements sequentially that do not depend on each other, so the order of their execution does not matter. Such statements could be parallelized for better performance. One example: resizing three different images with nearest, bilinear, and bicubic interpolation.

Old method:

img1 = F.interpolate(img1, mode="nearest", ...)
img2 = F.interpolate(img2, mode="bilinear", ...)
img3 = F.interpolate(img3, mode="bicubic", ...)

New method:

img1, img2, img3 = parallel_api_fn([
    {
        "function": F.interpolate,
        "args": {"input": img1, "mode": "nearest"}
    },
    {
        "function": F.interpolate,
        "args": {"input": img2, "mode": "bilinear"}
    },
    {
        "function": F.interpolate,
        "args": {"input": img3, "mode": "bicubic"}
    }
])

Of course, the implementation mentioned above is just an example. The core idea is to take a list of functions (same or different), and their respective arguments. These functions are then executed in parallel using the new API function.

Alternatives

Currently, this can be implemented using vmap or Python multiprocessing, but each requires substantial additional code and complexity. If every statement to be run in parallel is independent and all arguments are immutable, I believe the statements could easily be run in parallel on the GPU.
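For illustration only, a minimal thread-pool sketch of what a `parallel_api_fn` could look like (hypothetical implementation, not an existing PyTorch API; for GPU workloads, worker threads can overlap kernel launches because CUDA ops release the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_api_fn(calls):
    # Hypothetical sketch of the proposed API: each {"function", "args"}
    # entry runs on its own worker thread; results are returned in
    # submission order.
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(call["function"], **call["args"]) for call in calls]
        return [f.result() for f in futures]

# Plain-Python usage (the GPU case would pass F.interpolate and tensor kwargs):
doubled, summed = parallel_api_fn([
    {"function": lambda x: 2 * x, "args": {"x": 21}},
    {"function": lambda xs: sum(xs), "args": {"xs": [1, 2, 3]}},
])
# doubled == 42, summed == 6
```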

Additional context

No response

cc @ngimel @bdhirsh

@ejguan ejguan added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module module: functionalization used for issues that are specific to functionalization (AOTAutograd bugs should start w aotdispatch) labels May 31, 2022
@ngimel ngimel added enhancement Not as big of a feature, but technically not a bug. Should be easy to fix and removed module: cuda Related to torch.cuda, and CUDA support in general module: functionalization used for issues that are specific to functionalization (AOTAutograd bugs should start w aotdispatch) labels May 31, 2022
@ngimel
Collaborator

ngimel commented May 31, 2022

You can use CUDA streams for that.
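For context, a minimal sketch of what the streams approach could look like for the example above (hypothetical helper; the mode choices mirror the original snippet, and the code falls back to sequential execution when no GPU is available):

```python
import torch
import torch.nn.functional as F

def resize_on_streams(img1, img2, img3, size):
    # Issue each interpolate on its own CUDA stream so the three kernels
    # may overlap on the GPU; run sequentially when CUDA is unavailable.
    if not torch.cuda.is_available():
        return (F.interpolate(img1, size=size, mode="nearest"),
                F.interpolate(img2, size=size, mode="bilinear", align_corners=False),
                F.interpolate(img3, size=size, mode="bicubic", align_corners=False))
    current = torch.cuda.current_stream()
    streams = [torch.cuda.Stream() for _ in range(3)]
    outs = []
    for s, img, mode in zip(streams, (img1, img2, img3),
                            ("nearest", "bilinear", "bicubic")):
        s.wait_stream(current)  # make inputs produced on the default stream visible
        with torch.cuda.stream(s):
            kwargs = {} if mode == "nearest" else {"align_corners": False}
            outs.append(F.interpolate(img, size=size, mode=mode, **kwargs))
    for s in streams:
        current.wait_stream(s)  # rejoin before the results are used
    return tuple(outs)
```

Whether the kernels actually overlap depends on the GPU having spare capacity; for small tensors, launch overhead may dominate.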
