Skip to content

Conversation

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166695

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure

As of commit b8cb398 with merge base 4295a9a (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mikaylagawarecki added a commit that referenced this pull request Oct 31, 2025
ghstack-source-id: aecb681
Pull Request resolved: #166695
@github-actions
Copy link
Contributor

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

mikaylagawarecki added a commit that referenced this pull request Oct 31, 2025
ghstack-source-id: a1dab1c
Pull Request resolved: #166695
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2025
ghstack-source-id: 0bc9afd
Pull Request resolved: #166695
mikaylagawarecki added a commit that referenced this pull request Oct 31, 2025
ghstack-source-id: c918b82
Pull Request resolved: #166695
mikaylagawarecki added a commit that referenced this pull request Nov 3, 2025
ghstack-source-id: f439b80
Pull Request resolved: #166695
const int64_t end,
const int64_t grain_size,
const F& f) {
if (end - begin < grain_size) {
Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malfet want to double check this matches what you asked for on chat (I interpreted that you meant end - start < grain_size)

May be the logic we want to preserve in header only if range is trivial (i.e. end-start < 1) we just run this thing sequentially

If I misunderstood and you actually meant we want to preserve this in the header I'll redo this

const bool use_parallel =
(numiter > grain_size && numiter > 1 && !at::in_parallel_region() &&
at::get_num_threads() > 1);
if (!use_parallel) {
internal::ThreadIdGuard tid_guard(0);
c10::ParallelGuard guard(true);
f(begin, end);
return;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is an optimization that could be landed later, and may be not needed here, i.e. often when dev calls at::parallel_for their regions should be reasonably large

const int64_t grain_size,
const F& f) {
if (end - begin < grain_size) {
f(begin, end);
Copy link
Contributor Author

@mikaylagawarecki mikaylagawarecki Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I didn't add this as I wasn't sure if it's actually needed

internal::ThreadIdGuard tid_guard(0);
c10::ParallelGuard guard(true);

Let me know if I should shim and add these here

@mikaylagawarecki mikaylagawarecki marked this pull request as ready for review November 3, 2025 21:45
Copy link
Contributor

@swolchok swolchok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks fine to me except possibly the TODO. leaving for @malfet since it's in response to a comment of his IIUC

Comment on lines +724 to +725
int64_t sizes[] = {size};
int64_t strides[] = {1};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might find that you get internal lint complaints about C arrays because of this. Since you specifically need an array of size 1, you can just not make these arrays:

int64_t stride = 1;

aoti_torch_empty_strided(
  1,
  &size,
  &stride,
  // ...


// Get the current thread index in a parallel region
// Returns 0 if not in a parallel region
AOTI_TORCH_EXPORT int32_t torch_get_thread_idx();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a document on signed vs unsigned types? I.e. what thread_idx == -1 means?
Is there an expectations that it should be in range [0, torch_get_max_threads())?

const int64_t end,
const int64_t grain_size,
const F& f) {
if (end - begin < grain_size) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is an optimization that could be landed later, and may be not needed here, i.e. often when dev calls at::parallel_for their regions should be reasonably large

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants