Commit
Update on "[reland][inductor] make thread order consistent with loop order"

This PR relands #106827, which was reverted because it caused a compilation error for some ads models. Yanbo provided a repro in one of the 14k models (`pytest ./generated/test_KaiyangZhou_deep_person_reid.py -k test_044`). This is also the model I used to confirm the fix and come up with a unit test.

In this model, we call `triton_heuristics.triton_config` with size_hints [2048, 2]. Previously this would result in a Triton config with XBLOCK=2048 and YBLOCK=2. But since we changed the mapping between size_hints and the XYZ dimensions, we now generate a Triton config with XBLOCK=2 and YBLOCK=2048. This fails compilation since we set the max YBLOCK to be 1024.

My fix is to make sure we never generate a Triton config that exceeds the maximum block size.

[ghstack-poisoned]
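
The clamping idea behind the fix can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `MAX_BLOCK` and `clamp_block_sizes` are hypothetical names, and only the YBLOCK limit of 1024 comes from the description above; the X and Z limits are assumed for the example.

```python
# Hypothetical per-dimension block limits. Only Y = 1024 is stated in the
# commit message; the X and Z values are assumptions for this sketch.
MAX_BLOCK = {"X": 2048, "Y": 1024, "Z": 1024}


def clamp_block_sizes(blocks):
    """Cap each requested block size at the limit for its dimension.

    `blocks` maps a dimension name ("X", "Y", "Z") to a requested block size.
    Returns a dict keyed like a Triton config ("XBLOCK", "YBLOCK", ...).
    """
    return {
        f"{dim}BLOCK": min(size, MAX_BLOCK[dim])
        for dim, size in blocks.items()
    }


# With the new mapping, size_hints [2048, 2] asks for X=2 and Y=2048.
config = clamp_block_sizes({"X": 2, "Y": 2048})
print(config)
```

With the clamp in place, the requested YBLOCK of 2048 is reduced to the 1024 limit instead of producing a config that fails to compile.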