Skip to content

Conversation

xuhancn
Copy link
Collaborator

@xuhancn xuhancn commented Aug 21, 2024

MSVC don't support dynamic array. Please use std::vector to instead of it.
Ref: https://stackoverflow.com/questions/56555406/creating-dynamic-sized-array-using-msvc-c-compiler

Reproduce UT:

pytest test/inductor/test_cpu_repro.py -v -k test_reduction_with_dynamic_threads

Error msg:

C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): error C2131: expression did not evaluate to a constant
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): note: failure was caused by a read of a variable outside its lifetime
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): note: see usage of 'max_threads'
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(16): error C3863: array type 'float [max_threads]' is not assignable

Genarated code:

#include "C:/Users/Xuhan/AppData/Local/Temp/tmpt6mxcjzi/j2/cj22tgrdgh42wbunl7gdptg2lintcziox2kmr7rdbcc6n2njrhgx.h"
extern "C" __declspec(dllexport) void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       float* out_ptr0,
                       float* out_ptr1)
{
    {
        {
            float tmp_acc0 = 0;
            at::vec::Vectorized<float> tmp_acc0_vec = at::vec::Vectorized<float>(0);
            int max_threads = omp_get_max_threads();
            float tmp_acc0_arr[max_threads];
            for (int tid = 0; tid < max_threads; tid++)
            {
                tmp_acc0_arr[tid] = 0;
            }
            at::vec::Vectorized<float> tmp_acc0_vec_arr[max_threads];
            for (int tid = 0; tid < max_threads; tid++)
            {
                tmp_acc0_vec_arr[tid] = at::vec::Vectorized<float>(0);
            }
            #pragma omp parallel

Fixed by this PR and tested on local machine:
image

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Copy link

pytorch-bot bot commented Aug 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/134140

Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures

As of commit c692068 with merge base 1da3a04 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@xuhancn xuhancn added module: windows Windows support for PyTorch ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category intel This tag is for PR from Intel labels Aug 21, 2024
@xuhancn xuhancn marked this pull request as ready for review August 21, 2024 19:45
@xuhancn xuhancn changed the title [inductor] Fix dynamic array error on MSVC. [inductor] Fix dynamic build array error on MSVC. Aug 21, 2024
@xuhancn xuhancn changed the title [inductor] Fix dynamic build array error on MSVC. [inductor] Fix dynamic array build error on MSVC. Aug 21, 2024
@xuhancn xuhancn marked this pull request as draft August 21, 2024 20:54
@xuhancn
Copy link
Collaborator Author

xuhancn commented Aug 21, 2024

Try to fix it by unique_ptr: #134156

@xuhancn xuhancn changed the title [inductor] Fix dynamic array build error on MSVC. [Don't merge] Fix dynamic array build error on MSVC. Aug 21, 2024
@xuhancn xuhancn closed this Aug 23, 2024
pytorchmergebot pushed a commit that referenced this pull request Aug 23, 2024
MSVC don't support dynamic array.
Ref: https://stackoverflow.com/questions/56555406/creating-dynamic-sized-array-using-msvc-c-compiler

We tried to solutions:
1. use std::vector to instead of it in previous PR: #134140, but it changed variable's type and failed at UTs.
2. Use `std::unique_ptr` to instead of it in PR: #134156, @jansel reviewed and give comments:  #134156 (review). It is make sense, allocation memory maybe make code run slower.
3. Use fixed size array to instead of it in PR: #134210, fixed size is hard to process the situlation, reserved size if small than CPU number.
> a. Use min() function limited is local test failed: #134210 (comment)
> b. Dynamic select fixed size or dynamic array: #134210 (comment) . It makes code too complex to maintains.

Discussed with origin PR(#115620) author @zhuhaozhe, we think:
1. MSVC it the only one compiler, which not support VLA.
2. MSVC it worse performance than other compilers, use `std::unique_ptr` for MSVC and make it works.
3. For other compilers, keep using current `VLA` code.
4. For Windows users, they can use `clang-cl` or `icx` to get better performance than MSVC.
5. Discussed with @jansel , we need to move compiler check to python side, and make output code cleaner.

Reproduce UT:
```cmd
pytest test/inductor/test_cpu_repro.py -v -k test_reduction_with_dynamic_threads
```

Error msg:
```cmd
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): error C2131: expression did not evaluate to a constant
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): note: failure was caused by a read of a variable outside its lifetime
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(13): note: see usage of 'max_threads'
C:/Users/Xuhan/AppData/Local/Temp/tmpncykej5v/a4/ca4534cazplidnf7vopaaxaifqkjiyhxm3h2gsylgztputbaeybx.cpp(16): error C3863: array type 'float [max_threads]' is not assignable
```
Genarated code:
```c++

#include "C:/Users/Xuhan/AppData/Local/Temp/tmpt6mxcjzi/j2/cj22tgrdgh42wbunl7gdptg2lintcziox2kmr7rdbcc6n2njrhgx.h"
extern "C" __declspec(dllexport) void kernel(const float* in_ptr0,
                       const float* in_ptr1,
                       float* out_ptr0,
                       float* out_ptr1)
{
    {
        {
            float tmp_acc0 = 0;
            at::vec::Vectorized<float> tmp_acc0_vec = at::vec::Vectorized<float>(0);
            int max_threads = omp_get_max_threads();
            float tmp_acc0_arr[max_threads];
            for (int tid = 0; tid < max_threads; tid++)
            {
                tmp_acc0_arr[tid] = 0;
            }
            at::vec::Vectorized<float> tmp_acc0_vec_arr[max_threads];
            for (int tid = 0; tid < max_threads; tid++)
            {
                tmp_acc0_vec_arr[tid] = at::vec::Vectorized<float>(0);
            }
            #pragma omp parallel
```

Pull Request resolved: #134221
Approved by: https://github.com/zhuhaozhe, https://github.com/jansel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request intel This tag is for PR from Intel module: inductor module: windows Windows support for PyTorch open source topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants