Illegal instruction (core dumped) : PyTorch 2.0 on Raspberry Pi 4.0 8gb #97226

Closed
P-Blackburn opened this issue Mar 21, 2023 · 23 comments
Labels
module: arm (Related to ARM architectures builds of PyTorch. Includes Apple M1)
module: binaries (Anything related to official binaries that we release to users)
triage review
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Comments

@P-Blackburn

P-Blackburn commented Mar 21, 2023

🐛 Describe the bug

Fresh install on a Raspberry Pi 4 (8 GB)

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
python3 -c "import torch;print(torch.__version__)"

results in:
Illegal instruction and a core dump (on Ubuntu); it also fails on the basic 64-bit Debian-based Raspberry Pi OS.

HOWEVER:

Ignoring the installation instructions on https://pytorch.org/, and instead installing with:

pip3 install https://download.pytorch.org/whl/torch-2.0.0-cp310-cp310-manylinux2014_aarch64.whl

results in a version of torch 2.0.0 that works on the Raspberry Pi 4

Versions

executing:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

also results in Illegal instruction

Further, uninstalling torch 2.0.0 and attempting to install 1.13.1 per the instructions on https://pytorch.org/get-started/previous-versions/, i.e.

pip install torch==1.13.1+cpu torchvision==0.14.1+cpu torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu

results in

ERROR: Could not find a version that satisfies the requirement torch==1.13.1+cpu
ERROR: No matching distribution found for torch==1.13.1+cpu

that said an install of the form:

pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cpu

works a treat and

python3 -c "import torch;print(torch.__version__);print(torch.rand(3))"

returns what is expected:

1.13.1
tensor([0.1033, 0.6867, 0.4403])

python collect_env.py against the 1.13.1 works and returns:

Collecting environment information...
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (aarch64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.9.2 (default, Feb 28 2021, 17:03:44) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-6.1.19-v8+-aarch64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 3
Model name: Cortex-A72
Stepping: r0p3
CPU max MHz: 1800.0000
CPU min MHz: 600.0000
BogoMIPS: 108.00
L1d cache: 128 KiB
L1i cache: 192 KiB
L2 cache: 1 MiB
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Vulnerable
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm crc32 cpuid

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[conda] Could not collect

cc @ezyang @seemethere @malfet

@janeyx99 janeyx99 added module: build Build system issues triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Mar 21, 2023
@malfet
Contributor

malfet commented Mar 22, 2023

@P-Blackburn can you please attach gdb to the process and print the instruction that causes an issue?
Also, can you try installing torch==1.13.0 and see whether the illegal instruction issue reproduces there? (I suspect it's the ACL/MKLDNN integration.)

@malfet malfet added module: binaries Anything related to official binaries that we release to users module: arm Related to ARM architectures builds of PyTorch. Includes Apple M1 and removed module: build Build system issues labels Mar 22, 2023
@P-Blackburn
Author

Hi yes torch==1.13.0 works, also as I said in the main post pip3 install https://download.pytorch.org/whl/torch-2.0.0-cp310-cp310-manylinux2014_aarch64.whl 'works' with a version of PyTorch 2.0.0.

What doesn't work is pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu which essentially is the equivalent of pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/torch-2.0.0-1-cp310-cp310-manylinux2014_aarch64.whl
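The two wheels mentioned above differ only in the optional build tag of the wheel filename (`torch-2.0.0-...` versus `torch-2.0.0-1-...`). A minimal sketch of how to tell them apart, assuming the standard wheel filename layout `name-version[-build]-python-abi-platform.whl`; the helper name `split_wheel_filename` is hypothetical and is not part of pip or PyTorch:

```python
from typing import Dict


def split_wheel_filename(filename: str) -> Dict[str, str]:
    """Split a wheel filename into its dash-separated components.

    Hypothetical helper for illustration only; it assumes the layout
    name-version[-build]-python-abi-platform.whl.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        name, version, python_tag, abi_tag, platform_tag = parts
        build = ""
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

Applied to the two filenames from this thread, both report version `2.0.0`, but only the second carries build tag `1`. As far as I understand, pip treats the build tag as a tie-breaker between otherwise identical wheels, which would explain why the default install picks the `-1` wheel.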

I am not very familiar with gdb, so please bear with me and direct me if I need to do something differently....
I installed gdb with sudo apt install gdb and installed pygdbmi with sudo pip install pygdbmi

I then created a test.py of:

import torch
print(torch.__version__)

having then installed PyTorch with: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

I get the following output:

pi@TRACY:~$ vi test.py
pi@TRACY:~$ python3 test.py
Illegal instruction (core dumped)
pi@TRACY:~$ gdb python3
GNU gdb (Ubuntu 12.1-3ubuntu2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(No debugging symbols found in python3)
(gdb) run test.py
Starting program: /usr/bin/python3 test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".

Program received signal SIGILL, Illegal instruction.
0x0000ffffed488930 in std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .part.0] () from /usr/local/lib/python3.10/dist-packages/torch/lib/../../torch.libs/libarm_compute-d27e629c.so

I removed PyTorch and installed torch==1.13.0 and also put that through gdb and the output was:

pi@TRACY:~$ python3 test.py
1.13.0
pi@TRACY:~$ gdb python3
GNU gdb (Ubuntu 12.1-3ubuntu2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(No debugging symbols found in python3)
(gdb) run test.py
Starting program: /usr/bin/python3 test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0xffffeb34f1a0 (LWP 12077)]
[New Thread 0xffffeab3f1a0 (LWP 12078)]
[New Thread 0xffffe632f1a0 (LWP 12079)]
1.13.0
[Thread 0xffffe632f1a0 (LWP 12079) exited]
[Thread 0xffffeab3f1a0 (LWP 12078) exited]
[Thread 0xffffeb34f1a0 (LWP 12077) exited]
[Inferior 1 (process 12075) exited normally]
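The comparison above (crash with the default 2.0.0 install, clean exit with 1.13.0) can also be checked without gdb. The sketch below runs a command in a child process so that a SIGILL does not take down the calling interpreter; the function name `run_and_describe` is hypothetical, and the signal decoding assumes a POSIX system:

```python
import signal
import subprocess
import sys
from typing import List


def run_and_describe(argv: List[str]) -> str:
    """Run a command and report how it ended (hypothetical helper)."""
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"
```

For example, `run_and_describe([sys.executable, "-c", "import torch"])` should report a kill by SIGILL on an affected install and a normal exit on a working one.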

@malfet
Contributor

malfet commented Mar 22, 2023

@P-Blackburn thank you very much for detailed repro (alas, my Raspberry Pi is still running 32-bit linux, guess it's time to upgrade)
This is most likely a regression caused by pytorch/builder#1291 and the fact that ACL integration was enabled, see #96983.
@snadampal is it really necessary to compile ACL with arch=armv8.2-a?

@malfet malfet added this to the 2.0.1 milestone Mar 22, 2023
@snadampal
Collaborator

Hi @malfet, I'm surprised it's working on PT 1.13.0 but not on 2.0.0.1, because we enabled the MKLDNN+ACL integration in 1.13.0 itself, and ACL was built with arch=armv8.2-a even for 1.13.0. Of course we have upgraded the ACL version this time. Let me check if armv8a is sufficient.

@P-Blackburn
Author

@malfet @snadampal I'm around for the next 12 hours so if you would like me to run any other tests, or provide you with ssh access to the PI4 - ping me. :)

@snadampal
Collaborator

Hi @P-Blackburn , just to be double sure your 1.13.0 has mkldnn+acl, could you please run this on 1.13.0 installation and share the output.
$ python -c "import torch; print(torch.__version__, torch.backends.mkldnn.is_available())"

Is the illegal instruction coming from any particular gemm kernel? I'm trying to understand whether the issue is specific to PT2.0 ACL version or present even on PT1.13 ACL.

@P-Blackburn
Author

Hi @snadampal - I installed and tested 1.13.0 and 1.13.1; one was True, the other was False.

torch==1.13.0 True

peter@TRACY:~$ sudo pip install torch==1.13.0
Collecting torch==1.13.0
  Downloading torch-1.13.0-cp310-cp310-manylinux2014_aarch64.whl (73.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 73.2/73.2 MB 1.2 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==1.13.0) (4.5.0)
Installing collected packages: torch
Successfully installed torch-1.13.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
peter@TRACY:~$ python3 -c "import torch; print(torch.__version__, torch.backends.mkldnn.is_available())"
1.13.0 True
peter@TRACY:~$

torch==1.13.1 False

peter@TRACY:~$ sudo pip install torch==1.13.1
Collecting torch==1.13.1
  Using cached torch-1.13.1-cp310-cp310-manylinux2014_aarch64.whl (60.5 MB)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (4.5.0)
Installing collected packages: torch
Successfully installed torch-1.13.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
peter@TRACY:~$ python -c "import torch; print(torch.__version__, torch.backends.mkldnn.is_available())"
Command 'python' not found, did you mean:
  command 'python3' from deb python3
  command 'python' from deb python-is-python3
peter@TRACY:~$ python3 -c "import torch; print(torch.__version__, torch.backends.mkldnn.is_available())"
1.13.1 False
peter@TRACY:~$

@P-Blackburn
Author

P-Blackburn commented Mar 22, 2023

Hi @snadampal, in answer to the part of your question "Is the illegal instruction coming from any particular gemm kernel": please forgive me if I have not understood the question properly. However, I believe that simply running "import torch" when the installed version is torch-2.0.0-1-cp310-cp310-manylinux2014_aarch64.whl (the version installed by following the instructions on the PyTorch.org installation page, i.e. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu) immediately causes the illegal instruction and core dump:

peter@TRACY:~$ sudo pip install torch
Collecting torch
  Using cached torch-2.0.0-1-cp310-cp310-manylinux2014_aarch64.whl (74.3 MB)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.11.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch) (4.5.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.10.0)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (3.0)
Requirement already satisfied: jinja2 in /usr/lib/python3/dist-packages (from torch) (3.0.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch) (1.3.0)
Installing collected packages: torch
Successfully installed torch-2.0.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
peter@TRACY:~$ python3 -c "import torch; print(torch.__version__, torch.backends.mkldnn.is_available())"
Illegal instruction (core dumped)
peter@TRACY:~$

@malfet
Contributor

malfet commented Mar 22, 2023

@P-Blackburn it would be really nice if you could give me temporary access to your Raspberry Pi (I'm using mine for home automation, so I wouldn't even try to upgrade it until the weekend). Would you mind adding https://github.com/malfet.keys for the next 48 hours and sending me an IP address? (You can invite me to collaborate on a private repo, or send me an email with it.)

@malfet
Contributor

malfet commented Mar 22, 2023

Is the illegal instruction coming from any particular gemm kernel? I'm trying to understand whether the issue is specific to PT2.0 ACL version or present even on PT1.13 ACL.

@snadampal No, it is not, as visible from the backtrace posted in #97226 (comment) :

Program received signal SIGILL, Illegal instruction.
0x0000ffffed488930 in std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone .part.0] () from /usr/local/lib/python3.10/dist-packages/torch/lib/../../torch.libs/libarm_compute-d27e629c.so

And the instruction it crashes at is the following:

(gdb) disassemble 
Dump of assembler code for function _ZNSs4_Rep10_M_disposeERKSaIcE.part.0:
   0x0000ffffed3a8904 <+0>:	adrp	x2, 0xffffed7ec000
   0x0000ffffed3a8908 <+4>:	ldr	x2, [x2, #1336]
   0x0000ffffed3a890c <+8>:	cbnz	x2, 0xffffed3a8928 <_ZNSs4_Rep10_M_disposeERKSaIcE.part.0+36>
   0x0000ffffed3a8910 <+12>:	ldr	w2, [x0, #16]
   0x0000ffffed3a8914 <+16>:	sub	w3, w2, #0x1
   0x0000ffffed3a8918 <+20>:	str	w3, [x0, #16]
   0x0000ffffed3a891c <+24>:	cmp	w2, #0x0
   0x0000ffffed3a8920 <+28>:	b.le	0xffffed3a8938 <_ZNSs4_Rep10_M_disposeERKSaIcE.part.0+52>
   0x0000ffffed3a8924 <+32>:	ret
   0x0000ffffed3a8928 <+36>:	add	x3, x0, #0x10
   0x0000ffffed3a892c <+40>:	mov	w2, #0xffffffff            	// #-1
=> 0x0000ffffed3a8930 <+44>:	ldaddal	w2, w2, [x3]
   0x0000ffffed3a8934 <+48>:	b	0xffffed3a891c <_ZNSs4_Rep10_M_disposeERKSaIcE.part.0+24>
   0x0000ffffed3a8938 <+52>:	b	0xffffed0ce090 <_ZNSs4_Rep10_M_destroyERKSaIcE@plt>

I.e. it's likely coming from a global constructor, and that's exactly the reason why PyTorch for x86 is not compiled with AVX512 extensions by default. Only a part of it is, and that part is prohibited from having any global constructors, since those would be executed irrespective of the arch guards.

@pytorch pytorch deleted a comment from P-Blackburn Mar 22, 2023
@malfet
Contributor

malfet commented Mar 22, 2023

@P-Blackburn thank you very much, I'm in

@malfet
Contributor

malfet commented Mar 23, 2023

Downgrading ACL to 22.05 moves the crash from a global constructor in the library to the ideep registration:

#0  0x0000ffffed0ca824 in arm_compute::cpuinfo::num_threads_hint() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/../../torch.libs/libarm_compute-0268506b.so
#1  0x0000ffffed113498 in arm_compute::IScheduler::IScheduler() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/../../torch.libs/libarm_compute-0268506b.so
#2  0x0000ffffed116c8c in arm_compute::Scheduler::get() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/../../torch.libs/libarm_compute-0268506b.so
#3  0x0000fffff36c950c in std::call_once<dnnl::impl::cpu::aarch64::acl_thread_utils::acl_thread_bind()::{lambda()#1}>(std::once_flag&, dnnl::impl::cpu::aarch64::acl_thread_utils::acl_thread_bind()::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#4  0x0000fffff7d54508 in __pthread_once_slow (once_control=0xfffff5a64050 <dnnl::impl::cpu::aarch64::acl_thread_utils::acl_thread_bind()::flag_once>, 
    init_routine=0xffffedbc0d90 <std::__once_proxy()>) at ./nptl/pthread_once.c:116
#5  0x0000fffff36c95bc in dnnl::impl::cpu::aarch64::acl_thread_utils::acl_thread_bind() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#6  0x0000fffff2fe79ec in dnnl_engine_create () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#7  0x0000ffffef3a16d4 in ideep::engine::cpu_engine() () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
#8  0x0000ffffeea0de0c in _GLOBAL__sub_I_IDeepRegistration.cpp () from /home/nikita/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so

which is the same ldaddal instruction:

 x/i $pc
=> 0xffffed0ca824 <_ZN11arm_compute7cpuinfo16num_threads_hintEv+2192>:	ldaddal	w2, w2, [x1]
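Both traces stop at `ldaddal`, which belongs to the Large System Extensions (LSE) atomics introduced with Armv8.1-A, while the Cortex-A72 implements Armv8.0-A. A rough sketch for spotting such instructions in disassembly text (for example the output of gdb's `disassemble` or of `objdump -d`) follows. The function name `find_lse_instructions` and the mnemonic pattern are my own assumptions, and the pattern is deliberately loose rather than an exhaustive or exact list of LSE encodings:

```python
import re
from typing import List

# Loose pattern for LSE atomic mnemonics (ldadd*, ldclr*, swp*, cas*, ...).
# This is an illustrative assumption, not an authoritative instruction list.
LSE_MNEMONIC = re.compile(
    r"^(?:(?:ld|st)(?:add|clr|eor|set|smax|smin|umax|umin)|swp|casp?)"
    r"(?:al|a|l)?[bh]?$"
)


def find_lse_instructions(disassembly: str) -> List[str]:
    """Return the LSE-looking mnemonics found in disassembly text, in order."""
    hits = []
    for line in disassembly.splitlines():
        # Check each whitespace-separated token; addresses, offsets and
        # operands never match the pattern, so only the mnemonic can hit.
        for token in line.split():
            if LSE_MNEMONIC.match(token):
                hits.append(token)
                break
    return hits
```

Run over the disassembly of a shared library such as `libarm_compute`, a non-empty result would suggest that the binary contains instructions an Armv8.0-A core cannot execute, although whether they are reachable still depends on runtime dispatch.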

@nSircombe
Contributor

Hi @malfet
I'm able to reproduce the illegal instruction running ACL's validation suite on Cortex-A72. I've not gone back to check earlier releases. It looks like an issue with the 'multi-isa' build; we'll have to take a closer look.

@P-Blackburn
Author

Hi @malfet and @nSircombe - I have also tested on the Raspberry Pi 3 and found that the same illegal instruction and core dump also affects Cortex-A53.

@nSircombe
Contributor

I did a bit more digging on ACL's multi_isa support - there are problems for <v8.2. Irrespective of what arch is set in the scons build, a multi_isa build only has runtime selection of features since v8.2 - so there are 8.0 and 8.1 instructions (like ldaddal, which I think came with v8.1) present in the multi-isa build.
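Whether a given board can execute these v8.1 atomics can be read from the CPU feature flags: on aarch64 Linux the kernel reports LSE support as the `atomics` flag, which is absent from the Cortex-A72 flag list quoted earlier in this thread (`fp asimd evtstrm crc32 cpuid`). A small sketch, assuming the text comes from `/proc/cpuinfo` (a `Features` line) or from `lscpu` (a `Flags` line); the function names are hypothetical:

```python
from typing import Set


def cpu_feature_flags(cpuinfo_text: str) -> Set[str]:
    """Collect CPU feature flags from /proc/cpuinfo or lscpu style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # /proc/cpuinfo on aarch64 uses "Features", lscpu uses "Flags".
        if sep and key.strip().lower() in ("features", "flags"):
            flags.update(value.split())
    return flags


def supports_lse_atomics(cpuinfo_text: str) -> bool:
    """True if the 'atomics' (LSE) flag is reported by the kernel."""
    return "atomics" in cpu_feature_flags(cpuinfo_text)
```

On a Linux machine this can be called as `supports_lse_atomics(open("/proc/cpuinfo").read())`; a `False` result on a Raspberry Pi 4 would be consistent with the SIGILL on `ldaddal`.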

@malfet
Contributor

malfet commented Mar 23, 2023

@nSircombe, at the very least, there is no global constructor issue if one uses v22.05 instead of v22.11.

@malfet
Contributor

malfet commented Mar 27, 2023

Small update: the same crash is observable on AWS A1 instances
Also, I wonder whether the crash is reproducible on M1 machines running aarch64 docker

@malfet malfet added triage review and removed triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Mar 27, 2023
@drisspg drisspg added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Mar 27, 2023
@snadampal
Collaborator

snadampal commented Mar 29, 2023

I have raised this PR (pytorch/builder#1370) to address this issue.
I have tested torchbench resnet50 and bert benchmarks on A72 and observed no crashes even with gemm execution.

@35grain

35grain commented Apr 1, 2023

Can confirm this is happening with the latest torch 2.0.0 version when importing the package on Raspberry Pi 4 4GB Ubuntu.

@atalman
Contributor

atalman commented Apr 4, 2023

cc @snadampal . Thank you for cherry-picks. Please make sure to have all cherry-picks ready for April 14, 5PM PST.

This issue is in the milestones : https://github.com/pytorch/pytorch/milestone/36?closed=1, if you want to see your fix included in this minor release. Please post it as a cherry-pick into the [v2.0.1] Release Tracker.

The deadline is April 14, 5PM PST.

Only issues that have ‘cherry-picks’ will be considered for the release.

Common FAQs:

Q1: Where can I find more information on the release process and terminology?

A: pytorch/RELEASE.md at master · pytorch/pytorch · GitHub

Q2: Am I guaranteed to be included in the cherry-pick if I do above?

A: No, it is not guaranteed, the Release Team will review all submissions against the listed criteria before making the final decision on what to include on 4/17.

Q3: When is 2.1 going to be released?

A: We do not have a formal date at this time but will update the community when we do. Our immediate focus is 2.0.1. Note that 1.12 was released on 6/28/22, 1.13 on 10/28/22 and 2.0 on 3/15/23.

Q4: I missed the 4/14 5PM PST deadline, is there any option to have an extension?

A: No, in order to meet our 4/28 goal, we must hold 4/14 as our deadline and will not accept any requests after the fact. We are over communicating the timelines and process with the community to avoid such issues.

Q5: Where should I double check to see if my issue is in the cherry pick tracker?

A: [v2.0.1] Release Tracker · Issue #97272 · pytorch/pytorch · GitHub

Q6: Where can I find the Release Compatibility Matrix for PyTorch?

A: pytorch/RELEASE.md at master · pytorch/pytorch · GitHub

Please contact OSS Releng team members if you have any questions/comments. Again we appreciate everyone’s time and commitment to the community, PyTorch and the 2.0 and 2.0.1 releases!

Please refer to this post for more details: https://dev-discuss.pytorch.org/t/pytorch-release-2-0-1-important-information/1176

@snadampal
Collaborator

thanks, @atalman . I have posted this cherrypick request in the PyTorch 2.0.1 release tracker

@malfet
Contributor

malfet commented May 3, 2023

Tested that 2.0.1 release candidate binaries (available at https://download.pytorch.org/whl/test/cpu ) no longer exhibit the problem on AWS A1 instances:

$ pip3 install torch --index-url https://download.pytorch.org/whl/test/cpu
$ python3 -mtorch.utils.collect_env
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (aarch64)
GCC version: (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35

Python version: 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-1031-aws-aarch64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Vendor ID:                       ARM
Model name:                      Cortex-A72
Model:                           3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       2
Stepping:                        r0p3
BogoMIPS:                        166.66
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
L1d cache:                       256 KiB (8 instances)
L1i cache:                       384 KiB (8 instances)
L2 cache:                        4 MiB (2 instances)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Branch predictor hardening, BHB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] torch==2.0.1
[conda] Could not collect
$ python3 -c "import torch;x=torch.rand(3, 3);print(torch.linalg.svd(torch.mm(x,x.t())))"
<string>:1: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /root/pytorch/torch/csrc/utils/tensor_numpy.cpp:84.)
torch.return_types.linalg_svd(
U=tensor([[-0.4932, -0.0094, -0.8699],
        [-0.8040, -0.3770,  0.4599],
        [-0.3322,  0.9262,  0.1783]]),
S=tensor([1.9125e+00, 6.0053e-01, 1.5409e-03]),
Vh=tensor([[-0.4932, -0.8040, -0.3322],
        [-0.0094, -0.3770,  0.9262],
        [-0.8699,  0.4599,  0.1783]]))

@aseok

aseok commented Jun 7, 2023

Facing same issue at 'import torch' with pytorch 2.0.1 rocm 5.4.2 on a pentium g4400 (no avx)

malfet pushed a commit to pytorch/builder that referenced this issue Sep 28, 2023
atalman added a commit to atalman/builder that referenced this issue Oct 2, 2023
atalman added a commit to pytorch/builder that referenced this issue Oct 2, 2023
…Pie (#1562)

* [aarch64] set acl_build_flags arch=armv8a, remove editing build flags (#1550)

Looking at this PR:
#1370
this line:
https://github.com/pytorch/builder/pull/1370/files#diff-54480d0a69ca27f54fb0736a9762caa8b03bd4736dcd77190d99ec3033c9bd2fR229

That fixed the issue:
pytorch/pytorch#97226

One of the changes is to set 
```
arch=armv8a
```
We are experiencing the same issue now: pytorch/pytorch#109312
Hence this fix.

* [aarch64] patch mkl-dnn to use 'march=armv8-a' as the default build (#1554)

* [aarch64] patch pytorch 2.1 for mkl-dnn fix (#1555)

* patch ci script with mkldnn fix (#1556)

* Fix path issue when building aarch64 wheels (#1560)

---------

Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
jithunnair-amd added a commit to ROCm/builder that referenced this issue Feb 22, 2024
* Set FORCE_RPATH for ROCm (pytorch#1468)

* Decouple aarch64 ci setup and build (pytorch#1470)

* Run  git update-index --chmod=+x aarch64_ci_setup.sh (pytorch#1471)

* [aarch64][CICD]Add aarch64 docker image build. (pytorch#1472)

* Add aarch64 docker image build

* removing ulimit for PT workflow

* set aarch64 worker for docker build

* Fix `install_conda.sh`

By pinning conda version to 23.5.2 as latest(23.7.2 at this time) does not have a compatible version of `git` packages

Fixes pytorch#1473

* Remove explicit `conda install cmake`

As it's already done as part of `common/install_conda.sh` script

* update to CUDA 12.1U1 (pytorch#1476)

Should fix  pytorch/pytorch#94772 in wheel builds

* Use conda version 23.5.2 for conda pytorch build (pytorch#1477)

* Use py311 miniconda install (pytorch#1479)

* Windows conda build fix (pytorch#1480)

* Revert "Use py311 miniconda install (pytorch#1479)" (pytorch#1481)

This reverts commit 5585c05.

* Remove c/cb folder on windows (pytorch#1482)

* Add numpy install - fix windows smoke tests (pytorch#1483)

* Add numpy install

* Add numpy install

* Add hostedtoolcache purge step (pytorch#1484)

* Add hostedtoolcache purge step

* Change step name

* Update CUDA_UPGRADE_GUIDE.MD

* update CUDA to 12.1U1 for Windows (pytorch#1485)

* Small improvements in build pytorch script (pytorch#1486)

* Undo using conda activate (pytorch#1487)

* Update meta.yaml (pytorch#1389)

* Add pytorch-triton-rocm as an install dependency for ROCm (pytorch#1463)

* Add pytorch-triton-rocm as an install dependency for ROCm

* Update build_rocm.sh

* Add aarch64 to validation framework (pytorch#1474)

* Add aarch64 to validation framework (pytorch#1489)

* Add aarch64 to validation framework (pytorch#1490)

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Add aarch64 to validation framework (pytorch#1491)

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Add aarch64 to validation framework

* Temporary disable poetry test (pytorch#1492)

* Add torchonly option to validation workflows (pytorch#1494)

* Add torchonly option to validation workflows

* fix typo

* Remove pipy validation temporarily (pytorch#1495)

* Remove pipy validation temporarily (pytorch#1496)

* Add no-sudo to linux-aarch64 tests (pytorch#1499)

* Pass container image to aarch64 test jobs (pytorch#1500)

* Add setup aarch64 builds for aarch64 testing (pytorch#1501)

* Fix DESIRED_PYTHON setting for aarch64 validations (pytorch#1502)

* Use extra-index-url for aarch64 builds (pytorch#1503)

* Pypi validation enable (pytorch#1504)

* Validation pypi torchonly (pytorch#1505)

* Pipy validation workflow (pytorch#1506)

* Pipy validation workflow (pytorch#1507)

* Pipy validation workflow (pytorch#1508)

* Pipy validation workflow (pytorch#1509)

* Validate poetry workflow (pytorch#1511)

* Validate poetry workflow (pytorch#1512)

* Remove linux-aarch64 installation workaround (pytorch#1513)

* Temporary change test aarch64 builds (pytorch#1514)

* Remove torchonly restictions from aarch64 builds (pytorch#1517)

* Fix aarch64 nightly/release version override (pytorch#1518)

* Aarch64 fix overrdie passing from CI to build

* Aarch64 fix overrdie passing from CI to build

* Aarch64 fix overrdie passing from CI to build

* Revert "Temporary change test aarch64 builds (pytorch#1514)" (pytorch#1521)

This reverts commit 1e281be.

* Changes related to OVERRIDE_PACKAGE_VERSION in aarch64 builds (pytorch#1520) (pytorch#1523)

* Torchmetrics in S3 Index (pytorch#1522)

We will need the stable torchmetrics wheel in the S3 index, since torchrec depends on it. This is similar to how pytorch depends on numpy, etc. and these binaries need to be hosted in our index when users try to pip install from download.pytorch.org.

* [aarch64] update ACL version to v23.05.1 and OpenBLAS to v0.3.20 (pytorch#1488)

* Changed runner for linux arm64 (pytorch#1525)

* Add torch-tensorrt to S3 PyPI Index (pytorch#1529)

As pytorch/tensorrt moves off of CCI onto Nova, we must host their nightlies on our S3 index. This change allows the indexing to occur correctly for this package.

* Enable torch compile for python 3.11 smoke tests (pytorch#1534)

* Enable torch compile for python 3.11 smoke tests

* Make sure release is covered

* Fix typo

* add jinja2 (pytorch#1536)

* Remove restriction on 3.11 (pytorch#1537)

* Revert "add jinja2 (pytorch#1536)" (pytorch#1538)

This reverts commit 224a4c5.

* S3 Management Job Outside Docker (pytorch#1531)

* S3 Management Job Outside Docker

* job name

* remove failfast

* no matrix

* inherit secrets

* spacing?

* random nits

* add back secrets

* add back matrix

* export env vars correctlty

* Update update-s3-html.yml

* Add fbgemm-gpu to S3 Index (pytorch#1539)

* Update builder images to ROCm5.7 (pytorch#1541)

* Update docker build images for rocm5.7

* Fix erroneous logic that was skipping msccl files even for ROCm5.6; update msccl path for ROCm5.7

(cherry picked from commit 36c10cc)

* missing bzip2 package install for miopen

* Revert "missing bzip2 package install for miopen"

This reverts commit 8ef5fc9.

* ROCm 5.7 MIOpen does not need any patches, do not build from source

---------

Co-authored-by: Jeff Daily <jeff.daily@amd.com>

* Update docker build convenience scripts to ROCm5.7 (pytorch#1543)

* Do not uninstall MIOpen if skipping build-from-source (pytorch#1544)

* Install nvtx3 on Windows (pytorch#1547)

* Provide file hashes in the URLs to avoid unnecessary file downloads (bandwidth saver) (pytorch#1433)

Supply sha256 query parameters using boto3 to avoid hundreds of extra Gigabytes of downloads each day during pipenv and poetry resolution lock cycles.

Fixes point 1 in pytorch/pytorch#76557
Fixes pytorch#1347
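
A minimal sketch of the idea, not the code from pytorch#1433: PEP 503 lets an index page attach a digest to each file link as a `#sha256=<hex>` URL fragment, so a resolver can compare hashes without downloading the wheel. The function name `pep503_anchor` and the sample key are hypothetical, and the digest is computed locally here only for illustration.

```python
import hashlib
from html import escape
from typing import Optional
from urllib.parse import quote


def pep503_anchor(key: str, sha256: Optional[str] = None) -> str:
    """Render one index entry, appending the digest as a URL fragment if known."""
    filename = key.rsplit("/", 1)[-1]
    # Percent-encode the object key but keep the path separators.
    href = "/" + quote(key)
    if sha256:
        href += f"#sha256={sha256}"
    return f'<a href="{escape(href)}">{escape(filename)}</a><br/>'


# Illustrative digest of some bytes; a real index would use the wheel's digest.
digest = hashlib.sha256(b"example wheel bytes").hexdigest()
anchor = pep503_anchor("whl/test/torch-2.1.0-cp310-cp310-linux_x86_64.whl", digest)
print(anchor)
```

Entries without a known digest fall back to a plain link, which is one way to read the "Workaround for older files" item that follows.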

* Workaround for older files

* Bugfixes introduced by pytorch#1433

Replace `obj` with `obj.key` in a few places
Dismantle pyramid of doom while iterating over objects

Test plan: Run `python manage.py whl/test --generate-pep503`

* [S3_management] Update boto3 to 1.28.53

* [manage_s3] Download objects metadata concurrently

Using `concurrent.futures.ThreadPoolExecutor`
This speeds up rebuilding `whl/test` index from 300 sec to 90 sec on my
laptop
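
The pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the `manage.py` code: `fetch_all_metadata` and `fake_head_object` are hypothetical names, and the fake function stands in for a per-object metadata request to S3.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all_metadata(
    keys: Iterable[str],
    fetch_one: Callable[[str], dict],
    max_workers: int = 8,
) -> Dict[str, dict]:
    """Fetch metadata for every key concurrently and return it keyed by object name."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, regardless of completion order.
        results = list(pool.map(fetch_one, keys))
    return dict(zip(keys, results))


def fake_head_object(key: str) -> dict:
    # Hypothetical stand-in for a network call that returns object metadata.
    return {"key": key, "size": len(key)}


metadata = fetch_all_metadata(["whl/test/a.whl", "whl/test/b.whl"], fake_head_object)
print(metadata)
```

Threads suit this workload because each request spends most of its time waiting on the network rather than computing.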

* Make smoke-test runnable without envvars

* [aarch64] set acl_build_flags arch=armv8a, remove editing build flags (pytorch#1550)

Looking at this PR:
pytorch#1370
this line:
https://github.com/pytorch/builder/pull/1370/files#diff-54480d0a69ca27f54fb0736a9762caa8b03bd4736dcd77190d99ec3033c9bd2fR229

That fixed the issue:
pytorch/pytorch#97226

One of the changes is to set 
```
arch=armv8a
```
We are experiencing the same issue now: pytorch/pytorch#109312
Hence this fix.
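
As a rough illustration of why the build baseline matters: a binary compiled for a newer ARM baseline may use instructions that an older core does not implement, and the process then dies with `Illegal instruction`. The sketch below compares the `Features` line from `/proc/cpuinfo` text against a set of required flags. The sample text and the `asimddp` flag are assumptions chosen for illustration; the source does not say which instruction triggered the crash.

```python
def cpu_features(cpuinfo_text: str) -> set:
    """Return the flags from the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "Features":
            return set(value.split())
    return set()


# Illustrative sample only; on a real machine, read /proc/cpuinfo instead.
SAMPLE = "processor\t: 0\nFeatures\t: fp asimd evtstrm crc32 cpuid\n"

# Hypothetical requirement set for a build that targets a newer baseline.
needed = {"asimd", "asimddp"}
missing = needed - cpu_features(SAMPLE)
print(sorted(missing))
```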

* [BE] Fix all flake8 violations in `smoke_test.py` (pytorch#1553)

Namely:
 - `if(x):` -> `if x:`
 - `"dev\d+"` -> `"dev\\d+"`
 - Keep 2 newlines between functions
 - Add `assert foo is not None` to suppress "variable assigned but not used" warning
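
For the `"dev\d+"` item, a short sketch of the two spellings that avoid the invalid-escape warning. The version string is a made-up example and the variable names are hypothetical.

```python
import re

# Raw string: the backslash reaches the regex engine unchanged.
DEV_SUFFIX = re.compile(r"dev\d+")

# Same pattern with the backslash escaped, as in the lint fix above.
DEV_SUFFIX_ESCAPED = re.compile("dev\\d+")

version = "2.2.0.dev20231010"
match = DEV_SUFFIX.search(version)
print(match.group(0))
```

Both forms compile to the same pattern; the unescaped `"dev\d+"` only works because Python currently passes unknown escapes through, which is what flake8 flags.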

* [aarch64] patch mkl-dnn to use 'march=armv8-a' as the default build (pytorch#1554)

* [aarch64] patch pytorch 2.1 for mkl-dnn fix (pytorch#1555)

* patch ci script with mkldnn fix (pytorch#1556)

* [BE] Add lint workflow (pytorch#1557)

And format `smoke_test.py` with `ruff`
Invoke/configure `ruff` using `lintrunner`
Copy lint runner adapters from https://github.com/pytorch/pytorch/tree/main/tools/linter/adapters

* [BE] Add `s3_management` to the linted folders (pytorch#1558)

Add `PERF401` to list of ignored suggestions, fix the rest.

* Fix path issue when building aarch64 wheels (pytorch#1560)

* Fix linalg smoke tests (pytorch#1563)

* Towards enabling M1 wheel builds

Do not try to install MKL on Apple Silicon

* And only install llvm-9 on x86 systems

* Do not build tests when building natively on M1

* And fix Python-3.8 native compilation on M1

There are no numpy=3.17 for M1

* Release 2.1 update promotion scripts (pytorch#1564)

* [BE] Small code cleanup

Fold multiple indices and single index generation into one loop

As loop body is the same anyway...

* S3_management: Add option to compute sha256

That will be used later to generate sha256 indexes in PEP503

* Remove debug print

* [S3_management] Minor improvements

- Refactor `fetch_obj_names` into class method
- Make sure that object remains public when ACL is computed
- Add `has_public_read` and `grant_public_read` class methods

* s3_management: compute checksum in cloud

I.e. file never gets downloaded on the client, which is a nice thing

* [S3Management] Add `undelete_prefix` method

That can be used to recover object in a versioned bucket

* Validate poetry for release (pytorch#1567)

* Validate poetry for release

* test

* test

* fixtypo

* Use released version of 3.12 (pytorch#1568)

As it was released on Oct 6 2023: https://www.python.org/downloads/release/python-3120/

* Move manywheel builds to `linux.12xlarge.ephemeral` (pytorch#1569)

Should be faster (<20 min vs 40+ min) and as secure as using GH ones

* Add cuSparseLt-0.5.0 to manywheel images

* Use `linux.12xlarge.ephemeral` for conda docker builds (pytorch#1570)

As `ubuntu.20.04` often OOMs or fails to fetch data from the RHEL repo

* Revert "Add cuSparseLt-0.5.0 to manywheel images"

This reverts commit 00841b6 as
cuSparseLT is not compatible with CentOS 7

* Move libtorch docker builder to `linux.12xlarge.ephemeral` (pytorch#1571)

As running it on `ubuntu22.04` often results in flaky infra failures or running out of disk space, for example https://github.com/pytorch/builder/actions/runs/6484948230/job/17609933012
```
cat: write error: No space left on device
```

* Add cuSparseLt-0.4.0 to manywheel images

But set USE_CUSPARSELT to 0 by default

* Add xformers to the list of indexable packages

* Build wheels with cuSparseLt

Build libtorch without cuSparseLt so far

Factor out `DEPS_LIST` to top level and add cuSparseLt if
`USE_CUSPARSELT` is set to 1

Tested in pytorch/pytorch#111245

* Do not build conda with CuSparseLT

* Add ROCM_PATH env var to Dockerfile for ROCm5.7 issue with finding HIP (pytorch#1572)

* [aarch64_wheel] Minor typing improvements

* [aarch64_wheel] Flake8 fix

* [aarch64_wheel] Cosmetic changes

* [aarch64_wheel] Fix readdir crash

Probably fixes pytorch/pytorch#111695

* [S3_management] generate libtorch index.html

* [CI] Update ruff to 0.1.1

To keep it in sync with pytorch

* Get rid of http://repo.okay.com.mx (pytorch#1575)

* [S3_management] Print time it takes to fetch index

* [S3_manage] Handle invalid versions

* [S3_management] Fix Version on error

And fix flake8 lint violation

* [S3_Management] Refactor `from_S3`

Move `fetch_metadata` into its own method, which could be called later on

Make S3Object non-frozen and introduce implicit __hash__ method

* [S3_Management] Filter nightly before `fetch_metadata`

This reduces time to call `from_S3Index` from 600 to 80 sec

* Add option to build -arm64- libtorch binaries

* [Docker] Remove trailing whitespace

And cause docker rebuild, to overwrite docker build from release/2.1
branch artifacts

* [MacOS] Small changes to libtorch naming

Intel x86 libtorch builds will have `x86_64` suffix and Apple Silicon ones will have `arm64` ones, but latest will point to Intel ones for now.

* Update libtorch/Dockerfile to use Ubuntu-20.04 (pytorch#1578)

As 18.04 EOLed

* Conda builds should respect `MAX_JOBS`

Maybe this helps with OOMs

* [S3_management] Fix subpackage urls

Make them `lower()`
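
For context, PEP 503 defines project-name normalization as lowercasing plus collapsing runs of `-`, `_` and `.` into a single `-`. The sketch below shows that rule; whether the builder script applies the full rule or only `lower()` is not stated here, and the function name and sample names are hypothetical.

```python
import re


def normalize_project_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize_project_name("Pillow"))
print(normalize_project_name("pytorch_triton"))
```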

* Advance versions for release 2.1.1 (pytorch#1583)

* [aarch64] Release pypi prep script change for aarch64 builds (pytorch#1585)

* Changes needed for core enablement of 3.12 binary wheels (pytorch#1586)

* Fix aarch64 build on 3.8 (pytorch#1593)

* Add some more validation checks for torch.linalg.eigh and torch.compile (pytorch#1580)

* Add some more validation checks for torch.linalg.eigh and torch.compile

* Update test

* Also update smoke_test.py

* Fix lint

* Revert "Add some more validation checks for torch.linalg.eigh and torch.compile (pytorch#1580)" (pytorch#1594)

This reverts commit 4c7fa06.

* Release validations using release version matrix (pytorch#1611)

* Release pypi prep change (pytorch#1587)

* [aarch64] Release pypi prep script change for aarch64 builds

* Release versions for testing

Testing calling version (pytorch#1588)

Upstream/release validations (pytorch#1589)

* Testing calling version

* add release matrix

Upstream/release validations (pytorch#1590)

* Testing calling version

* add release matrix

* test

test (pytorch#1591)

test (pytorch#1592)

Release v1 (pytorch#1595)

* test

* test

Release v1 (pytorch#1596)

* test

* test

* test

test (pytorch#1597)

Test versions validations (pytorch#1598)

* test

* basedir

Test versions validations (pytorch#1599)

* test

* basedir

* test

test (pytorch#1600)

* test

* test

Add release versions everywhere (pytorch#1601)

* test

* test

* test

* test

test (pytorch#1602)

Test version validations (pytorch#1603)

* test

* test

Test version validations (pytorch#1604)

* test

* test

* test

tests (pytorch#1605)

More tests nov16 (pytorch#1606)

* tests

* test

More tests nov16 (pytorch#1607)

* tests

* test

* test

More tests nov16 (pytorch#1608)

* tests

* test

* test

* test

More tests nov16 (pytorch#1609)

* tests

* test

* test

* test

* test

* fix_lint

* fix: typo (pytorch#1581)

* desired_cuda -> DESIRED_CUDA (pytorch#1612)

* desired_cuda -> DESIRED_CUDA

Found with shellcheck

* Update manywheel/build_cuda.sh

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

---------

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>

* [BE] Cleanup build unused code (pytorch#1613)

1. Upload Scripts are not used anymore. We use Github Action upload workflows
2. M1 Builds are now automated
3. `build_all.bat`: ran git grep in pytorch and builder, no results

* Changes to pypi release promotion scripts introduced for 2.1.0 and 2.1.1 (pytorch#1614)

* Changes to pypi release promotion scripts introduced during 2.1.1

* typo

* Pin miniconda version for Windows

To Miniconda3-py311_23.9.0-0-Windows-x86_64.exe

* Fix poetry and pypi validations when version is specified (pytorch#1622)

* test (pytorch#1617)

Fix validations (pytorch#1618)

* test

* poetry_fix

* test

Fix validations (pytorch#1619)

* test

* poetry_fix

* test

* test

* restrict

* Validate pypi build only for release (pytorch#1623)

* Validate pypi build only for release (pytorch#1624)

* [Manywheel] Do not hardcode triton version

* [Manywheel][BE] Dedup Triton requirement spec

* [Manywheel] Restrict `pytorch-triton` to x86-64 Linux

Partially addresses pytorch/pytorch#114042

* Tweak py312 conda requirements

* Build PyTorch without TLS for 3.12

Because GLOO still expects OpenSSL-1, but 3.12 is built with OpenSSL-3

* [conda] Skip sympy for 3.12

As at the moment it is only available for Windows %)

* [conda] Do not depend on triton for 3.12 yet

* Tweak mkl requirements for win+py312

* Add aarch64 conda env lib to LD_LIBRARY_PATH (pytorch#1628)

After the change in pytorch#1586, the nightly aarch64 wheel fails to find `libopenblas.so`, which is now installed under `/opt/conda/envs/aarch64_env/lib/` instead of the base conda `/opt/conda/lib`.  Using CPU nightly wheels on aarch64 from Nov 16 then ends up with the error described in pytorch/pytorch#114862: `Calling torch.geqrf on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support`.  The error can be found in the nightly build log https://github.com/pytorch/pytorch/actions/runs/6887666324/job/18735230109#step:15:4933

Fixes pytorch/pytorch#114862

I double-checked `2.1.[0-1]` and the current RC for 2.1.2; the issue is not there because pytorch#1586 only changed builder main, thus impacting only nightly.

### Testing

Build nightly wheel manually on aarch64 runner and confirm that openblas is detected correctly:

```
-- Found a library with BLAS API (open). Full path: (/opt/conda/envs/aarch64_env/lib/libopenblas.so)
...
--   USE_BLAS              : 1
--     BLAS                : open
--     BLAS_HAS_SBGEMM     :
--   USE_LAPACK            : 1
--     LAPACK              : open
...
```
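
A check like the one above could be scripted roughly as follows. This is a sketch that assumes the summary lines keep the `--   NAME : value` shape shown in the excerpt; the function name `cmake_summary_flags` is hypothetical.

```python
import re


def cmake_summary_flags(log_text: str) -> dict:
    """Collect 'NAME : value' pairs from CMake summary lines that start with '--'."""
    flags = {}
    for line in log_text.splitlines():
        m = re.match(r"^--\s+(\w+)\s*:\s*(.*)$", line)
        if m:
            flags[m.group(1)] = m.group(2).strip()
    return flags


LOG = """\
-- Found a library with BLAS API (open). Full path: (/opt/conda/envs/aarch64_env/lib/libopenblas.so)
--   USE_BLAS              : 1
--     BLAS                : open
--     BLAS_HAS_SBGEMM     :
--   USE_LAPACK            : 1
--     LAPACK              : open
"""

flags = cmake_summary_flags(LOG)
print(flags["USE_LAPACK"], flags["LAPACK"])
```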

* Revert "[conda] Skip sympy for 3.12"

This reverts commit 88457a1.
As sympy has been updated to 1.12 and it now supports Python-3.12

* [aarch64] ACL, OpenBLAS and mkldnn updates for PyTorch 2.2 (pytorch#1627)

Note# ~~This PR has a dependency on updating the oneDNN version to v3.3 (via ideep submodule to v3.3)~~
ideep submodule update is done, so, this PR can be merged anytime now.

This PR is for:
ACL - build with fixed format kernels 
OpenBLAS - upgrade the version to 0.3.25
numpy - upgrade version to 1.26.2
and mkldnn - cleanup the patches that are already upstreamed.

* Validation scripts, install using version (pytorch#1633)

* Test Windows static lib (pytorch#1465)

Add support for testing Windows Cuda static lib

* Pin windows intel-openmp to 2023.2.0 (pytorch#1635) (pytorch#1636)

* Torch compile test for python 3.8-3.11 linux only (pytorch#1629)

This should fix the failure with Python 3.12 validations:
https://github.com/pytorch/builder/actions/runs/7064433251/job/19232483984#step:11:4859

* [aarch64] cleanup mkldnn patching (pytorch#1630)

pytorch has moved to oneDNN v3.3.2 and some of the
old patches are no longer applicable.

* Add `aarch64_linux` to the list of linted files

* Actually fix lint this time

* Extend test_linalg from smoke_test.py

To take device as an argument and run tests on both cpu and cuda

* Run smoke_test_linalg during check_binary

This is a regression test for pytorch/pytorch#114862

* Fix linalg testing

* [BE] Add CI for check_binary.sh changes (pytorch#1637)

Make sure latest nightly passes the testing for:
 - Linux Wheel CPU
 - Linux Wheel CUDA

Tweak script a bit to work correctly with relative path to executable

* Keep nightly 20231010 for ExecuTorch alpha 0.1 for now (pytorch#1642)

* [Validations] do conda update before starting validations (pytorch#1643)

* [Validations] Validate aarch64 if all is selected (pytorch#1644)

* Fix validation workflow on aarch64 with conda 23.11.0 and GLIBC_2.25 (pytorch#1645)

* Debug aarch64 clone

* Debug

* Fix validation workflow with conda 23.11.0 and GLIBC_2.25

* Gate the change on linux-aarch64 and keep the old LD_LIBRARY_PATH

* Try to unset LD_LIBRARY_PATH in the workflow instead

* Fix copy/paste typo

* Do not hardcode triton version in builder code (pytorch#1646)

* Do not hardcode triton version in builder code

* Minor tweak to use pytorch_rootdir

* [Lint] Prohibit tabs in shell scripts

Fix current violations

* Link conda packages with cusparselt

Fixes pytorch/pytorch#115085

* aarch64: patch mkl-dnn for xbyak crashes due to /sys not accessible (pytorch#1648)

There are platforms with /sys not mounted. Skip handling HW caps
for such platforms.

cherry-pick of: oneapi-src/oneDNN#1773
This fixes the issue# pytorch/pytorch#115482

* Update builder images to ROCm6.0 (pytorch#1647)

* Update ROCm versions for docker images

* Don't build MIOpen from source for ROCm6.0

* Temporarily use magma fork with ROCm6.0 patch

* Update ROCm versions for docker images

* Add gfx942

* Update MIOpen repo

* Magma PR 42 is merged, so use upstream repo master branch now

* gfx942 target only fully supported for ROCm6.0 and above

* Avoid finding out std::basic_string_view (pytorch#1528)

As pytorch is moving to C++17, the binary can contain both "std::basic_string_view" and "std::__cxx11::basic_string<". Change the pattern so that it no longer matches std::basic_string_view, which was causing false positives.
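
The false positive can be illustrated with plain substring matching. This is a sketch only: the two patterns below are assumptions about the shape of the check, not the exact strings used in the builder scripts, and the demangled symbols are made-up examples.

```python
# Example demangled symbols, as a symbol dump might print them.
symbols = [
    "std::basic_string_view<char, std::char_traits<char> >",
    "std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >",
]

# A loose pre-cxx11 pattern also matches basic_string_view.
loose = "std::basic_string"
# Requiring the opening angle bracket excludes basic_string_view.
strict = "std::basic_string<"

loose_hits = [s for s in symbols if loose in s]
strict_hits = [s for s in symbols if strict in s]
print(len(loose_hits), len(strict_hits))
```

With the stricter pattern, a binary that only contains the cxx11 string type and `basic_string_view` is no longer reported as using the pre-cxx11 ABI.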

* Add test ops validation for validation workflows (pytorch#1650)

* Add test ops validation

* include workflows

* Add test ops validation for validation workflows (pytorch#1651)

* Add test ops validation for validation workflows (pytorch#1652)

* Add test ops validation for validation workflows (pytorch#1653)

* Add test ops validation for validation workflows (pytorch#1654)

* Add test ops validation for validation workflows (pytorch#1655)

* [validations] Add missing required packages (pytorch#1656)

* [validations] Perform test_ops only on CUDA binaries (pytorch#1657)

* [validations] Adjust timeout for linux jobs (pytorch#1658)

* [validations] Restrict testing for python 3.8-3.11 (pytorch#1659)

* [validations] Fix use case if INCLUDE_TEST_OPS is not set (pytorch#1660)

* Add unit tests and one line reproducers to detect bad pytorch cuda wheels (pytorch#1663)

* Add one line reproducers and unit tests that would fail when bad wheels
were generated by the compiler(s).
nextafter reproducer thanks to @malfet!

* cosmetic fixes

* fix comments

* Fix quotation issues when migrating from python file to one line format (pytorch#1664)

Sorry, looks like the last line had an issue while porting it from multi-line python file to one-line.

Side question: when does this file get used? Is it only used during release binary generation/testing?

* Add nccl version print for cuda related smoke test (pytorch#1667)

* Apply nccl test to linux only (pytorch#1669)

* Build nccl after installing cuda (pytorch#1670)

Fix: pytorch/pytorch#116977

Nccl 2.19.3 doesn't exist for cuda 11.8 and cuda 12.1. Refer to https://docs.nvidia.com/deeplearning/nccl/release-notes/rel_2-19-3.html#rel_2-19-3: only CUDA 12.0, 12.2, and 12.3 are supported.

Hence we do manual build. Follow this build process:
https://github.com/NVIDIA/nccl/tree/v2.19.3-1?tab=readme-ov-file#build

We want the nccl version to be exactly the same as installed here:
https://github.com/pytorch/pytorch/blob/main/.github/scripts/generate_binary_build_matrix.py#L45

* Update cusparselt to v0.5.2 (pytorch#1672)

This PR adds in support for cuSPARSELt v0.5.2 and updates the cuda 12.1 build step to use it instead of 0.4.0

Also fixes a typo when deleting the cusparselt folder after installing.

* Run test ops tests from outside of pytorch root folder (pytorch#1676)

* Remove s3 update html job and scripts (pytorch#1677)

* [BE] Remove unused nightly_defaults.bat (pytorch#1678)

* [Conda] Mark `blas * mkl` as x86 only dependency

* [Conda] Download arch appropriate Miniconda

By using `$(uname -m)` as suffix, which is arm64 on Apple Silicon and
x86_64 on Intel Macs
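
The same selection can be sketched in Python, where `platform.machine()` reports the value that `uname -m` prints. The installer file-name pattern below is an assumption for illustration, and `miniconda_installer` is a hypothetical helper, not part of the builder scripts.

```python
def miniconda_installer(machine: str) -> str:
    """Build an installer file name from the machine string (as printed by `uname -m`)."""
    if machine not in ("arm64", "x86_64"):
        raise ValueError(f"unsupported macOS architecture: {machine}")
    # Assumed naming pattern; check the Miniconda download page before relying on it.
    return f"Miniconda3-latest-MacOSX-{machine}.sh"


print(miniconda_installer("arm64"))
print(miniconda_installer("x86_64"))
```

On a Mac, calling `miniconda_installer(platform.machine())` after `import platform` would pick the name for the host architecture.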

* [Conda] Do not depend on llvmdev-9 on ARM

As earliest available for the platform is llvmdev-11

* [Conda] Set correct developer dir for MacOS runners

* [Conda] Add llvm-openmp dependency for ARM64

PyTorch for M1 is finally built with OpenMP, so it needs to depend on it

* Use dynamic MKL on Windows (pytorch#1467)

Use dynamic MKL on Windows and updated MKL to 2021.4.0
On conda python 3.12 use mkl 2023.1

* Add torchrec to promote s3 script (pytorch#1680)

* Add torchrec to promote s3 script

* Add torchrec version to release_version.sh

* Revert "Dynamic MKL windows" (pytorch#1682)

* Revert "Revert "Dynamic MKL windows"" (pytorch#1683)

* Add numpy install to windows conda tests (pytorch#1684)

* Windows conda test. Install numpy in conda testenv (pytorch#1685)

* Add fbgemm to promote s3 script (pytorch#1681)

* Release 2.2.0 pypi prep script modifications (pytorch#1686)

* [Analytics] add pypi staging validations, remove circleci script (pytorch#1688)

* [Analytics] Pypi validations. Add call to check-wheel-contents (pytorch#1689)

* Modify Validate Nightly PyPI Wheel Binary Size to pick correct binary (pytorch#1690)

* Fix test_ops scripts on release validation testing (pytorch#1691)

* Add option to validate only from download.pytorch.org (pytorch#1692)

* Exclude pypi and poetry tests when USE_ONLY_DL_PYTORCH_ORG is set (pytorch#1693)

* [ROCm] add hipblaslt library files (pytorch#1695)

With pytorch/pytorch#114329 merged, we need to include hipblaslt library files within the ROCm wheel.

* Minor tweak to fbgemmgpu version to ignore RC suffix (pytorch#1694)

* Remove custom PyTorch build dependency logic on 3.11 (pytorch#1697)

* Remove custom PyTorch build dependency logic on 3.11

* Add a smoke test for openmp

* Pin conda-build to 3.28.4 (pytorch#1698)

* ci: aarch64 linux: fix torch performance issue with conda openblas package (pytorch#1696)

changing the conda openblas package from pthread version
to openmp version to match torch openmp runtime. The pthread
version was conflicting with the openmp runtime and causing
thread over-subscription and performance degradation.

* Add triton version for nightly and release (pytorch#1703)

* Bundle PTXAS into 11.8 wheel

* Add tensorrt promo script, bump release version for 2.2.1 (pytorch#1706)

* Pin Conda to 23.11.0

---------

Co-authored-by: Andrey Talman <atalman@fb.com>
Co-authored-by: Mike Schneider <104035434+xncqr@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Co-authored-by: ptrblck <ptrblck@users.noreply.github.com>
Co-authored-by: JYX <jyx21@mails.tsinghua.edu.cn>
Co-authored-by: Omkar Salpekar <osalpekar@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Danylo Baibak <baibak@meta.com>
Co-authored-by: Supadchaya <138070207+spcyppt@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: cyy <cyyever@outlook.com>
Co-authored-by: Matt Davis <matteius@gmail.com>
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Luo Bo <84075753+0x804d8000@users.noreply.github.com>
Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
Co-authored-by: Ionuț Manța <ionut@janeasystems.com>
Co-authored-by: Wei Wang <143543872+nWEIdia@users.noreply.github.com>
Co-authored-by: Jesse Cai <jessecai@fb.com>
Co-authored-by: henrylhtsang <91030427+henrylhtsang@users.noreply.github.com>