Basic math operations produce a "floating point exception" #89817

Open

timothygebhard opened this issue Nov 29, 2022 · 26 comments
Labels

module: cpu - CPU specific problem (e.g., perf, algorithm)
module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
needs reproduction - Someone else needs to try reproducing the issue given the instructions. No action needed from user
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


timothygebhard commented Nov 29, 2022

🐛 Describe the bug

When I try to run the following simple piece of code:

import numpy as np
import torch

np.random.seed(42)

x = torch.from_numpy(np.random.rand(100)).float()
print(x)

exp_x = torch.exp(x)
print(exp_x)

I get a floating point exception that kills my Python interpreter:

(venv) [tgebhard@g108] ~ % python test.py                                                                                                                                                                   
tensor([0.3745, 0.9507, 0.7320, 0.5987, 0.1560, 0.1560, 0.0581, 0.8662, 0.6011,
        0.7081, 0.0206, 0.9699, 0.8324, 0.2123, 0.1818, 0.1834, 0.3042, 0.5248,
        0.4319, 0.2912, 0.6119, 0.1395, 0.2921, 0.3664, 0.4561, 0.7852, 0.1997,
        0.5142, 0.5924, 0.0465, 0.6075, 0.1705, 0.0651, 0.9489, 0.9656, 0.8084,
        0.3046, 0.0977, 0.6842, 0.4402, 0.1220, 0.4952, 0.0344, 0.9093, 0.2588,
        0.6625, 0.3117, 0.5201, 0.5467, 0.1849, 0.9696, 0.7751, 0.9395, 0.8948,
        0.5979, 0.9219, 0.0885, 0.1960, 0.0452, 0.3253, 0.3887, 0.2713, 0.8287,
        0.3568, 0.2809, 0.5427, 0.1409, 0.8022, 0.0746, 0.9869, 0.7722, 0.1987,
        0.0055, 0.8155, 0.7069, 0.7290, 0.7713, 0.0740, 0.3585, 0.1159, 0.8631,
        0.6233, 0.3309, 0.0636, 0.3110, 0.3252, 0.7296, 0.6376, 0.8872, 0.4722,
        0.1196, 0.7132, 0.7608, 0.5613, 0.7710, 0.4938, 0.5227, 0.4275, 0.0254,
        0.1079])
zsh: floating point exception  python test.py
(venv) [tgebhard@g108] ~ %

The problem also occurs for other mathematical operations such as torch.log() or torch.cos(). It seems like it only happens if the size of the input tensor is at least 100, though.
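
Since the crash kills the whole interpreter, one way to probe which ops and sizes trigger it (just a sketch; the size threshold of 100 is what I observe here and may differ elsewhere) is to run each case in a separate subprocess and look at the return code:

```
import subprocess
import sys

# Probe which op/size combinations die with SIGFPE by running each case in
# its own interpreter, since the crash would otherwise kill this script too.
snippet = (
    "import numpy as np, torch; "
    "x = torch.from_numpy(np.random.rand({n})).float(); "
    "getattr(torch, '{op}')(x)"
)

for op in ["exp", "log", "cos"]:
    for n in [10, 50, 99, 100, 200]:
        ret = subprocess.run(
            [sys.executable, "-c", snippet.format(op=op, n=n)],
            capture_output=True,
        ).returncode
        # A negative return code is the killing signal (-8 == SIGFPE on Linux).
        print(f"torch.{op}(x) with n={n}: return code {ret}")
```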

Moreover, the issue only occurs on some machines, under some specific circumstances: my local machine runs the code above without any problem, whereas one of the machines at work reproducibly gives the error above, but only if I request at least 14 CPU cores (it's a batch queue system based on HTCondor). It might, therefore, be the case that only this particular machine has a problem. Any pointers for debugging this are greatly appreciated! 🙂

Versions

Information about the Python environment:

PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-80-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No devices found.
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==1.8.3.post0
[pip3] torch==1.13.0
[pip3] torchmetrics==0.10.3
[conda] Could not collect

Information about the machine where the problem occurs (output of lscpu):

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0,1,9-16,26-31
Off-line CPU(s) list:            2-8,17-25,32-255
Thread(s) per core:              0
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7662 64-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1499.941
CPU max MHz:                     2000.0000
CPU min MHz:                     1500.0000
BogoMIPS:                        3999.98
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-63,128-191
NUMA node1 CPU(s):               64-127,192-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

jgong5 (Collaborator) commented Nov 29, 2022

@timothygebhard It seems we are not able to reproduce it locally on an Intel CPU. To further narrow down the issue, are you able to try a sanitizer build and run your test with the sanitizer?

option(USE_ASAN "Use Address+Undefined Sanitizers" OFF)

@albanD added the needs reproduction, module: crash, module: cpu, and triaged labels Nov 29, 2022
@timothygebhard (Author)

@jgong5 I tried all day to build PyTorch with ASAN as described here (the guide might need an update, BTW), but I did not succeed. Ultimately, it always crashes with some clang: error: linker command failed with exit code 1, and the undefined-reference messages that it keeps complaining about seem related to OpenMP (e.g., /usr/bin/ld: lib/libtorch_cpu.so: undefined reference to `omp_in_parallel'), even though the build_with_asan() command from the guide above sets USE_OPENMP=0 🤷‍♂️

When I talked to our cluster admin at work, he suspected that the problem might be related to MKL, possibly in combination with an AMD CPU. In that case, I guess I should try to make sure to compile the sanitizer build with the same MKL version as the PyTorch version that's being shipped via pip?

@zadaianchuk

A similar issue seems to have occurred more than a year ago:
#66247 (comment)

Are there any plans to update to an MKL version that potentially doesn't have this bug in combination with AMD CPUs?

@timothygebhard (Author)

Running echo 'run' | gdb --args python test.py seems to confirm that it's indeed an MKL issue:

Thread 1 "python" received signal SIGFPE, Arithmetic exception.
0x000015551042ec47 in mkl_vml_serv_GetMinN () from /lustre/home/tgebhard/.virtualenvs/venv/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
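
(Not MKL-specific, but to get at least a Python-level traceback when a fatal signal like this kills the interpreter, Python's built-in faulthandler can be enabled; a minimal sketch, to be placed at the top of the script:)

```
import faulthandler

# faulthandler installs handlers for fatal signals (SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS, SIGILL) and dumps the Python traceback before the process dies.
faulthandler.enable()

import numpy as np
import torch

x = torch.from_numpy(np.random.rand(100)).float()
torch.exp(x)  # if this dies with SIGFPE, the Python traceback is printed first
```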

@piraka9011

I have also seen the same issue in #66247 and had to resort to building my own PyTorch container with USE_MKL=OFF in order to resolve this issue.
My stack trace was also similar to what was reported there.

This does not happen on Intel CPUs and occurs on specific AMD CPUs.
For example, it works on my local AMD Ryzen 7 3700X.
But when I deploy it to a cluster with EPYC Rome or Threadripper CPUs, I get a SIGFPE.

jgong5 (Collaborator) commented Dec 16, 2022

@timothygebhard You mentioned that it is related to the MKL version that the PyTorch release package is built with. May I know which particular MKL version it is? Have you tried with a newer version of MKL?

@mseitzer (Contributor)

This issue happens for me with pre-built PyTorch packages on versions 1.11, 1.12, 1.13, and 2.0 (nightly). I have not tried earlier versions.

The relevant line from torch.__config__.show() (the same for all tested PyTorch versions):

  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
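
(In case it helps others check their own build, a minimal way to print just that line; this assumes the substring "Math Kernel Library" appears in the string returned by torch.__config__.show():)

```
import torch

# Print only the MKL line from the build configuration.
for line in torch.__config__.show().splitlines():
    if "Math Kernel Library" in line:
        print(line.strip())
```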

According to #66247 (comment), the issue disappears with MKL 2021.4, although I have not tried this.

It would be great to upgrade the MKL version that standard PyTorch is built with, as this essentially renders PyTorch unusable on machines with the affected CPUs.

@timothygebhard (Author)

@jgong5 The MKL version is the one mentioned by @mseitzer. I haven't tried to build PyTorch myself using a newer version of MKL yet; however, I seem to recall that when I did build it without MKL (using OpenBLAS instead), the error was gone.

jingxu10 (Collaborator) commented Dec 16, 2022

I cannot reproduce this issue on either AWS M5a or M6a 4xlarge EC2 instances with the nightly build:
pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Would you let us know which instance type could reproduce this issue?

Alternatively, could you try compiling PyTorch from source? Instructions are available at https://github.com/pytorch/pytorch#from-source
conda install commands should install the latest version of MKL.

@piraka9011

Here is the output of lscpu from an example machine where this happens:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0,6,11-15,22,27-31
  Off-line CPU(s) list:  1-5,7-10,16-21,23-26
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7F52 16-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         3500.0000
    CPU min MHz:         2500.0000
    BogoMIPS:            6999.96
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    8 MiB (16 instances)
  L3:                    256 MiB (16 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; LFENCE, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

I don't think this exact kind of machine is available on AWS, but it is on coreweave.com, for example.

@jingxu10 (Collaborator)

Could you try compiling PyTorch from source? Instructions are available at https://github.com/pytorch/pytorch#from-source
The conda install commands should install the latest version of MKL. If the latest MKL solves this issue, compiling from source yourself should work. Could you give it a try in your environment?

soumith (Member) commented Dec 17, 2022

This issue happens for me with pre-built Pytorch packages on versions 1.11, 1.12, 1.13, 2.0 (nightly)

@malfet any reason we are holding back on MKL 2020 for these packages, instead of just upgrading to the latest MKL?
You touched it last in 2020 haha, but these are the relevant lines: https://github.com/pytorch/builder/blob/main/common/install_mkl.sh#L6-L13
They build our docker image that builds the wheels, and the docker image in turn statically links to this MKL. Upgrading this and rebuilding the docker image would likely solve this.

malfet (Contributor) commented Dec 17, 2022

@malfet any reason we are holding back on MKL 2020 for these packages, instead of just upgrading to the latest MKL?

MKL 2020 is the last one that allows AMD CPU users to opt into AVX2-accelerated kernels using an environment variable. Later ones require injecting a symbol into the global namespace.

Here is an old issue (dating back to 2020) that discusses perf problems associated with updating to a newer MKL: pytorch/builder#504

But stability is more important than perf, so imo we should update to the one that does not crash.

@malfet malfet self-assigned this Dec 17, 2022
malfet (Contributor) commented Dec 17, 2022

Here is the output of lscpu from an example machine of where this happens

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0,6,11-15,22,27-31
  Off-line CPU(s) list:  1-5,7-10,16-21,23-26
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7F52 16-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  16

I cannot repro it on AWS C5a.4xlarge, which has the same CPU family and model:

$ lscpu 
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7R32

@piraka9011 what is the type of instance you've allocated on coreweave.com? (As I could not repro it on a 16 vCPU EPYC Rome instance, i.e. one running an AMD EPYC 7402P 24-Core Processor)

@piraka9011

@jingxu10 I've been compiling from source with the USE_MKL=OFF flag.
I will try compiling with the flag removed and the latest MKL installed via conda and report back.

@malfet So I don't think there's an easy way to get the exact same config on CoreWeave using their standard virtual server offering, but I will try to find a config to reproduce the instance exactly.

malfet (Contributor) commented Dec 18, 2022

@piraka9011 I'm going to switch to 2022.2.1 in the nightlies and will run CPU performance tests afterwards (on Intel CPUs it showed moderate perf gains, and smoke tests I've run on AMD ones showed they are no worse, but much better if the LD_PRELOAD trick is used)

malfet added a commit to pytorch/builder that referenced this issue Dec 18, 2022
Update MKL to 2022.1

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note, that in order to get maximum perf on AMD CPUs one needs to compile and LD_PRELOAD following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```
@timothygebhard (Author)

I can confirm that the problem is still present in the current nightly build (2.0.0.dev20221216+cpu, which still uses MKL Version 2020.0.0 Product Build 20191122).

I also finally got around to building PyTorch from source (with conda; see footnote 1), which gave me version 2.0.0a0+git212873c, including a newer version of MKL:

Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications

Unfortunately, even with this version, the SIGFPE issue is still present on my machine.

Footnotes

  1. As a side note, building PyTorch from source took well over 24 hours for me (on a machine with 32 cores), so even if that had fixed the issue, I am not sure how feasible that would be for most people who just want to use PyTorch?

@jingxu10 (Collaborator)

@timothygebhard The latest MKL is 2022.2.1, but the one you got was 2021.4. Since @malfet has enabled 2022.2.1 in the nightly builds, you can probably try the ones later than 20221218. @malfet Any idea which one has 2022.2.1 enabled?

malfet (Contributor) commented Dec 18, 2022

Unfortunately, even with this version, the SIGFPE issue is still present on my machine.

@timothygebhard would you be comfortable attaching gdb/lldb to the process running PyTorch and sharing the exact instruction/FPU registers when the exception occurs? (I can provide all the instructions on how to do it if needed)

  1. As a side note, building PyTorch from source took well over 24 hours for me (on a machine with 32 cores), so even if that had fixed the issue, I am not sure how feasible that would be for most people who just want to use PyTorch?

Hmm, this shouldn't be the case, even if you build with CUDA support for all possible architectures. For instance, a CPU build of PyTorch takes less than 30 min on my rather antiquated laptop. Have you installed ninja? If not, have you passed the MAX_JOBS parameter to ensure the build runs in parallel rather than sequentially?

@timothygebhard (Author)

@jingxu10 I just installed the 2.0.0.dev20221218+cpu nightly, which indeed was built with the 2022.2 version of MKL. I am happy to report that with this version, the problem seems resolved — my minimal examples run without causing a floating point exception 🙂

@malfet Sure, I can give it a try! I just ran gdb -ex r --args python test.py for a PyTorch version using MKL 2020, and it resulted in:

Thread 1 "python" received signal SIGFPE, Arithmetic exception.
0x0000155488495c47 in mkl_vml_serv_GetMinN () from /lustre/home/tgebhard/.virtualenvs/venv/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

Is this helpful / what you were looking for, or what else should I run?

Regarding the long build time: I did install ninja via conda (as described in the "Install dependencies" command here), but I did not explicitly set MAX_JOBS. I can try again if you think that was the issue?

malfet (Contributor) commented Dec 19, 2022

@timothygebhard can you run the following commands, x/4i $rip and info all-registers, and share the output here?

As for the build time, ninja should auto-detect the number of CPU cores and parallelize automatically. Do you mind sharing the .ninja_log file from the build folder? That would give me an idea of the parallelism/build times you've observed.
I believe 16-core builders are used for the nightly builds of PyTorch, and they finish in under 4 hours for all GPU architectures. If one builds for oneself, it's not necessary to build for all arches, and this way builds are much faster.

@timothygebhard (Author)

Here is the output from those two commands (I hope I understood correctly how to use them):

(gdb) x/4i $rip
=> 0x15548827bc47 <mkl_vml_serv_GetMinN+135>:   idiv   %esi
   0x15548827bc49 <mkl_vml_serv_GetMinN+137>:   cvtsi2sd %eax,%xmm5
   0x15548827bc4d <mkl_vml_serv_GetMinN+141>:   mov    %ecx,%eax
   0x15548827bc4f <mkl_vml_serv_GetMinN+143>:   pxor   %xmm0,%xmm0
(gdb) info all-registers
rax            0x3b                59
rbx            0x7ffffffdc30c      140737488208652
rcx            0x3b                59
rdx            0x0                 0
rsi            0x0                 0
rdi            0x7ffffffdc30c      140737488208652
rbp            0x7ffffffdc360      0x7ffffffdc360
rsp            0x7ffffffdc2b0      0x7ffffffdc2b0
r8             0x0                 0
r9             0x15549ace7d20      23453118659872
r10            0x501ca5f           84003423
r11            0x56                86
r12            0x501ca00           84003328
r13            0x3e8               1000
r14            0x7fffffffc430      140737488340016
r15            0x3e8               1000
rip            0x15548827bc47      0x15548827bc47 <mkl_vml_serv_GetMinN+135>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
st0            1.490116119384765625e-08 (raw 0x3fe58000000000000000)
st1            0                   (raw 0x00000000000000000000)
st2            0                   (raw 0x00000000000000000000)
st3            1.00000000000000000005e-31 (raw 0x3f9881ceb32c4b43fcf5)
st4            -31                 (raw 0xc003f800000000000000)
st5            10000000000000000   (raw 0x40348e1bc9bf04000000)
st6            nan(0xc000000000000000) (raw 0x7fffc000000000000000)
st7            4                   (raw 0x40018000000000000000)
fctrl          0x37f               895
fstat          0x20                32
ftag           0xffff              65535
fiseg          0x0                 0
fioff          0x0                 0
foseg          0x0                 0
fooff          0x0                 0
fop            0x0                 0
mxcsr          0x1fa2              [ DE PE IM DM ZM OM UM PM ]
ymm0           {v16_bfloat16 = {0x1440, 0x1e, 0x0 <repeats 14 times>}, v8_float = {0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa2, 0x45, 0xf3, 0x41, 0x0 <repeats 28 times>}, v16_int16 = {0x45a2, 0x41f3, 0x0 <repeats 14 times>}, v8_int32 = {0x41f345a2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x41f345a2, 0x0, 0x0, 0x0}, v2_int128 = {0x41f345a2, 0x0}}
ymm1           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm2           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x7fffffffffffffff, 0x7fffffffffffffff, 0x0, 0x0}, v32_int8 = {0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {0xffffffffffffffffffffffffffffffff, 0x0}}
ymm3           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0 <repeats 13 times>}, v8_float = {0xca000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa0, 0xbc, 0x4d, 0x55, 0x55, 0x15, 0x0 <repeats 26 times>}, v16_int16 = {0xbca0, 0x554d, 0x1555, 0x0 <repeats 13 times>}, v8_int32 = {0x554dbca0, 0x1555, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x1555554dbca0, 0x0, 0x0, 0x0}, v2_int128 = {0x1555554dbca0, 0x0}}
ymm4           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm5           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm6           {v16_bfloat16 = {0x1440, 0x1e, 0x0 <repeats 14 times>}, v8_float = {0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa2, 0x45, 0xf3, 0x41, 0x0 <repeats 28 times>}, v16_int16 = {0x45a2, 0x41f3, 0x0 <repeats 14 times>}, v8_int32 = {0x41f345a2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x41f345a2, 0x0, 0x0, 0x0}, v2_int128 = {0x41f345a2, 0x0}}
ymm7           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x30, 0x2e, 0x37, 0x39, 0x39, 0x34, 0x2c, 0x20, 0x0 <repeats 16 times>}, v16_int16 = {0x2020, 0x2020, 0x2020, 0x2020, 0x2e30, 0x3937, 0x3439, 0x202c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x20202020, 0x20202020, 0x39372e30, 0x202c3439, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x2020202020202020, 0x202c343939372e30, 0x0, 0x0}, v2_int128 = {0x202c343939372e302020202020202020, 0x0}}
ymm8           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x30, 0x2e, 0x36, 0x38, 0x38, 0x34, 0x2c, 0x20, 0x30, 0x2e, 0x34, 0x33, 0x34, 0x38, 0x2c, 0x20, 0x0 <repeats 16 times>}, v16_int16 = {0x2e30, 0x3836, 0x3438, 0x202c, 0x2e30, 0x3334, 0x3834, 0x202c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x38362e30, 0x202c3438, 0x33342e30, 0x202c3834, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x202c343838362e30, 0x202c383433342e30, 0x0, 0x0}, v2_int128 = {0x202c383433342e30202c343838362e30, 0x0}}
ymm9           {v16_bfloat16 = {0x0, 0x0, 0x2, 0x4, 0x0 <repeats 12 times>}, v8_float = {0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x3e8, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x8f, 0x40, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x4000, 0x408f, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0x408f4000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x408f400000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x408f400000000000, 0x0}}
ymm10          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x1, 0x0 <repeats 12 times>}, v8_float = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0xe0, 0x7f, 0xad, 0x3f, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x7fe0, 0x3fad, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0x3fad7fe0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3fad7fe000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x3fad7fe000000000, 0x0}}
ymm11          {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm12          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0xba, 0x90, 0x0 <repeats 21 times>}, v16_int16 = {0xba10, 0x90, 0x0, 0x0, 0xba10, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x90ba10, 0x0, 0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x90ba10, 0x0, 0x0}, v2_int128 = {0x90ba10000000000090ba10, 0x0}}
ymm13          {v16_bfloat16 = {0x0, 0x0, 0x0 <repeats 14 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0 <repeats 29 times>}, v16_int16 = {0xba10, 0x90, 0x0 <repeats 14 times>}, v8_int32 = {0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x0, 0x0, 0x0}, v2_int128 = {0x90ba10, 0x0}}
ymm14          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0xba, 0x90, 0x0 <repeats 21 times>}, v16_int16 = {0xba10, 0x90, 0x0, 0x0, 0xba10, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x90ba10, 0x0, 0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x90ba10, 0x0, 0x0}, v2_int128 = {0x90ba10000000000090ba10, 0x0}}
ymm15          {v16_bfloat16 = {0x0, 0x0, 0x0 <repeats 14 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0 <repeats 29 times>}, v16_int16 = {0xba10, 0x90, 0x0 <repeats 14 times>}, v8_int32 = {0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x0, 0x0, 0x0}, v2_int128 = {0x90ba10, 0x0}}

Regarding the long build time: I realized I made a dumb error on my side — I ran the build on a machine using a shared file system, which was a bad idea. When I re-ran the build using local storage, it finished in about 35 minutes 🙂

malfet (Contributor) commented Dec 19, 2022

(gdb) x/4i $rip
=> 0x15548827bc47 <mkl_vml_serv_GetMinN+135>:   idiv   %esi

This looks like a good old integer division by zero (easily reproducible with the following code: https://godbolt.org/z/rh7d8MdEE), so I'm confused why the Python runtime reports it as a floating point exception... (And even more confused about what MKL is trying to do here.)
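
(For what it's worth, SIGFPE is POSIX's generic "erroneous arithmetic operation" signal and also covers integer division by zero (si_code FPE_INTDIV), which is presumably why the shell prints "floating point exception" for an idiv with a zero divisor. A quick check of the signal's number and name from Python:)

```
import signal

# SIGFPE (signal 8 on Linux) is the "erroneous arithmetic operation" signal;
# integer division by zero in native code is delivered as SIGFPE too.
print(int(signal.SIGFPE), signal.strsignal(signal.SIGFPE))
```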

@piraka9011

Using the nightly version works for me too.
Also FWIW, sharing how CoreWeave resolved this for nodes/pods running in their k8s cluster.

...applying lxcfs-admission-webhook: disabled has done the trick here for prior clients seeking the same solution.
There is a layer (lxcfs) that exposes /proc/cpuinfo, /proc/meminfo, /proc/stat, etc. to look like the limits section of the resources, so applications that read those to determine, e.g., how many threads to spin up don't spin up more than the limit. Seems like something PyTorch reads there messes it up sometimes. Doesn't make sense to me, but you also don't really need that feature when doing most ML tasks, so I disabled it for your namespace.
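
A quick way to see what the runtime actually believes about the CPU topology inside such a pod (just a sketch; which values lxcfs masks depends on the container limits):

```
import os
import torch

# These views of "how many CPUs do I have" can disagree inside a container
# where lxcfs rewrites /proc; PyTorch derives its default thread count from
# this kind of information.
print("os.cpu_count():            ", os.cpu_count())
print("len(sched_getaffinity(0)): ", len(os.sched_getaffinity(0)))  # Linux only
print("torch.get_num_threads():   ", torch.get_num_threads())
```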

@levmckinney

Will there be a patch for torch 1.13 to resolve this? I'm still running into this issue using the 1.13.1-cuda11.6-cudnn8-runtime container.

malfet (Contributor) commented Mar 28, 2023

@levmckinney no, there are usually no backports to previous releases, but it should be fixed in PyTorch 2.0

jithunnair-amd added a commit to ROCm/builder that referenced this issue Apr 11, 2023
* Make sure package_type is set (pytorch#1139)

* Update check_binary.sh

* Update check_binary.sh

* Modifying smoke test to add more advanced validation as requested (pytorch#1124)

* Modify smoke test matrix

More vision smoke tests

Temporary pointing to my repo for testing

Try 2 use atalman builder

Modify path

Fixing commits

Testing

Testing

Smoke test modifications

Refactor test code

Fix typo

Fixing image read

A little more refactoring

Addressing comments

Testing

* Add same test for windows and macos

* Addressing c omments

* Add manywheel special build for including pypi package (pytorch#1142)

* Add manywheel special build

Testing

Builder change

Testing

Adding manywheel cuda workflow

Simplify

Fix expr

* address comments

* checking for general setting

* Pass correct parameters for macos validations (pytorch#1143)

* Revert "Update check_binary.sh"

This reverts commit 6850bed.

* Revert "Update check_binary.sh"

This reverts commit 051b9d1.

* setup periodic test to run binary verification  pytorch/pytorch#84764: (pytorch#1144)

* add a reusable workflow to run all smoke tests/or smoke tests for a specific os/channel
* add workflows to schedule the periodic smoke tests for nightly and release channels

* Update aarch64 script to latest one (pytorch#1146)

* minor: fix the typo job name for windows binaries validation workflow (pytorch#1147)

* fix the typo in the the job name for the release binaries validation workflow (pytorch#1148)

issue was introduced in pytorch#1144

* Move to rc2 of 3.11 python (pytorch#1149)

Need it to get several convenience functions

* Integrates CUDA pip wheels (pytorch#1136)

* Refactors rpath to externally set var. Adds mechanism to add metadata

* Sets RUNPATH when using cudnn and cublas wheels

* Escapes dollar sign

* Fix rpath for cpu builds

Co-authored-by: atalman <atalman@fb.com>

* Uses RPATH instead of RUNPATH so that user strictly uses pypi libs (pytorch#1150)

* Binary Validation Workflow - Adding check binary script (pytorch#1127)

* Update action.yml

* Update validate-macos-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Fix check binary for arm64 (pytorch#1155)

* Fix check binary for arm64

* Update check_binary.sh

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Fix for including nvtx dll and cudart (pytorch#1156)

* Fix for invluding nvtx dll and cudart

* Fix for include nvtx

* Fix spaces

* Back out inclusion of cudart (pytorch#1157)

* Add cuda and date check to smoke test (pytorch#1145)

* shorten binary validation workflow names, so they are more readable in the HUD and GH job view (pytorch#1159)

* Fix anaconda torchaudio smoke test (pytorch#1161)

* Fix anaconda torchaudio smoke test

* Format using ufmt

* Fix whels tests for torchaudio (pytorch#1162)

* Pin condaforge version

Most recent version fails with  invalid cert error when trying to update
python

* Option to run resnet classifier on specific device

* Fix typo

`.test/smoke_test` -> `test/smoke_test`

Noticed when pushed pytorch@3b93537 and no tests were run

* Test resnet classifier on CUDA (pytorch#1163)

* [ROCm] support for rocm5.3 wheel builds (pytorch#1160)

* Updates to support rocm5.3 wheel builds (#6)

* Changes to support ROCm 5.3

* Updated as per comments

* Installing python before magma build

- In ROCm 5.3 libtorch build are failing during magma build due to
  to missing python binary so added install statement

* Move python install to libtorch/Dockerfile (#8)

* Updating the condition for noRCCL build (#9)

* Updating the condition for noRCCL build

* Updated changes as per comments

* Use MIOpen branch for ROCm5.3; Change all conditions to -eq

* Use staging branch of MIOpen for ROCm5.3

* Fix merge conflict

Fix merge conflict

Co-authored-by: Pruthvi Madugundu <pmagundu@amd.com>
Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>

* Validate python 3.11 (pytorch#1165)

* Validate python 3.11

* Validate linux binaries change

Add options

Import torchvision

Adding python 3.11 install

pass package to check nightly binaries date

Test

test

Add python 3.11 code

testing

Adding python 3.11 test

Add python 3.11 validation

Adding zlib develop install

Install zlib etc..

Adding zlib1g as well

testing

testing

Adding validate windows binary

Trying to workaround

testing

Refacor smoke test

Add import statement

fix datetime call

* Fix stripping dev

* fix import

* Strip pypi-cudnn from the version.py (pytorch#1167)

* Strip pypi-cudnn from the version.py

* small fix

* Regenerates RECORD file to reflect hash changes caused by sed'ing the version suffix (pytorch#1164)

* Add pypi cudnn package to tests (pytorch#1168)

* Add pypi cudnn package to tests

* Fix pypi installation check

* Fix pypi instructions setting

* Update DEVELOPER_DIR in build_pytorch.sh

Not sure why we are still expecting Xcode9 to be present there, update it to the same folder as wheel builds

May be fixes pytorch/pytorch#87637

* Fix to not use sccache if it's not setup properly (pytorch#1171)

* Revert "Fix to not use sccache if it's not setup properly (pytorch#1171)" (pytorch#1172)

This reverts commit 377efea.

* Remove cuda102 and cuda115 docker builds and regenerate manylinux docker (pytorch#1173)

* Rebuild manywheel

* Remove cuda102 and cuda115

* [aarch64] add mkldnn acl backend build support for pytorch cpu libary (pytorch#1104)

* Only push to Docker and Anaconda repo from main (pytorch#1175)

We currently allow push from any branch to go to Docker (and Anaconda) prod. This is a dangerous practice because it allows unfinished works to jump to prod and used by other workflows

* Release 1.13 script changes (pytorch#1177)

* Test ResNet on MPS (pytorch#1176)

After pytorch/pytorch#86954 is fixed, we should be able to test resnet on MPS

* Revert "Test ResNet on MPS (pytorch#1176)" (pytorch#1180)

This reverts commit efa1bc7.

* Add v1.13 versions

* Update CMake to 3.18, needed for C++17 compilation (pytorch#1178)

* release: separate out version suffixes for torch pypi promotion (pytorch#1179)

* Fixup wheel published to PyPI (pytorch#1181)

* Fixup wheel published to PyPI

* Update prep_binary_for_pypi.sh

* Fix folder deletion for pypi prep

Co-authored-by: Andrey Talman <atalman@fb.com>

* Update cmake version to 3.18 for libtorch docker

* Pins cuda runtime to 111.7.99 (pytorch#1182)

* Fixes cuda pypi rpaths and libnvrtc name (pytorch#1183)

* Allow ROCm minor releases to use the same MIOpen branch as the major release (pytorch#1170)

* Allow ROCm minor releases to use the same MIOpen branch as the major release

* correct logic to ensure rocm5.4 doesn't fall in wrong condition

* add 11.8 workflow for docker image build (pytorch#1186)

* Using windows runners from test-infra for validation workflows (pytorch#1188)

* Testing new windows runners

test

Testing

Testing

testing

testing

test

Test

Testing

testing

Testing

Testing

test

Test

test

testing

testing

Test

testing

test

testing

testing

testing

testing

testing

testing

test

test

testing

testing

testing

testing

Test

test

test

testing

testing

testing

testing

testing

testing

testing

testing

testing

Refactor code

* Adding details for the test-infra issue

* Update current CUDA supported matrix

* add magma build for CUDA11.8 (pytorch#1189)

* Test setting job name (pytorch#1191)

* Use official Python-3.11 tag (pytorch#1195)

* remove CUDA 10.2-11.5 builds (pytorch#1194)

* remove CUDA 10.2-11.5 builds

* remove 11.5 and 11.3 builds

* build libtorch and manywheel for 11.8 (pytorch#1190)

* build libtorch and manywheel for 11.8

* Update common/install_magma.sh

* use magma-cuda build-1 by default; remove CUDA 10.2-11.5 builds

Co-authored-by: Andrey Talman <atalman@fb.com>

* [Validation] Pass ref:main to general worker (pytorch#1197)

* Pass ref:main to general worker

* Try to pass reference to workflow

* Pass ref:main to general worker

* Test

* Pass reference as input parameter

* Make new variable not required

* Fix typo

* Add workflow for manywheel cpu-cxx11-abi (pytorch#1198)

* [Validation] Use linux_job for linux workers (pytorch#1199)

* Use linux_job for linux workers

Test

Testing

Test

testing

Tetsing

testing

Change linux binary action

test

Simplify version check

* Fix if statement

* Fix typo

* Fix cuda version check

Fix Audio and Vision version check

Add check binary to libtorch

test

test

testing

testing

testing

Testing

Testing

testing

* Use macos generic workers (pytorch#1201)

* Use macos generic workers

fix workflow

testing

Add arm64 builds

test

Remove validate binary action

* add check binary step

* fix ld_library path

* add package type

* Adding ref to validate binaries (pytorch#1204)

* ROCm5.3 nightly wheels (pytorch#1193)

* Enable ROCm5.3 nightly wheels

* Enable ROCm5.3 docker builds

* Update amdgpu repo url for ROCm5.3

* ROCm5.3 not supported on Ubuntu 18.04

* empty

* Another empty commit

* Try disabling MLIR build to shorten docker build time

* Clean up disk space

* MLIR project changed names from ROCm5.4

* Retrigger CI to get around flaky magma git access error

* One more cmake-3.18.4 update

* Use cmake-3.18 for ROCM builds

* More cmake ROCM tweaks

* cmake-3.18 installation on ROCM (take 3)

* add conda builds for CUDA 11.8 (pytorch#1205)

* Enable nightly CUDA 11.8 builds (pytorch#1206)

* enable nightly builds for CUDA 11.8

* add CUDA 11.8 version to manywheel, remove 11.3 and 11.5

* Windows CUDA 11.8 changes (pytorch#1207)

* Add continue on error to validation jobs (pytorch#1209)

* Add continue on error to validation jobs

* test

* Delete unmaintaned torchvision build scripts (pytorch#1210)

All build logic has long moved to torchvision repo and now is executed
by reusable workflow from https://github.com/pytorch/test-infra/tree/main/.github/workflows

* build_pytorch.sh replace tabs with spaces (pytorch#1211)

* Make PyTorch depend on TorchTrition (pytorch#1213)

Remove me when Triton is properly released elsewhere

* Remove smoke test script that is no longer used (pytorch#1212)

* Another tabs-to-spaces change

`s/\t/\ \ \ \ \ \ \ \ /`

* Disable continue on error (pytorch#1214)

* Add torchtrition dependency for wheels

* Make PyTorchConda depend on Triton (Take 2)

Multi-line environment variables are hard, so lets do it traditional way

* Revert "Add torchtrition dependency for wheels"

This reverts commit 475100b.

* Add TorchTrition dependency for wheels (take 2)

Now tests should be green thanks to pytorch/pytorch#90017

* Add sympy to pytorch linux dependencies

* Mitigate windows nightly build regressions

By pinning conda to 22.9.0

Fixes pytorch/pytorch#90059

* Consolidating validation scripts (pytorch#1219)

* Consolidating validation scripts

* Fix validate script name

* Correct script path

* Correct script path

* test

* testing

* testing

* testing

* testing

* test

* test

* test

* testing

* testc

* test hook

* adding wondows use case

* windows use case

* test

* testing

* Windows fixes

* more fixes

* Add package type

* testing more

* Truncate RECORD instead of delete (pytorch#1215)

* Refactor and fix windows smoke tests (pytorch#1218)

* Fix windows smoke test

* Fix first if statement

* Refactor not to cal install nightly package

* Revert "Refactor not to cal install nightly package"

This reverts commit ac580c8.

* Fix pip install command remove cu102

* Refacor the conda installation

* Add cuda profiler apu to cuda install 11.8 (pytorch#1221)

* Update CUDA upgrade runbook to mention subpackages changes

As per following doc: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

* conda: Add CUDA_HOME, cuda binaries to path (pytorch#1224)

* Refactor macos-arm64 into separate group (pytorch#1226)

* Adding libcufft constraint (pytorch#1227)

* Adding libcufft constraint

* Adding rest of the dependencies

* Advance build number in pytorch-cuda (pytorch#1229)

* Make sympy mandatory dependency of PyTorch

Should fix 
https://github.com/pytorch/audio/actions/runs/3684598046/jobs/6234531675

* Revert me later: Fix conda package smoke tests

* Install `sympy` via pip rather than conda

Needs to be reverted as well

* Refactor smoke tests to configure module included in the release (pytorch#1223)

* Changes to prep for pypi script for release 1.13.1 (pytorch#1231)

* PyPi binary validation and size check (pytorch#1230)

* Validate binary size

* Validate binary size linux_job

* evaluate the fix from pytorch#1231

* Add an optional artifact upload, consolidate fixes to `prep_binary_for_pypi.sh`

* Adding new workflow to call from domain libraries to validate on domain libraries such as text (pytorch#1234)

* Testing new workflow

Fix naming

fix input

* Changed comments

* Ad ability to call validate domain library manually (pytorch#1235)

* Adding test for validate dm workflow and fixing dm validation workflow (pytorch#1236)

* Test manywheel packages (pytorch#1239)

Change only docker file

* Bump scripts in release (pytorch#1241)

* release: Strip whitespace from version_with_suffix (pytorch#1242)

* Cuda 11.8 and removal of dev packages (pytorch#1243)

* Adding more OS's to validate domain library workflow (pytorch#1238)

* Adding more OS's to validate domain library workflow

* conda and wheel togeather

* add macos workflows

* fix workflow

* Add target os variable to windows validation (pytorch#1244)

* Update MKL to 2022.1 (pytorch#1245)

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note, that in order to get maximum perf on AMD CPUs one needs to compile and LD_PRELOAD following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Remove invalid git option (pytorch#1246)

* Revert "Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)" (pytorch#1247)

This reverts commit ee59264.

* Add with_cuda flag (pytorch#1249)

* Add GPU architecture env variables (pytorch#1250)

* Add cuda to jobname for validate domain library (pytorch#1252)

* Remove pylief dependency (pytorch#1255)

* Fix PEP503 for packages with dashes

* Rename `torchtriton` to `pytorch-triton`

Companion change for pytorch/pytorch#91539

* s3_management: Hide specific packages between dates (pytorch#1256)

* s3_management: Pin requirements.txt

Packaging got updated and that's not what we want

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: except ValueError

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: Use the correct format for strptime

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: Bump bad dates to october 17th (pytorch#1257)

* s3_management: hide torchtriton (pytorch#1258)

* s3_management: Add PACKAGE_ALLOW_LIST for indices (pytorch#1259)

* s3_management: Bump bad date end to 12/30 (pytorch#1260)

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1248)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Fixes logic for 11.8 and adds missing names for DEPS_SONAME

* s3_management: Account for underscore packages

pytorch-triton is listed as pytorch_triton

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: simplify allowlist, correct underscores

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* Fix cuda version in nightly (pytorch#1261)

* Adding py311 validations (pytorch#1262)

* Use MATRIX_* variables instead of redeefining new var each time (pytorch#1265)

* Fix validation domain library (pytorch#1266)

remove ref main

fix workflow

more refactor

* Nightly: do test install with the dependencies better and skip CUDA tests on cpu only box (pytorch#1264)

* Refactor PyTorch wheel and libtorch build scripts for ROCm (pytorch#1232)

* Refactor wheel and libtorch build scripts (#7)

* Update to so patching for ROCm

Wildcard used in grep to grab the actual numbered so file referenced
in patchelf. This allows the removal of specifying the so number in
DEPS_LIST & DEPS_SONAME

This commit also adds the functionality for trimming so names to
build_libtorch.sh from build_common.sh

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
       - The .so versions are copied and the patching code will fix the links to point to this version
 - No longer required to specify paths for ROCm libraries allowing the removal of the large switch
       - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS specific libraries
       - Programatically extract file name from the path
 - Automatically extract Tensile/Kernels files for the architectures specified in PYTORCH_ROCM_ARCH
   and any non-arch specific files e.g. TensileLibrary.dat

* rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>

* add sm_90 to CUDA11.8 builds (pytorch#1263)

* add sm_90 to CUDA11.8 builds

* Manually invoke bash for Miniconda

* Revert "add sm_90 to CUDA11.8 builds (pytorch#1263)" (pytorch#1275)

This reverts commit e1453a4.

* Set ubuntu distribution correctly for ROCm5.3 and above (pytorch#1268)

* Fix unbound variable error (pytorch#1276)

Regression introduced (and ignored) by pytorch#1262
Test plan:
```
% bash -c 'set -u; if [[ -z "${FOO}" ]]; then echo "bar"; fi' 
bash: FOO: unbound variable
(base) nshulga@nshulga-mbp builder % bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'
bar
(base) nshulga@nshulga-mbp builder % FOO=1 bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'

```

* Manually invoke bash for miniconda (pytorch#1277)

Fixes build issues failing with:
```
./Miniconda3-latest-Linux-x86_64.sh: 438: ./Miniconda3-latest-Linux-x86_64.sh: [[: not found
```
as seen in e.g.: pytorch#1271

* Fix perm

Which somehow got changed by pytorch@62103bf

* add sm_90 to CUDA11.8 builds (pytorch#1278)

* libtinfo.so version update and logic fix for ROCm libtorch (pytorch#1270)

* Use libtinfo.so.6 for Ubuntu 2004

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6

* Looks like it is not used anywhere. (pytorch#1273)

* Build Windows binaries with Visual Studio 2022 Build Tools (pytorch#1240)

* Build Windows binaries with Visual Studio 2022 Build Tools

* Unify casing in Batch files, remove VS 2017 installation

* Remove VS 2017 Conda scripts, unify casing in conda Batch scripts, minor Conda scripts tweaks

* Slim down `pytorch-cuda`

It should only contain runtime dependencies that PyTorch+domain
libraries depend on, namely:
 - cudart
 - cublas
 - cusparse
 - cufft
 - curand
 - nvtx
 - nvrtc
 - nvjpeg (for TorchVision)

This removes dependencies on NVCC, build/debug tools, etc which are not
needed for running the pytorch

Test Plan:
  `conda create -n tmp -c nvidia -c malfet cuda-toolkit==11.7` and
observe that only relevant packages are installed

Fixes pytorch/pytorch#91334

* [BE] Delete `unicode-flags` build options (pytorch#1284)

There were relevant only for Python<=3.3

* [BE] Define `openssl_flags` (pytorch#1285)

Rather than have two invocations of `./configure`

* Build with `--enabled-shared` if `patchelf` is found (pytorch#1283)

This is needed to make `manylinux-wheel` images usable for building new Triton binaries.

Test plan: Build docker and verify that following `CMakeLists.txt` finishes successfully:
```
cmake_minimum_required(VERSION 3.6)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
message(WARNING Executable ${Python3_EXECUTABLE})
message(WARNING IncludeDirs ${Python3_INCLUDE_DIRS})
message(WARNING Libraries ${Python3_LIBRARIES})
```

* Update cudnn to 8.7.0.84 for CUDA 11.8 builds (pytorch#1271)

* update cudnn to 8.7.0.84 for CUDA 11.8 builds

* workaround for pytorch#1272

* Revert "workaround for pytorch#1272"

This reverts commit c0b10d8.

* update cudnn==8.7.0.84 for windows

* [BE] Remove references to Python<3.6 (pytorch#1287)

* Upgrade desired python versoin to 3.8

For libtorch builds

* Fix how libtorch picks the python version

* Tweak conda builds to support 3.11

Add `-c malfet` when building for 3.11 (though perhaps it's better to
move numpy to pytorch channel)

Tweak some build time dependencies

* Fix typo

* Skip triton dependency for 3.11 CUDA builds

* Update build-number to 3

* Add ability to override cuda archs for conda (pytorch#1282)

* [ROCm] reduce disk space used in image (pytorch#1288)

Fixes pytorch#1286

* Extend MacOS/Windows builds to 3.11

By installing dependencies from pip
Should be a no-op for <=3.10

* ci: Migrate to checkout@v3 (pytorch#1290)

checkout@v2 is deprecated moving to checkout@v3

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* Fix typo

* Add 3.11 option for Windows builds

* Add python-3.11 download location for windows

* Add pypi with cudnn package test (pytorch#1289)

* Add pypi with cudnn package test

* Add pypi with cudnn package test

* test

* test

* More pypi cudnn changes

* test

* Fix pipy smoke test

* Remove debug comments

* Delete some ancient checks for MacOS builds

As we no longer build for Python-2.7 or 3.5

* Add libnvjpeg-dev package as fallback (pytorch#1294)

* Add libnvjpeg-dev package as fallback

* Move libnvjpeg and libnvjpeg-dev to required packages

* Update conda/pytorch-cuda/meta.yaml

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Upgrade nightly wheels to rocm5.4.2 (pytorch#1225)

* Upgrade nightly wheels to rocm5.4

* Adding graphic architectures for ROCm 5.4

* Updated to use ROCm5.4.1

* Updated to use ROCm5.4.2

* Fixed syntax error

* Perform build on image with magma and miopen preinstalled

* Add dev packages for windows pytorch-cuda dependencies (pytorch#1295)

* Add dev packages for windows dependencies

* Adding architecture dependent builds

* Add notes around windows

* fix typo

* Bumping version to v3

* rocm libtorch prebuild magma; fix manylinux cmake version (pytorch#1296)

* Add manywheel:cpu-cxx11-abi checkup for check_binary.sh (pytorch#1251)

* Remove with_py311 flag (pytorch#1301)

* rocm manylinux now uses devtoolset 9 (pytorch#1300)

* fix ACL_ROOT_DIR setting and upgrade the ACL version to 22.11 (pytorch#1291)

* Add `-c malfet` for Windows builds as well

* Set torch._C._PYBIND11_BUILD_ABI version check only for GLIBCXX_USE_CXX11_ABI=0 (pytorch#1303)

* Adding limit windows builds logic (pytorch#1297)

* Adding limit windows builds logic

* Remove empty space

* Simplify mkl build dependencies (pytorch#1305)

On Linux and Mac, PyTorch must be built against `mkl=2020.x` in order to be compatible with both `mkl-2021` and `mkl-2022`, which added `.so.1` and `.so.2` files respectively; a binary linked against one of those versions would be incompatible with the newer/older toolchain.

This is not an issue on Windows, as all mkl binaries there end with simple `.dll`
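
As a sketch of the compatibility issue (the library path is a placeholder, and this assumes MKL is linked dynamically in the binary being inspected), one can check which `libmkl_rt` soname a built library records as a dependency:
```
readelf -d /path/to/libtorch_cpu.so | grep -i mkl
# per the note above, mkl-2021 and mkl-2022 ship libmkl_rt.so.1 and
# libmkl_rt.so.2 respectively, so a NEEDED entry pinned to one versioned
# soname cannot be satisfied by an environment providing only the other.
```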

* "Fix" PyTorch CPU conda testing

It's still horribly broken, but make it a bit better by not installing
pytorch from the default anaconda channel (which installs 1.12.1, which does
not have any of the dependencies the 2.0 dev package is supposed to have)

For example, see this runlog
https://github.com/pytorch/pytorch/actions/runs/4155371267/jobs/7189101147

* Update torch._C._PYBIND11_BUILD_ABI version check (pytorch#1306)

* Skip tests for manywheel built with _GLIBCXX_USE_CXX11_ABI=1

* Put back smoke test label (pytorch#1310)

* [aarch64] add support for torchdata wheel building (pytorch#1309)

* Python 3.11 validation workflow tests (pytorch#1304)

* Test windows py311

* Nightly binaries

* Fix py311 tests

* fix python calling

* Revert "Nightly binaries"

This reverts commit cbf80ca.

* add a scheduled workflow for the nightly pypi binary size validation (compliments pytorch/test-infra#2681) (pytorch#1312)

* Add regression test for pytorch/pytorch#94751

* Add 3.11 and `--pytorch-only` options

* Add `lit` to list of allowed packages

As it is now a mandatory (albeit spurious) dependency of pytorch-triton

See https://pypi.org/project/lit/ for more details

* s3: Allow tar.gz as an accepted file extension (pytorch#1317)

* Changes for Python 3.11 and smoke Test RC cut (pytorch#1316)

* Smoke Test RC cut

* Validate binaries 3.11

* test

* Smoke test binaries

* Fix pytorch-cuda chan download

* Remove temp change

* Make sure we don't use GPU runners for any of libtorch validations (pytorch#1319)

* Make sure we don't use GPU runners for any of libtorch

* Make sure we don't use GPU runners for any of libtorch

* s3: Add pytorch_triton_rocm to index (pytorch#1323)

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>

* s3: Add tqdm package req for text (pytorch#1324)

* Add `--analyze-stacks` option

That, using `git rev-base`, prints the total number of stacks, and their
average, mean, and max depth

At the time of submission here is top 10 ghstack uses of pytorch:
```
ezyang has 462 stacks max depth is 15 avg depth is 1.70 mean is 1
awgu has 240 stacks max depth is 28 avg depth is 4.30 mean is 1
peterbell10 has 146 stacks max depth is 7 avg depth is 1.84 mean is 1
zou3519 has 128 stacks max depth is 7 avg depth is 1.98 mean is 1
jerryzh168 has 113 stacks max depth is 16 avg depth is 1.45 mean is 1
bdhirsh has 111 stacks max depth is 7 avg depth is 1.85 mean is 2
wconstab has 108 stacks max depth is 7 avg depth is 2.15 mean is 1
SherlockNoMad has 99 stacks max depth is 4 avg depth is 1.24 mean is 1
zasdfgbnm has 80 stacks max depth is 11 avg depth is 2.52 mean is 6
desertfire has 73 stacks max depth is 3 avg depth is 1.14 mean is 1
```

* Add filelock and networkx deps (pytorch#1327)

To match dependencies for wheel files defined in https://github.com/pytorch/pytorch/blob/ed1957dc1989417cb978d3070a4e3d20520674b4/setup.py#L1021-L1024

* Remove building magma from source

* Revert

* Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)

* Upgrade cmake version to 3.22.1 to build triton

* Pin patchelf version

* Fix comment typo

* Smoke test for cuda runtime errors (pytorch#1315)

* Add test for cuda runtime errors

* Add cuda exception smoke test

* Move cuda runtime error to end

* Move cuda runtime error to end

* Address comments

* Address comments

* Add Jinja2 Dependency (pytorch#1332)

As part of the effort to fix pytorch/pytorch#95986

* Add MarkupSafe to S3 Index (pytorch#1335)

* Remove rocm5.1 rocm5.2 from libtorch Dockerfile

* [aarch64] Adding CI Scripts to build aarch64 wheels (pytorch#1302)

* add aarch64 ci scripts

* added readme. get branch from /pytorch

* Add smoke tests conv,linalg,compile. And better version check. (pytorch#1333)

* Add smoke tests conv,linalg,compile

* Add version check

* Fix typo

Fix version check

Add not

* Add exception for python 3.11

* fix typo

* Try to exit after CUDA Runtime exception

* Restrict crash test only to conda

* Restrict crash test only to conda

* Fix tests

* Turn off cuda runtime issue

* tests

* more tests

* test

* remove compile step

* test

* disable some of the tests

* testing

* Remove extra index url

* test

* Fix tests

* Additional smoke tests

Remove release blocking changes

* Aarch64 changes for PyTorch release 2.0 (pytorch#1336)

* Aarch64 changes for PyTorch release 2.0

* Fix spacing

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Aarch64 build py3.11 fix (pytorch#1341)

* Fix nightly smoke test (pytorch#1340)

* Fix nightly smoke test

* Fix nightly builds

* Release 2.0 release scripts changes (pytorch#1342)

* Release 2.0 release scripts changes

* Release script modifications

* Add more packages to allow list (pytorch#1344)

* Add `jinja2` dependency to conda package

To be consistent with wheels, see
https://github.com/pytorch/pytorch/95961

* Restrict jinja to py 3.10 or less (pytorch#1345)

* Update `torchtriton` version to 2.1.0

* And update triton version here as well

* added smoke test for max-autotune (pytorch#1349)

Co-authored-by: agunapal <agunapal@berkeley.edu>

* Refactor conda backup script (pytorch#1350)

* Refactor conda backup

* Fix space

* Minor style

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)" (pytorch#1351)

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)"

This reverts commit 18c5017.

* Selective revert

* Get cmake from pip

* Use 3.18.2 from conda

* Release script changes, add more release dependencies, bump version for aarch64 builds (pytorch#1352)

* Release script changes

* Add Jinja2 dependency

* Fix typo

* Add pytorch conda dependencies (pytorch#1353)

* Add latest dependencies for pytorch 2.0 release (pytorch#1357)

* Fix typo

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit d7f2a7c.

* [aarch64] update readme with the "--enable-mkldnn" option (pytorch#1362)

This needs to be enabled for official wheel building.

* Replace `--enable-mkldnn` with `--disable-mkldnn`

Also, change default to ubuntu-20.04

* Update AMIs

Using following images:
```
% aws ec2 describe-images --image-ids ami-078eece1d8119409f ami-052eac90edaa9d08f ami-0c6c29c5125214c77 --query "Images[].[ImageId, Description]"
[
    [
        "ami-078eece1d8119409f",
        "Canonical, Ubuntu, 18.04 LTS, arm64 bionic image build on 2023-03-02"
    ],
    [
        "ami-0c6c29c5125214c77",
        "Canonical, Ubuntu, 22.04 LTS, arm64 jammy image build on 2023-03-03"
    ],
    [
        "ami-052eac90edaa9d08f",
        "Canonical, Ubuntu, 20.04 LTS, arm64 focal image build on 2023-03-01"
    ]
]
```

* Update tags for domain libraries

* Add PyTorch version pinning to release wheels

* Fix flake8

* [BE] Introduce `build_domains` function

And call it to rebuild only domains if torch wheel is available

* Switch deprecated ubuntu-18.04 runner to ubuntu-latest (pytorch#1334)

* Switch deprecated ubuntu-18.04 runner to self-hosted 2xlarge

* Leave build-nvidia-docker for now

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Use ephemeral runners

* Use ubuntu-latest

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Switch from latest to 22.04 to pin the version

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Introduce optional --build-number parameter

* Revert me later: Fix conda package smoke tests

(cherry picked from commit d7f2a7c)

Alas, it's still used and causes nightly build failures

* Fix aarch64 torchvision build (pytorch#1363)

* Fix torchvision image extension compilation

* Fix torchvision image extension compilation

* Set enable_mkldnn to pypi build

* Remove unused `enable_mkldnn` for configure_system

* [aarch64] Try to link statically with png/jpeg

Also, add testing (which is currently broken)

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit ce427de.

* [AARCH64] Fix image.so wheel

By adding explicit libz dependency

* [AARCH64] Pass `BUILD_S3` to torchdata

To make build consistent with Linux-x86_64

* Revert "[AARCH64] Pass `BUILD_S3` to torchdata"

This reverts commit ae8e825.

As it does not want to be built on aarch64

* Add portalocker (pytorch#1364)

* [BE] Error handling in build_aarch64_wheel

I've noticed that build errors in `build_ArmComputeLibrary` would be
ignored, as a semicolon is used between the commands instead of `&&`.
Also, replace the nightly version evaluation, which relied on torch, with one
that relies on the individual libraries.
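
A minimal sketch of the difference (the commands are placeholders, not the actual build steps):
```
false; echo "runs anyway"     # the failure of the first command is swallowed
false && echo "never runs"    # the chain stops and the non-zero status propagates
```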

* [AArch64] Pass `args.instance_type` to `start_instance`

* use c++17 when building windows smoke tests (pytorch#1365)

Summary:
We are seeing failures during CI dealing with some headers that have
nested namespaces. This is expected to remedy them.

One such example:
https://github.com/pytorch/pytorch/actions/runs/4510336715/jobs/7942660912

Test Plan: Test this with CI.

---------

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Co-authored-by: Andrey Talman <atalman@fb.com>
Co-authored-by: andysamfb <111015134+andysamfb@users.noreply.github.com>
Co-authored-by: izaitsevfb <108101595+izaitsevfb@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: Syed Tousif Ahmed <syed.ahmed.emails@gmail.com>
Co-authored-by: Syed Tousif Ahmed <syeahmed@nvidia.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Wei Wang <109318740+weiwangmeta@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Co-authored-by: Pruthvi Madugundu <pmagundu@amd.com>
Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
Co-authored-by: ptrblck <ptrblck@users.noreply.github.com>
Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com>
Co-authored-by: Greg Roodt <groodt@gmail.com>
Co-authored-by: Eli Uriegas <eliuriegas@fb.com>
Co-authored-by: Dmytro Dzhulgakov <dzhulgakov@users.noreply.github.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Radek Bartoň <blackhex@post.cz>
Co-authored-by: divchenko <divchenko@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Bo Li <110066325+BLOrange-AMD@users.noreply.github.com>
Co-authored-by: Mike Schneider <104035434+xncqr@users.noreply.github.com>
Co-authored-by: Ankith Gunapal <agunapal@ischool.Berkeley.edu>
Co-authored-by: agunapal <agunapal@berkeley.edu>
Co-authored-by: dagitses <dagitses@users.noreply.github.com>