Basic math operations produce a "floating point exception" #89817

Open

timothygebhard opened this issue Nov 29, 2022 · 26 comments
Labels

module: cpu - CPU specific problem (e.g., perf, algorithm)
module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
needs reproduction - Someone else needs to try reproducing the issue given the instructions. No action needed from user
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


timothygebhard commented Nov 29, 2022

🐛 Describe the bug

When I try to run the following simple piece of code:

import numpy as np
import torch

np.random.seed(42)

x = torch.from_numpy(np.random.rand(100)).float()
print(x)

exp_x = torch.exp(x)
print(exp_x)

I get a floating point exception that kills my Python interpreter:

(venv) [tgebhard@g108] ~ % python test.py                                                                                                                                                                   
tensor([0.3745, 0.9507, 0.7320, 0.5987, 0.1560, 0.1560, 0.0581, 0.8662, 0.6011,
        0.7081, 0.0206, 0.9699, 0.8324, 0.2123, 0.1818, 0.1834, 0.3042, 0.5248,
        0.4319, 0.2912, 0.6119, 0.1395, 0.2921, 0.3664, 0.4561, 0.7852, 0.1997,
        0.5142, 0.5924, 0.0465, 0.6075, 0.1705, 0.0651, 0.9489, 0.9656, 0.8084,
        0.3046, 0.0977, 0.6842, 0.4402, 0.1220, 0.4952, 0.0344, 0.9093, 0.2588,
        0.6625, 0.3117, 0.5201, 0.5467, 0.1849, 0.9696, 0.7751, 0.9395, 0.8948,
        0.5979, 0.9219, 0.0885, 0.1960, 0.0452, 0.3253, 0.3887, 0.2713, 0.8287,
        0.3568, 0.2809, 0.5427, 0.1409, 0.8022, 0.0746, 0.9869, 0.7722, 0.1987,
        0.0055, 0.8155, 0.7069, 0.7290, 0.7713, 0.0740, 0.3585, 0.1159, 0.8631,
        0.6233, 0.3309, 0.0636, 0.3110, 0.3252, 0.7296, 0.6376, 0.8872, 0.4722,
        0.1196, 0.7132, 0.7608, 0.5613, 0.7710, 0.4938, 0.5227, 0.4275, 0.0254,
        0.1079])
zsh: floating point exception  python test.py
(venv) [tgebhard@g108] ~ %

The problem also occurs for other mathematical operations such as torch.log() or torch.cos(). It seems like it only happens if the size of the input tensor is at least 100, though.
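
Since the crash kills the whole interpreter, one way to probe which ops and sizes trigger it (just a sketch; the size threshold of 100 is what I observe here and may differ elsewhere) is to run each case in a separate subprocess and look at the return code:

```
import subprocess
import sys

# Probe which op/size combinations die with SIGFPE by running each case in
# its own interpreter, since the crash would otherwise kill this script too.
snippet = (
    "import numpy as np, torch; "
    "x = torch.from_numpy(np.random.rand({n})).float(); "
    "getattr(torch, '{op}')(x)"
)

for op in ["exp", "log", "cos"]:
    for n in [10, 50, 99, 100, 200]:
        ret = subprocess.run(
            [sys.executable, "-c", snippet.format(op=op, n=n)],
            capture_output=True,
        ).returncode
        # A negative return code is the killing signal (-8 == SIGFPE on Linux).
        print(f"torch.{op}(x) with n={n}: return code {ret}")
```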

Moreover, the issue only occurs on some machines, under some specific circumstances: my local machine runs the code above without any problem, whereas one of the machines at work reproducibly gives the error above, but only if I request at least 14 CPU cores (it's a batch queue system based on HTCondor). It might, therefore, be the case that only this particular machine has a problem. Any pointers for debugging this are greatly appreciated! 🙂

Versions

Information about the Python environment:

PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-80-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No devices found.
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorch-lightning==1.8.3.post0
[pip3] torch==1.13.0
[pip3] torchmetrics==0.10.3
[conda] Could not collect

Information about the machine where the problem occurs (output of lscpu):

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          256
On-line CPU(s) list:             0,1,9-16,26-31
Off-line CPU(s) list:            2-8,17-25,32-255
Thread(s) per core:              0
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7662 64-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         1499.941
CPU max MHz:                     2000.0000
CPU min MHz:                     1500.0000
BogoMIPS:                        3999.98
Virtualization:                  AMD-V
L1d cache:                       2 MiB
L1i cache:                       2 MiB
L2 cache:                        32 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-63,128-191
NUMA node1 CPU(s):               64-127,192-255
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

jgong5 (Collaborator) commented Nov 29, 2022

@timothygebhard It seems we are not able to reproduce it locally on an Intel CPU. To further narrow down the issue, are you able to try a sanitizer build and run your test with the sanitizer?

option(USE_ASAN "Use Address+Undefined Sanitizers" OFF)

@albanD added the needs reproduction, module: crash, module: cpu, and triaged labels Nov 29, 2022
@timothygebhard (Author)

@jgong5 I tried all day to build PyTorch with ASAN as described here (the guide might need an update, BTW), but I did not succeed. Ultimately, it always crashes with some clang: error: linker command failed with exit code 1, and the undefined-reference messages that it keeps complaining about seem related to OpenMP (e.g., /usr/bin/ld: lib/libtorch_cpu.so: undefined reference to `omp_in_parallel'), even though the build_with_asan() command from the guide above sets USE_OPENMP=0 🤷‍♂️

When I talked to our cluster admin at work, he suspected that the problem might be related to MKL, possibly in combination with an AMD CPU. In that case, I guess I should try to make sure to compile the sanitizer build with the same MKL version as the PyTorch version that's being shipped via pip?

@zadaianchuk

A similar issue seems to have occurred more than a year ago:
#66247 (comment)

Are there any plans to update to an MKL version that potentially doesn't have this bug in combination with AMD CPUs?

@timothygebhard (Author)

Running echo 'run' | gdb --args python test.py seems to confirm that it's indeed an MKL issue:

Thread 1 "python" received signal SIGFPE, Arithmetic exception.
0x000015551042ec47 in mkl_vml_serv_GetMinN () from /lustre/home/tgebhard/.virtualenvs/venv/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
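
(Not MKL-specific, but to get at least a Python-level traceback when a fatal signal like this kills the interpreter, Python's built-in faulthandler can be enabled; a minimal sketch, to be placed at the top of the script:)

```
import faulthandler

# faulthandler installs handlers for fatal signals (SIGSEGV, SIGFPE, SIGABRT,
# SIGBUS, SIGILL) and dumps the Python traceback before the process dies.
faulthandler.enable()

import numpy as np
import torch

x = torch.from_numpy(np.random.rand(100)).float()
torch.exp(x)  # if this dies with SIGFPE, the Python traceback is printed first
```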

@piraka9011

I have also seen the same issue in #66247 and had to resort to building my own PyTorch container with USE_MKL=OFF in order to resolve this issue.
My stack trace was also similar to what was reported there.

This does not happen on Intel CPUs and occurs on specific AMD CPUs.
For example, it works on my local AMD Ryzen 7 3700X.
But when I deploy it to a cluster with EPYC Rome or Threadripper CPUs, I get a SIGFPE.

jgong5 (Collaborator) commented Dec 16, 2022

@timothygebhard You mentioned that it is related to the MKL version that the PyTorch release package is built with. May I know which particular MKL version it is? Have you tried with a newer version of MKL?

@mseitzer (Contributor)

This issue happens for me with pre-built PyTorch packages on versions 1.11, 1.12, 1.13, and 2.0 (nightly). I have not tried earlier versions.

The relevant line from torch.__config__.show() (the same for all tested PyTorch versions):

  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
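
(In case it helps others check their own build, a minimal way to print just that line; this assumes the substring "Math Kernel Library" appears in the string returned by torch.__config__.show():)

```
import torch

# Print only the MKL line from the build configuration.
for line in torch.__config__.show().splitlines():
    if "Math Kernel Library" in line:
        print(line.strip())
```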

According to #66247 (comment), the issue disappears with MKL 2021.4, although I have not tried this.

It would be great to upgrade the MKL version that standard PyTorch is built with, as this essentially renders PyTorch unusable on machines with the affected CPUs.

@timothygebhard (Author)

@jgong5 The MKL version is the one mentioned by @mseitzer. I haven't tried to build PyTorch myself using a newer version of MKL yet; however, I seem to recall that when I did build it without MKL (using OpenBLAS instead), the error was gone.

jingxu10 (Collaborator) commented Dec 16, 2022

I cannot reproduce this issue on either AWS M5a or M6a 4xlarge EC2 instances with the nightly build:
pip3 install --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cpu
Would you let us know which instance type could reproduce this issue?

Alternatively, could you try compiling PyTorch from source? Instructions are available at https://github.com/pytorch/pytorch#from-source
conda install commands should install the latest version of MKL.

@piraka9011

Here is the output of lscpu from an example machine where this happens:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0,6,11-15,22,27-31
  Off-line CPU(s) list:  1-5,7-10,16-21,23-26
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7F52 16-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  16
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         3500.0000
    CPU min MHz:         2500.0000
    BogoMIPS:            6999.96
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   512 KiB (16 instances)
  L1i:                   512 KiB (16 instances)
  L2:                    8 MiB (16 instances)
  L3:                    256 MiB (16 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-31
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; LFENCE, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

I don't think this exact kind of machine is available on AWS, but it is on coreweave.com, for example.

@jingxu10 (Collaborator)

Could you try compiling PyTorch from source? Instructions are available at https://github.com/pytorch/pytorch#from-source
The conda install commands should install the latest version of MKL. If the latest MKL solves this issue, compiling from source yourself should work. Could you give it a try in your environment?

soumith (Member) commented Dec 17, 2022

This issue happens for me with pre-built Pytorch packages on versions 1.11, 1.12, 1.13, 2.0 (nightly)

@malfet any reason we are holding back on MKL 2020 for these packages, instead of just upgrading to the latest MKL?
You touched it last in 2020 haha, but these are the relevant lines: https://github.com/pytorch/builder/blob/main/common/install_mkl.sh#L6-L13
They build our docker image that builds the wheels, and the docker image in turn statically links to this MKL. Upgrading this and rebuilding the docker image would likely solve this.

malfet (Contributor) commented Dec 17, 2022

@malfet any reason we are holding back on MKL 2020 for these packages, instead of just upgrading to the latest MKL?

MKL 2020 is the last one that allows AMD CPU users to opt into AVX2-accelerated kernels using an environment variable. Later ones require injecting a symbol into the global namespace.

Here is an old issue (dating back to 2020) that discusses perf problems associated with updating to a newer MKL: pytorch/builder#504

But stability is more important than perf, so imo we should update to the one that does not crash.

@malfet malfet self-assigned this Dec 17, 2022
malfet (Contributor) commented Dec 17, 2022

Here is the output of lscpu from an example machine of where this happens

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  32
  On-line CPU(s) list:   0,6,11-15,22,27-31
  Off-line CPU(s) list:  1-5,7-10,16-21,23-26
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC 7F52 16-Core Processor
    CPU family:          23
    Model:               49
    Thread(s) per core:  2
    Core(s) per socket:  16

I cannot repro it on AWS C5a.4xlarge, which has the same CPU family and model:

$ lscpu 
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7R32

@piraka9011 what is the type of instance you've allocated on coreweave.com? (As I could not repro it on a 16 vCPU EPYC Rome instance, i.e. one running an AMD EPYC 7402P 24-Core Processor)

@piraka9011

@jingxu10 I've been compiling from source with the USE_MKL=OFF flag.
I will try compiling with the flag removed and the latest MKL installed via conda and report back.

@malfet So I don't think there's an easy way to get the exact same config on CoreWeave using their standard virtual server offering, but I will try to find a config to reproduce the instance exactly.

malfet (Contributor) commented Dec 18, 2022

@piraka9011 I'm going to switch to 2022.2.1 in the nightlies and will run CPU performance tests afterwards (on Intel CPUs it showed moderate perf gains, and smoke tests I've run on AMD ones showed they are no worse, but much better if the LD_PRELOAD trick is used)

malfet added a commit to pytorch/builder that referenced this issue Dec 18, 2022
Update MKL to 2022.1

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note, that in order to get maximum perf on AMD CPUs one needs to compile and LD_PRELOAD following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```
@timothygebhard (Author)

I can confirm that the problem is still present in the current nightly build (2.0.0.dev20221216+cpu, which still uses MKL Version 2020.0.0 Product Build 20191122).

I also finally got around to building PyTorch from source (with conda; see footnote 1), which gave me version 2.0.0a0+git212873c, including a newer version of MKL:

Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications

Unfortunately, even with this version, the SIGFPE issue is still present on my machine.

Footnotes

  1. As a side note, building PyTorch from source took well over 24 hours for me (on a machine with 32 cores), so even if that had fixed the issue, I am not sure how feasible that would be for most people who just want to use PyTorch?

@jingxu10 (Collaborator)

@timothygebhard The latest MKL is 2022.2.1, but the one you got was 2021.4. Since @malfet has enabled 2022.2.1 in the nightly builds, you can probably try the ones later than 20221218. @malfet Any idea which one has 2022.2.1 enabled?

malfet (Contributor) commented Dec 18, 2022

Unfortunately, even with this version, the SIGFPE issue is still present on my machine.

@timothygebhard would you be comfortable attaching gdb/lldb to the process running PyTorch and sharing the exact instruction/FPU registers when the exception occurs? (I can provide all the instructions on how to do it if needed)

  1. As a side note, building PyTorch from source took well over 24 hours for me (on a machine with 32 cores), so even if that had fixed the issue, I am not sure how feasible that would be for most people who just want to use PyTorch?

Hmm, this shouldn't be the case, even if you build with CUDA support for all possible architectures. For instance, a CPU build of PyTorch takes less than 30 min on my rather antiquated laptop. Have you installed ninja? If not, have you passed the MAX_JOBS parameter to ensure the build runs in parallel rather than sequentially?

@timothygebhard (Author)

@jingxu10 I just installed the 2.0.0.dev20221218+cpu nightly, which indeed was built with the 2022.2 version of MKL. I am happy to report that with this version, the problem seems resolved — my minimal examples run without causing a floating point exception 🙂

@malfet Sure, I can give it a try! I just ran gdb -ex r --args python test.py for a PyTorch version using MKL 2020, and it resulted in:

Thread 1 "python" received signal SIGFPE, Arithmetic exception.
0x0000155488495c47 in mkl_vml_serv_GetMinN () from /lustre/home/tgebhard/.virtualenvs/venv/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

Is this helpful / what you were looking for, or what else should I run?

Regarding the long build time: I did install ninja via conda (as described in the "Install dependencies" command here), but I did not explicitly set MAX_JOBS. I can try again if you think that was the issue?

malfet (Contributor) commented Dec 19, 2022

@timothygebhard can you run the following commands, x/4i $rip and info all-registers, and share the output here?

As for the build time, ninja should auto-detect the number of CPU cores and parallelize automatically. Do you mind sharing the .ninja_log file from the build folder? That would give me an idea of the parallelism/build times you've observed.
I believe 16-core builders are used for the nightly builds of PyTorch, and they finish in under 4 hours for all GPU architectures. If one builds for oneself, it's not necessary to build for all arches, and this way builds are much faster.

@timothygebhard (Author)

Here is the output from those two commands (I hope I understood correctly how to use them):

(gdb) x/4i $rip
=> 0x15548827bc47 <mkl_vml_serv_GetMinN+135>:   idiv   %esi
   0x15548827bc49 <mkl_vml_serv_GetMinN+137>:   cvtsi2sd %eax,%xmm5
   0x15548827bc4d <mkl_vml_serv_GetMinN+141>:   mov    %ecx,%eax
   0x15548827bc4f <mkl_vml_serv_GetMinN+143>:   pxor   %xmm0,%xmm0
(gdb) info all-registers
rax            0x3b                59
rbx            0x7ffffffdc30c      140737488208652
rcx            0x3b                59
rdx            0x0                 0
rsi            0x0                 0
rdi            0x7ffffffdc30c      140737488208652
rbp            0x7ffffffdc360      0x7ffffffdc360
rsp            0x7ffffffdc2b0      0x7ffffffdc2b0
r8             0x0                 0
r9             0x15549ace7d20      23453118659872
r10            0x501ca5f           84003423
r11            0x56                86
r12            0x501ca00           84003328
r13            0x3e8               1000
r14            0x7fffffffc430      140737488340016
r15            0x3e8               1000
rip            0x15548827bc47      0x15548827bc47 <mkl_vml_serv_GetMinN+135>
eflags         0x10206             [ PF IF RF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
st0            1.490116119384765625e-08 (raw 0x3fe58000000000000000)
st1            0                   (raw 0x00000000000000000000)
st2            0                   (raw 0x00000000000000000000)
st3            1.00000000000000000005e-31 (raw 0x3f9881ceb32c4b43fcf5)
st4            -31                 (raw 0xc003f800000000000000)
st5            10000000000000000   (raw 0x40348e1bc9bf04000000)
st6            nan(0xc000000000000000) (raw 0x7fffc000000000000000)
st7            4                   (raw 0x40018000000000000000)
fctrl          0x37f               895
fstat          0x20                32
ftag           0xffff              65535
fiseg          0x0                 0
fioff          0x0                 0
foseg          0x0                 0
fooff          0x0                 0
fop            0x0                 0
mxcsr          0x1fa2              [ DE PE IM DM ZM OM UM PM ]
ymm0           {v16_bfloat16 = {0x1440, 0x1e, 0x0 <repeats 14 times>}, v8_float = {0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa2, 0x45, 0xf3, 0x41, 0x0 <repeats 28 times>}, v16_int16 = {0x45a2, 0x41f3, 0x0 <repeats 14 times>}, v8_int32 = {0x41f345a2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x41f345a2, 0x0, 0x0, 0x0}, v2_int128 = {0x41f345a2, 0x0}}
ymm1           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm2           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x7fffffffffffffff, 0x7fffffffffffffff, 0x0, 0x0}, v32_int8 = {0xff <repeats 16 times>, 0x0 <repeats 16 times>}, v16_int16 = {0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0x0}, v2_int128 = {0xffffffffffffffffffffffffffffffff, 0x0}}
ymm3           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0 <repeats 13 times>}, v8_float = {0xca000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa0, 0xbc, 0x4d, 0x55, 0x55, 0x15, 0x0 <repeats 26 times>}, v16_int16 = {0xbca0, 0x554d, 0x1555, 0x0 <repeats 13 times>}, v8_int32 = {0x554dbca0, 0x1555, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x1555554dbca0, 0x0, 0x0, 0x0}, v2_int128 = {0x1555554dbca0, 0x0}}
ymm4           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm5           {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm6           {v16_bfloat16 = {0x1440, 0x1e, 0x0 <repeats 14 times>}, v8_float = {0x1e, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xa2, 0x45, 0xf3, 0x41, 0x0 <repeats 28 times>}, v16_int16 = {0x45a2, 0x41f3, 0x0 <repeats 14 times>}, v8_int32 = {0x41f345a2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x41f345a2, 0x0, 0x0, 0x0}, v2_int128 = {0x41f345a2, 0x0}}
ymm7           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x20, 0x30, 0x2e, 0x37, 0x39, 0x39, 0x34, 0x2c, 0x20, 0x0 <repeats 16 times>}, v16_int16 = {0x2020, 0x2020, 0x2020, 0x2020, 0x2e30, 0x3937, 0x3439, 0x202c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x20202020, 0x20202020, 0x39372e30, 0x202c3439, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x2020202020202020, 0x202c343939372e30, 0x0, 0x0}, v2_int128 = {0x202c343939372e302020202020202020, 0x0}}
ymm8           {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x30, 0x2e, 0x36, 0x38, 0x38, 0x34, 0x2c, 0x20, 0x30, 0x2e, 0x34, 0x33, 0x34, 0x38, 0x2c, 0x20, 0x0 <repeats 16 times>}, v16_int16 = {0x2e30, 0x3836, 0x3438, 0x202c, 0x2e30, 0x3334, 0x3834, 0x202c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x38362e30, 0x202c3438, 0x33342e30, 0x202c3834, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x202c343838362e30, 0x202c383433342e30, 0x0, 0x0}, v2_int128 = {0x202c383433342e30202c343838362e30, 0x0}}
ymm9           {v16_bfloat16 = {0x0, 0x0, 0x2, 0x4, 0x0 <repeats 12 times>}, v8_float = {0x0, 0x4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x3e8, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x40, 0x8f, 0x40, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x4000, 0x408f, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0x408f4000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x408f400000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x408f400000000000, 0x0}}
ymm10          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x1, 0x0 <repeats 12 times>}, v8_float = {0x0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0xe0, 0x7f, 0xad, 0x3f, 0x0 <repeats 24 times>}, v16_int16 = {0x0, 0x0, 0x7fe0, 0x3fad, 0x0 <repeats 12 times>}, v8_int32 = {0x0, 0x3fad7fe0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x3fad7fe000000000, 0x0, 0x0, 0x0}, v2_int128 = {0x3fad7fe000000000, 0x0}}
ymm11          {v16_bfloat16 = {0x0 <repeats 16 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm12          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0xba, 0x90, 0x0 <repeats 21 times>}, v16_int16 = {0xba10, 0x90, 0x0, 0x0, 0xba10, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x90ba10, 0x0, 0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x90ba10, 0x0, 0x0}, v2_int128 = {0x90ba10000000000090ba10, 0x0}}
ymm13          {v16_bfloat16 = {0x0, 0x0, 0x0 <repeats 14 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0 <repeats 29 times>}, v16_int16 = {0xba10, 0x90, 0x0 <repeats 14 times>}, v8_int32 = {0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x0, 0x0, 0x0}, v2_int128 = {0x90ba10, 0x0}}
ymm14          {v16_bfloat16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0xba, 0x90, 0x0 <repeats 21 times>}, v16_int16 = {0xba10, 0x90, 0x0, 0x0, 0xba10, 0x90, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x90ba10, 0x0, 0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x90ba10, 0x0, 0x0}, v2_int128 = {0x90ba10000000000090ba10, 0x0}}
ymm15          {v16_bfloat16 = {0x0, 0x0, 0x0 <repeats 14 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x10, 0xba, 0x90, 0x0 <repeats 29 times>}, v16_int16 = {0xba10, 0x90, 0x0 <repeats 14 times>}, v8_int32 = {0x90ba10, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x90ba10, 0x0, 0x0, 0x0}, v2_int128 = {0x90ba10, 0x0}}

Regarding the long build time: I realized I made a dumb error on my side — I ran the build on a machine using a shared file system, which was a bad idea. When I re-ran the build using local storage, it finished in about 35 minutes 🙂

malfet (Contributor) commented Dec 19, 2022

(gdb) x/4i $rip
=> 0x15548827bc47 <mkl_vml_serv_GetMinN+135>:   idiv   %esi

This looks like a good old integer division by zero (easily reproducible with the following code: https://godbolt.org/z/rh7d8MdEE), so I'm confused why the Python runtime reports it as a floating point exception... (And even more confused about what MKL is trying to do here.)
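
(For what it's worth, SIGFPE is POSIX's generic "erroneous arithmetic operation" signal and also covers integer division by zero (si_code FPE_INTDIV), which is presumably why the shell prints "floating point exception" for an idiv with a zero divisor. A quick check of the signal's number and name from Python:)

```
import signal

# SIGFPE (signal 8 on Linux) is the "erroneous arithmetic operation" signal;
# integer division by zero in native code is delivered as SIGFPE too.
print(int(signal.SIGFPE), signal.strsignal(signal.SIGFPE))
```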

@piraka9011

Using the nightly version works for me too.
Also FWIW, sharing how CoreWeave resolved this for nodes/pods running in their k8s cluster.

...applying lxcfs-admission-webhook: disabled has done the trick here for prior clients seeking the same solution.
There is a layer (lxcfs) that exposes /proc/cpuinfo, /proc/meminfo, /proc/stat, etc. to look like the limits section of the resources, so applications that read those to determine, e.g., how many threads to spin up don't spin up more than the limit. Seems like something PyTorch reads there messes it up sometimes. Doesn't make sense to me, but you also don't really need that feature when doing most ML tasks, so I disabled it for your namespace.
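
A quick way to see what the runtime actually believes about the CPU topology inside such a pod (just a sketch; which values lxcfs masks depends on the container limits):

```
import os
import torch

# These views of "how many CPUs do I have" can disagree inside a container
# where lxcfs rewrites /proc; PyTorch derives its default thread count from
# this kind of information.
print("os.cpu_count():            ", os.cpu_count())
print("len(sched_getaffinity(0)): ", len(os.sched_getaffinity(0)))  # Linux only
print("torch.get_num_threads():   ", torch.get_num_threads())
```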

@levmckinney

Will there be a patch for torch 1.13 to resolve this? I'm still running into this issue using the 1.13.1-cuda11.6-cudnn8-runtime container.

malfet (Contributor) commented Mar 28, 2023

@levmckinney no, there are usually no backports to previous releases, but it should be fixed in PyTorch 2.0

jithunnair-amd added a commit to ROCm/builder that referenced this issue Apr 11, 2023
* Make sure package_type is set (pytorch#1139)

* Update check_binary.sh

* Update check_binary.sh

* Modifying smoke test to add more advanced validation as requested (pytorch#1124)

* Modify smoke test matrix

More vision smoke tests

Temporary pointing to my repo for testing

Try 2 use atalman builder

Modify path

Fixing commits

Testing

Testing

Smoke test modifications

Refactor test code

Fix typo

Fixing image read

A little more refactoring

Addressing comments

Testing

* Add same test for windows and macos

* Addressing c omments

* Add manywheel special build for including pypi package (pytorch#1142)

* Add manywheel special build

Testing

Builder change

Testing

Adding manywheel cuda workflow

Simplify

Fix expr

* address comments

* checking for general setting

* Pass correct parameters for macos validations (pytorch#1143)

* Revert "Update check_binary.sh"

This reverts commit 6850bed.

* Revert "Update check_binary.sh"

This reverts commit 051b9d1.

* setup periodic test to run binary verification  pytorch/pytorch#84764: (pytorch#1144)

* add a reusable workflow to run all smoke tests/or smoke tests for a specific os/channel
* add workflows to schedule the periodic smoke tests for nightly and release channels

* Update aarch64 script to latest one (pytorch#1146)

* minor: fix the typo job name for windows binaries validation workflow (pytorch#1147)

* fix the typo in the the job name for the release binaries validation workflow (pytorch#1148)

issue was introduced in pytorch#1144

* Move to rc2 of 3.11 python (pytorch#1149)

Need it to get several convenience functions

* Integrates CUDA pip wheels (pytorch#1136)

* Refactors rpath to externally set var. Adds mechanism to add metadata

* Sets RUNPATH when using cudnn and cublas wheels

* Escapes dollar sign

* Fix rpath for cpu builds

Co-authored-by: atalman <atalman@fb.com>

* Uses RPATH instead of RUNPATH so that user strictly uses pypi libs (pytorch#1150)

* Binary Validation Workflow - Adding check binary script (pytorch#1127)

* Update action.yml

* Update validate-macos-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Update validate-linux-binaries.yml

* Fix check binary for arm64 (pytorch#1155)

* Fix check binary for arm64

* Update check_binary.sh

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Fix for including nvtx dll and cudart (pytorch#1156)

* Fix for invluding nvtx dll and cudart

* Fix for include nvtx

* Fix spaces

* Back out inclusion of cudart (pytorch#1157)

* Add cuda and date check to smoke test (pytorch#1145)

* shorten binary validation workflow names, so they are more readable in the HUD and GH job view (pytorch#1159)

* Fix anaconda torchaudio smoke test (pytorch#1161)

* Fix anaconda torchaudio smoke test

* Format using ufmt

* Fix whels tests for torchaudio (pytorch#1162)

* Pin condaforge version

Most recent version fails with  invalid cert error when trying to update
python

* Option to run resnet classifier on specific device

* Fix typo

`.test/smoke_test` -> `test/smoke_test`

Noticed when pushed pytorch@3b93537 and no tests were run

* Test resnet classifier on CUDA (pytorch#1163)

* [ROCm] support for rocm5.3 wheel builds (pytorch#1160)

* Updates to support rocm5.3 wheel builds (#6)

* Changes to support ROCm 5.3

* Updated as per comments

* Installing python before magma build

- In ROCm 5.3 libtorch build are failing during magma build due to
  to missing python binary so added install statement

* Move python install to libtorch/Dockerfile (#8)

* Updating the condition for noRCCL build (#9)

* Updating the condition for noRCCL build

* Updated changes as per comments

* Use MIOpen branch for ROCm5.3; Change all conditions to -eq

* Use staging branch of MIOpen for ROCm5.3

* Fix merge conflict

Fix merge conflict

Co-authored-by: Pruthvi Madugundu <pmagundu@amd.com>
Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>

* Validate python 3.11 (pytorch#1165)

* Validate python 3.11

* Validate linux binaries change

Add options

Import torchvision

Adding python 3.11 install

pass package to check nightly binaries date

Test

test

Add python 3.11 code

testing

Adding python 3.11 test

Add python 3.11 validation

Adding zlib develop install

Install zlib etc..

Adding zlib1g as well

testing

testing

Adding validate windows binary

Trying to workaround

testing

Refacor smoke test

Add import statement

fix datetime call

* Fix stripping dev

* fix import

* Strip pypi-cudnn from the version.py (pytorch#1167)

* Strip pypi-cudnn from the version.py

* small fix

* Regenerates RECORD file to reflect hash changes caused by sed'ing the version suffix (pytorch#1164)

* Add pypi cudnn package to tests (pytorch#1168)

* Add pypi cudnn package to tests

* Fix pypi installation check

* Fix pypi instructions setting

* Update DEVELOPER_DIR in build_pytorch.sh

Not sure why we are still expecting Xcode9 to be present there, update it to the same folder as wheel builds

May be fixes pytorch/pytorch#87637

* Fix to not use sccache if it's not setup properly (pytorch#1171)

* Revert "Fix to not use sccache if it's not setup properly (pytorch#1171)" (pytorch#1172)

This reverts commit 377efea.

* Remove cuda102 and cuda115 docker builds and regenerate manylinux docker (pytorch#1173)

* Rebuild manywheel

* Remove cuda102 and cuda115

* [aarch64] add mkldnn acl backend build support for pytorch cpu libary (pytorch#1104)

* Only push to Docker and Anaconda repo from main (pytorch#1175)

We currently allow push from any branch to go to Docker (and Anaconda) prod. This is a dangerous practice because it allows unfinished works to jump to prod and used by other workflows

* Release 1.13 script changes (pytorch#1177)

* Test ResNet on MPS (pytorch#1176)

After pytorch/pytorch#86954 is fixed, we should be able to test resnet on MPS

* Revert "Test ResNet on MPS (pytorch#1176)" (pytorch#1180)

This reverts commit efa1bc7.

* Add v1.13 versions

* Update CMake to 3.18, needed for C++17 compilation (pytorch#1178)

* release: separate out version suffixes for torch pypi promotion (pytorch#1179)

* Fixup wheel published to PyPI (pytorch#1181)

* Fixup wheel published to PyPI

* Update prep_binary_for_pypi.sh

* Fix folder deletion for pypi prep

Co-authored-by: Andrey Talman <atalman@fb.com>

* Update cmake version to 3.18 for libtorch docker

* Pins cuda runtime to 111.7.99 (pytorch#1182)

* Fixes cuda pypi rpaths and libnvrtc name (pytorch#1183)

* Allow ROCm minor releases to use the same MIOpen branch as the major release (pytorch#1170)

* Allow ROCm minor releases to use the same MIOpen branch as the major release

* correct logic to ensure rocm5.4 doesn't fall in wrong condition

* add 11.8 workflow for docker image build (pytorch#1186)

* Using windows runners from test-infra for validation workflows (pytorch#1188)

* Testing new windows runners

test

Testing

Testing

testing

testing

test

Test

Testing

testing

Testing

Testing

test

Test

test

testing

testing

Test

testing

test

testing

testing

testing

testing

testing

testing

test

test

testing

testing

testing

testing

Test

test

test

testing

testing

testing

testing

testing

testing

testing

testing

testing

Refactor code

* Adding details for the test-infra issue

* Update current CUDA supported matrix

* add magma build for CUDA11.8 (pytorch#1189)

* Test setting job name (pytorch#1191)

* Use official Python-3.11 tag (pytorch#1195)

* remove CUDA 10.2-11.5 builds (pytorch#1194)

* remove CUDA 10.2-11.5 builds

* remove 11.5 and 11.3 builds

* build libtorch and manywheel for 11.8 (pytorch#1190)

* build libtorch and manywheel for 11.8

* Update common/install_magma.sh

* use magma-cuda build-1 by default; remove CUDA 10.2-11.5 builds

Co-authored-by: Andrey Talman <atalman@fb.com>

* [Validation] Pass ref:main to general worker (pytorch#1197)

* Pass ref:main to general worker

* Try to pass reference to workflow

* Pass ref:main to general worker

* Test

* Pass reference as input parameter

* Make new variable not required

* Fix typo

* Add workflow for manywheel cpu-cxx11-abi (pytorch#1198)

* [Validation] Use linux_job for linux workers (pytorch#1199)

* Use linux_job for linux workers

Test

Testing

Test

testing

Tetsing

testing

Change linux binary action

test

Simplify version check

* Fix if statement

* Fix typo

* Fix cuda version check

Fix Audio and Vision version check

Add check binary to libtorch

test

test

testing

testing

testing

Testing

Testing

testing

* Use macos generic workers (pytorch#1201)

* Use macos generic workers

fix workflow

testing

Add arm64 builds

test

Remove validate binary action

* add check binary step

* fix ld_library path

* add package type

* Adding ref to validate binaries (pytorch#1204)

* ROCm5.3 nightly wheels (pytorch#1193)

* Enable ROCm5.3 nightly wheels

* Enable ROCm5.3 docker builds

* Update amdgpu repo url for ROCm5.3

* ROCm5.3 not supported on Ubuntu 18.04

* empty

* Another empty commit

* Try disabling MLIR build to shorten docker build time

* Clean up disk space

* MLIR project changed names from ROCm5.4

* Retrigger CI to get around flaky magma git access error

* One more cmake-3.18.4 update

* Use cmake-3.18 for ROCM builds

* More cmake ROCM tweaks

* cmake-3.18 installation on ROCM (take 3)

* add conda builds for CUDA 11.8 (pytorch#1205)

* Enable nightly CUDA 11.8 builds (pytorch#1206)

* enable nightly builds for CUDA 11.8

* add CUDA 11.8 version to manywheel, remove 11.3 and 11.5

* Windows CUDA 11.8 changes (pytorch#1207)

* Add continue on error to validation jobs (pytorch#1209)

* Add continue on error to validation jobs

* test

* Delete unmaintaned torchvision build scripts (pytorch#1210)

All build logic has long moved to torchvision repo and now is executed
by reusable workflow from https://github.com/pytorch/test-infra/tree/main/.github/workflows

* build_pytorch.sh replace tabs with spaces (pytorch#1211)

* Make PyTorch depend on TorchTrition (pytorch#1213)

Remove me when Triton is properly released elsewhere

* Remove smoke test script that is no longer used (pytorch#1212)

* Another tabs-to-spaces change

`s/\t/\ \ \ \ \ \ \ \ /`

* Disable continue on error (pytorch#1214)

* Add torchtrition dependency for wheels

* Make PyTorchConda depend on Triton (Take 2)

Multi-line environment variables are hard, so lets do it traditional way

* Revert "Add torchtrition dependency for wheels"

This reverts commit 475100b.

* Add TorchTrition dependency for wheels (take 2)

Now tests should be green thanks to pytorch/pytorch#90017

* Add sympy to pytorch linux dependencies

* Mitigate windows nightly build regressions

By pinning conda to 22.9.0

Fixes pytorch/pytorch#90059

* Consolidating validation scripts (pytorch#1219)

* Consolidating validation scripts

* Fix validate script name

* Correct script path

* Correct script path

* test

* testing

* testing

* testing

* testing

* test

* test

* test

* testing

* testc

* test hook

* adding wondows use case

* windows use case

* test

* testing

* Windows fixes

* more fixes

* Add package type

* testing more

* Truncate RECORD instead of delete (pytorch#1215)

* Refactor and fix windows smoke tests (pytorch#1218)

* Fix windows smoke test

* Fix first if statement

* Refactor not to cal install nightly package

* Revert "Refactor not to cal install nightly package"

This reverts commit ac580c8.

* Fix pip install command remove cu102

* Refacor the conda installation

* Add cuda profiler apu to cuda install 11.8 (pytorch#1221)

* Update CUDA upgrade runbook to mention subpackages changes

As per following doc: https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html

* conda: Add CUDA_HOME, cuda binaries to path (pytorch#1224)

* Refactor macos-arm64 into separate group (pytorch#1226)

* Adding libcufft constraint (pytorch#1227)

* Adding libcufft constraint

* Adding rest of the dependencies

* Advance build number in pytorch-cuda (pytorch#1229)

* Make sympy mandatory dependency of PyTorch

Should fix 
https://github.com/pytorch/audio/actions/runs/3684598046/jobs/6234531675

* Revert me later: Fix conda package smoke tests

* Install `sympy` via pip rather than conda

Needs to be reverted as well

* Refactor smoke tests to configure module included in the release (pytorch#1223)

* Changes to prep for pypi script for release 1.13.1 (pytorch#1231)

* PyPi binary validation and size check (pytorch#1230)

* Validate binary size

* Validate binary size linux_job

* evaluate the fix from pytorch#1231

* Add an optional artifact upload, consolidate fixes to `prep_binary_for_pypi.sh`

* Adding new workflow to call from domain libraries to validate on domain libraries such as text (pytorch#1234)

* Testing new workflow

Fix naming

fix input

* Changed comments

* Ad ability to call validate domain library manually (pytorch#1235)

* Adding test for validate dm workflow and fixing dm validation workflow (pytorch#1236)

* Test manywheel packages (pytorch#1239)

Change only docker file

* Bump scripts in release (pytorch#1241)

* release: Strip whitespace from version_with_suffix (pytorch#1242)

* Cuda 11.8 and removal of dev packages (pytorch#1243)

* Adding more OS's to validate domain library workflow (pytorch#1238)

* Adding more OS's to validate domain library workflow

* conda and wheel togeather

* add macos workflows

* fix workflow

* Add target os variable to windows validation (pytorch#1244)

* Update MKL to 2022.1 (pytorch#1245)

As previous one occasionally crashes on AMD CPUs

May be addresses pytorch/pytorch#89817

Please note, that in order to get maximum perf on AMD CPUs one needs to compile and LD_PRELOAD following library:
```
int mkl_serv_intel_cpu_true() {
	return 1;
}
```

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Remove invalid git option (pytorch#1246)

* Revert "Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1196)" (pytorch#1247)

This reverts commit ee59264.

* Add with_cuda flag (pytorch#1249)

* Add GPU architecture env variables (pytorch#1250)

* Add cuda to jobname for validate domain library (pytorch#1252)

* Remove pylief dependency (pytorch#1255)

* Fix PEP503 for packages with dashes

* Rename `torchtriton` to `pytorch-triton`

Companion change for pytorch/pytorch#91539

* s3_management: Hide specific packages between dates (pytorch#1256)

* s3_management: Pin requirements.txt

Packaging got updated and that's not what we want

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: except ValueError

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: Use the correct format for strptime

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: Bump bad dates to october 17th (pytorch#1257)

* s3_management: hide torchtriton (pytorch#1258)

* s3_management: Add PACKAGE_ALLOW_LIST for indices (pytorch#1259)

* s3_management: Bump bad date end to 12/30 (pytorch#1260)

* Adds infra to use nvidia dependencies from pypi and cleans up patches (pytorch#1248)

* Installs NCCL from redist, uses system NCCL, and adds pypi RPATH

* Cleans up nvrtc patches and adds it using main script

* Fixes typo

* Adds more dependencies and builds torch with dynamic linking

* NCCL dirs have to be specified. Otherwise picks up different version

* Handles 11.8

* Adds echo message for nccl 2.15

* Fixes logic for 11.8 and adds missing names for DEPS_SONAME

* s3_management: Account for underscore packages

pytorch-triton is listed as pytorch_triton

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* s3_management: simplify allowlist, correct underscores

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* Fix cuda version in nightly (pytorch#1261)

* Adding py311 validations (pytorch#1262)

* Use MATRIX_* variables instead of redeefining new var each time (pytorch#1265)

* Fix validation domain library (pytorch#1266)

remove ref main

fix workflow

more refactor

* Nightly: do test install with the dependencies better and skip CUDA tests on cpu only box (pytorch#1264)

* Refactor PyTorch wheel and libtorch build scripts for ROCm (pytorch#1232)

* Refactor wheel and libtorch build scripts (#7)

* Update to so patching for ROCm

Wildcard used in grep to grab the actual numbered so file referenced
in patchelf. This allows the removal of specifying the so number in
DEPS_LIST & DEPS_SONAME

This commit also adds the functionality for trimming so names to
build_libtorch.sh from build_common.sh

* Refactor to remove switch statement in build_rocm.sh

This commit refactors build_rocm.sh and brings in a few major updates:
 - No longer required to specify the full .so name (with number) for ROCm libraries
       - The .so versions are copied and the patching code will fix the links to point to this version
 - No longer required to specify paths for ROCm libraries allowing the removal of the large switch
       - Paths are acquired programmatically with find
 - No longer required to specify both the path and filename for the OS specific libraries
       - Programatically extract file name from the path
 - Automatically extract Tensile/Kernels files for the architectures specified in PYTORCH_ROCM_ARCH
   and any non-arch specific files e.g. TensileLibrary.dat

* rocfft/hipfft link to libhiprtc.so in ROCm5.4 (#15)

Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>

* add sm_90 to CUDA11.8 builds (pytorch#1263)

* add sm_90 to CUDA11.8 builds

* Manually invoke bash for Miniconda

* Revert "add sm_90 to CUDA11.8 builds (pytorch#1263)" (pytorch#1275)

This reverts commit e1453a4.

* Set ubuntu distribution correctly for ROCm5.3 and above (pytorch#1268)

* Fix unbound variable error (pytorch#1276)

Regression introduced (and ignored) by pytorch#1262
Test plan:
```
% bash -c 'set -u; if [[ -z "${FOO}" ]]; then echo "bar"; fi' 
bash: FOO: unbound variable
(base) nshulga@nshulga-mbp builder % bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'
bar
(base) nshulga@nshulga-mbp builder % FOO=1 bash -c 'set -u; if [[ -z "${FOO+x}" ]]; then echo "bar"; fi'

```

* Manually invoke bash for miniconda (pytorch#1277)

Fixes build issues failing with:
```
./Miniconda3-latest-Linux-x86_64.sh: 438: ./Miniconda3-latest-Linux-x86_64.sh: [[: not found
```
as seen in e.g.: pytorch#1271

* Fix perm

Which somehow got changed by pytorch@62103bf

* add sm_90 to CUDA11.8 builds (pytorch#1278)

* libtinfo.so version update and logic fix for ROCm libtorch (pytorch#1270)

* Use libtinfo.so.6 for Ubuntu 2004

* Fix to origname grep

* Condition on ROCM_VERSION for libtinfo6

* Looks like it is not used anywhere. (pytorch#1273)

* Build Windows binaries with Visual Studio 2022 Build Tools (pytorch#1240)

* Build Windows binaries with Visual Studio 2022 Build Tools

* Unify casing in Batch files, remove VS 2017 installation

* Remove VS 2017 Conda scripts, unify casing in conda Batch scripts, minor Conda scripts tweaks

* Slim down `pytorch-cuda`

It should only contain runtime dependencies that PyTorch+domain
libraries depend on, namely:
 - cudart
 - cublas
 - cusparse
 - cufft
 - curand
 - nvtx
 - nvrtc
 - nvjpeg (for TorchVision)

This removes dependencies on NVCC, build/debug tools, etc which are not
needed for running the pytorch

Test Plan:
  `conda create -n tmp -c nvidia -c malfet cuda-toolkit==11.7` and
observe that only relevant packages are installed

Fixes pytorch/pytorch#91334

* [BE] Delete `unicode-flags` build options (pytorch#1284)

There were relevant only for Python<=3.3

* [BE] Define `openssl_flags` (pytorch#1285)

Rather than have two invocations of `./configure`

* Build with `--enabled-shared` if `patchelf` is found (pytorch#1283)

This is needed to make `manylinux-wheel` images usable for building new Triton binaries.

Test plan: Build docker and verify that following `CMakeLists.txt` finishes successfully:
```
cmake_minimum_required(VERSION 3.6)
find_package(Python3 REQUIRED COMPONENTS Interpreter Development)
message(WARNING Executable ${Python3_EXECUTABLE})
message(WARNING IncludeDirs ${Python3_INCLUDE_DIRS})
message(WARNING Libraries ${Python3_LIBRARIES})
```

* Update cudnn to 8.7.0.84 for CUDA 11.8 builds (pytorch#1271)

* update cudnn to 8.7.0.84 for CUDA 11.8 builds

* workaround for pytorch#1272

* Revert "workaround for pytorch#1272"

This reverts commit c0b10d8.

* update cudnn==8.7.0.84 for windows

* [BE] Remove references to Python<3.6 (pytorch#1287)

* Upgrade desired python versoin to 3.8

For libtorch builds

* Fix how libtorch picks the python version

* Tweak conda builds to support 3.11

Add `-c malfet` when building for 3.11 (though perhaps it's better to
move numpy to pytorch channel)

Tweak some build time dependencies

* Fix typo

* Skip triton dependency for 3.11 CUDA builds

* Update build-number to 3

* Add ability to override cuda archs for conda (pytorch#1282)

* [ROCm] reduce disk space used in image (pytorch#1288)

Fixes pytorch#1286

* Extend MacOS/Windows builds to 3.11

By installing dependencies from pip
Should be a no-op for <=3.10

* ci: Migrate to checkout@v3 (pytorch#1290)

checkout@v2 is deprecated moving to checkout@v3

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

* Fix typo

* Add 3.11 option for Windows builds

* Add python-3.11 download location for windows

* Add pypi with cudnn package test (pytorch#1289)

* Add pypi with cudnn package test

* Add pypi with cudnn package test

* test

* test

* More pypi cudnn changes

* test

* Fix pipy smoke test

* Remove debug comments

* Delete some ancient checks for MacOS builds

As we no longer build for Python-2.7 or 3.5

* Add libnvjpeg-dev package as fallback (pytorch#1294)

* Add libnvjpeg-dev package as fallback

* Move libnvjpeg and libnvjpeg-dev to required packages

* Update conda/pytorch-cuda/meta.yaml

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Upgrade nightly wheels to rocm5.4.2 (pytorch#1225)

* Upgrade nightly wheels to rocm5.4

* Adding graphic architectures for ROCm 5.4

* Updated to use ROCm5.4.1

* Updated to use ROCm5.4.2

* Fixed syntax error

* Perform build on image with magma and miopen preinstalled

* Add dev packages for windows pytorch-cuda dependencies (pytorch#1295)

* Add dev packages for windows dependencies

* Adding architecture dependent builds

* Add notes around windows

* fix typo

* Bumping version to v3

* rocm libtorch prebuild magma; fix manylinux cmake version (pytorch#1296)

* Add manywheel:cpu-cxx11-abi checkup for check_binary.sh (pytorch#1251)

* Remove with_py311 flag (pytorch#1301)

* rocm manylinux now uses devtoolset 9 (pytorch#1300)

* fix ACL_ROOT_DIR setting and upgrade the ACL version to 22.11 (pytorch#1291)

* Add `-c malfet` for Windows builds as well

* Set torch._C._PYBIND11_BUILD_ABI version check only for GLIBCXX_USE_CXX11_ABI=0 (pytorch#1303)

* Adding limit windows builds logic (pytorch#1297)

* Adding limit windows builds logic

* Remove empty space

* Simplify mkl build dependencies (pytorch#1305)

On Linux and Mac, PyTorch must be built against `mkl=2020.x` in order to be compatible with both `mkl-2021` and `mkl-2022`, which added `.so.1` and `.so.2` files respectively; a binary linked against one of those versions would be incompatible with the newer/older toolchain.

This is not an issue on Windows, as all mkl binaries there end with simple `.dll`
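
As a sketch of the compatibility issue (the library path is a placeholder, and this assumes MKL is linked dynamically in the binary being inspected), one can check which `libmkl_rt` soname a built library records as a dependency:
```
readelf -d /path/to/libtorch_cpu.so | grep -i mkl
# per the note above, mkl-2021 and mkl-2022 ship libmkl_rt.so.1 and
# libmkl_rt.so.2 respectively, so a NEEDED entry pinned to one versioned
# soname cannot be satisfied by an environment providing only the other.
```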

* "Fix" PyTorch CPU conda testing

It's still horribly broken, but make it a bit better by not installing
pytorch from the default anaconda channel (which installs 1.12.1, which does
not have any of the dependencies the 2.0 dev package is supposed to have)

For example, see this runlog
https://github.com/pytorch/pytorch/actions/runs/4155371267/jobs/7189101147

* Update torch._C._PYBIND11_BUILD_ABI version check (pytorch#1306)

* Skip tests for manywheel built with _GLIBCXX_USE_CXX11_ABI=1

* Put back smoke test label (pytorch#1310)

* [aarch64] add support for torchdata wheel building (pytorch#1309)

* Python 3.11 validation workflow tests (pytorch#1304)

* Test windows py311

* Nightly binaries

* Fix py311 tests

* fix python calling

* Revert "Nightly binaries"

This reverts commit cbf80ca.

* add a scheduled workflow for the nightly pypi binary size validation (compliments pytorch/test-infra#2681) (pytorch#1312)

* Add regression test for pytorch/pytorch#94751

* Add 3.11 and `--pytorch-only` options

* Add `lit` to list of allowed packages

As it is now a mandatory (albeit spurious) dependency of pytorch-triton

See https://pypi.org/project/lit/ for more details

* s3: Allow tar.gz as an accepted file extension (pytorch#1317)

* Changes for Python 3.11 and smoke Test RC cut (pytorch#1316)

* Smoke Test RC cut

* Validate binaries 3.11

* test

* Smoke test binaries

* Fix pytorch-cuda chan download

* Remove temp change

* Make sure we don't use GPU runners for any of libtorch validations (pytorch#1319)

* Make sure we don't use GPU runners for any of libtorch

* Make sure we don't use GPU runners for any of libtorch

* s3: Add pytorch_triton_rocm to index (pytorch#1323)

Signed-off-by: Eli Uriegas <eliuriegas@meta.com>

* s3: Add tqdm package req for text (pytorch#1324)

* Add `--analyze-stacks` option

That, using `git rev-base`, prints the total number of stacks, and their
average, mean, and max depth

At the time of submission here is top 10 ghstack uses of pytorch:
```
ezyang has 462 stacks max depth is 15 avg depth is 1.70 mean is 1
awgu has 240 stacks max depth is 28 avg depth is 4.30 mean is 1
peterbell10 has 146 stacks max depth is 7 avg depth is 1.84 mean is 1
zou3519 has 128 stacks max depth is 7 avg depth is 1.98 mean is 1
jerryzh168 has 113 stacks max depth is 16 avg depth is 1.45 mean is 1
bdhirsh has 111 stacks max depth is 7 avg depth is 1.85 mean is 2
wconstab has 108 stacks max depth is 7 avg depth is 2.15 mean is 1
SherlockNoMad has 99 stacks max depth is 4 avg depth is 1.24 mean is 1
zasdfgbnm has 80 stacks max depth is 11 avg depth is 2.52 mean is 6
desertfire has 73 stacks max depth is 3 avg depth is 1.14 mean is 1
```

* Add filelock and networkx deps (pytorch#1327)

To match dependencies for wheel files defined in https://github.com/pytorch/pytorch/blob/ed1957dc1989417cb978d3070a4e3d20520674b4/setup.py#L1021-L1024

* Remove building magma from source

* Revert

* Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)

* Upgrade cmake version to 3.22.1 to build triton

* Pin patchelf version

* Fix comment typo

* Smoke test for cuda runtime errors (pytorch#1315)

* Add test for cuda runtime errors

* Add cuda exception smoke test

* Move cuda runtime error to end

* Move cuda runtime error to end

* Address comments

* Address comments

* Add Jinja2 Dependency (pytorch#1332)

As part of the effort to fix pytorch/pytorch#95986

* Add MarkupSafe to S3 Index (pytorch#1335)

* Remove rocm5.1 rocm5.2 from libtorch Dockerfile

* [aarch64] Adding CI Scripts to build aarch64 wheels (pytorch#1302)

* add aarch64 ci scripts

* added readme. get branch from /pytorch

* Add smoke tests conv,linalg,compile. And better version check. (pytorch#1333)

* Add smoke tests conv,linalg,compile

* Add version check

* Fix typo

Fix version check

Add not

* Add exception for python 3.11

* fix typo

* Try to exit after CUDA Runtime exception

* Restrict crash test only to conda

* Restrict crash test only to conda

* Fix tests

* Turn off cuda runtime issue

* tests

* more tests

* test

* remove compile step

* test

* disable some of the tests

* testing

* Remove extra index url

* test

* Fix tests

* Additional smoke tests

Remove release blocking changes

* Aarch64 changes for PyTorch release 2.0 (pytorch#1336)

* Aarch64 changes for PyTorch release 2.0

* Fix spacing

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Update aarch64_linux/build_aarch64_wheel.py

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Aarch64 build py3.11 fix (pytorch#1341)

* Fix nightly smoke test (pytorch#1340)

* Fix nightly smoke test

* Fix nightly builds

* Release 2.0 release scripts changes (pytorch#1342)

* Release 2.0 release scripts changes

* Release script modifications

* Add more packages to allow list (pytorch#1344)

* Add `jinja2` dependency to conda package

To be consistent with wheels, see
https://github.com/pytorch/pytorch/95961

* Restrict jinja to py 3.10 or less (pytorch#1345)

* Update `torchtriton` version to 2.1.0

* And update triton version here as well

* added smoke test for max-autotune (pytorch#1349)

Co-authored-by: agunapal <agunapal@berkeley.edu>

* Refactor conda backup script (pytorch#1350)

* Refactor conda backup

* Fix space

* Minor style

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)" (pytorch#1351)

* Revert "Upgrade cmake version to 3.22.1 to build triton (pytorch#1331)"

This reverts commit 18c5017.

* Selective revert

* Get cmake from pip

* Use 3.18.2 from conda

* Release script changes, add more release dependencies, bump version for aarch64 builds (pytorch#1352)

* Release script changes

* Add Jinja2 dependency

* Fix typo

* Add pytorch conda dependencies (pytorch#1353)

* Add latest dependencies for pytorch 2.0 release (pytorch#1357)

* Fix typo

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit d7f2a7c.

* [aarch64] update readme with the "--enable-mkldnn" option (pytorch#1362)

This needs to be enabled for official wheel building.

* Replace `--enable-mkldnn` with `--disable-mkldnn`

Also, change default to ubuntu-20.04

* Update AMIs

Using following images:
```
% aws ec2 describe-images --image-ids ami-078eece1d8119409f ami-052eac90edaa9d08f ami-0c6c29c5125214c77 --query "Images[].[ImageId, Description]"
[
    [
        "ami-078eece1d8119409f",
        "Canonical, Ubuntu, 18.04 LTS, arm64 bionic image build on 2023-03-02"
    ],
    [
        "ami-0c6c29c5125214c77",
        "Canonical, Ubuntu, 22.04 LTS, arm64 jammy image build on 2023-03-03"
    ],
    [
        "ami-052eac90edaa9d08f",
        "Canonical, Ubuntu, 20.04 LTS, arm64 focal image build on 2023-03-01"
    ]
]
```

* Update tags for domain libraries

* Add PyTorch version pinning to release wheels

* Fix flake8

* [BE] Introduce `build_domains` function

And call it to rebuild only domains if torch wheel is available

* Switch deprecated ubuntu-18.04 runner to ubuntu-latest (pytorch#1334)

* Switch deprecated ubuntu-18.04 runner to self-hosted 2xlarge

* Leave build-nvidia-docker for now

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Use ephemeral runners

* Use ubuntu-latest

* Apply suggestions from code review

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Switch from latest to 22.04 to pin the version

---------

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>

* Introduce optional --build-number parameter

* Revert me later: Fix conda package smoke tests

(cherry picked from commit d7f2a7c)

Alas, it's still used and causes nightly build failures

* Fix aarch64 torchvision build (pytorch#1363)

* Fix torchvision image extension compilation

* Fix torchvision image extension compilation

* Set enable_mkldnn to pypi build

* Remove unused `enable_mkldnn` for configure_system

* [aarch64] Try to link statically with png/jpeg

Also, add testing (which is currently broken)

* Revert "Revert me later: Fix conda package smoke tests"

This reverts commit ce427de.

* [AARCH64] Fix image.so wheel

By adding explicit libz dependency

* [AARCH64] Pass `BUILD_S3` to torchdata

To make build consistent with Linux-x86_64

* Revert "[AARCH64] Pass `BUILD_S3` to torchdata"

This reverts commit ae8e825.

As it does not want to be built on aarch64

* Add portalocker (pytorch#1364)

* [BE] Error handling in build_aarch64_wheel

I've noticed that build errors in `build_ArmComputeLibrary` would be
ignored, as a semicolon is used between the commands instead of `&&`.
Also, replace the nightly version evaluation, which relied on torch, with one
that relies on the individual libraries.
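
A minimal sketch of the difference (the commands are placeholders, not the actual build steps):
```
false; echo "runs anyway"     # the failure of the first command is swallowed
false && echo "never runs"    # the chain stops and the non-zero status propagates
```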

* [AArch64] Pass `args.instance_type` to `start_instance`

* use c++17 when building windows smoke tests (pytorch#1365)

Summary:
We are seeing failures during CI dealing with some headers that have
nested namespaces. This is expected to remedy them.

One such example:
https://github.com/pytorch/pytorch/actions/runs/4510336715/jobs/7942660912

Test Plan: Test this with CI.

---------

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Co-authored-by: Andrey Talman <atalman@fb.com>
Co-authored-by: andysamfb <111015134+andysamfb@users.noreply.github.com>
Co-authored-by: izaitsevfb <108101595+izaitsevfb@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Co-authored-by: Syed Tousif Ahmed <syed.ahmed.emails@gmail.com>
Co-authored-by: Syed Tousif Ahmed <syeahmed@nvidia.com>
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Wei Wang <109318740+weiwangmeta@users.noreply.github.com>
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Co-authored-by: Pruthvi Madugundu <pmagundu@amd.com>
Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>
Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: snadampal <87143774+snadampal@users.noreply.github.com>
Co-authored-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
Co-authored-by: ptrblck <ptrblck@users.noreply.github.com>
Co-authored-by: zhuhong61 <95205772+zhuhong61@users.noreply.github.com>
Co-authored-by: Greg Roodt <groodt@gmail.com>
Co-authored-by: Eli Uriegas <eliuriegas@fb.com>
Co-authored-by: Dmytro Dzhulgakov <dzhulgakov@users.noreply.github.com>
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Radek Bartoň <blackhex@post.cz>
Co-authored-by: divchenko <divchenko@users.noreply.github.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: Bo Li <110066325+BLOrange-AMD@users.noreply.github.com>
Co-authored-by: Mike Schneider <104035434+xncqr@users.noreply.github.com>
Co-authored-by: Ankith Gunapal <agunapal@ischool.Berkeley.edu>
Co-authored-by: agunapal <agunapal@berkeley.edu>
Co-authored-by: dagitses <dagitses@users.noreply.github.com>