VecISA.bool is very expensive (nearly a second) on startup #100378

ezyang · 2023-05-01T14:52:57Z

🐛 Describe the bug

VecISA currently requires compiling a cpp program to test for capabilities, similar to ./configure. We do this compilation on every new process invocation, dissimilar to ./configure. After performing the capability query, we should cache it and reuse the cache entry. This will save us about 1s cold start time on CPU inductor.

Versions

master

cc @msaroufim @bdhirsh @anijain2305 @zou3519 @chauhang @wconstab @soumith @ngimel

The text was updated successfully, but these errors were encountered:

Valentine233 · 2023-06-28T06:33:10Z

The PR is drafted #101679. However, there still exists an open issue in CI inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu) (push). When dry-compiling for checking valid VecISA, we find that the same machine would give different results, compilation success or failure, resulting the issue when doing cache. The cache folder is set under /tmp/torchinductor_jenkins/. The next step is to download the .so file with compilation failure and run it locally to find out the failure reason.

xuhancn · 2023-12-22T11:15:14Z

I will optimze it after CppBuilder merged.

tugsbayasgalan · 2024-03-13T19:16:15Z

@xuhancn is this resolved?

ezyang · 2024-03-22T16:45:22Z

No

    @functools.lru_cache(None)
    def __bool__(self) -> bool:
        if config.cpp.vec_isa_ok is not None:
            return config.cpp.vec_isa_ok
        
        if config.is_fbcode():
            return True
        
        key, input_path = write(VecISA._avx_code, "cpp")
        from filelock import FileLock
        
        lock_dir = get_lock_dir()
        lock = FileLock(os.path.join(lock_dir, key + ".lock"), timeout=LOCK_TIMEOUT)
        with lock:
            output_path = input_path[:-3] + "so"
            build_cmd = shlex.split(
                cpp_compile_command(
                    input_path, output_path, warning_all=False, vec_isa=self
                )
            )
            try:
                # Check build result
                compile_file(input_path, output_path, build_cmd)
                subprocess.check_call(
                    [   
                        sys.executable,
                        "-c",
                        VecISA._avx_py_load.replace("__lib_path__", output_path),
                    ],
                    stderr=subprocess.DEVNULL,
                    env={**os.environ, "PYTHONPATH": ":".join(sys.path)},
                )
            except Exception as e:
                return False
            
            return True

in 0a1b3be

xuhancn · 2024-04-18T08:40:46Z

Possable help on this issue. #124366

xuhancn · 2024-04-18T14:01:28Z

The time is cost on ISA checker always need run the jit cpp build to compile check module.
I will optimize it later.
Wait on #124045 land.

xuhancn · 2024-04-24T07:31:05Z

@xuhancn is this resolved?

My pr will solve it: #124602

Fixes #100378 Original issue caused by startup dry compile need cost almost 1 second. This PR add compiler version info, isa build options and pytorch version info to the test binary path hash. So same compile, same isa and same pytorch can skip the dry compile. Local test: First time: <img width="1588" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/d0b83f5d-849e-4f37-9977-3b0276e5a5a5"> We need to compile all c++ modules and it cost 16.5s. Second time: <img width="1589" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/44f07fb0-5a15-4342-b0f6-dfe2c880b5d3"> We skipped dry compile due to the same isa fingerprint. It is only cost 0.36s. Pull Request resolved: #124602 Approved by: https://github.com/jgong5, https://github.com/ezyang

ezyang added oncall: pt2 module: cpu inductor module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup labels May 1, 2023

anijain2305 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 1, 2023

bdhirsh mentioned this issue May 1, 2023

Speed up minifier tests by applying some configs that speed things up. #100387

Closed

jgong5 assigned EikanWang May 4, 2023

EikanWang assigned Valentine233 May 9, 2023

Valentine233 mentioned this issue May 17, 2023

[inductor][cpu] Cache for VecISA compilation results #101679

Closed

penguinwu added oncall: cpu inductor CPU Inductor issues for Intel team to triage and removed module: cpu inductor labels Dec 2, 2023

leslie-fang-intel assigned xuhancn and unassigned Valentine233 and EikanWang Dec 21, 2023

ezyang added the small We think this is a small issue to fix. Consider knocking off high priority small issues label Mar 22, 2024

xuhancn mentioned this issue Apr 22, 2024

[inductor] optimize isa dry compile time. #124602

Closed

pytorchmergebot closed this as completed in c715e76 Apr 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VecISA.bool is very expensive (nearly a second) on startup #100378

VecISA.bool is very expensive (nearly a second) on startup #100378

ezyang commented May 1, 2023 •

edited by pytorch-bot bot

Loading

Valentine233 commented Jun 28, 2023

xuhancn commented Dec 22, 2023

tugsbayasgalan commented Mar 13, 2024

ezyang commented Mar 22, 2024

xuhancn commented Apr 18, 2024

xuhancn commented Apr 18, 2024 •

edited

Loading

xuhancn commented Apr 24, 2024

VecISA.__bool__ is very expensive (nearly a second) on startup #100378

VecISA.__bool__ is very expensive (nearly a second) on startup #100378

Comments

ezyang commented May 1, 2023 • edited by pytorch-bot bot Loading

🐛 Describe the bug

Versions

Valentine233 commented Jun 28, 2023

xuhancn commented Dec 22, 2023

tugsbayasgalan commented Mar 13, 2024

ezyang commented Mar 22, 2024

xuhancn commented Apr 18, 2024

xuhancn commented Apr 18, 2024 • edited Loading

xuhancn commented Apr 24, 2024

VecISA.bool is very expensive (nearly a second) on startup #100378

VecISA.bool is very expensive (nearly a second) on startup #100378

ezyang commented May 1, 2023 •

edited by pytorch-bot bot

Loading

xuhancn commented Apr 18, 2024 •

edited

Loading