Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VecISA.__bool__ is very expensive (nearly a second) on startup #100378

Closed
ezyang opened this issue May 1, 2023 · 7 comments
Closed

VecISA.__bool__ is very expensive (nearly a second) on startup #100378

ezyang opened this issue May 1, 2023 · 7 comments
Assignees
Labels
module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup oncall: cpu inductor CPU Inductor issues for Intel team to triage oncall: pt2 small We think this is a small issue to fix. Consider knocking off high priority small issues triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@ezyang
Copy link
Contributor

ezyang commented May 1, 2023

🐛 Describe the bug

VecISA currently requires compiling a cpp program to test for capabilities, similar to ./configure. We do this compilation on every new process invocation, dissimilar to ./configure. After performing the capability query, we should cache it and reuse the cache entry. This will save us about 1s cold start time on CPU inductor.

Versions

master

cc @msaroufim @bdhirsh @anijain2305 @zou3519 @chauhang @wconstab @soumith @ngimel

@ezyang ezyang added oncall: pt2 module: cpu inductor module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup labels May 1, 2023
@anijain2305 anijain2305 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 1, 2023
@Valentine233
Copy link
Collaborator

The PR is drafted #101679. However, there still exists an open issue in CI inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu) (push). When dry-compiling for checking valid VecISA, we find that the same machine would give different results, compilation success or failure, resulting the issue when doing cache. The cache folder is set under /tmp/torchinductor_jenkins/. The next step is to download the .so file with compilation failure and run it locally to find out the failure reason.

@penguinwu penguinwu added oncall: cpu inductor CPU Inductor issues for Intel team to triage and removed module: cpu inductor labels Dec 2, 2023
@xuhancn
Copy link
Collaborator

xuhancn commented Dec 22, 2023

I will optimze it after CppBuilder merged.

@tugsbayasgalan
Copy link
Contributor

@xuhancn is this resolved?

@ezyang
Copy link
Contributor Author

ezyang commented Mar 22, 2024

No

    @functools.lru_cache(None)
    def __bool__(self) -> bool:
        if config.cpp.vec_isa_ok is not None:
            return config.cpp.vec_isa_ok
        
        if config.is_fbcode():
            return True
        
        key, input_path = write(VecISA._avx_code, "cpp")
        from filelock import FileLock
        
        lock_dir = get_lock_dir()
        lock = FileLock(os.path.join(lock_dir, key + ".lock"), timeout=LOCK_TIMEOUT)
        with lock:
            output_path = input_path[:-3] + "so"
            build_cmd = shlex.split(
                cpp_compile_command(
                    input_path, output_path, warning_all=False, vec_isa=self
                )
            )
            try:
                # Check build result
                compile_file(input_path, output_path, build_cmd)
                subprocess.check_call(
                    [   
                        sys.executable,
                        "-c",
                        VecISA._avx_py_load.replace("__lib_path__", output_path),
                    ],
                    stderr=subprocess.DEVNULL,
                    env={**os.environ, "PYTHONPATH": ":".join(sys.path)},
                )
            except Exception as e:
                return False
            
            return True

in 0a1b3be

@ezyang ezyang added the small We think this is a small issue to fix. Consider knocking off high priority small issues label Mar 22, 2024
@xuhancn
Copy link
Collaborator

xuhancn commented Apr 18, 2024

Possable help on this issue. #124366

@xuhancn
Copy link
Collaborator

xuhancn commented Apr 18, 2024

image

The time is cost on ISA checker always need run the jit cpp build to compile check module.
I will optimize it later.
Wait on #124045 land.

@xuhancn
Copy link
Collaborator

xuhancn commented Apr 24, 2024

@xuhancn is this resolved?

My pr will solve it: #124602

pytorch-bot bot pushed a commit that referenced this issue May 3, 2024
Fixes #100378
Original issue caused by startup dry compile need cost almost 1 second.

This PR add compiler version info, isa build options and pytorch version info to the test binary path hash.
So same compile, same isa and same pytorch can skip the dry compile.

Local test:
First time:
<img width="1588" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/d0b83f5d-849e-4f37-9977-3b0276e5a5a5">
We need to compile all c++ modules and it cost 16.5s.

Second time:
<img width="1589" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/44f07fb0-5a15-4342-b0f6-dfe2c880b5d3">
We skipped dry compile due to the same isa fingerprint. It is only cost 0.36s.

Pull Request resolved: #124602
Approved by: https://github.com/jgong5, https://github.com/ezyang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: startup-tracing-compile Compilation mechanism or time spent in (re)compilation, tracing, startup oncall: cpu inductor CPU Inductor issues for Intel team to triage oncall: pt2 small We think this is a small issue to fix. Consider knocking off high priority small issues triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
7 participants