About 2 minor bug fixes on CUDA macOSX 10.13.6 #46803

Open
llv22 opened this issue Oct 24, 2020 · 6 comments
Labels
  • module: build (Build system issues)
  • module: cuda (Related to torch.cuda, and CUDA support in general)
  • module: macos (Mac OS related issues)
  • module: nnpack (Related to our NNPack integration)
  • triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)

Comments

llv22 commented Oct 24, 2020

🐛 Bug

While building PyTorch 1.7 on macOS 10.13.6 with CUDA 10.1 (update 2) and Xcode 10.1 (clang 1000.11.45.5), I hit the following two build errors and patched the code on my local Mac. I'm reporting them here in case you want to fix them upstream as well.

To Reproduce

Build PyTorch from the 1.7 branch with "MAGMA_HOME="/usr/local/lib/magma-cu101" MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_OPENMP=OFF USE_FBGEMM=OFF python setup.py bdist_wheel".

Steps to reproduce the behavior:

  1. Building NNPACK fails with:
"ModuleNotFoundError: No module named 'peachpy.x86_64.avx'"
  2. Building ../torch/csrc/jit/codegen/cuda/kernel_cache.cpp fails with:
../torch/csrc/jit/codegen/cuda/kernel_cache.cpp:29:25: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
        printf("%ld, ", shape_symbol.static_size());
                ~~~     ^~~~~~~~~~~~~~~~~~~~~~~~~~
                %lld
../torch/csrc/jit/codegen/cuda/kernel_cache.cpp:31:28: error: format specifies type 'long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
        printf("s(%ld), ", *reinterpret_cast<const int64_t*>(&shape_symbol));
                  ~~~      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  %lld

My code changes

  1. NNPACK build error: the PeachPy bytecode has to be generated ahead of time.
    In https://github.com/pytorch/pytorch/blob/master/cmake/External/nnpack.cmake#51, add the following code:
  # Orlando: check whether the avx bytecode has been generated yet
  if(NOT EXISTS "${CAFFE2_THIRD_PARTY_ROOT}/python-peachpy/peachpy/x86_64/avx.py")
    # Generate it by running PeachPy's setup script in develop mode
    execute_process(COMMAND python setup.py develop
      WORKING_DIRECTORY "${CAFFE2_THIRD_PARTY_ROOT}/python-peachpy")
  endif()
  2. Format error when building ../torch/csrc/jit/codegen/cuda/kernel_cache.cpp
    https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/codegen/cuda/kernel_cache.cpp#L48
    int64_t here refers to third_party/eigen/Eigen/src/Core/util/Meta.h, so I changed %ld to %lld:
      const auto& shape_symbol = sizes.value()[i];
      if (shape_symbol.is_static()) {
        printf("%lld, ", shape_symbol.static_size());
      } else {
        printf("s(%lld), ", *reinterpret_cast<const int64_t*>(&shape_symbol));
      }
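
A note on portability: `%lld` matches `int64_t` on macOS, where it is `long long`, but on typical 64-bit Linux `int64_t` is `long`, so the same -Wformat error could show up there in the opposite direction. A possible alternative is the `PRId64` macro from `<cinttypes>`, which expands to the right length modifier on each platform. The sketch below is only an illustration of that idea; `format_static_size` and `format_symbolic_size` are hypothetical helper names, not functions that exist in kernel_cache.cpp.

```cpp
#include <cinttypes>  // PRId64
#include <cstdint>    // int64_t
#include <cstdio>     // std::snprintf
#include <string>

// Hypothetical helper (illustration only): formats a static size like the
// first printf call above, but with a platform-independent format macro.
std::string format_static_size(int64_t size) {
  char buf[32];  // enough for 20 digits/sign plus ", " and the terminator
  std::snprintf(buf, sizeof(buf), "%" PRId64 ", ", size);
  return std::string(buf);
}

// Hypothetical helper (illustration only): same idea for the symbolic
// branch, which prints the raw value wrapped in s(...).
std::string format_symbolic_size(int64_t raw) {
  char buf[32];  // "s(" + up to 20 characters + "), " + terminator
  std::snprintf(buf, sizeof(buf), "s(%" PRId64 "), ", raw);
  return std::string(buf);
}
```

In the original code this would amount to writing `printf("%" PRId64 ", ", shape_symbol.static_size());` instead of hard-coding `%ld` or `%lld`. An explicit `static_cast<long long>(...)` combined with `%lld` should work as well.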

Environment

  • PyTorch Version (e.g., 1.0): 1.7
  • OS (e.g., Linux): macOS 10.13.6
  • How you installed PyTorch (conda, pip, source): source
  • Build command you used (if compiling from source): MAGMA_HOME="/usr/local/lib/magma-cu101" MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_OPENMP=OFF USE_FBGEMM=OFF python setup.py bdist_wheel
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1 update 2, cuDNN 7.6.5
  • GPU models and configuration:
  • Any other relevant information:

cc @malfet @seemethere @walterddr @ngimel

llv22 changed the title from "About 2 bug-fixes on macOSX 10.13.6" to "About 2 minor bug fixes on macOSX 10.13.6" Oct 24, 2020

llv22 (Author) commented Oct 26, 2020

@osalpekar I tried to figure out how to submit a small pull request for these fixes. In the meantime, I'm attaching the patch here:
orlando-for-patch-torch1.8-mac.patch.txt
This also applies to 1.5-1.7.

ezyang (Contributor) commented Oct 26, 2020

@llv22 would you mind opening a pull request for your patch?

ezyang added the labels module: build, module: macos, triaged, module: nnpack Oct 26, 2020
malfet added the label module: cuda Oct 26, 2020

ezyang (Contributor) commented Oct 26, 2020

By the way, CUDA on OS X is an unsupported configuration (we're happy to take your fixes, but we won't be testing these continuously)

malfet self-assigned this Oct 26, 2020
ezyang changed the title from "About 2 minor bug fixes on macOSX 10.13.6" to "About 2 minor bug fixes on CUDA macOSX 10.13.6" Oct 26, 2020

llv22 (Author) commented Oct 27, 2020

@ezyang Actually, I tried creating a new branch from master, but I can't push it to the PyTorch repository to create a pull request. Is that because I'm missing permissions, or is there another process for getting a pull request ready?

ezyang (Contributor) commented Oct 27, 2020

Make a fork of PyTorch, push your branch to that fork, and then open a PR from that branch.

llv22 (Author) commented Oct 28, 2020

@ezyang OK, refer to #46968.
