
roi_crop (from Detectron.pytorch) building consistently fails #8483

Closed
phalexo opened this issue Jun 14, 2018 · 12 comments


@phalexo

phalexo commented Jun 14, 2018

Importing PyTorch from Python works fine, but building the extension fails:

gcc -pthread -B /home/developer/anaconda3/envs/pytorch/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/developer/anaconda3/envs/pytorch/include/python3.6m -c /home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c -o ./home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.o -std=c99
/home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c: In function 'BilinearSamplerBHWD_updateOutput_cuda':
/home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c:22:64: error: dereferencing pointer to incomplete type 'THCTensor {aka struct THCTensor}'
success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
^
Traceback (most recent call last):
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/cffi/ffiplatform.py", line 51, in _build
dist.run_command('build_ext')
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile

@fmassa
Member

fmassa commented Jun 14, 2018

I believe this issue should be opened in the Detectron.pytorch repo.

@fmassa fmassa closed this as completed Jun 14, 2018
@ezyang
Contributor

ezyang commented Jun 14, 2018

FYI, this is because we made some structs in THC abstract (opaque) in HEAD. Any code that accessed their members directly, such as output->size[1], now has to call an accessor function instead.
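A minimal, self-contained C sketch of what "abstract" means here, assuming the usual opaque-struct pattern: the public header only forward-declares the struct, so code compiled against it cannot read members and must go through an accessor. The names `Tensor`, `tensor_new4d`, `tensor_size`, `tensor_free`, and `kernel_output_args` are hypothetical stand-ins (loosely playing the roles of THCTensor and THCudaTensor_size), not the real THC API. Header, extension, and library are folded into one file so it compiles; the extension function deliberately sits above the struct definition, where the type is still incomplete, which is the situation roi_crop_cuda.c is in.

```c
#include <stdlib.h>

/* Hypothetical "public header": only a forward declaration.
 * Until the definition below, Tensor is an incomplete type. */
typedef struct Tensor Tensor;

Tensor *tensor_new4d(long d0, long d1, long d2, long d3);
long tensor_size(const Tensor *t, int dim); /* role of THCudaTensor_size */
void tensor_free(Tensor *t);

/* Hypothetical "extension code", compiled while Tensor is incomplete.
 * Fills args[] in the order roi_crop passes output's sizes to its kernel
 * (size[1], size[3], size[2], size[0]). Writing output->size[1] here
 * would fail with "dereferencing pointer to incomplete type". */
void kernel_output_args(const Tensor *output, long args[4]) {
    args[0] = tensor_size(output, 1);
    args[1] = tensor_size(output, 3);
    args[2] = tensor_size(output, 2);
    args[3] = tensor_size(output, 0);
}

/* Hypothetical "library implementation": the only place the layout is known. */
struct Tensor {
    long size[4];
};

Tensor *tensor_new4d(long d0, long d1, long d2, long d3) {
    Tensor *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->size[0] = d0;
    t->size[1] = d1;
    t->size[2] = d2;
    t->size[3] = d3;
    return t;
}

long tensor_size(const Tensor *t, int dim) {
    return t->size[dim];
}

void tensor_free(Tensor *t) {
    free(t);
}
```

Applied to roi_crop_cuda.c, this presumably means replacing output->size[1], output->size[3], and so on with THCudaTensor_size(state, output, 1), THCudaTensor_size(state, output, 3), etc., the same accessor that function already uses for inputImages.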

@phalexo
Author

phalexo commented Jun 14, 2018

Is there a reference describing what changes are needed to keep up with these changes in PyTorch? The Detectron.pytorch repo is not the only one affected; there are at least two others.

@ezyang
Contributor

ezyang commented Jun 14, 2018

Not yet, but you can get some guidance by looking at commit 4caea64, specifically the changes to files in torch/csrc.

@phalexo
Author

phalexo commented Jun 14, 2018

I've been looking at the changes and unfortunately I am not seeing how everything is connected.

The first error occurs on this line:
success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
output is the "undefined" pointer, and it does not appear to be a stream.

#include <THC/THC.h>
#include <stdbool.h>
#include <stdio.h>
#include "roi_crop_cuda_kernel.h"

#define real float

// this symbol will be resolved automatically from PyTorch libs
extern THCState *state;

// Bilinear sampling is done in BHWD (coalescing is not obvious in BDHW)
// we assume BHWD format in inputImages
// we assume BHW(YX) format on grids

int BilinearSamplerBHWD_updateOutput_cuda(THCudaTensor *inputImages, THCudaTensor *grids, THCudaTensor *output){
// THCState *state = getCutorchState(L);
// THCudaTensor *inputImages = (THCudaTensor *)luaT_checkudata(L, 2, "torch.CudaTensor");
// THCudaTensor *grids = (THCudaTensor *)luaT_checkudata(L, 3, "torch.CudaTensor");
// THCudaTensor *output = (THCudaTensor *)luaT_checkudata(L, 4, "torch.CudaTensor");

int success = 0;
success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
output->size[3],
output->size[2],
output->size[0],
THCudaTensor_size(state, inputImages, 1),
THCudaTensor_size(state, inputImages, 2),
THCudaTensor_size(state, inputImages, 3),
THCudaTensor_size(state, inputImages, 0),
THCudaTensor_data(state, inputImages),
THCudaTensor_stride(state, inputImages, 0),
THCudaTensor_stride(state, inputImages, 1),
THCudaTensor_stride(state, inputImages, 2),
THCudaTensor_stride(state, inputImages, 3),
THCudaTensor_data(state, grids),
THCudaTensor_stride(state, grids, 0),
THCudaTensor_stride(state, grids, 3),
THCudaTensor_stride(state, grids, 1),
THCudaTensor_stride(state, grids, 2),
THCudaTensor_data(state, output),
THCudaTensor_stride(state, output, 0),
THCudaTensor_stride(state, output, 1),
THCudaTensor_stride(state, output, 2),
THCudaTensor_stride(state, output, 3),
THCState_getCurrentStream(state));

//check for errors
if (!success) {
THError("aborting");
}
return 1;
}

@phalexo
Author

phalexo commented Jun 14, 2018

I made the following modifications to one file and similar ones to two others. I'd appreciate feedback on whether this makes sense.

#include <THC/THC.h>
#include <stdio.h>
#include "nms_cuda_kernel.h"

// this symbol will be resolved automatically from PyTorch libs
extern THCState *state;

int nms_cuda(THCudaIntTensor *keep_out, THCudaTensor *boxes_host,
THCudaIntTensor *num_out, float nms_overlap_thresh) {

    int sz0 = THCudaTensor_size(state, boxes_host, 0);
    int sz1 = THCudaTensor_size(state, boxes_host, 1);
    nms_cuda_compute(THCudaIntTensor_data(state, keep_out),
                     THCudaIntTensor_data(state, num_out),
                     THCudaTensor_data(state, boxes_host),
                     sz0, sz1,
                     //boxes_host->size[0],
                     //boxes_host->size[1],
                     nms_overlap_thresh);

    return 1;

}

@stoneyang

@phalexo You could try installing PyTorch 0.4.0 and putting CFLAGS="-std=c99" in front of sh make.sh, i.e. running CFLAGS="-std=c99" sh make.sh.

@JiamingSuen

JiamingSuen commented Aug 7, 2018

I want to follow up on this issue. The commit mentioned earlier, 4caea64, breaks a lot of user-defined C++/CUDA extensions on 0.4.1; examples include roi_align in Detectron.pytorch, correlation in flownet2-pytorch, and many more.

I think a tutorial, or at least a detailed comment, could be provided showing how to migrate these extensions to the newer at::Tensor format, as suggested in the cpp_extension tutorial. Otherwise most open-source implementations with custom operations cannot benefit from other updates in 0.4.1 and later versions.

Thanks for your time.

Edit:
I found some reference here:
https://github.com/pytorch/pytorch/tree/master/aten
https://github.com/zdevito/ATen/tree/master/aten/doc
https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/test
but I am still a little confused about how to start migrating.
Is there anything else I could use as a reference? Thanks.

@ProfFan

ProfFan commented Aug 8, 2018

@JiamingSuen The world is small LOL

Struggling to get flownet2-pytorch built

@GuoleiSun

I face the same issue. I am currently using CUDA 9.2. I can't downgrade PyTorch 0.4.1 to an earlier version because those versions don't support CUDA 9.2, and I don't want to downgrade CUDA.
Any insights? Thanks

@JiamingSuen

@ezyang Any further comments addressing this issue would be appreciated.
@GuoleiSun As a temporary workaround, you can build pytorch==0.4.0 from source yourself; however, custom operators have to be rewritten to work with 0.4.1 and above.

@soumith
Member

soumith commented Aug 15, 2018

@JiamingSuen your references are all spot-on. Also look at how we migrated torchaudio from the cffi extension that used TH* API into a cpp_extension that uses ATen: pytorch/audio@18c01be

We're happy to answer any questions.
