roi_crop (from Detectron.pytorch) building consistently fails #8483

phalexo · 2018-06-14T14:01:43Z

Python has no problem with importing pytorch, but building the extension fails.

gcc -pthread -B /home/developer/anaconda3/envs/pytorch/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/developer/anaconda3/envs/pytorch/include/python3.6m -c /home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c -o ./home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.o -std=c99
/home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c: In function 'BilinearSamplerBHWD_updateOutput_cuda':
/home/developer/Detectron.pytorch/lib/model/roi_crop/src/roi_crop_cuda.c:22:64: error: dereferencing pointer to incomplete type 'THCTensor {aka struct THCTensor}'
success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
^
Traceback (most recent call last):
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/unixccompiler.py", line 118, in _compile
extra_postargs)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/ccompiler.py", line 909, in spawn
spawn(cmd, dry_run=self.dry_run)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/spawn.py", line 36, in spawn
_spawn_posix(cmd, search_path, dry_run=dry_run)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/spawn.py", line 159, in _spawn_posix
% (cmd, exit_status))
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/site-packages/cffi/ffiplatform.py", line 51, in _build
dist.run_command('build_ext')
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/ccompiler.py", line 574, in compile
self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
File "/home/developer/anaconda3/envs/pytorch/lib/python3.6/distutils/unixccompiler.py", line 120, in _compile

fmassa · 2018-06-14T14:03:24Z

I believe this issue should be opened in the Detectron.pytorch repo.

ezyang · 2018-06-14T14:07:15Z

FYI, this is because we made some structs in THC abstract in HEAD. Any sites which accessed members directly have to use a function instead now.
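
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of what "abstract" (opaque) structs mean for calling code. It does not use the real THC headers: `DemoTensor`, `DemoTensor_new4d`, `DemoTensor_size`, `DemoTensor_free`, and `demo_count_elements` are all hypothetical names invented for this illustration. The caller only sees a forward declaration, so `t->size[1]` would produce the same "dereferencing pointer to incomplete type" error as in the log above, while going through the accessor function compiles.

```c
#include <stdint.h>
#include <stdlib.h>

/* ---- What the caller sees (the "public header" part) ---- */

/* Hypothetical stand-in for an opaque tensor type; NOT the real THC API.
 * Only the name is declared here, so the type is incomplete. */
typedef struct DemoTensor DemoTensor;

DemoTensor *DemoTensor_new4d(int64_t d0, int64_t d1, int64_t d2, int64_t d3);
int64_t DemoTensor_size(const DemoTensor *t, int dim);
void DemoTensor_free(DemoTensor *t);

/* Caller code. Writing t->size[dim] at this point would not compile,
 * because the struct body is not visible yet; the accessor is the only
 * way to read a dimension. */
int64_t demo_count_elements(const DemoTensor *t) {
    int64_t n = 1;
    int dim;
    for (dim = 0; dim < 4; ++dim) {
        n *= DemoTensor_size(t, dim);
    }
    return n;
}

/* ---- What lives inside the library (hidden from callers) ---- */

struct DemoTensor {
    int64_t size[4];
};

DemoTensor *DemoTensor_new4d(int64_t d0, int64_t d1, int64_t d2, int64_t d3) {
    DemoTensor *t = malloc(sizeof *t);
    if (t == NULL) {
        return NULL;
    }
    t->size[0] = d0;
    t->size[1] = d1;
    t->size[2] = d2;
    t->size[3] = d3;
    return t;
}

/* Only library code, which sees the full definition, may touch members. */
int64_t DemoTensor_size(const DemoTensor *t, int dim) {
    return t->size[dim];
}

void DemoTensor_free(DemoTensor *t) {
    free(t);
}
```

In the failing file from this issue, the analogous change would presumably be to replace each `output->size[n]` with `THCudaTensor_size(state, output, n)`, the same accessor the code already uses for `inputImages`.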

phalexo · 2018-06-14T14:26:44Z

Is there a reference describing what changes are needed to match the changes in PyTorch? The Detectron repo is not the only one affected; there are at least two others.

ezyang · 2018-06-14T14:33:53Z

Not yet, but you can get some guidance looking at 4caea64; look at changes to files in torch/csrc

phalexo · 2018-06-14T22:06:28Z

I've been looking at the changes and unfortunately I am not seeing how everything is connected.

The first error occurs in this line:
success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
output is the "undefined" pointer. It does not appear to be a stream.

#include <THC/THC.h>
#include <stdbool.h>
#include <stdio.h>
#include "roi_crop_cuda_kernel.h"

#define real float

// this symbol will be resolved automatically from PyTorch libs
extern THCState *state;

// Bilinear sampling is done in BHWD (coalescing is not obvious in BDHW)
// we assume BHWD format in inputImages
// we assume BHW(YX) format on grids

int BilinearSamplerBHWD_updateOutput_cuda(THCudaTensor *inputImages, THCudaTensor *grids, THCudaTensor *output){
    // THCState *state = getCutorchState(L);
    // THCudaTensor *inputImages = (THCudaTensor *)luaT_checkudata(L, 2, "torch.CudaTensor");
    // THCudaTensor *grids = (THCudaTensor *)luaT_checkudata(L, 3, "torch.CudaTensor");
    // THCudaTensor *output = (THCudaTensor *)luaT_checkudata(L, 4, "torch.CudaTensor");

    int success = 0;
    success = BilinearSamplerBHWD_updateOutput_cuda_kernel(output->size[1],
                                                           output->size[3],
                                                           output->size[2],
                                                           output->size[0],
                                                           THCudaTensor_size(state, inputImages, 1),
                                                           THCudaTensor_size(state, inputImages, 2),
                                                           THCudaTensor_size(state, inputImages, 3),
                                                           THCudaTensor_size(state, inputImages, 0),
                                                           THCudaTensor_data(state, inputImages),
                                                           THCudaTensor_stride(state, inputImages, 0),
                                                           THCudaTensor_stride(state, inputImages, 1),
                                                           THCudaTensor_stride(state, inputImages, 2),
                                                           THCudaTensor_stride(state, inputImages, 3),
                                                           THCudaTensor_data(state, grids),
                                                           THCudaTensor_stride(state, grids, 0),
                                                           THCudaTensor_stride(state, grids, 3),
                                                           THCudaTensor_stride(state, grids, 1),
                                                           THCudaTensor_stride(state, grids, 2),
                                                           THCudaTensor_data(state, output),
                                                           THCudaTensor_stride(state, output, 0),
                                                           THCudaTensor_stride(state, output, 1),
                                                           THCudaTensor_stride(state, output, 2),
                                                           THCudaTensor_stride(state, output, 3),
                                                           THCState_getCurrentStream(state));

    // check for errors
    if (!success) {
        THError("aborting");
    }
    return 1;
}

phalexo · 2018-06-14T22:59:49Z

I made the following mods to one file and similar mods to two others. I'd appreciate a comment if it makes sense.

#include <THC/THC.h>
#include <stdio.h>
#include "nms_cuda_kernel.h"

// this symbol will be resolved automatically from PyTorch libs
extern THCState *state;

int nms_cuda(THCudaIntTensor *keep_out, THCudaTensor *boxes_host,
THCudaIntTensor *num_out, float nms_overlap_thresh) {

    int sz0 = THCudaTensor_size(state, boxes_host, 0);
    int sz1 = THCudaTensor_size(state, boxes_host, 1);
    nms_cuda_compute(THCudaIntTensor_data(state, keep_out),
                     THCudaIntTensor_data(state, num_out),
                     THCudaTensor_data(state, boxes_host),
                     sz0, sz1,
                     //boxes_host->size[0],
                     //boxes_host->size[1],
                     nms_overlap_thresh);

    return 1;

}

stoneyang · 2018-07-30T06:52:16Z

@phalexo You could try installing PyTorch 0.4.0 and putting CFLAGS="-std=c99" before sh make.sh.

JiamingSuen · 2018-08-07T15:46:50Z

I want to follow up on this issue. The commit mentioned earlier, 4caea64, breaks many user-defined C++/CUDA extensions on 0.4.1. Examples include roi_align in Detectron.pytorch, correlation in flownet2-pytorch, and many more.

I think a tutorial, or at least a detailed comment, should be provided showing how to migrate these extensions to the newer at::Tensor format, as suggested in the cpp_extension tutorial. Otherwise most open-source implementations with custom operations cannot benefit from the updates in 0.4.1 and later versions.

Thanks for your time.

Edit:
I found some reference here:
https://github.com/pytorch/pytorch/tree/master/aten
https://github.com/zdevito/ATen/tree/master/aten/doc
https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/test
but I am still a little confused about how to start migrating.
Is there anything else I can use as a reference? Thanks.

ProfFan · 2018-08-08T07:36:22Z

@JiamingSuen The world is small LOL

Struggling to get flownet2-pytorch built

GuoleiSun · 2018-08-15T07:03:44Z

I'm facing the same issue. I am currently using CUDA 9.2. I can't downgrade from PyTorch 0.4.1 to a lower version because CUDA 9.2 doesn't support them, and I don't want to downgrade CUDA.
Any insights? Thanks

JiamingSuen · 2018-08-15T08:29:55Z

@ezyang Any further comments to help address this issue would be appreciated.
@GuoleiSun You can compile pytorch==0.4.0 yourself as a temporary solution; however, rewriting the custom operators is necessary to use 0.4.1 and above.

soumith · 2018-08-15T17:54:55Z

@JiamingSuen your references are all spot-on. Also look at how we migrated torchaudio from the cffi extension that used TH* API into a cpp_extension that uses ATen: pytorch/audio@18c01be

We're happy to answer any questions.

fmassa closed this as completed Jun 14, 2018

stoneyang mentioned this issue Jul 30, 2018

Problem with building/installing extensions in lib roytseng-tw/Detectron.pytorch#80

Open

JCBrouwer mentioned this issue Sep 24, 2018

Failed to compile correlation with pytorch 0.4.1 cuda 9.0 NVlabs/PWC-Net#34

Closed

sar-gupta mentioned this issue Oct 7, 2018

Error in installation yeezhu/SPN.pytorch#27

Closed
