Question: Can the model trained using this repo be run on regular MXNet? #28

Closed
xiaoyongzhu opened this issue Jul 3, 2018 · 17 comments

Comments

@xiaoyongzhu

Hi, I have a quick question regarding model inference. Say I've trained a model using SNIPER for a regular object detection task. Can the model be served:

  • using the official MXNet build?
  • using CPU?

The reason I ask is that we need to run inference on CPU. I guess it might be possible since the network is still a standard ResNet + Faster R-CNN, but I am not sure, since I see a few changes in the SNIPER-mxnet module that might be relevant to this.

Thanks!

@bharatsingh430
Collaborator

If you write the proposal layer for CPU (not sure whether we wrote it, maybe it's already there), it should work.

@xiaoyongzhu
Author

@bharatsingh430 Sorry, do you mean the (soft) NMS part or the RPN part?

@bharatsingh430
Collaborator

RPN part

@bharatsingh430
Collaborator

OK, I checked: that layer has CPU support, so it should work.

@xiaoyongzhu
Author

Cool - will try it out and let you know the result!

@xiaoyongzhu
Author

xiaoyongzhu commented Jul 3, 2018

@bharatsingh430 Thanks for the response. Unfortunately, when I ran the code on CPU there were no detections at all for the demo image in the repository (I printed the predictions for both CPU and GPU inference, and they look very different). Here's what I've done:

Basically I've re-compiled the SNIPER-mxnet code with the following changes:

  • Since the deformable_im2col operator is not available on CPU, I followed the issue in the Deformable-RCNN repository (Can this repo be run on cpu only mxnet? msracver/Deformable-ConvNets#124) and replaced three files: src/operator/contrib/deformable_psroi_pooling.cc, src/operator/contrib/nn/deformable_im2col.h, src/operator/contrib/psroi_pooling.cc
  • I made the same file changes in the Deformable-RCNN repository in order to run Deformable-RCNN on CPU, and the results look good, so I think the three replacement files work correctly (BTW, for that test I was using mxnet from the master branch, which should be newer than the fork in the SNIPER-mxnet repo).
  • I also modified src/operator/contrib/nn/deformable_im2col.cuh because of a name conflict during compilation: I renamed get_gradient_weight to get_gradient_weight_cuda.
  • In the configuration files, I changed fp16: true to false, since it won't go through otherwise.
  • Other than the above, the only other change I made was in demo.py, to switch between CPU and GPU:
    if gpu_only:
        context = [mx.gpu(int(config.gpus[0]))]
    else:
        context = [mx.cpu()]
  • Another interesting observation: when I compiled the Deformable-RCNN code for CPU, I didn't hit the get_gradient_weight name conflict, but with almost the same changes in SNIPER-mxnet I did. Not sure if this is related.

Since I am still new to mxnet, I am not sure what the right way to debug this is. Any insights?

I've also attached all the changes I made in the zip file below, in case you are interested. Any pointers would be appreciated!

mxnet_code.zip

@xiaoyongzhu xiaoyongzhu reopened this Jul 3, 2018
@bharatsingh430
Collaborator

Can you compare the conv5 dumps for the CPU and GPU versions? If they are not the same, then something is probably wrong in the deformable convolution layers. If they are the same, I can check the proposal layer once more, and probably the deformable ROI pooling layers.

@xiaoyongzhu
Author

Thanks @bharatsingh430. However, I am not sure how to dump the values of a particular layer during the forward pass (sorry, still new to mxnet).

I've dumped the output after the forward pass for both the CPU and the GPU version, in case there's something interesting in it. As you can see, the CPU version classifies everything as background (with a very high confidence of 1.0).

cpuoutput.txt
gpuoutput.txt

@xiaoyongzhu
Author

xiaoyongzhu commented Jul 3, 2018

I've tried a few ways to print the output of the ResNet conv5 layer (I think the layer's name is cat4_output). However, I ran into a few errors at the mxnet binding step that I don't know how to resolve. (For example: ValueError: You created Module with Module(..., data_names=['d', 'a', 't', 'a']) but input with name 'd' is not found in symbol.list_arguments(). Did you mean one of: data).

Anyway, I have compiled the SNIPER-mxnet cpu version and put it here. Just a simple pip install should be fine:
https://chestxray.blob.core.windows.net/chestxraytutorial/mxnet-1.2.0-cp27-cp27mu-linux_x86_64.whl

@bharatsingh430, if you can help dump the output of the CPU version (or give me some guidance), I would appreciate it!
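
A note on the ValueError quoted above: the character-by-character list ['d', 'a', 't', 'a'] is what Python produces when a bare string is iterated as if it were a list of names. This suggests that something like data_names='data' (or ('data'), which is not a tuple) was passed where a list such as ['data'] was expected. The sketch below reproduces the effect without MXNet; as_name_list is a hypothetical helper for illustration, not part of MXNet.

```python
def as_name_list(names):
    """Return names as a list of strings, wrapping a bare string instead of splitting it.

    Hypothetical helper for illustration; not an MXNet function.
    """
    if isinstance(names, str):
        return [names]
    return list(names)


# Iterating a bare string yields its characters, matching the names in the error.
print(list('data'))            # ['d', 'a', 't', 'a']

# ('data') is only a parenthesized string; ('data',) is a one-element tuple.
print(list(('data')))          # ['d', 'a', 't', 'a']
print(list(('data',)))         # ['data']

# Wrapping the string first keeps it as a single input name.
print(as_name_list('data'))    # ['data']
```

If this is indeed the cause, passing the input names as a list or tuple when constructing the Module should make the name check succeed.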

@bharatsingh430
Collaborator

You need to change the symbol: add the output of the layer you want to dump to the mx.sym.Group, and then you can access it in the outputs the network generates.

@xiaoyongzhu
Author

Thanks @bharatsingh430. I've dumped the two layers (relut and relu1) for both CPU and GPU, and they are the same. So I assume something is wrong with either the MultiProposal layer or the DeformablePSROIPooling layer. Basically I did the following:

        # in get_symbol_rcnn, line 248,249
        relut = self.resnetc5(conv_feat, deform=True)
        relu1 = mx.symbol.Concat(*[conv_feat, relut], name='cat4')  
        ....
        # line 368
        group = mx.sym.Group([rois, cls_prob, bbox_pred, im_ids, relut, relu1])

The results are below (I only show the first few lines, since the dump files are around 500 MB).
CPU:

<NDArray 1x2048x88x125 @cpu(0)>, 'cat4_output':
[[[[-0.199  0.493  0.557  0.630  0.259  0.152  0.491  0.544  0.402
     0.501  0.020  0.358  0.934  1.098  0.734  0.652  0.519  0.430
     0.085  0.117  0.010 -0.238  0.201  0.361  0.354  0.875  0.492
     0.445  0.315  0.215  0.373  0.379  0.078  0.077 -0.045  0.110
     0.253  0.402  0.515  0.230  0.273  0.667  0.465  0.757  0.510
     0.785  0.645  0.430  0.328 -0.303  0.087  0.194  0.410  0.480
     0.327  0.187  0.047  0.398  0.490  0.513  0.377  0.410  0.414
     0.197 -0.087 -0.067 -0.040 -0.206 -0.323 -0.123 -0.180  0.133
     0.351  0.714  0.478  0.056 -0.090 -0.163 -0.205 -0.013  0.186
     0.094  0.083 -0.223  0.091  0.334  0.247  0.314  0.028  0.284
     0.182  0.303  0.332 -0.238 -0.146  0.118  0.109  0.275  0.252
     0.115  0.482  0.252  0.274  0.255  0.034  0.115  0.231  0.204
    -0.139  0.137  0.065  0.150  0.198  0.338  0.247  0.109  0.002
    -0.003  0.345  0.215  0.333  0.107 -0.223  0.412 -0.013]
   [-0.085  0.624  0.563  0.426  0.202  0.136  0.365  0.177  0.040
     0.298  0.475  0.432  0.714  0.652  0.562  0.556  0.356  0.217
     0.104  0.059 -0.077 -0.135 -0.208  0.072  0.172  0.553  0.384
     0.497  0.345 -0.057  0.250  0.631  0.167 -0.225 -0.118  0.068
    -0.047  0.387  0.621  0.581  0.435  0.271  0.139  0.640  0.764
     0.631  0.558  0.412 -0.087 -0.562 -0.167 -0.042  0.343  0.410
     0.224  0.216  0.139  0.338  0.217  0.159  0.121  0.471  0.455
     0.124 -0.168  0.006 -0.270 -0.612 -0.306 -0.349 -0.282 -0.028
     0.151  0.239  0.310 -0.110 -0.146 -0.045 -0.544 -0.135  0.389
     0.393  0.351  0.113  0.246  0.369  0.220  0.004 -0.190 -0.049
    -0.178 -0.113  0.152 -0.223  0.082 -0.079  0.032  0.020  0.258
     0.096  0.211  0.181  0.133  0.177 -0.130 -0.079  0.304  0.103
     0.107  0.385  0.495  0.288  0.342  0.308  0.084 -0.182 -0.127
     0.123  0.425  0.180  0.331  0.350 -0.095  0.745  0.073]
   [-0.079  0.404  0.186  0.333 -0.033 -0.114  0.300 -0.067  0.035
     0.087  0.544  0.347  0.457  0.338  0.222  0.510  0.445  0.546
     0.120  0.172  0.203  0.049 -0.225  0.095 -0.021  0.207  0.055
    -0.083 -0.185 -0.519 -0.257  0.511 -0.147 -0.401  0.181  0.076
    -0.395 -0.328  0.295  0.563  0.447  0.307 -0.016  0.261  0.658


<NDArray 1x300x81 @cpu(0)>, '_plus32_output':
[[[[ 0.097  0.144  0.145  0.051 -0.140 -0.175 -0.100 -0.086 -0.021
    -0.096 -0.308  0.024 -0.059 -0.148 -0.102 -0.097 -0.091  0.058
     0.099  0.065 -0.036 -0.208 -0.178 -0.130 -0.116  0.072 -0.063
    -0.142 -0.113 -0.077 -0.078 -0.087 -0.147 -0.154 -0.018 -0.009
    -0.014 -0.052 -0.086 -0.074  0.064  0.049 -0.045 -0.006 -0.006
     0.146  0.257  0.110  0.086  0.305  0.239  0.047  0.145  0.333
     0.195  0.193  0.264  0.291  0.264  0.076  0.035  0.022  0.200
     0.232  0.147  0.127  0.210  0.246  0.204  0.121  0.150 -0.015
     0.114  0.067  0.132  0.147  0.215  0.152  0.242  0.092  0.259
     0.433  0.406  0.237  0.405  0.480  0.377  0.485  0.563  0.420
     0.450  0.464  0.448  0.412  0.551  0.612  0.392  0.282  0.253
     0.297  0.146  0.142  0.413  0.337  0.338  0.403  0.407  0.429
     0.449  0.435  0.192  0.376  0.340  0.487  0.507  0.356  0.206
    -0.031  0.026  0.014  0.212 -0.037 -0.036 -0.160 -0.160]
   [-0.166  0.185  0.072 -0.073  0.053  0.115  0.122  0.070  0.177
     0.066 -0.017  0.103  0.130  0.219  0.228  0.142  0.001  0.038
     0.154  0.261  0.324  0.174  0.071  0.022  0.106  0.075  0.141
     0.028  0.036  0.138  0.064  0.064  0.150  0.077  0.156  0.139
     0.179  0.039  0.110  0.010  0.005 -0.092  0.033 -0.002 -0.165
     0.037  0.073  0.218  0.237  0.182  0.177  0.118  0.166  0.184
     0.232  0.335  0.301  0.317  0.170  0.105  0.179  0.208  0.504
     0.333  0.429  0.470  0.312  0.399  0.225  0.183  0.263  0.209
     0.137  0.200  0.205  0.250  0.271  0.235  0.351  0.399  0.379
     0.438  0.409  0.439  0.424  0.460  0.487  0.576  0.468  0.510
     0.420  0.487  0.664  0.695  0.429  0.330  0.441  0.597  0.345
     0.435  0.424  0.353  0.298  0.387  0.348  0.445  0.597  0.610
     0.679  0.586  0.481  0.535  0.492  0.435  0.445  0.349  0.261
     0.373  0.341  0.404  0.579  0.314  0.269  0.157  0.126]
   [ 0.001  0.063  0.013  0.045 -0.003 -0.012  0.102  0.140  0.163
     0.109 -0.081 -0.109 -0.146  0.165  0.049 -0.019 -0.056 -0.065
    -0.043  0.158  0.376  0.277  0.308  0.252  0.042 -0.051  0.124
     0.129 -0.016  0.038 -0.084  0.030  0.263  0.164  0.284  0.159
     0.221  0.016  0.120 -0.099  0.120  0.260  0.211  0.120 -0.014
    -0.065  0.122  0.202  0.274  0.118  0.108  0.010  0.143  0.232
     0.287  0.340  0.356  0.395  0.306  0.249  0.120  0.208  0.360
     0.140  0.378  0.435  0.508  0.388  0.228  0.232  0.116  0.268
     0.298  0.217  0.169  0.093  0.086  0.196  0.316  0.376  0.402
     0.521  0.495  0.426  0.414  0.503  0.379  0.320  0.249  0.109
     0.229  0.161  0.277  0.447  0.519  0.382  0.149  0.324  0.376
     0.439  0.337  0.256  0.307  0.227  0.238  0.340  0.531  0.452
     0.465  0.408  0.312  0.273  0.352  0.235  0.100  0.160  0.224
     0.333  0.211  0.260  0.201  0.389  0.457  0.276  0.298]
   [-0.068  0.148  0.094  0.063  0.048  0.095  0.109  0.126  0.180
     0.016  0.032  0.030  0.020  0.241  0.167  0.181  0.015  0.115
     0.225  0.243  0.327  0.343  0.323  0.316  0.139  0.027  0.033
    -0.030 -0.063 -0.043 -0.002  0.007  0.190  0.169  0.180  0.116
     0.117  0.211  0.086  0.097  0.088  0.187  0.175  0.117  0.064
     0.036  0.212  0.134 -0.045 -0.201  0.033  0.197  0.280  0.177
     0.136  0.353  0.316  0.294  0.336  0.244  0.177  0.104  0.203
     0.177  0.310  0.453  0.515  0.408  0.240  0.181  0.198  0.216
     0.112  0.109  0.111  0.116  0.144  0.248  0.218  0.271  0.307
     0.427  0.450  0.401  0.405  0.367  0.302  0.424  0.459  0.359
     0.238  0.328  0.518  0.510  0.423  0.372  0.323  0.513  0.378
     0.250  0.168  0.381  0.384  0.365  0.214  0.269  0.254  0.261
     0.344  0.333  0.264  0.228  0.346  0.270  0.257  0.278  0.231
     0.242  0.306  0.331  0.382  0.439  0.582  0.377  0.300]
   [-0.166  0.031 -0.018 -0.040  0.170  0.077  0.069  0.077  0.022
     0.072 -0.009  0.079  0.204  0.219  0.315  0.284  0.229  0.232
     0.233  0.299  0.254  0.408  0.308  0.176  0.013  0.087  0.006
    -0.032 -0.138 -0.014  0.024  0.050  0.244  0.131  0.130 -0.005
    -0.080  0.039 -0.139  0.057  0.230  0.253  0.451  0.225  0.347
     0.250  0.357  0.250 -0.050 -0.046  0.080  0.373  0.441  0.343
     0.286  0.335  0.415  0.476  0.536  0.292  0.085  0.254  0.307
     0.328  0.350  0.602  0.524  0.481  0.291  0.242  0.176  0.200

GPU output

<NDArray 1x2048x88x125 @gpu(0)>, 'cat4_output':
[[[[-0.199  0.493  0.557  0.630  0.259  0.152  0.491  0.544  0.402
     0.501  0.020  0.358  0.934  1.098  0.734  0.652  0.519  0.430
     0.085  0.117  0.010 -0.238  0.201  0.361  0.354  0.875  0.492
     0.445  0.315  0.215  0.373  0.379  0.078  0.077 -0.045  0.110
     0.253  0.402  0.515  0.230  0.273  0.667  0.465  0.757  0.510
     0.785  0.645  0.430  0.328 -0.303  0.087  0.194  0.410  0.480
     0.327  0.187  0.047  0.398  0.490  0.513  0.377  0.410  0.414
     0.197 -0.087 -0.067 -0.040 -0.206 -0.323 -0.123 -0.180  0.133
     0.351  0.714  0.478  0.056 -0.090 -0.163 -0.205 -0.013  0.186
     0.094  0.083 -0.223  0.091  0.334  0.247  0.314  0.028  0.284
     0.182  0.303  0.332 -0.238 -0.146  0.118  0.109  0.275  0.252
     0.115  0.482  0.252  0.274  0.255  0.034  0.115  0.231  0.204
    -0.139  0.137  0.065  0.150  0.198  0.338  0.247  0.109  0.002
    -0.003  0.345  0.215  0.333  0.107 -0.223  0.412 -0.013]
   [-0.085  0.624  0.563  0.426  0.202  0.136  0.365  0.177  0.040
     0.298  0.475  0.432  0.714  0.652  0.562  0.556  0.356  0.217
     0.104  0.059 -0.077 -0.135 -0.208  0.072  0.172  0.553  0.384
     0.497  0.345 -0.057  0.250  0.631  0.167 -0.225 -0.118  0.068
    -0.047  0.387  0.621  0.581  0.435  0.271  0.139  0.640  0.764
     0.631  0.558  0.412 -0.087 -0.562 -0.167 -0.042  0.343  0.410
     0.224  0.216  0.139  0.338  0.217  0.159  0.121  0.471  0.455
     0.124 -0.168  0.006 -0.270 -0.612 -0.306 -0.349 -0.282 -0.028
     0.151  0.239  0.310 -0.110 -0.146 -0.045 -0.544 -0.135  0.389
     0.393  0.351  0.113  0.246  0.369  0.220  0.004 -0.190 -0.049
    -0.178 -0.113  0.152 -0.223  0.082 -0.079  0.032  0.020  0.258
     0.096  0.211  0.181  0.133  0.177 -0.130 -0.079  0.304  0.103
     0.107  0.385  0.495  0.288  0.342  0.308  0.084 -0.182 -0.127
     0.123  0.425  0.180  0.331  0.350 -0.095  0.745  0.073]
   [-0.079  0.404  0.186  0.333 -0.033 -0.114  0.300 -0.067  0.035
     0.087  0.544  0.347  0.457  0.338  0.222  0.510  0.445  0.546
     0.120  0.172  0.203  0.049 -0.225  0.095 -0.021  0.207  0.055
    -0.083 -0.185 -0.519 -0.257  0.511 -0.147 -0.401  0.181  0.076
    -0.395 -0.328  0.295  0.563  0.447  0.307 -0.016  0.261  0.658


<NDArray 1x300x81 @gpu(0)>, '_plus32_output':
[[[[ 0.097  0.144  0.145  0.051 -0.140 -0.175 -0.100 -0.086 -0.021
    -0.096 -0.308  0.024 -0.059 -0.148 -0.102 -0.097 -0.091  0.058
     0.099  0.065 -0.036 -0.208 -0.178 -0.130 -0.116  0.072 -0.063
    -0.142 -0.113 -0.077 -0.078 -0.087 -0.147 -0.154 -0.018 -0.009
    -0.014 -0.052 -0.086 -0.074  0.064  0.049 -0.045 -0.006 -0.006
     0.146  0.257  0.110  0.086  0.305  0.239  0.047  0.145  0.333
     0.195  0.193  0.264  0.291  0.264  0.076  0.035  0.022  0.200
     0.232  0.147  0.127  0.210  0.246  0.204  0.121  0.150 -0.015
     0.114  0.067  0.132  0.147  0.215  0.152  0.242  0.092  0.259
     0.433  0.406  0.237  0.405  0.480  0.377  0.485  0.563  0.420
     0.450  0.464  0.448  0.412  0.551  0.612  0.392  0.282  0.253
     0.297  0.146  0.142  0.413  0.337  0.338  0.403  0.407  0.430
     0.449  0.435  0.192  0.376  0.340  0.487  0.507  0.356  0.206
    -0.031  0.026  0.014  0.212 -0.037 -0.036 -0.160 -0.160]
   [-0.166  0.185  0.072 -0.073  0.053  0.115  0.122  0.070  0.177
     0.066 -0.017  0.103  0.130  0.219  0.228  0.142  0.001  0.038
     0.154  0.261  0.324  0.174  0.071  0.022  0.106  0.075  0.141
     0.028  0.036  0.138  0.064  0.064  0.150  0.077  0.156  0.139
     0.179  0.039  0.110  0.010  0.005 -0.092  0.033 -0.002 -0.165
     0.037  0.073  0.218  0.237  0.182  0.177  0.118  0.166  0.184
     0.232  0.335  0.301  0.317  0.170  0.105  0.179  0.208  0.504
     0.333  0.429  0.470  0.312  0.399  0.225  0.183  0.263  0.209
     0.137  0.200  0.205  0.250  0.271  0.235  0.351  0.399  0.379
     0.438  0.409  0.439  0.424  0.460  0.487  0.576  0.468  0.510
     0.420  0.487  0.664  0.695  0.429  0.330  0.441  0.597  0.345
     0.435  0.424  0.353  0.298  0.387  0.348  0.445  0.597  0.610
     0.679  0.586  0.481  0.535  0.492  0.435  0.445  0.349  0.261
     0.373  0.341  0.404  0.579  0.314  0.269  0.157  0.126]
   [ 0.001  0.063  0.013  0.045 -0.003 -0.012  0.102  0.140  0.163
     0.109 -0.081 -0.109 -0.146  0.165  0.049 -0.019 -0.056 -0.065
    -0.043  0.158  0.376  0.277  0.308  0.252  0.042 -0.051  0.124
     0.129 -0.016  0.038 -0.084  0.030  0.263  0.164  0.284  0.159
     0.221  0.016  0.120 -0.099  0.120  0.260  0.211  0.120 -0.014
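
One value in the _plus32_output excerpts above differs in the last printed digit (0.429 on CPU vs 0.430 on GPU), so comparing such text dumps by exact string equality can report a mismatch where the arrays agree up to rounding. The following is a minimal sketch of a tolerance-based comparison over printed dumps. The regular expression, the function names, and the 2e-3 tolerance are assumptions chosen for three-decimal printouts, not something taken from the thread.

```python
import re

# Matches decimals such as -0.199 or 1.2e-05. Shape headers like
# "1x2048x88x125" and names like "cat4_output" contain no '.', so they are skipped.
FLOAT_RE = re.compile(r'-?\d+\.\d+(?:[eE][-+]?\d+)?')


def parse_values(dump_text):
    """Extract every decimal number from a printed NDArray dump, in order."""
    return [float(tok) for tok in FLOAT_RE.findall(dump_text)]


def max_abs_diff(cpu_text, gpu_text):
    """Largest element-wise difference over the common prefix of two dumps."""
    cpu = parse_values(cpu_text)
    gpu = parse_values(gpu_text)
    return max((abs(a - b) for a, b in zip(cpu, gpu)), default=0.0)


# The two rows from the excerpts above that differ in the last digit.
cpu_row = "0.297  0.146  0.142  0.413  0.337  0.338  0.403  0.407  0.429"
gpu_row = "0.297  0.146  0.142  0.413  0.337  0.338  0.403  0.407  0.430"

diff = max_abs_diff(cpu_row, gpu_row)
print(diff <= 2e-3)  # True: the rows agree within print rounding
```

For dumps of several hundred MB it would be more practical to save the arrays in a binary format and compare them directly, but the tolerance idea is the same.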

@bharatsingh430
Collaborator

Thanks. If you can also check the output of the proposal layer and the features after the pooling, that will make it clear which one is failing. If it's the proposal layer, I can check it and get back to you.

@xiaoyongzhu
Author

BTW, I've replaced the deformable_psroi_pooling.cc file in order to make it run on CPU (as I mentioned above). I've also tested this file with another repository, so I think it is fine. Since you didn't change this file, I think the problem might be in the MultiProposal layer. Will check and let you know soon!

@xiaoyongzhu
Author

OK, here's more data. We can see that the input of the MultiProposal layer is the same (denoted as rpn_cls_prob_reshape_output below), but the output of the MultiProposal layer is different (denoted as rois_output). The output of the DeformablePSROIPooling layer is also different, but I think this is because its input is already different.

CPU output


<NDArray 300x5 @cpu(0)>, 'rpn_cls_prob_reshape_output':
[[[[ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.997
     0.999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000]
   [ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000]
   [ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000


<NDArray 1 @cpu(0)>, 'rois_output':
[[ 0.000  280.119  204.702  467.711  399.569]
 [ 0.000  157.547  188.911  342.748  397.438]
 [ 0.000  185.453  198.982  387.434  400.531]
 [ 0.000  229.094  208.812  428.486  396.312]
 [ 0.000  307.040  195.348  501.990  393.009]
 [ 0.000  341.803  201.006  548.583  394.455]
 [ 0.000  396.528  191.727  591.324  387.371]
 [ 0.000  1156.895  681.321  1201.000  743.583]
 [ 0.000  38.503  178.544  207.934  410.373]
 [ 0.000  79.362  180.744  250.460  407.440]
 [ 0.000  108.472  186.929  290.305  401.470]
 [ 0.000  1172.917  684.200  1201.000  743.904]
 [ 0.000  1131.557  679.007  1201.000  740.875]
 [ 0.000  1106.835  672.633  1201.000  742.475]
 [ 0.000  1052.540  671.765  1201.000  745.924]
 [ 0.000  993.391  674.792  1182.917  749.072]
 [ 0.000  128.083  215.154  321.601  413.789]
 [ 0.000  173.980  700.681  478.930  799.000]
 [ 0.000  168.017  231.889  373.320  417.049]
 [ 0.000  203.315  231.655  416.539  412.567]
 [ 0.000  5.146  172.476  170.600  426.306]
 [ 0.000  772.012  576.201  889.586  730.370]
 [ 0.000  1175.192  169.566  1201.000  424.347]
 [ 0.000  1156.714  158.541  1201.000  427.482]
 [ 0.000  1131.555  164.173  1201.000  418.241]
 [ 0.000  1101.202  171.076  1201.000  416.677]
 [ 0.000  1052.231  174.439  1201.000  416.185]
 [ 0.000  452.740  218.070  667.618  406.802]
 [ 0.000  426.838  204.409  637.058  395.885]
 [ 0.000  99.064  151.861  267.066  384.954]
 [ 0.000  836.372  556.619  940.184  737.266]
 [ 0.000  251.144  181.788  444.279  375.302]
 [ 0.000  1105.779  689.475  1201.000  756.210]

<NDArray 1x300x4 @cpu(0)>, 'offset_t_output':
[[[[ 0.047  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]]

  [[ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]]

  [[ 1.576  0.291  0.044  0.000  0.000  0.000  0.000]
   [ 1.411  0.231  0.027  0.000  0.000  0.000  0.000]
   [ 1.172  0.171  0.017  0.000  0.000  0.000  0.000]
   [ 0.933  0.112  0.008  0.000  0.000  0.000  0.000]
   [ 0.771  0.077  0.000  0.000  0.000  0.000  0.000]
   [ 0.818  0.108  0.000  0.000  0.000  0.000  0.000]
   [ 0.877  0.144  0.000  0.000  0.000  0.000  0.000]]

  [[ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]]

GPU output:


<NDArray 300x5 @gpu(0)>, 'rpn_cls_prob_reshape_output':
[[[[ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.997
     0.999  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000]
   [ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000]
   [ 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000


<NDArray 1 @gpu(0)>, 'rois_output':
[[ 0.000  753.216  749.806  834.909  820.897]
 [ 0.000  725.318  1043.365  944.857  1101.822]
 [ 0.000  714.645  749.887  832.771  833.734]
 [ 0.000  323.836  714.759  786.215  1115.803]
 [ 0.000  733.397  741.636  831.955  823.448]
 [ 0.000  742.081  751.284  844.674  830.893]
 [ 0.000  275.516  651.538  764.976  1073.219]
 [ 0.000  744.451  747.997  833.505  798.543]
 [ 0.000  379.031  643.509  836.091  1088.876]
 [ 0.000  697.719  755.645  828.332  845.296]
 [ 0.000  727.792  1033.050  938.315  1086.404]
 [ 0.000  771.650  751.886  832.977  829.797]
 [ 0.000  1350.452  982.553  1438.400  1031.452]
 [ 0.000  592.604  144.989  645.273  241.807]
 [ 0.000  236.748  696.619  707.583  1108.910]
 [ 0.000  1023.936  935.082  1502.045  1236.251]
 [ 0.000  639.958  732.515  836.882  849.457]
 [ 0.000  746.808  735.036  843.615  816.192]
 [ 0.000  716.378  1034.667  962.463  1111.670]
 [ 0.000  318.429  608.859  838.934  1018.720]
 [ 0.000  662.491  741.221  822.586  833.795]
 [ 0.000  666.295  754.479  820.237  852.718]
 [ 0.000  757.149  742.948  840.021  838.489]
 [ 0.000  767.774  751.316  843.469  809.851]
 [ 0.000  731.319  747.238  823.234  807.961]
 [ 0.000  440.054  635.350  928.807  1047.063]
 [ 0.000  256.945  765.061  739.553  1145.297]
 [ 0.000  708.396  732.966  835.075  811.547]
 [ 0.000  759.680  742.207  839.527  803.654]
 [ 0.000  749.974  1043.761  941.582  1087.742]
 [ 0.000  478.419  600.464  951.523  977.659]
 [ 0.000  746.143  755.441  818.694  833.960]
 [ 0.000  1252.271  315.293  1311.270  375.792]

<NDArray 1x300x4 @gpu(0)>, 'offset_t_output':
[[[[ 0.149  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.586  0.434  0.246  0.022  0.000  0.000  0.000]
   [ 1.520  1.406  0.791  0.071  0.000  0.000  0.000]
   [ 1.144  0.958  0.459  0.041  0.000  0.000  0.004]
   [ 0.508  0.305  0.053  0.005  0.000  0.000  0.017]
   [ 0.116  0.076  0.020  0.002  0.000  0.000  0.046]
   [ 0.000  0.006  0.007  0.001  0.000  0.000  0.029]]

  [[ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.000]]

  [[ 0.000  0.000  0.000  0.005  0.031  0.019  0.000]
   [ 0.000  0.000  0.000  0.006  0.038  0.022  0.000]
   [ 0.000  0.000  0.000  0.002  0.011  0.006  0.000]
   [ 0.000  0.000  0.000  0.010  0.059  0.035  0.000]
   [ 0.000  0.000  0.000  0.027  0.168  0.120  0.027]
   [ 0.000  0.000  0.000  0.057  0.356  0.291  0.106]
   [ 0.000  0.000  0.000  0.059  0.365  0.269  0.070]]

  [[ 0.010  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.013  0.000  0.000  0.000  0.000  0.000  0.000]
   [ 0.002  0.000  0.000  0.000  0.000  0.000  0.007]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.111]
   [ 0.000  0.000  0.000  0.000  0.000  0.000  0.185]
   [ 0.000  0.000  0.000  0.001  0.004  0.002  0.141]
   [ 0.000  0.000  0.000  0.017  0.105  0.063  0.078]]
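
The reasoning in this comment (same input, different output, so the layer in between is the suspect, and anything downstream differs only as a consequence) can be sketched as a walk over layer outputs in forward order that stops at the first mismatch. The function name, the tolerance, and the assumption that offset_t_output is the pooled output referred to above are illustrative; the numbers are the first few values from the dumps in this comment.

```python
def first_divergence(layer_order, cpu_out, gpu_out, tol=2e-3):
    """Return the first layer (in forward order) whose CPU and GPU values
    differ by more than tol, or None if every listed layer matches."""
    for name in layer_order:
        a, b = cpu_out[name], gpu_out[name]
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            return name
    return None


# Forward order assumed here: RPN scores -> MultiProposal -> pooled output.
order = ['rpn_cls_prob_reshape_output', 'rois_output', 'offset_t_output']

cpu = {
    'rpn_cls_prob_reshape_output': [1.000, 1.000, 1.000],
    'rois_output': [0.000, 280.119, 204.702, 467.711, 399.569],
    'offset_t_output': [0.047, 0.000, 0.000],
}
gpu = {
    'rpn_cls_prob_reshape_output': [1.000, 1.000, 1.000],
    'rois_output': [0.000, 753.216, 749.806, 834.909, 820.897],
    'offset_t_output': [0.149, 0.000, 0.000],
}

print(first_divergence(order, cpu, gpu))  # rois_output
```

Stopping at the first mismatch matters because later layers consume the already-diverged ROIs, so their differences say nothing about their own correctness.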

@bharatsingh430
Collaborator

OK, for inference there is no CUDA code written other than the memcpy. I can make the changes at some point, but if you want to get this working soon, you just need to get rid of the CUDA memcpy and paste all the content from the .cu file into the .cc file. I am talking about the https://github.com/mahyarnajibi/SNIPER-mxnet/blob/master/src/operator/multi_proposal.cc and https://github.com/mahyarnajibi/SNIPER-mxnet/blob/master/src/operator/multi_proposal.cu files.

@xiaoyongzhu
Author

@bharatsingh430 Cool - I'll probably try to implement it myself, and will let you know how it goes!

Thanks for the help!

@xiaoyongzhu
Author

Looks like it's working! Basically a few changes need to be made:

  • As @bharatsingh430 mentioned, copy the content of multi_proposal.cu to multi_proposal.cc
  • remove cuda dependencies such as #include <mshadow/cuda/reduce.cuh>
  • make necessary changes to create CPU tensor rather than GPU tensor (replace all Tensor<gpu with Tensor<cpu)
  • replace cudaMemcpy with memcpy

I put the file (multi_proposal.cc) in a public gist so people can refer to it as needed:
https://gist.github.com/xiaoyongzhu/4ae8d9df580c84e33157ff8d68f9ce89
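
The mechanical part of the steps listed above can be sketched as a small text rewrite. This is only a rough illustration under assumptions: the sample source lines are made up, the regular expressions are not taken from the gist, and a real port may need further manual edits (for example, error-check macros wrapped around cudaMemcpy, and an #include <cstring> for memcpy). Note that cudaMemcpy takes a fourth "kind" argument that memcpy does not have, so a plain rename is not enough.

```python
import re


def port_to_cpu(source):
    """Apply the three mechanical rewrites listed above to CUDA source text."""
    # 1. Drop CUDA-only includes (here: any #include of a .cuh header).
    source = re.sub(r'^[ \t]*#include\s*<[^>]*\.cuh>[ \t]*\n', '', source,
                    flags=re.M)
    # 2. Allocate tensors on the CPU device instead of the GPU.
    source = source.replace('Tensor<gpu', 'Tensor<cpu')
    # 3. cudaMemcpy(dst, src, n, kind) -> memcpy(dst, src, n).
    source = re.sub(r'\bcudaMemcpy\((.*?),\s*cudaMemcpy\w+\s*\)',
                    r'memcpy(\1)', source, flags=re.S)
    return source


# Hypothetical input lines, written only to exercise the three rewrites.
sample = (
    '#include <mshadow/cuda/reduce.cuh>\n'
    'Tensor<gpu, 1, float> scores;\n'
    'cudaMemcpy(dst, src, sizeof(float) * n, cudaMemcpyDeviceToHost);\n'
)

print(port_to_cpu(sample))
```

The gist linked above remains the reference for the actual ported file.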
