Python Module Failing #1

Closed

gnedster opened this issue Jul 18, 2017 · 7 comments

@gnedster (Contributor)

Hi,

First, great work! However, I'm trying to get the Python examples running but am hitting this error:

Traceback (most recent call last):
  File "hello-world.py", line 8, in <module>
    import sparseconvnet.legacy as scn
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/legacy/__init__.py", line 7, in <module>
    from ..utils import *
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/utils.py", line 8, in <module>
    import sparseconvnet.SCN as scn
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/SCN/__init__.py", line 3, in <module>
    from ._SCN import lib as _lib, ffi as _ffi
ImportError: No module named _SCN

Could this be a missing config in setup.py? Seems like the ._SCN module isn't built or copied over. Any help is appreciated.
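
In case it helps narrow things down, here is a quick check (just a sketch) of whether the compiled extension actually ended up inside the installed package. It locates the sparseconvnet package without importing it (importing is what fails) and lists the SCN subdirectory, which should contain _SCN.so after a successful build:

import imp
import os

# Find the installed sparseconvnet package without importing it
# (importing triggers the failing "from ._SCN import ..." above),
# then list the SCN subpackage, which should contain the compiled
# _SCN extension after a successful build.
_, pkg_dir, _ = imp.find_module('sparseconvnet')
scn_dir = os.path.join(pkg_dir, 'SCN')
print(scn_dir)
print(sorted(os.listdir(scn_dir)))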

Cheers

@oztc commented Jul 18, 2017

I have the same issue

@btgraham (Contributor) commented Jul 18, 2017

Hello. To help me debug, can you please show the output from:
cd SparseConvNet/PyTorch
python setup.py develop
ls sparseconvnet/SCN/
(Also, what OS? What Python version? Conda or not?)

@oztc commented Jul 18, 2017

Hi btgraham,

the following log is the output I get when I run "python setup.py develop" in SparseConvNet/PyTorch:

ozzie@debian:~/working/work/ML/SparseConvNet/PyTorch$ python setup.py develop

Building SCN module
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
generating /tmp/tmpS1UlkY/_SCN.c
running build_ext
building '_SCN' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/ozzie/anaconda2/include/python2.7 -c _SCN.c -o ./_SCN.o
gcc -pthread -shared -L/home/ozzie/anaconda2/lib -Wl,-rpath=/home/ozzie/anaconda2/lib,--no-as-needed ./_SCN.o /media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/SCN/init.cu.o -L/home/ozzie/anaconda2/lib -lpython2.7 -o ./_SCN.so
running develop
running egg_info
creating sparseconvnet.egg-info
writing sparseconvnet.egg-info/PKG-INFO
writing top-level names to sparseconvnet.egg-info/top_level.txt
writing dependency_links to sparseconvnet.egg-info/dependency_links.txt
writing manifest file 'sparseconvnet.egg-info/SOURCES.txt'
reading manifest file 'sparseconvnet.egg-info/SOURCES.txt'
writing manifest file 'sparseconvnet.egg-info/SOURCES.txt'
running build_ext
Creating /home/ozzie/anaconda2/lib/python2.7/site-packages/sparseconvnet.egg-link (link to .)
Adding sparseconvnet 0.1 to easy-install.pth file

Installed /media/New_bt/ML/SparseConvNet/PyTorch
Processing dependencies for sparseconvnet==0.1
Finished processing dependencies for sparseconvnet==0.1

ozzie@debian:~/working/work/ML/SparseConvNet/examples/Assamese_handwriting$ python VGGplus.py
Downloading and preprocessing data ...
--2017-07-18 18:06:00-- https://archive.ics.uci.edu/ml/machine-learning-databases/00208/Online%20Handwritten%20Assamese%20Characters%20Dataset.rar
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8067448 (7.7M) [text/plain]
Saving to: ‘Online Handwritten Assamese Characters Dataset.rar’

Online Handwritten 100%[=====================>] 7.69M 1.22MB/s in 12s

2017-07-18 18:06:13 (671 KB/s) - ‘Online Handwritten Assamese Characters Dataset.rar’ saved [8067448/8067448]

UNRAR 5.30 beta 2 freeware Copyright (c) 1993-2015 Alexander Roshal

Extracting from Online Handwritten Assamese Characters Dataset.rar

Extracting data_table.pdf OK
Extracting 1.1.txt OK
Extracting 10.1.txt OK
Extracting 100.1.txt OK
Extracting 101.1.txt OK
Extracting 102.1.txt OK
Extracting 103.1.txt OK
Extracting 104.1.txt OK
Extracting 105.1.txt OK
Extracting 106.1.txt OK
Extracting 107.1.txt OK
Extracting 108.1.txt OK
Extracting 109.1.txt OK
................ (the middle "Extracting xxx.txt OK" lines are removed by Ozzie Zhang because the list is too long)
Extracting 53.9.txt OK
Extracting 54.9.txt OK
Extracting 55.9.txt OK
Extracting 56.9.txt OK
Extracting 57.9.txt OK
Extracting 58.9.txt OK
Extracting 59.9.txt OK
Extracting 6.9.txt OK
Extracting 60.9.txt OK
Extracting 61.9.txt OK
Extracting 62.9.txt OK
Extracting 63.9.txt OK
Extracting 64.9.txt OK
Extracting 65.9.txt OK
Extracting 66.9.txt OK
Extracting 67.9.txt OK
Extracting 68.9.txt OK
Extracting 69.9.txt OK
Extracting 7.9.txt OK
Extracting 70.9.txt OK
Extracting 71.9.txt OK
Extracting 72.9.txt OK
Extracting 73.9.txt OK
Extracting 74.9.txt OK
Extracting 75.9.txt OK
Extracting 76.9.txt OK
Extracting 77.9.txt OK
Extracting 78.9.txt OK
Extracting 79.9.txt OK
Extracting 8.9.txt OK
Extracting 80.9.txt OK
Extracting 81.9.txt OK
Extracting 82.9.txt OK
Extracting 83.9.txt OK
Extracting 84.9.txt OK
Extracting 85.9.txt OK
Extracting 86.9.txt OK
Extracting 87.9.txt OK
Extracting 88.9.txt OK
Extracting 89.9.txt OK
Extracting 9.9.txt OK
Extracting 90.9.txt OK
Extracting 91.9.txt OK
Extracting 92.9.txt OK
Extracting 93.9.txt OK
Extracting 94.9.txt OK
Extracting 95.9.txt OK
Extracting 96.9.txt OK
Extracting 97.9.txt OK
Extracting 98.9.txt OK
Extracting 99.9.txt OK
All OK
(6588, 1647)

nn.Sequential {
[input -> (0) -> (1) -> output]
(0): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> output]
(0): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> output]
(0): ValidConvolution 3->8 C3
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): MaxPooling3/2
(5): ValidConvolution 8->16 C3
(6): BatchNormReLU(16,eps=0.0001,momentum=0.9,affine=True)
(7): ValidConvolution 16->16 C3
(8): BatchNormReLU(16,eps=0.0001,momentum=0.9,affine=True)
(9): MaxPooling3/2
(10): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 16->16 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 16->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(11): JoinTable: 16 + 8 -> 24
(12): BatchNormReLU(24,eps=0.0001,momentum=0.9,affine=True)
(13): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 24->16 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 24->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(14): JoinTable: 16 + 8 -> 24
(15): BatchNormReLU(24,eps=0.0001,momentum=0.9,affine=True)
(16): MaxPooling3/2
(17): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 24->24 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 24->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(18): JoinTable: 24 + 8 -> 32
(19): BatchNormReLU(32,eps=0.0001,momentum=0.9,affine=True)
(20): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 32->24 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 32->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(21): JoinTable: 24 + 8 -> 32
(22): BatchNormReLU(32,eps=0.0001,momentum=0.9,affine=True)
(23): MaxPooling3/2
}
(1): Convolution 32->64 C5/1
(2): BatchNormReLU(64,eps=0.0001,momentum=0.9,affine=True)
(3): SparseToDense(2)
}
(1): nn.Sequential {
[input -> (0) -> (1) -> output]
(0): nn.View(-1, 64)
(1): nn.Linear(64 -> 183)
}
}
('input spatial size',
95
95
[torch.LongTensor of size 2]
)
Replicating training set 10 times (1 epoch = 10 iterations through the training set = 10x6588 training samples)
{'weightDecay': 0.0001, 'initial_LR': 0.1, 'checkPoint': False, 'nEpochs': 100, 'LR_decay': 0.05, 'momentum': 0.9}
('#parameters', 97295)
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu line=35 error=8 : invalid device function
Traceback (most recent call last):
  File "VGGplus.py", line 38, in <module>
    {'nEpochs': 100, 'initial_LR': 0.1, 'LR_decay': 0.05, 'weightDecay': 1e-4})
  File "/media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/legacy/classificationTrainValidate.py", line 73, in ClassificationTrainValidate
    model.forward(batch['input'])
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Module.py", line 33, in forward
    return self.updateOutput(input)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/legacy/validConvolution.py", line 46, in updateOutput
    torch.cuda.IntTensor() if input.features.is_cuda else nullptr)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 177, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: cuda runtime error (8) : invalid device function at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:35

I guess this is a CUDA device issue related to my GPU's compute architecture.

I should use arch=compute_30,code=sm_30 because my GPU is an NVIDIA K4000.
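
For what it's worth, here is a small check (independent of SparseConvNet, assuming your torch build exposes torch.cuda.get_device_capability) that prints the GPU's compute capability and the matching nvcc flag, so the right arch=compute_XY,code=sm_XY value is easy to read off:

import torch

# Print the visible GPU's compute capability and the corresponding
# nvcc -gencode flag; a K4000 should report 3.0, i.e. sm_30.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print('GPU: %s, compute capability %d.%d'
          % (torch.cuda.get_device_name(0), major, minor))
    print('nvcc flag: -gencode arch=compute_%d%d,code=sm_%d%d'
          % (major, minor, major, minor))
else:
    print('No CUDA device visible to torch')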

@oztc commented Jul 18, 2017

My OS is:
uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

python
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

@oztc commented Jul 18, 2017

My bug should be related to torch, not SparseConvNet.

@btgraham (Contributor)

I have switched compute_20,code=sm_20 to compute_30,code=sm_30 in the setup file.
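
For anyone patching an older checkout by hand, the change amounts to retargeting the nvcc architecture flags and re-running python setup.py develop. Roughly (just a sketch; the variable name below is illustrative and not the actual code in the setup file):

# Illustrative only: replace the deprecated Fermi target
#   '-gencode', 'arch=compute_20,code=sm_20'
# with the Kepler target matching the K4000 reported above:
nvcc_gencode_flags = ['-gencode', 'arch=compute_30,code=sm_30']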

@gnedster (Contributor, Author)

The hello-world.py example works now. Thanks for the quick fix!
