Python Module Failing #1

Closed

gnedster opened this issue Jul 18, 2017 · 7 comments

@gnedster (Contributor)

Hi,

First, great work! However, I'm trying to get the Python examples running but am hitting this error:

Traceback (most recent call last):
  File "hello-world.py", line 8, in <module>
    import sparseconvnet.legacy as scn
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/legacy/__init__.py", line 7, in <module>
    from ..utils import *
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/utils.py", line 8, in <module>
    import sparseconvnet.SCN as scn
  File "/.../local/lib/python2.7/site-packages/sparseconvnet/SCN/__init__.py", line 3, in <module>
    from ._SCN import lib as _lib, ffi as _ffi
ImportError: No module named _SCN

Could this be a missing config in setup.py? Seems like the ._SCN module isn't built or copied over. Any help is appreciated.
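
In case it helps narrow things down, here is a quick check (just a sketch) of whether the compiled extension actually ended up inside the installed package. It locates the sparseconvnet package without importing it (importing is what fails) and lists the SCN subdirectory, which should contain _SCN.so after a successful build:

import imp
import os

# Find the installed sparseconvnet package without importing it
# (importing triggers the failing "from ._SCN import ..." above),
# then list the SCN subpackage, which should contain the compiled
# _SCN extension after a successful build.
_, pkg_dir, _ = imp.find_module('sparseconvnet')
scn_dir = os.path.join(pkg_dir, 'SCN')
print(scn_dir)
print(sorted(os.listdir(scn_dir)))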

Cheers

@oztc commented Jul 18, 2017

I have the same issue

@btgraham (Contributor) commented Jul 18, 2017

Hello. To help me debug, can you please show the output from:
cd SparseConvNet/PyTorch
python setup.py develop
ls sparseconvnet/SCN/
(Also, what OS? What Python version? Conda or not?)

@oztc commented Jul 18, 2017

Hi btgraham,

the following log is the output I get when I run "python setup.py develop" in SparseConvNet/PyTorch:

ozzie@debian:~/working/work/ML/SparseConvNet/PyTorch$ python setup.py develop

Building SCN module
nvcc warning : The 'compute_20', 'sm_20', and 'sm_21' architectures are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
generating /tmp/tmpS1UlkY/_SCN.c
running build_ext
building '_SCN' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/TH -I/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/../../lib/include/THC -I/usr/local/cuda/include -I/home/ozzie/anaconda2/include/python2.7 -c _SCN.c -o ./_SCN.o
gcc -pthread -shared -L/home/ozzie/anaconda2/lib -Wl,-rpath=/home/ozzie/anaconda2/lib,--no-as-needed ./_SCN.o /media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/SCN/init.cu.o -L/home/ozzie/anaconda2/lib -lpython2.7 -o ./_SCN.so
running develop
running egg_info
creating sparseconvnet.egg-info
writing sparseconvnet.egg-info/PKG-INFO
writing top-level names to sparseconvnet.egg-info/top_level.txt
writing dependency_links to sparseconvnet.egg-info/dependency_links.txt
writing manifest file 'sparseconvnet.egg-info/SOURCES.txt'
reading manifest file 'sparseconvnet.egg-info/SOURCES.txt'
writing manifest file 'sparseconvnet.egg-info/SOURCES.txt'
running build_ext
Creating /home/ozzie/anaconda2/lib/python2.7/site-packages/sparseconvnet.egg-link (link to .)
Adding sparseconvnet 0.1 to easy-install.pth file

Installed /media/New_bt/ML/SparseConvNet/PyTorch
Processing dependencies for sparseconvnet==0.1
Finished processing dependencies for sparseconvnet==0.1

ozzie@debian:~/working/work/ML/SparseConvNet/examples/Assamese_handwriting$ python VGGplus.py
Downloading and preprocessing data ...
--2017-07-18 18:06:00-- https://archive.ics.uci.edu/ml/machine-learning-databases/00208/Online%20Handwritten%20Assamese%20Characters%20Dataset.rar
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8067448 (7.7M) [text/plain]
Saving to: ‘Online Handwritten Assamese Characters Dataset.rar’

Online Handwritten 100%[=====================>] 7.69M 1.22MB/s in 12s

2017-07-18 18:06:13 (671 KB/s) - ‘Online Handwritten Assamese Characters Dataset.rar’ saved [8067448/8067448]

UNRAR 5.30 beta 2 freeware Copyright (c) 1993-2015 Alexander Roshal

Extracting from Online Handwritten Assamese Characters Dataset.rar

Extracting data_table.pdf OK
Extracting 1.1.txt OK
Extracting 10.1.txt OK
Extracting 100.1.txt OK
Extracting 101.1.txt OK
Extracting 102.1.txt OK
Extracting 103.1.txt OK
Extracting 104.1.txt OK
Extracting 105.1.txt OK
Extracting 106.1.txt OK
Extracting 107.1.txt OK
Extracting 108.1.txt OK
Extracting 109.1.txt OK
................ (the middle "Extracting xxx.txt OK" lines are removed by Ozzie Zhang because the list is too long)
Extracting 53.9.txt OK
Extracting 54.9.txt OK
Extracting 55.9.txt OK
Extracting 56.9.txt OK
Extracting 57.9.txt OK
Extracting 58.9.txt OK
Extracting 59.9.txt OK
Extracting 6.9.txt OK
Extracting 60.9.txt OK
Extracting 61.9.txt OK
Extracting 62.9.txt OK
Extracting 63.9.txt OK
Extracting 64.9.txt OK
Extracting 65.9.txt OK
Extracting 66.9.txt OK
Extracting 67.9.txt OK
Extracting 68.9.txt OK
Extracting 69.9.txt OK
Extracting 7.9.txt OK
Extracting 70.9.txt OK
Extracting 71.9.txt OK
Extracting 72.9.txt OK
Extracting 73.9.txt OK
Extracting 74.9.txt OK
Extracting 75.9.txt OK
Extracting 76.9.txt OK
Extracting 77.9.txt OK
Extracting 78.9.txt OK
Extracting 79.9.txt OK
Extracting 8.9.txt OK
Extracting 80.9.txt OK
Extracting 81.9.txt OK
Extracting 82.9.txt OK
Extracting 83.9.txt OK
Extracting 84.9.txt OK
Extracting 85.9.txt OK
Extracting 86.9.txt OK
Extracting 87.9.txt OK
Extracting 88.9.txt OK
Extracting 89.9.txt OK
Extracting 9.9.txt OK
Extracting 90.9.txt OK
Extracting 91.9.txt OK
Extracting 92.9.txt OK
Extracting 93.9.txt OK
Extracting 94.9.txt OK
Extracting 95.9.txt OK
Extracting 96.9.txt OK
Extracting 97.9.txt OK
Extracting 98.9.txt OK
Extracting 99.9.txt OK
All OK
(6588, 1647)

nn.Sequential {
[input -> (0) -> (1) -> output]
(0): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> output]
(0): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> output]
(0): ValidConvolution 3->8 C3
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): MaxPooling3/2
(5): ValidConvolution 8->16 C3
(6): BatchNormReLU(16,eps=0.0001,momentum=0.9,affine=True)
(7): ValidConvolution 16->16 C3
(8): BatchNormReLU(16,eps=0.0001,momentum=0.9,affine=True)
(9): MaxPooling3/2
(10): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 16->16 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 16->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(11): JoinTable: 16 + 8 -> 24
(12): BatchNormReLU(24,eps=0.0001,momentum=0.9,affine=True)
(13): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 24->16 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 24->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(14): JoinTable: 16 + 8 -> 24
(15): BatchNormReLU(24,eps=0.0001,momentum=0.9,affine=True)
(16): MaxPooling3/2
(17): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 24->24 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 24->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(18): JoinTable: 24 + 8 -> 32
(19): BatchNormReLU(32,eps=0.0001,momentum=0.9,affine=True)
(20): sparseconvnet.legacy.concatTable.ConcatTable {
input
|-> (0): ValidConvolution 32->24 C3
|-> (1): nn.Sequential {
[input -> (0) -> (1) -> (2) -> (3) -> (4) -> output]
(0): Convolution 32->8 C3/2
(1): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(2): ValidConvolution 8->8 C3
(3): BatchNormReLU(8,eps=0.0001,momentum=0.9,affine=True)
(4): Deconvolution 8->8 C3/2
}
+. -> output
}
(21): JoinTable: 24 + 8 -> 32
(22): BatchNormReLU(32,eps=0.0001,momentum=0.9,affine=True)
(23): MaxPooling3/2
}
(1): Convolution 32->64 C5/1
(2): BatchNormReLU(64,eps=0.0001,momentum=0.9,affine=True)
(3): SparseToDense(2)
}
(1): nn.Sequential {
[input -> (0) -> (1) -> output]
(0): nn.View(-1, 64)
(1): nn.Linear(64 -> 183)
}
}
('input spatial size',
95
95
[torch.LongTensor of size 2]
)
Replicating training set 10 times (1 epoch = 10 iterations through the training set = 10x6588 training samples)
{'weightDecay': 0.0001, 'initial_LR': 0.1, 'checkPoint': False, 'nEpochs': 100, 'LR_decay': 0.05, 'momentum': 0.9}
('#parameters', 97295)
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu line=35 error=8 : invalid device function
Traceback (most recent call last):
  File "VGGplus.py", line 38, in <module>
    {'nEpochs': 100, 'initial_LR': 0.1, 'LR_decay': 0.05, 'weightDecay': 1e-4})
  File "/media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/legacy/classificationTrainValidate.py", line 73, in ClassificationTrainValidate
    model.forward(batch['input'])
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Module.py", line 33, in forward
    return self.updateOutput(input)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/legacy/nn/Sequential.py", line 36, in updateOutput
    currentOutput = module.updateOutput(currentOutput)
  File "/media/New_bt/ML/SparseConvNet/PyTorch/sparseconvnet/legacy/validConvolution.py", line 46, in updateOutput
    torch.cuda.IntTensor() if input.features.is_cuda else nullptr)
  File "/home/ozzie/anaconda2/lib/python2.7/site-packages/torch/utils/ffi/__init__.py", line 177, in safe_call
    result = torch._C._safe_call(*args, **kwargs)
torch.FatalError: cuda runtime error (8) : invalid device function at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorMath.cu:35

I guess this is a CUDA device issue related to my GPU's compute architecture.

I should use arch=compute_30,code=sm_30 because my GPU is an NVIDIA K4000.
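
For what it's worth, here is a small check (independent of SparseConvNet, assuming your torch build exposes torch.cuda.get_device_capability) that prints the GPU's compute capability and the matching nvcc flag, so the right arch=compute_XY,code=sm_XY value is easy to read off:

import torch

# Print the visible GPU's compute capability and the corresponding
# nvcc -gencode flag; a K4000 should report 3.0, i.e. sm_30.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print('GPU: %s, compute capability %d.%d'
          % (torch.cuda.get_device_name(0), major, minor))
    print('nvcc flag: -gencode arch=compute_%d%d,code=sm_%d%d'
          % (major, minor, major, minor))
else:
    print('No CUDA device visible to torch')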

@oztc commented Jul 18, 2017

My OS is:
uname -a
Linux debian 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux

python
Python 2.7.12 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

@oztc commented Jul 18, 2017

My bug should be related to torch, not SparseConvNet.

@btgraham (Contributor)

I have switched compute_20,code=sm_20 to compute_30,code=sm_30 in the setup file.
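
For anyone patching an older checkout by hand, the change amounts to retargeting the nvcc architecture flags and re-running python setup.py develop. Roughly (just a sketch; the variable name below is illustrative and not the actual code in the setup file):

# Illustrative only: replace the deprecated Fermi target
#   '-gencode', 'arch=compute_20,code=sm_20'
# with the Kepler target matching the K4000 reported above:
nvcc_gencode_flags = ['-gencode', 'arch=compute_30,code=sm_30']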

@gnedster (Contributor, Author)

The hello-world.py example works now. Thanks for the quick fix!
