Error when running gsimclr.py --DS ENZYMES --lr 0.01 --local --num-gc-layers 3 --aug random4 --seed 0 #29

Open
Austinzhenghua opened this issue Jun 29, 2021 · 13 comments

Comments

@Austinzhenghua

600
1

lr: 0.01
num_features: 1
hidden_dim: 32
num_gc_layers: 3

/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gsimclr.py", line 190, in <module>
emb, y = model.encoder.get_embeddings(dataloader_eval)
File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 76, in get_embeddings
x, _ = self.forward(x, edge_index, batch)
File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 52, in forward
x = F.relu(self.convs[i](x, edge_index))
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/gin_conv.py", line 64, in forward
out = self.propagate(edge_index, x=x, size=size)
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
out = self.aggregate(out, **aggr_kwargs)
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 288, in aggregate
reduce=self.aggr)
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 153, in scatter
return scatter_sum(src, index, dim, out, dim_size)
File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 21, in scatter_sum
return out.scatter_add_(dim, index, src)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Can anyone help me figure out what is wrong with the algorithm or the environment?

The environment is as follows:

Jinja2 3.0.1 3.0.1
MarkupSafe 2.0.1 2.0.1
Pillow 8.2.0 8.2.0
PySocks 1.7.1 1.7.1
brotlipy 0.7.0 0.7.0
certifi 2020.6.20 2021.5.30
cffi 1.14.5 1.14.5
chardet 4.0.0 4.0.0
cryptography 3.4.7 3.4.7
cycler 0.10.0 0.10.0
decorator 4.4.2 5.0.9
googledrivedownloader 0.4 0.4
idna 2.10 3.2
joblib 1.0.1 1.0.1
kiwisolver 1.3.1 1.3.1
matplotlib 3.4.2 3.4.2
mkl-fft 1.3.0 1.3.0
mkl-random 1.2.1 1.2.2
mkl-service 2.3.0 2.4.0
networkx 2.5.1 2.6rc2
numpy 1.20.2 1.21.0
olefile 0.46 0.47.dev4
pandas 1.2.5 1.3.0rc1
pip 21.1.2 21.1.3
pyOpenSSL 20.0.1 20.0.1
pycparser 2.20 2.20
pyparsing 2.4.7 3.0.0b2
python-dateutil 2.8.1 2.8.1
python-louvain 0.15 0.15
pytz 2021.1 2021.1
requests 2.25.1 2.25.1
scikit-learn 0.24.2 0.24.2
scipy 1.6.2 1.7.0
seaborn 0.11.0 0.11.1
setuptools 52.0.0.post20210125 57.0.0
six 1.16.0 1.16.0
threadpoolctl 2.1.0 2.1.0
torch 1.9.0 1.9.0
torch-cluster 1.5.9 1.5.9
torch-geometric 1.7.2 1.7.2
torch-scatter 2.0.7 2.0.7
torch-sparse 0.6.10 0.6.10
torch-spline-conv 1.2.1 1.2.1
torchaudio 0.9.0a0+33b2469 0.9.0
torchvision 0.10.0 0.10.0
tornado 6.1 6.1
tqdm 4.61.1 4.61.1
typing-extensions 3.7.4.3 3.10.0.0
urllib3 1.26.6 1.26.6
wheel 0.36.2 0.36.2
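
The log above ends by suggesting `CUDA_LAUNCH_BLOCKING=1`. A minimal sketch of one way to enable it from Python, assuming the variable is set before `torch` is imported so that it is in place before CUDA is initialised. Where to put these lines (for example at the top of gsimclr.py) is an assumption, not something the repository documents:

```python
import os

# Assumption: this runs before `import torch`, so the setting is in place
# before CUDA is initialised. With it, kernel launches run synchronously and
# the Python traceback should point at the call that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import torch (and torch_geometric) only after this point

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

This only makes the error report more precise. The `srcIndex < srcSelectDimSize` assertion itself already indicates that an index is too large for the tensor being indexed.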
@yyou1996
Collaborator

Hi @Austinzhenghua,

Thanks for your feedback. Does torch_geometric==1.7.2 not work for you? You could try version 1.6.0 or 1.6.1 for this experiment.

@Austinzhenghua
Author

Austinzhenghua commented Jun 29, 2021 via email

@yyou1996
Collaborator

Just as a test, are you able to run https://github.com/fanyun-sun/InfoGraph/tree/master/unsupervised, which the unsupervised_TU experiment is built on?

@Austinzhenghua
Author

Yes, I can run this algorithm, but it seems it didn't use the GPU for training. The error above did cause by the version of torch_geometric. Can you run it on your computer? Thanks a lot!

@Austinzhenghua
Author

Traceback (most recent call last):
File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gsimclr.py", line 189, in <module>
emb, y = model.encoder.get_embeddings(dataloader_eval)
File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 77, in get_embeddings
x, _ = self.forward(x, edge_index, batch)
File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 52, in forward
x = F.relu(self.convs[i](x, edge_index))
File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/gin_conv.py", line 63, in forward
out = self.propagate(edge_index, x=x, size=size)
File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
kwargs)
File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 158, in collect
j if arg[-2:] == '_j' else i)
File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 127, in lift
return src.index_select(self.node_dim, index)
RuntimeError: index out of range: Tried to access index 4324 out of table with 4323 rows. at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

I get this error when I run it on the CPU.
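
The CPU message above ("Tried to access index 4324 out of table with 4323 rows") means that `index_select` received a node index at least as large as the number of rows in `x`. A minimal sketch of a check for that condition, using plain Python lists so it runs without PyTorch. The helper name `find_out_of_range_edges` is hypothetical and not part of GraphCL or torch_geometric, and the thread does not confirm what produces the bad index:

```python
from typing import List, Sequence, Tuple


def find_out_of_range_edges(
    edge_index: Sequence[Sequence[int]], num_nodes: int
) -> List[Tuple[int, int, int]]:
    """Return (position, source, target) for each edge that references a
    node index outside the valid range 0 .. num_nodes - 1."""
    sources, targets = edge_index
    bad = []
    for pos, (src, dst) in enumerate(zip(sources, targets)):
        if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
            bad.append((pos, src, dst))
    return bad


# Toy example: 4 nodes (valid indices 0..3); the last edge starts at node 4.
edge_index = [[0, 1, 2, 4], [1, 2, 3, 0]]
print(find_out_of_range_edges(edge_index, num_nodes=4))  # prints [(3, 4, 0)]
```

With PyTorch tensors, assuming `x` has shape `[num_nodes, num_features]` and `edge_index` has shape `[2, num_edges]`, the same check could be called as `find_out_of_range_edges(edge_index.tolist(), x.size(0))` on each batch before the forward pass.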

@Austinzhenghua
Author

[Two screenshots; images not preserved]

I found that the shape of x differs between your algorithm and InfoGraph. The first image is from InfoGraph.

@yyou1996
Collaborator

yyou1996 commented Jul 1, 2021

It works well on my machine. What command did you use? Please take a look at the README: https://github.com/Shen-Lab/GraphCL/tree/master/unsupervised_TU#readme.

@ztk1996

ztk1996 commented Sep 6, 2021

I have the same error. Have you fixed it?

@yyou1996
Collaborator

yyou1996 commented Sep 7, 2021

Hi @ztk1996,

I remember testing the command, and it worked fine on my machine. Could you also share your environment and the command you ran?

@ztk1996

ztk1996 commented Sep 7, 2021

Thanks for your reply. The error output when I run "./go.sh 1 AIDS subgraph" on CPU is as follows.

  + for seed in 0 1 2 3 4
  + CUDA_VISIBLE_DEVICES=1
  + python gsimclr.py --DS AIDS --lr 0.01 --local --num-gc-layers 3 --aug subgraph --seed 0
    dataset length: 2000
    1
    ================
    lr: 0.01
    num_features: 1
    hidden_dim: 32
    num_gc_layers: 3
    ================
    Traceback (most recent call last):
    File "gsimclr.py", line 188, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
    File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 89, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
    File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 62, in forward
    x = F.relu(self.convs[i](x, edge_index))
    File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/gin_conv.py", line 64, in forward
    out = self.propagate(edge_index, x=x, size=size)
    File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
    coll_dict = self.collect(self.user_args, edge_index, size,
    File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 157, in collect
    data = self.lift(data, edge_index,
    File "/home/zt/.conda/envs/GraphCL-test/lib/python3.8/site-packages/torch_geometric/nn/conv/message_passing.py", line 127, in lift
    return src.index_select(self.node_dim, index)
    IndexError: index out of range in self

torch: 1.7.0
torch-geometric: 1.7.2

@yyou1996
Collaborator

yyou1996 commented Sep 7, 2021

@ztk1996

Please try running with torch-geometric==1.6.0 on GPU. Since both of you are using torch-geometric>=1.7.0 on CPU, I guess that might be the source of the error.

@ztk1996

ztk1996 commented Sep 7, 2021

I tried running with torch_geometric==1.6.0 and pytorch==1.7.0 on GPU. The error output is as follows.

  + for seed in 0 1 2 3 4
  + CUDA_VISIBLE_DEVICES=0
  + python gsimclr.py --DS AIDS --lr 0.01 --local --num-gc-layers 3 --aug subgraph --seed 0
    dataset length: 2000
    1
    ================
    lr: 0.01
    num_features: 1
    hidden_dim: 32
    num_gc_layers: 3
    ================
    Traceback (most recent call last):
    File "gsimclr.py", line 188, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
    File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 89, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
    File "/home/zt/GraphCL/unsupervised_TU/gin.py", line 62, in forward
    x = F.relu(self.convs[i](x, edge_index))
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch_geometric/nn/conv/gin_conv.py", line 69, in forward
    return self.nn(out)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
    return F.linear(input, self.weight, self.bias)
    File "/home/zt/.conda/envs/PYG160/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
    ret = torch.addmm(bias, input, weight.t())
    RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [89,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [90,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [91,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [92,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
    /opt/conda/conda-bld/pytorch_1603729047590/work/aten/src/ATen/native/cuda/Indexing.cu:658: indexSelectLargeIndex: block: [48,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.

Also, when I run with torch_geometric==1.6.0 and pytorch==1.7.0 on CPU, the error is the same as with torch_geometric==1.7.2.

@yyou1996
Collaborator

yyou1996 commented Sep 7, 2021

@ztk1996

My impression is that the versions of torch_geometric and pytorch should be consistent (see https://github.com/rusty1s/pytorch_geometric). If you use torch_geometric==1.6, I would also use pytorch==1.6. Please let me know if this does not work either. Thanks.
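
When comparing environments as suggested above, it can help to print the installed versions of the relevant packages in one place. A minimal sketch using only the standard library, assuming Python 3.8 or newer (for `importlib.metadata`). The package list and the helper name `installed_versions` are illustrative choices, not taken from the repository:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Sequence

# Illustrative list: the PyTorch and PyTorch Geometric packages named in this thread.
PACKAGES = ("torch", "torch-geometric", "torch-scatter", "torch-sparse", "torch-cluster")


def installed_versions(packages: Sequence[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

The output can be pasted into an issue alongside the exact command that was run, which makes version mismatches easier to spot.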
