PRPLL 0.9-0 PRP problem #285

Closed
shenzhui007 opened this issue May 19, 2024 · 10 comments

@shenzhui007

Hello. I'm working on a PRP task. PRPLL 0.8 worked fine for the task, although the log file contained duplicate lines. I have now moved to PRPLL 0.9. The program prints output like the following for about 10 minutes, then stops logging, and the PRP task doesn't seem to start because no temp files are created.
I'm using a self-built PRPLL 0.9-0.

20240520 06:26:05 files0 127496407 EE   1000000 on-load: ffcd8d32427f0060 vs. 807fc23afacc0a6f
20240520 06:26:13 files0 127496407 EE   1000000 on-load: a5425d547829a060 vs. 807fc23afacc0a6f
20240520 06:26:21 files0 127496407 EE   1000000 on-load: 66f35acb1c1df598 vs. 807fc23afacc0a6f
20240520 06:26:29 files0 127496407 EE   1000000 on-load: ba38b4ad90243fd8 vs. 807fc23afacc0a6f
20240520 06:26:38 files0 127496407 EE   1000000 on-load: a854b04c39bf8156 vs. 807fc23afacc0a6f
20240520 06:26:47 files0 127496407 EE   1000000 on-load: afe408e423dd6f54 vs. 807fc23afacc0a6f
20240520 06:26:56 files0 127496407 EE   1000000 on-load: d89da2061cb69225 vs. 807fc23afacc0a6f
20240520 06:27:05 files0 127496407 EE   1000000 on-load: 0ff296513784e02b vs. 807fc23afacc0a6f
20240520 06:27:14 files0 127496407 EE   1000000 on-load: 7d350e9a0439e822 vs. 807fc23afacc0a6f
20240520 06:27:23 files0 127496407 EE   1000000 on-load: 7052fddbcce0e53c vs. 807fc23afacc0a6f
20240520 06:27:32 files0 127496407 EE   1000000 on-load: 6f39fa53e8da2ddf vs. 807fc23afacc0a6f
20240520 06:27:42 files0 127496407 EE   1000000 on-load: 37c7f7ae8199b080 vs. 807fc23afacc0a6f
20240520 06:27:51 files0 127496407 EE   1000000 on-load: f5637572045b88df vs. 807fc23afacc0a6f
20240520 06:28:00 files0 127496407 EE   1000000 on-load: 6abd5514ee4eb5f1 vs. 807fc23afacc0a6f
20240520 06:28:09 files0 127496407 EE   1000000 on-load: 853a11644fa9c717 vs. 807fc23afacc0a6f
20240520 06:28:18 files0 127496407 EE   1000000 on-load: 937df57ded97f282 vs. 807fc23afacc0a6f
20240520 06:28:27 files0 127496407 EE   1000000 on-load: ab5738a2debb8bab vs. 807fc23afacc0a6f
20240520 06:28:37 files0 127496407 EE   1000000 on-load: e696af843c862a0f vs. 807fc23afacc0a6f
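
A pattern worth noting in the log above: the first hash on each "EE ... on-load" line changes on every attempt, while the second one stays fixed. The following is a minimal Python sketch for checking that over a whole log file. The field names (`first`, `second`) and the function name are made up for illustration; what the two hashes mean inside PRPLL is not stated in this thread, so reading the first as a freshly computed value and the second as a stored reference is only an assumption.

```python
import re
from collections import Counter

# Matches lines such as:
# 20240520 06:26:05 files0 127496407 EE   1000000 on-load: ffcd... vs. 807f...
EE_LINE = re.compile(
    r"^(?P<date>\d{8}) (?P<time>\d{2}:\d{2}:\d{2}) \S+ (?P<exponent>\d+) EE\s+"
    r"(?P<iteration>\d+) on-load: (?P<first>[0-9a-f]{16}) vs\. (?P<second>[0-9a-f]{16})$"
)


def summarize_on_load_errors(log_text):
    """Count 'on-load' error lines and how many distinct hashes appear on each side."""
    firsts, seconds = Counter(), Counter()
    for line in log_text.splitlines():
        m = EE_LINE.match(line.strip())
        if m:
            firsts[m["first"]] += 1
            seconds[m["second"]] += 1
    return {
        "lines": sum(firsts.values()),
        "distinct_first": len(firsts),
        "distinct_second": len(seconds),
    }


# Three lines copied from the log in this issue.
sample = """\
20240520 06:26:05 files0 127496407 EE   1000000 on-load: ffcd8d32427f0060 vs. 807fc23afacc0a6f
20240520 06:26:13 files0 127496407 EE   1000000 on-load: a5425d547829a060 vs. 807fc23afacc0a6f
20240520 06:26:21 files0 127496407 EE   1000000 on-load: 66f35acb1c1df598 vs. 807fc23afacc0a6f
"""

print(summarize_on_load_errors(sample))
```

If `distinct_first` equals the number of lines while `distinct_second` is 1, every retry produced a different result against the same reference, which may point to a nondeterministic computation rather than a single damaged file.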
@preda
Owner

preda commented May 20, 2024

Which GPU/OS? Could you post the output displayed at startup of prpll?
In the meantime, please move back to 0.8 while we diagnose the problem.

@shenzhui007
Author

GPU: NVIDIA GeForce RTX 3070 Laptop GPU
OS: Kali Linux (based on Debian)
Startup log:

20240520 06:25:54 files PRPLL 0.9-0-g1d6174a
20240520 06:25:54 files config: -dir /home/$user/Applications/gpuowl/files 
20240520 06:25:56 files device 0, OpenCL 525.147.05, unique id ''
20240520 06:25:56 files0 127496407 FFT: 7M 1K:14:256 (17.37 bpw)
20240520 06:26:05 files0 127496407 EE   1000000 on-load: ffcd8d32427f0060 vs. 807fc23afacc0a6f
20240520 06:26:13 files0 127496407 EE   1000000 on-load: a5425d547829a060 vs. 807fc23afacc0a6f
20240520 06:26:21 files0 127496407 EE   1000000 on-load: 66f35acb1c1df598 vs. 807fc23afacc0a6f
20240520 06:26:29 files0 127496407 EE   1000000 on-load: ba38b4ad90243fd8 vs. 807fc23afacc0a6f
20240520 06:26:38 files0 127496407 EE   1000000 on-load: a854b04c39bf8156 vs. 807fc23afacc0a6f
20240520 06:26:47 files0 127496407 EE   1000000 on-load: afe408e423dd6f54 vs. 807fc23afacc0a6f

preda closed this as completed in 0fee03b May 20, 2024
preda reopened this May 20, 2024
@preda
Owner

preda commented May 20, 2024

I updated the pre-builts at https://github.com/preda/gpuowl/releases/tag/v%2Fprpll%2F0.9; alternatively, use a self-build from the most recent commit.

Please let me know whether the issue is fixed.

@shenzhui007
Author

The problem remains: the log stops and no temp files are created. The GPU is running at 100%.
Log file:

20240520 18:40:33 files PRPLL 0.9-6-g207cb9f
20240520 18:40:33 files config: -dir /home/$user/Applications/gpuowl/files 
20240520 18:40:33 files device 0, OpenCL 525.147.05, unique id ''
20240520 18:40:33 files0 127496407 FFT: 7M 1K:14:256 (17.37 bpw)
20240520 18:40:41 files0 127496407 EE   1000000 on-load: 8d3ca53e027bee88 vs. 807fc23afacc0a6f
20240520 18:40:49 files0 127496407 EE   1000000 on-load: 2f3d17ce247f9460 vs. 807fc23afacc0a6f
20240520 18:40:57 files0 127496407 EE   1000000 on-load: 161200e08cb0815f vs. 807fc23afacc0a6f
20240520 18:41:06 files0 127496407 EE   1000000 on-load: c8a09c8078cad405 vs. 807fc23afacc0a6f
20240520 18:41:15 files0 127496407 EE   1000000 on-load: 53c33e281d2e0f21 vs. 807fc23afacc0a6f
20240520 18:41:25 files0 127496407 EE   1000000 on-load: 24b8d0db14ef2201 vs. 807fc23afacc0a6f
20240520 18:41:34 files0 127496407 EE   1000000 on-load: 829b15533838c981 vs. 807fc23afacc0a6f
20240520 18:41:44 files0 127496407 EE   1000000 on-load: bd3acdbfbe609bc3 vs. 807fc23afacc0a6f
20240520 18:41:53 files0 127496407 EE   1000000 on-load: 7dee6afb788611de vs. 807fc23afacc0a6f
20240520 18:42:02 files0 127496407 EE   1000000 on-load: 77e78735f527df13 vs. 807fc23afacc0a6f
20240520 18:42:12 files0 127496407 EE   1000000 on-load: ec338b39d380d60e vs. 807fc23afacc0a6f
20240520 18:42:22 files0 127496407 EE   1000000 on-load: a279fef7874f424d vs. 807fc23afacc0a6f
20240520 18:42:31 files0 127496407 EE   1000000 on-load: 2c587956cf253fa7 vs. 807fc23afacc0a6f
20240520 18:42:41 files0 127496407 EE   1000000 on-load: fd1849016b28bdc1 vs. 807fc23afacc0a6f
20240520 18:42:51 files0 127496407 EE   1000000 on-load: 49b32ca567f0df6d vs. 807fc23afacc0a6f
20240520 18:43:00 files0 127496407 EE   1000000 on-load: 566ec2ec74a71b09 vs. 807fc23afacc0a6f
20240520 18:43:10 files0 127496407 EE   1000000 on-load: e2ad5a7ca1672dbc vs. 807fc23afacc0a6f
20240520 18:43:20 files0 127496407 EE   1000000 on-load: 606e7f6c22f0e6a8 vs. 807fc23afacc0a6f
20240520 18:43:30 files0 127496407 EE   1000000 on-load: 4a980f30aa509304 vs. 807fc23afacc0a6f
20240520 18:43:39 files0 127496407 EE   1000000 on-load: 9d56f9ca95e68774 vs. 807fc23afacc0a6f
20240520 18:43:49 files0 127496407 EE   1000000 on-load: d4478118d9514438 vs. 807fc23afacc0a6f
20240520 18:43:59 files0 127496407 EE   1000000 on-load: 53fe37322e7a6902 vs. 807fc23afacc0a6f
20240520 18:44:09 files0 127496407 EE   1000000 on-load: 4bd19ecdbde49e8f vs. 807fc23afacc0a6f
20240520 18:44:19 files0 127496407 EE   1000000 on-load: 1764fd5fff6041e0 vs. 807fc23afacc0a6f
20240520 18:44:29 files0 127496407 EE   1000000 on-load: ac4bedf5ecd1e467 vs. 807fc23afacc0a6f
20240520 18:44:39 files0 127496407 EE   1000000 on-load: 4a77437422dc0850 vs. 807fc23afacc0a6f
20240520 18:44:49 files0 127496407 EE   1000000 on-load: c364ab488b6e2516 vs. 807fc23afacc0a6f
20240520 18:44:59 files0 127496407 EE   1000000 on-load: a213c398d3ce9af5 vs. 807fc23afacc0a6f
20240520 18:45:09 files0 127496407 EE   1000000 on-load: d0f3472b0d93e79e vs. 807fc23afacc0a6f
20240520 18:45:19 files0 127496407 EE   1000000 on-load: 075f3eeaacabbe89 vs. 807fc23afacc0a6f

GPU status as reported by nvtop:
[Screenshot: Screenshot_2024-05-20_19-04-43]

@shenzhui007
Author

I stepped back through the commits and found that my problem starts at commit ca4ec2e.
With that build the log stops at this point and the PRP task doesn't start.

20240520 19:45:57 files PRPLL 0.8-20-gca4ec2e
20240520 19:45:57 files config: -dir /home/$user/Applications/gpuowl/files 
20240520 19:45:57 files device 0, OpenCL 525.147.05, unique id ''
20240520 19:45:58 files0 127496407 FFT: 7M 1K:14:256 (17.37 bpw)

preda added a commit that referenced this issue May 20, 2024
@preda
Owner

preda commented May 20, 2024

Have another go, let's see if this (most recent commit) fixes it; thanks!

@shenzhui007
Author

Log from commit f5cf119:

20240520 20:45:49 files PRPLL 0.9-7-gf5cf119
20240520 20:45:49 files config: -dir /home/$user/Applications/gpuowl/files 
20240520 20:45:49 files device 0, OpenCL 525.147.05, unique id ''
20240520 20:45:49 files0 127496407 FFT: 7M 1K:14:256 (17.37 bpw)
20240520 20:45:50 files0 127496407 In file included from <kernel>:1:
carryfused.cl:84:5: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
    sub_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
    ^
carryfused.cl:103:3: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
  sub_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
  ^

20240520 20:45:50 files0 127496407 In file included from <kernel>:1:
carryfused.cl:84:5: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
    sub_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
    ^
carryfused.cl:103:3: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
  sub_group_barrier(CLK_GLOBAL_MEM_FENCE, memory_scope_device);
  ^

20240520 20:45:50 files0 127496407 Linking 'carryfused.cl' error LINK_PROGRAM_FAILURE (-17) (args -cl-finite-math-only )
20240520 20:45:50 files0 127496407 Can't compile carryfused.cl
20240520 20:45:50 files0  Exception "Can't compile carryfused.cl"
20240520 20:45:50 files Bye
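
The build log above pairs "implicit declaration" warnings with a link failure for the same file. Subgroup built-ins are an optional OpenCL feature, so one plausible reading is that this driver's compiler does not provide `sub_group_barrier`; the thread does not confirm this. The Python sketch below pulls the undeclared function names and link failures out of such a log. The function name `summarize_build_log` and the regular expressions are illustrative assumptions, written only against the lines shown here.

```python
import re

# e.g. carryfused.cl:84:5: warning: implicit declaration of function 'sub_group_barrier' ...
IMPLICIT_DECL = re.compile(
    r"^(?P<file>[\w.]+):(?P<line>\d+):(?P<col>\d+): warning: "
    r"implicit declaration of function '(?P<func>\w+)'"
)
# e.g. ... Linking 'carryfused.cl' error LINK_PROGRAM_FAILURE (-17) ...
LINK_FAILURE = re.compile(
    r"Linking '(?P<file>[\w.]+)' error (?P<name>\w+) \((?P<code>-?\d+)\)"
)


def summarize_build_log(log_text):
    """Return (undeclared functions with their call sites, link failures)."""
    missing = {}   # function name -> set of (file, line); a set drops repeated warnings
    failures = []  # list of (file, error name, error code)
    for line in log_text.splitlines():
        m = IMPLICIT_DECL.match(line.strip())
        if m:
            missing.setdefault(m["func"], set()).add((m["file"], int(m["line"])))
            continue
        m = LINK_FAILURE.search(line)
        if m:
            failures.append((m["file"], m["name"], int(m["code"])))
    return {func: sorted(sites) for func, sites in missing.items()}, failures


# Lines copied from the log in this issue.
sample = """\
carryfused.cl:84:5: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
carryfused.cl:103:3: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
carryfused.cl:84:5: warning: implicit declaration of function 'sub_group_barrier' is invalid in OpenCL
20240520 20:45:50 files0 127496407 Linking 'carryfused.cl' error LINK_PROGRAM_FAILURE (-17) (args -cl-finite-math-only )
"""

missing, failures = summarize_build_log(sample)
print(missing)
print(failures)
```

Collecting call sites in a set means the warning block that the log prints twice is reported once.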

@preda
Owner

preda commented May 20, 2024

OK please try once more.

If it works, could you also try a run with

-use OLD_FENCE=0

and report whether that one works as well. (And if both work, which one is faster: -use OLD_FENCE=0 or -use OLD_FENCE=1?)

Thanks!

@shenzhui007
Author

Running without the option or with OLD_FENCE=1 works. With OLD_FENCE=0 it doesn't: the process runs and logs output correctly, but the iteration did not progress in 30 minutes. With OLD_FENCE=1, the iteration progresses every 3 minutes.
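
For anyone repeating this comparison, the two runs differ only in the value passed to `-use OLD_FENCE=`. Below is a small Python sketch that builds the two command lines. The `-dir` and `-use` options are the ones that appear in this thread; the binary path `./prpll`, the directory name `files`, and the helper name `fence_variant_commands` are placeholders, not something the thread specifies.

```python
def fence_variant_commands(binary, work_dir):
    """Return one argv list per OLD_FENCE setting (0 and 1)."""
    return {
        value: [binary, "-dir", work_dir, "-use", f"OLD_FENCE={value}"]
        for value in (0, 1)
    }


# Hypothetical binary path and working directory; adjust to the local setup.
commands = fence_variant_commands("./prpll", "files")
for value, argv in commands.items():
    print(value, " ".join(argv))
```

Each list can be passed to `subprocess.run` as is; the sketch deliberately does not launch anything, since a run like the OLD_FENCE=0 case described above would otherwise hang without a timeout.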

@preda
Owner

preda commented May 20, 2024

Thanks! Closing then; we also gained some additional info in the process.

preda closed this as completed May 20, 2024