(yet another) nvidia crash on ubuntu #2590
Comments
I have tried to install the latest version (340.108), although I am not sure whether that would fix the problem. I downloaded it from the NVIDIA 64-bit download page. It works, but it never overrides the apt version (340.107) when I type
I have tried many workarounds, such as uninstalling the apt-get packages (this almost broke my installation). I have tried running the NVIDIA installer with several options, including:
I have seen that this version has a precompiled Ubuntu package, but it belongs to an Ubuntu release newer than mine, and I am not allowed to install it even if I edit the apt sources. My edits are correct, since the new version is found. Nothing has worked. Needless to say, I have rebooted after every attempt. I am always getting this:
Any ideas? |
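To tell whether the manually installed 340.108 driver or the apt 340.107 driver is the one actually loaded after a reboot, the running kernel module can be asked directly. The following is a minimal Python sketch, not part of the original report. It assumes the NVIDIA kernel module exposes its version in `/proc/driver/nvidia/version` on a line starting with `NVRM version:`; the helper name `parse_loaded_driver` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumption: the loaded NVIDIA kernel module reports itself in this file,
# on a line that starts with "NVRM version:".
PROC_FILE = Path("/proc/driver/nvidia/version")


def parse_loaded_driver(text: str) -> Optional[Tuple[int, ...]]:
    """Return the driver version as a tuple of ints, or None if not found."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            # First standalone dotted number on the line, e.g. "340.107".
            match = re.search(r"(?<![\w.])(\d+(?:\.\d+)+)(?![\w.])", line)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    return None


if __name__ == "__main__":
    if PROC_FILE.exists():
        print(parse_loaded_driver(PROC_FILE.read_text()))
    else:
        print("NVIDIA kernel module does not appear to be loaded")
```

Because the result is a tuple of integers, versions compare naturally, so `(340, 108) > (340, 107)` holds. This only reports which driver is running; it does not explain why the installer failed to replace the packaged one.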
I guess it is because of your OpenCL version. Updating your driver to support OpenCL 1.2 may fix it. |
Thanks CGLemon. I had already tried, and that's what I get: some kind of package version 2.2.8, which may be something related to Ubuntu. In the end I noticed that clinfo prints a message saying that I could upgrade my OpenCL to version 1.2 and even 2.1. How do I do that? The project README says OpenCL 1.1 should be enough, and it contains no reference to OpenCL 1.2 or to how to install it.
|
I found this official intel github page with instructions, and I have done as follows (Ubuntu section):
This has installed some new packages. But still clinfo is showing OpenCL 1.1 and leelaz crashes. |
OpenCL 1.1 is the minimum version for leelaz, but the 'core dumped' crash may still be caused by a wrong version. OpenCL 1.2 is fine. |
Thanks CGLemon. It looks like it is not possible to use Leela with this version of Ubuntu. I should upgrade the distro, but I am not allowed to do that, and I am already afraid that I have broken too many things (the OpenCL package was locked). Too bad. If I have the time I will try to debug the code. Would that be useful? Does anybody know why it is crashing with a supposedly supported OpenCL/NVIDIA version combination? |
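For comparing what a device reports against a required minimum, a small Python sketch may help. It relies on the format the OpenCL specification prescribes for the device version string (`OpenCL <major>.<minor> <vendor-specific information>`). The function names and the sample strings in the comments are illustrative assumptions, not output from the machine discussed here.

```python
import re
from typing import Optional, Tuple


def parse_opencl_version(version_string: str) -> Optional[Tuple[int, int]]:
    """Parse a device version string such as 'OpenCL 1.1 CUDA' into (1, 1)."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string.strip())
    if match is None:
        # Not a device version string (for example 'OpenCL C 1.1').
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string: str, minimum: Tuple[int, int] = (1, 1)) -> bool:
    """True if the reported version is at least `minimum`."""
    parsed = parse_opencl_version(version_string)
    return parsed is not None and parsed >= minimum
```

With these helpers, `meets_minimum("OpenCL 1.1 CUDA", (1, 2))` is false while `meets_minimum("OpenCL 1.2 CUDA", (1, 2))` is true, which mirrors the 1.1 versus 1.2 distinction raised in this thread.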
Did you try to run Leela with a (very) small network as delivered by the stable 25.1 version? |
No, and to me that does not look like a possible solution. I understand this is an issue with CUDA version dependencies that do not work in my Ubuntu distro. I would appreciate feedback on how to debug this. Otherwise I will have to wait until the distro is upgraded for that machine.
…On Wed, 27 May 2020 at 00:21, GosseRomkes ***@***.***> wrote:
Did you try to run Leela with a (very) small network as delivered by the
stable 25.1 version?
|
I have a (now almost idle) pretty cool workstation available that runs Ubuntu.
The machine has 24 cores and an NVIDIA 340 if I am not mistaken (I know nothing about GPUs/CUDA/OpenCL etc).
I have just downloaded and built Leela, but the leelaz executable crashes. I have googled a bit and it seems to be an NVIDIA driver version issue. I checked issues #1360 and #1363 in this repo, and as I am writing this GitHub tells me there are 2 more. But none of them involves my NVIDIA model (340).
I managed to update the nvidia drivers (which was not that easy since the nvidia package was configured to be on hold, and I also had to reboot and update other nvidia-related packages).
The latest version available in my Ubuntu PPA repositories for NVIDIA 340 is 340.107. According to the NVIDIA releases page, it was released on June 6, 2018, that is, after the referenced issues claim a bug was fixed.
But this does not fix the problem.
I see that there is a newer nvidia driver release (340.108) from December 2019. But it is not available yet on my APT repo (which I updated following the instructions in this project README).
Do you think it will fix the issue? Has anyone been using nvidia 340 and which driver are you using?
My leelaz output follows. Is there any way to capture logs or debug?
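On the question of capturing logs, one generic option is to run the program from a small wrapper that stores everything it prints and reports how it terminated. The Python sketch below is an assumption-laden illustration, not a leelaz feature: `run_and_log` and `describe_exit` are hypothetical helper names, and the command to run is passed in by the caller, so no leelaz options are assumed.

```python
import signal
import subprocess
from pathlib import Path
from typing import List


def run_and_log(command: List[str], log_path: Path) -> int:
    """Run `command`, write combined stdout/stderr to `log_path`, return the exit code."""
    with log_path.open("w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


def describe_exit(code: int) -> str:
    """On POSIX, a negative return code -N means the child was killed by signal N."""
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number without a symbolic name in this Python build.
            name = str(-code)
        return f"killed by signal {name}"
    return f"exited with status {code}"
```

Calling `describe_exit(run_and_log(command, Path("run.log")))` with the command as a list of strings leaves the full output in `run.log`. A result such as "killed by signal SIGSEGV" or "killed by signal SIGABRT" would confirm a hard crash rather than a normal error exit, which is useful detail to attach to a report.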