Louis Maddox edited this page Feb 7, 2021 · 1 revision

As Ubuntu's tutorial on GPU data processing inside LXD says, the best way to install CUDA is directly from NVIDIA's site.

Either use the local installer:

cd ~/Downloads
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run

or the network installer:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

"Existing package manager installation of the driver found, it is strongly recommended that you uninstall that before continuing"

As Viacheslav Kovalevskyi writes, you should choose the .run (local) script rather than the .deb options:

I would strongly recommend use the installer script. First of all, it is agnostic to the version of the Linux that is used. Secondly, unlike some binary pre-build packages like deb file you can control where exactly CUDA library files will be installed.

Don't use an NVIDIA driver older than the version in your installer's file name: e.g. cuda_11.2.0_460.27.04_linux.run indicates CUDA 11.2 bundled with NVIDIA driver 460.27.04. Check the header of your nvidia-smi output: for me 460.32.03 >= 460.27.04, so it'll work.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
...
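The version check described above can be sketched in shell. This is a minimal sketch, not part of the NVIDIA tooling: the function names runfile_driver_version and version_at_least are made up for illustration, the file-name pattern is assumed from the one runfile named in this page, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
# Extract the driver version embedded in a CUDA runfile name,
# e.g. cuda_11.2.0_460.27.04_linux.run -> 460.27.04
# (pattern assumed from the file name used in this page)
runfile_driver_version() {
    basename "$1" | sed -E 's/^cuda_[0-9.]+_([0-9.]+)_linux\.run$/\1/'
}

# Succeed if version $1 >= version $2, using sort's version ordering:
# the smaller of the two sorts first, so it must equal $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=$(runfile_driver_version cuda_11.2.0_460.27.04_linux.run)
installed=460.32.03   # hard-coded here; see the nvidia-smi header for yours

if version_at_least "$installed" "$required"; then
    echo "driver $installed is new enough (needs >= $required)"
else
    echo "driver $installed is older than $required"
fi
```

On a machine with a working driver, the hard-coded value could instead come from nvidia-smi --query-gpu=driver_version --format=csv,noheader.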

If you've already got the NVIDIA driver installed from the package manager, the installer shows a warning prompt:

│ Existing package manager installation of the driver found. It is strongly    │
│ recommended that you remove this before continuing.                          │
│ Abort                                                                        │
│ Continue 

If you're not sure what you have installed, check with

apt list --installed | grep nvidia

(dpkg -l | grep nvidia also works on Debian and its derivatives, Ubuntu and Mint included)

If you've already installed the driver from the package manager you'll see packages like nvidia-driver-460

To ignore the warning message and install just the CUDA toolkit, run

sudo sh cuda_11.2.0_460.27.04_linux.run --toolkit

(This is the usual approach.)

To instead list all the installed NVIDIA packages on a single line, run

apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "

e.g.

libnvidia-cfg1-460 libnvidia-common-460 libnvidia-compute-460 libnvidia-compute-460 libnvidia-decode-460 libnvidia-decode-460 libnvidia-encode-460 libnvidia-encode-460 libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460 libnvidia-gl-460 libnvidia-gl-460 libnvidia-ifr1-460 libnvidia-ifr1-460 nvidia-compute-utils-460 nvidia-dkms-460 nvidia-driver-460 nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime-applet nvidia-prime nvidia-settings nvidia-utils-460 xserver-xorg-video-nvidia-460
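Some names appear twice in that output, most likely because the amd64 and i386 builds of a package are listed separately. A small sketch of the same pipeline with de-duplication follows; installed_names is a hypothetical helper name, and the sample lines (suite and version strings included) are illustrative rather than captured from a real system.

```shell
# Read `apt list --installed` style lines on stdin and print the unique
# package names matching the pattern in $1, space-separated on one line.
installed_names() {
    grep -- "$1" | cut -d "/" -f 1 | sort -u | tr "\n" " " | sed 's/ $//'
}

# Illustrative input only: the same package listed for two architectures.
printf '%s\n' \
    'libnvidia-compute-460/focal-updates,now 460.32.03-0ubuntu0.20.04.1 amd64 [installed,automatic]' \
    'libnvidia-compute-460/focal-updates,now 460.32.03-0ubuntu0.20.04.1 i386 [installed,automatic]' \
    'nvidia-driver-460/focal-updates,now 460.32.03-0ubuntu0.20.04.1 amd64 [installed]' \
    'zlib1g/focal-updates,now 1:1.2.11.dfsg-2ubuntu1.2 amd64 [installed]' |
    installed_names nvidia
```

Against a real system this would be fed as apt list --installed 2>/dev/null | installed_names nvidia.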

The operating system urges you to install these driver packages right after it is installed (see Linux Mint 20.1 NVIDIA graphics card driver setup), specifically nvidia-driver-460, warning that "Your system is currently running without video hardware acceleration."

In theory then it should suffice to uninstall that single package to take the rest away with it, after which we can then install CUDA from the run file.

To uninstall the Debian driver package run

sudo apt-get --purge remove nvidia-driver-460
  • Oops: this was just sudo apt purge nvidia-driver-460, it should be sudo apt purge --autoremove nvidia-driver-460 (see below)

This tells you that

The following packages were automatically installed and are no longer required:
  libatomic1:i386 libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386
  libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386
  libexpat1:i386 libffi7:i386 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386
  libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libllvm11:i386 libnvidia-cfg1-460
  libnvidia-common-460 libnvidia-compute-460:i386 libnvidia-decode-460
  libnvidia-decode-460:i386 libnvidia-encode-460 libnvidia-encode-460:i386
  libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460:i386 libnvidia-gl-460
  libnvidia-gl-460:i386 libnvidia-ifr1-460 libnvidia-ifr1-460:i386 libpciaccess0:i386
  libsensors5:i386 libstdc++6:i386 libvulkan1:i386 libwayland-client0:i386 libx11-6:i386
  libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386 libxcb-dri3-0:i386 libxcb-glx0:i386
  libxcb-present0:i386 libxcb-randr0:i386 libxcb-sync1:i386 libxcb-xfixes0:i386
  libxcb1:i386 libxdamage1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386 libxnvctrl0
  libxshmfence1:i386 libxxf86vm1:i386 mesa-vulkan-drivers:i386 nvidia-compute-utils-460
  nvidia-dkms-460 nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime
  nvidia-settings nvidia-utils-460 screen-resolution-extra xserver-xorg-video-nvidia-460
  • Note the "screen-resolution-extra" package: if uninstalled, your resolution will be constrained

If we strip the architecture suffixes after the colons (saving to a file deb_uninstalls.txt):

xclip -o | tail --lines=+2 | cut -d " " -f 3- | tr " " "\n" | cut -d ":" -f 1 > deb_uninstalls.txt

and compare with the earlier list (apt list --installed | ... but without the final tr call, saved as nvidia_grepped_installs.txt), summarising with:

# diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo ">" $(diff deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)

< libatomic1 libbsd0 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libedit2
libelf1 libexpat1 libffi7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0
libllvm11 libpciaccess0 libsensors5 libstdc++6 libvulkan1 libwayland-client0 libx11-6 libx11-xcb1
libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-randr0 libxcb-sync1
libxcb-xfixes0 libxcb1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxnvctrl0 libxshmfence1
libxxf86vm1 mesa-vulkan-drivers screen-resolution-extra

> libnvidia-compute-460 nvidia-driver-460 nvidia-prime-applet

i.e. the packages that will be uninstalled include everything except libnvidia-compute-460 and nvidia-prime-applet (obviously nvidia-driver-460 was not mentioned as it's the primary uninstallation target)

In other words we might want to remember to uninstall libnvidia-compute-460 afterwards
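The "installed but not scheduled for removal" side of that comparison can also be computed without parsing diff output. The following is a sketch under the assumption that both files hold one package name per line; only_in_second is a hypothetical helper name and the file contents are illustrative.

```shell
# Print lines present in file $2 but absent from file $1 (whole-line,
# fixed-string match), sorted and de-duplicated.
only_in_second() {
    grep -Fxv -f "$1" -- "$2" | sort -u
}

# Illustrative data in a scratch directory (file names mirror those above).
tmp=$(mktemp -d)
printf '%s\n' libnvidia-gl-460 nvidia-utils-460 screen-resolution-extra > "$tmp/deb_uninstalls.txt"
printf '%s\n' libnvidia-compute-460 libnvidia-gl-460 nvidia-utils-460 > "$tmp/nvidia_grepped_installs.txt"

# Installed packages that the removal would leave behind:
only_in_second "$tmp/deb_uninstalls.txt" "$tmp/nvidia_grepped_installs.txt"
rm -r "$tmp"
```

Unlike diff, this does not depend on the two files being in the same order, which is why it may be the more forgiving choice here.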

We can also check

apt list --installed | grep 460 | cut -d "/" -f 1 > 460_grepped_installs.txt

and do the same

# diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo ">" $(diff deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)

< libatomic1 libbsd0 libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libedit2
libelf1 libexpat1 libffi7 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0
libllvm11 libpciaccess0 libsensors5 libstdc++6 libvulkan1 libwayland-client0 libx11-6 libx11-xcb1
libxau6 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-randr0 libxcb-sync1
libxcb-xfixes0 libxcb1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxnvctrl0 libxshmfence1
libxxf86vm1 mesa-vulkan-drivers nvidia-prime nvidia-settings screen-resolution-extra

> libnvidia-compute-460 nvidia-driver-460

Again, libnvidia-compute-460 is the only package likely to be left behind on the system.

!!! At this point I realised I didn't run the purge with --autoremove option earlier, so I re-ran the checks which gave

  libatomic1:i386* libbsd0:i386* libdrm-amdgpu1:i386* libdrm-intel1:i386* libdrm-nouveau2:i386* libdrm-radeon1:i386* libdrm2:i386* libedit2:i386* libelf1:i386* libexpat1:i386* libffi7:i386* libgl1:i386* libgl1-mesa-dri:i386*
  libglapi-mesa:i386* libglvnd0:i386* libglx-mesa0:i386* libglx0:i386* libllvm11:i386* libnvidia-cfg1-460* libnvidia-common-460* libnvidia-compute-460:i386* libnvidia-decode-460* libnvidia-decode-460:i386* libnvidia-encode-460*
  libnvidia-encode-460:i386* libnvidia-extra-460* libnvidia-fbc1-460* libnvidia-fbc1-460:i386* libnvidia-gl-460* libnvidia-gl-460:i386* libnvidia-ifr1-460* libnvidia-ifr1-460:i386* libpciaccess0:i386* libsensors5:i386* libstdc++6:i386*
  libvulkan1:i386* libwayland-client0:i386* libx11-6:i386* libx11-xcb1:i386* libxau6:i386* libxcb-dri2-0:i386* libxcb-dri3-0:i386* libxcb-glx0:i386* libxcb-present0:i386* libxcb-randr0:i386* libxcb-sync1:i386* libxcb-xfixes0:i386*
  libxcb1:i386* libxdamage1:i386* libxdmcp6:i386* libxext6:i386* libxfixes3:i386* libxnvctrl0* libxshmfence1:i386* libxxf86vm1:i386* mesa-vulkan-drivers:i386* nvidia-compute-utils-460* nvidia-dkms-460* nvidia-driver-460*
  nvidia-kernel-common-460* nvidia-kernel-source-460* nvidia-prime* nvidia-settings* nvidia-utils-460* screen-resolution-extra* xserver-xorg-video-nvidia-460*

and compared the results

xclip -o | cut -d " " -f 3- | tr " " "\n" | cut -d ":" -f 1 | cut -d "*" -f 1 > autoremoved_deb_uninstalls.txt

Then show the differences (saving this as show_install_uninstall_diff.sh):

# diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo
echo ">" $(diff autoremoved_deb_uninstalls.txt nvidia_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)

printf "\n\n"

# diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>|<"
echo "<" $(diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^<" | cut -d " " -f 2)
echo
echo ">" $(diff autoremoved_deb_uninstalls.txt 460_grepped_installs.txt | grep -E "^>" | cut -d " " -f 2)

(Nothing else changed: the first comparison still gives > libnvidia-compute-460 nvidia-prime-applet and the second > libnvidia-compute-460, just like last time.)
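The clean-up of apt's removal list (dropping the ":i386" suffixes and the "*" purge markers) can be sketched as a single filter. This is an illustrative alternative to the cut-based pipelines in this page, not what was actually run: bare_package_names is a hypothetical name, and it also de-duplicates, which the original commands did not.

```shell
# Turn apt's indented, space-separated removal list into one bare package
# name per line: drop the "*" purge marker and any ":arch" suffix.
bare_package_names() {
    tr -s ' ' '\n' | sed -e 's/\*$//' -e 's/:.*$//' | grep -v '^$' | sort -u
}

# A few entries in the same shape as the apt output above:
printf '  %s\n' \
    'libgl1:i386* libnvidia-gl-460* libnvidia-gl-460:i386*' \
    'nvidia-driver-460* screen-resolution-extra*' |
    bare_package_names
```

Because it splits on any run of spaces, it does not rely on the list being indented by exactly two spaces.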

To clean out the NVIDIA driver you installed (note: you will often see apt-get remove --purge, which is equivalent to apt purge; use the shorter form):

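  • Edit - as mentioned below, in hindsight I'm curious whether this next command should have targeted something broader than nvidia-driver-460, specifically so as to also remove libnvidia-compute-460, which apt show attributes to nvidia-graphics-drivers-460. I first took that to be the 'metapackage', but apt show lists it in the Source: field, which names the source package the binaries are built from, so it is probably not something apt purge can target directly.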
sudo apt purge --autoremove nvidia-driver-460 -y
sudo apt autoclean

nvidia-smi is now gone, and if you re-run the grep checks...

apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "
apt list --installed | grep 460 | cut -d "/" -f 1 | tr "\n" " "

We now get

libnvidia-compute-460 nvidia-prime-applet
libnvidia-compute-460

...as expected!

If we let bash complete nvidia- (+ tab) there are still:

nvidia-detector                nvidia-optimus-offload-vulkan
nvidia-optimus-offload-glx

...so something wasn't quite thorough... I didn't check this beforehand on my installation but from what I can see on the web, nvidia-prime-applet may in fact have shipped with Mint 20. libnvidia-compute-460 on the other hand has definitely been missed during the 460 uninstall, so I'm going to go ahead and remove that one myself.

  • apt show libnvidia-compute-460 shows (in its Source: field) that this came from nvidia-graphics-drivers-460, whereas I purged nvidia-driver-460, so perhaps I should have purged something broader...
sudo apt purge --autoremove libnvidia-compute-460 -y
sudo apt autoclean

NVIDIA's installation guide has a section on pre-installation actions for Ubuntu (and other OSs).

Viacheslav suggests:

sudo ./cuda-11.2.run --silent --toolkit --toolkitpath=/usr/local/cuda-11.2
  • But note that this --silent flag will suppress the interactive prompt
  • If --toolkitpath is not provided it defaults to /usr/local/cuda-11.2, so there is no need to pass it
  • For all options see the advanced options section of the installation guide

I was going to run:

sudo ./cuda-11.2.run --toolkit

(Read the rest of this section before going ahead!)

However since I just uninstalled my NVIDIA driver package, I'm going to want to get that back, so forget about the --toolkit flag now and install all parts:

So I simply sudo ./cuda-11.2.run and accept the EULA.

To install multiple versions of CUDA, it's advised in the article above that

IMPORTANT: cuda installer creates a link /usr/local/cuda to the installation folder. Therefore, it’s important either to remove the link, or to modify it to point to the CUDA that you want to use by default.

I.e. it will point the default /usr/local/cuda symlink at the CUDA you just installed. If this is your first CUDA, don't worry about it; if you're installing an older version, you might want to symlink it back to the newer one after installation.

However if you want to avoid this, you don't need to modify it: under Toolkit Options in the installer you can opt out of this symlink (which is why --silent is not necessarily the easy choice).

  • Access the Toolkit Options either by going to Options > Toolkit Options or press a with the cursor over [X] [CUDA Toolkit 11.2]
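If the symlink does get created and you want to repoint it later, the step can be sketched as below. This is a sketch under assumptions rather than anything the installer provides: set_default_cuda is a hypothetical helper, and the root directory is a parameter purely so that it can be tried in a scratch directory (a real install would use /usr/local and need sudo).

```shell
# Point <root>/cuda at <root>/cuda-<version>.
# $1 = root directory (hypothetical parameter, /usr/local on a real system)
# $2 = toolkit version, e.g. 11.2
set_default_cuda() {
    [ -d "$1/cuda-$2" ] || { echo "no such toolkit: $1/cuda-$2" >&2; return 1; }
    # -s symbolic, -f replace an existing link, -n do not descend into the
    # directory the old link points at
    ln -sfn "$1/cuda-$2" "$1/cuda"
}

scratch=$(mktemp -d)
mkdir "$scratch/cuda-11.2" "$scratch/cuda-10.2"
set_default_cuda "$scratch" 10.2     # as if an older toolkit had just been installed
set_default_cuda "$scratch" 11.2     # point the default back at the newer one
readlink "$scratch/cuda"
rm -r "$scratch"
```

The final readlink prints the path the default now resolves to, which should end in cuda-11.2.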

Hit Install to exit the wizard back onto the command line

...and "Installation failed. See log at /var/log/cuda-installer.log for details."

This log shows that

[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 460.27.04
[INFO]: Executing NVIDIA-Linux-x86_64-460.27.04.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 460.27.04 failed, quitting

...after all that fuss demanding I uninstall the driver, it then fails to install the driver...
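To pull just the failures out of a log like this, a small filter is enough. A minimal sketch, assuming the "[ERROR]: " prefix format seen in the excerpt above; installer_errors is a hypothetical helper name.

```shell
# Print only the [ERROR] lines from a CUDA installer log read on stdin,
# with the "[ERROR]: " prefix removed.
installer_errors() {
    sed -n 's/^\[ERROR\]: //p'
}

# Excerpt of the log shown above:
printf '%s\n' \
    '[INFO]: Finished with code: 256' \
    '[ERROR]: Install of driver component failed.' \
    '[ERROR]: Install of 460.27.04 failed, quitting' |
    installer_errors
```

On the machine itself this would be run as installer_errors < /var/log/cuda-installer.log.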

The CUDA installer attempted to install the 418.87.00 driver and the driver installation failed. To find out why the driver installation failed, you’ll need to check the driver installer log.

That log would typically be at: /var/log/nvidia-installer.log

This shows:

-> The file '/tmp/.X0-lock' exists and appears to contain the process ID '1314' of a running X server. ERROR: You appear to be running an X server; please exit X before installing. For further details, please see the section INSTALLING THE NVIDIA DRIVER in the README available on the Linux driver download page at www.nvidia.com. ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.
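The lock file named in that message can be checked before launching the installer. The sketch below assumes the usual X lock file layout (a process ID padded with spaces, one file per display); x_lock_pids is a hypothetical helper, and it takes the directory as a parameter only so it can be tried outside /tmp.

```shell
# For each .X<n>-lock file in directory $1, print the file name and the
# process ID stored in it (padding and newline stripped).
x_lock_pids() {
    for lock in "$1"/.X*-lock; do
        [ -f "$lock" ] || continue    # no match: the glob stays literal
        printf '%s %s\n' "$(basename "$lock")" "$(tr -d ' \n' < "$lock")"
    done
}

# Demo with a fake lock file holding the PID from the message above.
demo=$(mktemp -d)
printf '%10d\n' 1314 > "$demo/.X0-lock"
x_lock_pids "$demo"
rm -r "$demo"
```

Run as x_lock_pids /tmp, no output would suggest that no X server is holding a lock, which is the state the driver installer wants.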

In Mint 20.1 the X server is started by the LightDM display manager (stock Ubuntu 20.04 uses GDM3 instead, so the service name differs there), so

sudo service lightdm stop

is how to stop it, and then later sudo service lightdm start will restart it.

This will kill your X server (including any tmux server running under it), leaving you with a black screen, so save anything you have open and get ready to SSH into the machine (or figure out another way to get a terminal; I prefer to just SSH in from another machine).

Then cd back into the directory where you put your .run installer file and re-run it.

This time I got

-> An alternate method of installing the NVIDIA driver was detected. (This is usually a package
provided by your distributor.) A driver installed via that method may integrate better with your
system than a driver installed by nvidia-installer.

Please review the message provided by the maintainer of this alternate installation method and
decide how to proceed:

The NVIDIA driver provided by Ubuntu can be installed by launching the "Software & Updates"
application, and by selecting the NVIDIA driver from the "Additional Drivers" tab.


(Answer: Continue installation)
-> For some distributions, Nouveau can be disabled by adding a file in the modprobe configuration
directory.  Would you like nvidia-installer to attempt to create this modprobe file for you?
(Answer: Yes)
-> One or more modprobe configuration files to disable Nouveau have been written.  For some
distributions, this may be sufficient to disable Nouveau; other distributions may require
modification of the initial ramdisk.  Please reboot your system and attempt NVIDIA driver
installation again.  Note if you later wish to re-enable Nouveau, you will need to delete these
files: /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,
/etc/modprobe.d/nvidia-installer-disable-nouveau.conf
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to
automatically build a new module, if you install a different kernel later. (Answer: No)
ERROR: You do not appear to have libc header files installed on your system.  Please install your
distribution's libc development package.
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.
You may find suggestions on fixing installation problems in the README available on the Linux driver
download page at www.nvidia.com.

Annoyingly, although it complained that I should remove the drivers, it now tells me 'helpfully' that I can get the NVIDIA drivers from the package manager! It adds that

A driver installed via that method may integrate better with your system than a driver installed by nvidia-installer.

Thanks for that!

Bearing this in mind, I would now prefer to go back and ignore the warning about there being a driver already installed (as it conflicts with the suggestion to install the driver via the system package manager).

At this point you can either restart the system or restart the X server. The installer log mentioned

Please reboot your system and attempt NVIDIA driver installation again.

but first I want to just restart the X server

sudo service lightdm start

At this point the display switches back on (from a black screen) but with low quality resolution. Immediately upon login you get the popup mentioned in Linux Mint 20.1 NVIDIA graphics card driver setup to “Check your video drivers”.

Since it was advised to shut down the machine (above) I ignored this popup and tried to shut down by clicking the power icon but for some reason it just logged out, and then clicking it again brought up a "quit" dialog box but with no button to actually shut down (?).

Instead I just ran shutdown "now" over SSH. Initially it looked like nothing happened (the SSH session closed) but then after a brief delay the machine powered down and I could boot it up again.

Upon logging back in (still with the low resolution display due to the uninstalled "screen-resolution-extra" package), the popup re-appeared, so I repeated the installation as described at the link and restarted the machine.

This time the screen resolution was back to normal, nvidia-smi was back, and checking for nvidia packages showed libnvidia-compute-460 had returned:

apt list --installed | grep nvidia | cut -d "/" -f 1 | tr "\n" " "

libnvidia-cfg1-460 libnvidia-common-460 libnvidia-compute-460 libnvidia-compute-460
libnvidia-decode-460 libnvidia-decode-460 libnvidia-encode-460 libnvidia-encode-460
libnvidia-extra-460 libnvidia-fbc1-460 libnvidia-fbc1-460 libnvidia-gl-460 libnvidia-gl-460
libnvidia-ifr1-460 libnvidia-ifr1-460 nvidia-compute-utils-460 nvidia-dkms-460 nvidia-driver-460
nvidia-kernel-common-460 nvidia-kernel-source-460 nvidia-prime-applet nvidia-prime nvidia-settings
nvidia-utils-460 xserver-xorg-video-nvidia-460

So the only thing left to do is to re-run the installer and this time just un-check the driver (but get all the samples etc.)

  • I would suggest deselecting it with a flag, but I'm not sure which flags (if any) select the CUDA demo suite and documentation (and I don't see a good reason to exclude them); only --toolkit and --samples are listed.
  • At a guess, the docs might be included in the toolkit (but they're a separate bullet point in the installer wizard so I don't know for sure).
    • The --no-man-page flag is likely the same as excluding "CUDA Documentation 11.2" (so just --toolkit --samples will include them), but I'm unclear what controls inclusion of the demo suite...

Edit: lastly, the samples require some 3rd party libraries:

sudo apt install g++ freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libglu1-mesa libglu1-mesa-dev

To cut to the chase here, you end up clicking install on this config:

│ CUDA Installer               
│ - [ ] Driver                 
│      [ ] 460.27.04           
│ + [X] CUDA Toolkit 11.2      
│   [X] CUDA Samples 11.2      
│   [X] CUDA Demo Suite 11.2   
│   [X] CUDA Documentation 11.2

This will run on a single core for a while and then print:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.2/
Samples:  Installed in /home/louis/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.2/lib64, or, add /usr/local/cuda-11.2/lib64 to
     /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of
version at least 460.00 is required for CUDA 11.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller>
with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
  • Annoyingly, the samples were "missing recommended libraries"; I've gone back and added these as a note above. The samples have gone into ~/NVIDIA_CUDA-11.2_Samples
  • I have to update my PATH to include /usr/local/cuda-11.2/bin and LD_LIBRARY_PATH to include /usr/local/cuda-11.2/lib64
    export PATH="/usr/local/cuda-11.2/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH"
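  • Note that the NVIDIA guide suggests something a little different. This is not a substring operation: ${VAR:+:${VAR}} expands to :${VAR} only when VAR is set and non-empty, so no stray colon is left behind when the variable starts out empty (an empty entry in these lists is treated as the current directory). I'm sticking with the plain form above as it's simpler, more consistent with my .bashrc and more readable, but the NVIDIA form is the safer one if LD_LIBRARY_PATH might be unset.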
    export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
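The difference between the two forms only shows up when the existing list is empty. A small sketch makes it visible; prepend_dir is a hypothetical helper written for this illustration, and /opt/lib is a made-up directory.

```shell
# Prepend directory $1 to the colon-separated list $2, using the same
# ${VAR:+...} expansion as the NVIDIA form: the ":" separator is emitted
# only when the existing list is non-empty.
prepend_dir() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

prepend_dir /usr/local/cuda-11.2/lib64 ""          # list was empty
prepend_dir /usr/local/cuda-11.2/lib64 /opt/lib    # list already had an entry

# For contrast, plain concatenation leaves a trailing ":" when the list is empty:
empty=""
printf '%s\n' "/usr/local/cuda-11.2/lib64:$empty"
```

The first call prints the directory on its own, the second joins the two with a colon, and the last line ends in a bare colon, which is the case the NVIDIA form is guarding against.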

To check your installed driver version, run:

nvidia-smi --query-gpu=driver_version --format=csv,noheader

460.32.03
  • See man nvidia-smi for more options and here for some examples of useful queries

If you've installed a package like PyTorch for Python, you can now run the following in a Python session:

import torch
torch.cuda.is_available()

True