Running on linux #64

Closed
TheMcSebi opened this issue Jul 21, 2021 · 7 comments

Comments

@TheMcSebi

I was trying to get this to run on Pop!_OS when I encountered an issue.
The installation steps all went fine, but when I first tried to start the game using play-cuda.sh, it failed on this line:

RUN apt update && apt install xorg -y

Resolved that by commenting out the line, because xorg was already installed.
Now when trying to run it I get this error:

Error response from daemon: error gathering device information while adding custom device "/dev/kfd": no such file or directory

I found out this might have something to do with ROCm, which I don't have installed (because I'm trying to run this on a GTX 1080 Ti).
Now I wonder whether running this on Linux is even supported, since all the instructions are written for Windows :)
I should mention, though, that it runs flawlessly on my Windows installation, with no issues at any step. I have only tried the GPT-Neo 2.7B model so far, and it runs fine in 11 GB of VRAM. Thanks for all the work that has already been put into this.

@henk717
Collaborator

henk717 commented Aug 2, 2021

I do not have an NVIDIA GPU, so my attempt at play-cuda.sh was completely blind and apparently unsuccessful. This will be quite tricky to solve over GitHub, since we will have to work out what works one on one.

For that one line, don't comment it out. Instead, add the following line above it, then move both lines to the bottom of the Dockerfile:
USER root

This should switch to root at the last step and install X11 (inside the Docker container, not on your host system). The container needs X11 so it can draw the file selection window properly.

The missing device is most likely there because I simply copied my AMD version, hoping it would work. Try removing /dev/kfd from the CUDA Docker files.
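A quick way to see which of the device nodes listed under `devices:` actually exist on the host is a small shell check like the one below. This is a minimal sketch, assuming bash; `check_devices` is a hypothetical helper, not part of KoboldAI's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of KoboldAI): report which device nodes from a
# docker-compose "devices:" list exist on this host. Docker refuses to start a
# container when a listed device is missing, which produces the
# "error gathering device information" message quoted earlier in this thread.
check_devices() {
  local dev missing=0
  for dev in "$@"; do
    if [ -e "$dev" ]; then
      echo "found:   $dev"
    else
      echo "missing: $dev"
      missing=1
    fi
  done
  return "$missing"   # non-zero if at least one device is absent
}
```

For example, `check_devices /dev/nvidia0 /dev/nvidiactl /dev/kfd` on an NVIDIA-only machine should report `/dev/kfd` as missing, since that node is created by AMD's amdkfd kernel driver, which ROCm relies on. Any device reported as missing is a candidate for removal from the compose file.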

If it still gives you trouble, I recommend joining the KoboldAI Discord at https://discord.gg/UCyXV7NssH so we can try to fix this one on one (I am Henky!! there).

@ghost

ghost commented Oct 6, 2021

Hi, on the topic of Linux: to get KoboldAI to run on Arch, you may need to modify docker-compose.yml so that it can see your NVIDIA GPU. If you don't, it may lock up on large models.

```yaml
version: "3.2"
services:
  koboldai:
    build: .
    environment:
      - DISPLAY=${DISPLAY}
    network_mode: "host"
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ../:/content/
      - $HOME/.Xauthority:/home/micromamba/.Xauthority:rw
    devices:
      - /dev/dri
      - /dev/nvidia0:/dev/nvidia0
      - /dev/nvidiactl:/dev/nvidiactl
      - /dev/nvidia-uvm:/dev/nvidia-uvm
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    group_add:
      - video
```

@henk717
Collaborator

henk717 commented Oct 6, 2021

Feel free to submit that as a commit here on GitHub; I don't have all the parts for my NVIDIA GPU yet.

@ghost

ghost commented Oct 7, 2021

Done, I have made a pull request. I'm not sure it's bug-free, but it is a start.

@henk717
Collaborator

henk717 commented Oct 7, 2021

The old one wasn't working on GPUs in general. I'm leaving this issue open for further testing, but I expect this to work well for most people!

@ghost

ghost commented Oct 22, 2021

Hello, two things. The first is an update on NVIDIA. Sometimes, for example after a reboot, you may get this message:
"Error response from daemon: error gathering device information while adding custom device "/dev/nvidia-uvm": no such file or directory"
In this case, you may have to comment out this line:
- /dev/nvidia-uvm:/dev/nvidia-uvm
Run ./play-cuda.sh again and let it report GPU NOT FOUND, close the program, uncomment the line, then rerun the program. I don't know exactly why this happens.
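One plausible explanation, not confirmed in this thread: `/dev/nvidia-uvm` only exists once the `nvidia_uvm` kernel module has been loaded, which may not have happened yet after a reboot, and the first container start through the NVIDIA runtime can trigger that load. If so, creating the node before starting KoboldAI should avoid the comment-out/uncomment cycle. This is a minimal sketch, assuming bash and that the host's NVIDIA driver ships the `nvidia-modprobe` utility; `ensure_nvidia_uvm` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of an alternative to commenting the line out and back in.
# Assumptions: the NVIDIA driver is installed and provides nvidia-modprobe
# (normally installed setuid root); the helper name is hypothetical.
ensure_nvidia_uvm() {
  local node="${1:-/dev/nvidia-uvm}"   # optional override, mainly for testing
  if [ -e "$node" ]; then
    return 0                          # device node already present
  fi
  if command -v nvidia-modprobe >/dev/null 2>&1; then
    # -u: operate on the Unified Memory module (nvidia-uvm) instead of nvidia;
    # -c=0: also create its device file with minor number 0 (/dev/nvidia-uvm).
    nvidia-modprobe -u -c=0 || true
  fi
  [ -e "$node" ]                      # succeed only if the node now exists
}
```

Something like `ensure_nvidia_uvm && ./play-cuda.sh` would then only launch KoboldAI once the node is present.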

The second issue has to do with AMD. I'm working on the docker-rocm/docker-compose.yml file. My GPU was being reported as not found, so I made this small change:

devices:
  - /dev/kfd:/dev/kfd
  - /dev/dri:/dev/dri

But now I am getting this error.

"hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"

Does anyone have any ideas how to fix this? Thank you.

@henk717
Collaborator

henk717 commented Oct 23, 2021

Which AMD GPU do you have? Not all of them are supported, and you need the built-in driver (not the Pro driver) plus ROCm to get a working compute stack with them. Only very few cards are supported.
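`hipErrorNoBinaryForGpu` means the HIP runtime found no compiled code object matching the GPU's architecture, so a useful first step is to find out which architecture name (such as `gfx1030`) ROCm reports for the card. This is a minimal sketch, assuming the ROCm `rocminfo` tool is installed on the host; `extract_gfx_targets` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the gfx architecture names (e.g. gfx1030) out of
# rocminfo output. hipErrorNoBinaryForGpu usually means the installed ROCm
# build of PyTorch contains no kernels for this architecture.
extract_gfx_targets() {
  # Read rocminfo text on stdin and print each distinct gfx target once;
  # feature suffixes such as ":xnack-" are dropped by the pattern.
  grep -oE 'gfx[0-9a-f]+' | sort -u
}
```

Run it as `rocminfo | extract_gfx_targets` and compare the result against the architectures your ROCm release lists as supported.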

henk717 closed this as completed Jan 14, 2022
henk717 added a commit that referenced this issue Feb 6, 2022
Prevent tokenizer from taking extra time the first time it's used