Add support for older AMD GPU gfx803, gfx802, gfx805 (e.g. Radeon RX 580, FirePro W7100) #2453
One interesting observation. I managed to get my |
I'm trying to get this working on an RX 580. In the logs, after sending a "prompt" (not sure of the lingo?):
I noticed in the rocBLAS CMake file that they removed support for gfx803 in the 6.0.x builds, so I downgraded to the 5.7.1 packages and rebuilt ollama using the PKGBUILD from #2473. When I then sent the prompt, I got this error:
The assertion is coming from a `_GLIBCXX_ASSERTIONS` check in the standard library. Not sure how much help I can be here, but I can test things out if needed. This is the full output in the logs:
|
I ended up disabling the assertions. This is where the assertion fires:
So it seems like the sum should be greater than 0. I don't know what the implications are, but that seems to be one of the preconditions of using this type. So I tried this, running the llama2 model:
I don't know if it's just messing with me, or if the bug is random.
|
@Todd-Fulton Same error here. Do you know how to fix this? |
@wilkensgomes I downgraded to the 5.7.1 rocm packages using `downgrade` on Arch Linux, and then added them to IgnorePkg at the end of the installation so that they don't get upgraded to 6.x packages. For the error, I turned off `_GLIBCXX_ASSERTIONS`:

```
# CXXFLAGS="$CFLAGS -Wp,-D_GLIBCXX_ASSERTIONS"
CXXFLAGS="$CFLAGS"
```

There might be a better way to disable this in the PKGBUILD file just for building ollama/llama.cpp, but I haven't bothered with it and just disabled the assertions globally.

Reading over the discussion of the second error, the gibberish happens after disabling the asserts, as the initialize method is where the assertion would otherwise fire. So, as far as I can tell, the gibberish is a result of certain models and small input prompts, as said in the conversation. Somewhere between the model and the calculation of the probabilities, either some of them are negative, all are zero, or there is a NaN in there: for example, if for some reason a probability is the result of dividing a float by 0.0.

Apart from some of the smaller models and small input prompts that produce gibberish, everything has been working for me since yesterday. I'm not even sure the gibberish is particular to Polaris GPUs. I spent a few hours using llama2:13b as a Dungeon Master yesterday; it was mind-blowing. |
I'm still getting familiar with these code bases, but I did some print debugging. I built both ollama and llama.cpp from their respective main branches, but took out the check for AMD version > 9 in ollama. In `llama_sample_softmax`:

```cpp
void llama_sample_softmax(struct llama_context * ctx, llama_token_data_array * candidates) {
    // ...
    float max_l = candidates->data[0].logit;
    float cum_sum = 0.0f;
    std::stringstream plogs;  // collects one "{ token, probability, logit }" entry per candidate
    for (size_t i = 0; i < candidates->size; ++i) {
        float p = expf(candidates->data[i].logit - max_l);
        candidates->data[i].p = p;
        cum_sum += p;
    }
    for (size_t i = 0; i < candidates->size - 1; ++i) {
        candidates->data[i].p /= cum_sum;
        plogs << "{ token: " << candidates->data[i].id
              << ", probability: " << candidates->data[i].p
              << ", logit: " << candidates->data[i].logit
              << "},\n";
    }
    // last entry, without the trailing comma
    candidates->data[candidates->size - 1].p /= cum_sum;
    plogs << "{ token: " << candidates->data[candidates->size - 1].id
          << ", probability: " << candidates->data[candidates->size - 1].p
          << ", logit: " << candidates->data[candidates->size - 1].logit
          << " }\n";
    std::string plogs_string = plogs.str();
    LLAMA_LOG_INFO("Probabilities: [%s]\n", plogs_string.data());
    // ...
}
```

I'll do my best to track down where the NaNs are coming from; it might be the GPU, which I have little experience with. I might try building ROCm 6.x from source, if I can find an option to enable gfx803 support in the CMake files, and then build against that, in case it's a bug in the ROCm 5.7.1 that I have installed.

Short prompt: NaNs, NaNs everywhere:
With a little bit longer prompt, the calculations look right here:
More detailed logs: |
Is it not possible to create a docker image that supports gfx803? It would be easier than doing trial and error. Two weeks ago I was trying to install Ollama for my RX580 and I was only able to use the CPU due to conflicting dependencies on Arch Linux and Ubuntu 22.04. |
This issue on llama.cpp seems to be the same bug. I'm currently going through the ROCm stack, building it from source from the main branches, and trying to find out whether I can reintroduce RX 580 "support" with patches if needed. I will put up a script and patches if I'm successful and it solves the problem. We could create a docker image from that script, or just use the script to create binary packages, or PKGBUILDs if it comes to that.

Various parts of the stack still seem to "support" gfx803 (RX 580), while others seem to have at least officially dropped it, like rocBLAS (though it might still work if I just patch up the build scripts). I don't think this is a bug in ollama but further down the stack; clr, for example, introduced a relevant change. As for the gibberish, I think that's a result of something lower in the stack as well.

It might be worth trying even older versions of ROCm than 5.7.1, if ollama and llama.cpp are still compatible with those, at least in the meantime. Adding support for older GPUs without requiring a downgrade doesn't seem possible if ROCm isn't going to support older GPUs in the first place; users would still have to install older versions, or it would at least require re-implementing that functionality. If the gibberish is coming from clBLAST, then that narrows it down, and ROCm support for older GPUs is just a side issue. I think users will either have to work on support in the open-source stack or just use older packages. |
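For anyone attempting the rebuild-from-source route discussed above, here is a hedged sketch of what re-enabling the target might look like. `AMDGPU_TARGETS` is the conventional CMake variable ROCm math libraries use to select GPU architectures; whether a given rocBLAS release still builds cleanly for gfx803 is exactly what's in question, so treat this as an experiment, not a recipe:

```shell
# Experimental: try to build rocBLAS with gfx803 re-enabled.
# Assumes the release being built has only dropped gfx803 from the
# default target list, not removed its code paths outright.
git clone https://github.com/ROCm/rocBLAS.git
cd rocBLAS
cmake -B build \
      -DCMAKE_CXX_COMPILER=hipcc \
      -DAMDGPU_TARGETS=gfx803 \
      -DCMAKE_BUILD_TYPE=Release
cmake --build build -j"$(nproc)"
```

If the build scripts hard-code a supported-architecture list, that list would need patching before this configure step succeeds.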
Any progress on this... ROCm successfully detects my gfx803 and it should work but ollama is blocking the card :/ |
Could this also be applied to gfx804? |
Support for the Radeon RX 580/590 (I have a 590) would be super nice. Tried the Ollama 0.1.30 update and it's not possible yet. |
Please add support for older GPUs like the RX 580, as llama.cpp already supports those GPUs. |
@Todd-Fulton That's a regression with ROCm versions 6.0.* (see rocm-arch/rocm-arch#981). Downgrading to 5.7.1 will enable support for, e.g., Polaris cards again. |
True, using CLBlast. |
@6b6279 Can you give me detailed instructions on how to downgrade to 5.7.1 on Arch? I've got an RX 580. |
@DerRehberg Try `downgrade` (available on the AUR: https://aur.archlinux.org/packages/downgrade). ollama won't use the GPU regardless, but it'll enable support for, e.g., the RX 580 while using darktable. |
@6b6279 And now give me detailed instructions on how to run Stable Diffusion on an RX 580. |
@DerRehberg No idea. I use rocm only for image processing. |
Is there any update on this? I have a 580 and would like to use it in addition to another GPU. |
Hello. I'm a user of a Radeon RX 580 8GB, and the statement that
is not entirely true. While it is not officially supported anymore, you don't really need any workarounds to make ROCm work with these GPUs. I've been using OpenCL through ROCm for quite some time in Blender without any issues at all. All I needed to do was set an environment variable. I've tried doing so with Ollama, but it seems that it disables the GPU manually as unsupported, even if ROCm is able to run on it. From the ArchWiki:
Note: I haven't used Blender for some time and I've switched to NixOS, so I didn't test it just now. But if someone wants me to, I'll look into it and see whether I can run ROCm on that card without any additional setup. |
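The environment variable isn't named in the comment above; on Polaris cards the two knobs commonly reported (e.g. on the ArchWiki GPGPU page) are `ROC_ENABLE_PRE_VEGA` and `HSA_OVERRIDE_GFX_VERSION`. Treat the exact values as assumptions to verify against your own card:

```shell
# Commonly reported workaround for pre-Vega (Polaris) cards such as the
# RX 580 (gfx803). These are hints to the ROCm runtime, not a guarantee
# that every library in the stack still ships gfx803 kernels.
export ROC_ENABLE_PRE_VEGA=1
export HSA_OVERRIDE_GFX_VERSION=8.0.3
```

Even with these set, an application that hard-codes its own GPU allow-list (as ollama did here) will still refuse the card.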
Officially, ROCm no longer supports these cards, but it looks like other projects have found workarounds. Let's explore whether that's possible; best case, it's built in to our binaries. The fall-back, if that's not plausible, is to document how to build from source, with the appropriate older ROCm library and AMD drivers installed on your system, to produce a local binary that works.