When I tried to chat with my LLM through the OpenAI API, it crashed with a core dump: Memory access fault by GPU node-1 (Agent handle: 0x64e3ca9e2010) on address 0x75e681099000. Reason: Page not present or supervisor privilege.
Update: the chat also crashes inside tg-webui itself.
Is there an existing issue for this?
I have searched the existing issues
Reproduction
Start the web UI on a custom API port: python server.py --api --api-port 11451, and load a model
Go to another GPT web UI such as Sider or NextChat, set the API server, and chat
The error occurs!
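The same crash can be triggered without a third-party GUI by calling the OpenAI-compatible endpoint directly. A minimal sketch, assuming the web UI is running with --api --api-port 11451 and exposes the standard /v1/chat/completions route (the model name "Qwen-32B" is taken from the logs below; adjust to your loaded model):

```python
import json
import urllib.request

# Build a minimal chat-completion request against the local API server.
payload = {
    "model": "Qwen-32B",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    "http://127.0.0.1:11451/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# Sending this request is what triggers the GPU page fault on the server side.
# Uncomment to actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```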
Screenshot
The API server crashed with the terminal message below:
The web UI crashed:
with similar terminal output:
Logs
The first crash, using the API server:
❯ python server.py --api --api-port 11451
12:55:07-606488 INFO Starting Text generation web UI
12:55:07-608075 INFO Loading the extension "openai"
12:55:07-649839 INFO OpenAI-compatible API URL:
http://127.0.0.1:11451
Running on local URL: http://127.0.0.1:7860
12:55:16-379990 INFO Loading "Qwen-32B"
12:55:16-707760 WARNING You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features
12:55:49-579044 INFO LOADER: "ExLlamav2"
12:55:49-580220 INFO TRUNCATION LENGTH: 4096
12:55:49-580571 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
12:55:49-580940 INFO Loaded the model in 33.20 seconds.
Memory access fault by GPU node-1 (Agent handle: 0x64e3ca9e2010) on address 0x75e681099000. Reason: Page not present or supervisor privilege.
[1] 616392 IOT instruction (core dumped) python server.py --api --api-port 11451
The second crash, using tg-webui:
❯ python server.py --api --listen --model Qwen-32B
13:14:08-576771 INFO Starting Text generation web UI
13:14:08-578185 WARNING
You are potentially exposing the web UI to the entire internet without any access password.
You can create one with the "--gradio-auth" flag like this:
--gradio-auth username:password
Make sure to replace username:password with your own.
13:14:08-581204 INFO Loading "Qwen-32B"
13:14:08-932613 WARNING You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features
13:14:33-197401 INFO LOADER: "ExLlamav2"
13:14:33-198636 INFO TRUNCATION LENGTH: 4096
13:14:33-198966 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
13:14:33-199306 INFO Loaded the model in 24.62 seconds.
13:14:33-199649 INFO Loading the extension "openai"
13:14:33-265414 INFO OpenAI-compatible API URL:
http://0.0.0.0:5000
Running on local URL: http://0.0.0.0:7860
Memory access fault by GPU node-1 (Agent handle: 0x6452407b78b0) on address 0x75980082f000. Reason: Page not present or supervisor privilege.
[1] 630061 IOT instruction (core dumped) python server.py --api --listen --model Qwen-32B
This is likely an old ROCm issue. Even if your system ROCm is 6.0.3, PyTorch bundles its own compiler, which gets used for exllama's C extensions, and the current webui still specifies ROCm 5.6 prebuilt torch wheels.
I found the XTX is far more stable if you use the PyTorch 2.3.0 release candidate with ROCm 6.0 and then recompile exllamav2 to match.
ROCm 5.6, and especially 5.7, are extremely unstable on my XTX: frequent unrecoverable page faults, including the exact error you see. If the ROCm 6 exl2 method works, I'd also recompile llama-cpp and the other extensions to match for a rock-solid experience.
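A rough sketch of that upgrade path follows. The index URL, CMake flag, and package names are assumptions, not verified against your setup; check the PyTorch install page for the current ROCm 6.0 wheel location before running any of this inside the webui's environment:

```shell
# 1. Install a ROCm 6.0 build of PyTorch (release-candidate index; URL is an assumption)
pip install --pre torch --index-url https://download.pytorch.org/whl/test/rocm6.0

# 2. Rebuild exllamav2 from source so its C extensions compile against the
#    ROCm toolchain bundled with the new torch wheel
pip uninstall -y exllamav2
pip install exllamav2 --no-binary exllamav2

# 3. Optionally rebuild llama-cpp-python the same way (flag name may differ
#    between llama.cpp versions)
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install --force-reinstall --no-binary llama-cpp-python llama-cpp-python
```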