Expected Behavior
The model either works, or provides a clear error message explaining why it can't.
Ideally the models dropdown is filtered to the models that my system is capable of running, instead of leaving me to figure it out.
Actual Behavior
Cryptic error messages and failed chats
Steps to Reproduce the Problem
- installed Foundry Local in VS Code
- installed CUDA 13.3.0
- restarted Visual Studio Code
- all optimized models throw cryptic error messages
Specifications
- extension version: 1.4.2
- Version: vscode 1.123.0
- Platform: Windows x64
I tried the -npu, -cuda, -gpu and -openvino models. Only the -gpu model seems to try to work, but it takes FOREVER even for small requests.
Error Log in Output
Sorry, your request failed. Please try again.
Client Request Id: b820b255-7b13-4037-b521-bba45fc1ba4c
Reason: Unable to call the qwen2.5-0.5b-instruct-openvino-npu:4 inference endpoint due to 400. Please check if the input or configuration is correct.: Error: Unable to call the qwen2.5-0.5b-instruct-openvino-npu:4 inference endpoint due to 400. Please check if the input or configuration is correct.
at t.InferenceError (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:2998159)
at v.handleOpenAIError (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:8523514)
at v.chatStream (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:8514725)
at processTicksAndRejections (node:internal/process/task_queues:104:5)
at v.chatStream (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:8:569947)
at c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:3469797
at c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1955044
at Object.e.runWithTelemetry (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1954872)
at t.ModelApi.provideLanguageModelResponse (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:3467717)
at c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1973138
at c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1955044
at Object.e.runWithTelemetry (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1954872)
at t.AitkModelChatProvider.provideLanguageModelChatResponse (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:2:1972878)
Sorry, your request failed. Please try again.
Client Request Id: 3ed3986e-8bcb-4c6f-9b44-730e0457b86d
Reason: Failed loading model qwen2.5-0.5b-instruct-cuda-gpu:4. E:\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1844 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\ai-mlstudio\bin\onnxruntime_providers_cuda.dll" which depends on "cublasLt64_13.dll" which is missing. (Error 126: "The specified module could not be found.") : Error: Failed loading model qwen2.5-0.5b-instruct-cuda-gpu:4. E:\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1844 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\ai-mlstudio\bin\onnxruntime_providers_cuda.dll" which depends on "cublasLt64_13.dll" which is missing. (Error 126: "The specified module could not be found.")
at t.LocalModelAccessor.loadModel (c:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\dist\extension.js:8:830743)
at processTicksAndRejections (node:internal/process/task_queues:104:5)
Failed loading model qwen2.5-0.5b-instruct-cuda-gpu:4. E:\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1844 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : Error loading "C:\Users\JesseHouwing\.vscode\extensions\ms-windows-ai-studio.windows-ai-studio-1.4.2-win32-x64\ai-mlstudio\bin\onnxruntime_providers_cuda.dll" which depends on "cudnn64_9.dll" which is missing. (Error 126: "The specified module could not be found.")
5b-instruct-openvino-npu:4 MaxCompletionTokens:(null) maxTokens:(null) temperature:(null) topP:(null)
2026-06-03 23:45:53.698 [error] Unable to call the qwen2.5-0.5b-instruct-openvino-npu:4 inference endpoint due to 400. Please check if the input or configuration is correct. 400 status code (no body)
2026-06-03 23:45:53.699 [info] Error: Microsoft.Neutron.OpenAI.Delegates.OpenAIApi [0] 2026-06-03T23:45:53.6958255+02:00 Your input message is too large. This model supports at most 4224 completion tokens. (Parameter 'chatRequest')
2026-06-03T23:46:53.7453499+02:00 Your input message is too large. This model supports at most 4224 completion tokens. (Parameter 'chatRequest')
2026-06-03 23:46:53.746 [error] Unable to call the qwen2.5-0.5b-instruct-openvino-npu:4 inference endpoint due to 400. Please check if the input or configuration is correct. 400 status code (no body)
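For anyone triaging the CUDA load failures in the log above, here is a minimal sketch of a check for the two DLLs the errors name (`cublasLt64_13.dll` and `cudnn64_9.dll`). It only scans the directories listed in `PATH`, which is just one of the places Windows searches when loading a DLL, so a miss is a hint rather than proof. The `find_on_path` helper is a hypothetical name for illustration and is not part of the extension or of ONNX Runtime. Note that `cudnn64_9.dll` ships with cuDNN, which is a separate download from the CUDA Toolkit.

```python
import os
from pathlib import Path

# DLL names copied from the "which is missing" errors in the log above.
REQUIRED_DLLS = ["cublasLt64_13.dll", "cudnn64_9.dll"]


def find_on_path(dll_names, search_dirs=None):
    """Map each DLL name to the first matching file found, or None.

    search_dirs defaults to the entries of the PATH environment variable.
    This is an approximation: the Windows loader also checks other
    locations (for example the directory of the loading module).
    """
    if search_dirs is None:
        raw = os.environ.get("PATH", "")
        search_dirs = [d for d in raw.split(os.pathsep) if d]

    found = {}
    for name in dll_names:
        found[name] = None
        for directory in search_dirs:
            candidate = Path(directory) / name
            if candidate.is_file():
                found[name] = str(candidate)
                break
    return found


if __name__ == "__main__":
    for name, location in find_on_path(REQUIRED_DLLS).items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

A pre-flight check along these lines is roughly what I would expect the extension to do before offering the -cuda models, so the failure could be reported as "cuDNN 9 is not installed" instead of a raw loader error.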