This repository was archived by the owner on Jul 4, 2025. It is now read-only.

bug: Concurrent chat doesn't work on Mac Silicon #1569

@gabrielle-ong

Description

Cortex version

1.0.1-203

Describe the Bug

Mac: concurrent chats against the same model are queued up rather than run in parallel

  • Models tested: tinyllama, llama3.2
  • Expected: opening 2 CLI windows / Postman windows gives concurrent chats (see the reproduction sketch after this list)
  • Works well with separate models (e.g. a tinyllama chat alongside a llama3.2 chat)
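A minimal sketch of the concurrency check, assuming the Cortex server is running locally and exposing its OpenAI-compatible /v1/chat/completions endpoint; the port (39281) and model name below are assumptions, so adjust to your setup. If the server queues requests, one duration will be roughly the sum of both:

```python
# Concurrency probe: fire two chat requests at the same model at once.
# Assumptions: Cortex is listening on localhost:39281 (adjust if different)
# and the model "tinyllama" is already started.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1:39281/v1/chat/completions"  # assumed default port


def chat(prompt: str) -> float:
    """Send one chat completion and return its wall-clock latency."""
    start = time.time()
    resp = requests.post(
        URL,
        json={
            "model": "tinyllama",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.time() - start


with ThreadPoolExecutor(max_workers=2) as pool:
    durations = list(pool.map(chat, ["Tell me a joke.", "Count to twenty."]))

# Parallel handling: both durations close to a single request's latency.
# Queued handling (the bug seen on macOS): one duration roughly doubles.
print(durations)
```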

This may be related to the n_parallel parameter in model.yaml.
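If the queuing does come from n_parallel, the relevant knob would sit in model.yaml roughly as below; the default value and the exact semantics shown here are assumptions based on typical cortex.llamacpp model configs, so check the generated file:

```yaml
# model.yaml (excerpt), a sketch rather than the full file.
# Assumption: n_parallel controls how many sequences the llama.cpp engine
# decodes at once; if it stays at 1, a second chat on the same model waits.
n_parallel: 2
```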

Windows, Ubuntu: Working as expected

Steps to Reproduce

No response

Screenshots / Logs

No response

What is your OS?

  • MacOS

What engine are you running?

  • cortex.llamacpp (default)

Metadata

Labels

category: model running (Inference ux, handling context/parameters, runtime)
type: bug (Something isn't working)
