
Description
Goal
- Goal: Can we have a minimalist fork of llama.cpp as llamacpp-engine?
  - cortex.cpp's desktop focus means Drogon's features are unused
  - We should contribute our vision and multimodal work upstream as a form of llama.cpp server
  - Very clear Engines abstraction (i.e. to support OpenVINO etc. in the future; see the interface sketch after this list)
- Goal: Contribute upstream to llama.cpp
  - Vision, multimodal
  - This may not be possible if the vision and audio encoders are Python-runtime based
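A rough sketch of what such an Engines abstraction could look like is below. The interface name, request struct, and method signatures are illustrative only, not an existing cortex.cpp API; the point is that each backend (llamacpp-engine, a future OpenVINO engine, etc.) would implement one common contract.

```cpp
// Hypothetical engine interface sketch: cortex.cpp would only talk to
// backends through this abstraction, keeping engines swappable.
#include <functional>
#include <string>

struct InferenceRequest {
  std::string prompt;
  int max_tokens = 512;
};

class IEngine {
 public:
  virtual ~IEngine() = default;
  // Load a model file; returns false on failure.
  virtual bool LoadModel(const std::string& model_path) = 0;
  // Stream generated tokens back through the callback.
  virtual void Infer(const InferenceRequest& req,
                     std::function<void(const std::string& token)> on_token) = 0;
  virtual void UnloadModel() = 0;
};
```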
Can we consider refactoring llamacpp-engine to use the llama.cpp server implementation, and maintaining a fork with our improvements to speech, vision, etc.? This is especially relevant if we do a C++ implementation of whisperVQ in the future.
Potential issues
Key Changes
- Use llama-server instead of Drogon, which we currently use in cortex.llamacpp
- Use a spawned llama.cpp process instead of a dylib (better stability, parallelism)
  - However, we will effectively need to build a process manager (a minimal sketch follows below)
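A minimal sketch of what spawning and supervising a llama-server child process could look like on POSIX systems is below. The binary path, model path, and port are placeholders, and while llama-server does accept flags such as -m and --port, a real process manager would also need health checks, restarts, log capture, and Windows support.

```cpp
// Minimal POSIX sketch: spawn llama-server as a child process instead of
// loading a dylib, then monitor and stop it. Paths and flags are illustrative.
#include <csignal>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

class LlamaServerProcess {
 public:
  bool Start(const std::string& binary, const std::string& model_path, int port) {
    pid_ = fork();
    if (pid_ < 0) return false;          // fork failed
    if (pid_ == 0) {                     // child: exec llama-server
      std::string port_str = std::to_string(port);
      std::vector<char*> argv = {
          const_cast<char*>(binary.c_str()),
          const_cast<char*>("-m"), const_cast<char*>(model_path.c_str()),
          const_cast<char*>("--port"), const_cast<char*>(port_str.c_str()),
          nullptr};
      execvp(argv[0], argv.data());
      _exit(127);                        // exec failed
    }
    return true;                         // parent: keep the child's pid
  }

  // Non-blocking liveness check using waitpid with WNOHANG.
  bool IsRunning() {
    if (pid_ <= 0) return false;
    int status = 0;
    return waitpid(pid_, &status, WNOHANG) == 0;
  }

  void Stop() {
    if (pid_ > 0) {
      kill(pid_, SIGTERM);               // ask the server to shut down
      waitpid(pid_, nullptr, 0);         // reap the child
      pid_ = -1;
    }
  }

 private:
  pid_t pid_ = -1;
};
```

One upside of this design is isolation: a crash in the spawned llama.cpp process would no longer take down cortex.cpp itself, at the cost of the extra supervision logic shown above.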