feat: gRPC-based backends #743

Merged
merged 12 commits into from Jul 15, 2023

mudler (Owner) commented Jul 14, 2023

Description

This PR does several things:

  • Fixes the falcon backend. It now uses https://github.com/cmp-nct/ggllm.cpp
  • Gets rid of the hacks used to work around duplicate symbols caused by libraries using different versions of ggml
    • Converts the backends to gRPC services
  • Various refactors. Merges back an old branch I had lying around to refactor and break down the packages. I couldn't get to it before because of other compilation issues that now seem to have gone away
  • Adds tests for
    • token stream
    • stablediffusion
    • tts
    • functions

Coverage is now quite good; we are only missing 1:1 tests of the backends. However, we already test openllama, rwkv and gpt4all.

Notes for Reviewers

Moving to gRPC increases code complexity but reduces maintenance overall. The hacks needed to compile everything into a single fat binary are now gone, and a backend crash no longer crashes the main process (which will attempt to recover the gRPC service automatically).
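
To illustrate the recovery behaviour described above, here is a minimal sketch of a restart loop. It is not the code from this PR: `Backend`, `Supervise`, `ErrCrashed` and the restart limit are hypothetical names chosen for illustration, and the backend is modelled as a plain function rather than a separate gRPC service process.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrCrashed is a hypothetical error used to simulate a backend crash.
var ErrCrashed = errors.New("backend crashed")

// Backend is a hypothetical stand-in for a backend running as its own service.
// It blocks until the backend exits; a non-nil error means it crashed.
type Backend func() error

// Supervise runs the backend and restarts it after each crash, up to
// maxRestarts times. It returns the number of restarts performed and nil on a
// clean exit, or the last error once the restart budget is exhausted.
func Supervise(run Backend, maxRestarts int) (int, error) {
	restarts := 0
	for {
		err := run()
		if err == nil {
			return restarts, nil
		}
		if restarts >= maxRestarts {
			return restarts, fmt.Errorf("giving up after %d restarts: %w", restarts, err)
		}
		restarts++
	}
}

func main() {
	calls := 0
	// Simulated backend: crashes on the first two runs, then exits cleanly.
	flaky := func() error {
		calls++
		if calls <= 2 {
			return ErrCrashed
		}
		return nil
	}
	restarts, err := Supervise(flaky, 5)
	fmt.Println(restarts, err) // prints "2 <nil>"
}
```

The point of the sketch is only that the supervising process observes the failure as an error value and stays alive; in a real setup the isolation comes from the backend being a separate process, which this in-process simulation does not provide.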

The downsides are that the resulting binary is bigger and starting the internal services is a bit convoluted.
The gain is notable despite the cons: we are now free to run different versions of the same backend with relative ease. We can also now support multiple requests in parallel by allocating more services per model, but that can be done in a follow-up batch.
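
The idea of allocating more services per model could look roughly like the pool below. This is a sketch under assumptions, not something this PR implements: `Service`, `Pool`, `NewPool`, `Do` and the addresses are hypothetical, and each service instance is assumed to handle one request at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a hypothetical handle to one backend service instance,
// identified here only by its address.
type Service struct{ Addr string }

// Pool holds the idle service instances allocated for a single model.
type Pool struct{ idle chan *Service }

// NewPool creates a pool with one instance per address.
func NewPool(addrs []string) *Pool {
	p := &Pool{idle: make(chan *Service, len(addrs))}
	for _, a := range addrs {
		p.idle <- &Service{Addr: a}
	}
	return p
}

// Do blocks until an instance is free, runs fn against it, and then
// returns the instance to the pool.
func (p *Pool) Do(fn func(*Service)) {
	s := <-p.idle
	defer func() { p.idle <- s }()
	fn(s)
}

func main() {
	// Two instances for the same model: at most two requests run at once.
	pool := NewPool([]string{"127.0.0.1:9001", "127.0.0.1:9002"})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			pool.Do(func(s *Service) {
				fmt.Printf("request %d served by %s\n", id, s.Addr)
			})
		}(i)
	}
	wg.Wait()
}
```

A buffered channel is used as the pool so that the number of allocated instances directly bounds the number of requests in flight for that model.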

Signed commits

  • Yes, I signed my commits.

@mudler mudler marked this pull request as draft July 14, 2023 20:32
@mudler mudler linked an issue Jul 14, 2023 that may be closed by this pull request
@mudler mudler force-pushed the grpc branch 2 times, most recently from a68c474 to dcc3a90 Compare July 14, 2023 20:48
@mudler mudler changed the title [wip] grpc [wip] feat: gRPC-based backends Jul 14, 2023
mudler added 12 commits July 15, 2023 01:19
Signed-off-by: Ettore Di Giacinto <mudler@localai.io> (on each of the 12 commits)
This finally makes everything more consistent
Previously the libs were added by other deps that made the linker add
those as well (by chance).
@mudler mudler changed the title [wip] feat: gRPC-based backends feat: gRPC-based backends Jul 15, 2023
@mudler mudler marked this pull request as ready for review July 15, 2023 07:43
mudler (Owner, Author) commented Jul 15, 2023

Been playing with this here. Going to merge this, run a few rounds of tests, and fix things on master with follow-ups if necessary.

@mudler mudler merged commit e3cabb5 into master Jul 15, 2023
14 checks passed
@mudler mudler deleted the grpc branch July 15, 2023 07:50
@mudler mudler added the enhancement New feature or request label Jul 16, 2023
tmc commented Jul 18, 2023

@mudler This is a great change, but I don't see a good example of how to use the new gRPC-based models. Can you point me in the right direction to run falcon7b via gRPC?

Successfully merging this pull request may close this issue: Issue regarding falcon-7b quantized