
Usage Features #863

Merged
merged 31 commits into from
Aug 18, 2023
Conversation

@dave-gray101 (Collaborator) commented Aug 3, 2023

"The Usage Feature PR"

This PR implements a few semi-related features I've been working on - while they aren't all linked together exactly, they lay the groundwork for some of the multi-node, multi-model scaling work I'm interested in continuing to prototype. List of features (as best I remember):

  • Plumbing for passing Token Usage back from generation methods
    • Counts tokens consumed in the response via existing streaming infra
    • For prompts, it's set up to use the new TokenizeString method on the go-llama backend, falling back to 0 for the backends that don't implement it yet (Soon!)
      • Bumps go-llama to f03869d188b72c8a617bea3a36cf8eb43f73445c in the Makefile
    • Works on /edit, /completion and /chat (streaming and 'regular')
      • I'm not really sure that OpenAI actually puts a Usage object on their streaming responses... but it seems useful, and we've so far been willing to extend the spec for flexibility.
  • New grpc function / related endpoint: /backend/monitor - returns memory usage and status for a backend.
    • Base golang grpc server returns process-level memory details, backends can enhance with additional consumption breakdowns.
    • Uses the BSD-licensed gopsutil, so it should work cross-platform.
    • Per-backend enhancement is very much a future-scope goal, but not yet implemented.
    • Not implemented on the Python side - but I repurposed an old pre-gRPC test sampler for cases where that fails. Admittedly it's untested in its current configuration and quite limited, but it's better than nothing.
  • New startup option: --preload-backend-only : skips starting the LocalAI server and only starts the preloaded gRPC backends.
  • Responsibility for "backend locking" moved from the API to the backends themselves. It's handled by the base golang gRPC server, while external / Python backends are not locked until we know we need to.
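The token-usage plumbing above ultimately surfaces an OpenAI-style usage object on responses. A rough sketch of that shape (field names follow the public OpenAI API wire format, not necessarily this PR's internal structs):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenUsage mirrors the OpenAI-style "usage" object attached to
// completion responses. Illustrative only; the PR's own types may differ.
type TokenUsage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

func main() {
	// Prompt tokens come from TokenizeString where the backend supports
	// it (falling back to 0); completion tokens are counted as the
	// streaming infra emits them.
	u := TokenUsage{PromptTokens: 12, CompletionTokens: 34}
	u.TotalTokens = u.PromptTokens + u.CompletionTokens
	b, _ := json.Marshal(u)
	fmt.Println(string(b))
	// → {"prompt_tokens":12,"completion_tokens":34,"total_tokens":46}
}
```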
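The /backend/monitor endpoint's process-level memory report can be sketched with a stdlib-only stand-in. The PR itself uses gopsutil for cross-platform process stats; MemoryStatus and currentMemory here are illustrative names, not the PR's actual types:

```go
package main

import (
	"fmt"
	"runtime"
)

// MemoryStatus is a minimal stand-in for the per-backend memory report;
// backends could later enhance it with finer-grained breakdowns.
type MemoryStatus struct {
	HeapAllocBytes uint64 // bytes of allocated heap objects
	SysBytes       uint64 // total bytes obtained from the OS
}

// currentMemory reports the Go runtime's own view of process memory.
// The real implementation uses gopsutil for OS-level, cross-platform stats.
func currentMemory() MemoryStatus {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return MemoryStatus{HeapAllocBytes: m.HeapAlloc, SysBytes: m.Sys}
}

func main() {
	s := currentMemory()
	fmt.Printf("heap=%dB sys=%dB\n", s.HeapAllocBytes, s.SysBytes)
}
```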
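Moving "backend locking" into the backend itself roughly amounts to each backend guarding its own inference path with a mutex, instead of the API layer serializing calls. A minimal sketch, with illustrative names rather than the PR's actual types:

```go
package main

import (
	"fmt"
	"sync"
)

// Backend owns its own lock, so concurrent API requests against the
// same backend are serialized by the backend rather than by the API.
type Backend struct {
	mu   sync.Mutex
	name string
}

// Predict holds the backend's lock for the duration of a request.
func (b *Backend) Predict(prompt string) string {
	b.mu.Lock() // one request at a time per backend
	defer b.mu.Unlock()
	return "echo: " + prompt
}

func main() {
	b := &Backend{name: "llama"}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(b.Predict(fmt.Sprintf("req-%d", i)))
		}(i)
	}
	wg.Wait()
}
```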

@mudler (Owner) commented Aug 4, 2023

That's looking good! Thanks again @dave-gray101 for your great contributions!

I'll have a look at the details later, but at a quick glance it looks OK to me. Just a style note: maybe let's call it feature flags instead of Chicken? Maybe it's more idiomatic.

Re remote gRPC: this is quite possible already (and I use it for debugging), so I'm leaving a note here for those who are interested in running remote gRPC backends in LocalAI.

LocalAI has the --external-grpc-backends parameter in the CLI, which can be used to specify either a local backend (a file) or a remote URL:
https://github.com/go-skynet/LocalAI/blob/4aa5dac768a5255667097db1c043581196fa46b2/pkg/model/initializers.go#L144

So, for instance, both specifying a remote URL and specifying a file work, and either makes the backend managed by LocalAI:

./local-ai --debug --threads 14 --image-path tmp-imgr --external-grpc-backends "huggingface.py:/home/mudler/_git/LocalAI/extra/grpc/huggingface/huggingface.py"
./local-ai --debug --threads 14 --image-path tmp-imgr --external-grpc-backends "my-awesome-backend:host:port"

To try it out you can, for instance, run make build, then run one of the gRPC servers in backend-assets/grpc and use the printed URL directly there - to then use the backend, you need to specify the backend name in the model config file.
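For reference, a minimal model config sketch along those lines - the field names follow LocalAI's model YAML, while the model name and weights path are hypothetical, and the backend value is the external backend name registered above ("my-awesome-backend"):

```yaml
# Hypothetical model config: `backend` selects the external gRPC backend
# registered via --external-grpc-backends "my-awesome-backend:host:port".
name: my-model
backend: my-awesome-backend
parameters:
  model: /path/to/weights.bin  # illustrative path
```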

Back to your open question: to support remote backends, I guess we can later make the backend report metrics via gRPC, and stop assuming that we are getting stats from the main process.

@mudler mudler added the enhancement New feature or request label Aug 7, 2023
@dave-gray101 dave-gray101 marked this pull request as ready for review August 17, 2023 06:33
@dave-gray101 dave-gray101 changed the title WIP: Usage Features Usage Features Aug 17, 2023
@dave-gray101 dave-gray101 enabled auto-merge (squash) August 17, 2023 08:01
Review comments were left on: Makefile, api/backend/embeddings.go, api/backend/image.go, api/backend/llm.go, api/backend/lock.go, pkg/grpc/interface.go (all resolved).
@mudler (Owner) commented Aug 17, 2023

This is looking good overall! Just a few nits - they shouldn't be blockers, as they could be follow-ups. The real one is testing this on a GPU, since it supersedes #896, which I was holding off merging for the same reason (manual QA).

@dave-gray101 would you mind me merging #906 in the meantime? It might conflict with your PR, but it's really just about running make protobuf again and committing the results.

If you can't test on GPU, I'll give it a shot - if not today, then tomorrow at the latest.

@dave-gray101 (Collaborator, Author)

> this is looking good overall! just few nits, that should not make it a blocker as could be just follow-ups, the real one is testing this on a GPU as it supersedes #896 that I was holding off to merge for the same reason (manual QA)
>
> @dave-gray101 would you mind me merging meanwhile #906? It might conflict with your PR, but actually is just about running make protobuf again and committing the results
>
> If you can't test on GPU I'll give it a shot if not today, tomorrow at max

Hey @mudler ! From my testing, CUDA seems to work fine with this branch. Granted... my test criteria consist of "normal output comes out, and the process manager shows GPU load".

@dave-gray101 dave-gray101 merged commit 8cb1061 into mudler:master Aug 18, 2023
8 checks passed
@dave-gray101 dave-gray101 deleted the feat-psutil-usage branch February 21, 2024 02:18