Usage Features #863
Conversation
…ly that is broken
That's looking good! Thanks again @dave-gray101 for your great contributions! I'll have a look at the details later, but on a quick glance it looks OK to me. Just a style note: maybe let's call it feature flags instead of Chicken? It's probably more idiomatic. Re remote backends: LocalAI has the […]. So, for instance, both specifying a remote URL and specifying a file work, and the backend is managed by LocalAI:
To try it out, you can, for instance, after running […]. Back to your open question: to support remote backends, I guess later we could make the backend report metrics via gRPC, and stop assuming that we are getting stats from the main process.
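For context, registering a backend either by remote address or by local path presumably looks something like the following. This is a hedged sketch: the `--external-grpc-backends` flag and the backend names here are assumptions for illustration, not verified against this PR.

```shell
# Register an external gRPC backend running on a remote host
# (assumed "name:address" syntax)
local-ai --external-grpc-backends "remote-llama:192.168.1.10:50051"

# Or point LocalAI at a local backend script, in which case
# LocalAI itself manages the backend process
local-ai --external-grpc-backends "my-backend:/usr/share/local-ai/backends/my-backend.py"
```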
…tokens are ints, not strings, so fix this
… a hack, but worth testing if it makes horizontal scaling simpler and easier
This is looking good overall! Just a few nits that shouldn't be blockers, as they could be follow-ups. The real one is testing this on a GPU, as it supersedes #896, which I was holding off merging for the same reason (manual QA). @dave-gray101, would you mind me merging #906 in the meantime? It might conflict with your PR, but it is actually just about running […]. If you can't test on GPU, I'll give it a shot, if not today then tomorrow at the latest.
Hey @mudler! From my testing, CUDA seems to work fine with this branch. Granted... my test criteria consist of "normal output comes out, and the process manager shows GPU load".
"The Usage Feature PR"
This PR implements a few semi-related features I've been working on. While they aren't all linked together exactly, they lay the groundwork for some of the multi-node, multi-model scaling work I'm interested in continuing to prototype. List of features (as best I remember):
- `/backend/monitor` — returns memory usage and status for a backend.
- `--preload-backend-only` — used to skip starting up the LocalAI server, and only start the preloaded gRPC backends.
- Backend locking moved from the `api` to the backends themselves — handled by the base golang gRPC server, but external / python backends are not locked until we know we need to.