
OLLAMA_KEEP_ALIVE ENV feature #2508

Closed
uxfion opened this issue Feb 15, 2024 · 2 comments · Fixed by #3094
Labels
feature request New feature or request

Comments

@uxfion

uxfion commented Feb 15, 2024

Does anyone know how to set keep_alive when using the OpenAI-compatible API? It seems this parameter is not supported there.

It would be better if we could set OLLAMA_KEEP_ALIVE as an environment variable, since the /v1/chat/completions endpoint makes it difficult to pass custom parameters.

#2146 (comment)
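Until something like this lands server-side, one client-side workaround is to read the variable yourself and inject it into requests sent to the native /api/generate endpoint (which does accept keep_alive per request). A minimal sketch; the `with_keep_alive` helper name is mine, not part of Ollama:

```python
import os


def with_keep_alive(payload: dict) -> dict:
    """Return a copy of an /api/generate payload with keep_alive injected
    from the OLLAMA_KEEP_ALIVE environment variable, if it is set."""
    keep_alive = os.environ.get("OLLAMA_KEEP_ALIVE")
    if keep_alive is None:
        return payload
    return {**payload, "keep_alive": keep_alive}
```

With OLLAMA_KEEP_ALIVE=24h exported, with_keep_alive({"model": "llama2"}) yields {"model": "llama2", "keep_alive": "24h"}; without it, the payload passes through unchanged.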

@jukofyork

Not sure if it helps, but I've been keeping the model alive by sending this every 4.5 minutes:

If an empty prompt is provided, the model will be loaded into memory.

```shell
curl http://localhost:11434/api/generate -d '{
  "model": "llama2"
}'
```

From: https://github.com/ollama/ollama/blob/main/docs/api.md

@uxfion

uxfion commented Feb 16, 2024

I also wrote a script to keep it alive, but it's still a bit clumsy. We really need an intelligent scheduling system.

```python
import requests
import time
from datetime import datetime
import argparse


def get_current_time_str():
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def call_api(model):
    url = "http://127.0.0.1:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    payload = {"model": model, "keep_alive": "-3m"}

    start_time = datetime.now()
    try:
        print(f"\n\n[{start_time}] Trying to call the API...")
        response = requests.post(url, json=payload, headers=headers)
        duration = (datetime.now() - start_time).total_seconds()

        current_time = get_current_time_str()
        if response.status_code == 200:
            print(f"[{current_time}] API call successful. Duration: {duration} seconds")
            print(response.text)
        else:
            print(
                f"[{current_time}] API call failed with status code: {response.status_code}. Duration: {duration} seconds"
            )
    except Exception as e:
        # Measure how long the failed attempt took before reporting it.
        duration = (datetime.now() - start_time).total_seconds()
        current_time = get_current_time_str()
        print(f"[{current_time}] An error occurred: {e}. Duration: {duration} seconds")


def main():
    parser = argparse.ArgumentParser(description="Call API with a model parameter")
    parser.add_argument("model", type=str, help="Model name to call API with")
    args = parser.parse_args()

    interval = 270  # 4 minutes and 30 seconds, in seconds
    while True:
        call_api(args.model)
        time.sleep(interval)


if __name__ == "__main__":
    main()
```

Run it with: python keep_alive.py llama2
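For reference, the hard-coded interval = 270 above is "4m30s" in the Go-style duration syntax that keep_alive strings appear to use (e.g. "5m", "24h", or a bare number of seconds, with -1 meaning keep loaded indefinitely). A hypothetical helper, not part of the script above, for turning such a string into a sleep interval:

```python
import re


def duration_seconds(s: str) -> float:
    """Convert a Go-style duration string ("4m30s", "1.5h", "300s")
    into seconds. Bare numbers ("300", "-1") pass through unchanged."""
    try:
        return float(s)  # plain seconds, including -1 (keep forever)
    except ValueError:
        pass
    units = {"h": 3600.0, "m": 60.0, "s": 1.0}
    parts = re.findall(r"(\d+(?:\.\d+)?)([hms])", s)
    if not parts:
        raise ValueError(f"unrecognized duration: {s!r}")
    return sum(float(value) * units[unit] for value, unit in parts)
```

For example, duration_seconds("4m30s") gives 270.0, matching the script's interval.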
