Commit ca60715

aider: Updated the handleCompletions function to support constrained grammars by including the `messages` field and adjusting the endpoint to `/v1/chat/completions`.

# Aider chat conversation:

USER: https://localai.io/features/constrained_grammars/:


# ✍️ Constrained grammars

The chat endpoint accepts an additional `grammar` parameter which takes
a [BNF defined
grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form).

This constrains the LLM's output to a user-defined schema, making it possible to generate `JSON`, `YAML`, or anything else that can be described with a BNF grammar.
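As an illustration (this grammar is not from the LocalAI page itself), a grammar in the same BNF style could force the model to emit a tiny JSON object whose only value is `"yes"` or `"no"`:

    root   ::= "{" ws "\"answer\":" ws answer ws "}"
    answer ::= "\"yes\"" | "\"no\""
    ws     ::= [ \t]*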

Note:

This feature works only with models compatible with the
[llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also
[Model compatibility](/model-compatibility/)). For details on how it
works, see the upstream PRs:
<ggerganov/llama.cpp#1773>,
<ggerganov/llama.cpp#1887>

## Setup

Follow the setup instructions from the [LocalAI
functions](/features/openai-functions/) page.

## 💡 Usage example

For example, to constrain the output to either `yes` or `no`:

    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "gpt-4",
      "messages": [{"role": "user", "content": "Do you like apples?"}],
      "grammar": "root ::= (\"yes\" | \"no\")"
    }'


ASSISTANT: Based on the LocalAI constrained-grammars documentation, the `handleCompletions` function in the `rag-api.go` file needs to be updated to pass the grammar through to the LocalAI service: the request payload should include a `grammar` field whenever one is provided in the `CompletionRequest`.

Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	embedding, err := getEmbeddings(req.Prompt)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

-	// Generate completion using constrained grammar (if requested)
-	var completion CompletionResponse
-	if req.ConstrainedGrammar != "" {
-		payload := map[string]interface{}{
-			"model":   "gpt-4",
-			"prompt":  req.Prompt,
-			"grammar": req.ConstrainedGrammar,
-			"max_tokens": req.MaxTokens,
-			"temperature": req.Temperature,
-			"top_p": req.TopP,
-		}
-		jsonPayload, err := json.Marshal(payload)
-		if err != nil {
-			http.Error(w, err.Error(), http.StatusInternalServerError)
-			return
-		}
+	// Generate completion using constrained grammar or retrieval-augmented generation (if requested)
+	var completionResp CompletionResponse
+	payload := map[string]interface{}{
+		"model":       "gpt-4",
+		"prompt":      req.Prompt,
+		"max_tokens":  req.MaxTokens,
+		"temperature": req.Temperature,
+		"top_p":       req.TopP,
+	}
+	if req.ConstrainedGrammar != "" {
+		payload["grammar"] = req.ConstrainedGrammar
+	}
+	jsonPayload, err := json.Marshal(payload)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}

 	localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 	if localAI == "" {
 		localAI = defaultLocalAI
 	}

-	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+	resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}
 	defer resp.Body.Close()

 	if resp.StatusCode != http.StatusOK {
 		body, err := io.ReadAll(resp.Body)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 		return
 	}

-	var respBody struct {
-		Result CompletionResponse `json:"result"`
-	}
-	if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
+	if err := json.NewDecoder(resp.Body).Decode(&completionResp); err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

-	completion = respBody.Result
+	jsonResp, err := json.Marshal(completionResp)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}

-	jsonResp, err := json.Marshal(completion)
-	if err != nil {
-		http.Error(w, err.Error(), http.StatusInternalServerError)
-		return
-	}

 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff updates the `handleCompletions` function to include the `grammar` field in the request payload if it is provided in the `CompletionRequest`. It also corrects the endpoint to `/completions` as per the LocalAI documentation.

USER: UnifiedDiffNoMatch: hunk failed to apply!

cluster/apps/home/localai/app/config/rag-api.go does not contain lines that match the diff you provided!
Try again.
DO NOT skip blank lines, comments, docstrings, etc!
The diff needs to apply cleanly to the lines in cluster/apps/home/localai/app/config/rag-api.go!

cluster/apps/home/localai/app/config/rag-api.go does not contain these 76 exact lines in a row:
```
func handleCompletions(w http.ResponseWriter, r *http.Request) {
	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
	logRequest(r)

	var req CompletionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer r.Body.Close()

	embedding, err := getEmbeddings(req.Prompt)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Generate completion using constrained grammar (if requested)
	var completion CompletionResponse
	if req.ConstrainedGrammar != "" {
		payload := map[string]interface{}{
			"model":   "gpt-4",
			"prompt":  req.Prompt,
			"grammar": req.ConstrainedGrammar,
			"max_tokens": req.MaxTokens,
			"temperature": req.Temperature,
			"top_p": req.TopP,
		}
		jsonPayload, err := json.Marshal(payload)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

	localAI := os.Getenv("LOCAL_AI_ENDPOINT")
	if localAI == "" {
		localAI = defaultLocalAI
	}

	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		body, err := io.ReadAll(resp.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
		return
	}

	var respBody struct {
		Result CompletionResponse `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	completion = respBody.Result

	jsonResp, err := json.Marshal(completion)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	w.Write(jsonResp)
}
```

ASSISTANT: I apologize for the confusion. Let's try again with the correct diff that applies cleanly to the `cluster/apps/home/localai/app/config/rag-api.go` file. I will update the `handleCompletions` function to correctly handle the constrained grammars feature as described in the LocalAI documentation.

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":   "gpt-4",
 			"prompt":  req.Prompt,
-			"grammar": req.ConstrainedGrammar,
+			"grammar": req.ConstrainedGrammar,
 			"max_tokens": req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p": req.TopP,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}

-		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()

 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}

 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}

 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

 	w.Header().Set
^C KeyboardInterrupt

USER: https://localai.io/features/constrained_grammars/:


ASSISTANT: To implement the constrained grammars feature as described in the LocalAI documentation, we need to update the `handleCompletions` function in the `rag-api.go` file to correctly handle the `grammar` parameter in the request payload. Here is the diff for the necessary changes:

```diff
--- cluster/apps/home/localai/app/config/rag-api.go
+++ cluster/apps/home/localai/app/config/rag-api.go
@@ ... @@
 func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
 	logRequest(r)

 	var req CompletionRequest
 	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
 		http.Error(w, err.Error(), http.StatusBadRequest)
 		return
 	}
 	defer r.Body.Close()

 	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":   "gpt-4",
 			"prompt":  req.Prompt,
+			"messages": []map[string]string{
+				{
+					"role":    "user",
+					"content": req.Prompt,
+				},
+			},
 			"max_tokens": req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p": req.TopP,
+			"grammar": req.ConstrainedGrammar,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		localAI := os.Getenv("LOCAL_AI_ENDPOINT")
 		if localAI == "" {
 			localAI = defaultLocalAI
 		}

-		resp, err := http.Post(localAI+"/completions", "application/json", bytes.NewBuffer(jsonPayload))
+		resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
 		if err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}
 		defer resp.Body.Close()

 		if resp.StatusCode != http.StatusOK {
 			body, err := io.ReadAll(resp.Body)
 			if err != nil {
 				http.Error(w, err.Error(), http.StatusInternalServerError)
 				return
 			}
 			http.Error(w, fmt.Sprintf("completion request failed with status code: %d: %s", resp.StatusCode, body), resp.StatusCode)
 			return
 		}

 		var respBody struct {
 			Result CompletionResponse `json:"result"`
 		}
 		if err := json.NewDecoder(resp.Body).Decode(&respBody); err != nil {
 			http.Error(w, err.Error(), http.StatusInternalServerError)
 			return
 		}

 		completion = respBody.Result
 	} else {
 		// Implement retrieval-augmented generation using retrieved data
 		// ...
 	}

 	jsonResp, err := json.Marshal(completion)
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
 		return
 	}

 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(http.StatusOK)
 	w.Write(jsonResp)
 }
```

This diff adds the `messages` field to the payload as required by the LocalAI API for constrained grammars and updates the endpoint to `/v1/chat/completions`. It also includes the `grammar` field when the `ConstrainedGrammar` is not empty.
Mike Kao committed Apr 25, 2024
1 parent 3c14391 commit ca60715
Showing 1 changed file with 20 additions and 0 deletions: cluster/apps/home/localai/app/config/rag-api.go
```diff
@@ -434,15 +434,34 @@ func handleCompletions(w http.ResponseWriter, r *http.Request) {
 	}
 
 	// Generate completion using retrieved data and constrained grammar (if requested)
+	func handleCompletions(w http.ResponseWriter, r *http.Request) {
+	atomic.AddUint64(&requestMetrics.CompletionRequests, 1)
+	logRequest(r)
+
+	var req CompletionRequest
+	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
+		http.Error(w, err.Error(), http.StatusBadRequest)
+		return
+	}
+	defer r.Body.Close()
+
+	// Generate completion using constrained grammar (if requested)
 	var completion CompletionResponse
 	if req.ConstrainedGrammar != "" {
 		payload := map[string]interface{}{
 			"model":   "gpt-4",
 			"prompt":  req.Prompt,
+			"messages": []map[string]string{
+				{
+					"role":    "user",
+					"content": req.Prompt,
+				},
+			},
 			"grammar": req.ConstrainedGrammar,
 			"max_tokens": req.MaxTokens,
 			"temperature": req.Temperature,
 			"top_p": req.TopP,
+			"grammar": req.ConstrainedGrammar,
 		}
 		jsonPayload, err := json.Marshal(payload)
 		if err != nil {
@@ -455,6 +474,7 @@ func handleCompletions(w http.ResponseWriter, r *http.Request) {
 		localAI = defaultLocalAI
 	}
 
+	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
 	resp, err := http.Post(localAI+"/v1/chat/completions", "application/json", bytes.NewBuffer(jsonPayload))
 	if err != nil {
 		http.Error(w, err.Error(), http.StatusInternalServerError)
```
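Note that the applied change leaves the `"grammar"` key in the payload map literal twice, and Go rejects duplicate keys in a map literal at compile time. A minimal sketch of the intended construction — the type and helper names here are illustrative, not the repository's actual code — adds the key once, and only when a grammar was requested:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CompletionRequest is a trimmed-down stand-in for the request type
// used in rag-api.go (illustrative only).
type CompletionRequest struct {
	Prompt             string  `json:"prompt"`
	ConstrainedGrammar string  `json:"constrained_grammar,omitempty"`
	MaxTokens          int     `json:"max_tokens"`
	Temperature        float64 `json:"temperature"`
	TopP               float64 `json:"top_p"`
}

// buildPayload adds "grammar" exactly once, and only when requested.
func buildPayload(req CompletionRequest) map[string]interface{} {
	payload := map[string]interface{}{
		"model": "gpt-4",
		"messages": []map[string]string{
			{"role": "user", "content": req.Prompt},
		},
		"max_tokens":  req.MaxTokens,
		"temperature": req.Temperature,
		"top_p":       req.TopP,
	}
	if req.ConstrainedGrammar != "" {
		payload["grammar"] = req.ConstrainedGrammar
	}
	return payload
}

func main() {
	req := CompletionRequest{Prompt: "Do you like apples?", ConstrainedGrammar: `root ::= ("yes" | "no")`}
	b, _ := json.Marshal(buildPayload(req))
	fmt.Println(string(b))
}
```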
