
frequency_penalty and presence_penalty from curl request discarded #1051

Closed
flotos opened this issue Sep 14, 2023 · 3 comments · Fixed by #1817
Labels: bug Something isn't working

flotos commented Sep 14, 2023

LocalAI version:

1.25.0

Environment, CPU architecture, OS, and Version:

Linux REDACTED 4.18.0-147.5.1.6.h541.eulerosv2r9.x86_64 #1 SMP Wed Aug 4 02:30:13 UTC 2021 x86_64 GNU/Linux

Describe the bug

I pass frequency_penalty or presence_penalty in the body of my /chat/completions request, but they are ignored. If the parameter is unset in the model's .yaml it is set to 0; otherwise only the .yaml value is used. Tested on Airoboros 70b.
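For reference, pinning a value in the model's .yaml looks roughly like this (a minimal sketch; the exact placement of the field under parameters is an assumption based on the prediction options visible in the logs below):

name: airoboros-70b
backend: llama-stable
parameters:
  model: airoboros-l2-70b-2.1.ggmlv3.Q4_0.bin
  frequency_penalty: 1  # when set here, this value always wins; when unset, 0 is used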

For this test I set both to 1 in the request body:

The full "configuration read" log:

9:16AM DBG Configuration read: &{PredictionOptions:{Model:airoboros-l2-70b-2.1.ggmlv3.Q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:1.5 Maxtokens:10 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:airoboros-70b F16:true Threads:16 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:true Backend:llama-stable TemplateConfig:{Chat:airoboros-chat ChatMessage: Completion:airoboros-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:8 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}

Parameters (note that FrequencyPenalty stays at 0 both here and in the configuration above, even though the request set it to 1, and no presence-penalty field appears at all):

9:16AM DBG Parameters: &{PredictionOptions:{Model:airoboros-l2-70b-2.1.ggmlv3.Q4_0.bin Language: N:0 TopP:0.7 TopK:80 Temperature:1.5 Maxtokens:10 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:airoboros-70b F16:true Threads:16 Debug:true Roles:map[assistant:ASSISTANT: system:SYSTEM: user:USER:] Embeddings:true Backend:llama-stable TemplateConfig:{Chat:airoboros-chat ChatMessage: Completion:airoboros-completion Edit: Functions:} PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:8 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:2048 NUMA:false LoraAdapter: LoraBase: NoMulMatQ:false} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{PipelineType: SchedulerType: CUDA:false EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0}}

To Reproduce

Set up a model and try to change presence_penalty using body parameters, as in the sketch below.
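A request along these lines reproduces it (the host and port assume a default local install; the model name matches the config above):

curl http://localhost:8080/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "airoboros-70b",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 10,
        "frequency_penalty": 1,
        "presence_penalty": 1
      }'

Whatever values are sent here, the debug logs keep reporting FrequencyPenalty:0.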

Expected behavior

The penalty values supplied in the request body should override the model .yaml defaults.

Logs

Additional context

flotos added the bug label Sep 14, 2023

flotos (Author) commented Sep 14, 2023

The same goes for top_k, while top_p works properly.

joshuaipwork commented

I can second that this is happening. The llama backend does load frequency_penalty from the config, but not from the request.

blob42 (Contributor) commented Mar 8, 2024

Same issue for me as well. Is there any way I could chip in to fix this?

mudler pushed a commit that referenced this issue Mar 17, 2024
…1817)

* fix request debugging, disable marshalling of context fields

Signed-off-by: blob42 <contact@blob42.xyz>

* merge frequency_penalty request param with config

Signed-off-by: blob42 <contact@blob42.xyz>

* openai: add presence_penalty parameter

Signed-off-by: blob42 <contact@blob42.xyz>

---------

Signed-off-by: blob42 <contact@blob42.xyz>
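
The gist of the fix, per the commit bullets above, is to merge request parameters over the config defaults instead of discarding them. A minimal Go sketch of that pattern (hypothetical type and field names, not LocalAI's actual code):

package main

import "fmt"

// Config holds the defaults read from the model's .yaml.
type Config struct {
	FrequencyPenalty float64
	PresencePenalty  float64
}

// Request uses pointers so an absent field can be told apart from an explicit 0.
type Request struct {
	FrequencyPenalty *float64 `json:"frequency_penalty"`
	PresencePenalty  *float64 `json:"presence_penalty"`
}

// mergeRequest overrides config defaults with any values the request supplied.
func mergeRequest(cfg *Config, req *Request) {
	if req.FrequencyPenalty != nil {
		cfg.FrequencyPenalty = *req.FrequencyPenalty
	}
	if req.PresencePenalty != nil {
		cfg.PresencePenalty = *req.PresencePenalty
	}
}

func main() {
	cfg := Config{} // both penalties default to 0, as in the logs above
	one := 1.0
	mergeRequest(&cfg, &Request{FrequencyPenalty: &one, PresencePenalty: &one})
	fmt.Printf("%+v\n", cfg) // prints {FrequencyPenalty:1 PresencePenalty:1}
}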