The extra-generations retry logic in the Predict module, here:
https://github.com/stanfordnlp/dspy/blob/main/dsp/primitives/predict.py#L96
halves the model's max_tokens value in the kwargs and tries again. I believe this is supposed to be a temporary halving: it reads the value from the global settings via dsp.settings.lm.kwargs["max_tokens"] (which lives on the lm model object) and passes the halved value in as kwargs for that one generation only.
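From memory, the intended pattern looks roughly like this (a runnable paraphrase, not the verbatim source; `lm_kwargs` stands in for dsp.settings.lm.kwargs, and I believe the halving is floored at 75, which is where the limit described below comes from):

```python
# Sketch of the per-call halving in predict.py (illustrative, not verbatim).
def retry_kwargs(lm_kwargs: dict, **kwargs) -> dict:
    max_tokens = kwargs.get("max_tokens", lm_kwargs["max_tokens"])
    # Halve for this retry only, never dropping below 75.
    max_tokens = min(max(75, max_tokens // 2), max_tokens)
    # The halved value goes into a *new* dict for this one generation;
    # lm_kwargs itself is never written to here.
    return {**kwargs, "max_tokens": max_tokens, "n": 1, "temperature": 0.0}

print(retry_kwargs({"max_tokens": 4096}))
# {'max_tokens': 2048, 'n': 1, 'temperature': 0.0}
```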
However, this halving is made permanent by code in the AWS models module, which takes that lm.kwargs dictionary as a reference off self and writes the halved max_tokens back into it. Every subsequent generation then starts from the already-halved value, so after enough runs every generation bottoms out at the floor of max_tokens=75.
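Here is a minimal, dspy-independent repro of that aliasing effect (all names are illustrative):

```python
# Shared kwargs dict, analogous to dsp.settings.lm.kwargs.
lm_kwargs = {"max_tokens": 4096}

def call_model(**call_kwargs):
    # Analogous to the AWS model code: it grabs the shared dict by
    # reference and writes the per-call max_tokens back into it.
    query_kwargs = lm_kwargs  # reference, not a copy
    query_kwargs["max_tokens"] = call_kwargs["max_tokens"]

for step in range(8):
    # Analogous to the retry in predict.py: halve, floored at 75.
    halved = max(75, lm_kwargs["max_tokens"] // 2)
    call_model(max_tokens=halved)
    print(step, lm_kwargs["max_tokens"])
# Prints 2048, 1024, 512, 256, 128, 75, 75, 75 — the value decays
# permanently instead of resetting after each retry.
```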
I believe the code that writes the halved max_tokens back onto the lm model is here:
https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/aws_models.py#L221-L223
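A possible fix, sketched under the assumption that the AWS model builds its per-call kwargs from self.kwargs (class and method names here are hypothetical, not the actual aws_models.py code), is to merge into a fresh dict instead of mutating the shared one:

```python
# Hypothetical fix sketch; names are illustrative.
class AWSModelSketch:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # in dspy this is shared with dsp.settings.lm.kwargs

    def _call_body(self, **call_kwargs):
        # Buggy pattern (paraphrased): query_kwargs = self.kwargs, then
        # query_kwargs["max_tokens"] = call_kwargs["max_tokens"].
        # Fixed pattern: merge into a fresh dict, leaving self.kwargs intact.
        query_kwargs = {**self.kwargs, **call_kwargs}
        return query_kwargs

lm = AWSModelSketch(max_tokens=4096)
lm._call_body(max_tokens=2048)
print(lm.kwargs["max_tokens"])  # still 4096
```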