fix(metrics): wrong usage of prompt #422
Merged
background
Here is the running time of answer correctness. It's clear that the JSON loading process takes too much time.
dive into the problem
The long JSON loading time indicates that the model often outputs malformed JSON. This is strange, since I am using GPT-4 and the model should be capable enough to produce valid JSON output.
So I checked the model output and found that the model was misled by the prompt examples.
Before this fix, the examples were all strings. When `json.dumps()` is applied to them in `prompt.format()`, the resulting prompt contains many escape characters, which encourages the model to output escape characters too. When we then try to recover the JSON object, these escape characters easily make `json.loads()` fail.
solution
To avoid this issue, when writing prompt examples we should use the original JSON data (such as dictionaries and lists) in the output section as much as possible, rather than writing it as strings wrapped in triple quotes (`"""`).
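The difference can be sketched as follows (the template and example values here are hypothetical, for illustration only; the actual prompts live in the metric's source):

```python
import json

# Hypothetical prompt template, for illustration only.
template = 'Answer in JSON.\nExample output: {example}'

# Before: the example was written as a string, then serialized again.
example_as_string = '{"verdict": 1, "reason": "the answer is correct"}'
before = template.format(example=json.dumps(example_as_string))
# json.dumps() of a string wraps it in quotes and escapes every inner quote,
# so the prompt shows the model output full of backslashes.

# A model imitating that style emits escaped quotes, which json.loads()
# cannot parse as an object:
imitated = '{\\"verdict\\": 1}'
try:
    json.loads(imitated)
except json.JSONDecodeError:
    print("escaped output fails to parse")

# After: keep the example as native Python data and serialize it once.
example_as_dict = {"verdict": 1, "reason": "the answer is correct"}
after = template.format(example=json.dumps(example_as_dict))
# The prompt now shows clean JSON, and a faithful imitation round-trips:
assert json.loads(json.dumps(example_as_dict)) == example_as_dict
```

Serializing a dict once produces clean, unescaped JSON in the prompt, so the model's imitation parses directly.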
Here is the running time after the fix.