fix(metrics): wrong usage of prompt #422
Merged
background
Here is the running time of answer correctness. It's clear that the JSON loading process takes too much time.
dive into the problem
The long JSON loading time indicates that the model often outputs malformed JSON. This is strange, since I am using GPT-4 and the model should be capable enough to produce valid JSON output.
So I checked the model output and found that the model was misled by the prompt examples.
Before this fix, the examples were all strings. When `json.dumps()` is applied to them in `prompt.format()`, the resulting prompt contains many escape characters, which encourages the model to output escape characters too. When we then try to recover the JSON object, these escape characters easily make `json.loads()` fail.
solution
To avoid this issue, when writing prompt examples we should use the original JSON data (such as dictionaries and lists) in the output section as much as possible, rather than writing it as strings wrapped in triple quotes (`"""`).
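The difference can be sketched as follows (the template and example values here are hypothetical, for illustration only; the actual prompts live in the metric's source):

```python
import json

# Hypothetical prompt template, for illustration only.
template = 'Answer in JSON.\nExample output: {example}'

# Before: the example was written as a string, then serialized again.
example_as_string = '{"verdict": 1, "reason": "the answer is correct"}'
before = template.format(example=json.dumps(example_as_string))
# json.dumps() of a string wraps it in quotes and escapes every inner quote,
# so the prompt shows the model output full of backslashes.

# A model imitating that style emits escaped quotes, which json.loads()
# cannot parse as an object:
imitated = '{\\"verdict\\": 1}'
try:
    json.loads(imitated)
except json.JSONDecodeError:
    print("escaped output fails to parse")

# After: keep the example as native Python data and serialize it once.
example_as_dict = {"verdict": 1, "reason": "the answer is correct"}
after = template.format(example=json.dumps(example_as_dict))
# The prompt now shows clean JSON, and a faithful imitation round-trips:
assert json.loads(json.dumps(example_as_dict)) == example_as_dict
```

Serializing a dict once produces clean, unescaped JSON in the prompt, so the model's imitation parses directly.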
Here is the running time after the fix.