Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datatypes.Choice: If selected_key is not found, create it #123

Closed
wants to merge 3 commits into from

Conversation

irgolic
Copy link
Contributor

@irgolic irgolic commented Apr 11, 2023

Fixes #122, fixes #186

This is my take on it, not sure if it works well for all correct uses of choice 👀

@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 83.33% and project coverage change: +0.01 🎉

Comparison is base (9b78f9c) 77.62% compared to head (e5d6aaf) 77.64%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   77.62%   77.64%   +0.01%     
==========================================
  Files          63       64       +1     
  Lines        3754     3766      +12     
==========================================
+ Hits         2914     2924      +10     
- Misses        840      842       +2     
Flag Coverage Δ
unittests 77.64% <83.33%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
tests/unit_tests/test_choice.py 80.00% <80.00%> (ø)
guardrails/datatypes.py 87.76% <100.00%> (+0.08%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ShreyaR
Copy link
Collaborator

ShreyaR commented Apr 12, 2023

Ah my concern with this approach would be that there may be other keys in the schema that might not belong within the choice object. E.g. from #122

<output>
    <string name="action" choices="new_file,edit_file,finished"/>
    <object name="new_file" description="new_file" if="action==new_file">
        <string name="filepath" description="Path to the newly created file." required="true"/>
        <string name="description" description="Description of the contents of the new file." required="true"/>
    </object>
    <object name="edit_file" description="edit_file" if="action==edit_file">
        <string name="filepath" description="Path to the file to be edited." required="true"/>
        <string name="description" description="Description of the changes to be made to the file." required="true"/>
        <integer name="start_line" description="The line number of the first line of the hunk to be edited." format="positive" required="false" on-fail="noop"/>
        <integer name="end_line" description="The line number of the last line of the hunk to be edited. Keep the hunk as short as possible while fulfilling the description." format="positive" required="false" on-fail="noop"/>
    </object>
    <string name="finished" description="commit_message: A more appropriate commit message based on the actions taken." required="true" if="action==finished"/>
    <string name="other_metadata" description="..." />
</output>

might have schema such as

{
    "action": "edit_file",
    "filepath": ...
    "description": ...
    "start_line": ...
    "end_line": ...
    "other_metadata": ...
}

In this case, other_metadata will end up moving to edit_file as well.

One way to resolve this issue that doesn't run into this problem could be to add few shot examples at the end of complete_json_suffix_v2 that show how to interpret choice correctly. That's probably the low-hanging fruit -- if you get a chance to test it out, lmk how it works!

Another option that is more involved is to generate a skeleton of what the correct structure should look like given the schema, and pass that in the prompt.

@irgolic
Copy link
Contributor Author

irgolic commented Apr 20, 2023

Sorry for the late reply on this.

<complete_json_suffix_v2>
Given below is XML that describes the information to extract from this document and the tags to extract it into.

{output_schema}

ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g. requests for lists, objects and specific types. Be correct and concise.

Here are examples of simple (XML, JSON) pairs that show the expected behavior:
- `<![CDATA[<string name='foo' format='two-words lower-case' />`]]> => `{{{{'foo': 'example one'}}}}`
- `<![CDATA[<list name='bar'><string format='upper-case' /></list>]]>` => `{{{{"bar": ['STRING ONE', 'STRING TWO', etc.]}}}}`
- `<![CDATA[<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index" format="1-indexed" /></object>]]>` => `{{{{'baz': {{{{'foo': 'Some String', 'index': 1}}}}}}}}`
</complete_json_suffix_v2>

The lowest hanging fruit is adding more examples for each new complex type to complete_json_suffix_v2, but it feels weird to use up context window space for something you don't need most of the time. Unless you can think of a short way to write an example for choice?

For now I've changed this PR to the simplest fix for the issue (avoids throwing the exception).

@irgolic
Copy link
Contributor Author

irgolic commented Jun 21, 2023

Since the structure fix will be addressed by reasking according to the json skeleton (#185), my suggestion is to simply avoid throwing the exception. The check is already performed in the field validator, it's just that the datatype validator incorrectly assumes the key is present.

Copy link

@SophieDC98 SophieDC98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect!

@smyja
Copy link

smyja commented Jul 9, 2023

Since the structure fix will be addressed by reasking according to the json skeleton (#185), my suggestion is to simply avoid throwing the exception. The check is already performed in the field validator, it's just that the datatype validator incorrectly assumes the key is present.

any update on this? or its deprecated?

@irgolic
Copy link
Contributor Author

irgolic commented Jul 27, 2023

Let's revisit this after the validator refactor (#228) and pydantic choice discriminator change, those both affect this greatly.

@irgolic
Copy link
Contributor Author

irgolic commented Sep 12, 2023

Choice is now generated very differently, I think this fix is irrelevant. Closing for now.

@irgolic irgolic closed this Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KeyError: None [bug] KeyError on choice
4 participants