Translate Pydantic models into a regular expression that accepts the corresponding YAML #923

Open
rlouf opened this issue May 27, 2024 · 2 comments

Comments

@rlouf
Member

rlouf commented May 27, 2024

No description provided.

@lapp0
Collaborator

lapp0 commented May 29, 2024

I love the idea: YAML uses fewer syntactic tokens, so language models can generate it without having to keep track of as much nesting or context.

Here's what I'm thinking for a strategy. I would love to hear your thoughts:

We should refactor fsm/json_schema.py to use a class-based approach with a handler method for each type. We can then subclass it to implement the YAML-specific behavior.

class JSONSchemaRegexGenerator:
    def __init__(self):
        # Map each JSON Schema "type" to the method that builds its regex.
        self.handlers = {
            "string": self.handle_string,
            "array": self.handle_array,
            ...
        }

    @classmethod
    def get_pattern(cls, schema):
        # Entry point: build a generator and convert the root schema node.
        return cls().handle_node(schema)

    def handle_node(self, node):
        # Dispatch on the node's type; unknown types fall back to handle_default.
        handler = self.handlers.get(node["type"], self.handle_default)
        return handler(node)

    def handle_string(self, node):
        return STRING

    def handle_array(self, node):
        ...
        return rf"\[{whitespace_pattern}(({'|'.join(regexes)})(,{whitespace_pattern}({'|'.join(regexes)})){num_repeats}){allow_empty}{whitespace_pattern}\]"


class YAMLSchemaRegexGenerator(JSONSchemaRegexGenerator):
    def handle_array(self, node):
        """Handle the YAML block format for arrays:
            - elem0
            - elem1
        """
        ...

This would make the code more readable and extensible, reduce technical debt, and remove the need for conditional handling of a passed is_yaml flag in many rules within to_regex().
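To make the class-based approach proposed above concrete, here is a minimal runnable sketch. It is not the code in fsm/json_schema.py: the STRING, INTEGER and WHITESPACE patterns are simplified stand-ins I made up for illustration, the handle_integer and handle_default methods are assumptions, and the YAML subclass only covers flat block sequences of scalars (no nesting or indentation).

```python
import re

# Hypothetical, simplified primitives; NOT the constants used in outlines.
STRING = r'"[^"\n]*"'
INTEGER = r"-?(0|[1-9][0-9]*)"
WHITESPACE = r"[ ]?"


class JSONSchemaRegexGenerator:
    def __init__(self):
        # Bound methods are looked up on the instance, so a subclass
        # override of handle_array is picked up automatically.
        self.handlers = {
            "string": self.handle_string,
            "integer": self.handle_integer,
            "array": self.handle_array,
        }

    @classmethod
    def get_pattern(cls, schema):
        return cls().handle_node(schema)

    def handle_node(self, node):
        handler = self.handlers.get(node.get("type"), self.handle_default)
        return handler(node)

    def handle_default(self, node):
        raise NotImplementedError(f"unsupported schema node: {node!r}")

    def handle_string(self, node):
        return STRING

    def handle_integer(self, node):
        return INTEGER

    def handle_array(self, node):
        item = self.handle_node(node["items"])
        # JSON flow style: [a, b, c]; the empty array is allowed.
        return rf"\[{WHITESPACE}(({item})(,{WHITESPACE}({item}))*)?{WHITESPACE}\]"


class YAMLSchemaRegexGenerator(JSONSchemaRegexGenerator):
    def handle_array(self, node):
        item = self.handle_node(node["items"])
        # YAML block style: one "- item" per line, at least one item.
        return rf"(- ({item})\n)+"


schema = {"type": "array", "items": {"type": "integer"}}
json_pattern = JSONSchemaRegexGenerator.get_pattern(schema)
yaml_pattern = YAMLSchemaRegexGenerator.get_pattern(schema)
```

Only handle_array differs between the two generators here; the scalar handlers and the dispatch logic are shared, which is the point of the refactor.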

@rlouf
Member Author

rlouf commented Jun 5, 2024

I can get on board with this. To follow ast.NodeVisitor's naming scheme, we could name the handlers visit_X. I think we should implement a first version of the YAML converter with only a few primitives before refactoring.
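As a rough illustration of the visit_X naming, the sketch below dispatches with getattr in the same way ast.NodeVisitor.visit does, and covers only a few primitives plus flat objects. The class name YAMLRegexVisitor, the primitive patterns, and the choice to require double-quoted strings are all assumptions made for this example, not decisions taken in this thread. The schema dict is written by hand in the shape that Pydantic's JSON Schema export produces.

```python
import re

# Hypothetical, simplified primitives chosen for this example only.
STRING = r'"[^"\n]*"'       # assumes double-quoted YAML strings
INTEGER = r"-?(0|[1-9][0-9]*)"
BOOLEAN = r"(true|false)"


class YAMLRegexVisitor:
    def visit(self, node):
        # Same idea as ast.NodeVisitor.visit: look up visit_<type>,
        # fall back to generic_visit when no such method exists.
        method = getattr(self, f"visit_{node.get('type')}", self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(f"unsupported schema node: {node!r}")

    def visit_string(self, node):
        return STRING

    def visit_integer(self, node):
        return INTEGER

    def visit_boolean(self, node):
        return BOOLEAN

    def visit_object(self, node):
        # Flat mapping only: one "key: value" line per property,
        # in declaration order, all properties required.
        lines = [
            rf"{re.escape(name)}: ({self.visit(prop)})\n"
            for name, prop in node["properties"].items()
        ]
        return "".join(lines)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "active": {"type": "boolean"},
    },
}
pattern = YAMLRegexVisitor().visit(schema)
document = 'name: "Ada"\nage: 36\nactive: true\n'
matched = re.fullmatch(pattern, document) is not None
```

Adding a new type then means adding one visit_X method, with no dispatch table to keep in sync.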
