
Issue: Customizing 'structured_format_instructions' for Non-English Languages #5203

Closed
luc-kalaora opened this issue May 24, 2023 · 7 comments


@luc-kalaora

luc-kalaora commented May 24, 2023

Hi, I need to customize the format instructions for multiple languages.

Specifically, I need to change the string returned by output_parser.get_format_instructions(). This method currently uses the following structured format instructions:

STRUCTURED_FORMAT_INSTRUCTIONS = """The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{{
{format}
}}
```"""
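The doubled braces in this template are str.format() escapes: when get_format_instructions() calls .format(format=...), each {{ and }} becomes a literal brace and {format} is replaced by the schema lines. The sketch below uses a shortened, hypothetical template and a hand-written schema line (not generated by LangChain) to show the substitution:

```python
# Shortened, hypothetical stand-in for STRUCTURED_FORMAT_INSTRUCTIONS.
TEMPLATE = """Output a JSON object in this schema:
{{
{format}
}}"""

# Hypothetical schema line; LangChain builds its own lines from ResponseSchema objects.
schema_str = '\t"summary": string  // One- or two-sentence summary.'

# {{ and }} collapse to { and }, and {format} is filled with schema_str.
rendered = TEMPLATE.format(format=schema_str)
print(rendered)
```

A translated template therefore has to keep the doubled braces and the {format} placeholder intact.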

To modify this value, I have considered the following approach:

langchain.output_parsers.structured.STRUCTURED_FORMAT_INSTRUCTIONS = """translation of the string above"""

However, this approach is not thread-safe: it replaces a module-level global shared by the whole process. If multiple users use my application at the same time in different languages, one user's setting can overwrite another's.
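If you want to keep a single shared lookup like the module global, one thread-safe variant is to hold the language in a contextvars.ContextVar, whose value is per thread (and per asyncio task) rather than process-wide. This is a standalone sketch with hypothetical names (TEMPLATES, current_language, handle_request); LangChain's get_format_instructions() does not read such a variable, so you would still need a subclass or wrapper that does.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-language templates (French: "The output must be a markdown code
# snippet, formatted according to the following schema:").
TEMPLATES = {
    "en_US": "The output should be a markdown code snippet formatted in the following schema:\n{format}",
    "fr_FR": "La sortie doit être un extrait de code au format markdown, formaté selon le schéma suivant :\n{format}",
}

# Each thread or asyncio task sees its own value, unlike a module-level global.
current_language = contextvars.ContextVar("current_language", default="en_US")


def get_format_instructions(schema_str: str) -> str:
    # Read the language bound in the *current* context; fall back to English.
    template = TEMPLATES.get(current_language.get(), TEMPLATES["en_US"])
    return template.format(format=schema_str)


def handle_request(language: str, schema_str: str) -> str:
    token = current_language.set(language)  # bind the language for this request only
    try:
        return get_format_instructions(schema_str)
    finally:
        current_language.reset(token)  # restore the previous value for the next task on this thread


# Simulate concurrent users alternating between French and English.
jobs = [("fr_FR", '"resumé": string'), ("en_US", '"summary": string')] * 10
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: handle_request(*job), jobs))
```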

Could you please advise on the appropriate way to handle this? Am I missing something?

Thanks for your help.

Suggestion:

One solution would be to change the get_format_instructions(self) -> str method so that it accepts a string parameter: def get_format_instructions(self, cust_str_format_instructions) -> str:
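A minimal sketch of this first suggestion, written against simplified stand-in classes rather than LangChain itself. SketchStructuredParser, DEFAULT_INSTRUCTIONS, and the schema-line format are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_INSTRUCTIONS = "The output should be a JSON object in the following schema:\n{{\n{format}\n}}"


@dataclass
class ResponseSchema:  # simplified stand-in, not LangChain's class
    name: str
    description: str
    type: str = "string"


@dataclass
class SketchStructuredParser:  # hypothetical mock of StructuredOutputParser
    response_schemas: List[ResponseSchema]

    def get_format_instructions(self, cust_str_format_instructions: Optional[str] = None) -> str:
        schema_str = "\n".join(
            f'\t"{s.name}": {s.type}  // {s.description}' for s in self.response_schemas
        )
        # Fall back to the default template so existing callers keep working.
        template = cust_str_format_instructions or DEFAULT_INSTRUCTIONS
        return template.format(format=schema_str)


# Description in French: "Summary in one or two sentences."
parser = SketchStructuredParser([ResponseSchema("resumé", "Résumé en une ou deux phrases.")])
# French: "The output must be a JSON object following this schema:"
fr = parser.get_format_instructions("La sortie doit être un objet JSON selon le schéma suivant :\n{{\n{format}\n}}")
en = parser.get_format_instructions()
```

Because the template is a call argument with a default, existing callers are unaffected and no shared state is modified, which sidesteps the thread-safety problem.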

Another solution is to create a class that inherits from StructuredOutputParser:

from typing import Any, List, Optional

from pydantic import Field
from langchain.output_parsers.format_instructions import STRUCTURED_FORMAT_INSTRUCTIONS
from langchain.output_parsers.structured import (
    ResponseSchema,
    StructuredOutputParser,
    _get_sub_string,  # private helper: may change between LangChain versions
)

# French translation of STRUCTURED_FORMAT_INSTRUCTIONS ("The output should be a markdown
# code snippet formatted in the following schema, including the leading and trailing
# fences"). Doubled braces survive str.format() as literal braces.
FR_STRUCTURED_FORMAT_INSTRUCTIONS = (
    'La sortie doit être un extrait de code au format markdown, formaté selon le schéma suivant, '
    'en incluant le début et la fin "```json" et "```":\n\n'
    "```json\n{{\n{format}\n}}\n```"
)


class CustomStructuredOutputParser(StructuredOutputParser):
    language: Optional[str] = Field(default=None)
    cust_struct_format_instructions: Optional[str] = Field(default=None)

    def __init__(self, response_schemas: List[ResponseSchema], **data: Any):
        super().__init__(response_schemas=response_schemas, **data)
        # Use the French template unless the caller supplied explicit instructions.
        if self.language == "fr_FR" and not self.cust_struct_format_instructions:
            self.cust_struct_format_instructions = FR_STRUCTURED_FORMAT_INSTRUCTIONS

    @classmethod
    def from_response_schemas(
        cls,
        response_schemas: List[ResponseSchema],
        language: Optional[str] = None,
        cust_struct_format_instructions: Optional[str] = None,
    ) -> "CustomStructuredOutputParser":
        return cls(
            response_schemas=response_schemas,
            language=language,
            cust_struct_format_instructions=cust_struct_format_instructions,
        )

    def get_format_instructions(self) -> str:
        schema_str = "\n".join(_get_sub_string(schema) for schema in self.response_schemas)
        # Per-instance template: nothing global is mutated, so concurrent users don't interfere.
        template = self.cust_struct_format_instructions or STRUCTURED_FORMAT_INSTRUCTIONS
        return template.format(format=schema_str)


summary_response_schemas = [
    # "Provide a one- or two-sentence summary."
    ResponseSchema(name="resumé", description="Fournissez un résumé en une ou deux phrases."),
    # "Provide a JSON object with up to 4 distinct response types as keys and a description of each as values."
    ResponseSchema(name="types_réponses", description="Fournissez un objet JSON contenant jusqu'à 4 types de réponses distincts en tant que clés, et une description pour chaque type de réponse en tant que valeurs."),
]

summary_output_parser = CustomStructuredOutputParser.from_response_schemas(summary_response_schemas, language='EN_US')

summary_output_parser.get_format_instructions()

The problem is that whenever you change this code upstream, I have to update and maintain my subclass, especially since it imports the private helper _get_sub_string.

@luc-kalaora luc-kalaora changed the title Issue: modify structure_format_instructions for other language than English Issue: modify structured_format_instructions for other language than English May 24, 2023
@luc-kalaora luc-kalaora changed the title Issue: modify structured_format_instructions for other language than English Issue: Customizing 'structured_format_instructions' for Non-English Languages May 24, 2023
@dosubot

dosubot bot commented Aug 30, 2023

Hi, @luc-kalaora! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you raised is about customizing the format instructions for non-English languages in the output_parser.get_format_instructions() function. You suggested two possible solutions: adding a string parameter to the function or creating a class inherited from StructuredOutputParser.

However, there hasn't been any activity on this issue yet, and it remains unresolved.

Could you please let us know if this issue is still relevant to the latest version of the LangChain repository? If it is, please comment on the issue to let us know. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 30, 2023
@luc-kalaora
Author

Yes, this issue is still relevant.

@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Aug 30, 2023
@dosubot

dosubot bot commented Aug 30, 2023

@baskaryan Could you please help @luc-kalaora with the issue they raised about customizing the format instructions for non-English languages in the output_parser.get_format_instructions() function? They have confirmed that the issue is still relevant. Thank you!

@guidorietbroek

I need to change the format instructions to fix the ReAct prompt, so it would be really helpful if they could be customized. I also don't understand why they are hardcoded; format instructions are essential for the whole prompt to work well.

@guidorietbroek

We created a subclass of ConvoOutputParser; that way you can override the get_format_instructions() method to return your own custom format instructions.
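In outline, such an override might look like the following. ConvoOutputParserStub is a hypothetical stand-in so the sketch runs without LangChain. The real ConvoOutputParser is a pydantic model, so in practice the custom text would be declared as a field, as in the CustomStructuredOutputParser code earlier in this issue, rather than set in a hand-written __init__.

```python
class ConvoOutputParserStub:
    """Hypothetical stand-in for LangChain's ConvoOutputParser (no LangChain import)."""

    def get_format_instructions(self) -> str:
        return "Default English format instructions."


class CustomConvoOutputParser(ConvoOutputParserStub):
    """Overrides get_format_instructions() to return caller-supplied instructions."""

    def __init__(self, format_instructions: str) -> None:
        self._format_instructions = format_instructions

    def get_format_instructions(self) -> str:
        # Ignore the parent's hardcoded text and return the custom instructions instead.
        return self._format_instructions


# French: "Answer in the following format."
parser = CustomConvoOutputParser("Répondez au format suivant.")
print(parser.get_format_instructions())
```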

@luc-kalaora
Author

luc-kalaora commented Oct 11, 2023 via email

@DominiquePaul

DominiquePaul commented Dec 20, 2023

Is there a guide on how to do this? I have the same problem with the Pydantic JSON output parser. Would it be possible to pass something like a language code when defining the parser?

For example:

PydanticOutputParser(pydantic_object=my_format_template, language="german")
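A sketch of what a constructor-level language option could look like. LocalizedJsonParser, FORMAT_TEMPLATES, and the German wording are hypothetical; PydanticOutputParser does not offer such a parameter, as this thread notes.

```python
from typing import Dict

# Hypothetical registry. German: "The output should be formatted as a JSON instance
# that conforms to this schema:"
FORMAT_TEMPLATES: Dict[str, str] = {
    "english": "The output should be formatted as a JSON instance that conforms to this schema:\n{schema}",
    "german": "Die Ausgabe soll als JSON-Instanz formatiert sein, die diesem Schema entspricht:\n{schema}",
}


class LocalizedJsonParser:
    """Hypothetical parser that fixes its instruction language at construction time."""

    def __init__(self, schema: str, language: str = "english") -> None:
        # Fail fast on unknown languages instead of silently falling back to English.
        if language not in FORMAT_TEMPLATES:
            raise ValueError(f"no format instructions for language {language!r}")
        self.schema = schema
        self.language = language

    def get_format_instructions(self) -> str:
        return FORMAT_TEMPLATES[self.language].format(schema=self.schema)


german_parser = LocalizedJsonParser('{"name": "string"}', language="german")
print(german_parser.get_format_instructions())
```

Since the language is fixed per instance and never mutated afterward, concurrent users can each hold their own parser without interfering with one another.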

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 20, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 27, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Mar 27, 2024