Running the following query crashed TGI on the tech stack listed under System Info below. It would be great if someone could reproduce the issue.
Running the example without the `grammar` parameter works as expected. We are calling TGI as follows:
```python
# Define the response format: Start
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field
from text_generation.types import Grammar, GrammarType


class Gender(str, Enum):
    male = "male"
    female = "female"
    diverse = "diverse"


class SmokerStatus(str, Enum):
    smoker = "yes"
    non_smoker = "no"


class AllFields(BaseModel):
    CompanyName: Optional[str] = Field(
        None,
        description="Name of the insurance company. Correct answer can never "
                    "be a bank or a comparison portal.",
    )
    ApplicationDate: Optional[str] = Field(None, description="Date when the application form was signed.")
    NameInsuredPerson: Optional[str] = Field(None, description="First name of Insured Person.")
    SurnameInsuredPerson: Optional[str] = Field(
        None,
        description="For German applications, include the Dr. title as part of "
                    "the surname if applicable. Dr. does not need to be denoted "
                    "further, such as in Dr.med. Other titles must not appear.",
    )
    DateOfBirthInsuredPerson: Optional[str] = None
    CompanyReference: Optional[str] = Field(
        None,
        description="ID assigned by the company to the insurance application "
                    "filed. The answer must not be an IBAN number.",
    )
    Occupation: Optional[str] = None
    MaritalStatus: Optional[str] = None
    Sex: Optional[Gender] = None
    Height: Optional[float] = Field(None, gt=1.0)
    Weight: Optional[float] = Field(None, gt=1.0)
    Smoker: Optional[SmokerStatus] = None
    NamePolicyHolder: Optional[str] = Field(None, description="First name of Policyholder.")
    SurnamePolicyHolder: Optional[str] = Field(
        None,
        description="For German applications, include the Dr. title as part of "
                    "the surname if applicable. Dr. does not need to be denoted "
                    "further, such as in Dr.med.",
    )
    Name: Optional[str] = Field(None, description="Name of insurance product.")
    StartDate: Optional[str] = Field(None, description="Start date of insurance contract.")
    Term: Optional[str] = Field(None, description="Duration of insurance contract.")
    MonthlyPension: Optional[float] = Field(None, gt=1.0)
    YearlyPension: Optional[float] = Field(None, gt=1.0)
    SumAssured: Optional[float] = Field(None, gt=1.0)
    BenefitAppliedFor: Optional[float] = Field(None, gt=1.0)
    ExistingCover: Optional[float] = Field(
        None, gt=1.0,
        description="Sum assured (cover) of existing insurance contracted earlier.",
    )


response_format = AllFields.model_json_schema()
response_type = GrammarType.Json
response_grammar = Grammar(type=response_type, value=response_format)
# Define the response format: End

# Define the query
specify_query = (
    "Deine Aufgabe ist es, die wichtigsten Daten und Angaben aus dem Antrag "
    "zusammenzufassen. Hier ist der Text aus dem Antrag, den du zusammenfassen sollst: "
)
txt = """Hier beginnt Seite 1 im Dokument:
Betreff: Ergänzungen zum Antrag Müller / RiLV / 123456788 An: leben@allgemeineversicherung.de Datum: 2024-03-01 08:49:10
Der Versicherungsnehmer Max Müller, geboren am 1. Februar 1982, ist von Beruf Rechtsanwalt.
Hier beginnt Seite 2 im Dokument:
Abschluss einer Risikolebensversicherung bei der Allgemeine Versicherungs AG in Höhe von 800.000€. Versicherte Person ist Max Müller. Versicherungsnehmerin ist die Ehefrau Marina Müller.
Versicherungsstart ist 01.04.2024. Die Versicherung läuft bis zum 65. Lebensjahr."""
PROMPT = f"[INST] {specify_query + txt} [/INST]"

# Call the client using the grammar parameter
# llm_client is an instance of the text_generation Python client
llm_client.generate(PROMPT, max_new_tokens=450, grammar=response_grammar)
```
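One detail that may matter for the crash: every `Optional` field in the pydantic model expands to an `anyOf` branch over the field type and `null` in the emitted JSON schema, so the compiled grammar grows quickly with the field count. A stdlib-only sketch of the kind of structure the grammar compiler has to handle (the exact shape of pydantic v2 `model_json_schema()` output is an assumption here, shown for just two of the fields above):

```python
import json

# Assumed pydantic v2-style schema output for two of the optional fields
# above -- each one contributes an anyOf branch the grammar FSM must cover.
schema = {
    "type": "object",
    "title": "AllFields",
    "properties": {
        "CompanyName": {"anyOf": [{"type": "string"}, {"type": "null"}], "default": None},
        "Height": {"anyOf": [{"type": "number", "exclusiveMinimum": 1.0}, {"type": "null"}], "default": None},
    },
}

# Every extra property multiplies the states the grammar has to track.
print(len(schema["properties"]), "properties,", len(json.dumps(schema)), "schema chars")
```

With all 22 fields of `AllFields`, the schema (and the grammar built from it) is an order of magnitude larger than the small schemas grammar mode is usually exercised with.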
We restricted the batch size to 1 because we anticipated an issue with mixed requests (with and without grammar), but it made no difference.
The debug logs contain no error message whatsoever. The only thing that helps is restarting the pod.
Expected behavior
The code should produce valid JSON. It does so if a certain share of the fields in the pydantic class are removed, and it does not seem to matter which ones; keeping around 5-6 of them works fine.
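For concreteness, a valid completion for the sample text above could look like the following (values hand-filled from the German sample, not actual model output, and only a subset of the fields shown):

```python
import json

# Hand-filled from the sample Antrag text -- illustrative, not model output.
expected = """{
  "CompanyName": "Allgemeine Versicherungs AG",
  "ApplicationDate": null,
  "NameInsuredPerson": "Max",
  "SurnameInsuredPerson": "Müller",
  "DateOfBirthInsuredPerson": "1982-02-01",
  "Occupation": "Rechtsanwalt",
  "NamePolicyHolder": "Marina",
  "SurnamePolicyHolder": "Müller",
  "StartDate": "01.04.2024",
  "SumAssured": 800000.0
}"""

parsed = json.loads(expected)  # parses cleanly: this is the shape we expect back
print("valid JSON with", len(parsed), "fields")
```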
Hi!
The LLM has to generate the JSON structure as well as the field contents, and I think that's the issue. Grammar works really well 99% of the time with smaller schemas. I have to admit I've never seen a schema this long, but the use case is absolutely something that should work.
I've been doing some reading around schema-based generation and came across an article from Lamini: they present the LLM with a pre-filled JSON skeleton, which saves compute, and all the LLM has to do is generate the field contents. Schema parsing could never fail this way. @drbh I'm not totally sure about the current implementation in TGI, but I'm assuming the LLM is also generating the JSON right now. Is there scope to implement something like this going forward? I can see great benefit in this if so :)
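A minimal sketch of that idea (stdlib only; `fill_schema` and the stand-in `fake_llm` are hypothetical names for illustration, not TGI or Lamini APIs):

```python
import json

def fill_schema(properties, generate_value):
    # The JSON skeleton comes from the schema itself, so the structure can
    # never be malformed; the model only produces each field's value.
    return {name: generate_value(name, spec) for name, spec in properties.items()}

# Toy stand-in for a per-field LLM call -- a real one would prompt the model
# with the document plus the field name/description.
def fake_llm(name, spec):
    return None

properties = {"CompanyName": {"type": "string"}, "Height": {"type": "number"}}
doc = fill_schema(properties, fake_llm)
print(json.dumps(doc))
```

The upside is that the constrained decoder only ever runs on short value spans instead of one large grammar covering the whole object.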
Thanks.
System Info
Tech stack: tgi 2.0.1, A100 GPU 80GB running on Kubernetes.
Model: Mixtral 8x7B-instruct-v0.1
Reproduction
See the query, code, and expected behavior described above.