How to make the LLM understand the definition of parties in a contract? #26345
Replies: 1 comment
-
|
To improve extraction of Seller/Buyer definitions in contracts with Dify and Ollama, focus on three areas: document chunking, prompt engineering, and structured output. First, ensure your contract files are preprocessed for clean extraction—DOCX extractor in Dify only pulls paragraphs and tables, so re-save or simplify complex DOCX files if needed to avoid missing content [source]. Next, Dify’s backend supports highly configurable chunking (chunk size, overlap, separator), but the UI exposes limited options. If you have backend access, adjust chunking parameters in your process rules to keep party definitions together in a single chunk. This helps the LLM see the full definition context [source]. For prompt engineering, enable structured output on your LLM node and define a JSON schema for the party info you want (e.g., company name, address, legal rep, ID). Use a prompt like: This will give you both a user-friendly summary and a machine-readable JSON object in the outputs [source]. If you’re using Ollama, check for prompt truncation—there’s a known issue where only 4096 tokens are passed even if your model supports more, which can cut off important context and hurt extraction accuracy [source]. Also, large models or long outputs may hit a 300-second timeout in the Ollama plugin, so consider adjusting or patching the timeout if you see incomplete responses [source]. Finally, use Dify’s Expert Mode to further customize prompts, retrieval TopK, and score thresholds for better context selection. If you continue to get “not specified in the document” responses, review the actual chunks being retrieved and the prompt logs to debug what context the LLM is seeing. Let me know if you need more details on chunking configuration or prompt examples! To reply, just mention @dosu. How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Self Checks
1. Is this request related to a challenge you're experiencing? Tell me about your story.
With Ollama and Dify deployed in Docker, I am setting up a very simple chat assistant to search some key points in a contract. The embedding model is bge-m3 and the LLM is gpt-oss:120b-cloud.
The definition of parties in the contract is as follows: company name, address, legal representatives name and ID (hereinafter the "Seller"), same for the definition of the Buyer. I have tried different lengths and methods of chunk, but the LLM just cannot give me the correct information of the Seller and the Buyer, saying "not specified in the document" or giving me something not directly related to the definition.
Can anyone tell me what to do?
2. Additional context or comments
No response
Beta Was this translation helpful? Give feedback.
All reactions