Merged
2 changes: 1 addition & 1 deletion docs/taxonomy/knowledge/file_structure.md
@@ -23,7 +23,7 @@ Key | Type | Required | Constraints | Value | Notes
`created_by` | string | Y | no spaces | Your GitHub username (for the upstream taxonomy) or your name with no spaces (for general intructlab use) | -
`domain` | string | Y | - | Knowledge sub-category | The knowledge domain which is used in prompts to the teacher model during synthetic data generation. The domain should be brief such as the title to a textbook chapter or section.
`seed_examples` | Y | array | at least 5 sets | null | This is a collection of questions and answers with context from the knowledge document that InstructLab uses to generate data synthetically.
-`context` | string | Y | < 500 tokens | A chunk of the knowledge document showing off the different **unique** content to help guide the teacher model. If the knowledge documents have only text, all context would be text. If the knowledge documnets have tables or other content formats, ensure samples of those formats are all used. | This should be a copy-paste from the Markdown version of your document
+`context` | string | Y | < 500 tokens | A chunk of the knowledge document showing off the different **unique** content to help guide the teacher model. If the knowledge documents have only text, all context would be text. If the knowledge documents have tables or other content formats, ensure samples of those formats are all used. | This should be a copy-paste from the Markdown version of your document
`questions_and_answers` | Y | array | at least 3 pairs per context | null | This is a collection of questions and answers.
`question` | Y | string | \> 250 tokens | A question related to and grounded in the relevant context | Questions are things you'd expect someone to ask the model based on the context given. This will be used for synthetic data generation.
`answer` | Y | string | \> 250 tokens | An answer for the question, longer than a one-word or one-number answer | Answers are what you'd like the model to give as an answer. It will not be an exact answer the model always gives.
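
The structural constraints in the rows above (required keys, no spaces in `created_by`, at least 5 seed example sets, at least 3 question-and-answer pairs per context) could be checked with something like the sketch below. This is an illustration only, not part of InstructLab: the function name `check_seed_examples` is hypothetical, it assumes the file has already been parsed into a plain Python dict, and it deliberately skips the token-length constraints because those depend on a tokenizer that the table does not name.

```python
def check_seed_examples(doc: dict) -> list[str]:
    """Return a list of problems found; an empty list means the counts look fine.

    Hypothetical helper: only the structural rules from the table are checked,
    token limits are not.
    """
    problems = []

    # Top-level keys marked as required (Y) in the table.
    for key in ("created_by", "domain", "seed_examples"):
        if key not in doc:
            problems.append(f"missing required key: {key}")

    # created_by has the constraint "no spaces".
    if " " in str(doc.get("created_by", "")):
        problems.append("created_by must not contain spaces")

    # seed_examples: at least 5 sets.
    seeds = doc.get("seed_examples") or []
    if len(seeds) < 5:
        problems.append(f"seed_examples has {len(seeds)} sets, need at least 5")

    for i, seed in enumerate(seeds):
        if not seed.get("context"):
            problems.append(f"seed_examples[{i}] is missing context")

        # questions_and_answers: at least 3 pairs per context.
        pairs = seed.get("questions_and_answers") or []
        if len(pairs) < 3:
            problems.append(
                f"seed_examples[{i}] has {len(pairs)} pairs, need at least 3"
            )

        # Each pair needs both a question and an answer.
        for j, pair in enumerate(pairs):
            for field in ("question", "answer"):
                if not pair.get(field):
                    problems.append(
                        f"seed_examples[{i}].questions_and_answers[{j}] "
                        f"is missing {field}"
                    )

    return problems
```

Calling `check_seed_examples(parsed_doc)` on a parsed file returns one message per violated rule, so it can be printed as a checklist before opening a pull request.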