-
Notifications
You must be signed in to change notification settings - Fork 2
add documentation for DAG feature #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Initial implementation * Improvements * add test cases and validation (#19) * add test cases and validation * update validation and test cases * fix test file path * add check for leading arrows and fix test cases * correct arrow count comparison * remove duplicate test cases * better group validation function * split functions and add index to errors * refactor test cases * refactor test cases * make whitespaces vary * better validation function * abstract running tasks in concurrently * do not set output for the leaf tasks * check bounds and check leading arrow first * fix typos Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * address PR suggestions --------- Co-authored-by: Divyanshu Tiwari <33171967+divyanshu-tiwari@users.noreply.github.com> Co-authored-by: Divyanshu Tiwari <divyanshu20.tiwari@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces comprehensive documentation for the experimental DAG (Directed Acyclic Graph) feature, which enables complex pipeline architectures with parallel processing, branching, and merging capabilities.
Key changes:
- Added a dedicated DAG_README.md file with detailed documentation covering syntax, patterns, validation rules, best practices, and troubleshooting
- Updated the main README.md with a brief introduction to DAG functionality and example usage
- Documented the DAG syntax operators (
>>for sequential,[]for parallel) and common execution patterns
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| README.md | Added new section introducing the experimental DAG feature with basic syntax examples and a reference to the detailed documentation |
| DAG_README.md | Created comprehensive documentation covering DAG syntax, patterns, configuration examples, validation rules, migration guide, troubleshooting, and best practices |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.
|
|
||
| 1. **Syntax Errors** | ||
| ``` | ||
| Error: invalid DAG groups: error at index X, unmatched closing brace ']' found |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: "brace" should be "bracket". The error message refers to ']' which is a bracket, not a brace. Braces are '{}'.
| Error: invalid DAG groups: error at index X, unmatched closing brace ']' found | |
| Error: invalid DAG groups: error at index X, unmatched closing bracket ']' found |
| only_data: true | ||
|
|
||
| # DAG definition | ||
| dag: read_csv_file >> [split_to_lines, echo] >> convert_from_csv >> echo2 |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example DAG read_csv_file >> [split_to_lines, echo] >> convert_from_csv >> echo2 appears to have a logical issue. The split_to_lines task produces multiple records (splits input into lines), but then both it and echo feed into convert_from_csv. This creates a fan-in pattern where convert_from_csv would receive data from two sources, which may not be the intended behavior. Consider clarifying this example or using a more straightforward pattern that better demonstrates the DAG feature without potential confusion about data flow.
| dag: read_csv_file >> [split_to_lines, echo] >> convert_from_csv >> echo2 | |
| dag: read_csv_file >> split_to_lines >> convert_from_csv >> echo2 |
| - Sequential: `task1 >> task2 >> task3` | ||
| - Parallel: `[task1, task2, task3]` | ||
| - Fan-out: `task1 >> [task2, task3]` | ||
| - Fan-in: `[task1, task2] >> task3` | ||
| - Diamond: `task1 >> [task2, task3] >> task4` |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Inconsistent list formatting. The other bullet points in this section use - with a space after it, but some patterns like "Sequential", "Parallel", etc. are formatted as - Sequential: (with a colon). For consistency with the rest of the document's style, consider whether these should follow the same formatting pattern as other list items in the README.
| - Sequential: `task1 >> task2 >> task3` | |
| - Parallel: `[task1, task2, task3]` | |
| - Fan-out: `task1 >> [task2, task3]` | |
| - Fan-in: `[task1, task2] >> task3` | |
| - Diamond: `task1 >> [task2, task3] >> task4` | |
| - **Sequential** `task1 >> task2 >> task3` | |
| - **Parallel** `[task1, task2, task3]` | |
| - **Fan-out** `task1 >> [task2, task3]` | |
| - **Fan-in** `[task1, task2] >> task3` | |
| - **Diamond** `task1 >> [task2, task3] >> task4` |
| - **No empty groups**: `[]` is invalid | ||
| - **No single-item groups**: `[task1]` is invalid (use `task1` directly) | ||
| - **Valid characters only**: Letters, numbers, `_`, `-`, `[`, `]`, `,`, `>`, whitespace | ||
| - **Proper arrow usage**: Only `>>` allowed, no single `>` or `>>>+` |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent notation: The rule states "no >>>+" which is unclear. It likely means "no >>> or more" (three or more consecutive > characters), but the + notation typically means "one or more" in regex, which would contradict the intended meaning. Consider clarifying this as "no >>> (three or more > characters)" or "only >> allowed (not >, >>>, etc.)".
| - **Proper arrow usage**: Only `>>` allowed, no single `>` or `>>>+` | |
| - **Proper arrow usage**: Only `>>` allowed; no single `>` or three or more `>` characters (e.g., `>>>`) |
| tasks: | ||
| - name: task1 | ||
| type: file | ||
| - name: task2 |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected after "task2". This should be removed for consistency with code style standards.
| - name: task2 | |
| - name: task2 |
| tasks: | ||
| - name: task1 | ||
| type: file | ||
| - name: task2 |
Copilot
AI
Nov 27, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trailing whitespace detected after "task2". This should be removed for consistency with code style standards.
| - name: task2 | |
| - name: task2 |
Description
Added documentation for DAG feature.
Types of changes
Checklist