adds ToL portal updater as daily workflow#8
Merged
Merged
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds a new Prefect deployment to run a daily flow that updates Tree of Life project status data from the ToL portal and saves it locally and to S3 with basic record-count validation. Sequence diagram for the new daily ToL portal status update flowsequenceDiagram
participant DailySchedule_daily
participant PrefectDeployment_update_tol_portal_status
participant PrefectWorkPool_goat_data_work_pool
participant PrefectAgent_goat_data_work_pool
participant Flow_update_tol_portal_status
participant ToLPortal_API
participant LocalFS_status_path
participant S3_status_lists_bucket
DailySchedule_daily->>PrefectDeployment_update_tol_portal_status: Trigger scheduled run
PrefectDeployment_update_tol_portal_status->>PrefectWorkPool_goat_data_work_pool: Enqueue flow run request
PrefectWorkPool_goat_data_work_pool->>PrefectAgent_goat_data_work_pool: Deliver flow run assignment
PrefectAgent_goat_data_work_pool->>Flow_update_tol_portal_status: Start flow with parameters
Flow_update_tol_portal_status->>ToLPortal_API: Fetch project status list
ToLPortal_API-->>Flow_update_tol_portal_status: Return status TSV data
Flow_update_tol_portal_status->>Flow_update_tol_portal_status: Validate min_records >= 8300
Flow_update_tol_portal_status->>LocalFS_status_path: Write tol_project_status_expanded.tsv
Flow_update_tol_portal_status->>S3_status_lists_bucket: Upload tol_project_status_expanded.tsv
Flow_update_tol_portal_status-->>PrefectAgent_goat_data_work_pool: Report success
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider moving the hardcoded local output path and S3 path into configuration or environment variables so they can be adjusted per environment without changing the deployment spec.
- The
min_records: 8300value is a magic number; it would be clearer to reference a named configuration value or briefly document how this threshold is chosen and maintained. - The local path currently includes
tmp/testwhich may be misleading for a production daily workflow; consider using a more clearly production-oriented directory or separating test and production output paths.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider moving the hardcoded local output path and S3 path into configuration or environment variables so they can be adjusted per environment without changing the deployment spec.
- The `min_records: 8300` value is a magic number; it would be clearer to reference a named configuration value or briefly document how this threshold is chosen and maintained.
- The local path currently includes `tmp/test` which may be misleading for a production daily workflow; consider using a more clearly production-oriented directory or separating test and production output paths.
## Individual Comments
### Comment 1
<location path="flows/prefect.yaml" line_range="245-246" />
<code_context>
+ entrypoint: flows/updaters/update_tol_portal_status.py:update_tol_portal_status
+ parameters:
+ # Local path to save the ToL portal status TSV file
+ output_path: "/home/ubuntu/tmp/test/status-lists/tol_project_status_expanded.tsv"
+ # The S3 path to save the ToL portal status TSV file
+ s3_path: s3://goat/resources/status-lists/tol_project_status_expanded.tsv
</code_context>
<issue_to_address>
**suggestion:** Consider avoiding a hard-coded environment-specific output path.
Using an absolute path under `/home/ubuntu/tmp/test/...` ties this to a specific machine and likely a temporary directory. Please make this path configurable (e.g., parameter default, env var, or shared base directory) so the flow can run cleanly in other environments.
```suggestion
# Local path to save the ToL portal status TSV file (relative; override per-environment via deployment parameters)
output_path: "status-lists/tol_project_status_expanded.tsv"
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Comment on lines
+245
to
+246
| # Local path to save the ToL portal status TSV file | ||
| output_path: "/home/ubuntu/tmp/test/status-lists/tol_project_status_expanded.tsv" |
Contributor
There was a problem hiding this comment.
suggestion: Consider avoiding a hard-coded environment-specific output path.
Using an absolute path under /home/ubuntu/tmp/test/... ties this to a specific machine and likely a temporary directory. Please make this path configurable (e.g., parameter default, env var, or shared base directory) so the flow can run cleanly in other environments.
Suggested change
| # Local path to save the ToL portal status TSV file | |
| output_path: "/home/ubuntu/tmp/test/status-lists/tol_project_status_expanded.tsv" | |
| # Local path to save the ToL portal status TSV file (relative; override per-environment via deployment parameters) | |
| output_path: "status-lists/tol_project_status_expanded.tsv" |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The daily updates for status of projects in the Tree of Life pipeline was blocked in GoaT because the access to the STS API had stopped. This completes the transfer to the ToL portal as the new source for ToL projects target and status lists.
This currently attends to status updates of the following projects:
Summary by Sourcery
Enhancements: