adds ToL portal updater as daily workflow #8

Merged
ccaio merged 1 commit into main from update-tol-status
Mar 24, 2026
@ccaio ccaio commented Mar 24, 2026

Daily status updates for projects in the Tree of Life pipeline were blocked in GoaT because access to the STS API had stopped. This PR completes the transfer to the ToL portal as the new source for ToL project target and status lists.

It currently handles status updates for the following projects:

  • AEGIS
  • ASG
  • BAT1K
  • DTOL
  • ERGA
  • ERGAPI
  • PSYCHE
  • VGP

Summary by Sourcery

Enhancements:

  • Introduce an automated flow to export ToL portal project status to a local TSV and mirror it to S3.

@ccaio ccaio self-assigned this Mar 24, 2026
@ccaio ccaio added the bug and enhancement labels Mar 24, 2026
sourcery-ai Bot commented Mar 24, 2026

Reviewer's Guide

Adds a new Prefect deployment to run a daily flow that updates Tree of Life project status data from the ToL portal and saves it locally and to S3 with basic record-count validation.

Sequence diagram for the new daily ToL portal status update flow

sequenceDiagram
  participant DailySchedule_daily
  participant PrefectDeployment_update_tol_portal_status
  participant PrefectWorkPool_goat_data_work_pool
  participant PrefectAgent_goat_data_work_pool
  participant Flow_update_tol_portal_status
  participant ToLPortal_API
  participant LocalFS_status_path
  participant S3_status_lists_bucket

  DailySchedule_daily->>PrefectDeployment_update_tol_portal_status: Trigger scheduled run
  PrefectDeployment_update_tol_portal_status->>PrefectWorkPool_goat_data_work_pool: Enqueue flow run request
  PrefectWorkPool_goat_data_work_pool->>PrefectAgent_goat_data_work_pool: Deliver flow run assignment
  PrefectAgent_goat_data_work_pool->>Flow_update_tol_portal_status: Start flow with parameters

  Flow_update_tol_portal_status->>ToLPortal_API: Fetch project status list
  ToLPortal_API-->>Flow_update_tol_portal_status: Return status TSV data

  Flow_update_tol_portal_status->>Flow_update_tol_portal_status: Validate record count >= min_records (8300)

  Flow_update_tol_portal_status->>LocalFS_status_path: Write tol_project_status_expanded.tsv
  Flow_update_tol_portal_status->>S3_status_lists_bucket: Upload tol_project_status_expanded.tsv

  Flow_update_tol_portal_status-->>PrefectAgent_goat_data_work_pool: Report success
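
The flow steps in the diagram can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the flow merged in this PR: `fetch_status_tsv` and `upload` are hypothetical callables standing in for the ToL portal request and the S3 upload, and the single header row in the TSV is assumed rather than confirmed by the PR.

```python
from pathlib import Path
from typing import Callable


def count_records(tsv_text: str) -> int:
    """Count data rows, assuming exactly one header line (an assumption)."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def update_status(
    fetch_status_tsv: Callable[[], str],  # hypothetical: returns the portal export as TSV text
    upload: Callable[[Path, str], None],  # hypothetical: copies a local file to an S3 URI
    output_path: Path,
    s3_path: str,
    min_records: int,
) -> int:
    """Fetch, validate, write locally, then mirror. Returns the record count."""
    tsv_text = fetch_status_tsv()
    n_records = count_records(tsv_text)
    # Fail before touching the local or remote copy, so a truncated export
    # never overwrites the last good file.
    if n_records < min_records:
        raise ValueError(f"expected at least {min_records} records, got {n_records}")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(tsv_text, encoding="utf-8")
    upload(output_path, s3_path)
    return n_records
```

Passing the fetch and upload steps in as callables keeps the validation logic testable without network access; in a real deployment they would presumably wrap the portal client and an S3 client.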

File-Level Changes

Change: Introduce a daily Prefect deployment for updating ToL portal project status data.
Details:
  • Add a new deployment named update-tol-portal-status pointing to the update_tol_portal_status flow entrypoint
  • Configure parameters for local TSV output path, S3 destination path, and a minimum expected record count threshold
  • Schedule the deployment to run on the existing daily schedule and goat_data work pool
Files: flows/prefect.yaml

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • Consider moving the hardcoded local output path and S3 path into configuration or environment variables so they can be adjusted per environment without changing the deployment spec.
  • The min_records: 8300 value is a magic number; it would be clearer to reference a named configuration value or briefly document how this threshold is chosen and maintained.
  • The local path currently includes tmp/test which may be misleading for a production daily workflow; consider using a more clearly production-oriented directory or separating test and production output paths.
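
One way to act on the first two points is to read the paths and the threshold from environment variables with named, documented defaults. This is a minimal sketch: the variable names (`TOL_STATUS_OUTPUT_PATH`, `TOL_STATUS_S3_PATH`, `TOL_STATUS_MIN_RECORDS`), the `StatusConfig` class, and `load_config` are hypothetical and do not appear in the PR. The defaults are the S3 path and threshold from the deployment spec and the relative output path from the review suggestion.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Value from the deployment spec; the PR does not document how it was chosen.
DEFAULT_MIN_RECORDS = 8300
DEFAULT_OUTPUT_PATH = "status-lists/tol_project_status_expanded.tsv"
DEFAULT_S3_PATH = "s3://goat/resources/status-lists/tol_project_status_expanded.tsv"


@dataclass(frozen=True)
class StatusConfig:
    output_path: str
    s3_path: str
    min_records: int


def load_config(env: Mapping[str, str] = os.environ) -> StatusConfig:
    """Build the config from environment variables (names are hypothetical)."""
    min_records = int(env.get("TOL_STATUS_MIN_RECORDS", DEFAULT_MIN_RECORDS))
    if min_records < 0:
        raise ValueError("TOL_STATUS_MIN_RECORDS must be non-negative")
    return StatusConfig(
        output_path=env.get("TOL_STATUS_OUTPUT_PATH", DEFAULT_OUTPUT_PATH),
        s3_path=env.get("TOL_STATUS_S3_PATH", DEFAULT_S3_PATH),
        min_records=min_records,
    )
```

Keeping the threshold behind a named constant gives the "magic number" a single place where its origin can be documented once that is known.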
## Individual Comments

### Comment 1
<location path="flows/prefect.yaml" line_range="245-246" />
<code_context>
+    entrypoint: flows/updaters/update_tol_portal_status.py:update_tol_portal_status
+    parameters:
+      # Local path to save the ToL portal status TSV file
+      output_path: "/home/ubuntu/tmp/test/status-lists/tol_project_status_expanded.tsv"
+      # The S3 path to save the ToL portal status TSV file
+      s3_path: s3://goat/resources/status-lists/tol_project_status_expanded.tsv
</code_context>
<issue_to_address>
**suggestion:** Consider avoiding a hard-coded environment-specific output path.

Using an absolute path under `/home/ubuntu/tmp/test/...` ties this to a specific machine and likely a temporary directory. Please make this path configurable (e.g., parameter default, env var, or shared base directory) so the flow can run cleanly in other environments.

```suggestion
      # Local path to save the ToL portal status TSV file (relative; override per-environment via deployment parameters)
      output_path: "status-lists/tol_project_status_expanded.tsv"
```
</issue_to_address>
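
With a relative `output_path` like the one suggested, the flow needs a base directory to resolve it against. A minimal sketch, assuming a hypothetical `resolve_output_path` helper and a base directory supplied by the caller; neither appears in the PR.

```python
from pathlib import Path


def resolve_output_path(output_path: str, base_dir: str) -> Path:
    """Join a relative output_path onto base_dir.

    An absolute output_path is returned unchanged, so deployments that
    still pass a full path keep working.
    """
    path = Path(output_path)
    if not path.is_absolute():
        path = Path(base_dir) / path
    return path
```

Each environment can then set its own base directory while the deployment spec carries only the relative path.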


@ccaio ccaio merged commit e7635cb into main Mar 24, 2026
1 of 3 checks passed
Labels

bug, enhancement

Projects

Status: Done
