🐞 Windmill and Harvest fail to start on production deployments with AWS S3#2522
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes production startup failures for Windmill/Harvest when using AWS S3 by making plugin loading and S3 listing compatible with bucket-hosted S3 endpoints, and by surfacing initialization failures instead of silently swallowing them.
Changes:
- Propagate
PluginManagerinitialization/loading errors with properanyhow::Contextinstead of ignoring them. - Add S3 endpoint “bucket-hosted URL” parsing + bucket/prefix rewriting so list-based operations (plugin discovery, bulk deletes, exports) work against AWS S3.
- Improve Celery app initialization failure visibility by logging the error before panicking.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/windmill/src/services/plugins_manager/plugin_manager.rs | Avoids swallowing plugin manager init/load failures; modernizes Option handling. |
| packages/windmill/src/services/celery_app.rs | Logs Celery initialization errors before panicking to aid diagnosis in production. |
| packages/sequent-core/src/util/aws.rs | Introduces named env constants and clarifies the S3 config selection flag. |
| packages/sequent-core/src/services/s3.rs | Implements AWS bucket-hosted endpoint rewriting for list operations and adds focused unit tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
This PR addresses production startup failures for Windmill/Harvest deployments using AWS S3 by improving initialization error propagation and making S3 “list-style” operations work with AWS bucket-hosted endpoint URIs.
Changes:
- Propagate
PluginManagerinitialization/load failures (instead of silently ignoring them) with clearer error context. - Improve Celery app initialization failures by logging the full error before panicking.
- Refactor S3 list/delete helpers to correctly handle AWS bucket-hosted endpoints by rewriting the endpoint + mapping “logical bucket” to a key prefix; adds unit tests for the endpoint parsing/rewriting logic.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/windmill/src/services/plugins_manager/plugin_manager.rs | Improves plugin loading flow and ensures plugin-manager initialization errors are not swallowed. |
| packages/windmill/src/services/celery_app.rs | Logs Celery app init errors before panicking during singleton initialization. |
| packages/sequent-core/src/util/aws.rs | Introduces env-var constants and updates AWS/S3 config helpers/docstrings. |
| packages/sequent-core/src/services/s3.rs | Adds AWS bucket-hosted endpoint parsing + list target resolution; updates list/delete code paths and adds tests. |
Comments suppressed due to low confidence (2)
packages/windmill/src/services/plugins_manager/plugin_manager.rs:235
get_plugin_managernow propagates errors frominit_plugin_manager(). Becauseinit_plugin_manager()uses a non-atomicget()check followed byset(), concurrent calls can race: one task may successfullyset, while the other fails with "PluginManager already initialized" andget_plugin_managerreturns an error even though the singleton is ready. Consider using an async-awaretokio::sync::OnceCell::get_or_try_init(or similar) or treating the "already initialized"setfailure as success and then returningPLUGIN_MANAGER.get().unwrap().
pub async fn get_plugin_manager() -> Result<&'static PluginManager> {
let plugin_manager = match PLUGIN_MANAGER.get() {
Some(manager) => manager,
_ => {
init_plugin_manager()
.await
.context("Failed to initialize PluginManager")?;
PLUGIN_MANAGER
.get()
.expect("PluginManager should be initialized")
}
packages/sequent-core/src/util/aws.rs:20
- The new docstring for
get_regionsays it "keeps the default chain as a fallback", but the implementation still hard-errors ifAWS_REGIONis missing (std::env::var("AWS_REGION")?). Either update the docs to reflect thatAWS_REGIONis required, or change the implementation to truly fall back (e.g., avoid erroring on missingAWS_REGIONand letRegionProviderChain::default_provider()/or_elsehandle it).
/// Resolves the AWS region from the environment and keeps the default chain
/// as a fallback so local and deployed runtimes share the same lookup flow.
pub fn get_region() -> Result<RegionProviderChain> {
let region = RegionProviderChain::first_try(Region::new(
std::env::var("AWS_REGION")
.map_err(|err| anyhow!("AWS_REGION env var missing: {err}"))?,
))
.or_default_provider()
.or_else(Region::new("us-east-1"));
Ok(region)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Parent issue: https://github.com/sequentech/meta/issues/11570