Fail builds on app.toml parse errors instead of silently continuing#765
Conversation
The build server was swallowing loadAppConfig errors with a warning log and continuing as if no app.toml existed. This meant any parse failure (unknown fields, malformed TOML, validation errors) silently dropped all services, env vars, and build config defined in app.toml. In practice this hit us when a v0.7.0 CLI sent an [aliases] section that the v0.6.1 server didn't recognize. DisallowUnknownFields fired correctly, but the error was discarded. The deploy succeeded with 1 service instead of 3, causing a prolonged outage. Both call sites (buildFromDir and AnalyzeApp) now return the error instead of swallowing it. The build path also sends the error to the client via sendErrorStatus so the user sees the actual parse diagnostic (with file path, line numbers, and "did you mean?" suggestions).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis pull request adds a test to verify that unknown configuration fields are properly rejected during app config parsing, and modifies error handling in the build process. Previously, app config loading failures were logged as warnings and execution continued with an unchanged configuration. The changes now cause the build and analysis operations to fail fast when app config loading encounters errors, returning wrapped error messages to the client instead of proceeding with partial or nil configuration state. Comment |
The build server had this pattern in two places where it loaded app.toml:
Any parse error got logged and discarded, and the build continued as if no app.toml existed. Services, env vars, build config, addons -- all silently gone. The user saw a successful deploy with the wrong number of services and no indication that anything went wrong.
This bit us on a deploy where a v0.7.0 CLI sent an
[aliases]section to a v0.6.1 server.DisallowUnknownFieldscorrectly rejected the unknown field, but the error was swallowed. The app deployed with 1 service instead of 3, leading to a ~23 hour degradation and eventual outage when crash cooldown kicked in.The fix is straightforward: return the error instead of ignoring it. The build path now also sends the parse error to the client via
sendErrorStatus, so the user gets the full diagnostic (file path, line number, "did you mean?" suggestions) rather than a mysterious "no services defined" error downstream.The build server code change itself doesn't have direct test coverage.
buildFromDiris deeply wired into the Builder and hard to unit test in isolation, but we're on the cusp of breaking that whole section into a saga which will make it much more testable. In the meantime we have unit test coverage on the appconfig parsing layer confirming that unknown fields are correctly rejected, including a new version-skew scenario test added here.Closes MIR-1016