Skip to content

cipher#1

Open
misikdmytro wants to merge 25 commits intomasterfrom
impl
Open

cipher#1
misikdmytro wants to merge 25 commits intomasterfrom
impl

Conversation

@misikdmytro
Copy link
Copy Markdown
Owner

@misikdmytro misikdmytro commented Apr 8, 2026

Cipher — Secret Rotation Service

Distributed secret rotation service that automates AWS Secrets Manager credential lifecycle: scheduled rotation, blue/green secret strategy, and webhook-based event notifications.

Architecture

Four services in a Cargo workspace (crates/):

Service Role Persistence
API HTTP REST facade (Actix-web). CRUD for secrets metadata, schedules, webhooks. OpenAPI docs via utoipa. PostgreSQL (sqlx)
Scheduler Cron-based orchestrator. Triggers rotations by publishing rotation.scheduled events. PostgreSQL (sqlx)
Rotator Stateless worker. Executes rotation against AWS Secrets Manager (blue/green strategy, STS AssumeRole). Publishes rotation.started / done / failed.
Notificator Event consumer. Delivers webhook notifications on rotation lifecycle events.

Shared domain and proto crates for domain types and gRPC definitions.

Communication

  • Sync: gRPC (tonic) — API ↔ Scheduler, API ↔ Rotator, API ↔ Notificator, Rotator → API
  • Async: RabbitMQ / AMQP (lapin) — event-driven pipeline for rotation lifecycle

Key features

  • Full CRUD API for secrets, schedules, and webhooks
  • Scheduled and manual secret rotation
  • Blue/green rotation strategy for zero-downtime credential rollover
  • AWS STS AssumeRole support for cross-account rotation
  • Request validation and error handling middleware
  • Structured logging and tracing configuration
  • Integration tests
  • OpenAPI documentation

Stack

actix-web, tonic (gRPC + protobuf), sqlx (PostgreSQL), lapin (RabbitMQ), aws-sdk-secretsmanager, aws-sdk-sts, tokio, tracing, utoipa, serde

.pool_max_idle_per_host(10)
.redirect(reqwest::redirect::Policy::none())
.build()
.expect("failed to build HTTP client");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't test code, it runs during service startup. If something goes wrong with TLS config or the system SSL stack, the whole service panics with no useful error. Return a Result from new_webhook_delivery_service() and let main() decide whether to panic.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, failing to create the HTTP client here is an unrecoverable error for this application. Since the service's primary responsibility is making HTTP requests, we don't have a meaningful fallback strategy besides logging and exiting. In my opinion, panicking in this context is an acceptable and effectively the only viable approach. Whether we trigger the panic directly within the constructor or propagate an error to main() is a debatable point, but since the service cannot function without this client, the final outcome remains the same.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but if this method provides the functionality to create the client, then maybe it's not the method that should decide whether to panic now. But in general yes, the question is debatable

)]
pub(in crate::handlers) async fn delete_webhook(
State(state): State<Arc<AppState>>,
Path((_secret_id, webhook_id)): Path<(uuid::Uuid, uuid::Uuid)>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret_id is thrown away, deletion only uses webhook_id. This means DELETE /secrets/A/webhooks/1 will happily delete a webhook that belongs to secret B. That breaks REST semantics and is a potential authorization issue

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just change the endpoint and remove the path to the secrets resource

Comment on lines +15 to +24
impl Default for RabbitMqConfig {
fn default() -> Self {
Self {
host: "localhost".to_string(),
port: 5672,
username: "cipher".to_string(),
password: "cipher".to_string(),
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having username: "cipher", password: "cipher" in Default is fine for local dev, but it's a footgun. If someone accidentally constructs a config via Default::default() instead of loading it from file, the service silently connects with hardcoded creds. I suggest removing Default and force everyone to go through load_config.

Copy link
Copy Markdown

@pokhanto pokhanto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Monumental work!
Everything feels very well thought through and in place - it is real pleasure to read and navigate in this code. And more than that, this solves real world problem in a very practical way.

Copy link
Copy Markdown

@YevhenHrynov YevhenHrynov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid PR. I’m impressed with how you handled the service boundaries - it actually makes sense as a system, which is rare.

Code reads well, and the type-driven error handling is a nice touch. I spent some time looking for more issues but didn't find anything worth mentioning.

The things I flagged above are worth addressing but none of them are blockers. Solid foundation, nice job.

Comment on lines +58 to +66
}))
}
Err(ServiceError::NotFound) => Err(Status::not_found("secret not found")),
Err(ServiceError::PersistenceError(e)) => {
Err(Status::internal(format!("database error: {e}")))
}
Err(e) => Err(Status::internal(format!("error: {e}"))),
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider extracting the error mapping logic into an impl From<ServiceError> for tonic::Status. This will significantly declutter the handler.

Additionally, be mindful of returning raw string representations of errors (like PersistenceError and the catch-all arm) directly to the client. You should establish a clear boundary indicating which errors are safe to expose with informative messages, and which should obscure internal implementation details to prevent data leakage - even within internal service-to-service communication.

Comment on lines +10 to +14
#[derive(Serialize, ToSchema)]
pub struct PaginatedResponse<T: Serialize + utoipa::ToSchema> {
pub items: Vec<T>,
pub total: u64,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The utoipa::ToSchema trait bound on the generic type T doesn't appear to serve a functional purpose in this struct definition. I'd recommend reviewing whether it's strictly necessary here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be needed because of the items (not sure though)

Comment on lines +45 to +51
match body.validate() {
Ok(_) => (),
Err(e) => {
let error_response: ErrorResponse = e.into();
return (StatusCode::BAD_REQUEST, Json(error_response)).into_response();
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a perfect use case for an if let Err(e) = body.validate() statement rather than a full match block.

Comment on lines +103 to +108
pub(in crate::handlers) async fn list_secrets(
State(state): State<Arc<AppState>>,
Query(params): Query<PaginationParams>,
) -> impl IntoResponse {
let limit = params.limit.unwrap_or(20).clamp(1, 100);
let offset = params.offset.unwrap_or(0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default page size and the pagination bounds are currently hardcoded. It would be much better to make these values configurable, or at the very least, extract them into named constants.

Comment on lines +194 to +225
pub(in crate::handlers) async fn activate_blue_green(
State(state): State<Arc<AppState>>,
Path(id): Path<uuid::Uuid>,
) -> impl IntoResponse {
match state.secrets_service.activate_blue_green(id).await {
Ok(new_slot) => {
let active_slot = match new_slot {
ActiveSlot::Blue => "blue".to_string(),
ActiveSlot::Green => "green".to_string(),
};
(
StatusCode::OK,
Json(ActivateBlueGreenResponse { active_slot }),
)
.into_response()
}
Err(ServiceError::NotFound) => (
StatusCode::NOT_FOUND,
Json(ErrorResponse {
message: "Secret not found".to_string(),
}),
)
.into_response(),
Err(ServiceError::InvalidOperation(msg)) => (
StatusCode::BAD_REQUEST,
Json(ErrorResponse { message: msg }),
)
.into_response(),
Err(e) => {
tracing::error!(error = ?e, "Unexpected error in activate_blue_green");
(
StatusCode::INTERNAL_SERVER_ERROR,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my previous note, the handler is heavily cluttered with explicit error mapping boilerplate. Ideally, handlers should focus purely on request orchestration. You can achieve this by implementing IntoResponse for your custom ServiceError type.

Comment on lines +41 to +71
impl From<SecretRow> for Secret {
fn from(row: SecretRow) -> Self {
let strategy = match row.strategy_type.as_str() {
"single" => StrategyConfig::Single(SingleStrategyConfig {
path: row
.single_path
.expect("single_path must be set for single strategy"),
}),
"blue_green" => StrategyConfig::BlueGreen(BlueGreenStrategyConfig {
blue_path: row
.bg_blue_path
.expect("bg_blue_path must be set for blue_green strategy"),
green_path: row
.bg_green_path
.expect("bg_green_path must be set for blue_green strategy"),
active_slot: row
.bg_active_slot
.expect("bg_active_slot must be set for blue_green strategy")
.into(),
}),
other => panic!("unknown strategy_type: {other}"),
};

let provider = match row.provider_type.as_str() {
"aws" => ProviderConfig::Aws(AwsProviderConfig {
role_arn: row
.aws_role_arn
.expect("aws_role_arn must be set for aws provider"),
}),
other => panic!("unknown provider_type: {other}"),
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the rationale behind using .expect() and panic!() here, I strongly recommend implementing TryFrom instead of From, propagating the errors upward. This prevents the entire application from panicking in the event of malformed, unexpected, or missing data in the database.

Comment on lines +100 to +109
let provider_type = match &request.provider {
ProviderConfig::Aws(_) => "aws",
};
let aws_role_arn = match &request.provider {
ProviderConfig::Aws(aws) => aws.role_arn.clone(),
};
let strategy_type = match &request.strategy {
StrategyConfig::Single(_) => "single",
StrategyConfig::BlueGreen(_) => "blue_green",
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to map enum variants to their string representations, it would be cleaner to encapsulate this logic within the enum itself. Consider implementing standard traits like Display, AsRef<str>, or simply adding a .as_str() method to the enums instead of matching inline.

Comment on lines +7 to +13
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RabbitMqConfig {
pub host: String,
pub port: u16,
pub username: String,
pub password: String,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Structs containing sensitive data (like passwords, API keys, etc.) generally should not implement Default. Furthermore, since this struct derives Debug, I highly recommend wrapping the sensitive fields in secrecy::SecretString.

Comment on lines +58 to +66
impl DatabaseConfig {
pub fn connection_string(&self) -> String {
let ssl_mode = if self.use_ssl { "require" } else { "disable" };
format!(
"postgres://{}:{}@{}:{}/{}?sslmode={}",
self.username, self.password, self.host, self.port, self.database, ssl_mode
)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using sqlx::postgres::PgConnectOptions to programmatically build the connection configuration rather than manually formatting a plaintext connection string. It is safer, less error-prone, and handles URL encoding out of the box.

Copy link
Copy Markdown

@nikitosikvn1 nikitosikvn1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, Dmytro! I really like how the project is shaping up, it’s clear you’ve invested a lot of time and effort into this. I’ve left a few minor comments, but they’re more 'food for thought' rather than critical issues.

if !response.status().is_success() {
return Err(DeliveryError::StatusError(response.status()));
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since treating 4xx and 5xx status codes as errors is a standard requirement, you can simplify this logic by using the built-in error_for_status() method.

Comment on lines +16 to +29
pub async fn serve(state: Arc<AppState>, shutdown: CancellationToken) -> Result<()> {
let config = ConsumerConfig {
queue_name: "notificator.rotation.events".to_string(),
consumer_tag: "notificator".to_string(),
exchange: "rotation".to_string(),
routing_keys: vec![
"rotation.started".to_string(),
"rotation.done".to_string(),
"rotation.ready".to_string(),
"rotation.failed".to_string(),
],
dead_letter_exchange: Some("rotation.dlx".to_string()),
dead_letter_routing_key: Some("notificator.rotation.events".to_string()),
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider extracting these parameters (queue name, exchange, and routing keys) into a configuration file or environment variables.

Copy link
Copy Markdown

@bzadorozhnyi bzadorozhnyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive and interesting project!

Nit: I noticed several functions with around six levels of indentation. It would be helpful to extract some of that logic into smaller helper functions.

pub id: Uuid,
pub strategy: StrategyConfig,
pub provider: ProviderConfig,
pub created_at: chrono::NaiveDateTime,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utc might be useful here? not sure what time is stored here (ui / api server / cloud / etc)

fn default() -> Self {
Self {
username: "cipher".to_string(),
password: "cipher".to_string(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also recommend move secrets outside the commitable codebase

Comment thread crates/common/src/lib.rs
pub mod shutdown;

#[macro_export]
macro_rules! repo_error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice :)

.webhooks_repository
.delete_webhook(webhook_id)
.await
.map_err(|e| Status::internal(format!("failed to delete webhook: {e}")))?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: implementing From<> for Status would make the whole file cleaner (using just ? instead of map_err)

pub id: Uuid,
pub secret_id: Uuid,
pub url: String,
pub created_at: chrono::NaiveDateTime,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook handler most probably would like to know the timezone :)

let max_attempts = 3;
let mut last_error = None;

for attempt in 0..max_attempts {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving retry outside the service (for instance - enqueue a message for each attempt) - to be able to retry even after service crash

Copy link
Copy Markdown

@vol-lev vol-lev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first pleasant experience reviewing such a huge PR! Great work with microservices! Resilient approach with rabbitmq (and ack-ed messages), nice trait usage, consistent error handling approach - good technical job and nice product! Can't wait for the release :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants