Is solid_queue designed for distributed systems like Kubernetes?

Hi there! 👋

I've been exploring solid_queue as a potential solution for our project, and I wanted to share some observations and questions about its architecture, particularly in the context of distributed systems and Kubernetes environments.

## Context

I'm currently evaluating solid_queue for a project running on Kubernetes. Before investing time in a POC, I'd like to understand whether my assumptions about its design goals are correct.

## My understanding

I'm starting from this principle in *Designing Data-Intensive Applications*:

> "There is no single system that can satisfy all data storage, querying and processing needs. In practice, most nontrivial applications need to combine several different technologies to satisfy their requirements."

In Kubernetes environments, an application usually runs across several pods at once, and each pod can itself execute work concurrently. This led me to wonder about a few architectural aspects of solid_queue.

## Questions and observations

### Job distribution in distributed environments

In solid_queue's architecture, there isn't a mechanism to determine which specific pod will process a given job. This differs from traditional message broker patterns where:

- **Producers and consumers are separate entities**: In solid_queue, the consumer is also the producer
- **Centralized orchestration**: Message brokers centralize data and can arbitrarily assign messages to consumers
- **Durability and reliability**: By centralizing data in the broker, these systems can more easily tolerate clients that connect, disconnect, or crash
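To make the comparison concrete, here is a minimal sketch of how a database-backed queue can let several pods compete for jobs without a broker assigning them. It is not solid_queue's code: the `jobs` table, `claimed_by` column, and function names are hypothetical, and SQLite's conditional `UPDATE` stands in for the `FOR UPDATE SKIP LOCKED` clause that the solid_queue README says it uses where the database supports it.

```python
import sqlite3


def make_queue() -> sqlite3.Connection:
    """Create an in-memory jobs table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL, claimed_by TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Insert a job; any process with database access can be a producer."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    conn.commit()
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection, worker_id: str):
    """Claim the oldest unclaimed job, returning (id, payload) or None."""
    while True:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE claimed_by IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # nothing left to claim
        # The WHERE clause makes the claim conditional: if another worker
        # claimed this row first, rowcount is 0 and we try the next job.
        cur = conn.execute(
            "UPDATE jobs SET claimed_by = ? "
            "WHERE id = ? AND claimed_by IS NULL",
            (worker_id, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row
```

In this pattern no component decides which pod gets a job; whichever worker wins the conditional update processes it, and the database plays the role of the shared store.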

### Potential challenges I'm considering

**Backpressure handling**: What happens if producers send messages faster than consumers can process them? Without a centralized server to orchestrate processing, how does solid_queue handle backpressure or buffer messages?
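As a rough illustration of the backpressure question, the sketch below models backlog depth when jobs are enqueued faster than they are processed. The function name and rates are hypothetical and nothing here is measured from solid_queue; it only shows that, without a limit on producers, whatever stores the pending jobs absorbs the difference.

```python
def backlog_over_time(enqueue_rate: int, process_rate: int, ticks: int) -> list[int]:
    """Return the number of pending jobs after each tick."""
    depth = 0
    history = []
    for _ in range(ticks):
        # Pending jobs grow by the surplus of produced over consumed work,
        # and can never drop below zero.
        depth = max(0, depth + enqueue_rate - process_rate)
        history.append(depth)
    return history
```

With `backlog_over_time(5, 3, 4)` the backlog grows by two jobs per tick, while `backlog_over_time(3, 5, 3)` stays at zero because consumers keep up.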

**Fault tolerance**: What happens if pods/nodes crash or temporarily go offline? Are any messages at risk of being lost?

**Worker recovery**: What happens if a worker is killed (e.g., OOMKill)? How does the system handle worker restart?
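One common way queue systems handle a killed worker is a heartbeat: each worker records a timestamp periodically, and a supervisor releases jobs claimed by workers whose heartbeat is too old. The sketch below shows that generic pattern only; the `Registry` class, its fields, and the threshold value are hypothetical, and whether and how solid_queue does this is part of the question here.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Tracks worker heartbeats and which worker holds which job."""

    alive_threshold: float  # seconds without a heartbeat before a worker is presumed dead
    heartbeats: dict = field(default_factory=dict)  # worker_id -> last heartbeat time
    claims: dict = field(default_factory=dict)  # job_id -> worker_id

    def heartbeat(self, worker_id: str, now: float) -> None:
        self.heartbeats[worker_id] = now

    def claim(self, job_id: int, worker_id: str) -> None:
        self.claims[job_id] = worker_id

    def release_dead(self, now: float) -> list:
        """Drop stale workers and return the job ids they held."""
        dead = {
            worker
            for worker, last_seen in self.heartbeats.items()
            if now - last_seen > self.alive_threshold
        }
        released = sorted(job for job, worker in self.claims.items() if worker in dead)
        for job in released:
            del self.claims[job]  # job becomes claimable again
        for worker in dead:
            del self.heartbeats[worker]
        return released
```

Time is passed in explicitly so the behavior is easy to reason about: an OOM-killed pod simply stops calling `heartbeat`, and its jobs are released the next time `release_dead` runs after the threshold has passed.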

### Related GitHub issues

I noticed several issues that seem related to Kubernetes deployments:

- https://github.com/rails/solid_queue/issues/636
- https://github.com/rails/solid_queue/issues/591
- https://github.com/rails/solid_queue/issues/585

### My current hypothesis

It seems that solid_queue might be optimized for environments like Basecamp's, where they're not using Kubernetes. According to their blog posts, they use Kamal for deployment on bare-metal/VMs:

- https://world.hey.com/dhh/why-we-re-leaving-the-cloud-654b47e0
- https://basecamp.com/cloud-exit

> "It's kinda wild to think that it's been less than three months since we decided to scrap Kubernetes and pursue a simpler solution for the cloud exit with Kamal. And that we've already moved half of the cloud applications that need to come home!"

[Reference](https://world.hey.com/dhh/the-hardware-we-need-for-our-cloud-exit-has-arrived-99d66966)

## Alternative approaches

For comparison, systems like [Temporal](https://temporal.io/) provide centralized orchestration that addresses these distributed-system concerns:

- https://levelup.gitconnected.com/temporal-worker-architecture-and-scaling-af0c670ce6c1
- https://docs.temporal.io/evaluate/development-production-features/

## My question

**Is my assumption correct that solid_queue was primarily designed for non-distributed, single-server or small-cluster environments rather than distributed systems like Kubernetes?**

If I'm mistaken, I'd be very interested to learn about:

- Use cases where solid_queue has been successfully deployed in Kubernetes environments
- Recommended patterns or configurations for distributed deployments
- Any architectural features I might have missed that address these concerns

Thank you for your time and for creating this project! I really appreciate the work that's gone into it 👏🏼, and I'm genuinely curious to understand its design philosophy better.
