Rewriting scaling documentation

stellar · Jun 7, 2024 · 4b4b611
1 parent 4c25859
Showing 11 changed files with 35 additions and 18 deletions.
diff --git a/network/horizon/admin-guide/prerequisites.mdx b/network/horizon/admin-guide/prerequisites.mdx
@@ -34,7 +34,7 @@ These specifications assume a 30-day retention window for data storage. For a lo
 
 ## Multiple Instance Deployment
 
-To achieve high availability, redundancy, and high throughput, explore the [scaling](./scaling.mdx) strategy. It provides detailed prerequisites and guidelines to determine the appropriate [number of Horizon instances](./configuring.mdx#multiple-instance-deployment) to deploy.
+To achieve high availability, redundancy, and high throughput, refer to the [scaling](./scaling.mdx) documentation. It provides an overview of several deployment strategies you can employ, depending on the SLA your Horizon deployment needs to meet.
 
 ## Network Access
 

diff --git a/network/horizon/admin-guide/scaling.mdx b/network/horizon/admin-guide/scaling.mdx
@@ -3,38 +3,55 @@ title: Scaling
 sidebar_position: 70
 ---
 
-As alluded to in the discussion in [Prerequisites](./prerequisites.mdx), Horizon encompasses different logical tiers that can be scaled independently for high throughput, isolation, and high availability. The following components can be independently scaled:
+Horizon consists of several logical tiers that can be scaled independently to increase throughput, isolation, and availability. The following components can be independently scaled:
 
 - Web service API (serving)
 - Captive Core (ingestion and transaction submission)
-- Database (storage)
+- Database (storage) 
 
-As always, scaling encompasses a spectrum. A few common scaling architectures follow.
+## Single Instance Deployment
 
-## Single VM
+We recommend starting with a [single instance deployment](./prerequisites.mdx) and scaling up based on the needs of your particular use case.
 
-As a starting point, for development purposes or low load environments with limited history retention (e.g. a few ledger entries), a single VM would suffice.
+This [deployment](./configuring.mdx#single-instance-deployment) is intended for minimal history retention (30 days or less) and low request volume.
 
-![](/assets/horizon-scaling/Topology-1VM.png)
+In this setup, a single Horizon instance performs all three [roles](./configuring.mdx#multiple-instance-deployment): ingestion, transaction submission, and serving end-user API requests.
 
-## Low to Medium Load
+![](/assets/horizon-scaling/Topology-single.png)
 
-For low to medium load environments with up to 30-90 days of data history retention and modest API request traffic, this configuration isolates the database instance from the API service and ingestion process.
+## Scaling to Multiple Instances
 
-![](/assets/horizon-scaling/Topology-2VMs.png)
+There are a few reasons you may choose to scale to multiple instances of Horizon.
 
-## Enterprise _n_-Tier
+- Horizontal scaling lets you serve a higher volume of API requests, and serve them faster
+- Redundancy avoids downtime when an individual Horizon instance must be taken offline during an upgrade (for example, for database migrations or state rebuilds)
+- Additional instances protect against ingestion lag, which end users would otherwise experience as downtime
 
-This architecture services high request and data processing throughput with isolation and redundancy for each component. Scale the API service horizontally by adding a load balancer in front of multiple API service instances, each only limited by the database I/O limit. If necessary, use ALB routing to direct specific endpoints to specific request-serving instances, which are tied to a specific, dedicated DB. Now, if an intense endpoint gets clobbered, all other endpoints are unaffected.
+Multiple Horizon instances can be configured to point to the same database; in that case, the ingestion process does not perform redundant work.
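Horizon coordinates this internally, and its actual mechanism is not described here. As a rough mental model only, the sketch below uses hypothetical names (`SharedStore`, `try_ingest`, `run_instances`) and an in-process lock standing in for a database-level lock: several instances walk the same ledger sequence, and each ledger is recorded exactly once because an instance skips any ledger that another instance has already ingested.

```python
import threading


class SharedStore:
    """Hypothetical stand-in for the database shared by all Horizon instances."""

    def __init__(self):
        self.lock = threading.Lock()  # stands in for a database-level lock
        self.last_ingested = 0        # highest ledger sequence ingested so far
        self.ingested_by = {}         # ledger sequence -> instance that ingested it


def try_ingest(store, instance, ledger):
    """Ingest `ledger` unless another instance already did; return True if this instance did the work."""
    with store.lock:
        if ledger <= store.last_ingested:
            return False  # already ingested elsewhere: skip the redundant work
        store.ingested_by[ledger] = instance
        store.last_ingested = ledger
        return True


def run_instances(store, names, ledgers):
    """Run one thread per hypothetical instance, each attempting every ledger in order."""
    def worker(name):
        for seq in ledgers:
            try_ingest(store, name, seq)

    threads = [threading.Thread(target=worker, args=(name,)) for name in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because every instance attempts ledgers in increasing order and the shared counter only moves forward, no ledger is skipped and none is ingested twice, regardless of how the threads interleave.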
 
-Database instances can be scaled when the I/O limit is reached by using read-only replicated copies that stay in sync and a read/write instance connected to Captive Core. Each DB replica can support a set of request servers to support additional horizontal scaling.
+When running multiple instances, disable Horizon's [rate limiting](../api-reference/structure/rate-limiting.mdx) and enforce rate limits in your infrastructure instead (for example, at a load balancer or API gateway). Horizon tracks rate limits in memory on each instance, so its built-in limiter cannot enforce a single limit across multiple instances.
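To see why a per-instance, in-memory limiter breaks down, here is a minimal token-bucket sketch. `TokenBucket` and `admitted` are hypothetical illustrations, not Horizon's limiter. A burst from one client is spread round-robin across instances: with one bucket per instance the client gets N times its intended allowance, while a single shared bucket (which is what an external proxy or gateway effectively provides) enforces the intended limit.

```python
class TokenBucket:
    """Minimal token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admitted(limiters, requests, now=0.0):
    """Send `requests` round-robin across `limiters` at time `now`; return how many were allowed."""
    return sum(limiters[i % len(limiters)].allow(now) for i in range(requests))


# One client bursts 300 requests; the intended limit is a burst of 100.
per_instance = [TokenBucket(rate=1, capacity=100) for _ in range(3)]  # one limiter per Horizon instance
shared = [TokenBucket(rate=1, capacity=100)]                          # one limiter in front of all instances
print(admitted(per_instance, 300))  # 300: each instance admits its own 100
print(admitted(shared, 300))        # 100: the intended limit holds
```

In a real deployment the shared state would live in the proxy or gateway layer (or a store it consults), not in any single Horizon process.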
 
-Additionally, a second Captive Core instance shares ingestion load and serves as a backup in case of an instance failure.
+![](/assets/horizon-scaling/Topology-multiple.png)
 
-![](/assets/horizon-scaling/Topology-Enterprise.png)
+## Logically Isolating Ingestion
 
-### Redundant Hot Backup
+Ingestion is the process by which new ledgers are propagated into Horizon's database. Its health is critical: performance degradations can cause Horizon to fall behind the last closed ledger, leaving your end users unaware of the current state of the network and unable to successfully submit new transactions. Any lag in ingestion would likely be considered downtime for your service.
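One way to watch for ingestion lag is to compare the latest ledger Stellar Core has seen with the latest ledger Horizon has ingested. The sketch below assumes the Horizon root endpoint (`/`) of the instance you query reports `core_latest_ledger` and `history_latest_ledger`; verify these fields against your Horizon version. `MAX_LAG_LEDGERS` and the helper names are hypothetical, and the threshold should be tuned to your SLA.

```python
import json
from urllib.request import urlopen

MAX_LAG_LEDGERS = 10  # hypothetical alert threshold; tune for your SLA


def fetch_root(horizon_url):
    """Fetch and decode the root document of a Horizon instance."""
    with urlopen(horizon_url.rstrip("/") + "/") as resp:
        return json.load(resp)


def ingestion_lag(root):
    """Ledgers between what Core has seen and what Horizon has ingested (assumed field names)."""
    return max(0, root["core_latest_ledger"] - root["history_latest_ledger"])


def is_healthy(root, max_lag=MAX_LAG_LEDGERS):
    """True if ingestion is within `max_lag` ledgers of Core."""
    return ingestion_lag(root) <= max_lag
```

A monitor could call `is_healthy(fetch_root("http://localhost:8000"))` periodically against an ingesting instance (the URL is a placeholder) and alert, or take the instance out of rotation, when it returns `False`.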
 
-The entire architecture can be replicated to a second cluster. The backup cluster can be upgraded independently or fail-overed to with no downtime. Additionally, capacity can be doubled in an emergency if needed. This is synonymous with the [Blue/Green deployment model](https://en.wikipedia.org/wiki/Blue%E2%80%93green_deployment).
+Horizon allows you to independently configure the different [roles](./configuring.mdx#multiple-instance-deployment) that it performs, including ingestion. The diagram below illustrates how you could logically separate the instances serving API requests from the instances performing ingestion, and introduce a read-only replica database to further isolate these components. This setup has several advantages:
 
-![](/assets/horizon-scaling/Topology-Enterprise-HotBackup.png)
+- Each "role" Horizon plays can be independently scaled
+- API instances are significantly lighter in hardware requirements, since they do not need to run Captive Core
+- API instances can be scaled horizontally, either statically or dynamically, based on your end users' needs
+- Ingestion and its performance are isolated from API activity, so bursts of user activity cannot degrade ingestion or cause ingestion lag
+
+The Horizon API role needs only read-only database access for everything it does. However, API instances must delegate all transaction submission requests to an instance that runs Captive Core. You can add further read-only database replicas if you need to support more requests.
+
+![](/assets/horizon-scaling/Topology-ingestion-isolation.png)
+
+## Logically Isolating Transaction Submission
+
+In the previous setup, ingestion is safely isolated from most API traffic, which has historically made up the large majority of requests. However, transaction submission must still be served by a Stellar Core instance, so API instances must pass their transaction submission requests through to an ingesting instance.
+
+The diagram below illustrates how to further isolate (and scale) transaction submission by using Stellar Core watcher instances instead of Horizon instances running Captive Core. This further protects ingestion against downtime and lag, and makes it possible to scale transaction submission horizontally, independently of the rest of the API traffic.
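At the load-balancer layer, this separation amounts to path-based routing: submission requests go to one pool, everything else to another, and each pool can be scaled on its own. The sketch below is a hypothetical round-robin router; the upstream names are placeholders, and treating `POST /transactions` as the only submission route is an assumption to check against the Horizon API version you run. Note that `GET /transactions` is a read and stays in the read pool.

```python
from itertools import cycle

# Assumption: the paths that submit transactions when called with POST.
SUBMISSION_PATHS = {"/transactions"}


class Router:
    """Round-robin router with separate upstream pools for reads and transaction submission."""

    def __init__(self, read_pool, submit_pool):
        self._read = cycle(read_pool)
        self._submit = cycle(submit_pool)

    def pick(self, method, path):
        # Ignore any query string when matching the route.
        route = path.split("?", 1)[0]
        if method.upper() == "POST" and route in SUBMISSION_PATHS:
            return next(self._submit)
        return next(self._read)


router = Router(read_pool=["api-1", "api-2"], submit_pool=["txsub-1", "txsub-2"])
print(router.pick("GET", "/ledgers"))        # api-1
print(router.pick("POST", "/transactions"))  # txsub-1
```

Growing either list scales that traffic class independently, which mirrors the motivation for isolating transaction submission.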
+
+![](/assets/horizon-scaling/Topology-txsub.png)
diff --git a/static/assets/horizon-scaling/Topology-1VM.png b/static/assets/horizon-scaling/Topology-1VM.png
diff --git a/static/assets/horizon-scaling/Topology-2VMs.png b/static/assets/horizon-scaling/Topology-2VMs.png
diff --git a/static/assets/horizon-scaling/Topology-3VMs.png b/static/assets/horizon-scaling/Topology-3VMs.png
diff --git a/static/assets/horizon-scaling/Topology-Enterprise-HotBackup.png b/static/assets/horizon-scaling/Topology-Enterprise-HotBackup.png
diff --git a/static/assets/horizon-scaling/Topology-Enterprise.png b/static/assets/horizon-scaling/Topology-Enterprise.png
diff --git a/static/assets/horizon-scaling/Topology-ingestion-isolation.png b/static/assets/horizon-scaling/Topology-ingestion-isolation.png
diff --git a/static/assets/horizon-scaling/Topology-multiple.png b/static/assets/horizon-scaling/Topology-multiple.png
diff --git a/static/assets/horizon-scaling/Topology-single.png b/static/assets/horizon-scaling/Topology-single.png
diff --git a/static/assets/horizon-scaling/Topology-txsub.png b/static/assets/horizon-scaling/Topology-txsub.png