Implement server provisioning #300
Merged
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| cluster: | ||
| api_endpoint: https://api.eg.rivet.gg | ||
| telemetry: | ||
| disabled: false | ||
| tokens: | ||
| cloud: null | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| # Autoscaling | ||
|
|
||
| The autoscaler service runs every 15 seconds. | ||
|
|
||
| ## Why memory? | ||
|
|
||
| The autoscaler uses CPU usage for GG nodes and memory usage for job nodes. This is because certain cloud providers, like Linode, do not report the actual clock speed of the CPU, only the number of cores. This is a problem because we use Nomad's API to determine the usage on any given node, and it returns its CPU stats in MHz. | ||
|
|
||
| ## Hardware failover | ||
|
|
||
| Before a job server is provisioned, we don't know for sure what its specs will be because of the hardware failover system in `cluster-server-provision`. In the autoscaling process, all servers that aren't provisioned yet are assumed to have the specs of the first hardware option in the list. | ||
|
|
||
| ### Failover has lower specs | ||
|
|
||
| In the event that the hardware which ended up being provisioned has lower specs than the first hardware in the list, the autoscaler will calculate the error between how much was expected and how much was actually provisioned. This error number corresponds to how many more servers might be needed to reach the desired server count. | ||
|
|
||
| Here is an example of the process in action: | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 2 | 2000MB | 0MB | | ||
|
|
||
| We start with 0 servers provisioned and 2 desired. Our config lists two hardware options, the first with 1000MB of memory and the second with 500MB. With our failover system, if the first one fails to provision, the second will be provisioned instead. | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 2 | 2000MB | 0MB | | ||
| | 15s | 3 | 2000MB | 1000MB | | ||
|
|
||
| After the first iteration, the autoscaler provisioned 2 servers which both ended up failing over and only providing a total of 1000MB of memory. The autoscaler then proceeds to calculate the error like so: | ||
|
|
||
| ```rust | ||
| ceil((expected - actual) / expected_memory_per_server) | ||
|
|
||
| ceil((2000 - 1000) / 1000) = 1 | ||
| ``` | ||
|
|
||
| So an extra server was added to the desired count. | ||
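The formula above can be sketched as a small integer function. This is a minimal illustration, not the autoscaler's actual code: the name `provision_error` and the choice to clamp at zero when the actual memory exceeds the expected memory are assumptions.

```rust
/// Hypothetical helper: how many extra servers are needed to make up for
/// memory that was expected but not actually provisioned.
/// All values are in MB; `expected_mb_per_server` must be non-zero.
fn provision_error(expected_mb: u64, actual_mb: u64, expected_mb_per_server: u64) -> u64 {
    assert!(expected_mb_per_server > 0, "per-server memory must be non-zero");
    // Assumption: a surplus never produces a negative error, so clamp at 0.
    let shortfall = expected_mb.saturating_sub(actual_mb);
    // Integer form of ceil(shortfall / expected_mb_per_server).
    (shortfall + expected_mb_per_server - 1) / expected_mb_per_server
}
```

With the numbers from the example, `provision_error(2000, 1000, 1000)` evaluates to 1, which is the extra server added to the desired count.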
|
|
||
| Now, if the next server to be provisioned ends up having 1000MB like it should, we will end up having the original amount of desired memory. | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 2 | 2000MB | 0MB | | ||
| | 15s | 3 | 2000MB | 1000MB | | ||
| | 30s | 3 | 2000MB | 2000MB | | ||
|
|
||
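| Here `expected` is what the 3 provisioned servers would provide if none had failed over (3 * 1000MB), so the error calculation would now be: | ||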
|
|
||
| ```rust | ||
| ceil((3000 - 2000) / 1000) = 1 | ||
| ``` | ||
|
|
||
| So the error count stays the same and we stay at 3 desired servers. | ||
|
|
||
| However, if the server provisioned was again a failover server, we would have this scenario: | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 2 | 2000MB | 0MB | | ||
| | 15s | 3 | 2000MB | 1000MB | | ||
| | 30s | 4 | 2000MB | 1500MB | | ||
|
|
||
| We end up with two extra servers to provision atop our original 2. | ||
|
|
||
| ```rust | ||
| ceil((3000 - 1500) / 1000) = 2 | ||
| ``` | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 2 | 2000MB | 0MB | | ||
| | 15s | 3 | 2000MB | 1000MB | | ||
| | 30s | 4 | 2000MB | 1500MB | | ||
| | 45s | 4 | 2000MB | 2000MB | | ||
|
|
||
| And finally we reach the desired capacity. | ||
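The whole walkthrough can be replayed with a short sketch. It assumes, based on the numbers in the calculations above, that `expected` is the number of provisioned servers multiplied by the memory of the first hardware option, and that the desired count is the original count plus the error. The function name `desired_count` is hypothetical.

```rust
/// Hypothetical replay of one autoscaler iteration.
/// `provisioned_mb` lists the actual memory (MB) of each provisioned server.
fn desired_count(base_desired: u64, provisioned_mb: &[u64], expected_mb_per_server: u64) -> u64 {
    assert!(expected_mb_per_server > 0, "per-server memory must be non-zero");
    // Assumption: every provisioned server was expected to match the first hardware option.
    let expected = provisioned_mb.len() as u64 * expected_mb_per_server;
    let actual: u64 = provisioned_mb.iter().sum();
    // Shortfall is clamped at 0 so higher-spec hardware never lowers the count.
    let shortfall = expected.saturating_sub(actual);
    let error = (shortfall + expected_mb_per_server - 1) / expected_mb_per_server;
    base_desired + error
}
```

Feeding it the states from the tables (`[]`, then `[500, 500]`, then `[500, 500, 500]`, then `[500, 500, 500, 500]`) yields desired counts of 2, 3, 4, and 4.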
|
|
||
| ### Failover has higher specs | ||
|
|
||
| In the event that the failover hardware has higher specs than expected, there is no error system that reduces the desired count to account for this difference. This is because there is no direct correlation between desired count and the hardware being provisioned and destroyed. Thus, if hardware with higher than expected specs is provisioned, that extra space will not be taken into account. | ||
|
|
||
| If it were taken into account by an error system similar to the one for lower-spec failover, it would look like this: | ||
|
|
||
| | time since start | desired count | expected total memory | actual total memory | | ||
| | ---------------- | ------------- | --------------------- | ------------------- | | ||
| | 0s | 1 | 1000MB | 2000MB | | ||
|
|
||
| Error: | ||
|
|
||
| ```rust | ||
| ceil((expected - actual) / expected_memory_per_server) | ||
|
|
||
| ceil((1000 - 2000) / 1000) = -1 | ||
| ``` | ||
|
|
||
| The original desired count + error would be 0, destroying the only server and causing the capacity to drop to 0. If the higher-spec'd failover kept getting provisioned, this would end up in a loop. | ||
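To make the difference concrete, the sketch below contrasts a signed error (the hypothetical system described above) with an error clamped at zero. Clamping is one way to express "extra space is not taken into account"; it is an assumption about the behaviour, and both function names are made up for illustration.

```rust
/// Hypothetical signed variant: ceil((expected - actual) / per_server),
/// which goes negative when higher-spec hardware is provisioned.
fn signed_error(expected_mb: i64, actual_mb: i64, expected_mb_per_server: i64) -> i64 {
    assert!(expected_mb_per_server > 0, "per-server memory must be positive");
    let diff = expected_mb - actual_mb;
    let quotient = diff / expected_mb_per_server; // truncates toward zero
    let remainder = diff % expected_mb_per_server;
    // Round up only when there is a positive remainder.
    if remainder > 0 { quotient + 1 } else { quotient }
}

/// Assumed actual behaviour: a surplus is ignored, so the error never drops below 0.
fn clamped_error(expected_mb: i64, actual_mb: i64, expected_mb_per_server: i64) -> i64 {
    signed_error(expected_mb, actual_mb, expected_mb_per_server).max(0)
}
```

For the table above, `signed_error(1000, 2000, 1000)` is -1, which would bring the desired count to 0, while `clamped_error(1000, 2000, 1000)` is 0 and leaves the single server in place.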
|
|
||
| ## Job server autoscaling | ||
|
|
||
| The Nomad topology for each job server in a datacenter is fetched and the memory usage is aggregated. This value is then divided by the expected memory capacity (the capacity of the first hardware option in the config), which determines the minimum expected server count required to accommodate the current usage. Then, we add the error value (discussed above) and the margin value, which is configured in the namespace config. | ||
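As a rough sketch of that calculation: the function below assumes the division is rounded up (so that usage always fits) and that the margin is expressed as a number of servers. Neither detail is confirmed by the text above, and `job_desired_count` is a hypothetical name.

```rust
/// Hypothetical desired job server count for one datacenter.
/// `used_memory_mb` is the aggregated memory usage reported for all job servers.
fn job_desired_count(
    used_memory_mb: u64,
    expected_mb_per_server: u64,
    error: u64,
    margin: u64,
) -> u64 {
    assert!(expected_mb_per_server > 0, "per-server memory must be non-zero");
    // Assumption: round up so the minimum count can hold the current usage.
    let min_servers = (used_memory_mb + expected_mb_per_server - 1) / expected_mb_per_server;
    // Assumption: the margin is a server count, added on top with the failover error.
    min_servers + error + margin
}
```

For example, 2500MB of usage on 1000MB hardware with an error of 1 and a margin of 1 gives 3 + 1 + 1 = 5 desired servers.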
|
|
||
| ### Autoscaling via machine learning | ||
|
|
||
| Coming soon | ||
|
|
||
| ## GG server autoscaling | ||
|
|
||
| Because we do not need to be preemptive with GG servers, the autoscaling is a bit simpler. | ||
|
|
||
| - If the unused CPU (total minus current usage) is less than 20% of one server's capacity, add a server. | ||
| - If the unused CPU is more than 130% of one server's capacity, remove a server. | ||
|
|
||
| Examples: | ||
|
|
||
| ```rust | ||
| // 3 servers | ||
| total_cpu = 300 | ||
| cpu_usage = 285 | ||
|
|
||
| // result: add a server | ||
| ``` | ||
|
|
||
| ```rust | ||
| // 1 server | ||
| total_cpu = 100 | ||
| cpu_usage = 70 | ||
|
|
||
| // result: do nothing | ||
| ``` | ||
|
|
||
| ```rust | ||
| // 4 servers | ||
| total_cpu = 400 | ||
| cpu_usage = 250 | ||
|
|
||
| // result: remove a server | ||
| ``` | ||
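One reading of the rules that reproduces all three examples above treats each server as 100 units of CPU and compares the unused CPU against thresholds of 20 and 130 units. That reading is inferred from the examples rather than confirmed, and `ScaleAction` and `gg_scale_action` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum ScaleAction {
    AddServer,
    RemoveServer,
    Nothing,
}

/// Hypothetical GG scaling decision; each server contributes 100 units to `total_cpu`.
fn gg_scale_action(total_cpu: u64, cpu_usage: u64) -> ScaleAction {
    // Unused capacity across the whole pool, clamped at 0 if usage exceeds the total.
    let unused = total_cpu.saturating_sub(cpu_usage);
    if unused < 20 {
        // Less than 20% of one server left: scale up.
        ScaleAction::AddServer
    } else if unused > 130 {
        // More than 130% of one server idle: scale down.
        ScaleAction::RemoveServer
    } else {
        ScaleAction::Nothing
    }
}
```

The gap between the two thresholds means that removing one server (100 units) cannot immediately push the pool back under the scale-up threshold.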
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| # Automatic Server Provisioning | ||
|
|
||
| Server provisioning handles everything required to get servers installed and running so that game lobbies can run on them. Server provisioning occurs in the `cluster` package, and servers are automatically brought up and down to desired levels via `cluster-datacenter-scale`. | ||
|
|
||
| ## Motivation | ||
|
|
||
| Server provisioning was created to allow for quick and stateful configuration of the game server topology on Rivet. This system was also written with the intention to allow clients to choose their own hardware options and server providers. | ||
|
|
||
| In the future, an autoscaling system will be hooked up to the provisioning system to allow the system to scale up to meet spikes in demand, and scale down when load is decreased to save on costs. | ||
|
|
||
| ## Basic structure | ||
|
|
||
| There are currently three types of servers that work together to host game lobbies: | ||
|
|
||
| - ### ATS | ||
|
|
||
| ATS servers host game images via Apache Traffic Server. The caching feature provided by ATS, along with the ATS node being in the same datacenter as the Job node, allows for very quick lobby start times. | ||
|
|
||
| - ### Job | ||
|
|
||
| Job servers run Nomad which handles the orchestration of the game lobbies themselves. | ||
|
|
||
| - ### GG | ||
|
|
||
| GameGuard nodes serve as a proxy for all incoming game connections and provide DoS protection. | ||
|
|
||
| ## Why are servers in the same availability zone (aka datacenter or region)? | ||
|
|
||
| Servers are placed in the same region for two reasons: | ||
|
|
||
| 1. ### VLAN + Network Constraints | ||
|
|
||
| Servers rely on a VLAN to communicate with each other. | ||
|
|
||
| 2. ### Latency | ||
|
|
||
| Having all of the required components to run a Job server on the edge (i.e. in the same datacenter) allows for very quick lobby start times. | ||
|
|
||
| ## Prior art | ||
|
|
||
| - https://console.aiven.io/project/rivet-3143/new-service?serviceType=pg | ||
| - https://karpenter.sh/docs/concepts/nodepools/ | ||
| - Nomad autoscaler | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| # [rivet.run](http://rivet.run) DNS & TLS Configuration | ||
|
|
||
| ## Moving parts | ||
|
|
||
| #### TLS Cert | ||
|
|
||
| - Can only have 1 wildcard | ||
| - i.e. `*.lobby.{dc_id}.rivet.run` | ||
| - Takes a long time to issue | ||
| - Prone to Let's Encrypt downtime and [rate limits](https://letsencrypt.org/docs/rate-limits/) | ||
| - Nathan requested a rate limit increase for when this is needed | ||
|
|
||
| #### DNS record | ||
|
|
||
| - Must point to the IP of the datacenter we need | ||
| - i.e. `*.lobby.{dc_id}.rivet.run` goes to the GG Node for the given datacenter | ||
| - `*.rivet.run` will not work as a static DNS record because you can’t point it at a single datacenter | ||
|
|
||
| #### GG host resolution | ||
|
|
||
| - When a request hits the GG server for HTTP(S) or TCP+TLS requests, we need to be able to resolve the lobby to send it to | ||
| - This is why the lobby ID needs to be in the DNS name | ||
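A minimal sketch of that resolution step, assuming the wildcard label of `*.lobby.{dc_id}.rivet.run` carries the lobby identifier as a single DNS label. The exact label format is not specified here, so treat the layout and the name `parse_lobby_host` as assumptions.

```rust
/// Hypothetical parser: splits "{lobby_label}.lobby.{dc_id}.rivet.run"
/// into (lobby_label, dc_id). Returns None for any other host shape.
fn parse_lobby_host(host: &str) -> Option<(&str, &str)> {
    // Drop the fixed base domain first.
    let rest = host.strip_suffix(".rivet.run")?;
    // What remains should be "{lobby_label}.lobby.{dc_id}".
    let (lobby_label, dc_id) = rest.split_once(".lobby.")?;
    // Assumption: both parts are single, non-empty DNS labels.
    if lobby_label.is_empty()
        || dc_id.is_empty()
        || lobby_label.contains('.')
        || dc_id.contains('.')
    {
        return None;
    }
    Some((lobby_label, dc_id))
}
```

A GG node could use the returned label to look up the lobby to forward the request to; hosts that do not match the pattern are rejected.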
|
|
||
| #### GG autoscaling | ||
|
|
||
| - The IPs that the DNS records point to change frequently as GG nodes scale up and down | ||
|
|
||
| ## Design | ||
|
|
||
| #### DNS records | ||
|
|
||
| Dynamically create a DNS record for each GG node formatted like `*.lobby.{dc_id}.rivet.run`. Example: | ||
|
|
||
| ```bash | ||
| A *.lobby.51f3d45e-693f-4470-b86d-66980edd87ec.rivet.run 1.2.3.4 # DC foo, GG node 1 | ||
| A *.lobby.51f3d45e-693f-4470-b86d-66980edd87ec.rivet.run 5.6.7.8 # DC foo, GG node 2 | ||
| A *.lobby.51f3d45e-693f-4470-b86d-66980edd87ec.rivet.run 9.10.11.12 # DC bar, GG node 1 | ||
| ``` | ||
|
|
||
| The IPs of these records change as the GG nodes scale up and down, but the origin stays the same. | ||
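The record set for one datacenter could be derived from its current GG node IPs along these lines. This is only an illustration of the naming scheme; `gg_dns_records` is a hypothetical helper and the output mirrors the plain-text notation used in the example above rather than any DNS provider's API.

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: one wildcard A record per GG node in a datacenter.
fn gg_dns_records(dc_id: &str, gg_ips: &[Ipv4Addr]) -> Vec<String> {
    gg_ips
        .iter()
        // Every node shares the same wildcard name; only the IP differs.
        .map(|ip| format!("A *.lobby.{}.rivet.run {}", dc_id, ip))
        .collect()
}
```

Re-running this whenever GG nodes are added or removed gives the new set of records for that datacenter, while the record name itself never changes.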
|
|
||
| #### TLS certs | ||
|
|
||
| Each datacenter needs a TLS cert. For the example above, we need a TLS cert for `*.lobby.51f3d45e-693f-4470-b86d-66980edd87ec.rivet.run` and `*.lobby.51f3d45e-693f-4470-b86d-66980edd87ec.rivet.run`. | ||
|
|
||
| ## TLS | ||
|
|
||
| #### TLS cert provider | ||
|
|
||
| Currently we use Let's Encrypt as our TLS certificate provider. | ||
|
|
||
| Alternatives: | ||
|
|
||
| - ZeroSSL | ||
|
|
||
| #### TLS cert refreshing | ||
|
|
||
| Right now, the TLS certs are issued in the Terraform plan. Eventually, TLS certs should renew on their own automatically. | ||
|
|
||
| ## TLS Alternatives | ||
|
|
||
| #### Use `*.rivet.run` TLS cert with custom DNS server | ||
|
|
||
| Create a `NS` record for `*.rivet.run` pointed at our custom DNS server | ||
|
|
||
| We can use a single static TLS cert | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/fern-api/fern/main/fern.schema.json | ||
|
|
||
| imports: | ||
| localCommons: ../common.yml | ||
|
|
||
| service: | ||
| auth: true | ||
| base-path: /cluster | ||
| endpoints: | ||
| getServerIps: | ||
| path: /server_ips | ||
| method: GET | ||
| request: | ||
| name: GetServerIpsRequest | ||
| query-parameters: | ||
| server_id: optional<uuid> | ||
| pool: optional<localCommons.PoolType> | ||
| response: GetServerIpsResponse | ||
|
|
||
| types: | ||
| GetServerIpsResponse: | ||
| properties: | ||
| ips: list<string> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # yaml-language-server: $schema=https://raw.githubusercontent.com/fern-api/fern/main/fern.schema.json | ||
| types: | ||
| PoolType: | ||
| enum: | ||
| - job | ||
| - gg | ||
| - ats |
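To show how the two definitions fit together, here is a small sketch that builds the request path for `getServerIps` from the optional query parameters. The Rust names (`PoolType`, `server_ips_path`) are illustrative and not generated code; authentication and the API host are left out, and the server ID is passed as an already formatted UUID string.

```rust
/// Mirrors the `PoolType` enum from the Fern definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PoolType {
    Job,
    Gg,
    Ats,
}

impl PoolType {
    /// Wire value as listed in the enum definition.
    fn as_str(self) -> &'static str {
        match self {
            PoolType::Job => "job",
            PoolType::Gg => "gg",
            PoolType::Ats => "ats",
        }
    }
}

/// Hypothetical helper: base path `/cluster` plus endpoint path `/server_ips`,
/// with whichever optional query parameters are present.
fn server_ips_path(server_id: Option<&str>, pool: Option<PoolType>) -> String {
    let mut params: Vec<String> = Vec::new();
    if let Some(id) = server_id {
        params.push(format!("server_id={}", id));
    }
    if let Some(pool) = pool {
        params.push(format!("pool={}", pool.as_str()));
    }
    if params.is_empty() {
        "/cluster/server_ips".to_string()
    } else {
        format!("/cluster/server_ips?{}", params.join("&"))
    }
}
```

For instance, `server_ips_path(None, Some(PoolType::Gg))` produces `/cluster/server_ips?pool=gg`.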