Arbitrary Counts and Lists for GameServers, SDKs and Allocation #2716

Closed
50 tasks done
markmandel opened this issue Aug 24, 2022 · 36 comments
Assignees
Labels
awaiting-maintainer Block issues from being stale/obsolete/closed kind/design Proposal discussing new features / fixes and how they should be implemented kind/feature New features for Agones

Comments

@markmandel
Member

markmandel commented Aug 24, 2022

Objective

With the recent work on Player Tracking, as well as its crossover into High Density Game Server support / re-allocation of Allocated GameServers, it seems that being able to provide arbitrary count values and/or lists of values that are tied to GameServers, much like Player Tracking values are right now, would be very useful for a wide variety of use cases.

This feature design’s aim is to replace Player Tracking with a generic way to track general counts as well as lists against a GameServer by a user-provided key, with integrated allocation, Fleet scheduling and SDK support, such that it can support the use case of player tracking as it currently stands, but also use cases like multi-tenant room server counting, or any other game specific value that could be utilised for a custom integration.

An added benefit would be that simple gauge data as metrics would be exposed as well, although we may not want to advocate this as a blessed path for only exporting metrics, if not taking advantage of other functionality.

This feature would be built behind the GameServerCountsAndLists feature gate and should be on-par with PlayerTracking before the PlayerTracking functionality is removed.

Requirements

  • Define on a GameServer a set of lists and/or counters, each attached to an arbitrary, user-supplied key
  • Out of scope: The ability to add, edit or delete GameServer keys for counters and lists at runtime. Keys should be explicitly predefined with the GameServer definition to put some limits on what can be stored against etcd and ideally avoid overloading the Kubernetes API control plane (although we will need strong documentation about this, as this will definitely put extra load on the control plane).

Counters

  • Counters can have an initial value (0 is the default).
  • Counters can have a set capacity (maximum value), which defaults to 0 (meaning the max of int64).
    • We are deliberately using the term “capacity” across both lists and counters to be consistent between the two pieces of functionality.
  • Incrementation / decrementation below 0 or above set capacity will be a no-op. I.e. No operations to increment/decrement a counter will error.
  • Counters must be >= 0
  • SDK capability to atomically get, increment, decrement and set a counter local value, which is then set to the backing GameServer CRD status.
    • Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
  • SDK capability to change the maximum value for a counter.
  • The ability to atomically increment or decrement counts on allocation
    • If a user wants to ensure there is room for the increment or decrement, that should be explicitly included in the filter options (i.e. decrement by one, but filter for counts that are > 0 so that there is something to decrement).
    • If an attempt is made to increment/decrement a counter on a GameServer that does not have the specified counter (e.g. through an allocation), the operation is ignored.
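
To make the counter rules above concrete, here is a minimal Go sketch. The `Counter` type and its methods are hypothetical names for illustration only, not part of any Agones API, and the sketch takes the "no-op" reading: an operation that would leave the range [0, capacity] is skipped rather than clamped. It also assumes a capacity of 0 means "max of int64".

```go
package main

import (
	"fmt"
	"math"
)

// Counter is a hypothetical in-memory model of one counter entry.
// Assumption: a Capacity of 0 means "no explicit limit" (max of int64).
type Counter struct {
	Count    int64
	Capacity int64
}

// limit returns the effective upper bound for the counter.
func (c *Counter) limit() int64 {
	if c.Capacity == 0 {
		return math.MaxInt64
	}
	return c.Capacity
}

// Increment adds amount only if the result stays at or below the limit.
// Otherwise the counter is left untouched and false is returned.
func (c *Counter) Increment(amount int64) bool {
	if amount < 0 || amount > c.limit()-c.Count {
		return false
	}
	c.Count += amount
	return true
}

// Decrement subtracts amount only if the result stays at or above 0.
// Otherwise the counter is left untouched and false is returned.
func (c *Counter) Decrement(amount int64) bool {
	if amount < 0 || amount > c.Count {
		return false
	}
	c.Count -= amount
	return true
}

func main() {
	rooms := Counter{Count: 1, Capacity: 3}
	fmt.Println(rooms.Increment(2), rooms.Count) // true 3
	fmt.Println(rooms.Increment(1), rooms.Count) // false 3
	fmt.Println(rooms.Decrement(5), rooms.Count) // false 3
}
```

Neither operation ever returns an error, which matches the requirement that increment/decrement never fail loudly; the boolean is only there so a caller can notice the no-op.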

Lists

  • Can set a capacity. Defaults to 1000.
  • Capacity can be no larger than 1000 items. This could possibly be expanded in the future depending on use cases and/or performance.
  • SDK capability to atomically add, remove and check if values are in a list’s local value, which is then set to the backing GameServer CRD status.
    • Note: There are race conditions we can’t avoid between SDK updates and Allocation or external updates.
  • SDK capability to change the capacity (local and backing CRD status value)
  • The ability to atomically add items to list on allocation
    • Attempts to add to a list that is at capacity, will silently fail, since all operations are asynchronous. If you need to ensure there is space for an append operation, check with filters and/or the SDK first.
    • If an attempt is made to append to a list on a GameServer that does not have the specified list (e.g. through an allocation), the operation is ignored.
  • The ability to change the capacity from an allocation
  • Lists are essentially Sets kept in insertion order, i.e. a List cannot contain more than one instance of a value. An attempt to insert a duplicate item into a List will result in a no-op.
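
As an illustration of the list rules above (insertion-ordered, no duplicates, bounded by capacity), a minimal Go sketch follows. The `List` type and method names are hypothetical and only model the behaviour in memory; they are not the proposed SDK surface.

```go
package main

import "fmt"

// List is a hypothetical in-memory model of one list entry:
// an insertion-ordered set of strings with a fixed capacity.
type List struct {
	Capacity int
	Values   []string
}

// Contains reports whether v is already in the list.
func (l *List) Contains(v string) bool {
	for _, existing := range l.Values {
		if existing == v {
			return true
		}
	}
	return false
}

// Append adds v to the end of the list. It is a no-op returning false
// when the list is at capacity or v is already present.
func (l *List) Append(v string) bool {
	if len(l.Values) >= l.Capacity || l.Contains(v) {
		return false
	}
	l.Values = append(l.Values, v)
	return true
}

// Delete removes v, preserving the order of the remaining values.
// It returns false when v was not in the list.
func (l *List) Delete(v string) bool {
	for i, existing := range l.Values {
		if existing == v {
			l.Values = append(l.Values[:i], l.Values[i+1:]...)
			return true
		}
	}
	return false
}

func main() {
	players := List{Capacity: 2}
	fmt.Println(players.Append("xe9m"))                 // true
	fmt.Println(players.Append("xe9m"))                 // false: duplicate
	fmt.Println(players.Append("9iuz"))                 // true
	fmt.Println(players.Append("x7un"))                 // false: at capacity
	fmt.Println(players.Delete("xe9m"), players.Values) // true [9iuz]
}
```

A linear scan is fine at the proposed 1000 item ceiling; a map-backed index would only be worth it if that limit were raised substantially.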

Allocation filtering and sorting

  • Allocation filter on count value (min, max)
  • Allocation filter on count available capacity (min, max)
  • Allocation filter on list available capacity (min, max)
  • Allocation filter on if single value is contained in a list
  • Allocation sorting / preference by a count value/list length, ascending or descending.
    • Packed: Within the node.
    • Distributed: Across the entire set.

Fleets scheduling

  • Fleet scale down sorting by a count value/list length, ascending or descending.
    • Packed: Within the node.
    • Distributed: Across the entire set.
  • Fleet scale down sorting
    • Packed: Within the node.
    • Distributed: Across the entire set.

Metrics

  • Expose count values and list lengths as gauge metrics, with a label for the key the count or list is set under.
  • Expose count and list capacities as gauge metrics, with a label for the key the count or list is set under.

Background

There have been a lot of discussions and issues about weighted allocation, being able to store “session room” counts to be used on allocation, sorting on Fleet scale down, and more (including on Slack).

We’ve also always had a desire to be able to set some level of metrics through Agones from a GameServer as well.

Design ideas

Configuration

GameServers

Being able to set arbitrary counts and lists on a GameServer instance.

apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
    - name: default
      portPolicy: Dynamic
      containerPort: 7654
  template:
    spec:
      containers:
        - name: simple-game-server
          image: gcr.io/agones-images/simple-game-server:0.13
  counters: # list of counters. Key value below is the key for each counter.
    rooms: # key for the counter (rooms)
      default: 1 # initial value
      capacity: 100 # maximum possible count value
  lists: # list of lists.
    players: # key for this list (players)
      capacity: 100 # maximum number of items in a list
    frogs: # key for another list (frogs), with the default 1000 item capacity

GameServer Status

This is where current count and list value and capacity are stored against the CRD. The values in the spec do not change once they have been initially declared.

status:
  # .. usual status values
  counters: # count values
    rooms: # values for key "rooms"
      count: 4 # current count for "rooms" key
      capacity: 100 # maximum value for "rooms" key
  lists: # list values
    players: # values for key "players"
      capacity: 100 # the current capacity as it has been set.
      values: # list of values set against this list
        - xe9m
        - 9iuz
    frogs: # values for key "frogs"
      values:
        - blue
        - green
        - orange

Fleets

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
 name: simple-game-server
spec:
 replicas: 2
 priorities: # which gameservers in the Fleet are most important to keep around - impacts scale down logic
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, those GameServers with the key will have priority over those that do not.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 template:
   spec:
     ports:
       - name: default
         containerPort: 7654
     template:
       spec:
         containers:
           - name: simple-game-server
             image: gcr.io/agones-images/simple-game-server:0.13
     counters: # list of counters. Key value below is the key for each counter.
       rooms: # key for counter (rooms)
         default: 1
         capacity: 100
     lists: # list of lists.
       players: # key for this list (players)
         capacity: 100 # set capacity
       frogs: # key for another list (frogs), with the default 1000 item capacity
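
To illustrate how the `priorities` entry above might drive ordering, here is a hedged Go sketch. The `GameServer` struct and `SortByCounter` function are hypothetical. The sketch only orders the slice by counter value and puts GameServers that lack the key last; which end of that ordering is treated as "most important to keep" is left to the caller, since the design comments leave that open.

```go
package main

import (
	"fmt"
	"sort"
)

// GameServer is a hypothetical, trimmed-down view of a GameServer:
// just a name and its current counter values by key.
type GameServer struct {
	Name     string
	Counters map[string]int64
}

// SortByCounter orders servers by the counter stored under key.
// Servers that do not have the key always sort after those that do.
// The sort is stable, so ties keep their existing relative order.
func SortByCounter(servers []GameServer, key string, descending bool) {
	sort.SliceStable(servers, func(i, j int) bool {
		vi, okI := servers[i].Counters[key]
		vj, okJ := servers[j].Counters[key]
		if okI != okJ {
			return okI // the server that has the key goes first
		}
		if !okI {
			return false // neither has the key: keep existing order
		}
		if descending {
			return vi > vj
		}
		return vi < vj
	})
}

func main() {
	servers := []GameServer{
		{Name: "a", Counters: map[string]int64{"rooms": 5}},
		{Name: "b", Counters: map[string]int64{}},
		{Name: "c", Counters: map[string]int64{"rooms": 2}},
	}
	SortByCounter(servers, "rooms", false)
	for _, gs := range servers {
		fmt.Print(gs.Name, " ") // c a b
	}
	fmt.Println()
}
```

Multiple priorities could be layered by comparing on the first entry and falling through to the next on ties, which a stable sort makes straightforward.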

Status

status:
 # ... usual fleet status values
 counters: # aggregate counter values
   rooms:
     total: 43 # total of count values for key "rooms"
     capacity: 100 # total capacity count in all GameServers across the fleet "rooms" key
 lists: # aggregate list values
   players:
     count: 58 # total number of list items in all GameServers across the Fleet under "players" key
     capacity: 200 # total capacity count in all GameServers across the Fleet "players" key
   frogs:
     count: 12
     capacity: 88
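
The aggregate counter values in the Fleet status above are plain sums across GameServers. A minimal Go sketch of that roll-up, using hypothetical `CounterStatus` and `CounterAggregate` types rather than the real CRD types:

```go
package main

import "fmt"

// CounterStatus is a hypothetical per-GameServer counter value.
type CounterStatus struct {
	Count    int64
	Capacity int64
}

// CounterAggregate is a hypothetical Fleet-level roll-up for one key.
type CounterAggregate struct {
	Total    int64
	Capacity int64
}

// AggregateCounters sums count and capacity per key across all servers.
// Each element of servers is the counter map of one GameServer.
func AggregateCounters(servers []map[string]CounterStatus) map[string]CounterAggregate {
	out := make(map[string]CounterAggregate)
	for _, counters := range servers {
		for key, c := range counters {
			agg := out[key]
			agg.Total += c.Count
			agg.Capacity += c.Capacity
			out[key] = agg
		}
	}
	return out
}

func main() {
	fleet := []map[string]CounterStatus{
		{"rooms": {Count: 4, Capacity: 50}},
		{"rooms": {Count: 39, Capacity: 50}},
	}
	fmt.Printf("%+v\n", AggregateCounters(fleet)["rooms"]) // {Total:43 Capacity:100}
}
```

The list aggregates would follow the same shape, summing `len(values)` and capacity per key.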

FleetAutoscaling

Count based autoscaling

apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
 name: fleet-autoscaler-count
spec:
 fleetName: fleet-example
 policy:
   type: Count # count based autoscaling
   count:
     # The key for the count value.
     key: rooms
     # Size of a buffer of counted items that are available in the Fleet.
     # it can be specified either in absolute (i.e. 5) or percentage format (i.e. 5%)
     bufferCount: 5
     # minimum aggregate count capacity that can be provided by this FleetAutoscaler.
     # if not specified, the actual minimum capacity will be bufferCount
     minCount: 10
     # maximum aggregate count capacity that can be provided by this FleetAutoscaler.
     # required
     maxCount: 100

List based autoscaling

apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-list
spec:
  fleetName: fleet-example
  policy:
    type: List # List based autoscaling.
    list:
      # The key for the list value.
      key: players
      # Size of a buffer based on the list capacity that is available over the current aggregate list length in the Fleet.
      # It can be specified either in absolute (i.e. 5) or percentage format (i.e. 5%)
      bufferLength: 5
      # minimum aggregate list capacity that can be provided by this FleetAutoscaler.
      # if not specified, the actual minimum capacity will be bufferLength
      minLength: 10
      # maximum aggregate list capacity that can be provided by this FleetAutoscaler.
      # required
      maxLength: 100

Allocations

kind: GameServerAllocation
spec:
 # Which gameservers in the selector set is most important to keep around - impacts which GameServer is checked first.
 # First item on the array of priorities is the most important for sorting.
 priorities:
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     counters: # filter on counter min and max values
       rooms: # use "rooms" key values
         min: 4 # filters on count values (optional, defaults to 0)
         max: 20 # (optional, defaults to max int)
         minAvailable: 0 # filters on the capacity left on a GameServer (optional, defaults to 0)
         maxAvailable: 99 # (optional, defaults to max int)
     lists: # filter on lists
       players:
         minAvailable: 0 # filters on the capacity left on a GameServer
         maxAvailable: 99
       frogs:
         contains: orange # filter on if this value is found in the list.
 counters: # apply an action to a counter
   rooms:
     action: increment # "increment" or "decrement" a count.
     amount: 1 # how much by. defaults to 1.
 lists: # apply an action to a list.
   players:
     append: # (optional) append these values to the list
       - x7un
       - 8inz
     capacity: 40 # (optional) change the capacity of the GameServer to this value.
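
The counter selector in the allocation above can be read as four range checks. The following Go sketch shows one way that evaluation could look. `CounterSelector`, `NewCounterSelector` and `Matches` are hypothetical names, the sketch assumes every counter has an explicit non-zero capacity, and it assumes a GameServer without the key never matches the selector.

```go
package main

import (
	"fmt"
	"math"
)

// CounterStatus is a hypothetical per-GameServer counter value.
// Assumption: Capacity is an explicit, non-zero maximum.
type CounterStatus struct {
	Count    int64
	Capacity int64
}

// CounterSelector is a hypothetical model of one counter filter.
type CounterSelector struct {
	Min          int64 // minimum count value
	Max          int64 // maximum count value
	MinAvailable int64 // minimum remaining capacity
	MaxAvailable int64 // maximum remaining capacity
}

// NewCounterSelector returns a selector with the documented defaults:
// minimums of 0 and maximums of max int.
func NewCounterSelector() CounterSelector {
	return CounterSelector{Max: math.MaxInt64, MaxAvailable: math.MaxInt64}
}

// Matches reports whether the counter under key satisfies the selector.
// A GameServer that does not have the key does not match (an assumption).
func (s CounterSelector) Matches(counters map[string]CounterStatus, key string) bool {
	c, ok := counters[key]
	if !ok {
		return false
	}
	available := c.Capacity - c.Count
	return c.Count >= s.Min && c.Count <= s.Max &&
		available >= s.MinAvailable && available <= s.MaxAvailable
}

func main() {
	sel := NewCounterSelector()
	sel.Min, sel.Max = 4, 20
	sel.MaxAvailable = 99

	gs := map[string]CounterStatus{"rooms": {Count: 4, Capacity: 100}}
	fmt.Println(sel.Matches(gs, "rooms")) // true
	fmt.Println(sel.Matches(gs, "frogs")) // false
}
```

The list `minAvailable`/`maxAvailable` filters would work the same way with `capacity - len(values)`, and `contains` is a membership check.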

SDK

The SDK will batch operations every 1 second for performance reasons, but changes made through the SDK will be atomically accurate through the SDK. Changes made through Allocation or the Kubernetes API will be eventually consistent when coming back to the SDK.
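
One way to picture the one-second batching is as a small accumulator that coalesces counter changes locally and hands back the net change per key on each flush. This is only a sketch under that assumption; `CounterBatcher` is a hypothetical type, and the real SDK server may batch quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// CounterBatcher is a hypothetical accumulator for counter changes.
// Changes are coalesced per key until the next Flush.
type CounterBatcher struct {
	mu      sync.Mutex
	pending map[string]int64
}

// NewCounterBatcher returns an empty batcher.
func NewCounterBatcher() *CounterBatcher {
	return &CounterBatcher{pending: make(map[string]int64)}
}

// Add records a positive or negative change for key.
func (b *CounterBatcher) Add(key string, delta int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.pending[key] += delta
}

// Flush returns the net change per key since the last Flush and resets
// the batch. A caller would invoke this on a timer (for example a
// time.Ticker firing every second) and apply the result to the CRD.
func (b *CounterBatcher) Flush() map[string]int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := b.pending
	b.pending = make(map[string]int64)
	return out
}

func main() {
	b := NewCounterBatcher()
	b.Add("rooms", 1)
	b.Add("rooms", 1)
	b.Add("rooms", -1)
	fmt.Println(b.Flush()) // map[rooms:1]
	fmt.Println(b.Flush()) // map[]
}
```

Because the flush applies a net delta against whatever the CRD holds at that moment, a concurrent Allocation update in between is exactly where the eventual-consistency caveat above comes from.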

Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.

Counter

All functions will error if the key was not predefined in the GameServer resource on creation.

Alpha().CountGet(key): integer

Returns the current count under the provided key.

Alpha().CountIncrement(key, amount): boolean

Increment a counter by a given amount. Will max at max(int64).

Will execute the increment operation against the current CRD value.

Returns false if the count is at the current capacity (to the latest knowledge of the SDK), and no increment will occur.

Note: A potential race condition here is that if count values are set from both the SDK and through the K8s API (Allocation or otherwise), since the SDK increment operation back to the CRD value is batched and asynchronous, any value incremented past the capacity will get silently truncated.

Alpha().CountDecrement(key, amount): boolean

Decrements the current count by the provided amount. Will not go below 0.

Will execute the decrement operation against the current CRD value.

Returns false if the count is at 0 (to the latest knowledge of the SDK), and no decrement will occur.

Alpha().CountSet(key, amount)

Sets a count at a given value. Use with care, as this will overwrite any previous invocations’ value.

Alpha().CountSetCapacity(key, capacity)

Update the capacity for a given count. A capacity of 0 is no capacity.

Alpha().CountGetCapacity(key): integer

Get the current capacity for this specific count.

Lists

All functions will error if the key was not predefined in the GameServer resource on creation.

Alpha().ListAppend(key, value): boolean

Appends the provided value to the list. If the list is already at capacity, it will return an error.

Will retrieve the current CRD value before executing the append operation.

Returns false, if the value already exists in the list, or if the list is already at capacity (to the latest knowledge of the SDK).

Note: A potential race condition here is that if list values are set from both the SDK and through the K8s API (Allocation or otherwise), since the SDK append operation back to the CRD value is batched and asynchronous, any value appended past the capacity will get silently truncated.

Alpha().ListDelete(key, value): boolean

Delete the specified value from the list.

Returns false if the value is not found in the list (to the latest knowledge of the SDK).

Alpha().ListSetCapacity(key, capacity)

Update the capacity for a given list. Capacity must be between 1 and 1000.

Alpha().ListGetCapacity(key): integer

Get the current capacity for this specific list.

Alpha().ListContains(key, value): boolean

Returns true if the given list contains a provided value.

Alpha().ListLength(key): integer

Returns the current length of the given list.

Alpha().ListGet(key): []string

Returns the contents of the given list.

Metrics

Metrics should be exported, using the key that the metric is stored under as a label on the metrics, in aggregate across all GameServers, giving us the ability to export basic numeric values as gauge metrics.

The Fleet name is attached as a label to each metric.

Counters

Total of all counters on all GameServers, by key

agones_gameservers_counter_total[key=${key}]

Total count capacity of all GameServers, by key

agones_gameservers_counter_capacity_total[key=${key}]

Lists

Total number of items in each list, by key of all GameServers

agones_gameservers_list_length_total[key=${key}]

Total list capacity of all GameServers, by key

agones_gameservers_list_capacity_total[key=${key}]

Dashboards

Since we are using labels, we can create some generic dashboards with dropdowns for each fleet, and names for counts and lists.

Critical User Journeys

Some high level summaries for some user journeys that could be utilised with this new functionality.

Player Tracking

Player tracking could be implemented in essentially the same way that is possible now, but we could also take an approach that could reserve player connections at allocation time.

An end user could now add a player at allocation time to the GameServer, blocking that space for the player. A gameserver binary could watch for that addition, then wait a determined amount of time before removing it from a “players” list if that player has not yet connected.

For example:

kind: GameServerAllocation
spec:
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       players:
         minAvailable: 0 # filters on the capacity left on a GameServer
         maxAvailable: 99
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 lists: # apply an action to a list.
   players:
     append: # (optional) append these values to the list
       - x7un

Room based High Density Game Servers

This could now be handled as an integer value as a count, or as a list with individual room ids.

A count based Allocation could look something like:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set is most important to keep around - impacts which GameServer is checked first.
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     counters: # filter on counter min and max values
       rooms: # one room available, against capacity
         minAvailable: 1
         maxAvailable: 1
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 counters: # apply an action to a counter
   rooms:
     action: increment # "increment" or "decrement" a count.
     amount: 1 # how much by. defaults to 1.

This would prioritise allocation to servers that have more rooms currently running, and increment the value of the room count at allocation time, which could be picked up on by SDK.WatchGameServer().

A list based Allocation could look something like:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set is most important to keep around - impacts which GameServer is checked first.
   - type: list # whether a count or a list. List uses the length as the value, count the current count value.
     key: rooms # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # default is "ascending" so bigger number is better. "descending" would be "smaller number is better".
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       rooms:
         minAvailable: 1 # 1 room available, please
         maxAvailable: 1
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready
 lists: # apply an action to a list.
   rooms:
     append: # (optional) append these values to the list
       - x7un

If you then wanted to allocate to the GameServer with the specific Room session, you could do the following:

kind: GameServerAllocation
spec:
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     lists: # filter on lists
       rooms:
         contains: x7un # filter on if this value is found in the list.
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready

Note: An end user could still use the “label locking” method for high density game servers as well / still. This just provides another way to solve the same problem that may be more applicable for some use cases.

Game Specific Weight allocation

With this new functionality, if you wanted to prioritise Allocation based on how many blueberries were available in your game server (or any arbitrary thing), you could now do this as well. I’ve had conversations with people on how to preferentially “Allocate to the most interesting GameServer” - this would allow you to do exactly that, through an arbitrary counter tracked at the GameServer level.

For example:

kind: GameServerAllocation
spec:
 priorities: # which gameservers in the selector set is most important to keep around - impacts which GameServer is checked first.
   - type: count # whether a count or a list. List uses the length as the value, count the current count value.
     key: blueberries # The key to grab data from. If not found on the GameServer, has no impact.
     order: ascending # more blueberries, is better
 selectors:
   - matchLabels:
         agones.dev/fleet: simple-game-server
     gameServerState: Allocated
   - matchLabels:
       agones.dev/fleet: simple-game-server
     gameServerState: Ready

The blueberries key would then be incremented and decremented with Alpha().CountIncrement(key, amount) and Alpha().CountDecrement(key, amount) as needed from within the game server binary.

Alternatives considered

We could continue having specific integrations for each specific use case -- much like we did for player tracking. Personally, this is what often dissuaded me from adding more specific solutions to specific problems in many of the tickets above -- their specificity. i.e. “This solution works for this specific problem”. I personally prefer more generic solutions that can power a wide multitude of solutions. I genuinely believe that Agones’ power comes from its configurability and flexibility. That tradeoff does come with a higher cost for integration and greater overall complexity of the stack, but I don’t think the project would be as successful as it is without that flexibility.

I think the difference in player tracking was that it felt generic “enough” across use cases that it made sense. But I think this new approach is even more generic in its approach, and allows for a much wider set of use cases (probably ones we haven’t thought of yet), without need to build out yet another CRD and SDK implementation, and without sacrificing capability (in fact I think it adds capability). Which is also why I’m quite excited about it.


Work Items

List of individual work items on this design, so it doesn't seem so overwhelming 😃

API Surfaces

This is not implementation, this is creating placeholders for data, CRD structures, proto API definitions, and stubs for SDK methods.

  • Feature Flag creation
  • CRD Updates
    • GameServer CRD updates
    • GameServerSet CRD updates
    • Fleet CRD updates
    • FleetAutoscaling CRD updates
    • GameServerAllocation CRD Updates
  • .proto updates
    • Allocation .proto updates
    • Alpha SDK .proto updates and stub methods on SdkServer

Implementation

Building functionality on top of the API surfaces that have been built out above.

  • Defaults
    • Defaults for counts on GameServerSpec
    • Defaults for lists on GameServerSpec
    • Population of GameServer -> Status on creation
  • Validation
    • Validation for counts on GameServerSpec
    • Validation for lists on GameServerSpec
  • Fleets
    • Fleet status aggregate values (also with GameServerSet)
    • Fleet scale down prioritisation
  • Autoscaling
    • FleetAutoscaling based on a count
    • FleetAutoscaling based on a list
  • GameServerAllocation
    • Conversion from .proto allocation to a GameServerAllocation
    • GameServer selection prioritisation
    • Allocation filtering on counts
    • Allocation filtering on lists
    • Allocation actions on counts (increment / decrement)
    • Allocation actions on lists (append)
    • Allocation change capacity on counts
    • Allocation change capacity on lists
  • SDK Implementation
    • Write Go SDK stubs for Count functions
    • Write Go SDK stubs for List functions
    • Update Go simple-game-server to have commands for Count and List SDK methods
    • Implement Count functions in SDKServer, and write e2e tests
    • Implement List functions in SDKServer, and write e2e tests
    • Write SDK conformance tests for Go SDK
  • Metrics
    • Expose metrics
    • (Optional) Create a generic dashboard based on the labels we use with our metrics.
  • Other language SDKs
    • Rust SDK implementation and conformance tests
    • C# SDK implementation and conformance tests
    • node.js SDK implementation and conformance tests
    • REST conformance tests
    • CPP implementation and conformance tests
    • Unity implementation and conformance tests
    • Unreal implementation and conformance tests #3651
@markmandel markmandel added kind/feature New features for Agones kind/design Proposal discussing new features / fixes and how they should be implemented labels Aug 24, 2022
@markmandel
Member Author

Calling on people I think might find this interesting, since this is a big idea 😄 : @tenevdev , @highlyunavailable , @neuecc , @castaneai , @sisso , @issotina , @foxydevloper

@markmandel markmandel changed the title GameServer: Arbitrary Counts and Lists Arbitrary Counts and Lists for GameServers, SDKs and Allocation Aug 25, 2022
@roberthbailey
Member

This is on the agenda for the community meeting tomorrow so if you have opinions / want to discuss with real time feedback we would love to see you there.

@markmandel
Member Author

I just realised, I didn't add a section on Fleet Autoscaling! I'll amend that shortly.

@katsew
Contributor

katsew commented Aug 26, 2022

We are looking to implement Room based High Density Game Servers and would like to provide feedback related to this.

About Counters

Question: What if you attempt to decrement below 0? Should it silently fail at 0, or should it filter out GameServers at 0?

If it attempts to decrement the counter below 0, we would like it to filter out GameServers.
In particular, in our case, we may want it to be filtered based on whether it has capacity or not.
In other words, if it is smaller than the min or larger than the max of the counter, we would like the game server to be filtered.
This is the same idea as the lists minAvailable and maxAvailable.

Question: What if an allocation attempts to increment a value to a GameServer that doesn’t have the counter? Should it automatically filter to GameServers that have a counter with the provided key?

We would like to filter GameServers that have a counter with the provided key.
However, it is also strange to filter by something that is not in the selectors, so if we want to add a counters field, we may want to validate it so that the same field is also included in the selectors.

About Lists

Question: What if you attempt to add a list that is at capacity? Does it silently fail, does it filter out any GameServers that don’t have room? Something else?

We would like it to filter out any GameServers that don’t have room.
In our case, we would like to be able to filter out game servers based on whether or not they are at capacity at the time of allocation, so that we can manage room capacity for the same game server with a high degree of accuracy.

Question: What if an allocation attempts to append a value to a GameServer that doesn’t have the list? Should it automatically filter to GameServers that have a list with the provided key?

Same as counters, we want it to filter by the GameServer that has the key.

About SDK

Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.

I didn’t understand this question, could you please elaborate?

About Critical User Journeys

In the Room based High Density Game Servers example, the StateAllocationFilter is not used.
I think this implies that if we use counters or lists as the selector, the search will also include GameServers in the Allocated state.
However, since the advantage of this feature is its high flexibility, I felt that it would be better not to infer the state of GameServer just because counters and lists are used, so that the flexibility would not be sacrificed.

@markmandel
Member Author

Just dropped several edits to remove some questions based on the above and internal feedback. PTAL.

Summary:

  • General:
    • Make note that this isn't a metrics service, but the usage of metrics is a byproduct of the functionality.
    • Little clarification tweaks across the document.
  • Counters
    • Made a decision on what to do with counters that overflow below 0, or greater than max(int64)
    • Also made a decision that the user must be explicit on filtering for room to increment or decrement a counter
  • Allocations
    • Fixed that I totally missed gameServerState in the allocations. Reworked the examples.

To see a diff, use the edit history button.

@katsew to respond to your questions directly:

If it attempts to decrement the counter below 0, we would like it to filter out GameServers.
We would like to filter GameServers that have a counter with the provided key.

As per above, you would need to explicitly tell the allocation with min and/or max counter filter values on the allocation selectors.

However, it is also strange to filter by something that is not in the selectors, so if we want to add a counters field, we may want to validate it so that the same field is also included in the selectors.

I'm thinking that there might be an implicit filtering there (i.e. if you attempt to allocate on counter "foo" and it doesn't exist on the GameServer, the system would attempt to increment, fail, and then move on to another GameServer, or it may be smart enough to pre-check). This seems reasonable. If an allocation can't perform a list/counter action on a GameServer, then the GameServer can't be moved to Allocated - hence it would get skipped. I'll write some words to this effect.

We would like it to filter out any GameServers that don’t have room.

You can choose if you want to use a list with a capacity, or just have a counter that lists how many rooms are left. It's up to you.

Same as counters, we want it to filter by the GameServer that has the key.

In that case, a list of room id tokens with a capacity seems like the appropriate choice for your use case, since you can filter on a room id within allocated game servers with this functionality.

Question: In PlayerTracking, we told users to either use the K8s API or use the SDK commands. Can we do that here? Should we do that here? I’d like to avoid it with the strategy written above.

Ah - this is an interesting point if you aren't aware of how the internals of K8s work.

Essentially everything in k8s is eventually consistent, and therefore so is Agones. It allows the entire system to be self healing even if the control plane goes down for a time.

SDK commands are async (they go into a queue once the SDK command has been fired). With this functionality, it's also entirely possible for an Allocation or a K8s API command to change a list or counter value at the same time. So if people are doing both, it's entirely likely that the count/list values in an SDK will be out of sync with what's in a CRD, and vice versa, because everything is eventually consistent.

For Player Tracking, we told people "pick one path, so you don't have this issue". Here we are giving people lots of different options, and we'll need to be very explicit about what each of the tradeoffs are so that unexpected issues don't arise for end users.

Did that make a certain amount of sense?

In the Room based High Density Game Servers example, the StateAllocationFilter is not used.

OMG. I totally missed that I didn't add those.

That functionality definitely would still work, I just wasn't thinking and forgot to add it 🤦🏻 thanks for the excellent catch!

Please let me know if any of that didn't make any sense.

@katsew
Contributor

katsew commented Aug 29, 2022

For Player Tracking, we told people "pick one path, so you don't have this issue". Here we are giving people lots of different options, and we'll need to be very explicit about what each of the tradeoffs are so that unexpected issues don't arise for end users.
Did that make a certain amount of sense?

Yes, totally made sense, thank you.
This is what I really must care about within the implementation, so it's nice to have notes about it 😌

@markmandel
Member Author

Was chatting with @roberthbailey, and he raised an interesting point.

In Player Tracking, Lists were essentially treated as Sets (i.e. every value was unique in the List). If you added a playerId that had previously been added to the set, it was treated as a no-op.

Do we do that here as well with Lists (maybe rename them to Sets?), or since we're aiming for a more generic implementation, do we allow duplicate values in a List?

🤔 or do we need to add a setting to a List, something like unique: true, to basically turn it into a set, so that it only maintains unique values?

What do people think?

@roberthbailey
Member

I hadn't thought about the fact that we treated lists of players as having to be unique, but I guess we didn't expect to have the same player join a game session twice.

If we are making the lists more generic, we should think about whether there are scenarios where having duplicate values makes sense. Also, what do duplicate values mean in terms of allocation requests? Presumably you would always end up checking for at least one occurrence of the string in the list.

@castaneai
Collaborator

@markmandel Hi, thanks for the mentions. This is an interesting proposal!
I have a couple of suggestions and concerns.

  1. The status code of the Allocator should be subdivided due to the complexity of the conditions of the allocation filter. It would be even better if the reason for failure were also exposed.
  2. As @katsew said, the issue of eventual consistency is very complex and difficult. The documentation needs to address this issue in detail. If possible, it might be a good idea to publish a demo project as an example implementation.
    Also, game developers are generally not familiar with the internals of k8s. Therefore, rather than using etcd on k8s, they may choose to implement their own game server management using an external RDB with strong consistency. We would like to make it clear in the documentation that such other choices are also available.

I am very happy to see Agones add more features to address even more use cases. Thank you!

@roberthbailey
Member

Therefore, rather than using etcd on k8s, they may choose to implement their own game server management using an external RDB with strong consistency.

I'm not sure if it should be in scope for this proposal, but it would be really interesting to see what that looks like - which parts are owned by Agones and which parts are split out. It might help us design a better solution that more seamlessly integrates with a solution that leverages an external RDB.

@katsew
Contributor

katsew commented Sep 1, 2022

Was chatting with @roberthbailey, and he raised an interesting point.

In Player Tracking, Lists were essentially treated as Sets (i.e. every value was unique in the List). If you added a playerId that had previously been added to the set, it was treated as a no-op.

Do we do that here as well with Lists (maybe rename them to Sets?), or since we're aiming for a more generic implementation, do we allow duplicate values in a List?

🤔 or do we need to add a setting to a List, something like unique: true, to basically turn it into a set, so that it only maintains unique values?

What do people think?

I think we should go with lists (allow duplicate values) to cover more scenarios than with sets.
However, our case needs to maintain unique values like Player Tracking, so it's nice to have an option to do so.
Or, maybe publishing an example implementation to deduplicate items in the list is enough.

Related to this, I'm wondering what the expected behavior of Alpha().ListDelete(key, value) is.
Does it delete all matching values for the key, or only the first matching value found in the list?

@markmandel
Member Author

Related to this, I'm wondering what the expected behavior of Alpha().ListDelete(key, value) is.
Does it delete all matching values for the key, or only the first matching value found in the list?

Oooh, that's a good question also. I think for lists, it would have to be a single value.

We could implement an Alpha().ListDelete(key, value, [one|all]) kind of option in the SDK (defaulting to "one", depending on language etc). May not need to do this for the initial release - may be better to wait and see how people want to use it? 🤔
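
To make the "one" versus "all" distinction concrete, here is a small Python sketch. The function `list_delete` and its `mode` parameter are hypothetical and only model the semantics being discussed; they are not part of any Agones SDK.

```python
from typing import List


def list_delete(values: List[str], value: str, mode: str = "one") -> bool:
    """Remove `value` from `values` in place; return True if anything was removed."""
    if mode not in ("one", "all"):
        raise ValueError("mode must be 'one' or 'all'")
    if value not in values:
        return False
    if mode == "one":
        values.remove(value)  # removes only the first occurrence
    else:
        values[:] = [v for v in values if v != value]  # removes every occurrence
    return True


rooms = ["a", "b", "a", "c"]
removed_one = list_delete(rooms, "a")         # rooms is now ["b", "a", "c"]
removed_all = list_delete(rooms, "a", "all")  # rooms is now ["b", "c"]
removed_none = list_delete(rooms, "z")        # nothing to remove
```

If lists only ever hold unique values, the two modes behave identically, which is one argument for deferring the option.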

@katsew
Contributor

katsew commented Sep 13, 2022

We could implement an Alpha().ListDelete(key, value, [one|all]) kind of option in the SDK (defaulting to "one", depending on language etc). May not need to do this for the initial release - may be better to wait and see how people want to use it? 🤔

In our case, we don't want duplicate values in the list, so it would be helpful for us to have documentation on how to deduplicate values in the list at the initial release.
Moreover, to collect feedback from potential users, it would be nice to have documentation on how to migrate from Player Tracking, or an example for Player Tracking users, since Player Tracking doesn't expect duplicate values.

@markmandel
Member Author

Moreover, to collect feedback from potential users, it would be nice to have documentation on how to migrate from Player Tracking, or an example for Player Tracking users, since Player Tracking doesn't expect duplicate values.

Yeah, that 100% makes sense - we need to make sure there is a migration path.

I'm leaning towards:

  lists: # list of lists.
    players: # key for this list (players)
      capacity: 100 # set capacity
      unique: true # this makes it work like a set

So have the unique option (maybe defaults to false? Since it's called a "list") so that you can make it work like player tracking.
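
A minimal Python model of how such a list might behave, assuming the `capacity` and `unique` fields from the snippet above. The `TrackedList` class and the boolean return value of `append` are assumptions made for illustration, not the Agones implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrackedList:
    # Hypothetical model of one entry under `lists:`.
    capacity: int
    unique: bool = False
    values: List[str] = field(default_factory=list)

    def append(self, value: str) -> bool:
        """Append value; return False when the call is a no-op."""
        if self.unique and value in self.values:
            return False  # already present: behaves like a set
        if len(self.values) >= self.capacity:
            return False  # list is full
        self.values.append(value)
        return True


players = TrackedList(capacity=100, unique=True)
first = players.append("player-1")
again = players.append("player-1")  # no-op, as in Player Tracking
```

With `unique` left at `False`, the second append would succeed and the list would hold the value twice.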

How does that sound?

@katsew
Contributor

katsew commented Sep 14, 2022

The unique option sounds good to me.

I have several thoughts:

  • The unique option should not be editable after creating the GameServer, since users know what the list is for and have no reason to change the option after defining the GameServer.
  • The SDK may return boolean value when users call Alpha().ListAppend(key, value) and Alpha().ListDelete(key, value) like Alpha().PlayerConnect(playerID) and Alpha().PlayerDisconnect(playerID) do, to cover the Player Tracking use case? 🤔
    • I think we need more feedback on this

@markmandel
Member Author

Thanks for all the great discussion! Sorry I dropped off for a bit, been focusing on another open source project for a bit.

Discussing things in the community meeting - we were discussing only allowing unique items in the List (so basically an ordered set), if we couldn't come up with a use case for having multiple of the same item in a list, to avoid implementing features that we didn't need.

Can anyone come up with a use case? If not, maybe we just drop the ability to store duplicate values in a list (should we rename it to a sortedSet / orderedSet?).

The unique option should not be editable after creating the GameServer, since users know what the list is for and have no reason to change the option after defining the GameServer.

If we do go this route, totally agreed (see comment above).

The SDK may return boolean value when users call Alpha().ListAppend(key, value) and Alpha().ListDelete(key, value) like Alpha().PlayerConnect(playerID) and Alpha().PlayerDisconnect(playerID) do, to cover the Player Tracking use case? 🤔

We can definitely do this. I had left them off since there was lots of eventual-consistency management, but we can definitely do it with whatever the SDK knows about at that point in time from itself and/or what its current information is on the CRD. I can't see any huge downside to adding this, so I'll put this on my list of things to add back in.

@katsew
Contributor

katsew commented Oct 3, 2022

Can anyone come up with a use case? If not, maybe we just drop the ability to store duplicate values in a list (should we rename it to a sortedSet / orderedSet?).

I don't have a use case for lists with duplicates, so dropping the ability to store duplicate values sounds good to me. :)
I think it's better to rename it to a sortedSet / orderedSet to match the actual behavior.

@katsew
Contributor

katsew commented Oct 3, 2022

I have a question about data manipulation on allocation.

The ability to atomically increment or decrement counts on allocation
The ability to atomically add items to list on allocation

I plan to use multiple Allocator Services in a single k8s cluster for redundancy.
When I do this, are there any race conditions with data manipulation on allocation?
For instance, ending up with duplicate values in the list (even if we use sortedSet) or incrementing a counter too many times?

@markmandel
Member Author

K8s resource modifications are generationally locked - so unless the local system has the latest generation of a resource, any update is rejected, which avoids "last-update-wins" race conditions.

You could in theory select the same GameServer in succession if after an Allocation it still matches the search criteria for Allocated GameServers - but that's an exercise for the developer to find the appropriate level of locking for their game.
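
The "stale update is rejected" behaviour can be sketched with a toy Python model of optimistic concurrency. Everything here (`Store`, `Resource`, `Conflict`, `increment`) is hypothetical and stands in for the Kubernetes API server and a client; in real Kubernetes the version check is based on the object's `metadata.resourceVersion`.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Raised when an update is based on a stale version."""


@dataclass
class Resource:
    version: int = 0
    count: int = 0


class Store:
    # Hypothetical in-memory stand-in for the Kubernetes API server.
    def __init__(self) -> None:
        self._current = Resource()

    def get(self) -> Resource:
        return Resource(self._current.version, self._current.count)

    def update(self, res: Resource) -> None:
        if res.version != self._current.version:
            raise Conflict("stale version")  # writer did not have the latest copy
        self._current = Resource(res.version + 1, res.count)


def increment(store: Store, retries: int = 5) -> int:
    """Read, modify, write; re-read and retry when the write is rejected."""
    for _ in range(retries):
        res = store.get()
        res.count += 1
        try:
            store.update(res)
            return res.count
        except Conflict:
            continue  # someone else won the race; try again from a fresh read
    raise Conflict("gave up after retries")


store = Store()
stale = store.get()       # allocator A reads version 0
increment(store)          # allocator B updates first
stale.count += 1
try:
    store.update(stale)   # allocator A's write is based on version 0
    rejected = False
except Conflict:
    rejected = True
final = increment(store)  # allocator A retries from a fresh read
```

Both increments are applied exactly once: the stale write is rejected rather than overwriting the other allocator's change.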

@markmandel
Member Author

Sorry for the long delay - was focusing on working towards a Quilkin release, and the addition of capacity to counts was tricky.

But we have updates! Would love your feedback!

Working through the changes for autoscaling led to a few changes to the design above.

Summary of changes:

  • Counters can now have a capacity (maximum). This allows for autoscaling on counters, because we can calculate the difference between the capacity and the current amount, like we do for lists.
  • Lists can only take a maximum of 1000 items to prevent overuse.
  • Added a note that lists can only contain unique values (insertion-ordered sets).
  • Determined that if you go over capacity, it’s essentially a silent no-op. If you need to ensure capacity is in place / the key exists, you need to explicitly filter for it.
  • Added sections on FleetAutoscaling for counts and lists.
  • Added back in return values from the SDK, based on the SDK's current knowledge of the GameServer counts and lists.
  • Updates to CUJs.
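
A minimal Python sketch of the counter behaviour summarised above, assuming a `Counter` with `count` and `capacity` fields. The class, the boolean return value, and the `available` helper are illustrative assumptions, not the Agones data structures.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    # Hypothetical model of one entry under `counters:`.
    count: int = 0
    capacity: int = 0

    def increment(self, amount: int = 1) -> bool:
        if self.count + amount > self.capacity:
            return False  # over capacity: nothing changes and no error is raised
        self.count += amount
        return True

    def available(self) -> int:
        # The difference an autoscaler could use: capacity minus current amount.
        return self.capacity - self.count


rooms = Counter(count=3, capacity=4)
ok = rooms.increment()    # 3 -> 4
over = rooms.increment()  # would exceed capacity, so it is a no-op

fleet = [Counter(1, 4), Counter(4, 4), Counter(0, 4)]
fleet_available = sum(c.available() for c in fleet)
```

Summing `available()` across a Fleet gives the kind of aggregate an autoscaler could compare against a buffer size.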

Questions:

  • We’ve used “capacity” across both lists and counters for consistency. Is this easy to understand?

@katsew
Contributor

katsew commented Oct 21, 2022

Thanks for the updates!

We’ve used “capacity” across both lists and counters for consistency. Is this easy to understand?

Using "capacity" across lists and counts sounds good to me :)

Added sections on FleetAutoscaling for counts and lists

I'm not sure but should we support multiple counts and lists for GameServer? 🤔
Because FleetAutoscaler does not support multiple counters and lists for autoscaling, it sounds like we don't need to support multiple counts and lists in GameServer Resource.
What do you think?

@markmandel
Member Author

I'm not sure but should we support multiple counts and lists for GameServer? 🤔

That's an interesting question.

My thought was, while it may not be used for autoscaling, it may be used for allocation filtering.

It's a bit clunky, but say you want to track Rooms and players per room - you might have something like:

apiVersion: "agones.dev/v1"
kind: GameServer
metadata:
  generateName: "simple-game-server-"
spec:
  ports:
    - name: default
      portPolicy: Dynamic
      containerPort: 7654
  template:
    spec:
      containers:
        - name: simple-game-server
          image: gcr.io/agones-images/simple-game-server:0.13
  counters:
    rooms:
      default: 0
      capacity: 4 
  lists:
    players_1:
    players_2:
    players_3:
    players_4:

That probably doesn't scale if you have 1000 rooms (at which point, go use a DB), but this works in a pinch.

So you would autoscale on rooms, but you may filter on the player lists. (or actually, one big player list across all rooms would probably be better for this scenario. But you get the drift).

WDYT?

@katsew
Contributor

katsew commented Oct 24, 2022

Ah, that's true.
It's possible that the metrics used for GameServer Allocation and the metrics used for AutoScale are different.
Makes sense to me now. Thank you.

@markmandel
Member Author

Thanks for the feedback and working through it with me!

'This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions '

@github-actions github-actions bot added the stale Pending closure unless there is a strong objection. label Jan 15, 2024
@roberthbailey roberthbailey removed the stale Pending closure unless there is a strong objection. label Jan 16, 2024
@roberthbailey
Member

This issue is still being worked on.

@markmandel markmandel added the awaiting-maintainer Block issues from being stale/obsolete/closed label Jan 16, 2024
markmandel added a commit to markmandel/agones that referenced this issue Jan 17, 2024
Much like the player tracking SDK documentation, I wanted to be explicit
in the SDK documentation for Counts and Lists where data was being
stored, and where default values were coming from with links back to
the generated API reference - so made some improvements to help
facilitate that.

This does still need a Counters and Lists landing page to reference
(this is my next task) as well, but wanted to get this done while I was
thinking about it.

Work on googleforgames#2716
markmandel added a commit to markmandel/agones that referenced this issue Jan 24, 2024
markmandel added a commit that referenced this issue Jan 24, 2024
* Counts and Lists: Improvements to SDK docs

Much like the player tracking SDK documentation, I wanted to be explicit
in the SDK documentation for Counts and Lists where data was being
stored, and where default values were coming from with links back to
the generated API reference - so made some improvements to help
facilitate that.

This does still need a Counters and Lists landing page to reference
(this is my next task) as well, but wanted to get this done while I was
thinking about it.

Work on #2716

* Few improvements over original PR.

* Review updates.
markmandel added a commit to markmandel/agones that referenced this issue Feb 5, 2024
Updated fleetautoscaler.md with more general descriptions of each type
of autoscaling strategy.

Since we have 4 now, it seemed like it would be useful to provide some
use cases around each type of autoscaling and why you would choose one
over another.

Work on googleforgames#2716
roberthbailey pushed a commit that referenced this issue Feb 6, 2024
More description on fleetautoscaler.md

Updated fleetautoscaler.md with more general descriptions of each type
of autoscaling strategy.

Since we have 4 now, it seemed like it would be useful to provide some
use cases around each type of autoscaling and why you would choose one
over another.

Work on #2716
markmandel added a commit to markmandel/agones that referenced this issue Feb 14, 2024
The primary detail of this PR is to implement a Guide > Counters and
Lists documentation landing page to give end users documentation on how
to use all the variety of touch points of Counters and Lists.

This does sprawl out a little, as in part of this process, this also
touched on:
* Links and warnings from other pages that should link to this landing
page.
* Found a bunch of minor fixes that needed doing, with both
documentation and example bugs and updates just for consistency.
* Implemented some small changes in terminology (primarily total
capacity -> available capacity), which aligns the implementations and
the documentation.
* Fixes and updates to CRD and Go data structure documentation that
goes along with the above.

Work on googleforgames#2716
markmandel added a commit to markmandel/agones that referenced this issue Feb 14, 2024
The primary detail of this PR is to implement a Guide > Counters and
Lists documentation landing page to give end users documentation on how
to use all the variety of touch points of Counters and Lists.

This does sprawl out a little, as in part of this process, this also
touched on:
* Links and warnings from other pages that should link to this landing
page.
* Found a bunch of minor fixes that needed doing, with both
documentation and example bugs and updates just for consistency.
* Implemented some small changes in terminology (primarily total
capacity -> available capacity), which aligns the implementations and
the documentation.
* Fixes and updates to CRD and Go data structure documentation that
goes along with the above.
* Found some example content that was missing.

Work on googleforgames#2716
@markmandel
Member Author

Just as an audit trail - ticking boxes on items that have split out issues that will be tracked separately, so we can close out this issue as the bulk of the work is complete.

markmandel added a commit to markmandel/agones that referenced this issue Feb 14, 2024
@markmandel
Member Author

@igooch WDYT, shall we close this ticket?

markmandel added a commit to markmandel/agones that referenced this issue Feb 14, 2024
markmandel added a commit to markmandel/agones that referenced this issue Feb 20, 2024
@igooch igooch closed this as completed Feb 20, 2024
markmandel added a commit to markmandel/agones that referenced this issue Mar 15, 2024
markmandel added a commit to markmandel/agones that referenced this issue Mar 28, 2024
markmandel added a commit that referenced this issue Mar 28, 2024
* Counters & Lists landing page and doc improvements

The primary detail of this PR is to implement a Guide > Counters and
Lists documentation landing page to give end users documentation on how
to use all the variety of touch points of Counters and Lists.

This does sprawl out a little, as in part of this process, this also
touched on:
* Links and warnings from other pages that should link to this landing
page.
* Found a bunch of minor fixes that needed doing, with both
documentation and example bugs and updates just for consistency.
* Implemented some small changes in terminology (primarily total
capacity -> available capacity), which aligns the implementations and
the documentation.
* Fixes and updates to CRD and Go data structure documentation that
goes along with the above.
* Found some example content that was missing.

Work on #2716

* Add in Fleet prioritisation section.

* Review updates.

* Add warning for Fleet priorities, until next release.