From d0bcb5228836cdf50be6c0995468bdaffe67bb4e Mon Sep 17 00:00:00 2001 From: "Hamdy H. Khader" Date: Fri, 27 Mar 2026 13:51:20 +0300 Subject: [PATCH 01/70] Main sfam 2359 (#935) * wip * wip 2 * implement snapshot replication * implement snapshot replication 2 * Fix env_var * Fix service * Fix service * Fix service * Fix service * Fix service 2 * Fix service 3 * Fix service 4 * Fix service 6 * Fix service 5 * Fix service 7 * Fix service 8 * Fix service 8 * wip * wip 2 * wip 3 * wip 3 * wip 4 * wip 4 * wip 5 * wip 7 * Fix lvol poller cpu mask * Fix target snap name * fix poller mask * fix poller mask * fix poller mask * fix chain * fix chain * fix chain * fix chain * fix chain * fix chain * Set cluster_id optional on SNodeAPI docker version * fix type checker * fix type checker * Fix snapshot replications tickets sfam-2497: add snapshot check sfam-2495: snapshot list --cluster-id sfam-2498: clone fail * Fix sfam-2496 * Follow up 1 * fix lvol replication_start * fix rep service * fix snapshot clone return value * replicate snapshot back to src _1 (#790) * replicate snapshot back to src _1 * fix linter * Fix sn list apiv2 response _2 * Fix sn list apiv2 response _3 * Add stats to spdk_http_proxy_server.py Prints max, avg and last 3 sec avg for read lines from http socket and rpc response receive from spdk * Add stats to spdk_http_proxy_server.py _2 * Fix 2 * Fix 2 * Fix 3 * Fix sfam-2524 Do not cancel snapshot replication task on node shutdown * Fix sfam-2523 Show task status to be canceled when replication task status is done and cancel flag is true * Fix sfam-2527 Fix snapshot chaining * Increase snapshot replication task retry on node not online * fix sfam-2516 _1 * fix sfam-2516 _2 * fix sfam-2516 _3 * fix linter * fix sfam-2516 _4 * wip * wip * wip 2 * wip 2 * Exclude src snap node id when starting replication on cloned lvol * fix snapshot replication source and target in case of replicate_to_source=True * Main sfam 2359 api (#844) * added api for snapshot 
replication * removed helm chart dep * fixed Remove assignment to unused variable * added replication_start and stop to api v2 (#845) * Enhance snapshot replication logic to support snapshot instances and streamline replication task handling * Add replication-trigger command to start replication for logical volumes * fix 1 * fix 1 * Fix 1 * fix typo * fix rep status return output * fix: handle missing replicate_as_snap_instance parameter * fix: use unique UUID for snapshot replication identifier * fix: improve replication duration calculation logic * feat: add replicate_lvol_on_target_cluster function and API endpoint * fix: change replicate_lvol endpoint from GET to POST * fix: set lvs_name for bdev_lvol in replication process * wip * adds lvol clone stack * fix: update sorting key for snapshots from creation_dt to created_at * feat: add configuration for MCP and implement device status reset functionality * set snapshot name when creating lvol no target cluster * return lvol on target if exists fix new lvol health check * fix lvol list * updated _ReplicationParams field (#847) * updated _ReplicationParams field * pool list return uuid intead of id * lvol list return uuid intead of id * lvol list return do_replicate * added service snapshot-replication * don't fails upon cr patch failure * added imagepullpolicy * removed csi configmap and secret from spdk-pod * update crs name * updated csi hostpath configuration * updated csi hostpath configuration * updated rpc_client logger message * updated env_var file * fixed snap param name created_at * updated snapshotreplications crd * reverted api v2 field to id from uuid * updated env_var * return new lvol connection string on lvol connect if cluster is suspended and lvol is replicated * feat: add endpoint to list replication tasks for a volume * updated endpoint and func list_replication_tasks * update endpoint list_replication_tasks to use instance_api * updated snapshot replication crd * feat: add suspend and resume 
commands for lvol subsystems * feat: add configuration settings and utility scripts for volume management * feat: add configuration settings, utility scripts, and endpoints for volume management * wip * Adds replicate_lvol_on_source_cluster apiv2 * fix 1 * Adds 'from_source' attr to lvol model * fix: update suspend and resume functions to return boolean values * fix: toggle 'from_source' attribute in lvol model during replication * return from_source from api * fix: update lvol UUID handling during replication process * fix: lvol delete on target * feat: add configuration and utility scripts for managing storage nodes and volumes * fix: update lvol attributes for cloning and set from_source flag * refactor replicate_lvol_on_source_cluster * fix issue * Revert "fix issue" This reverts commit 362935abde5b9c04612120f48e1d26bc09e83309. * Revert "refactor replicate_lvol_on_source_cluster" This reverts commit eb05502bcdc46c8549ce1844aead1da66b132d53. * fix issue * adds prints * add function to delete last snapshot if needed after replication * fix * fix 2 * fix: handle KeyError in lvol status change and enhance replication function with cluster validation * avoid Runtime error if lvol not found * feat: initialize new lvol with unique identifiers and updated naming conventions * fix: remove unnecessary sleep calls in lvol creation process * fix: filter out deleted lvols in get_lvols and update lvol deletion process * fix: update lvol retrieval method to use get_lvols instead of get_all_lvols * fix: return None instead of False when lvol is not found * fix: update replicate_lvol_on_source_cluster to include cluster ID * feat: add configuration for KiroAgent and implement lvol replication with pool UUID * fix: enhance lvol creation process with target cluster NQN and update source cluster handling * updated SimplyBlockSnapshotReplication crd field * fix: handle missing start_time in replication duration calculation * fix: handle forced deletion of snapshots when storage 
node is not found * fix: update KMS init container image to use the correct repository * fix lvol delete response if lvol is in_deletion * fix 1 * Add the mount of /mnt/ramdisk to docker deployment * fix the replicate_lvol_on_source api call params * fix the replicate_lvol_on_source api call params 2 * fixed init job failed to mkdir /etc/systemd/ * remove init copy script container * updated storagenode crd * added storage cr param spdkImage * fix: update replicate_lvol_on_source_cluster to accept cluster_id and pool_id * fix: update lvol deletion process to set status and write to database * Revert "fix: update replicate_lvol_on_source_cluster to accept cluster_id and pool_id" This reverts commit b6820c5c789479315a2231abd244e11e031c0a60. * point image to dockerub * point image to dockerub * added namespace to api resource * feat: add clone_lvol function to create snapshots and clones of logical volumes * feat: implement clone endpoint for logical volumes with retry logic --------- Co-authored-by: Geoffrey Israel Co-authored-by: wmousa --- simplyblock_cli/cli-reference.yaml | 127 ++ simplyblock_cli/cli.py | 99 +- simplyblock_cli/clibase.py | 45 +- simplyblock_core/cluster_ops.py | 30 +- .../controllers/health_controller.py | 14 +- .../controllers/lvol_controller.py | 627 +++++- simplyblock_core/controllers/lvol_events.py | 4 + .../controllers/snapshot_controller.py | 195 +- .../controllers/snapshot_events.py | 7 + .../controllers/tasks_controller.py | 45 + simplyblock_core/db_controller.py | 15 +- simplyblock_core/models/cluster.py | 3 + simplyblock_core/models/job_schedule.py | 1 + simplyblock_core/models/lvol_model.py | 5 + simplyblock_core/models/snapshot.py | 6 + simplyblock_core/models/storage_node.py | 1 + simplyblock_core/rpc_client.py | 52 + ...ck.io_simplyblocksnapshotreplications.yaml | 157 ++ ...implyblock.io_simplyblockstoragenodes.yaml | 258 +++ .../scripts/charts/templates/app_k8s.yaml | 1762 +++-------------- .../scripts/charts/templates/app_sa.yaml 
| 17 +- .../charts/templates/csi-hostpath-plugin.yaml | 1 + .../charts/templates/simplyblock-manager.yaml | 199 ++ .../templates/simplyblock_customresource.yaml | 149 ++ simplyblock_core/scripts/charts/values.yaml | 91 +- .../scripts/docker-compose-swarm.yml | 14 + simplyblock_core/services/lvol_monitor.py | 4 +- .../services/snapshot_replication.py | 356 ++++ .../services/storage_node_monitor.py | 4 +- simplyblock_core/snode_client.py | 8 + simplyblock_core/storage_node_ops.py | 21 +- simplyblock_core/utils/__init__.py | 306 ++- .../api/internal/storage_node/docker.py | 41 + simplyblock_web/api/v1/cluster.py | 17 + simplyblock_web/api/v1/lvol.py | 29 +- simplyblock_web/api/v2/cluster.py | 22 + simplyblock_web/api/v2/dtos.py | 22 +- simplyblock_web/api/v2/volume.py | 65 +- .../templates/storage_deploy_spdk.yaml.j2 | 28 - .../templates/storage_init_job.yaml.j2 | 1 + 40 files changed, 3190 insertions(+), 1658 deletions(-) create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml create mode 100644 simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml create mode 100644 simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml create mode 100644 simplyblock_core/services/snapshot_replication.py diff --git a/simplyblock_cli/cli-reference.yaml b/simplyblock_cli/cli-reference.yaml index 9442fe12d..f7c3a9a64 100644 --- a/simplyblock_cli/cli-reference.yaml +++ b/simplyblock_cli/cli-reference.yaml @@ -1413,6 +1413,28 @@ commands: help: "Name" dest: name type: str + - name: add-replication + help: Assigns the snapshot replication target cluster + arguments: + - name: "cluster_id" + help: "Cluster id" + dest: cluster_id + type: str + completer: _completer_get_cluster_list + - name: "target_cluster_id" + help: "Target Cluster id" + dest: target_cluster_id + type: str + 
completer: _completer_get_cluster_list + - name: "--timeout" + help: "Snapshot replication network timeout" + dest: timeout + type: int + default: "3600" + - name: "--target-pool" + help: "Target cluster pool ID or name" + dest: target_pool + type: str - name: "volume" help: "Logical volume commands" aliases: @@ -1543,6 +1565,11 @@ commands: dest: npcs type: int default: 0 + - name: "--replicate" + help: "Replicate LVol snapshot" + dest: replicate + type: bool + action: store_true - name: qos-set help: "Changes QoS settings for an active logical volume" arguments: @@ -1744,6 +1771,52 @@ commands: help: "Logical volume id" dest: volume_id type: str + - name: replication-start + help: "Start snapshot replication taken from lvol" + arguments: + - name: "lvol_id" + help: "Logical volume id" + dest: lvol_id + type: str + - name: "--replication-cluster-id" + help: "Cluster ID of the replication target cluster" + dest: replication_cluster_id + type: str + - name: replication-stop + help: "Stop snapshot replication taken from lvol" + arguments: + - name: "lvol_id" + help: "Logical volume id" + dest: lvol_id + type: str + - name: replication-status + help: "Lists replication status" + arguments: + - name: "cluster_id" + help: "Cluster UUID" + dest: cluster_id + type: str + - name: replication-trigger + help: "Start replication for lvol" + arguments: + - name: "lvol_id" + help: "Logical volume id" + dest: lvol_id + type: str + - name: suspend + help: "Suspend lvol subsystems" + arguments: + - name: "lvol_id" + help: "Logical volume id" + dest: lvol_id + type: str + - name: resume + help: "Resume lvol subsystems" + arguments: + - name: "lvol_id" + help: "Logical volume id" + dest: lvol_id + type: str - name: "control-plane" help: "Control plane commands" aliases: @@ -1972,6 +2045,16 @@ commands: dest: all type: bool action: store_true + - name: "--cluster-id" + help: "Filter snapshots by cluster UUID" + dest: cluster_id + type: str + required: false + - name: "--with-details" 
+ help: "List snapshots with replicate and chaining details" + dest: with_details + type: bool + action: store_true - name: delete help: "Deletes a snapshot" arguments: @@ -1984,6 +2067,13 @@ commands: dest: force type: bool action: store_true + - name: check + help: "Check a snapshot health" + arguments: + - name: "snapshot_id" + help: "Snapshot id" + dest: snapshot_id + type: str - name: clone help: "Provisions a new logical volume from an existing snapshot" arguments: @@ -2000,6 +2090,43 @@ commands: dest: resize type: size default: "0" + - name: replication-status + help: "Lists snapshots replication status" + arguments: + - name: "cluster_id" + help: "Cluster UUID" + dest: cluster_id + type: str + - name: delete-replication-only + help: "Delete replicated version of a snapshot" + arguments: + - name: "snapshot_id" + help: "Snapshot UUID" + dest: snapshot_id + type: str + - name: get + help: "Gets a snapshot information" + arguments: + - name: "snapshot_id" + help: "Snapshot UUID" + dest: snapshot_id + type: str + - name: set + help: "set snapshot db value" + private: true + arguments: + - name: "snapshot_id" + help: "snapshot id" + dest: snapshot_id + type: str + - name: "attr_name" + help: "attr_name" + dest: attr_name + type: str + - name: "attr_value" + help: "attr_value" + dest: attr_value + type: str - name: "qos" help: "qos commands" weight: 700 diff --git a/simplyblock_cli/cli.py b/simplyblock_cli/cli.py index be4d96dee..d3b76fb03 100644 --- a/simplyblock_cli/cli.py +++ b/simplyblock_cli/cli.py @@ -373,6 +373,7 @@ def init_cluster(self): if self.developer_mode: self.init_cluster__set(subparser) self.init_cluster__change_name(subparser) + self.init_cluster__add_replication(subparser) def init_cluster__create(self, subparser): @@ -567,6 +568,13 @@ def init_cluster__change_name(self, subparser): subcommand.add_argument('cluster_id', help='Cluster id', type=str).completer = self._completer_get_cluster_list subcommand.add_argument('name', help='Name', 
type=str) + def init_cluster__add_replication(self, subparser): + subcommand = self.add_sub_command(subparser, 'add-replication', 'Assigns the snapshot replication target cluster') + subcommand.add_argument('cluster_id', help='Cluster id', type=str).completer = self._completer_get_cluster_list + subcommand.add_argument('target_cluster_id', help='Target Cluster id', type=str).completer = self._completer_get_cluster_list + argument = subcommand.add_argument('--timeout', help='Snapshot replication network timeout', type=int, default=3600, dest='timeout') + argument = subcommand.add_argument('--target-pool', help='Target cluster pool ID or name', type=str, dest='target_pool') + def init_volume(self): subparser = self.add_command('volume', 'Logical volume commands', aliases=['lvol',]) @@ -590,6 +598,12 @@ def init_volume(self): self.init_volume__get_io_stats(subparser) self.init_volume__check(subparser) self.init_volume__inflate(subparser) + self.init_volume__replication_start(subparser) + self.init_volume__replication_stop(subparser) + self.init_volume__replication_status(subparser) + self.init_volume__replication_trigger(subparser) + self.init_volume__suspend(subparser) + self.init_volume__resume(subparser) self.init_volume__migrate(subparser) self.init_volume__migrate_list(subparser) self.init_volume__migrate_cancel(subparser) @@ -622,7 +636,7 @@ def init_volume__add(self, subparser): argument = subcommand.add_argument('--pvc-name', '--pvc_name', help='Set logical volume PVC name for k8s clients', type=str, dest='pvc_name') argument = subcommand.add_argument('--data-chunks-per-stripe', help='Erasure coding schema parameter k (distributed raid), default: 1', type=int, default=0, dest='ndcs') argument = subcommand.add_argument('--parity-chunks-per-stripe', help='Erasure coding schema parameter n (distributed raid), default: 1', type=int, default=0, dest='npcs') - argument = subcommand.add_argument('--allowed-hosts', help='Path to JSON file with host NQNs allowed to 
access this volume\'s subsystem', type=str, dest='allowed_hosts') + argument = subcommand.add_argument('--replicate', help='Replicate LVol snapshot', dest='replicate', action='store_true') def init_volume__add_host(self, subparser): subcommand = self.add_sub_command(subparser, 'add-host', 'Add an allowed host NQN to a volume\'s subsystem') @@ -718,6 +732,31 @@ def init_volume__inflate(self, subparser): subcommand = self.add_sub_command(subparser, 'inflate', 'Inflate a logical volume') subcommand.add_argument('volume_id', help='Logical volume id', type=str) + def init_volume__replication_start(self, subparser): + subcommand = self.add_sub_command(subparser, 'replication-start', 'Start snapshot replication taken from lvol') + subcommand.add_argument('lvol_id', help='Logical volume id', type=str) + argument = subcommand.add_argument('--replication-cluster-id', help='Cluster ID of the replication target cluster', type=str, dest='replication_cluster_id') + + def init_volume__replication_stop(self, subparser): + subcommand = self.add_sub_command(subparser, 'replication-stop', 'Stop snapshot replication taken from lvol') + subcommand.add_argument('lvol_id', help='Logical volume id', type=str) + + def init_volume__replication_status(self, subparser): + subcommand = self.add_sub_command(subparser, 'replication-status', 'Lists replication status') + subcommand.add_argument('cluster_id', help='Cluster UUID', type=str) + + def init_volume__replication_trigger(self, subparser): + subcommand = self.add_sub_command(subparser, 'replication-trigger', 'Start replication for lvol') + subcommand.add_argument('lvol_id', help='Logical volume id', type=str) + + def init_volume__suspend(self, subparser): + subcommand = self.add_sub_command(subparser, 'suspend', 'Suspend lvol subsystems') + subcommand.add_argument('lvol_id', help='Logical volume id', type=str) + + def init_volume__resume(self, subparser): + subcommand = self.add_sub_command(subparser, 'resume', 'Resume lvol subsystems') + 
subcommand.add_argument('lvol_id', help='Logical volume id', type=str) + def init_volume__migrate(self, subparser): subcommand = self.add_sub_command(subparser, 'migrate', 'Migrate a logical volume to a different storage node') subcommand.add_argument('volume_id', help='Logical volume id', type=str) @@ -834,7 +873,13 @@ def init_snapshot(self): self.init_snapshot__add(subparser) self.init_snapshot__list(subparser) self.init_snapshot__delete(subparser) + self.init_snapshot__check(subparser) self.init_snapshot__clone(subparser) + self.init_snapshot__replication_status(subparser) + self.init_snapshot__delete_replication_only(subparser) + self.init_snapshot__get(subparser) + if self.developer_mode: + self.init_snapshot__set(subparser) self.init_snapshot__backup(subparser) @@ -852,18 +897,42 @@ def init_snapshot__backup(self, subparser): def init_snapshot__list(self, subparser): subcommand = self.add_sub_command(subparser, 'list', 'Lists all snapshots') argument = subcommand.add_argument('--all', help='List soft deleted snapshots', dest='all', action='store_true') + argument = subcommand.add_argument('--cluster-id', help='Filter snapshots by cluster UUID', type=str, dest='cluster_id', required=False) + argument = subcommand.add_argument('--with-details', help='List snapshots with replicate and chaining details', dest='with_details', action='store_true') def init_snapshot__delete(self, subparser): subcommand = self.add_sub_command(subparser, 'delete', 'Deletes a snapshot') subcommand.add_argument('snapshot_id', help='Snapshot id', type=str) argument = subcommand.add_argument('--force', help='Force remove', dest='force', action='store_true') + def init_snapshot__check(self, subparser): + subcommand = self.add_sub_command(subparser, 'check', 'Check a snapshot health') + subcommand.add_argument('snapshot_id', help='Snapshot id', type=str) + def init_snapshot__clone(self, subparser): subcommand = self.add_sub_command(subparser, 'clone', 'Provisions a new logical volume from 
an existing snapshot') subcommand.add_argument('snapshot_id', help='Snapshot id', type=str) subcommand.add_argument('lvol_name', help='Logical volume name', type=str) argument = subcommand.add_argument('--resize', help='New logical volume size: 10M, 10G, 10(bytes). Can only increase.', type=size_type(), default='0', dest='resize') + def init_snapshot__replication_status(self, subparser): + subcommand = self.add_sub_command(subparser, 'replication-status', 'Lists snapshots replication status') + subcommand.add_argument('cluster_id', help='Cluster UUID', type=str) + + def init_snapshot__delete_replication_only(self, subparser): + subcommand = self.add_sub_command(subparser, 'delete-replication-only', 'Delete replicated version of a snapshot') + subcommand.add_argument('snapshot_id', help='Snapshot UUID', type=str) + + def init_snapshot__get(self, subparser): + subcommand = self.add_sub_command(subparser, 'get', 'Gets a snapshot information') + subcommand.add_argument('snapshot_id', help='Snapshot UUID', type=str) + + def init_snapshot__set(self, subparser): + subcommand = self.add_sub_command(subparser, 'set', 'set snapshot db value') + subcommand.add_argument('snapshot_id', help='snapshot id', type=str) + subcommand.add_argument('attr_name', help='attr_name', type=str) + subcommand.add_argument('attr_value', help='attr_value', type=str) + def init_backup(self): subparser = self.add_command('backup', 'Backup commands') @@ -1213,6 +1282,8 @@ def run(self): ret = self.cluster__set(sub_command, args) elif sub_command in ['change-name']: ret = self.cluster__change_name(sub_command, args) + elif sub_command in ['add-replication']: + ret = self.cluster__add_replication(sub_command, args) else: self.parser.print_help() @@ -1271,6 +1342,18 @@ def run(self): ret = self.volume__migrate_list(sub_command, args) elif sub_command in ['migrate-cancel']: ret = self.volume__migrate_cancel(sub_command, args) + elif sub_command in ['replication-start']: + ret = 
self.volume__replication_start(sub_command, args) + elif sub_command in ['replication-stop']: + ret = self.volume__replication_stop(sub_command, args) + elif sub_command in ['replication-status']: + ret = self.volume__replication_status(sub_command, args) + elif sub_command in ['replication-trigger']: + ret = self.volume__replication_trigger(sub_command, args) + elif sub_command in ['suspend']: + ret = self.volume__suspend(sub_command, args) + elif sub_command in ['resume']: + ret = self.volume__resume(sub_command, args) else: self.parser.print_help() @@ -1316,8 +1399,22 @@ def run(self): ret = self.snapshot__list(sub_command, args) elif sub_command in ['delete']: ret = self.snapshot__delete(sub_command, args) + elif sub_command in ['check']: + ret = self.snapshot__check(sub_command, args) elif sub_command in ['clone']: ret = self.snapshot__clone(sub_command, args) + elif sub_command in ['replication-status']: + ret = self.snapshot__replication_status(sub_command, args) + elif sub_command in ['delete-replication-only']: + ret = self.snapshot__delete_replication_only(sub_command, args) + elif sub_command in ['get']: + ret = self.snapshot__get(sub_command, args) + elif sub_command in ['set']: + if not self.developer_mode: + print("This command is private.") + ret = False + else: + ret = self.snapshot__set(sub_command, args) elif sub_command in ['backup']: ret = self.snapshot__backup(sub_command, args) else: diff --git a/simplyblock_cli/clibase.py b/simplyblock_cli/clibase.py index 08522ecba..30188ebab 100644 --- a/simplyblock_cli/clibase.py +++ b/simplyblock_cli/clibase.py @@ -500,6 +500,9 @@ def cluster__complete_expand(self, sub_command, args): cluster_ops.cluster_expand(args.cluster_id) return True + def cluster__add_replication(self, sub_command, args): + return cluster_ops.add_replication(args.cluster_id, args.target_cluster_id, args.timeout, args.target_pool) + def volume__add(self, sub_command, args): import json as _json name = args.name @@ -538,7 +541,8 @@ 
def volume__add(self, sub_command, args): lvol_priority_class=lvol_priority_class, uid=args.uid, pvc_name=args.pvc_name, namespace=args.namespace, max_namespace_per_subsys=args.max_namespace_per_subsys, ndcs=ndcs, npcs=npcs, fabric=args.fabric, - allowed_hosts=allowed_hosts) + allowed_hosts=allowed_hosts, + do_replicate=args.replicate) if results: return results else: @@ -649,6 +653,24 @@ def volume__check(self, sub_command, args): def volume__inflate(self, sub_command, args): return lvol_controller.inflate_lvol(args.volume_id) + def volume__replication_start(self, sub_command, args): + return lvol_controller.replication_start(args.lvol_id, args.replication_cluster_id) + + def volume__replication_stop(self, sub_command, args): + return lvol_controller.replication_stop(args.lvol_id) + + def volume__replication_status(self, sub_command, args): + return snapshot_controller.list_replication_tasks(args.cluster_id) + + def volume__replication_trigger(self, sub_command, args): + return lvol_controller.replication_trigger(args.lvol_id) + + def volume__suspend(self, sub_command, args): + return lvol_controller.suspend_lvol(args.lvol_id) + + def volume__resume(self, sub_command, args): + return lvol_controller.resume_lvol(args.lvol_id) + def volume__migrate(self, sub_command, args): migration_id, error = migration_controller.start_migration( args.volume_id, @@ -756,16 +778,31 @@ def snapshot__backup(self, sub_command, args): return True def snapshot__list(self, sub_command, args): - return snapshot_controller.list(args.all) + return snapshot_controller.list(args.all, args.cluster_id, args.with_details) def snapshot__delete(self, sub_command, args): return snapshot_controller.delete(args.snapshot_id, args.force) + def snapshot__check(self, sub_command, args): + return health_controller.check_snap(args.snapshot_id) + def snapshot__clone(self, sub_command, args): new_size = args.resize - success, details = snapshot_controller.clone(args.snapshot_id, args.lvol_name, new_size) - 
return details + clone_id, error = snapshot_controller.clone(args.snapshot_id, args.lvol_name, new_size) + return clone_id if not error else error + + def snapshot__replication_status(self, sub_command, args): + return snapshot_controller.list_replication_tasks(args.cluster_id) + + def snapshot__delete_replication_only(self, sub_command, args): + return snapshot_controller.delete_replicated(args.snapshot_id) + + def snapshot__get(self, sub_command, args): + return snapshot_controller.get(args.snapshot_id) + + def snapshot__set(self, sub_command, args): + return snapshot_controller.set(args.snapshot_id, args.attr_name, args.attr_value) def qos__add(self, sub_command, args): return qos_controller.add_class(args.name, args.weight, args.cluster_id) diff --git a/simplyblock_core/cluster_ops.py b/simplyblock_core/cluster_ops.py index 0d809c9e6..869097cf7 100644 --- a/simplyblock_core/cluster_ops.py +++ b/simplyblock_core/cluster_ops.py @@ -476,7 +476,7 @@ def add_cluster(blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn raise ValueError("max_fault_tolerance > 1 requires distr_npcs >= 2") monitoring_secret = os.environ.get("MONITORING_SECRET", "") - + logger.info("Adding new cluster") cluster = Cluster() cluster.uuid = str(uuid.uuid4()) @@ -875,6 +875,7 @@ def list() -> t.List[dict]: "#storage": len(st), "Mod": f"{cl.distr_ndcs}x{cl.distr_npcs}", "Status": status.upper(), + "Replicate": cl.snapshot_replication_target_cluster, }) return data @@ -1468,3 +1469,30 @@ def set(cl_id, attr, value) -> bool: setattr(cluster, attr, value) cluster.write_to_db() return True + + +def add_replication(source_cl_id, target_cl_id, timeout=0, target_pool=None) -> bool: + db_controller = DBController() + cluster = db_controller.get_cluster_by_id(source_cl_id) + if not cluster: + raise ValueError(f"Cluster not found: {source_cl_id}") + + target_cluster = db_controller.get_cluster_by_id(target_cl_id) + if not target_cluster: + raise ValueError(f"Target cluster not found: 
{target_cl_id}") + + logger.info("Updating Cluster replication target") + cluster.snapshot_replication_target_cluster = target_cl_id + if target_pool: + pool = db_controller.get_pool_by_id(target_pool) + if not pool: + raise ValueError(f"Pool not found: {target_pool}") + if pool.status != Pool.STATUS_ACTIVE: + raise ValueError(f"Pool not active: {target_pool}") + cluster.snapshot_replication_target_pool = target_pool + + if timeout and timeout > 0: + cluster.snapshot_replication_timeout = timeout + cluster.write_to_db() + logger.info("Done") + return True diff --git a/simplyblock_core/controllers/health_controller.py b/simplyblock_core/controllers/health_controller.py index 1eb3bb9f4..487d71eae 100644 --- a/simplyblock_core/controllers/health_controller.py +++ b/simplyblock_core/controllers/health_controller.py @@ -887,12 +887,14 @@ def check_snap(snap_id): return False snode = db_controller.get_storage_node_by_id(snap.lvol.node_id) - rpc_client = RPCClient( - snode.mgmt_ip, snode.rpc_port, - snode.rpc_username, snode.rpc_password, timeout=5, retry=1) - - ret = rpc_client.get_bdevs(snap.snap_bdev) - return ret + check_primary = snode.rpc_client().get_bdevs(snap.snap_bdev) + logger.info(f"Checking snap bdev: {snap.snap_bdev} on node: {snap.lvol.node_id} is {bool(check_primary)}") + if snode.secondary_node_id: + secondary_node = db_controller.get_storage_node_by_id(snode.secondary_node_id) + check_secondary = secondary_node.rpc_client().get_bdevs(snap.snap_bdev) + logger.info(f"Checking snap bdev: {snap.snap_bdev} on node: {snode.secondary_node_id} is {bool(check_secondary)}") + return check_primary and check_secondary + return check_primary def check_jm_device(device_id): diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 5fd1a5411..38db9867e 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -1,4 +1,5 @@ # coding=utf-8 +import copy 
import logging as lg import json import math @@ -10,8 +11,11 @@ from typing import List, Tuple from simplyblock_core import utils, constants -from simplyblock_core.controllers import snapshot_controller, pool_controller, lvol_events +from simplyblock_core.controllers import snapshot_controller, pool_controller, lvol_events, tasks_controller, \ + snapshot_events from simplyblock_core.db_controller import DBController +from simplyblock_core.models.cluster import Cluster +from simplyblock_core.models.job_schedule import JobSchedule from simplyblock_core.models.pool import Pool from simplyblock_core.models.lvol_model import LVol from simplyblock_core.models.storage_node import StorageNode @@ -300,12 +304,12 @@ def validate_aes_xts_keys(key1: str, key2: str) -> Tuple[bool, str]: return True, "" -def add_lvol_ha(name, size, host_id_or_name, ha_type, pool_id_or_name, use_comp, use_crypto, - distr_vuid, max_rw_iops, max_rw_mbytes, max_r_mbytes, max_w_mbytes, +def add_lvol_ha(name, size, host_id_or_name, ha_type, pool_id_or_name, use_comp=False, use_crypto=False, + distr_vuid=0, max_rw_iops=0, max_rw_mbytes=0, max_r_mbytes=0, max_w_mbytes=0, with_snapshot=False, max_size=0, crypto_key1=None, crypto_key2=None, lvol_priority_class=0, uid=None, pvc_name=None, namespace=None, max_namespace_per_subsys=1, fabric="tcp", ndcs=0, npcs=0, - allowed_hosts=None): - + allowed_hosts=None, + do_replicate=False, replication_cluster_id=None): db_controller = DBController() logger.info(f"Adding LVol: {name}") host_node = None @@ -523,6 +527,16 @@ def add_lvol_ha(name, size, host_id_or_name, ha_type, pool_id_or_name, use_comp, else: lvol.npcs = cl.distr_npcs lvol.ndcs = cl.distr_ndcs + lvol.do_replicate = bool(do_replicate) + if lvol.do_replicate: + if replication_cluster_id: + replication_cluster = db_controller.get_cluster_by_id(replication_cluster_id) + if not replication_cluster: + return False, f"Replication cluster not found: {replication_cluster_id}" + else: + replication_cluster_id = 
cl.snapshot_replication_target_cluster + random_nodes = _get_next_3_nodes(replication_cluster_id, lvol.size) + lvol.replication_node_id = random_nodes[0].get_id() lvol_count = len(db_controller.get_lvols_by_node_id(host_node.get_id())) if lvol_count > host_node.max_lvol: @@ -1028,7 +1042,7 @@ def delete_lvol(id_or_name, force_delete=False): if lvol.status == LVol.STATUS_IN_DELETION: logger.info(f"lvol:{lvol.get_id()} status is in deletion") if not force_delete: - return False + return True pool = db_controller.get_pool_by_id(lvol.pool_uuid) if pool.status == Pool.STATUS_INACTIVE: @@ -1042,7 +1056,8 @@ def delete_lvol(id_or_name, force_delete=False): logger.error(f"lvol node id not found: {lvol.node_id}") if not force_delete: return False - lvol.remove(db_controller.kv_store) + lvol.status = LVol.STATUS_DELETED + lvol.write_to_db(db_controller.kv_store) # if lvol is clone and snapshot is deleted, then delete snapshot if lvol.cloned_from_snap: @@ -1156,7 +1171,10 @@ def delete_lvol(id_or_name, force_delete=False): old_status = lvol.status lvol.status = LVol.STATUS_IN_DELETION lvol.write_to_db() - lvol_events.lvol_status_change(lvol, lvol.status, old_status) + try: + lvol_events.lvol_status_change(lvol, lvol.status, old_status) + except KeyError: + pass if lvol.cloned_from_snap and lvol.delete_snap_on_lvol_delete: logger.info(f"Deleting snap: {lvol.cloned_from_snap}") @@ -1326,7 +1344,7 @@ def list_lvols(is_json, cluster_id, pool_id_or_name, all=False): except KeyError: pass else: - lvols = db_controller.get_all_lvols() + lvols = db_controller.get_lvols() data = [] @@ -1385,6 +1403,7 @@ def list_lvols(is_json, cluster_id, pool_id_or_name, all=False): "NS ID": lvol.ns_id, "Mode": mode, "Policy": eff_policy.policy_name if eff_policy else "", + "Replicated On": lvol.replication_node_id, } data.append(lvol_data) @@ -1424,7 +1443,7 @@ def list_lvols_mem(is_json, is_csv): return utils.print_table(data) -def get_lvol(lvol_id_or_name, is_json): +def 
get_replication_info(lvol_id_or_name):
     db_controller = DBController()
     lvol = None
     for lv in db_controller.get_lvols(): # pass
@@ -1432,6 +1451,64 @@
         lvol = lv
         break
 
+    if not lvol:
+        logger.error(f"LVol id or name not found: {lvol_id_or_name}")
+        return None
+
+    tasks = []
+    snaps = []
+    out = {
+        "last_snapshot_id": None,
+        "last_replication_time": None,
+        "last_replication_duration": None,
+        "replicated_count": None,
+        "snaps": None,
+        "tasks": None,
+    }
+    node = db_controller.get_storage_node_by_id(lvol.node_id)
+    for task in db_controller.get_job_tasks(node.cluster_id):
+        if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION:
+            logger.debug(task)
+            try:
+                snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"])
+            except KeyError:
+                continue
+
+            if snap.lvol.get_id() != lvol.get_id():
+                continue
+            snaps.append(snap)
+            tasks.append(task)
+
+    if tasks:
+        tasks = sorted(tasks, key=lambda x: x.date)
+        snaps = sorted(snaps, key=lambda x: x.created_at)
+        out["snaps"] = [s.to_dict() for s in snaps]
+        out["tasks"] = [t.to_dict() for t in tasks]
+        out["replicated_count"] = len(snaps)
+        last_task = tasks[-1]
+        last_snap = db_controller.get_snapshot_by_id(last_task.function_params["snapshot_id"])
+        out["last_snapshot_id"] = last_snap.get_id()
+        out["last_replication_time"] = last_task.updated_at
+        if "end_time" in last_task.function_params and "start_time" in last_task.function_params:
+            duration = utils.strfdelta_seconds(
+                last_task.function_params["end_time"] - last_task.function_params["start_time"])
+        elif "start_time" in last_task.function_params:
+            duration = utils.strfdelta_seconds(int(time.time()) - last_task.function_params["start_time"])
+        else:
+            duration = 0
+        out["last_replication_duration"] = duration
+
+    return out
+
+
+def get_lvol(lvol_id_or_name, is_json):
+    db_controller = DBController()
+    lvol = None
+    for lv in db_controller.get_lvols(): # pass
+        
if lv.get_id() == lvol_id_or_name or lv.lvol_name == lvol_id_or_name: + lvol = lv + break + if not lvol: logger.error(f"LVol id or name not found: {lvol_id_or_name}") return False @@ -1480,6 +1557,16 @@ def connect_lvol(uuid, ctrl_loss_tmo=constants.LVOL_NVME_CONNECT_CTRL_LOSS_TMO, # so just pass host_nqn through without secrets pass + node = db_controller.get_storage_node_by_id(lvol.node_id) + cluster = db_controller.get_cluster_by_id(node.cluster_id) + if cluster.status == Cluster.STATUS_SUSPENDED and cluster.snapshot_replication_target_cluster: + logger.error("Cluster is suspended, looking for replicated lvol") + for lv in db_controller.get_lvols(cluster.snapshot_replication_target_cluster): + if lv.nqn == lvol.nqn: + logger.info(f"LVol with same nqn already exists on target cluster: {lv.get_id()}") + lvol = lv + break + out = [] nodes_ids = [] if lvol.ha_type == 'single': @@ -1989,37 +2076,505 @@ def inflate_lvol(lvol_id): logger.error(f"Failed to inflate LVol: {lvol_id}") return ret - -def list_by_node(node_id=None, is_json=False): +def replication_trigger(lvol_id): + # create snapshot and replicate it db_controller = DBController() - lvols = db_controller.get_lvols() - lvols = sorted(lvols, key=lambda x: x.create_dt) - data = [] - for lvol in lvols: - if node_id: - if lvol.node_id != node_id: + lvol = db_controller.get_lvol_by_id(lvol_id) + node = db_controller.get_storage_node_by_id(lvol.node_id) + snapshot_controller.add(lvol_id, f"replication_{uuid.uuid4()}") + + tasks = [] + snaps = [] + out = { + "lvol": lvol, + "last_snapshot_id": None, + "last_replication_time": None, + "last_replication_duration": None, + "replicated_count": None, + "snaps": None, + "tasks": None, + } + for task in db_controller.get_job_tasks(node.cluster_id): + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + logger.debug(task) + try: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: continue - logger.debug(lvol) - 
cloned_from_snap = "" + + if snap.lvol.get_id() != lvol_id: + continue + snaps.append(snap) + tasks.append(task) + + if tasks: + tasks = sorted(tasks, key=lambda x: x.date) + snaps = sorted(snaps, key=lambda x: x.created_at) + out["snaps"] = snaps + out["tasks"] = tasks + out["replicated_count"] = len(snaps) + last_task = tasks[-1] + last_snap = db_controller.get_snapshot_by_id(last_task.function_params["snapshot_id"]) + out["last_snapshot_id"] = last_snap.get_id() + out["last_replication_time"] = last_task.updated_at + duration = 0 + if "start_time" in last_task.function_params: + if "end_time" in last_task.function_params: + duration = utils.strfdelta_seconds( + last_task.function_params["end_time"] - last_task.function_params["start_time"]) + else: + duration = utils.strfdelta_seconds(int(time.time()) - last_task.function_params["start_time"]) + out["last_replication_duration"] = duration + + return out + +def replication_start(lvol_id, replication_cluster_id=None): + db_controller = DBController() + try: + lvol = db_controller.get_lvol_by_id(lvol_id) + except KeyError as e: + logger.error(e) + return False + + lvol.do_replicate = True + if not lvol.replication_node_id: + excluded_nodes = [] if lvol.cloned_from_snap: - snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) - cloned_from_snap = snap.snap_uuid - data.append({ - "UUID": lvol.uuid, - "BDdev UUID": lvol.lvol_uuid, - "BlobID": lvol.blobid, - "Name": lvol.lvol_name, - "Size": utils.humanbytes(lvol.size), - "LVS name": lvol.lvs_name, - "BDev": lvol.lvol_bdev, - "Node ID": lvol.node_id, - "Clone From Snap BDev": cloned_from_snap, - "Created At": lvol.create_dt, + lvol_snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) + if lvol_snap.source_replicated_snap_uuid: + org_snap = db_controller.get_snapshot_by_id(lvol_snap.source_replicated_snap_uuid) + excluded_nodes.append(org_snap.lvol.node_id) + snode = db_controller.get_storage_node_by_id(lvol.node_id) + cluster = 
db_controller.get_cluster_by_id(snode.cluster_id) + if not replication_cluster_id: + replication_cluster_id = cluster.snapshot_replication_target_cluster + if not replication_cluster_id: + logger.error(f"Cluster: {snode.cluster_id} not replicated") + return False + random_nodes = _get_next_3_nodes(replication_cluster_id, lvol.size) + for r_node in random_nodes: + if r_node.get_id() not in excluded_nodes: + logger.info(f"Replicating on node: {r_node.get_id()}") + lvol.replication_node_id = r_node.get_id() + lvol.write_to_db() + break + if not lvol.replication_node_id: + logger.error(f"Replication node not found for lvol: {lvol.get_id()}") + return False + logger.info("Setting LVol do_replicate: True") + + for snap in db_controller.get_snapshots(): + if snap.lvol.uuid == lvol.uuid: + if not snap.target_replicated_snap_uuid: + task = tasks_controller.add_snapshot_replication_task(snap.cluster_id, snap.lvol.node_id, snap.get_id()) + if task: + snapshot_events.replication_task_created(snap) + return True + + +def replication_stop(lvol_id, delete=False): + db_controller = DBController() + try: + lvol = db_controller.get_lvol_by_id(lvol_id) + except KeyError as e: + logger.error(e) + return False + + logger.info("Setting LVol do_replicate: False") + lvol.do_replicate = False + lvol.write_to_db() + + snode = db_controller.get_storage_node_by_id(lvol.node_id) + tasks = db_controller.get_job_tasks(snode.cluster_id) + + + for task in tasks: + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION and task.status != JobSchedule.STATUS_DONE: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + if snap.lvol.uuid == lvol.uuid: + tasks_controller.cancel_task(task.uuid) + + return True + + +def replicate_lvol_on_target_cluster(lvol_id): + db_controller = DBController() + try: + lvol = db_controller.get_lvol_by_id(lvol_id) + except KeyError as e: + logger.error(e) + return False + + if not lvol.replication_node_id: + logger.error(f"LVol: {lvol_id} 
replication node id not found") + return False + + target_node = db_controller.get_storage_node_by_id(lvol.replication_node_id) + if not target_node: + logger.error(f"Node not found: {lvol.replication_node_id}") + return False + + if target_node.status != StorageNode.STATUS_ONLINE: + logger.error(f"Node is not online!: {target_node}, status: {target_node.status}") + return False + + source_node = db_controller.get_storage_node_by_id(lvol.node_id) + source_cluster = db_controller.get_cluster_by_id(source_node.cluster_id) + target_cluster = db_controller.get_cluster_by_id(source_cluster.snapshot_replication_target_cluster) + + for lv in db_controller.get_lvols(source_cluster.snapshot_replication_target_cluster): + if lv.nqn == lvol.nqn: + logger.info(f"LVol with same nqn already exists on target cluster: {lv.get_id()}") + return lv.get_id() + + snaps = [] + snapshot = None + for task in db_controller.get_job_tasks(source_node.cluster_id): + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + logger.debug(task) + try: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: + continue + + if snap.lvol.get_id() != lvol_id: + continue + snaps.append(snap) + + if snaps: + snaps = sorted(snaps, key=lambda x: x.created_at) + last_snapshot = snaps[-1] + rep_snap = db_controller.get_snapshot_by_id(last_snapshot.target_replicated_snap_uuid) + snapshot = rep_snap + + if not snapshot: + logger.error(f"Snapshot for replication not found for lvol: {lvol_id}") + return False + + # create lvol on target node + new_lvol = copy.deepcopy(lvol) + new_lvol.uuid = str(uuid.uuid4()) + new_lvol.create_dt = str(datetime.now()) + new_lvol.node_id = target_node.get_id() + new_lvol.nodes = [target_node.get_id(), target_node.secondary_node_id] + new_lvol.replication_node_id = "" + new_lvol.do_replicate = False + new_lvol.cloned_from_snap = snapshot.get_id() + new_lvol.pool_uuid = source_cluster.snapshot_replication_target_pool + 
new_lvol.lvs_name = target_node.lvstore + new_lvol.top_bdev = f"{new_lvol.lvs_name}/{new_lvol.lvol_bdev}" + new_lvol.snapshot_name = snapshot.snap_bdev + new_lvol.status = LVol.STATUS_IN_CREATION + new_lvol.nqn = target_cluster.nqn + ":lvol:" + lvol.uuid + + new_lvol.bdev_stack = [ + { + "type": "bdev_lvol_clone", + "name": new_lvol.top_bdev, + "params": { + "snapshot_name": snapshot.snap_bdev, + "clone_name": new_lvol.lvol_bdev + } + } + ] + + if new_lvol.crypto_bdev: + new_lvol.bdev_stack.append({ + "type": "crypto", + "name": new_lvol.crypto_bdev, + "params": { + "name": new_lvol.crypto_bdev, + "base_name": new_lvol.top_bdev, + "key1": new_lvol.crypto_key1, + "key2": new_lvol.crypto_key2, + } }) - if is_json: - return json.dumps(data, indent=2) - return utils.print_table(data) + new_lvol.write_to_db(db_controller.kv_store) + + lvol_bdev, error = add_lvol_on_node(new_lvol, target_node) + if error: + logger.error(error) + new_lvol.remove(db_controller.kv_store) + return False, error + + new_lvol.lvol_uuid = lvol_bdev['uuid'] + new_lvol.blobid = lvol_bdev['driver_specific']['lvol']['blobid'] + + secondary_node = db_controller.get_storage_node_by_id(target_node.secondary_node_id) + if secondary_node.status == StorageNode.STATUS_ONLINE: + lvol_bdev, error = add_lvol_on_node(new_lvol, secondary_node, is_primary=False) + if error: + logger.error(error) + # remove lvol from primary + ret = delete_lvol_from_node(new_lvol, target_node) + if not ret: + logger.error("") + new_lvol.remove(db_controller.kv_store) + return False, error + + new_lvol.status = LVol.STATUS_ONLINE + new_lvol.write_to_db(db_controller.kv_store) + lvol = db_controller.get_lvol_by_id(lvol_id) + lvol.from_source = False + lvol.write_to_db() + lvol_events.lvol_replicated(lvol, new_lvol) + + return new_lvol.lvol_uuid + + +def list_replication_tasks(lvol_id): + db_controller = DBController() + lvol = db_controller.get_lvol_by_id(lvol_id) + node = db_controller.get_storage_node_by_id(lvol.node_id) + tasks 
= []
+    for task in db_controller.get_job_tasks(node.cluster_id):
+        if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION:
+            try:
+                snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"])
+            except KeyError:
+                continue
+            if snap.lvol.get_id() != lvol_id:
+                continue
+            tasks.append(task)
+
+    return tasks
+
+
+def suspend_lvol(lvol_id):
+
+    db_controller = DBController()
+    try:
+        lvol = db_controller.get_lvol_by_id(lvol_id)
+    except KeyError as e:
+        logger.error(e)
+        return False
+
+    logger.info(f"suspending LVol subsystem: {lvol.get_id()}")
+    snode = db_controller.get_storage_node_by_id(lvol.node_id)
+    for iface in snode.data_nics:
+        if iface.ip4_address and lvol.fabric == iface.trtype.lower():
+            logger.info("adding listener for %s on IP %s" % (lvol.nqn, iface.ip4_address))
+            ret = snode.rpc_client().nvmf_subsystem_listener_set_ana_state(lvol.nqn, iface.ip4_address, lvol.subsys_port, ana="inaccessible")
+            if not ret:
+                logger.error(f"Failed to set subsystem listener state for {lvol.nqn} on {iface.ip4_address}")
+                return False
+
+    if snode.secondary_node_id:
+        sec_node = db_controller.get_storage_node_by_id(snode.secondary_node_id)
+        if sec_node.status in [StorageNode.STATUS_ONLINE, StorageNode.STATUS_DOWN, StorageNode.STATUS_SUSPENDED]:
+            for iface in sec_node.data_nics:
+                if iface.ip4_address and lvol.fabric == iface.trtype.lower():
+                    logger.info("adding listener for %s on IP %s" % (lvol.nqn, iface.ip4_address))
+                    ret = sec_node.rpc_client().nvmf_subsystem_listener_set_ana_state(lvol.nqn, iface.ip4_address, lvol.subsys_port, ana="inaccessible")
+                    if not ret:
+                        logger.error(f"Failed to set subsystem listener state for {lvol.nqn} on {iface.ip4_address}")
+                        return False
+
+    return True
+
+
+def resume_lvol(lvol_id):
+    db_controller = DBController()
+    try:
+        lvol = db_controller.get_lvol_by_id(lvol_id)
+    except KeyError as e:
+        logger.error(e)
+        return False
+
+    logger.info(f"resuming LVol subsystem: {lvol.get_id()}")
+    snode = 
db_controller.get_storage_node_by_id(lvol.node_id) + for iface in snode.data_nics: + if iface.ip4_address and lvol.fabric == iface.trtype.lower(): + logger.info("adding listener for %s on IP %s" % (lvol.nqn, iface.ip4_address)) + ret = snode.rpc_client().nvmf_subsystem_listener_set_ana_state( + lvol.nqn, iface.ip4_address, lvol.subsys_port, is_optimized=True) + if not ret: + logger.error(f"Failed to set subsystem listener state for {lvol.nqn} on {iface.ip4_address}") + return False + + if snode.secondary_node_id: + sec_node = db_controller.get_storage_node_by_id(snode.secondary_node_id) + if sec_node.status in [StorageNode.STATUS_ONLINE, StorageNode.STATUS_DOWN, StorageNode.STATUS_SUSPENDED]: + for iface in sec_node.data_nics: + if iface.ip4_address and lvol.fabric == iface.trtype.lower(): + logger.info("adding listener for %s on IP %s" % (lvol.nqn, iface.ip4_address)) + ret = sec_node.rpc_client().nvmf_subsystem_listener_set_ana_state( + lvol.nqn, iface.ip4_address, lvol.subsys_port, is_optimized=False) + if not ret: + logger.error(f"Failed to set subsystem listener state for {lvol.nqn} on {iface.ip4_address}") + return False + + return True + + +def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): + db_controller = DBController() + try: + lvol = db_controller.get_lvol_by_id(lvol_id) + except KeyError as e: + logger.error(e) + return False + + source_node = db_controller.get_storage_node_by_id(lvol.node_id) + new_source_cluster = None + if cluster_id and source_node.cluster_id == cluster_id: + new_source_cluster = db_controller.get_cluster_by_id(cluster_id) + if new_source_cluster.status != Cluster.STATUS_ACTIVE: + logger.error(f"Cluster is not active: {cluster_id}") + return False + # get new source node from the new cluster + nodes = _get_next_3_nodes(new_source_cluster.get_id(), lvol.size) + if not nodes: + return False, "No nodes found with enough resources to create the LVol" + source_node = nodes[0] + + if not source_node: + 
logger.error(f"Node not found: {lvol.node_id}") + return False + + if source_node.status != StorageNode.STATUS_ONLINE: + logger.error(f"Node is not online!: {source_node.get_id()}, status: {source_node.status}") + return False + + + snaps = [] + snapshot = None + for task in db_controller.get_job_tasks(source_node.cluster_id): + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + logger.debug(task) + try: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: + continue + + if snap.lvol.get_id() != lvol_id: + continue + snaps.append(snap) + + if snaps: + snaps = sorted(snaps, key=lambda x: x.created_at) + snapshot = snaps[-1] + + if not snapshot: + logger.error(f"Snapshot for replication not found for lvol: {lvol_id}") + return False + + # create lvol on target node + new_lvol = copy.deepcopy(lvol) + new_lvol.cloned_from_snap = snapshot.get_id() + new_lvol.snapshot_name = snapshot.snap_bdev + new_lvol.from_source = True + new_lvol.node_id = source_node.get_id() + new_lvol.nodes = [source_node.get_id(), source_node.secondary_node_id] + new_lvol.status = LVol.STATUS_IN_CREATION + new_lvol.vuid = utils.get_random_vuid() + new_lvol.lvol_bdev = f"LVOL_{new_lvol.vuid}" + new_lvol.lvs_name = source_node.lvstore + new_lvol.top_bdev = f"{new_lvol.lvs_name}/{new_lvol.lvol_bdev}" + if pool_uuid: + new_pool = db_controller.get_pool_by_id(pool_uuid) + new_lvol.pool_uuid = new_pool.get_id() + new_lvol.pool_name = new_pool.pool_name + if new_source_cluster: + new_lvol.nqn = new_source_cluster.nqn + ":lvol:" + new_lvol.uuid + new_lvol.bdev_stack = [ + { + "type": "bdev_lvol_clone", + "name": new_lvol.top_bdev, + "params": { + "snapshot_name": snapshot.snap_bdev, + "clone_name": new_lvol.lvol_bdev + } + } + ] + + if new_lvol.crypto_bdev: + new_lvol.bdev_stack.append({ + "type": "crypto", + "name": new_lvol.crypto_bdev, + "params": { + "name": new_lvol.crypto_bdev, + "base_name": new_lvol.top_bdev, + "key1": 
new_lvol.crypto_key1,
+                "key2": new_lvol.crypto_key2,
+            }
+        })
+
+    new_lvol.write_to_db(db_controller.kv_store)
+
+    logger.debug(f"new lvol from_source: {new_lvol.from_source}")
+
+    lvol_bdev, error = add_lvol_on_node(new_lvol, source_node)
+    if error:
+        logger.error(error)
+        new_lvol.remove(db_controller.kv_store)
+        return False, error
+
+    new_lvol.lvol_uuid = lvol_bdev['uuid']
+    new_lvol.blobid = lvol_bdev['driver_specific']['lvol']['blobid']
+
+    secondary_node = db_controller.get_storage_node_by_id(source_node.secondary_node_id)
+    if secondary_node.status == StorageNode.STATUS_ONLINE:
+        lvol_bdev, error = add_lvol_on_node(new_lvol, secondary_node, is_primary=False)
+        if error:
+            logger.error(error)
+            # remove lvol from primary
+            ret = delete_lvol_from_node(new_lvol, source_node)
+            if not ret:
+                logger.error("Failed to remove LVol from primary node")
+            new_lvol.remove(db_controller.kv_store)
+            return False, error
+
+    new_lvol.status = LVol.STATUS_ONLINE
+    new_lvol.from_source = True
+    new_lvol.write_to_db(db_controller.kv_store)
+    lvol_events.lvol_replicated(lvol, new_lvol)
+    logger.debug(f"new lvol from_source: {new_lvol.from_source}")
+
+    return new_lvol.lvol_uuid
+
+
+
+def clone_lvol(lvol_id, clone_name):
+    # create snapshot and clone it
+    db_controller = DBController()
+    try:
+        lvol = db_controller.get_lvol_by_id(lvol_id)
+    except KeyError as e:
+        logger.error(e)
+        return False
+
+    try:
+        snapshot_uuid = None
+        for i in range(10):
+            snapshot_uuid, err = snapshot_controller.add(lvol_id, clone_name)
+            if err:
+                logger.error(err)
+                time.sleep(3)
+                continue
+            break
+        if not snapshot_uuid:
+            logger.error("Failed to create snapshot for clone after 10 attempts")
+            return False
+        new_lvol_uuid = None
+        for i in range(10):
+            new_lvol_uuid, err = snapshot_controller.clone(snapshot_uuid, clone_name)
+            if err:
+                logger.error(err)
+                time.sleep(3)
+                continue
+            break
+        if not new_lvol_uuid:
+            logger.error("Failed to clone lvol after 10 attempts")
+            if snapshot_uuid:
+                snapshot_controller.delete(snapshot_uuid)
+            
return False + + return new_lvol_uuid + except Exception as e: + logger.error(e) + return False def _build_host_entries(allowed_hosts, sec_options=None): """Build the allowed_hosts list with auto-generated keys. diff --git a/simplyblock_core/controllers/lvol_events.py b/simplyblock_core/controllers/lvol_events.py index 6666ed782..e5ece1a40 100644 --- a/simplyblock_core/controllers/lvol_events.py +++ b/simplyblock_core/controllers/lvol_events.py @@ -43,3 +43,7 @@ def lvol_health_check_change(lvol, new_state, old_status, caused_by=ec.CAUSED_BY def lvol_io_error_change(lvol, new_state, old_status, caused_by=ec.CAUSED_BY_CLI): _lvol_event(lvol, f"LVol IO Error changed from: {old_status} to: {new_state}", caused_by, ec.EVENT_STATUS_CHANGE) + +def lvol_replicated(lvol, new_lvol, caused_by=ec.CAUSED_BY_CLI): + _lvol_event(lvol, f"LVol Replicated, {lvol.get_id()}, new lvol: {new_lvol.get_id()}", caused_by, ec.EVENT_STATUS_CHANGE) + diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py index 32013f849..0cae8efec 100644 --- a/simplyblock_core/controllers/snapshot_controller.py +++ b/simplyblock_core/controllers/snapshot_controller.py @@ -5,10 +5,11 @@ import time import uuid -from simplyblock_core.controllers import lvol_controller, snapshot_events, pool_controller +from simplyblock_core.controllers import lvol_controller, snapshot_events, pool_controller, tasks_controller from simplyblock_core import utils, constants from simplyblock_core.db_controller import DBController +from simplyblock_core.models.job_schedule import JobSchedule from simplyblock_core.models.pool import Pool from simplyblock_core.models.snapshot import SnapShot from simplyblock_core.models.lvol_model import LVol @@ -35,16 +36,19 @@ def add(lvol_id, snapshot_name, backup=False): return False, msg if lvol.cloned_from_snap: - snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) - ref_count = snap.ref_count - if snap.snap_ref_id: - 
ref_snap = db_controller.get_snapshot_by_id(snap.snap_ref_id) - ref_count = ref_snap.ref_count - - if ref_count >= constants.MAX_SNAP_COUNT: - msg = f"Can not create more than {constants.MAX_SNAP_COUNT} snaps from this clone" - logger.error(msg) - return False, msg + try: + snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) + ref_count = snap.ref_count + if snap.snap_ref_id: + ref_snap = db_controller.get_snapshot_by_id(snap.snap_ref_id) + ref_count = ref_snap.ref_count + + if ref_count >= constants.MAX_SNAP_COUNT: + msg = f"Can not create more than {constants.MAX_SNAP_COUNT} snaps from this clone" + logger.error(msg) + return False, msg + except KeyError: + pass for sn in db_controller.get_snapshots(): if sn.cluster_id == pool.cluster_id: @@ -235,8 +239,34 @@ def add(lvol_id, snapshot_name, backup=False): snap.snap_ref_id = original_snap.get_id() snap.write_to_db(db_controller.kv_store) - logger.info("Done") + for sn in db_controller.get_snapshots(cluster.get_id()): + if sn.get_id() == snap.get_id(): + continue + if sn.lvol.get_id() == lvol_id: + if not sn.next_snap_uuid: + sn.next_snap_uuid = snap.get_id() + snap.prev_snap_uuid = sn.get_id() + sn.write_to_db() + snap.write_to_db() + break + snapshot_events.snapshot_create(snap) + if lvol.do_replicate: + task = tasks_controller.add_snapshot_replication_task(snap.cluster_id, snap.lvol.node_id, snap.get_id()) + if task: + snapshot_events.replication_task_created(snap) + if lvol.cloned_from_snap: + lvol_snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) + if lvol_snap.source_replicated_snap_uuid: + try: + org_snap = db_controller.get_snapshot_by_id(lvol_snap.source_replicated_snap_uuid) + if org_snap and org_snap.status == SnapShot.STATUS_ONLINE: + task = tasks_controller.add_snapshot_replication_task( + snap.cluster_id, org_snap.lvol.node_id, snap.get_id(), replicate_to_source=True) + if task: + logger.info("Created snapshot replication task on original node") + except KeyError: + pass if 
backup:
         from simplyblock_core.controllers import backup_controller
@@ -247,8 +277,8 @@
     return snap.uuid, False
 
-def list(node_id=None):
-    snaps = db_controller.get_snapshots()
+def list(all=False, cluster_id=None, with_details=False):
+    snaps = db_controller.get_snapshots(cluster_id)
     snaps = sorted(snaps, key=lambda snap: snap.created_at)
 
     # Build set of lvol UUIDs with active migrations (single DB scan)
@@ -256,7 +286,6 @@
     for m in db_controller.get_migrations():
         if m.is_active():
             migrating_lvols.add(m.lvol_id)
-
     data = []
     for snap in snaps:
         if node_id:
@@ -267,7 +296,7 @@
         for lvol in db_controller.get_lvols():
             if lvol.cloned_from_snap and lvol.cloned_from_snap == snap.get_id():
                 clones.append(lvol.get_id())
-        data.append({
+        d = {
             "UUID": snap.uuid,
             "BDdev UUID": snap.snap_uuid,
             "BlobID": snap.blobid,
@@ -280,7 +309,13 @@
             "Created At": time.strftime("%H:%M:%S, %d/%m/%Y", time.gmtime(snap.created_at)),
             "Base Snapshot": snap.snap_ref_id,
             "Clones": clones,
-        })
+        }
+        if with_details:
+            d["Replication target snap"] = snap.target_replicated_snap_uuid
+            d["Replication source snap"] = snap.source_replicated_snap_uuid
+            d["Prev snap"] = snap.prev_snap_uuid
+            d["Next snap"] = snap.next_snap_uuid
+        data.append(d)
 
     return utils.print_table(data)
 
@@ -313,10 +348,17 @@
             f"{len(active_backups)} backup(s) still in progress")
         return False
 
+    if snap.status == SnapShot.STATUS_IN_REPLICATION:
+        logger.error("Snapshot is in replication")
+        return False
+
     try:
         snode = db_controller.get_storage_node_by_id(snap.lvol.node_id)
     except KeyError:
         logger.exception(f"Storage node not found {snap.lvol.node_id}")
+        if force_delete:
+            snap.remove(db_controller.kv_store)
+            return True
         return False
 
     clones = []
@@ -422,6 +464,9 @@
     except KeyError:
         pass
 
+    if snap.target_replicated_snap_uuid:
+        
delete_replicated(snap.uuid) + logger.info("Done") return True @@ -681,32 +726,96 @@ def clone(snapshot_id, clone_name, new_size=0, pvc_name=None, pvc_namespace=None return lvol.uuid, False -def list_by_node(node_id=None, is_json=False): - snaps = db_controller.get_snapshots() - snaps = sorted(snaps, key=lambda snap: snap.created_at) +def list_replication_tasks(cluster_id): + tasks = db_controller.get_job_tasks(cluster_id) + data = [] - for snap in snaps: - if node_id: - if snap.lvol.node_id != node_id: + for task in tasks: + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + logger.debug(task) + try: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: continue - logger.debug(snap) - clones = [] - for lvol in db_controller.get_lvols(): - if lvol.cloned_from_snap and lvol.cloned_from_snap == snap.get_id(): - clones.append(lvol.get_id()) - data.append({ - "UUID": snap.uuid, - "BDdev UUID": snap.snap_uuid, - "BlobID": snap.blobid, - "Name": snap.snap_name, - "Size": utils.humanbytes(snap.used_size), - "BDev": snap.snap_bdev.split("/")[1], - "Node ID": snap.lvol.node_id, - "LVol ID": snap.lvol.get_id(), - "Created At": time.strftime("%H:%M:%S, %d/%m/%Y", time.gmtime(snap.created_at)), - "Base Snapshot": snap.snap_ref_id, - "Clones": clones, - }) - if is_json: - return json.dumps(data, indent=2) + + duration = "" + try: + if task.status == JobSchedule.STATUS_RUNNING: + duration = utils.strfdelta_seconds(int(time.time()) - task.function_params["start_time"]) + elif "end_time" in task.function_params: + duration = utils.strfdelta_seconds( + task.function_params["end_time"] - task.function_params["start_time"]) + except Exception as e: + logger.error(e) + status = task.status + if task.canceled: + status = "cancelled" + replicate_to = "target" + if "replicate_to_source" in task.function_params: + if task.function_params["replicate_to_source"] is True: + replicate_to = "source" + offset = 0 + if "offset" in 
task.function_params: + offset = task.function_params["offset"] + data.append({ + "Task ID": task.uuid, + "Snapshot ID": snap.uuid, + "Size": utils.humanbytes(snap.used_size), + "Duration": duration, + "Offset": offset, + "Status": status, + "Replicate to": replicate_to, + "Result": task.function_result, + "Cluster ID": task.cluster_id, + }) return utils.print_table(data) + + +def delete_replicated(snapshot_id): + try: + snap = db_controller.get_snapshot_by_id(snapshot_id) + except KeyError: + logger.error(f"Snapshot not found {snapshot_id}") + return False + + try: + target_replicated_snap = db_controller.get_snapshot_by_id(snap.target_replicated_snap_uuid) + logger.info("Deleting replicated snapshot %s", target_replicated_snap.uuid) + ret = delete(target_replicated_snap.uuid) + if not ret: + logger.error("Failed to delete snapshot %s", target_replicated_snap.uuid) + return False + + except KeyError: + logger.error(f"Snapshot not found {snap.target_replicated_snap_uuid}") + return False + + return True + + +def get(snapshot_uuid): + try: + snap = db_controller.get_snapshot_by_id(snapshot_uuid) + except KeyError: + logger.error(f"Snapshot not found {snapshot_uuid}") + return False + + return json.dumps(snap.get_clean_dict(), indent=2) + + +def set(snapshot_uuid, attr, value) -> bool: + try: + snap = db_controller.get_snapshot_by_id(snapshot_uuid) + except KeyError: + logger.error(f"Snapshot not found {snapshot_uuid}") + return False + + if attr not in snap.get_attrs_map(): + raise KeyError('Attribute not found') + + value = snap.get_attrs_map()[attr]['type'](value) + logger.info(f"Setting {attr} to {value}") + setattr(snap, attr, value) + snap.write_to_db() + return True + diff --git a/simplyblock_core/controllers/snapshot_events.py b/simplyblock_core/controllers/snapshot_events.py index 10cfd2622..e19567afb 100644 --- a/simplyblock_core/controllers/snapshot_events.py +++ b/simplyblock_core/controllers/snapshot_events.py @@ -31,3 +31,10 @@ def 
snapshot_delete(snapshot, caused_by=ec.CAUSED_BY_CLI): def snapshot_clone(snapshot, lvol_clone, caused_by=ec.CAUSED_BY_CLI): _snapshot_event(snapshot, f"Snapshot cloned: {snapshot.get_id()} clone id: {lvol_clone.get_id()}", caused_by, ec.EVENT_STATUS_CHANGE) + +def replication_task_created(snapshot, caused_by=ec.CAUSED_BY_CLI): + _snapshot_event(snapshot, "Snapshot replication task created", caused_by, ec.EVENT_OBJ_CREATED) + + +def replication_task_finished(snapshot, caused_by=ec.CAUSED_BY_CLI): + _snapshot_event(snapshot, "Snapshot replication task finished", caused_by, ec.EVENT_OBJ_CREATED) diff --git a/simplyblock_core/controllers/tasks_controller.py b/simplyblock_core/controllers/tasks_controller.py index cb1d0ee5f..fb4966af1 100644 --- a/simplyblock_core/controllers/tasks_controller.py +++ b/simplyblock_core/controllers/tasks_controller.py @@ -81,6 +81,13 @@ def _add_task(function_name, cluster_id, node_id, device_id, logger.info(f"Task found, skip adding new task: {task_id}") return False + elif function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + task_id = get_snapshot_replication_task( + cluster_id, function_params['snapshot_id'], function_params['replicate_to_source']) + if task_id: + logger.info(f"Task found, skip adding new task: {task_id}") + return False + task_obj = JobSchedule() task_obj.uuid = str(uuid.uuid4()) task_obj.cluster_id = cluster_id @@ -179,6 +186,7 @@ def list_tasks(cluster_id, is_json=False, limit=50, **kwargs): for task in tasks: if task.function_name == JobSchedule.FN_DEV_MIG: continue + logger.debug(task) if task.max_retry > 0: retry = f"{task.retry}/{task.max_retry}" else: @@ -463,6 +471,15 @@ def get_lvol_sync_del_task(cluster_id, node_id, lvol_bdev_name=None): return task.uuid return False +def get_snapshot_replication_task(cluster_id, snapshot_id, replicate_to_source): + tasks = db.get_job_tasks(cluster_id) + for task in tasks: + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION and 
task.function_params["snapshot_id"] == snapshot_id: + if task.status != JobSchedule.STATUS_DONE and task.canceled is False: + if task.function_params["replicate_to_source"] == replicate_to_source: + return task.uuid + return False + def add_backup_task(backup): """Create the task that performs an S3 backup.""" @@ -509,3 +526,31 @@ def add_backup_merge_task(cluster_id, node_id, keep_backup_id, old_backup_id): }, ) + +def _check_snap_instance_on_node(snapshot_id: str , node_id: str): + snapshot = db.get_snapshot_by_id(snapshot_id) + for sn_inst in snapshot.instances: + if sn_inst.lvol.node_id == node_id: + logger.info("Snapshot instance found on node, skip adding replication task") + return + + if snapshot.snap_ref_id: + prev_snap = db.get_snapshot_by_id(snapshot.snap_ref_id) + _check_snap_instance_on_node(prev_snap.get_id(), node_id) + + _add_task(JobSchedule.FN_SNAPSHOT_REPLICATION, snapshot.cluster_id, node_id, "", + function_params={"snapshot_id": snapshot.get_id(), "replicate_to_source": False, + "replicate_as_snap_instance": True}, + send_to_cluster_log=False) + + +def add_snapshot_replication_task(cluster_id, node_id, snapshot_id, replicate_to_source=False): + if not replicate_to_source: + snapshot = db.get_snapshot_by_id(snapshot_id) + if snapshot.snap_ref_id: + prev_snap = db.get_snapshot_by_id(snapshot.snap_ref_id) + _check_snap_instance_on_node(prev_snap.get_id(), node_id) + + return _add_task(JobSchedule.FN_SNAPSHOT_REPLICATION, cluster_id, node_id, "", + function_params={"snapshot_id": snapshot_id, "replicate_to_source": replicate_to_source}, + send_to_cluster_log=False) diff --git a/simplyblock_core/db_controller.py b/simplyblock_core/db_controller.py index e9adad6c4..f60069511 100644 --- a/simplyblock_core/db_controller.py +++ b/simplyblock_core/db_controller.py @@ -121,6 +121,7 @@ def get_pool_by_name(self, name) -> Pool: def get_lvols(self, cluster_id=None) -> List[LVol]: lvols = self.get_all_lvols() + lvols = [lvol for lvol in lvols if lvol.status 
!= LVol.STATUS_DELETED] if not cluster_id: return lvols @@ -161,9 +162,11 @@ def get_hostnames_by_pool_id(self, pool_id) -> List[str]: hostnames.append(lv.hostname) return hostnames - def get_snapshots(self) -> List[SnapShot]: - ret = SnapShot().read_from_db(self.kv_store) - return ret + def get_snapshots(self, cluster_id=None) -> List[SnapShot]: + snaps = SnapShot().read_from_db(self.kv_store) + if cluster_id: + snaps = [n for n in snaps if n.cluster_id == cluster_id] + return sorted(snaps, key=lambda x: x.created_at) def get_snapshot_by_id(self, id) -> SnapShot: ret = SnapShot().read_from_db(self.kv_store, id) @@ -260,7 +263,9 @@ def get_events(self, event_id=" ", limit=0, reverse=False) -> List[EventObj]: return EventObj().read_from_db(self.kv_store, id=event_id, limit=limit, reverse=reverse) def get_job_tasks(self, cluster_id, reverse=True, limit=0) -> List[JobSchedule]: - return JobSchedule().read_from_db(self.kv_store, id=cluster_id, reverse=reverse, limit=limit) + ret = JobSchedule().read_from_db(self.kv_store, id=cluster_id, reverse=reverse, limit=limit) + return sorted(ret, key=lambda x: x.date) + def get_active_migration_tasks(self, cluster_id: str) -> List[JobSchedule]: """Return all non-done FN_LVOL_MIG tasks for the given cluster (single FDB scan).""" @@ -282,7 +287,7 @@ def get_snapshots_by_node_id(self, node_id) -> List[SnapShot]: for snap in snaps: if snap.lvol.node_id == node_id: ret.append(snap) - return ret + return sorted(ret, key=lambda x: x.created_at) def get_snapshots_by_lvol_id(self, lvol_id) -> List[SnapShot]: return [s for s in self.get_snapshots() if s.lvol and s.lvol.get_id() == lvol_id] diff --git a/simplyblock_core/models/cluster.py b/simplyblock_core/models/cluster.py index 4931943ff..d42b1c9c5 100644 --- a/simplyblock_core/models/cluster.py +++ b/simplyblock_core/models/cluster.py @@ -70,6 +70,9 @@ class Cluster(BaseModel): is_re_balancing: bool = False full_page_unmap: bool = True is_single_node: bool = False +
snapshot_replication_target_cluster: str = "" + snapshot_replication_target_pool: str = "" + snapshot_replication_timeout: int = 60*10 client_data_nic: str = "" max_fault_tolerance: int = 1 backup_config: dict = {} diff --git a/simplyblock_core/models/job_schedule.py b/simplyblock_core/models/job_schedule.py index 4138ec98f..939ca60fb 100644 --- a/simplyblock_core/models/job_schedule.py +++ b/simplyblock_core/models/job_schedule.py @@ -22,6 +22,7 @@ class JobSchedule(BaseModel): FN_BALANCING_AFTER_DEV_REMOVE = "balancing_on_dev_rem" FN_BALANCING_AFTER_DEV_EXPANSION = "balancing_on_dev_add" FN_JC_COMP_RESUME = "jc_comp_resume" + FN_SNAPSHOT_REPLICATION = "snapshot_replication" FN_LVOL_SYNC_DEL = "lvol_sync_del" FN_LVOL_MIG = "lvol_migration" FN_BACKUP = "s3_backup" diff --git a/simplyblock_core/models/lvol_model.py b/simplyblock_core/models/lvol_model.py index 3988cf09c..85cd0b229 100644 --- a/simplyblock_core/models/lvol_model.py +++ b/simplyblock_core/models/lvol_model.py @@ -13,6 +13,7 @@ class LVol(BaseModel): STATUS_OFFLINE = 'offline' STATUS_IN_DELETION = 'in_deletion' STATUS_RESTORING = 'restoring' + STATUS_DELETED = 'deleted' _STATUS_CODE_MAP = { STATUS_ONLINE: 1, @@ -20,6 +21,7 @@ class LVol(BaseModel): STATUS_IN_DELETION: 3, STATUS_IN_CREATION: 4, STATUS_RESTORING: 5, + STATUS_DELETED: 6, } base_bdev: str = "" @@ -70,6 +72,9 @@ class LVol(BaseModel): npcs: int = 0 allowed_hosts: List[dict] = [] delete_snap_on_lvol_delete: bool = False + do_replicate: bool = False + replication_node_id: str = "" + from_source: bool = True def has_qos(self): return (self.rw_ios_per_sec > 0 or self.rw_mbytes_per_sec > 0 or self.r_mbytes_per_sec > 0 or self.w_mbytes_per_sec > 0) diff --git a/simplyblock_core/models/snapshot.py b/simplyblock_core/models/snapshot.py index 1da571ec8..ab91a0087 100644 --- a/simplyblock_core/models/snapshot.py +++ b/simplyblock_core/models/snapshot.py @@ -9,6 +9,7 @@ class SnapShot(BaseModel): STATUS_ONLINE = 'online' STATUS_OFFLINE = 'offline' 
STATUS_IN_DELETION = 'in_deletion' + STATUS_IN_REPLICATION = 'in_replication' base_bdev: str = "" blobid: int = 0 @@ -29,3 +30,8 @@ class SnapShot(BaseModel): deletion_status: str = "" status: str = "" fabric: str = "tcp" + target_replicated_snap_uuid: str = "" + source_replicated_snap_uuid: str = "" + next_snap_uuid: str = "" + prev_snap_uuid: str = "" + instances: list = [] \ No newline at end of file diff --git a/simplyblock_core/models/storage_node.py b/simplyblock_core/models/storage_node.py index f1437e81f..147db5a77 100644 --- a/simplyblock_core/models/storage_node.py +++ b/simplyblock_core/models/storage_node.py @@ -107,6 +107,7 @@ class StorageNode(BaseNodeObject): active_rdma: bool = False socket: int = 0 firewall_port: int = 5001 + lvol_poller_mask: str = "" spdk_proxy_image: str = "" def get_lvol_subsys_port(self, lvs_name=None): diff --git a/simplyblock_core/rpc_client.py b/simplyblock_core/rpc_client.py index 5c78e9e73..0ca22a6ce 100644 --- a/simplyblock_core/rpc_client.py +++ b/simplyblock_core/rpc_client.py @@ -1230,6 +1230,51 @@ def bdev_distrib_check_inflight_io(self, jm_vuid): } return self._request("bdev_distrib_check_inflight_io", params) + def bdev_lvol_create_poller_group(self, cpu_mask): + params = { + "cpu_mask": cpu_mask, + } + return self._request("bdev_lvol_create_poller_group", params) + + def bdev_lvol_transfer(self, lvol_name, offset, cluster_batch, gateway, operation): + # --operation {migrate,replicate} + params = { + "lvol_name": lvol_name, + "offset": offset, + "cluster_batch": cluster_batch, + "gateway": gateway, + "operation": operation, + } + return self._request("bdev_lvol_transfer", params) + + def bdev_lvol_transfer_stat(self, lvol_name): + """ + example: + ./rpc.py bdev_lvol_transfer_stat lvs_raid0_lvol/snapshot_1 + { + "transfer_state": "No process", + "offset": 0 + } + transfer_state values: + - No process + - In progress + - Failed + - Done + """ + params = { + "lvol_name": lvol_name, + } + return 
self._request("bdev_lvol_transfer_stat", params) + + def bdev_lvol_convert(self, lvol_name): + """ + convert lvol to snapshot + """ + params = { + "lvol_name": lvol_name, + } + return self._request("bdev_lvol_convert", params) + def bdev_lvol_remove_from_group(self, group_id, lvol_name_list): params = { "bdev_group_id": group_id , @@ -1278,6 +1323,13 @@ def nvmf_port_unblock_rdma(self, port): def nvmf_get_blocked_ports_rdma(self): return self._request("nvmf_get_blocked_ports") + def bdev_lvol_add_clone(self, lvol_name, child_name): + params = { + "lvol_name": lvol_name, + "child_name": child_name, + } + return self._request("bdev_lvol_add_clone", params) + def bdev_raid_get_bdevs(self): params = { "category": "online" diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml new file mode 100644 index 000000000..730881591 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml @@ -0,0 +1,157 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblocksnapshotreplications.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockSnapshotReplication + listKind: SimplyBlockSnapshotReplicationList + plural: simplyblocksnapshotreplications + singular: simplyblocksnapshotreplication + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockSnapshotReplication is the Schema for the simplyblocksnapshotreplications + API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of SimplyBlockSnapshotReplication + properties: + action: + enum: + - failback + type: string + excludeVolumeIDs: + description: 'Optional: volumes to exclude from failback.' + items: + type: string + type: array + includeVolumeIDs: + description: |- + Optional: only these volumes are included in failback. + If empty, all volumes are candidates unless excluded below. + items: + type: string + type: array + interval: + description: 'snapshot replication interval in seconds (default: 300sec)' + format: int32 + type: integer + sourceCluster: + description: Source cluster for the snapshots + type: string + sourcePool: + description: required for failback to a fresh source cluster + type: string + targetCluster: + description: Target cluster for replication + type: string + targetPool: + description: Target cluster pool for replication + type: string + timeout: + description: snapshot replication timeout + format: int32 + type: integer + volumeIDs: + description: 'Optional: list of volumes to replicate. Empty means + all volumes' + items: + type: string + type: array + required: + - sourceCluster + - targetCluster + - targetPool + type: object + status: + description: status defines the observed state of SimplyBlockSnapshotReplication + properties: + configured: + type: boolean + observedFailbackGeneration: + description: The metadata.generation value for which failback was + last processed. 
+ format: int64 + type: integer + volumes: + description: Per-volume replication status + items: + description: VolumeReplicationStatus tracks the replication state + of an individual volume + properties: + errors: + description: 'Optional: list of errors encountered for this + volume' + items: + description: ReplicationError stores timestamped error messages + properties: + message: + type: string + timestamp: + format: date-time + type: string + required: + - message + - timestamp + type: object + type: array + lastReplicationTime: + description: Timestamp of the last successful replication for + this volume + format: date-time + type: string + lastSnapshotID: + description: Last snapshot ID replicated for this volume + type: string + phase: + description: Current phase for this volume + enum: + - Pending + - Running + - Completed + - Failed + - Paused + type: string + replicatedCount: + description: Number of snapshots successfully replicated + format: int32 + type: integer + volumeID: + description: Volume ID + type: string + required: + - volumeID + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml new file mode 100644 index 000000000..559b6afa6 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml @@ -0,0 +1,258 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblockstoragenodes.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockStorageNode + listKind: SimplyBlockStorageNodeList + plural: simplyblockstoragenodes + singular: simplyblockstoragenode + scope: 
Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockStorageNode is the Schema for the storagenodes API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of StorageNode + properties: + action: + enum: + - shutdown + - restart + - suspend + - resume + - remove + type: string + addPcieToAllowList: + description: restart params + items: + type: string + type: array + clusterImage: + type: string + clusterName: + type: string + coreIsolation: + type: boolean + coreMask: + type: string + corePercentage: + format: int32 + type: integer + dataNIC: + items: + type: string + type: array + deviceNames: + items: + type: string + type: array + driveSizeRange: + type: string + enableCpuTopology: + type: boolean + force: + type: boolean + format4k: + type: boolean + haJM: + type: boolean + haJmCount: + format: int32 + type: integer + idDeviceByNQN: + type: boolean + jmPercent: + format: int32 + type: integer + maxLVol: + format: int32 + type: integer + maxSize: + type: string + mgmtIfc: + type: string + nodeAddr: + type: string + nodeUUID: + description: NodeUUID is required when action is specified + type: string + nodesPerSocket: + format: int32 + type: integer + openShiftCluster: + type: boolean + partitions: + 
format: int32 + type: integer + pcieAllowList: + items: + type: string + type: array + pcieDenyList: + items: + type: string + type: array + pcieModel: + type: string + reservedSystemCPU: + type: string + skipKubeletConfiguration: + type: boolean + socketsToUse: + format: int32 + type: integer + spdkDebug: + type: boolean + spdkImage: + type: string + tolerations: + items: + description: |- + The pod this Toleration is attached to tolerates any taint that matches + the triple <key,value,effect> using the matching operator <operator>. + properties: + effect: + description: |- + Effect indicates the taint effect to match. Empty means match all taint effects. + When specified, allowed values are NoSchedule, PreferNoSchedule and NoExecute. + type: string + key: + description: |- + Key is the taint key that the toleration applies to. Empty means match all taint keys. + If the key is empty, operator must be Exists; this combination means to match all values and all keys. + type: string + operator: + description: |- + Operator represents a key's relationship to the value. + Valid operators are Exists and Equal. Defaults to Equal. + Exists is equivalent to wildcard for value, so that a pod can + tolerate all taints of a particular category. + type: string + tolerationSeconds: + description: |- + TolerationSeconds represents the period of time the toleration (which must be + of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, + it is not set, which means tolerate the taint forever (do not evict). Zero and + negative values will be treated as 0 (evict immediately) by the system. + format: int64 + type: integer + value: + description: |- + Value is the taint value the toleration matches to. + If the operator is Exists, the value should be empty, otherwise just a regular string.
+ type: string + type: object + type: array + ubuntuHost: + type: boolean + useSeparateJournalDevice: + type: boolean + workerNode: + type: string + workerNodes: + items: + type: string + type: array + required: + - clusterName + type: object + status: + description: status defines the observed state of StorageNode + properties: + actionStatus: + properties: + action: + type: string + message: + type: string + nodeUUID: + type: string + observedGeneration: + format: int64 + type: integer + state: + type: string + triggered: + type: boolean + updatedAt: + format: date-time + type: string + type: object + nodes: + items: + properties: + cpu: + format: int32 + type: integer + devices: + type: string + health: + type: boolean + hostname: + type: string + lvol_port: + format: int32 + type: integer + memory: + type: string + mgmtIp: + type: string + nvmf_port: + format: int32 + type: integer + rpc_port: + format: int32 + type: integer + status: + type: string + uptime: + type: string + uuid: + type: string + volumes: + format: int32 + type: integer + type: object + type: array + type: object + required: + - spec + type: object + x-kubernetes-validations: + - message: nodeUUID is required when action is specified + rule: '!(has(self.spec.action) && self.spec.action != "" && (!has(self.spec.nodeUUID) + || self.spec.nodeUUID == ""))' + - message: clusterImage, maxLVol, and workerNodes are required when action + is not specified + rule: (has(self.spec.action) && self.spec.action != "") || (has(self.spec.clusterImage) + && self.spec.clusterImage != "" && has(self.spec.maxLVol) && has(self.spec.workerNodes) + && size(self.spec.workerNodes) > 0) + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index 61095f325..82f1d4f2c 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ 
b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -18,26 +18,7 @@ spec: labels: app: simplyblock-admin-control spec: - serviceAccountName: simplyblock-control-sa - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end }} - {{- end }} - {{- end }} + serviceAccountName: simplyblock-sa hostNetwork: true dnsPolicy: ClusterFirstWithHostNet affinity: @@ -55,15 +36,11 @@ spec: env: - name: LVOL_NVMF_PORT_START value: "{{ .Values.ports.lvolNvmfPortStart }}" - - name: PROMETHEUS_URL - value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" - - name: PROMETHEUS_PORT - value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" - name: K8S_NAMESPACE valueFrom: fieldRef: fieldPath: metadata.namespace -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} - name: MONITORING_SECRET valueFrom: secretKeyRef: @@ -95,11 +72,12 @@ spec: path: fdb.cluster --- apiVersion: apps/v1 -kind: DaemonSet +kind: Deployment metadata: name: simplyblock-webappapi namespace: {{ .Release.Namespace }} spec: + replicas: 2 selector: matchLabels: app: simplyblock-webappapi @@ -112,25 +90,14 @@ spec: labels: app: simplyblock-webappapi spec: - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end 
}} - {{- end }} - {{- end }} + serviceAccountName: simplyblock-sa + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchLabels: + app: simplyblock-admin-control + topologyKey: kubernetes.io/hostname containers: - name: webappapi image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" @@ -144,14 +111,25 @@ spec: configMapKeyRef: name: simplyblock-config key: LOG_LEVEL + - name: LVOL_NVMF_PORT_START + value: "{{ .Values.ports.lvolNvmfPortStart }}" + - name: ENABLE_MONITORING + value: "{{ .Values.observability.enabled }}" + - name: K8S_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace +{{- if .Values.observability.enabled }} + - name: MONITORING_SECRET + valueFrom: + secretKeyRef: + name: simplyblock-grafana-secrets + key: MONITORING_SECRET +{{- end }} - name: FLASK_DEBUG value: "False" - name: FLASK_ENV value: "production" - - name: PROMETHEUS_URL - value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" - - name: PROMETHEUS_PORT - value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" volumeMounts: - name: fdb-cluster-file mountPath: /etc/foundationdb/fdb.cluster @@ -163,80 +141,20 @@ spec: limits: cpu: "500m" memory: "2Gi" - volumes: - - name: fdb-cluster-file - configMap: - name: simplyblock-fdb-cluster-config - items: - - key: cluster-file - path: fdb.cluster ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: simplyblock-storage-node-monitor - namespace: {{ .Release.Namespace }} -spec: - replicas: 1 - selector: - matchLabels: - app: simplyblock-storage-node-monitor - template: - metadata: - annotations: - log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" - labels: - app: simplyblock-storage-node-monitor - spec: - hostNetwork: true - dnsPolicy: ClusterFirstWithHostNet - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ 
.Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end }} - {{- end }} - {{- end }} - containers: - - name: storage-node-monitor - image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" - imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" - command: ["python", "simplyblock_core/services/storage_node_monitor.py"] - env: - - name: PROMETHEUS_URL - value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" - - name: PROMETHEUS_PORT - value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" - - name: SIMPLYBLOCK_LOG_LEVEL - valueFrom: - configMapKeyRef: - name: simplyblock-config - key: LOG_LEVEL + - name: fluent-bit + image: fluent/fluent-bit:1.8.11 volumeMounts: - - name: fdb-cluster-file - mountPath: /etc/foundationdb/fdb.cluster - subPath: fdb.cluster + - name: varlog + mountPath: /var/log + - name: config + mountPath: /fluent-bit/etc/ resources: requests: - cpu: "200m" - memory: "256Mi" + cpu: "100m" + memory: "200Mi" limits: - cpu: "400m" - memory: "1Gi" + cpu: "200m" + memory: "400Mi" volumes: - name: fdb-cluster-file configMap: @@ -244,18 +162,23 @@ spec: items: - key: cluster-file path: fdb.cluster - + - name: varlog + hostPath: + path: /var/log + - name: config + configMap: + name: simplyblock-fluent-bit-config --- apiVersion: apps/v1 kind: Deployment metadata: - name: simplyblock-mgmt-node-monitor + name: simplyblock-monitoring namespace: {{ .Release.Namespace }} spec: replicas: 1 selector: matchLabels: - app: simplyblock-mgmt-node-monitor + app: simplyblock-monitoring template: metadata: annotations: @@ -263,301 +186,182 @@ spec: reloader.stakater.com/auto: "true" reloader.stakater.com/configmap: 
"simplyblock-fdb-cluster-config" labels: - app: simplyblock-mgmt-node-monitor + app: simplyblock-monitoring spec: + serviceAccountName: simplyblock-sa hostNetwork: true dnsPolicy: ClusterFirstWithHostNet - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end }} - {{- end }} - {{- end }} containers: - - name: mgmt-node-monitor + - name: storage-node-monitor image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" + command: ["python", "simplyblock_core/services/storage_node_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} + env: +{{ toYaml .env | nindent 12 }} + volumeMounts: +{{ toYaml .volumeMounts | nindent 12 }} + resources: +{{ toYaml .resources | nindent 12 }} +{{- end }} + + - name: mgmt-node-monitor + image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/mgmt_node_monitor.py"] + imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" env: - - name: PROMETHEUS_URL - value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" - - name: PROMETHEUS_PORT - value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" - - name: BACKEND_TYPE - value: "k8s" - - name: SIMPLYBLOCK_LOG_LEVEL - valueFrom: - configMapKeyRef: - name: simplyblock-config - key: LOG_LEVEL + - name: BACKEND_TYPE + value: "k8s" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} +{{ toYaml .env | nindent 12 }} volumeMounts: - - name: fdb-cluster-file - mountPath: /etc/foundationdb/fdb.cluster - subPath: fdb.cluster +{{ toYaml .volumeMounts | nindent 12 }} resources: - requests: - cpu: "200m" - memory: "256Mi" - limits: - cpu: "400m" - memory: "1Gi" - volumes: - - name: fdb-cluster-file - configMap: - name: simplyblock-fdb-cluster-config - items: - - key: cluster-file - path: fdb.cluster +{{ toYaml .resources | nindent 12 }} +{{- end }} ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: simplyblock-lvol-stats-collector - namespace: {{ .Release.Namespace }} -spec: - replicas: 1 - selector: - matchLabels: - app: simplyblock-lvol-stats-collector - template: - metadata: - annotations: - log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" - labels: - app: simplyblock-lvol-stats-collector - spec: - hostNetwork: true - dnsPolicy: ClusterFirstWithHostNet - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end }} - {{- end }} - {{- end }} - containers: - name: lvol-stats-collector image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" - imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" command: ["python", "simplyblock_core/services/lvol_stat_collector.py"] + imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} env: - - name: PROMETHEUS_URL - value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" - - name: PROMETHEUS_PORT - value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" - - name: SIMPLYBLOCK_LOG_LEVEL - valueFrom: - configMapKeyRef: - name: simplyblock-config - key: LOG_LEVEL +{{ toYaml .env | nindent 12 }} volumeMounts: - - name: fdb-cluster-file - mountPath: /etc/foundationdb/fdb.cluster - subPath: fdb.cluster +{{ toYaml .volumeMounts | nindent 12 }} resources: - requests: - cpu: "200m" - memory: "256Mi" - limits: - cpu: "400m" - memory: "1Gi" - volumes: - - name: fdb-cluster-file - configMap: - name: simplyblock-fdb-cluster-config - items: - - key: cluster-file - path: fdb.cluster +{{ toYaml .resources | nindent 12 }} +{{- end }} ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: simplyblock-main-distr-event-collector - namespace: {{ .Release.Namespace }} -spec: - replicas: 1 - selector: - matchLabels: - app: simplyblock-main-distr-event-collector - template: - metadata: - annotations: - log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" - labels: - app: simplyblock-main-distr-event-collector - spec: - hostNetwork: true - dnsPolicy: ClusterFirstWithHostNet - {{- if .Values.nodeSelector.create }} - nodeSelector: - {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} - {{- end }} - {{- if .Values.tolerations.create }} - tolerations: - {{- range .Values.tolerations.list }} - - operator: {{ .operator | quote }} - {{- if .effect }} - effect: {{ .effect | quote }} - {{- end }} - {{- if .key }} - key: {{ .key | quote }} - {{- end }} - {{- if .value }} - value: {{ .value | quote }} - {{- end }} - {{- end }} - {{- end }} - containers: - name: main-distr-event-collector image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" - imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" command: ["python", 
"simplyblock_core/services/main_distr_event_collector.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: capacity-and-stats-collector
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/capacity_and_stats_collector.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: capacity-monitor
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/cap_monitor.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: health-check
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/health_check_service.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: device-monitor
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/device_monitor.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: lvol-monitor
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/lvol_monitor.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: snapshot-monitor
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/snapshot_monitor.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+        - name: fluent-bit
+          image: fluent/fluent-bit:1.8.11
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+            - name: varlog
+              mountPath: /var/log
+            - name: config
+              mountPath: /fluent-bit/etc/
           resources:
             requests:
-              cpu: "200m"
-              memory: "256Mi"
+              cpu: "100m"
+              memory: "200Mi"
             limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
+              cpu: "200m"
+              memory: "400Mi"
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-capacity-and-stats-collector
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-capacity-and-stats-collector
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-capacity-and-stats-collector
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: capacity-and-stats-collector
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/capacity_and_stats_collector.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
       volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
+        - name: fdb-cluster-file
+          configMap:
+            name: simplyblock-fdb-cluster-config
+            items:
+              - key: cluster-file
+                path: fdb.cluster
+        - name: varlog
+          hostPath:
+            path: /var/log
+        - name: config
+          configMap:
+            name: simplyblock-fluent-bit-config
 ---
 apiVersion: apps/v1
 kind: Deployment
 metadata:
-  name: simplyblock-capacity-monitor
+  name: simplyblock-tasks
   namespace: {{ .Release.Namespace }}
 spec:
   replicas: 1
   selector:
     matchLabels:
-      app: simplyblock-capacity-monitor
+      app: simplyblock-tasks
   template:
     metadata:
       annotations:
@@ -565,1170 +369,168 @@ spec:
         reloader.stakater.com/auto: "true"
         reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
       labels:
-        app: simplyblock-capacity-monitor
+        app: simplyblock-tasks
     spec:
+      serviceAccountName: simplyblock-sa
       hostNetwork: true
       dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
       containers:
-        - name: capacity-monitor
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/cap_monitor.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-health-check
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-health-check
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-health-check
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: health-check
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/health_check_service.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-device-monitor
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-device-monitor
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-device-monitor
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: device-monitor
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/device_monitor.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-lvol-monitor
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-lvol-monitor
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-lvol-monitor
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: lvol-monitor
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/lvol_monitor.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-snapshot-monitor
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-snapshot-monitor
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-snapshot-monitor
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: snapshot-monitor
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/snapshot_monitor.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-cleanupfdb
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-cleanupfdb
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-cleanupfdb
-    spec:
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: cleanupfdb
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/workers/cleanup_foundationdb.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-            - name: LOG_DELETION_INTERVAL
-              value: "${LOG_DELETION_INTERVAL}"
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-restart
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-restart
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-restart
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-restart
-          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_runner_restart.py"]
-          env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
-          volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
-          resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-migration
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-migration
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-migration
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-migration
+        - name: tasks-node-add-runner
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_runner_node_add.py"]
           imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_runner_migration.py"]
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+            - name: LVOL_NVMF_PORT_START
+              value: "{{ .Values.ports.lvolNvmfPortStart }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-failed-migration
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-failed-migration
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-failed-migration
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-failed-migration
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: tasks-runner-restart
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_runner_restart.py"]
           imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_runner_failed_migration.py"]
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-cluster-status
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-cluster-status
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-cluster-status
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-cluster-status
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: tasks-runner-migration
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_runner_migration.py"]
           imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_cluster_status.py"]
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-new-device-migration
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-new-device-migration
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-new-device-migration
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-new-device-migration
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: tasks-runner-failed-migration
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_runner_failed_migration.py"]
           imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_runner_new_dev_migration.py"]
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-node-add-runner
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-node-add-runner
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-node-add-runner
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-node-addrunner
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: tasks-runner-cluster-status
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_cluster_status.py"]
           imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
-          command: ["python", "simplyblock_core/services/tasks_runner_node_add.py"]
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: LVOL_NVMF_PORT_START
-              value: "{{ .Values.ports.lvolNvmfPortStart }}"
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
+{{ toYaml .resources | nindent 12 }}
+{{- end }}

----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-port-allow
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-port-allow
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-port-allow
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
-        - name: tasks-runner-port-allow
+        - name: tasks-runner-new-device-migration
          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/tasks_runner_new_dev_migration.py"]
          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
+        - name: tasks-runner-port-allow
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
          command: ["python", "simplyblock_core/services/tasks_runner_port_allow.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-jc-comp-resume
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-jc-comp-resume
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-jc-comp-resume
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
         - name: tasks-runner-jc-comp-resume
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
          command: ["python", "simplyblock_core/services/tasks_runner_jc_comp.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: simplyblock-tasks-runner-sync-lvol-del
-  namespace: {{ .Release.Namespace }}
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: simplyblock-tasks-runner-sync-lvol-del
-  template:
-    metadata:
-      annotations:
-        log-collector/enabled: "true"
-        reloader.stakater.com/auto: "true"
-        reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config"
-      labels:
-        app: simplyblock-tasks-runner-sync-lvol-del
-    spec:
-      hostNetwork: true
-      dnsPolicy: ClusterFirstWithHostNet
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
+
         - name: tasks-runner-sync-lvol-del
           image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
-          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
          command: ["python", "simplyblock_core/services/tasks_runner_sync_lvol_del.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
           env:
-            - name: PROMETHEUS_URL
-              value: "{{ .Values.prometheus.simplyblock.prometheusURL }}"
-            - name: PROMETHEUS_PORT
-              value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}"
-            - name: SIMPLYBLOCK_LOG_LEVEL
-              valueFrom:
-                configMapKeyRef:
-                  name: simplyblock-config
-                  key: LOG_LEVEL
+{{ toYaml .env | nindent 12 }}
           volumeMounts:
-            - name: fdb-cluster-file
-              mountPath: /etc/foundationdb/fdb.cluster
-              subPath: fdb.cluster
+{{ toYaml .volumeMounts | nindent 12 }}
           resources:
-            requests:
-              cpu: "200m"
-              memory: "256Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
-      volumes:
-        - name: fdb-cluster-file
-          configMap:
-            name: simplyblock-fdb-cluster-config
-            items:
-              - key: cluster-file
-                path: fdb.cluster
----
+{{ toYaml .resources | nindent 12 }}
+{{- end }}

-apiVersion: apps/v1
-kind: DaemonSet
-metadata:
-  name: simplyblock-fluent-bit
-  namespace: {{ .Release.Namespace }}
-  labels:
-    app: simplyblock-fluent-bit
-spec:
-  selector:
-    matchLabels:
-      app: simplyblock-fluent-bit
-  template:
-    metadata:
-      labels:
-        app: simplyblock-fluent-bit
-    spec:
-      {{- if .Values.nodeSelector.create }}
-      nodeSelector:
-        {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }}
-      {{- end }}
-      {{- if .Values.tolerations.create }}
-      tolerations:
-        {{- range .Values.tolerations.list }}
-        - operator: {{ .operator | quote }}
-          {{- if .effect }}
-          effect: {{ .effect | quote }}
-          {{- end }}
-          {{- if .key }}
-          key: {{ .key | quote }}
-          {{- end }}
-          {{- if .value }}
-          value: {{ .value | quote }}
-          {{- end }}
-        {{- end }}
-      {{- end }}
-      containers:
+        - name: tasks-runner-snapshot-replication
+          image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}"
+          command: ["python", "simplyblock_core/services/snapshot_replication.py"]
+          imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}"
+{{- with (include "simplyblock.commonContainer" . | fromYaml) }}
+          env:
+{{ toYaml .env | nindent 12 }}
+          volumeMounts:
+{{ toYaml .volumeMounts | nindent 12 }}
+          resources:
+{{ toYaml .resources | nindent 12 }}
+{{- end }}
         - name: fluent-bit
           image: fluent/fluent-bit:1.8.11
-          securityContext:
-            privileged: true
           volumeMounts:
             - name: varlog
               mountPath: /var/log
-            - name: varlibdockercontainers
-              mountPath: /var/lib/docker/containers
-              readOnly: true
             - name: config
               mountPath: /fluent-bit/etc/
           resources:
             requests:
+              cpu: "100m"
+              memory: "200Mi"
+            limits:
               cpu: "200m"
               memory: "400Mi"
-            limits:
-              cpu: "400m"
-              memory: "1Gi"
+      volumes:
+        - name: fdb-cluster-file
+          configMap:
+            name: simplyblock-fdb-cluster-config
+            items:
+              - key: cluster-file
+                path: fdb.cluster
         - name: varlog
           hostPath:
             path: /var/log
-        - name: varlibdockercontainers
-          hostPath:
-            path: /var/lib/docker/containers
         - name: config
           configMap:
            name: simplyblock-fluent-bit-config
diff --git a/simplyblock_core/scripts/charts/templates/app_sa.yaml b/simplyblock_core/scripts/charts/templates/app_sa.yaml
index a5dee735b..7e46984d7 100644
--- a/simplyblock_core/scripts/charts/templates/app_sa.yaml
+++ b/simplyblock_core/scripts/charts/templates/app_sa.yaml
@@ -1,13 +1,13 @@
 apiVersion: v1
 kind: ServiceAccount
 metadata:
-  name: simplyblock-control-sa
+  name: simplyblock-sa
   namespace: {{ .Release.Namespace }}
 ---
 apiVersion: rbac.authorization.k8s.io/v1
 kind: ClusterRole
 metadata:
-  name: simplyblock-control-role
+  name: simplyblock-role
 rules:
   - apiGroups: [""]
     resources: ["configmaps"]
@@ -21,16 +21,23 @@ rules:
   - apiGroups: ["mongodbcommunity.mongodb.com"]
     resources: ["mongodbcommunity"]
     verbs: ["get", "list", "watch", "patch", "update"]
+  - apiGroups: ["simplyblock.simplyblock.io"]
+    resources:
["simplyblockpools/status", "simplyblocklvols/status", "simplyblockstorageclusters/status", "simplyblockstoragenodes/status", "simplyblockdevices/status", "simplyblocktasks/status"] + verbs: ["get", "patch", "update"] + - apiGroups: ["simplyblock.simplyblock.io"] + resources: ["namespaces","simplyblockpools", "simplyblocklvols", "simplyblockstorageclusters", "simplyblockstoragenodes", "simplyblockdevices", "simplyblocktasks"] + verbs: ["get","list" ,"patch", "update", "watch"] + --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: - name: simplyblock-control-binding + name: simplyblock-binding subjects: - kind: ServiceAccount - name: simplyblock-control-sa + name: simplyblock-sa namespace: {{ .Release.Namespace }} roleRef: kind: ClusterRole - name: simplyblock-control-role + name: simplyblock-role apiGroup: rbac.authorization.k8s.io diff --git a/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml b/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml index 721815fa5..aa645bff4 100644 --- a/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml +++ b/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml @@ -229,3 +229,4 @@ spec: path: /dev type: Directory name: dev-dir + \ No newline at end of file diff --git a/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml new file mode 100644 index 000000000..cca5e522d --- /dev/null +++ b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml @@ -0,0 +1,199 @@ +--- +apiVersion: apps/v1 +kind: Deployment +metadata: + name: simplyblock-manager + labels: + control-plane: simplyblock-manager + app: simplyblock-manager +spec: + selector: + matchLabels: + app: simplyblock-manager + replicas: 1 + template: + metadata: + labels: + control-plane: simplyblock-manager + app: simplyblock-manager + spec: + securityContext: + runAsUser: 65532 + runAsGroup: 65532 + 
fsGroup: 65532 + serviceAccountName: simplyblock-manager + containers: + - image: simplyblock/simplyblock-manager:snapshot_replication + imagePullPolicy: Always + name: manager + env: + - name: WATCH_NAMESPACE + valueFrom: + fieldRef: + fieldPath: metadata.namespace + resources: + limits: + cpu: 500m + memory: 256Mi + requests: + cpu: 500m + memory: 256Mi + securityContext: + readOnlyRootFilesystem: true + allowPrivilegeEscalation: false + privileged: false + terminationGracePeriodSeconds: 10 + +################# ROLE AND ROLE BINDING ############################## +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: simplyblock-manager + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: simplyblock-manager-clusterrole +rules: +- apiGroups: + - "" + resources: + - configmaps + - events + - persistentvolumeclaims + - pods + - pods/exec + - namespaces + - secrets + - services + - serviceaccounts + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - apps + resources: + - deployments + - daemonsets + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - batch + resources: + - jobs + verbs: + - create + - delete + - get + - list + - patch + - update + - watch +- apiGroups: + - "" + resources: + - nodes + verbs: + - get + - list + - watch + - update + - patch +- apiGroups: + - "rbac.authorization.k8s.io" + resources: + - roles + - clusterroles + verbs: + - create + - get + - list + - watch + - update + - patch +- apiGroups: + - "rbac.authorization.k8s.io" + resources: + - rolebindings + - clusterrolebindings + verbs: + - create + - get + - list + - watch + - update + - patch +- apiGroups: + - simplyblock.simplyblock.io + resources: + - simplyblockpools + - simplyblocklvols + - simplyblockstorageclusters + - simplyblockstoragenodes + - simplyblockdevices + - simplyblocktasks + - simplyblocksnapshotreplications + verbs: + - create + - delete + - 
get + - list + - patch + - update + - watch +- apiGroups: + - simplyblock.simplyblock.io + resources: + - simplyblockpools/finalizers + - simplyblocklvols/finalizers + - simplyblockstorageclusters/finalizers + - simplyblockstoragenodes/finalizers + - simplyblockdevices/finalizers + - simplyblocktasks/finalizers + - simplyblocksnapshotreplications/finalizers + verbs: + - update + - delete +- apiGroups: + - simplyblock.simplyblock.io + resources: + - simplyblockpools/status + - simplyblocklvols/status + - simplyblockstorageclusters/status + - simplyblockstoragenodes/status + - simplyblockdevices/status + - simplyblocktasks/status + - simplyblocksnapshotreplications/status + verbs: + - get + - patch + - update + +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + creationTimestamp: null + name: simplyblock-manager-clusterrolebinding +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: simplyblock-manager-clusterrole +subjects: +- kind: ServiceAccount + name: simplyblock-manager + namespace: {{ .Release.Namespace }} + \ No newline at end of file diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml new file mode 100644 index 000000000..4f2646365 --- /dev/null +++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml @@ -0,0 +1,149 @@ +{{- if .Values.simplyblock.cluster }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockStorageCluster +metadata: + name: {{ .Values.simplyblock.cluster.clusterName }} + namespace: {{ .Release.Namespace }} +spec: + clusterName: {{ .Values.simplyblock.cluster.clusterName }} + + {{- if .Values.simplyblock.cluster.mgmtIfc }} + mgmtIfc: {{ .Values.simplyblock.cluster.mgmtIfc }} + {{- end }} + + {{- if .Values.simplyblock.cluster.fabric }} + fabric: {{ .Values.simplyblock.cluster.fabric }} + {{- end }} + + {{- if hasKey 
.Values.simplyblock.cluster "isSingleNode" }} + isSingleNode: {{ .Values.simplyblock.cluster.isSingleNode }} + {{- end }} + + {{- if hasKey .Values.simplyblock.cluster "enableNodeAffinity" }} + enableNodeAffinity: {{ .Values.simplyblock.cluster.enableNodeAffinity }} + {{- end }} + + {{- if hasKey .Values.simplyblock.cluster "strictNodeAntiAffinity" }} + strictNodeAntiAffinity: {{ .Values.simplyblock.cluster.strictNodeAntiAffinity }} + {{- end }} + + {{- if .Values.simplyblock.cluster.capWarn }} + capWarn: {{ .Values.simplyblock.cluster.capWarn }} + {{- end }} + + {{- if .Values.simplyblock.cluster.capCrit }} + capCrit: {{ .Values.simplyblock.cluster.capCrit }} + {{- end }} + + {{- if .Values.simplyblock.cluster.provCapWarn }} + provCapWarn: {{ .Values.simplyblock.cluster.provCapWarn }} + {{- end }} + + {{- if .Values.simplyblock.cluster.provCapCrit }} + provCapCrit: {{ .Values.simplyblock.cluster.provCapCrit }} + {{- end }} +{{- end }} + +--- +{{- if .Values.simplyblock.pool }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockPool +metadata: + name: {{ .Values.simplyblock.pool.name }} + namespace: {{ .Release.Namespace }} +spec: + name: {{ .Values.simplyblock.pool.name }} + clusterName: {{ .Values.simplyblock.cluster.clusterName }} + + {{- if .Values.simplyblock.pool.capacityLimit }} + capacityLimit: {{ .Values.simplyblock.pool.capacityLimit | quote }} + {{- end }} +{{- end }} + +--- +{{- if .Values.simplyblock.lvol }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockLvol +metadata: + name: {{ .Values.simplyblock.lvol.name }} + namespace: {{ .Release.Namespace }} +spec: + clusterName: {{ .Values.simplyblock.cluster.clusterName }} + poolName: {{ .Values.simplyblock.pool.name }} +{{- end }} + +--- +{{- if .Values.simplyblock.storageNodes }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockStorageNode +metadata: + name: {{ .Values.simplyblock.storageNodes.name }} + namespace: {{ .Release.Namespace }} +spec: + 
clusterName: {{ .Values.simplyblock.cluster.clusterName }} + + {{- if .Values.simplyblock.storageNodes.clusterImage }} + clusterImage: {{ .Values.simplyblock.storageNodes.clusterImage }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.mgmtIfc }} + mgmtIfc: {{ .Values.simplyblock.storageNodes.mgmtIfc }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.maxLVol }} + maxLVol: {{ .Values.simplyblock.storageNodes.maxLVol }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.maxSize }} + maxSize: {{ .Values.simplyblock.storageNodes.maxSize | quote }} + {{- end }} + + {{- if hasKey .Values.simplyblock.storageNodes "partitions" }} + partitions: {{ .Values.simplyblock.storageNodes.partitions }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.corePercentage }} + corePercentage: {{ .Values.simplyblock.storageNodes.corePercentage }} + {{- end }} + + {{- if hasKey .Values.simplyblock.storageNodes "spdkDebug" }} + spdkDebug: {{ .Values.simplyblock.storageNodes.spdkDebug }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.spdkImage }} + spdkImage: {{ .Values.simplyblock.storageNodes.spdkImage }} + {{- end }} + + {{- if hasKey .Values.simplyblock.storageNodes "coreIsolation" }} + coreIsolation: {{ .Values.simplyblock.storageNodes.coreIsolation }} + {{- end }} + + {{- if .Values.simplyblock.storageNodes.workerNodes }} + workerNodes: + {{- range .Values.simplyblock.storageNodes.workerNodes }} + - {{ . 
}} + {{- end }} + {{- end }} +{{- end }} + +--- +{{- if .Values.simplyblock.devices }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockDevice +metadata: + name: {{ .Values.simplyblock.devices.name }} + namespace: {{ .Release.Namespace }} +spec: + clusterName: {{ .Values.simplyblock.cluster.clusterName }} +{{- end }} + +--- +{{- if .Values.simplyblock.tasks }} +apiVersion: simplyblock.simplyblock.io/v1alpha1 +kind: SimplyBlockTask +metadata: + name: {{ .Values.simplyblock.tasks.name }} + namespace: {{ .Release.Namespace }} +spec: + clusterName: {{ .Values.simplyblock.cluster.clusterName }} +{{- end }} diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index a0fd3fcb9..863a462c2 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -1,41 +1,26 @@ -graylog: - rootPasswordSha2: "b87c15a8ae4736d771ca60a7cc2014baaeab19b11c31f5fedef9421958a403c9" - passwordSecret: "is6SP2EdWg0NdmVGv6CEp5hRHNL7BKVMFem4t9pouMqDQnHwXMSomas1qcbKSt5yISr8eBHv4Y7Dbswhyz84Ut0TW6kqsiPs" -monitoring: - enabled: true +observability: + enabled: false secret: "sWbpOgba1bKnCfcPkVQi" - -log: deletionInterval: "3d" - retentionPeriod: "7d" level: "DEBUG" - maxNumberIndex: "3" - -grafana: - endpoint: "" - contactPoint: "https://hooks.slack.com/services/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" + graylog: + rootPasswordSha2: "b87c15a8ae4736d771ca60a7cc2014baaeab19b11c31f5fedef9421958a403c9" + passwordSecret: "is6SP2EdWg0NdmVGv6CEp5hRHNL7BKVMFem4t9pouMqDQnHwXMSomas1qcbKSt5yISr8eBHv4Y7Dbswhyz84Ut0TW6kqsiPs" + maxNumberIndex: "3" + retentionPeriod: "7d" + grafana: + endpoint: "" + contactPoint: "https://hooks.slack.com/services/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" image: simplyblock: - repository: "public.ecr.aws/simply-block/simplyblock" - tag: "R25.10-Hotfix" + repository: "simplyblock/simplyblock" + tag: "main-sfam-2359" pullPolicy: "Always" -tolerations: - 
create: false - list: - - operator: Exists - effect: - key: - value: -nodeSelector: - create: false - key: - value: - ports: - lvolNvmfPortStart: + lvolNvmfPortStart: 9100 storageclass: allowedTopologyZones: [] @@ -73,7 +58,7 @@ opensearch: persistence: enabled: true storageClass: local-hostpath - size: 10Gi + size: 20Gi resources: requests: @@ -99,9 +84,6 @@ opensearch: enabled: false prometheus: - simplyblock: - prometheusURL: simplyblock-prometheus - prometheusPORT: 9090 server: fullnameOverride: simplyblock-prometheus enabled: true @@ -195,7 +177,7 @@ prometheus: enabled: false ingress: - enabled: true + enabled: false ingressClassName: nginx useDNS: false host: "" @@ -219,3 +201,46 @@ ingress: - ingress topologyKey: "kubernetes.io/hostname" nodeSelector: {} + + +simplyblock: + cluster: + clusterName: simplyblock-cluster + mgmtIfc: eth0 + fabric: tcp + isSingleNode: false + enableNodeAffinity: false + strictNodeAntiAffinity: false + capWarn: 80 + capCrit: 90 + provCapWarn: 120 + provCapCrit: 150 + + pool: + name: simplyblock-pool + capacityLimit: 100Gi + + lvol: + name: simplyblock-lvol + + storageNodes: + name: simplyblock-node + clusterImage: simplyblock/simplyblock:main-sfam-2359 + mgmtIfc: eth0 + maxLVol: 10 + maxSize: 0 + partitions: 0 + corePercentage: 65 + spdkDebug: false + spdkImage: + coreIsolation: false + workerNodes: + - israel-storage-node-1 + - israel-storage-node-2 + - israel-storage-node-3 + + devices: + name: simplyblock-devices + + tasks: + name: simplyblock-task diff --git a/simplyblock_core/scripts/docker-compose-swarm.yml b/simplyblock_core/scripts/docker-compose-swarm.yml index 01ffd0ac3..4c1e50af5 100644 --- a/simplyblock_core/scripts/docker-compose-swarm.yml +++ b/simplyblock_core/scripts/docker-compose-swarm.yml @@ -406,6 +406,20 @@ services: environment: SIMPLYBLOCK_LOG_LEVEL: "$LOG_LEVEL" + SnapshotReplication: + <<: *service-base + image: $SIMPLYBLOCK_DOCKER_IMAGE + command: "python simplyblock_core/services/snapshot_replication.py" 
+ deploy: + placement: + constraints: [node.role == manager] + volumes: + - "/etc/foundationdb:/etc/foundationdb" + networks: + - hostnet + environment: + SIMPLYBLOCK_LOG_LEVEL: "$LOG_LEVEL" + networks: monitoring-net: external: true diff --git a/simplyblock_core/services/lvol_monitor.py b/simplyblock_core/services/lvol_monitor.py index 09e091eb9..f0d3eae01 100644 --- a/simplyblock_core/services/lvol_monitor.py +++ b/simplyblock_core/services/lvol_monitor.py @@ -163,7 +163,9 @@ def process_lvol_delete_finish(lvol): tasks_controller.add_lvol_sync_del_task(sec_node.cluster_id, sec_node.get_id(), lvol_bdev_name, primary_node.get_id()) lvol_events.lvol_delete(lvol) - lvol.remove(db.kv_store) + lvol = db.get_lvol_by_id(lvol.get_id()) + lvol.status = LVol.STATUS_DELETED + lvol.write_to_db() # check for full devices full_devs_ids = [] all_devs_ids = [] diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py new file mode 100644 index 000000000..61e52d460 --- /dev/null +++ b/simplyblock_core/services/snapshot_replication.py @@ -0,0 +1,356 @@ +# coding=utf-8 +import time +import uuid + +from simplyblock_core import constants, db_controller, utils +from simplyblock_core.controllers import lvol_controller, snapshot_events, snapshot_controller +from simplyblock_core.models.job_schedule import JobSchedule +from simplyblock_core.models.pool import Pool +from simplyblock_core.models.snapshot import SnapShot +from simplyblock_core.models.storage_node import StorageNode + +logger = utils.get_logger(__name__) +utils.init_sentry_sdk(__name__) +# get DB controller +db = db_controller.DBController() + + +def process_snap_replicate_start(task, snapshot): + # 1 create lvol on remote node + logger.info("Starting snapshot replication task") + snode = db.get_storage_node_by_id(snapshot.lvol.node_id) + replicate_to_source = task.function_params["replicate_to_source"] + if "remote_lvol_id" not in task.function_params or not 
task.function_params["remote_lvol_id"]: + if replicate_to_source: + org_snap = db.get_snapshot_by_id(snapshot.source_replicated_snap_uuid) + remote_node_uuid = db.get_storage_node_by_id(task.node_id) + remote_pool_uuid = org_snap.lvol.pool_uuid + else: # replicate to target + remote_node_uuid = db.get_storage_node_by_id(snapshot.lvol.replication_node_id) + cluster = db.get_cluster_by_id(remote_node_uuid.cluster_id) + remote_pool_uuid = None + if cluster.snapshot_replication_target_pool: + remote_pool_uuid = cluster.snapshot_replication_target_pool + else: + for bool in db.get_pools(remote_node_uuid.cluster_id): + if bool.status == Pool.STATUS_ACTIVE: + remote_pool_uuid = bool.uuid + break + if not remote_pool_uuid: + logger.error(f"Unable to find pool on remote cluster: {remote_node_uuid.cluster_id}") + return + + lv_id, err = lvol_controller.add_lvol_ha( + f"REP_{snapshot.snap_name}", snapshot.size, remote_node_uuid.get_id(), snapshot.lvol.ha_type, + remote_pool_uuid) + if lv_id: + task.function_params["remote_lvol_id"] = lv_id + task.write_to_db() + else: + logger.error(err) + task.function_result = "Error creating remote lvol" + task.write_to_db() + return + + remote_lv = db.get_lvol_by_id(task.function_params["remote_lvol_id"]) + remote_lv_node = db.get_storage_node_by_id(remote_lv.node_id) + if remote_lv_node.status != StorageNode.STATUS_ONLINE: + task.function_result = "Target node is not online, retrying" + task.status = JobSchedule.STATUS_SUSPENDED + task.retry += 1 + task.write_to_db() + return + + # 2 connect to it + ret = snode.rpc_client().bdev_nvme_controller_list(remote_lv.top_bdev) + if not ret: + remote_snode = db.get_storage_node_by_id(remote_lv.node_id) + for nic in remote_snode.data_nics: + ip = nic.ip4_address + ret = snode.rpc_client().bdev_nvme_attach_controller( + remote_lv.top_bdev, remote_lv.nqn, ip, remote_lv.subsys_port, nic.trtype) + if not ret: + msg = "controller attach failed" + logger.error(msg) + raise RuntimeError(msg) + bdev_name 
= ret[0] + if not bdev_name: + msg = "Bdev name not returned from controller attach" + logger.error(msg) + raise RuntimeError(msg) + bdev_found = False + for i in range(5): + ret = snode.rpc_client().get_bdevs(bdev_name) + if ret: + bdev_found = True + break + else: + time.sleep(1) + + if not bdev_found: + logger.error("lvol Bdev not found after 5 attempts") + raise RuntimeError(f"Failed to connect to lvol: {remote_lv.get_id()}") + + offset = 0 + if "offset" in task.function_params and task.function_params["offset"]: + offset = task.function_params["offset"] + # 3 start replication + snode.rpc_client().bdev_lvol_transfer( + lvol_name=snapshot.snap_bdev, + offset=offset, + cluster_batch=16, + gateway=f"{remote_lv.top_bdev}n1", + operation="replicate" + ) + task.status = JobSchedule.STATUS_RUNNING + task.function_params["start_time"] = int(time.time()) + task.write_to_db() + + if snapshot.status != SnapShot.STATUS_IN_REPLICATION: + snapshot.status = SnapShot.STATUS_IN_REPLICATION + snapshot.write_to_db() + + +def delete_last_snapshot_if_needed(this_task, lvol): + snaps = [] + for task in db.get_job_tasks(this_task.cluster_id): + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + if task.get_id() == this_task.get_id(): + continue + logger.debug(task) + try: + snap = db.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: + continue + if snap.lvol.get_id() != lvol.get_id(): + continue + snaps.append(snap) + + if snaps: + snaps = sorted(snaps, key=lambda x: x.created_at) + snapshot = snaps[-1] + logger.info("Deleting snapshot: %s", snapshot.get_id()) + ret = snapshot_controller.delete(snapshot) + logger.debug(ret) + + +def process_snap_replicate_finish(task, snapshot): + + # detach remote lvol + remote_lv = db.get_lvol_by_id(task.function_params["remote_lvol_id"]) + snode = db.get_storage_node_by_id(snapshot.lvol.node_id) + snode.rpc_client().bdev_nvme_detach_controller(remote_lv.top_bdev) + remote_snode = 
db.get_storage_node_by_id(remote_lv.node_id) + replicate_to_source = task.function_params["replicate_to_source"] + if "replicate_as_snap_instance" in task.function_params: + replicate_as_snap_instance = task.function_params["replicate_as_snap_instance"] + else: + replicate_as_snap_instance = False + target_prev_snap = None + if replicate_to_source: + org_snap = db.get_snapshot_by_id(snapshot.snap_ref_id) + try: + target_prev_snap = db.get_snapshot_by_id(org_snap.source_replicated_snap_uuid) + except KeyError as e: + logger.error(e) + else: + if snapshot.snap_ref_id: + try: + prev_snap = db.get_snapshot_by_id(snapshot.snap_ref_id) + for sn_inst in prev_snap.instances: + if sn_inst.lvol.node_id == remote_snode.get_id(): + target_prev_snap = sn_inst + break + except KeyError as e: + logger.error(e) + + # chain snaps on primary + if target_prev_snap: + logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}") + ret = remote_snode.rpc_client().bdev_lvol_add_clone(target_prev_snap.snap_bdev, remote_lv.top_bdev) + if not ret: + logger.error("Failed to chain replicated snapshot on primary node") + return False + + # convert to snapshot on primary + ret = remote_snode.rpc_client().bdev_lvol_convert(remote_lv.top_bdev) + if not ret: + logger.error("Failed to convert to snapshot on primary node") + return False + + # chain snaps on secondary + sec_node = db.get_storage_node_by_id(remote_snode.secondary_node_id) + if sec_node.status == StorageNode.STATUS_ONLINE: + if target_prev_snap: + logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}") + ret = sec_node.rpc_client().bdev_lvol_add_clone(target_prev_snap.snap_bdev, remote_lv.top_bdev) + if not ret: + logger.error("Failed to chain replicated snapshot on secondary node") + return False + + # convert to snapshot on secondary + ret = sec_node.rpc_client().bdev_lvol_convert(remote_lv.top_bdev) + if not ret: + logger.error("Failed to convert 
to snapshot on secondary node") + return False + + new_snapshot_uuid = str(uuid.uuid4()) + + new_snapshot = SnapShot() + new_snapshot.uuid = new_snapshot_uuid + new_snapshot.cluster_id = remote_snode.cluster_id + new_snapshot.lvol = remote_lv + new_snapshot.pool_uuid = remote_lv.pool_uuid + new_snapshot.snap_bdev = remote_lv.top_bdev + new_snapshot.snap_uuid = remote_lv.lvol_uuid + new_snapshot.size = snapshot.size + new_snapshot.used_size = snapshot.used_size + new_snapshot.snap_name = snapshot.snap_name + new_snapshot.blobid = remote_lv.blobid + new_snapshot.created_at = int(time.time()) + new_snapshot.status = SnapShot.STATUS_ONLINE + snapshot.instances.append(new_snapshot) + if not replicate_as_snap_instance: + if replicate_to_source: + new_snapshot.target_replicated_snap_uuid = snapshot.uuid + snapshot.source_replicated_snap_uuid = new_snapshot_uuid + else: + snapshot.target_replicated_snap_uuid = new_snapshot_uuid + new_snapshot.source_replicated_snap_uuid = snapshot.uuid + + if target_prev_snap: + new_snapshot.prev_snap_uuid = target_prev_snap.get_id() + target_prev_snap.next_snap_uuid = new_snapshot_uuid + target_prev_snap.write_to_db() + + new_snapshot.write_to_db() + + if snapshot.status == SnapShot.STATUS_IN_REPLICATION: + snapshot.status = SnapShot.STATUS_ONLINE + + snapshot.write_to_db() + + # delete lvol object + remote_lv.bdev_stack = [] + remote_lv.write_to_db() + lvol_controller.delete_lvol(remote_lv.get_id(), True) + remote_lv.remove(db.kv_store) + snapshot_events.replication_task_finished(snapshot) + delete_last_snapshot_if_needed(task, snapshot.lvol) + return new_snapshot_uuid + + +def task_runner(task: JobSchedule): + snapshot = db.get_snapshot_by_id(task.function_params["snapshot_id"]) + if not snapshot: + task.function_result = "snapshot not found" + task.status = JobSchedule.STATUS_DONE + task.write_to_db(db.kv_store) + return True + + try: + snode = db.get_storage_node_by_id(snapshot.lvol.node_id) + except KeyError: + task.function_result = 
"node not found" + task.status = JobSchedule.STATUS_DONE + task.write_to_db(db.kv_store) + return True + + if snode.status != StorageNode.STATUS_ONLINE: + task.function_result = "node is not online, retrying" + task.status = JobSchedule.STATUS_SUSPENDED + task.retry += 1 + task.write_to_db(db.kv_store) + return False + + if task.retry >= task.max_retry or task.canceled is True: + task.function_result = "max retry reached" + if task.canceled is True: + task.function_result = "task cancelled" + + task.status = JobSchedule.STATUS_DONE + task.write_to_db(db.kv_store) + + if snapshot.status != SnapShot.STATUS_ONLINE: + snapshot.status = SnapShot.STATUS_ONLINE + snapshot.write_to_db() + + remote_lv = db.get_lvol_by_id(task.function_params["remote_lvol_id"]) + snode.rpc_client().bdev_nvme_detach_controller(remote_lv.top_bdev) + lvol_controller.delete_lvol(remote_lv.get_id(), True) + + return True + + + if task.status in [JobSchedule.STATUS_NEW, JobSchedule.STATUS_SUSPENDED]: + process_snap_replicate_start(task, snapshot) + + elif task.status == JobSchedule.STATUS_RUNNING: + snode = db.get_storage_node_by_id(snapshot.lvol.node_id) + ret = snode.rpc_client().bdev_lvol_transfer_stat(snapshot.snap_bdev) + if not ret: + logger.error("Failed to get transfer stat") + return False + status = ret["transfer_state"] + offset = ret["offset"] + if status == "No process": + task.function_result = f"Status: {status}, offset:{offset}, retrying" + task.status = JobSchedule.STATUS_NEW + task.retry += 1 + task.write_to_db() + return False + if status == "In progress": + task.function_result = f"Status: {status}, offset:{offset}" + task.function_params["offset"] = offset + task.write_to_db() + return True + if status == "Failed": + task.function_result = f"Status: {status}, offset:{offset}, retrying" + task.status = JobSchedule.STATUS_SUSPENDED + task.retry += 1 + task.write_to_db() + return False + if status == "Done": + new_snapshot_uuid = process_snap_replicate_finish(task, snapshot) + if 
new_snapshot_uuid: + task.function_result = new_snapshot_uuid + task.status = JobSchedule.STATUS_DONE + task.function_params["end_time"] = int(time.time()) + task.write_to_db() + else: + task.function_result = "complete repl failed, retrying" + task.status = JobSchedule.STATUS_SUSPENDED + task.retry += 1 + task.write_to_db() + return True + + +logger.info("Starting Tasks runner...") +while True: + clusters = db.get_clusters() + if not clusters: + logger.error("No clusters found!") + else: + for cl in clusters: + tasks = db.get_job_tasks(cl.get_id(), reverse=False) + for task in tasks: + delay_seconds = constants.TASK_EXEC_INTERVAL_SEC + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + if task.status in [JobSchedule.STATUS_NEW, JobSchedule.STATUS_SUSPENDED]: + active_task = False + for t in db.get_job_tasks(task.cluster_id): + if t.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION and t.function_params["snapshot_id"] == task.function_params['snapshot_id']: + if t.status == JobSchedule.STATUS_RUNNING and t.canceled is False: + active_task = True + break + if active_task: + logger.info("replication task found for same snapshot, retry") + continue + if task.status != JobSchedule.STATUS_DONE: + # get new task object because it could be changed from cancel task + task = db.get_task_by_id(task.uuid) + res = task_runner(task) + if not res: + time.sleep(3) + + time.sleep(constants.TASK_EXEC_INTERVAL_SEC) diff --git a/simplyblock_core/services/storage_node_monitor.py b/simplyblock_core/services/storage_node_monitor.py index 5e512ce3e..563ad60ca 100644 --- a/simplyblock_core/services/storage_node_monitor.py +++ b/simplyblock_core/services/storage_node_monitor.py @@ -3,6 +3,7 @@ import time from datetime import datetime, timezone + from simplyblock_core import constants, db_controller, cluster_ops, storage_node_ops, utils from simplyblock_core.controllers import health_controller, device_controller, tasks_controller, storage_events from 
simplyblock_core.models.cluster import Cluster @@ -13,6 +14,7 @@ logger = utils.get_logger(__name__) + # get DB controller db = db_controller.DBController() @@ -74,7 +76,7 @@ def get_next_cluster_status(cluster_id): # check for jm rep tasks: if node.rpc_client().bdev_lvol_get_lvstores(node.lvstore): try: - ret = node.rpc_client().jc_get_jm_status(node.jm_vuid) + ret = node.rpc_client(timeout=5).jc_get_jm_status(node.jm_vuid) for jm in ret: if ret[jm] is False: # jm is not ready (has active replication task) jm_replication_tasks = True diff --git a/simplyblock_core/snode_client.py b/simplyblock_core/snode_client.py index 41248c2a6..bf4d10e14 100644 --- a/simplyblock_core/snode_client.py +++ b/simplyblock_core/snode_client.py @@ -201,6 +201,14 @@ def ifc_is_roce(self, nic): def ifc_is_tcp(self, nic): params = {"nic": nic} return self._request("GET", "ifc_is_tcp", params) + def nvme_connect(self, ip, port, nqn): + params = {"ip": ip, "port": port, "nqn": nqn} + return self._request("POST", "nvme_connect", params) + + def disconnect_nqn(self, nqn): + params = {"nqn": nqn} + return self._request("POST", "disconnect_nqn", params) + def ping_ip(self, ip_address, ifname): params = { diff --git a/simplyblock_core/storage_node_ops.py b/simplyblock_core/storage_node_ops.py index a52187b6b..aeccb3941 100644 --- a/simplyblock_core/storage_node_ops.py +++ b/simplyblock_core/storage_node_ops.py @@ -1316,6 +1316,8 @@ def add_node(cluster_id, node_addr, iface_name, data_nics_list, jc_singleton_core = new_distribution.get("jc_singleton_core") app_thread_core = new_distribution.get("app_thread_core") jm_cpu_core = new_distribution.get("jm_cpu_core") + lvol_poller_core = new_distribution.get("lvol_poller_core") + lvol_poller_mask = utils.generate_mask(lvol_poller_core) else: poller_cpu_cores = node_config.get("distribution").get("poller_cpu_cores") alceml_cpu_cores = node_config.get("distribution").get("alceml_cpu_cores") @@ -1324,6 +1326,9 @@ def add_node(cluster_id, node_addr, 
iface_name, data_nics_list, jc_singleton_core = node_config.get("distribution").get("jc_singleton_core") app_thread_core = node_config.get("distribution").get("app_thread_core") jm_cpu_core = node_config.get("distribution").get("jm_cpu_core") + lvol_poller_core = node_config.get("distribution").get("lvol_poller_core") + lvol_poller_mask = utils.generate_mask(lvol_poller_core) + number_of_distribs = node_config.get("number_of_distribs") pollers_mask = utils.generate_mask(poller_cpu_cores) @@ -1429,6 +1434,7 @@ def add_node(cluster_id, node_addr, iface_name, data_nics_list, snode.write_to_db(kv_store) snode.app_thread_mask = app_thread_mask or "" snode.pollers_mask = pollers_mask or "" + snode.lvol_poller_mask = lvol_poller_mask or "" snode.jm_cpu_mask = jm_cpu_mask snode.alceml_cpu_index = alceml_cpu_index snode.alceml_worker_cpu_index = alceml_worker_cpu_index @@ -1506,6 +1512,12 @@ def add_node(cluster_id, node_addr, iface_name, data_nics_list, rpc_client.log_set_print_level("DEBUG") + if snode.lvol_poller_mask: + ret = rpc_client.bdev_lvol_create_poller_group(snode.lvol_poller_mask) + if not ret: + logger.error("Failed to set pollers mask") + return False + # 5- set app_thread cpu mask if snode.app_thread_mask: ret = rpc_client.thread_get_stats() @@ -2110,6 +2122,12 @@ def restart_storage_node( rpc_client.log_set_print_level("DEBUG") + if snode.lvol_poller_mask: + ret = rpc_client.bdev_lvol_create_poller_group(snode.lvol_poller_mask) + if not ret: + logger.error("Failed to set pollers mask") + return False + # 5- set app_thread cpu mask if snode.app_thread_mask: ret = rpc_client.thread_get_stats() @@ -2623,7 +2641,8 @@ def shutdown_storage_node(node_id, force=False): if force is False: return False for task in tasks: - if task.function_name != JobSchedule.FN_NODE_RESTART: + if task.function_name not in [ + JobSchedule.FN_NODE_RESTART, JobSchedule.FN_SNAPSHOT_REPLICATION, JobSchedule.FN_LVOL_SYNC_DEL]: tasks_controller.cancel_task(task.uuid) logger.info("Shutting 
down node") diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py index a96630203..032b5cd36 100644 --- a/simplyblock_core/utils/__init__.py +++ b/simplyblock_core/utils/__init__.py @@ -447,23 +447,26 @@ def reserve_n(count): assigned = {} if (len(vcpu_list) < 12): - vcpu = reserve_n(4) + vcpu = reserve_n(5) assigned["app_thread_core"] = vcpu[0:1] assigned["jm_cpu_core"] = vcpu[1:2] assigned["jc_singleton_core"] = vcpu[2:3] assigned["alceml_cpu_cores"] = vcpu[3:4] + assigned["lvol_poller_core"] = vcpu[4:5] elif (len(vcpu_list) < 22): - vcpu = reserve_n(5) + vcpu = reserve_n(6) assigned["app_thread_core"] = vcpu[0:1] assigned["jm_cpu_core"] = vcpu[1:2] assigned["jc_singleton_core"] = vcpu[2:3] assigned["alceml_cpu_cores"] = vcpu[3:5] + assigned["lvol_poller_core"] = vcpu[5:6] else: - vcpus = reserve_n(3+alceml_count) + vcpus = reserve_n(4+alceml_count) assigned["app_thread_core"] = vcpus[0:1] assigned["jm_cpu_core"] = vcpus[1:2] assigned["jc_singleton_core"] = vcpus[2:3] - assigned["alceml_cpu_cores"] = vcpus[3:3+alceml_count] + assigned["lvol_poller_core"] = vcpus[3:4] + assigned["alceml_cpu_cores"] = vcpus[4:4+alceml_count] dp = int(len(remaining) / 2) if 17 > dp >= 12: poller_n = len(remaining) - 12 @@ -495,7 +498,8 @@ def reserve_n(count): assigned.get("alceml_cpu_cores", []), assigned.get("alceml_worker_cpu_cores", []), assigned.get("distrib_cpu_cores", []), - assigned.get("jc_singleton_core", []) + assigned.get("jc_singleton_core", []), + assigned.get("lvol_poller_core", []), ) @@ -746,7 +750,10 @@ def nearest_upper_power_of_2(n): def strfdelta(tdelta): - remainder = int(tdelta.total_seconds()) + return strfdelta_seconds(int(tdelta.total_seconds())) + + +def strfdelta_seconds(remainder: int) -> str: possible_fields = ('W', 'D', 'H', 'M', 'S') constants = {'W': 604800, 'D': 86400, 'H': 3600, 'M': 60, 'S': 1} values = {} @@ -1666,7 +1673,8 @@ def regenerate_config(new_config, old_config, force=False): "alceml_cpu_cores": 
get_core_indexes(core_to_index, distribution[3]), "alceml_worker_cpu_cores": get_core_indexes(core_to_index, distribution[4]), "distrib_cpu_cores": get_core_indexes(core_to_index, distribution[5]), - "jc_singleton_core": get_core_indexes(core_to_index, distribution[6])} + "jc_singleton_core": get_core_indexes(core_to_index, distribution[6]), + "lvol_poller_core": get_core_indexes(core_to_index, distribution[7])} isolated_cores = old_config["nodes"][i]["isolated"] number_of_distribs = 2 @@ -1819,7 +1827,8 @@ def generate_configs(max_lvol, max_prov, sockets_to_use, nodes_per_socket, pci_a # "alceml_worker_cpu_cores": get_core_indexes(core_group["core_to_index"], # core_group["distribution"][4]), "distrib_cpu_cores": get_core_indexes(core_group["core_to_index"], core_group["distribution"][5]), - "jc_singleton_core": get_core_indexes(core_group["core_to_index"], core_group["distribution"][6]) + "jc_singleton_core": get_core_indexes(core_group["core_to_index"], core_group["distribution"][6]), + "lvol_poller_core": get_core_indexes(core_group["core_to_index"], core_group["distribution"][7]) }, "ssd_pcis": [], "nic_ports": system_info[nid]["nics"] @@ -2216,6 +2225,286 @@ def load_kube_config_with_fallback(): config.load_kube_config() +def patch_cr_status( + *, + group: str, + version: str, + plural: str, + namespace: str, + name: str, + status_patch: dict, +): + """ + Patch the status subresource of a Custom Resource. 
+ + status_patch example: + {"&lt;field&gt;": "&lt;value&gt;"} + """ + + load_kube_config_with_fallback() + + api = client.CustomObjectsApi() + + body = { + "status": status_patch + } + + try: + api.patch_namespaced_custom_object_status( + group=group, + version=version, + namespace=namespace, + plural=plural, + name=name, + body=body, + ) + except ApiException as e: + logger.error( + f"Failed to patch status for {name}: {e.reason} {e.body}" + ) + + +def patch_cr_node_status( + *, + group: str, + version: str, + plural: str, + namespace: str, + name: str, + node_uuid: str, + node_mgmt_ip: str, + updates: Optional[Dict[str, Any]] = None, + remove: bool = False, +): + """ + Patch status.nodes[*] fields for a specific node identified by UUID. + + Operations: + - Update a node (by uuid or mgmtIp) + - Remove a node (by uuid or mgmtIp) + + updates example: + {"health": "true"} + {"status": "offline"} + {"capacity": {"sizeUsed": 1234}} + """ + load_kube_config_with_fallback() + api = client.CustomObjectsApi() + + try: + cr = api.get_namespaced_custom_object( + group=group, + version=version, + namespace=namespace, + plural=plural, + name=name, + ) + + status_nodes = cr.get("status", {}).get("nodes", []) + if not status_nodes: + raise RuntimeError("CR has no status.nodes") + + spec_worker_nodes = cr.get("spec", {}).get("workerNodes", []) + + found = False + new_status_nodes = [] + removed_hostname = None + + for node in status_nodes: + match = ( + node.get("uuid") == node_uuid or + node.get("mgmtIp") == node_mgmt_ip + ) + + if match: + found = True + removed_hostname = node.get("hostname") + + if remove: + continue + + if updates: + node.update(updates) + + new_status_nodes.append(node) + + if not found: + raise RuntimeError( + f"Node not found (uuid={node_uuid}, mgmtIp={node_mgmt_ip})" + ) + + if remove and removed_hostname: + new_worker_nodes = [ + n for n in spec_worker_nodes if n != removed_hostname + ] + + api.patch_namespaced_custom_object( + group=group, + version=version, +
namespace=namespace, + plural=plural, + name=name, + body={ + "spec": { + "workerNodes": new_worker_nodes + } + }, + ) + + api.patch_namespaced_custom_object_status( + group=group, + version=version, + namespace=namespace, + plural=plural, + name=name, + body={ + "status": { + "nodes": new_status_nodes + } + }, + ) + + except ApiException as e: + logger.error( + f"Failed to patch node for {name}: {e.reason} {e.body}" + ) + + +def patch_cr_lvol_status( + *, + group: str, + version: str, + plural: str, + namespace: str, + name: str, + lvol_uuid: Optional[str] = None, + updates: Optional[Dict[str, Any]] = None, + remove: bool = False, + add: Optional[Dict[str, Any]] = None, +): + """ + Patch status.lvols[*] for an LVOL CustomResource. + + Operations: + - Update an existing LVOL (by uuid) + - Remove an LVOL (by uuid) + - Add a new LVOL entry + + Parameters: + lvol_uuid: + UUID of the lvol entry to update or remove + + updates: + Dict of fields to update on the matched lvol + Example: + {"status": "offline", "health": False} + + remove: + If True, remove the lvol identified by lvol_uuid + + add: + Full lvol dict to append to status.lvols + """ + + load_kube_config_with_fallback() + api = client.CustomObjectsApi() + + now = datetime.now(timezone.utc).isoformat() + + try: + cr = api.get_namespaced_custom_object( + group=group, + version=version, + namespace=namespace, + plural=plural, + name=name, + ) + + status = cr.get("status", {}) + lvols = status.get("lvols", []) or [] + + changed = False + + # ---- ADD ---- + if add is not None: + add = dict(add) + add.setdefault("createDt", now) + add["updateDt"] = now + lvols.append(add) + changed = True + + # ---- UPDATE / REMOVE ---- + if lvol_uuid: + found = False + new_lvols = [] + + for lvol in lvols: + if lvol.get("uuid") == lvol_uuid: + found = True + + if remove: + changed = True + continue + + if updates: + updated_lvol = dict(lvol) + updated_lvol.update(updates) + updated_lvol["updateDt"] = now + 
new_lvols.append(updated_lvol) + changed = True + continue + + new_lvols.append(lvol) + + if not found: + if remove: + logger.warning( + "Skipping LVOL removal from CR status because LVOL was not found", + extra={ + "cr_name": name, + "namespace": namespace, + "lvol_uuid": lvol_uuid, + }, + ) + return + + if updates: + logger.warning( + "Skipping LVOL status update because LVOL was not found", + extra={ + "cr_name": name, + "namespace": namespace, + "lvol_uuid": lvol_uuid, + "updates": updates, + }, + ) + return + + lvols = new_lvols + + if not changed: + return + + body = { + "status": { + "lvols": lvols + } + } + + api.patch_namespaced_custom_object_status( + group=group, + version=version, + namespace=namespace, + plural=plural, + name=name, + body=body, + ) + + except ApiException as e: + logger.error( + f"Failed to patch lvol status for {name}: {e.reason} {e.body}" + ) + def get_node_name_by_ip(target_ip: str) -> str: load_kube_config_with_fallback() v1 = client.CoreV1Api() @@ -2726,7 +3015,6 @@ def clean_devices(config_path, format, force): except json.JSONDecodeError as e: logger.error(f"Error decoding JSON: {e}") - def create_rpc_socket_mount(): try: diff --git a/simplyblock_web/api/internal/storage_node/docker.py b/simplyblock_web/api/internal/storage_node/docker.py index cc0766af1..f93cb925b 100644 --- a/simplyblock_web/api/internal/storage_node/docker.py +++ b/simplyblock_web/api/internal/storage_node/docker.py @@ -749,6 +749,47 @@ def is_alive(): return utils.get_response(True) +@api.post('/nvme_connect', + summary='Connect NVMe-oF target', + responses={ + 200: {'content': {'application/json': {'schema': utils.response_schema({ + 'type': 'boolean', + })}}, + }, +}) +def connect_to_nvme(body: utils.NVMEConnectParams): + """Connect to the indicated NVMe-oF target. 
+ """ + st = f"nvme connect --transport=tcp --traddr={body.ip} --trsvcid={body.port} --nqn={body.nqn}" + logger.debug(st) + out, err, ret_code = shell_utils.run_command(st) + logger.debug(ret_code) + logger.debug(out) + logger.debug(err) + if ret_code == 0: + return utils.get_response(True) + else: + return utils.get_response(ret_code, error=err) + + +@api.post('/disconnect_nqn', + summary='Disconnect NVMe-oF device by NQN', + responses={ + 200: {'content': {'application/json': {'schema': utils.response_schema({ + 'type': 'integer', + })}}}, +}) +def disconnect_nqn(body: utils.DisconnectParams): + """Disconnect from indicated NVMe-oF target + """ + st = f"nvme disconnect --nqn={body.nqn}" + out, err, ret_code = shell_utils.run_command(st) + logger.debug(ret_code) + logger.debug(out) + logger.debug(err) + return utils.get_response(ret_code) + + class PingQuery(BaseModel): ip: str ifname: str diff --git a/simplyblock_web/api/v1/cluster.py b/simplyblock_web/api/v1/cluster.py index 043476688..532278b8d 100644 --- a/simplyblock_web/api/v1/cluster.py +++ b/simplyblock_web/api/v1/cluster.py @@ -229,6 +229,23 @@ def cluster_activate(uuid): # FIXME: Any failure within the thread are not handled return utils.get_response(True), 202 +@bp.route('/cluster/addreplication/', methods=['PUT']) +def cluster_add_replication(uuid): + req_data = request.get_json() + target_cluster_uuid = req_data.get("target_cluster_uuid", None) + replication_timeout = req_data.get("replication_timeout", 0) + target_pool_uuid = req_data.get("target_pool_uuid", None) + + try: + db.get_cluster_by_id(uuid) + except KeyError: + return utils.get_response_error(f"Cluster not found: {uuid}", 404) + + cluster_ops.add_replication(source_cl_id=uuid, target_cl_id=target_cluster_uuid, + timeout=replication_timeout, target_pool=target_pool_uuid) + return utils.get_response(True), 202 + + @bp.route('/cluster/allstats//history/', methods=['GET']) @bp.route('/cluster/allstats/', methods=['GET'], defaults={'history': 
None}) diff --git a/simplyblock_web/api/v1/lvol.py b/simplyblock_web/api/v1/lvol.py index 5506070d2..d7218085b 100644 --- a/simplyblock_web/api/v1/lvol.py +++ b/simplyblock_web/api/v1/lvol.py @@ -158,6 +158,7 @@ def add_lvol(): ndcs = utils.get_value_or_default(cl_data, "ndcs", 0) npcs = utils.get_value_or_default(cl_data, "npcs", 0) fabric = utils.get_value_or_default(cl_data, "fabric", "tcp") + do_replicate = utils.get_value_or_default(cl_data, "do_replicate", False) ret, error = lvol_controller.add_lvol_ha( name=name, @@ -186,7 +187,8 @@ def add_lvol(): max_namespace_per_subsys=max_namespace_per_subsys, ndcs=ndcs, npcs=npcs, - fabric=fabric + fabric=fabric, + do_replicate=do_replicate ) return utils.get_response(ret, error, http_code=400) @@ -308,6 +310,31 @@ def inflate_lvol(uuid): ret = lvol_controller.inflate_lvol(uuid) return utils.get_response(ret) +@bp.route('/lvol/replication_start/<uuid>', methods=['PUT']) +def replication_start(uuid): + try: + db.get_lvol_by_id(uuid) + except KeyError as e: + return utils.get_response_error(str(e), 404) + + ret = lvol_controller.replication_trigger(uuid) + return utils.get_response(ret) + +@bp.route('/lvol/replication_stop/<uuid>', methods=['PUT']) +def replication_stop(uuid): + try: + db.get_lvol_by_id(uuid) + except KeyError as e: + return utils.get_response_error(str(e), 404) + + ret = lvol_controller.replication_stop(uuid) + return utils.get_response(ret) + +@bp.route('/lvol/clone/<uuid>/<clone_name>', methods=['GET']) +def clone_get(uuid, clone_name): + ret = lvol_controller.clone_lvol(uuid, clone_name) + return utils.get_response(ret) + @bp.route('/lvol/clone', methods=['POST']) def clone(): cl_data = request.get_json() diff --git a/simplyblock_web/api/v2/cluster.py b/simplyblock_web/api/v2/cluster.py index e6179ffad..c8b7b047d 100644 --- a/simplyblock_web/api/v2/cluster.py +++ b/simplyblock_web/api/v2/cluster.py @@ -17,6 +17,11 @@ db = DBController() +class _ReplicationParams(BaseModel): + snapshot_replication_target_cluster: str +
snapshot_replication_timeout: int = 0 + target_pool: Optional[str] = None + class _UpdateParams(BaseModel): management_image: Optional[str] spdk_image: Optional[str] @@ -155,6 +160,23 @@ def activate(cluster: Cluster) -> Response: ).start() return Response(status_code=202) # FIXME: Provide URL for checking task status +@instance_api.post('/addreplication', name='clusters:addreplication', status_code=202, responses={202: {"content": None}}) +def cluster_add_replication(cluster: Cluster, parameters: _ReplicationParams) -> Response: + cluster_ops.add_replication( + source_cl_id=cluster.get_id(), + target_cl_id=parameters.snapshot_replication_target_cluster, + timeout=parameters.snapshot_replication_timeout, + target_pool=parameters.target_pool + ) + return Response(status_code=202) + +@instance_api.post('/expand', name='clusters:expand', status_code=202, responses={202: {"content": None}}) +def expand(cluster: Cluster) -> Response: + Thread( + target=cluster_ops.cluster_expand, + args=(cluster.get_id(),), + ).start() + return Response(status_code=202) # FIXME: Provide URL for checking task status @instance_api.post('/update', name='clusters:upgrade', status_code=204, responses={204: {"content": None}}) def update_cluster( cluster: Cluster, parameters: _UpdateParams) -> Response: diff --git a/simplyblock_web/api/v2/dtos.py b/simplyblock_web/api/v2/dtos.py index 1f6461c1c..61b8f4ad2 100644 --- a/simplyblock_web/api/v2/dtos.py +++ b/simplyblock_web/api/v2/dtos.py @@ -219,15 +219,22 @@ class VolumeDTO(BaseModel): cloned_from: Optional[util.UrlPath] crypto_key: Optional[Tuple[str, str]] high_availability: bool + lvol_priority_class: util.Unsigned + do_replicate: bool = False + max_namespace_per_subsys: int max_rw_iops: util.Unsigned max_rw_mbytes: util.Unsigned max_r_mbytes: util.Unsigned max_w_mbytes: util.Unsigned allowed_hosts: List[str] policy: str + capacity: CapacityStatDTO + rep_info: Optional[dict] = None + from_source: bool = True + @staticmethod - def 
from_model(model: LVol, request: Request, cluster_id: str): + def from_model(model: LVol, request: Request, cluster_id: str, stat_obj: Optional[StatsObject]=None, rep_info=None): from simplyblock_core.controllers import migration_controller from simplyblock_core.db_controller import DBController as _DBC active_mig = migration_controller.get_active_migration_for_lvol(model.uuid) @@ -262,6 +269,17 @@ def from_model(model: LVol, request: Request, cluster_id: str): else None ), high_availability=model.ha_type == 'ha', + pool_uuid=model.pool_uuid, + pool_name=model.pool_name, + pvc_name=model.pvc_name, + snapshot_name=model.snapshot_name, + ndcs=model.ndcs, + npcs=model.npcs, + blobid=model.blobid, + ns_id=model.ns_id, + lvol_priority_class=model.lvol_priority_class, + do_replicate=model.do_replicate, + max_namespace_per_subsys=model.max_namespace_per_subsys, max_rw_iops=model.rw_ios_per_sec, max_rw_mbytes=model.rw_mbytes_per_sec, max_r_mbytes=model.r_mbytes_per_sec, @@ -356,4 +374,6 @@ def from_model(model: LVolMigration): error_message=model.error_message or "", started_at=model.started_at, completed_at=model.completed_at, + rep_info=rep_info, + from_source=model.from_source ) diff --git a/simplyblock_web/api/v2/volume.py b/simplyblock_web/api/v2/volume.py index c2ef71ac2..9eeb25b1d 100644 --- a/simplyblock_web/api/v2/volume.py +++ b/simplyblock_web/api/v2/volume.py @@ -11,7 +11,7 @@ from .cluster import Cluster from .pool import StoragePool -from .dtos import VolumeDTO, SnapshotDTO +from .dtos import VolumeDTO, SnapshotDTO, TaskDTO from . 
import util @@ -44,6 +44,10 @@ class _CreateParams(BaseModel): ndcs: util.Unsigned = 0 npcs: util.Unsigned = 0 allowed_hosts: Optional[List[str]] = None + fabric: str = "tcp" + max_namespace_per_subsys: int = 1 + do_replicate: bool = False + replication_cluster_id: Optional[str] = None class _CloneParams(BaseModel): @@ -87,6 +91,10 @@ def add( ndcs=data.ndcs, npcs=data.npcs, allowed_hosts=data.allowed_hosts, + fabric=data.fabric, + max_namespace_per_subsys=data.max_namespace_per_subsys, + do_replicate=data.do_replicate, + replication_cluster_id=data.replication_cluster_id, ) elif isinstance(data, _CloneParams): volume_id_or_false, error = snapshot_controller.clone( @@ -124,7 +132,12 @@ def _lookup_volume(volume_id: UUID) -> LVol: @instance_api.get('/', name='clusters:storage-pools:volumes:detail') def get(request: Request, cluster: Cluster, pool: StoragePool, volume: Volume) -> VolumeDTO: - return VolumeDTO.from_model(volume, request, cluster.get_id()) + stat_obj = None + ret = db.get_lvol_stats(volume, 1) + if ret: + stat_obj = ret[0] + rep_info = lvol_controller.get_replication_info(volume.get_id()) + return VolumeDTO.from_model(volume, request, cluster.get_id(), stat_obj, rep_info) class UpdatableLVolParams(BaseModel): @@ -201,6 +214,26 @@ def inflate(cluster: Cluster, pool: StoragePool, volume: Volume) -> Response: return Response(status_code=204) + +@instance_api.post('/replication_trigger', name='clusters:storage-pools:volumes:replication_trigger', status_code=204, responses={204: {"content": None}}) +def replication_trigger(cluster: Cluster, pool: StoragePool, volume: Volume) -> Response: + if not lvol_controller.replication_trigger(volume.get_id()): + raise ValueError('Failed to start volume snapshot replication') + + return Response(status_code=204) + +@instance_api.post('/replication_start', name='clusters:storage-pools:volumes:replication_start', status_code=204, responses={204: {"content": None}}) +def replication_start(cluster: Cluster, pool: StoragePool,
volume: Volume) -> Response: + if not lvol_controller.replication_start(volume.get_id(), cluster.get_id()): + raise ValueError('Failed to start volume snapshot replication') + + return Response(status_code=204) + +@instance_api.post('/replication_stop', name='clusters:storage-pools:volumes:replication_stop', status_code=204, responses={204: {"content": None}}) +def replication_stop(cluster: Cluster, pool: StoragePool, volume: Volume) -> Response: + if not lvol_controller.replication_stop(volume.get_id()): + raise ValueError('Failed to stop volume snapshot replication') + + return Response(status_code=204) @instance_api.get('/connect', name='clusters:storage-pools:volumes:connect') def connect(cluster: Cluster, pool: StoragePool, volume: Volume): @@ -264,6 +297,34 @@ def create_snapshot( ) return Response(status_code=201, headers={'Location': entity_url}) + +@instance_api.post('/replicate_lvol', name='clusters:storage-pools:volumes:replicate_lvol') +def replicate_lvol_on_target_cluster(cluster: Cluster, pool: StoragePool, volume: Volume): + return lvol_controller.replicate_lvol_on_target_cluster(volume.get_id()) + + +class ReplicateLVolParams(BaseModel): + lvol_id: Optional[str] = None + + +@instance_api.post('/replicate_lvol_on_source_cluster', name='clusters:storage-pools:replicate_lvol_on_source_cluster') +def replicate_lvol_on_source_cluster(cluster: Cluster, pool: StoragePool, body: ReplicateLVolParams): + return lvol_controller.replicate_lvol_on_source_cluster(body.lvol_id, cluster.get_id(), pool.get_id()) + + +@instance_api.get('/list_replication_tasks', name='clusters:storage-pools:volumes:list_replication_tasks') +def list_replication_tasks(cluster: Cluster, pool: StoragePool, volume: Volume) -> List[TaskDTO]: + tasks = lvol_controller.list_replication_tasks(volume.get_id()) + return [TaskDTO.from_model(task) for task in tasks] + +@instance_api.get('/suspend', name='clusters:storage-pools:volumes:suspend') +def suspend(cluster: Cluster, pool: StoragePool, 
volume: Volume) -> bool: + return lvol_controller.suspend_lvol(volume.get_id()) + +@instance_api.get('/resume', name='clusters:storage-pools:volumes:resume') +def resume(cluster: Cluster, pool: StoragePool, volume: Volume) -> bool: + return lvol_controller.resume_lvol(volume.get_id()) + @instance_api.get('/clone', name='clusters:storage-pools:volumes:clone') def clone(cluster: Cluster, pool: StoragePool, volume: Volume, clone_name: str) -> bool: return lvol_controller.clone_lvol(volume.get_id(), clone_name) diff --git a/simplyblock_web/templates/storage_deploy_spdk.yaml.j2 b/simplyblock_web/templates/storage_deploy_spdk.yaml.j2 index df152797a..8bb0e5e10 100644 --- a/simplyblock_web/templates/storage_deploy_spdk.yaml.j2 +++ b/simplyblock_web/templates/storage_deploy_spdk.yaml.j2 @@ -35,9 +35,6 @@ spec: - name: host-rootfs hostPath: path: / - - name: foundationdb - hostPath: - path: /var/foundationdb - name: etc-simplyblock hostPath: path: /var/simplyblock @@ -59,21 +56,6 @@ spec: hostPath: path: /var/log/pods - initContainers: - - name: copy-script - image: public.ecr.aws/simply-block/busybox - command: ["sh", "-c", "echo \"{{ FDB_CONNECTION }}\" > /etc/foundationdb/fdb.cluster"] - volumeMounts: - - name: foundationdb - mountPath: /etc/foundationdb - resources: - requests: - cpu: "100m" - memory: "64Mi" - limits: - cpu: "100m" - memory: "64Mi" - containers: - name: spdk-container image: {{ SPDK_IMAGE }} @@ -99,16 +81,6 @@ spec: value: "{{ NSOCKET }}" - name: FW_PORT value: "{{ FW_PORT }}" - - name: SPDKCSI_SECRET - valueFrom: - secretKeyRef: - name: simplyblock-csi-secret - key: secret.json - - name: CLUSTER_CONFIG - valueFrom: - configMapKeyRef: - name: simplyblock-csi-cm - key: config.json lifecycle: postStart: exec: diff --git a/simplyblock_web/templates/storage_init_job.yaml.j2 b/simplyblock_web/templates/storage_init_job.yaml.j2 index 7acd31078..466b3dbbe 100644 --- a/simplyblock_web/templates/storage_init_job.yaml.j2 +++ 
b/simplyblock_web/templates/storage_init_job.yaml.j2 @@ -73,3 +73,4 @@ spec: fi echo "--- Init setup complete ---" + \ No newline at end of file From 801c600df6974c4d3bb6adc29db6fc4fc63085a3 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 27 Mar 2026 21:42:51 +0300 Subject: [PATCH 02/70] feat: enhance cluster and volume DTOs to include capacity statistics and refactor list endpoints for improved data retrieval --- simplyblock_core/env_var | 2 +- simplyblock_core/scripts/charts/values.yaml | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/simplyblock_core/env_var b/simplyblock_core/env_var index f6ec62a06..4a1ca5074 100644 --- a/simplyblock_core/env_var +++ b/simplyblock_core/env_var @@ -1,5 +1,5 @@ SIMPLY_BLOCK_COMMAND_NAME=sbcli-dev SIMPLY_BLOCK_VERSION=19.2.33 -SIMPLY_BLOCK_DOCKER_IMAGE=public.ecr.aws/simply-block/simplyblock:main +SIMPLY_BLOCK_DOCKER_IMAGE=public.ecr.aws/simply-block/simplyblock:main-snapshot-replication SIMPLY_BLOCK_SPDK_ULTRA_IMAGE=public.ecr.aws/simply-block/ultra:main-latest diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 863a462c2..c8b8da595 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -16,7 +16,7 @@ observability: image: simplyblock: repository: "simplyblock/simplyblock" - tag: "main-sfam-2359" + tag: "main-snapshot-replication" pullPolicy: "Always" ports: @@ -225,7 +225,7 @@ simplyblock: storageNodes: name: simplyblock-node - clusterImage: simplyblock/simplyblock:main-sfam-2359 + clusterImage: simplyblock/simplyblock:main-snapshot-replication mgmtIfc: eth0 maxLVol: 10 maxSize: 0 From a9f8431b10a4f7bace3dbcbd4abfa2b44e00658c Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 12:24:02 +0100 Subject: [PATCH 03/70] updated helm template app_k8s.yaml --- simplyblock_core/scripts/charts/templates/app_k8s.yaml | 1 + 1 file changed, 1 insertion(+) diff --git 
a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index 82f1d4f2c..8a06c1d90 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -1,3 +1,4 @@ + --- apiVersion: apps/v1 kind: Deployment From 2a62dc8e13afbe00b99e39e63eca4f45b03829c8 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 12:41:02 +0100 Subject: [PATCH 04/70] made prometheus url configurable --- .../scripts/charts/templates/_helpers.tpl | 21 +++ .../scripts/charts/templates/app_k8s.yaml | 124 +++++++++++++++--- simplyblock_core/scripts/charts/values.yaml | 9 +- 3 files changed, 132 insertions(+), 22 deletions(-) create mode 100644 simplyblock_core/scripts/charts/templates/_helpers.tpl diff --git a/simplyblock_core/scripts/charts/templates/_helpers.tpl b/simplyblock_core/scripts/charts/templates/_helpers.tpl new file mode 100644 index 000000000..745197e54 --- /dev/null +++ b/simplyblock_core/scripts/charts/templates/_helpers.tpl @@ -0,0 +1,21 @@ +{{- define "simplyblock.commonContainer" }} +env: + - name: SIMPLYBLOCK_LOG_LEVEL + valueFrom: + configMapKeyRef: + name: simplyblock-config + key: LOG_LEVEL + +volumeMounts: + - name: fdb-cluster-file + mountPath: /etc/foundationdb/fdb.cluster + subPath: fdb.cluster + +resources: + requests: + cpu: "50m" + memory: "100Mi" + limits: + cpu: "300m" + memory: "1Gi" +{{- end }} \ No newline at end of file diff --git a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index 8a06c1d90..6d507c059 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -37,6 +37,10 @@ spec: env: - name: LVOL_NVMF_PORT_START value: "{{ .Values.ports.lvolNvmfPortStart }}" + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ 
.Values.prometheus.simplyblock.prometheusPORT }}" - name: K8S_NAMESPACE valueFrom: fieldRef: @@ -116,6 +120,10 @@ spec: value: "{{ .Values.ports.lvolNvmfPortStart }}" - name: ENABLE_MONITORING value: "{{ .Values.observability.enabled }}" + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" - name: K8S_NAMESPACE valueFrom: fieldRef: @@ -197,8 +205,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/storage_node_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -213,6 +225,10 @@ spec: env: - name: BACKEND_TYPE value: "k8s" + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" {{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: @@ -225,8 +241,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/lvol_stat_collector.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -238,8 +258,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/main_distr_event_collector.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -251,8 +275,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/capacity_and_stats_collector.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -264,8 +292,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/cap_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -277,8 +309,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/health_check_service.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -290,8 +326,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/device_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -303,8 +343,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/lvol_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -316,8 +360,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/snapshot_monitor.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -383,6 +431,10 @@ spec: env: - name: LVOL_NVMF_PORT_START value: "{{ .Values.ports.lvolNvmfPortStart }}" + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" {{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: @@ -395,8 +447,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_restart.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -408,8 +464,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_migration.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -421,8 +481,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_failed_migration.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -434,8 +498,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_cluster_status.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -447,8 +515,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_new_dev_migration.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -460,8 +532,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_port_allow.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -473,8 +549,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_jc_comp.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -486,8 +566,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/tasks_runner_sync_lvol_del.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . | fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} @@ -499,8 +583,12 @@ spec: image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" command: ["python", "simplyblock_core/services/snapshot_replication.py"] imagePullPolicy: "{{ .Values.image.simplyblock.pullPolicy }}" -{{- with (include "simplyblock.commonContainer" . | fromYaml) }} env: + - name: PROMETHEUS_URL + value: "{{ .Values.prometheus.simplyblock.prometheusURL }}" + - name: PROMETHEUS_PORT + value: "{{ .Values.prometheus.simplyblock.prometheusPORT }}" +{{- with (include "simplyblock.commonContainer" . 
| fromYaml) }} {{ toYaml .env | nindent 12 }} volumeMounts: {{ toYaml .volumeMounts | nindent 12 }} diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index c8b8da595..ef2640d36 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -84,6 +84,9 @@ opensearch: enabled: false prometheus: + simplyblock: + prometheusURL: simplyblock-prometheus + prometheusPORT: 9090 server: fullnameOverride: simplyblock-prometheus enabled: true @@ -225,7 +228,7 @@ simplyblock: storageNodes: name: simplyblock-node - clusterImage: simplyblock/simplyblock:main-snapshot-replication + clusterImage: simplyblock/simplyblock:main-snap-repl mgmtIfc: eth0 maxLVol: 10 maxSize: 0 @@ -235,9 +238,7 @@ simplyblock: spdkImage: coreIsolation: false workerNodes: - - israel-storage-node-1 - - israel-storage-node-2 - - israel-storage-node-3 + devices: name: simplyblock-devices From a763c4d22c058ea2a87262a563b79f6d2c0c592d Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 12:48:03 +0100 Subject: [PATCH 05/70] added nodeselector and tolerations --- .../scripts/charts/templates/app_k8s.yaml | 81 ++++++++++++++++++- .../charts/templates/simplyblock-manager.yaml | 23 +++++- simplyblock_core/scripts/charts/values.yaml | 16 ++++ 3 files changed, 117 insertions(+), 3 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index 6d507c059..ce0809db0 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -29,6 +29,26 @@ spec: matchLabels: app: simplyblock-admin-control topologyKey: kubernetes.io/hostname + + {{- if .Values.nodeSelector.create }} + nodeSelector: + {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} + {{- end }} + {{- if .Values.tolerations.create }} + tolerations: + {{- range .Values.tolerations.list }} + - 
operator: {{ .operator | quote }} + {{- if .effect }} + effect: {{ .effect | quote }} + {{- end }} + {{- if .key }} + key: {{ .key | quote }} + {{- end }} + {{- if .value }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + {{- end }} containers: - name: simplyblock-control image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" @@ -102,7 +122,27 @@ spec: - labelSelector: matchLabels: app: simplyblock-admin-control - topologyKey: kubernetes.io/hostname + topologyKey: kubernetes.io/hostname + + {{- if .Values.nodeSelector.create }} + nodeSelector: + {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} + {{- end }} + {{- if .Values.tolerations.create }} + tolerations: + {{- range .Values.tolerations.list }} + - operator: {{ .operator | quote }} + {{- if .effect }} + effect: {{ .effect | quote }} + {{- end }} + {{- if .key }} + key: {{ .key | quote }} + {{- end }} + {{- if .value }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + {{- end }} containers: - name: webappapi image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" @@ -200,6 +240,25 @@ spec: serviceAccountName: simplyblock-sa hostNetwork: true dnsPolicy: ClusterFirstWithHostNet + {{- if .Values.nodeSelector.create }} + nodeSelector: + {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} + {{- end }} + {{- if .Values.tolerations.create }} + tolerations: + {{- range .Values.tolerations.list }} + - operator: {{ .operator | quote }} + {{- if .effect }} + effect: {{ .effect | quote }} + {{- end }} + {{- if .key }} + key: {{ .key | quote }} + {{- end }} + {{- if .value }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + {{- end }} containers: - name: storage-node-monitor image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" @@ -423,6 +482,26 @@ spec: serviceAccountName: simplyblock-sa hostNetwork: true dnsPolicy: ClusterFirstWithHostNet + + {{- if 
.Values.nodeSelector.create }} + nodeSelector: + {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} + {{- end }} + {{- if .Values.tolerations.create }} + tolerations: + {{- range .Values.tolerations.list }} + - operator: {{ .operator | quote }} + {{- if .effect }} + effect: {{ .effect | quote }} + {{- end }} + {{- if .key }} + key: {{ .key | quote }} + {{- end }} + {{- if .value }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + {{- end }} containers: - name: tasks-node-add-runner image: "{{ .Values.image.simplyblock.repository }}:{{ .Values.image.simplyblock.tag }}" diff --git a/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml index cca5e522d..db52951cb 100644 --- a/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml +++ b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml @@ -22,9 +22,28 @@ spec: runAsGroup: 65532 fsGroup: 65532 serviceAccountName: simplyblock-manager + {{- if .Values.nodeSelector.create }} + nodeSelector: + {{ .Values.nodeSelector.key }}: {{ .Values.nodeSelector.value }} + {{- end }} + {{- if .Values.tolerations.create }} + tolerations: + {{- range .Values.tolerations.list }} + - operator: {{ .operator | quote }} + {{- if .effect }} + effect: {{ .effect | quote }} + {{- end }} + {{- if .key }} + key: {{ .key | quote }} + {{- end }} + {{- if .value }} + value: {{ .value | quote }} + {{- end }} + {{- end }} + {{- end }} containers: - - image: simplyblock/simplyblock-manager:snapshot_replication - imagePullPolicy: Always + - image: "{{ .Values.image.manager.repository }}:{{ .Values.image.manager.tag }}" + imagePullPolicy: "{{ .Values.image.manager.pullPolicy }}" name: manager env: - name: WATCH_NAMESPACE diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index ef2640d36..8ca7060b2 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ 
b/simplyblock_core/scripts/charts/values.yaml @@ -18,6 +18,22 @@ image: repository: "simplyblock/simplyblock" tag: "main-snapshot-replication" pullPolicy: "Always" + manager: + repository: "simplyblock/simplyblock-manager" + tag: "snapshot_replication" + pullPolicy: "Always" + +tolerations: + create: false + list: + - operator: Exists + effect: + key: + value: +nodeSelector: + create: false + key: + value: ports: lvolNvmfPortStart: 9100 From dddbbe94c78ee86d862a8b1167b6fb654650bc4f Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 16:05:03 +0100 Subject: [PATCH 06/70] fixed the helm values --- simplyblock_core/scripts/charts/Chart.yaml | 4 ++-- .../scripts/charts/templates/dashboards.yaml | 2 +- simplyblock_core/scripts/charts/templates/mongodb.yaml | 4 ++-- .../scripts/charts/templates/monitoring_configmap.yaml | 2 +- .../scripts/charts/templates/monitoring_k8s.yaml | 4 ++-- .../scripts/charts/templates/monitoring_secret.yaml | 10 +++++----- .../scripts/charts/templates/monitoring_svc.yaml | 2 +- 7 files changed, 14 insertions(+), 14 deletions(-) diff --git a/simplyblock_core/scripts/charts/Chart.yaml b/simplyblock_core/scripts/charts/Chart.yaml index 072a4b7df..671f39cfa 100644 --- a/simplyblock_core/scripts/charts/Chart.yaml +++ b/simplyblock_core/scripts/charts/Chart.yaml @@ -17,11 +17,11 @@ dependencies: version: 1.4.0 repository: https://mongodb.github.io/helm-charts alias: mongodb - condition: monitoring.enabled + condition: observability.enabled - name: opensearch version: 2.9.0 repository: https://opensearch-project.github.io/helm-charts - condition: monitoring.enabled + condition: observability.enabled - name: prometheus version: "25.18.0" repository: "https://prometheus-community.github.io/helm-charts" diff --git a/simplyblock_core/scripts/charts/templates/dashboards.yaml b/simplyblock_core/scripts/charts/templates/dashboards.yaml index 981e961d0..f044a0c72 100644 --- a/simplyblock_core/scripts/charts/templates/dashboards.yaml +++ 
b/simplyblock_core/scripts/charts/templates/dashboards.yaml @@ -1,4 +1,4 @@ -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} apiVersion: v1 kind: ConfigMap metadata: diff --git a/simplyblock_core/scripts/charts/templates/mongodb.yaml b/simplyblock_core/scripts/charts/templates/mongodb.yaml index 4830d51c3..6c004f314 100644 --- a/simplyblock_core/scripts/charts/templates/mongodb.yaml +++ b/simplyblock_core/scripts/charts/templates/mongodb.yaml @@ -1,4 +1,4 @@ -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} apiVersion: mongodbcommunity.mongodb.com/v1 kind: MongoDBCommunity metadata: @@ -52,5 +52,5 @@ metadata: name: admin-password type: Opaque stringData: - password: {{ .Values.monitoring.secret }} + password: {{ .Values.observability.secret }} {{- end }} diff --git a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml index def678ec4..ec10c9bf4 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml @@ -49,7 +49,7 @@ data: type: FILESYSTEM config: directory: /mnt/thanos -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} --- apiVersion: v1 kind: ConfigMap diff --git a/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml b/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml index 919cbe74b..831c848d8 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml @@ -1,4 +1,4 @@ -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} --- apiVersion: apps/v1 kind: Deployment @@ -65,7 +65,7 @@ spec: - name: GRAYLOG_ELASTICSEARCH_HOSTS value: "http://opensearch-cluster-master:9200" - name: GRAYLOG_MONGODB_URI - value: "mongodb://admin:{{ .Values.monitoring.secret 
}}@simplyblock-mongo-svc:27017/graylog" + value: "mongodb://admin:{{ .Values.observability.secret }}@simplyblock-mongo-svc:27017/graylog" - name: GRAYLOG_SKIP_PREFLIGHT_CHECKS value: "true" - name: GRAYLOG_ROTATION_STRATEGY diff --git a/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml b/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml index c39735159..a1923a850 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml @@ -1,4 +1,4 @@ -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} apiVersion: v1 kind: Secret metadata: @@ -6,7 +6,7 @@ metadata: namespace: {{ .Release.Namespace }} type: Opaque stringData: - MONITORING_SECRET: "{{ .Values.monitoring.secret }}" + MONITORING_SECRET: "{{ .Values.observability.secret }}" GRAFANA_ENDPOINT: "{{ .Values.grafana.endpoint }}" --- @@ -17,7 +17,7 @@ metadata: namespace: {{ .Release.Namespace }} type: Opaque stringData: - GRAYLOG_PASSWORD_SECRET: "{{ .Values.graylog.passwordSecret }}" - GRAYLOG_ROOT_PASSWORD_SHA2: "{{ .Values.graylog.rootPasswordSha2 }}" - MAX_NUMBER_OF_INDICES: "{{ .Values.log.maxNumberIndex }}" + GRAYLOG_PASSWORD_SECRET: "{{ .Values.observability.graylog.passwordSecret }}" + GRAYLOG_ROOT_PASSWORD_SHA2: "{{ .Values.observability.graylog.rootPasswordSha2 }}" + MAX_NUMBER_OF_INDICES: "{{ .Values.observability.graylog.maxNumberIndex }}" {{- end }} diff --git a/simplyblock_core/scripts/charts/templates/monitoring_svc.yaml b/simplyblock_core/scripts/charts/templates/monitoring_svc.yaml index 55b15dccc..4680f2595 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_svc.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_svc.yaml @@ -1,4 +1,4 @@ -{{- if .Values.monitoring.enabled }} +{{- if .Values.observability.enabled }} --- apiVersion: v1 kind: Service From 688466286205b923951261cc6f68d61fe633d38c Mon Sep 17 00:00:00 2001 From: geoffrey1330 
Date: Fri, 27 Mar 2026 17:00:38 +0100 Subject: [PATCH 07/70] fixed the helm values --- simplyblock_core/scripts/charts/templates/app_configmap.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/app_configmap.yaml b/simplyblock_core/scripts/charts/templates/app_configmap.yaml index de0a4da08..b6e22f1af 100644 --- a/simplyblock_core/scripts/charts/templates/app_configmap.yaml +++ b/simplyblock_core/scripts/charts/templates/app_configmap.yaml @@ -6,8 +6,8 @@ metadata: namespace: {{ .Release.Namespace }} data: - LOG_LEVEL: {{ .Values.log.level }} - LOG_DELETION_INTERVAL: {{ .Values.log.deletionInterval }} + LOG_LEVEL: {{ .Values.observability.level }} + LOG_DELETION_INTERVAL: {{ .Values.observability.deletionInterval }} --- From 83a0261ffe004d55a39ef343004a459bad6aa5fd Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 17:04:06 +0100 Subject: [PATCH 08/70] added crds --- ...ock.simplyblock.io_simplyblockdevices.yaml | 135 ++++++++++++++ ...block.simplyblock.io_simplyblocklvols.yaml | 144 +++++++++++++++ ...block.simplyblock.io_simplyblockpools.yaml | 96 ++++++++++ ...lyblock.io_simplyblockstorageclusters.yaml | 173 ++++++++++++++++++ ...block.simplyblock.io_simplyblocktasks.yaml | 84 +++++++++ 5 files changed, 632 insertions(+) create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml create mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml 
b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml new file mode 100644 index 000000000..272030736 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml @@ -0,0 +1,135 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblockdevices.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockDevice + listKind: SimplyBlockDeviceList + plural: simplyblockdevices + singular: simplyblockdevice + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockDevice is the Schema for the simplyblockdevices API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of SimplyBlockDevice + properties: + action: + enum: + - remove + - restart + type: string + clusterName: + type: string + deviceID: + type: string + nodeUUID: + type: string + required: + - clusterName + type: object + status: + description: status defines the observed state of SimplyBlockDevice + properties: + actionStatus: + properties: + action: + type: string + message: + type: string + nodeUUID: + type: string + observedGeneration: + format: int64 + type: integer + state: + type: string + triggered: + type: boolean + updatedAt: + format: date-time + type: string + type: object + nodes: + items: + properties: + devices: + items: + properties: + health: + type: string + model: + type: string + size: + type: string + stats: + items: + properties: + capacityUtil: + format: int64 + type: integer + riops: + format: int64 + type: integer + rtp: + format: int64 + type: integer + wiops: + format: int64 + type: integer + wtp: + format: int64 + type: integer + type: object + type: array + status: + type: string + utilization: + format: int64 + type: integer + uuid: + type: string + type: object + type: array + nodeUUID: + type: string + type: object + type: array + type: object + required: + - spec + type: object + x-kubernetes-validations: + - message: nodeUUID and deviceID are required when action is specified + rule: '!(has(self.spec.action) && self.spec.action != "" && ((!has(self.spec.nodeUUID) + || self.spec.nodeUUID == "") || (!has(self.spec.deviceID) || self.spec.deviceID + == "")))' + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml new file mode 100644 index 
000000000..8e44a687d --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml @@ -0,0 +1,144 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblocklvols.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockLvol + listKind: SimplyBlockLvolList + plural: simplyblocklvols + singular: simplyblocklvol + scope: Namespaced + versions: + - additionalPrinterColumns: + - jsonPath: .status.lvols.length() + name: LVOLs + type: integer + name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockLvol is the Schema for the simplyblocklvols API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of SimplyBlockLvol + properties: + clusterName: + type: string + poolName: + type: string + required: + - clusterName + - poolName + type: object + status: + description: status defines the observed state of SimplyBlockLvol + properties: + configured: + type: boolean + lvols: + items: + properties: + blobID: + format: int64 + type: integer + clonedFromSnap: + type: string + createDt: + format: date-time + type: string + fabric: + type: string + ha: + type: boolean + health: + type: boolean + hostname: + type: string + isCrypto: + type: boolean + lvolName: + type: string + maxNamespacesPerSubsystem: + format: int64 + type: integer + namespaceID: + format: int64 + type: integer + nodeUUID: + items: + type: string + type: array + nqn: + type: string + poolName: + type: string + poolUUID: + type: string + pvcName: + type: string + qosClass: + format: int64 + type: integer + qosIOPS: + format: int64 + type: integer + qosRTP: + format: int64 + type: integer + qosRWTP: + format: int64 + type: integer + qosWTP: + format: int64 + type: integer + size: + type: string + snapName: + type: string + status: + type: string + stripeWdata: + format: int64 + type: integer + stripeWparity: + format: int64 + type: integer + subsysPort: + format: int64 + type: integer + updateDt: + format: date-time + type: string + uuid: + type: string + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml new file mode 100644 index 000000000..693322dc3 --- /dev/null +++ 
b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml @@ -0,0 +1,96 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblockpools.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockPool + listKind: SimplyBlockPoolList + plural: simplyblockpools + singular: simplyblockpool + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockPool is the Schema for the pools API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of Pool + properties: + action: + type: string + capacityLimit: + type: string + clusterName: + type: string + name: + type: string + qosIOPSLimit: + format: int32 + type: integer + rLimit: + format: int32 + type: integer + rwLimit: + format: int32 + type: integer + status: + type: string + wLimit: + format: int32 + type: integer + required: + - clusterName + - name + type: object + status: + description: status defines the observed state of Pool + properties: + qosHost: + type: string + qosIOPSLimit: + format: int32 + type: integer + rLimit: + format: int32 + type: integer + rwLimit: + format: int32 + type: integer + status: + type: string + uuid: + type: string + wLimit: + format: int32 + type: integer + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml new file mode 100644 index 000000000..cfd99fdee --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml @@ -0,0 +1,173 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblockstorageclusters.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockStorageCluster + listKind: SimplyBlockStorageClusterList + plural: simplyblockstorageclusters + singular: simplyblockstoragecluster + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockStorageCluster is the Schema for the 
simplyblockstorageclusters + API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of SimplyBlockStorageCluster + properties: + action: + enum: + - activate + - expand + type: string + blkSize: + format: int32 + type: integer + capCrit: + format: int32 + type: integer + capWarn: + format: int32 + type: integer + clientQpairCount: + format: int32 + type: integer + clusterName: + type: string + distrBs: + format: int32 + type: integer + distrChunkBs: + format: int32 + type: integer + enableNodeAffinity: + type: boolean + eventLogEntries: + format: int32 + type: integer + fabric: + type: string + haType: + type: string + includeEventLog: + type: boolean + inflightIOThreshold: + format: int32 + type: integer + isSingleNode: + type: boolean + maxQueueSize: + format: int32 + type: integer + mgmtIfc: + description: Create-only + type: string + pageSizeInBlocks: + format: int32 + type: integer + provCapCrit: + format: int32 + type: integer + provCapWarn: + format: int32 + type: integer + qosClasses: + description: Updatable + type: string + qpairCount: + format: int32 + type: integer + strictNodeAntiAffinity: + type: boolean + stripeWdata: + format: int32 + type: integer + stripeWparity: + format: int32 + type: integer + required: + - clusterName + type: 
object + status: + description: status defines the observed state of SimplyBlockStorageCluster + properties: + MOD: + type: string + NQN: + type: string + UUID: + type: string + actionStatus: + properties: + action: + type: string + message: + type: string + nodeUUID: + type: string + observedGeneration: + format: int64 + type: integer + state: + type: string + triggered: + type: boolean + updatedAt: + format: date-time + type: string + type: object + clusterName: + type: string + configured: + type: boolean + created: + format: date-time + type: string + lastUpdated: + format: date-time + type: string + mgmtNodes: + format: int32 + type: integer + rebalancing: + type: boolean + secretName: + type: string + status: + type: string + storageNodes: + format: int32 + type: integer + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml new file mode 100644 index 000000000..2d25e21e1 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml @@ -0,0 +1,84 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: simplyblocktasks.simplyblock.simplyblock.io +spec: + group: simplyblock.simplyblock.io + names: + kind: SimplyBlockTask + listKind: SimplyBlockTaskList + plural: simplyblocktasks + singular: simplyblocktask + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: SimplyBlockTask is the Schema for the simplyblocktasks API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. 
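The storage-cluster CRD's `status.actionStatus` object pairs an action (`activate`/`expand` per the spec enum) with an `observedGeneration`, which is how an operator distinguishes a stale result from one that reflects the current spec. A dependency-free sketch of that bookkeeping (field names taken from the CRD above; the functions themselves are hypothetical):

```python
from datetime import datetime, timezone

def make_action_status(action, state, node_uuid, observed_generation, message=""):
    # Shape mirrors the actionStatus properties in the CRD status schema.
    return {
        "action": action,                    # e.g. "activate" or "expand"
        "state": state,
        "nodeUUID": node_uuid,
        "observedGeneration": observed_generation,  # format: int64
        "triggered": True,
        "message": message,
        "updatedAt": datetime.now(timezone.utc).isoformat(),  # format: date-time
    }

def is_stale(action_status, current_generation):
    # A result recorded for an older spec generation should be re-reconciled.
    return action_status.get("observedGeneration", -1) < current_generation
```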
+ Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of SimplyBlockTask + properties: + clusterName: + type: string + subtasks: + type: boolean + taskID: + type: string + required: + - clusterName + type: object + status: + description: status defines the observed state of SimplyBlockTask + properties: + tasks: + items: + properties: + canceled: + type: boolean + parentTask: + type: string + retried: + format: int32 + type: integer + startedAt: + format: date-time + type: string + taskResult: + type: string + taskStatus: + type: string + taskType: + type: string + uuid: + type: string + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} From 59f07f0fe4cb4ae4a2127af2a3f8fe42236d2638 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 17:09:15 +0100 Subject: [PATCH 09/70] fixed the helm values --- .../scripts/charts/templates/monitoring_configmap.yaml | 2 +- .../scripts/charts/templates/monitoring_secret.yaml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml index ec10c9bf4..bc20ffb9d 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml +++ 
b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml @@ -833,7 +833,7 @@ data: type: slack settings: username: grafana_bot - url: '{{ .Values.grafana.contactPoint }}' + url: '{{ .Values.observability.grafana.contactPoint }}' title: | '{{ "{{" }} template "slack.title" . {{ "}}" }}' text: | diff --git a/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml b/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml index a1923a850..df741f026 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_secret.yaml @@ -7,7 +7,7 @@ metadata: type: Opaque stringData: MONITORING_SECRET: "{{ .Values.observability.secret }}" - GRAFANA_ENDPOINT: "{{ .Values.grafana.endpoint }}" + GRAFANA_ENDPOINT: "{{ .Values.observability.grafana.endpoint }}" --- apiVersion: v1 From 419663deb4315a9960ca5af3fd16199791bf87d2 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 27 Mar 2026 20:16:22 +0300 Subject: [PATCH 10/70] feat: remove clone_name from clone endpoint and update to POST method --- simplyblock_web/api/v1/lvol.py | 4 ---- 1 file changed, 4 deletions(-) diff --git a/simplyblock_web/api/v1/lvol.py b/simplyblock_web/api/v1/lvol.py index d7218085b..ef20d4ec0 100644 --- a/simplyblock_web/api/v1/lvol.py +++ b/simplyblock_web/api/v1/lvol.py @@ -330,10 +330,6 @@ def replication_stop(uuid): ret = lvol_controller.replication_stop(uuid) return utils.get_response(ret) -@bp.route('/lvol/clone//', methods=['GET']) -def clone(uuid, clone_name): - ret = lvol_controller.clone_lvol(uuid, clone_name) - return utils.get_response(ret) @bp.route('/lvol/clone', methods=['POST']) def clone(): From df32e3e792dffeee3aee1b7f43d91afe4815a962 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 27 Mar 2026 20:25:08 +0300 Subject: [PATCH 11/70] feat: refactor migration handling to use list instead of set for active lvols --- simplyblock_core/controllers/snapshot_controller.py | 7 ++----- 1 file 
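The clone-endpoint patch above replaces the path-parameter `GET /lvol/clone/<uuid>/<clone_name>` route with a single `POST /lvol/clone` that takes its parameters from the request body. A framework-neutral sketch of that contract (the parameter names `snapshot_id` and `clone_name` are assumptions for illustration, not taken from the project code):

```python
import json

def handle_clone(body: bytes):
    # Parse and validate the JSON body of a POST /lvol/clone request.
    try:
        params = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "invalid JSON body"}
    missing = [k for k in ("snapshot_id", "clone_name") if not params.get(k)]
    if missing:
        return 400, {"error": f"missing parameters: {', '.join(missing)}"}
    # A real handler would call lvol_controller.clone_lvol(...) here.
    return 200, {"cloned_from": params["snapshot_id"], "name": params["clone_name"]}
```

Moving to POST also matches HTTP semantics: cloning creates a resource, which a GET should never do.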
changed, 2 insertions(+), 5 deletions(-) diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py index 0cae8efec..7d6029932 100644 --- a/simplyblock_core/controllers/snapshot_controller.py +++ b/simplyblock_core/controllers/snapshot_controller.py @@ -282,15 +282,12 @@ def list(all=False, cluster_id=None, with_details=False): snaps = sorted(snaps, key=lambda snap: snap.created_at) # Build set of lvol UUIDs with active migrations (single DB scan) - migrating_lvols = set() + migrating_lvols = [] for m in db_controller.get_migrations(): if m.is_active(): - migrating_lvols.add(m.lvol_id) + migrating_lvols.append(m.lvol_id) data = [] for snap in snaps: - if node_id: - if snap.lvol.node_id != node_id: - continue logger.debug(snap) clones = [] for lvol in db_controller.get_lvols(): From cacf45c8f138d5fa8eb0dba038d2bd4511abf920 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 18:49:33 +0100 Subject: [PATCH 12/70] fixed CapacityStatDTO is not defined --- simplyblock_web/api/v2/dtos.py | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/simplyblock_web/api/v2/dtos.py b/simplyblock_web/api/v2/dtos.py index 61b8f4ad2..1986db429 100644 --- a/simplyblock_web/api/v2/dtos.py +++ b/simplyblock_web/api/v2/dtos.py @@ -14,10 +14,30 @@ from simplyblock_core.models.snapshot import SnapShot from simplyblock_core.models.storage_node import StorageNode from simplyblock_core.models.backup import Backup, BackupPolicy +from simplyblock_core.models.stats import StatsObject from simplyblock_core.models.lvol_migration import LVolMigration from . 
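The patch above swaps the collection of migrating lvol IDs from a set to a list. Membership tests (`lvol_id in migrating_lvols`) work on both, but the trade-off differs: a list preserves duplicates and checks membership in O(n), while a set dedupes and checks in O(1). A small sketch of the two variants (simplified migration records stand in for the DB objects):

```python
def collect_active(migrations, as_list=True):
    # Single scan over migration records, as in the controller above.
    out = [] if as_list else set()
    add = out.append if as_list else out.add
    for m in migrations:
        if m["active"]:
            add(m["lvol_id"])
    return out

migs = [{"lvol_id": "a", "active": True},
        {"lvol_id": "a", "active": True},   # duplicate active migration
        {"lvol_id": "b", "active": False}]
```

If the list is only ever used for `in` checks, the set is the cheaper choice; the list form matters when callers need per-migration entries.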
import util +class CapacityStatDTO(BaseModel): + date: int + size_total: int + size_prov: int + size_used: int + size_free: int + size_util: int + + @staticmethod + def from_model(model: StatsObject): + return CapacityStatDTO( + date=model.date, + size_total=model.size_total, + size_prov=model.size_prov, + size_used=model.size_used, + size_free=model.size_free, + size_util=model.size_util, + ) + class ClusterDTO(BaseModel): id: UUID @@ -374,6 +394,4 @@ def from_model(model: LVolMigration): error_message=model.error_message or "", started_at=model.started_at, completed_at=model.completed_at, - rep_info=rep_info, - from_source=model.from_source ) From b02c5bcb7564fefa24df9c1c8122ead0b9e6b871 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 20:08:52 +0100 Subject: [PATCH 13/70] feat: enhance cluster and volume DTOs to include capacity statistics and refactor list endpoints for improved data retrieval --- simplyblock_web/api/v2/cluster.py | 7 +++++ simplyblock_web/api/v2/dtos.py | 52 ++++++++++++++++++++++++++----- simplyblock_web/api/v2/volume.py | 13 +++++--- 3 files changed, 60 insertions(+), 12 deletions(-) diff --git a/simplyblock_web/api/v2/cluster.py b/simplyblock_web/api/v2/cluster.py index c8b7b047d..e8e937c6a 100644 --- a/simplyblock_web/api/v2/cluster.py +++ b/simplyblock_web/api/v2/cluster.py @@ -46,6 +46,13 @@ class ClusterParams(BaseModel): inflight_io_threshold: int = 4 enable_node_affinity: bool = False strict_node_anti_affinity: bool = False + is_single_node: bool = False + fabric: str = "tcp" + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" + cluster_ip: str = "" + grafana_secret: str = "" client_data_nic: str = "" diff --git a/simplyblock_web/api/v2/dtos.py b/simplyblock_web/api/v2/dtos.py index 1986db429..1ad888351 100644 --- a/simplyblock_web/api/v2/dtos.py +++ b/simplyblock_web/api/v2/dtos.py @@ -58,9 +58,10 @@ class ClusterDTO(BaseModel): tls_enabled: bool max_fault_tolerance: int backup_enabled: bool + 
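The `CapacityStatDTO` added above maps a `StatsObject` onto six integer fields, and later call sites default to an empty stats object when none is available (`stat_obj if stat_obj else StatsObject()`). A stdlib sketch of the same idea, using a dataclass in place of the project's pydantic `BaseModel` and a plain dict in place of `StatsObject`:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CapacityStat:
    # All fields default to 0, so a missing stats record yields a zeroed DTO.
    date: int = 0
    size_total: int = 0
    size_prov: int = 0
    size_used: int = 0
    size_free: int = 0
    size_util: int = 0

    @staticmethod
    def from_record(record: Optional[dict]) -> "CapacityStat":
        record = record or {}
        return CapacityStat(**{k: record.get(k, 0) for k in
                               ("date", "size_total", "size_prov",
                                "size_used", "size_free", "size_util")})
```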
capacity: CapacityStatDTO @staticmethod - def from_model(model: Cluster): + def from_model(model: Cluster, stat_obj: Optional[StatsObject]=None): return ClusterDTO( id=UUID(model.get_id()), name=model.cluster_name, @@ -80,6 +81,7 @@ def from_model(model: Cluster): tls_enabled=model.tls, max_fault_tolerance=model.max_fault_tolerance, backup_enabled=bool(model.backup_config), + capacity=CapacityStatDTO.from_model(stat_obj if stat_obj else StatsObject()), ) @@ -93,9 +95,10 @@ class DeviceDTO(BaseModel): nvmf_ips: List[IPv4Address] nvmf_nqn: str = "" nvmf_port: int = 0 + capacity: CapacityStatDTO @staticmethod - def from_model(model: NVMeDevice): + def from_model(model: NVMeDevice, stat_obj: Optional[StatsObject]=None): return DeviceDTO( id=UUID(model.get_id()), status=model.status, @@ -106,6 +109,7 @@ def from_model(model: NVMeDevice): nvmf_ips=[IPv4Address(ip) for ip in model.nvmf_ip.split(',')], nvmf_nqn=model.nvmf_nqn, nvmf_port=model.nvmf_port, + capacity=CapacityStatDTO.from_model(stat_obj if stat_obj else StatsObject()), ) @@ -135,10 +139,11 @@ class StoragePoolDTO(BaseModel): max_rw_mbytes: util.Unsigned max_r_mbytes: util.Unsigned max_w_mbytes: util.Unsigned + capacity: CapacityStatDTO sec_options: dict = {} @staticmethod - def from_model(model: Pool): + def from_model(model: Pool, stat_obj: Optional[StatsObject]=None): return StoragePoolDTO( id=UUID(model.get_id()), name=model.pool_name, @@ -150,6 +155,7 @@ def from_model(model: Pool): max_r_mbytes=model.max_r_mbytes_per_sec, max_w_mbytes=model.max_w_mbytes_per_sec, sec_options=model.sec_options, + capacity=CapacityStatDTO.from_model(stat_obj if stat_obj else StatsObject()), ) @@ -191,14 +197,34 @@ def from_model(model: SnapShot, request: Request, cluster_id, pool_id, volume_id class StorageNodeDTO(BaseModel): id: UUID status: str - ip: IPv4Address + hostname: str + cpu: int + spdk_mem: int + lvols: int + rpc_port: int + lvol_subsys_port: int + nvmf_port: int + mgmt_ip: IPv4Address + health_check: bool + 
online_devices: str + capacity: CapacityStatDTO @staticmethod - def from_model(model: StorageNode): + def from_model(model: StorageNode, stat_obj: Optional[StatsObject]=None): return StorageNodeDTO( id=UUID(model.get_id()), status=model.status, - ip=IPv4Address(model.mgmt_ip), + hostname=model.hostname, + cpu=model.cpu, + spdk_mem=model.spdk_mem, + lvols=model.lvols, + rpc_port=model.rpc_port, + lvol_subsys_port=model.lvol_subsys_port, + nvmf_port=model.nvmf_port, + mgmt_ip=IPv4Address(model.mgmt_ip), + health_check=model.health_check, + online_devices=f"{len(model.nvme_devices)}/{len([d for d in model.nvme_devices if d.status=='online'])}", + capacity=CapacityStatDTO.from_model(stat_obj if stat_obj else StatsObject()), ) @@ -215,7 +241,7 @@ class TaskDTO(BaseModel): @staticmethod def from_model(model: JobSchedule): return TaskDTO( - id=UUID(model.get_id()), + id=UUID(model.uuid), status=model.status, canceled=model.canceled, function_name=model.function_name, @@ -233,9 +259,19 @@ class VolumeDTO(BaseModel): health_check: bool migrating: bool nqn: str + hostname: str + fabric: str nodes: List[util.UrlPath] port: util.Port size: util.Unsigned + ndcs: int + npcs: int + pool_uuid: str + pool_name: str + pvc_name: str = "" + snapshot_name: str = "" + blobid: int + ns_id: int cloned_from: Optional[util.UrlPath] crypto_key: Optional[Tuple[str, str]] high_availability: bool @@ -267,6 +303,8 @@ def from_model(model: LVol, request: Request, cluster_id: str, stat_obj: Optiona health_check=model.health_check, migrating=active_mig is not None, nqn=model.nqn, + hostname=model.hostname, + fabric=model.fabric, nodes=[ str(request.url_for( 'clusters:storage-nodes:detail', diff --git a/simplyblock_web/api/v2/volume.py b/simplyblock_web/api/v2/volume.py index 9eeb25b1d..3f67e7d1e 100644 --- a/simplyblock_web/api/v2/volume.py +++ b/simplyblock_web/api/v2/volume.py @@ -21,11 +21,14 @@ @api.get('/', name='clusters:storage-pools:volumes:list') def list(request: Request, cluster: 
Cluster, pool: StoragePool) -> List[VolumeDTO]: - return [ - VolumeDTO.from_model(lvol, request, cluster.get_id()) - for lvol - in db.get_lvols_by_pool_id(pool.get_id()) - ] + data = [] + for lvol in db.get_lvols_by_pool_id(pool.get_id()): + stat_obj = None + ret = db.get_lvol_stats(lvol, 1) + if ret: + stat_obj = ret[0] + data.append(VolumeDTO.from_model(lvol, request, cluster.get_id(), stat_obj)) + return data class _CreateParams(BaseModel): From 896048285e68b3b422a8e646120158695a83a136 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 27 Mar 2026 21:33:02 +0100 Subject: [PATCH 14/70] added changes to fix broken operator operation --- simplyblock_core/cluster_ops.py | 90 ++++++++++++------- simplyblock_core/constants.py | 9 ++ .../controllers/cluster_events.py | 28 ++++++ simplyblock_core/controllers/device_events.py | 20 +++++ simplyblock_core/controllers/lvol_events.py | 76 +++++++++++++++- .../controllers/pool_controller.py | 22 ++++- simplyblock_core/controllers/pool_events.py | 25 +++++- .../controllers/storage_events.py | 67 +++++++++++++- simplyblock_core/mgmt_node_ops.py | 10 +-- simplyblock_core/models/cluster.py | 3 + simplyblock_core/models/pool.py | 6 ++ simplyblock_core/models/storage_node.py | 3 + simplyblock_core/rpc_client.py | 7 +- simplyblock_core/storage_node_ops.py | 25 +++--- simplyblock_core/utils/__init__.py | 13 +-- simplyblock_web/api/v1/__init__.py | 27 ++++++ simplyblock_web/api/v1/cluster.py | 58 +++++++++++- simplyblock_web/api/v2/__init__.py | 4 + simplyblock_web/api/v2/cluster.py | 31 ++++--- simplyblock_web/api/v2/device.py | 31 +++++-- simplyblock_web/api/v2/dtos.py | 13 ++- simplyblock_web/api/v2/pool.py | 32 +++++-- simplyblock_web/api/v2/storage_node.py | 60 ++++++++----- simplyblock_web/api/v2/task.py | 15 ++-- simplyblock_web/auth_middleware.py | 4 + 25 files changed, 551 insertions(+), 128 deletions(-) diff --git a/simplyblock_core/cluster_ops.py b/simplyblock_core/cluster_ops.py index 869097cf7..9a2039440 100644 --- 
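The refactored volume list endpoint above fetches the most recent stats record for each lvol and falls back to `None` when there is none. A condensed sketch of that loop (pure data, no DB; assuming, as the code suggests, that `get_lvol_stats(lvol, 1)` returns newest-first):

```python
def latest_stat(stats_by_lvol, lvol_id):
    # Stand-in for db.get_lvol_stats(lvol, 1): newest record or None.
    history = stats_by_lvol.get(lvol_id) or []
    return history[0] if history else None

def list_volumes(lvols, stats_by_lvol):
    # One stats lookup per volume, mirroring the endpoint's loop.
    return [(lvol, latest_stat(stats_by_lvol, lvol)) for lvol in lvols]
```

Note this is an N+1 access pattern (one stats query per volume); batching the stats lookups would be the usual optimisation if volume counts grow large.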
a/simplyblock_core/cluster_ops.py +++ b/simplyblock_core/cluster_ops.py @@ -80,7 +80,7 @@ def _create_update_user(cluster_id, grafana_url, grafana_secret, user_secret, up def _add_graylog_input(cluster_ip, password): - base_url = f"http://{cluster_ip}/graylog/api" + base_url = f"{cluster_ip}/api" input_url = f"{base_url}/system/inputs" retries = 30 @@ -161,7 +161,7 @@ def _add_graylog_input(cluster_ip, password): def _set_max_result_window(cluster_ip, max_window=100000): - url_existing_indices = f"http://{cluster_ip}/opensearch/_all/_settings" + url_existing_indices = f"{cluster_ip}/_all/_settings" retries = 30 reachable=False @@ -188,7 +188,7 @@ def _set_max_result_window(cluster_ip, max_window=100000): logger.error(f"Failed to update settings for existing indices: {response.text}") return False - url_template = f"http://{cluster_ip}/opensearch/_template/all_indices_template" + url_template = f"{cluster_ip}/_template/all_indices_template" payload_template = json.dumps({ "index_patterns": ["*"], "settings": { @@ -290,8 +290,6 @@ def create_cluster(blk_size, page_size_in_blocks, cli_pass, if not dev_ip: raise ValueError("Error getting ip: For Kubernetes-based deployments, please supply --mgmt-ip.") - current_node = utils.get_node_name_by_ip(dev_ip) - utils.label_node_as_mgmt_plane(current_node) if not cli_pass: cli_pass = utils.generate_string(10) @@ -324,12 +322,16 @@ def create_cluster(blk_size, page_size_in_blocks, cli_pass, cluster.fabric_tcp = protocols["tcp"] cluster.fabric_rdma = protocols["rdma"] cluster.is_single_node = is_single_node - if grafana_endpoint: - cluster.grafana_endpoint = grafana_endpoint - elif ingress_host_source == "hostip": - cluster.grafana_endpoint = f"http://{dev_ip}/grafana" + if ingress_host_source == "hostip": + base = dev_ip else: - cluster.grafana_endpoint = f"http://{dns_name}/grafana" + base = dns_name + + graylog_endpoint = f"http://{base}/graylog" + os_endpoint = f"http://{base}/opensearch" + default_grafana = 
f"http://{base}/grafana" + + cluster.grafana_endpoint = grafana_endpoint or default_grafana cluster.enable_node_affinity = enable_node_affinity cluster.qpair_count = qpair_count or constants.QPAIR_COUNT cluster.client_qpair_count = client_qpair_count or constants.CLIENT_QPAIR_COUNT @@ -382,9 +384,9 @@ def create_cluster(blk_size, page_size_in_blocks, cli_pass, if ingress_host_source == "hostip": dns_name = dev_ip - _set_max_result_window(dns_name) + _set_max_result_window(os_endpoint) - _add_graylog_input(dns_name, monitoring_secret) + _add_graylog_input(graylog_endpoint, monitoring_secret) _create_update_user(cluster.uuid, cluster.grafana_endpoint, monitoring_secret, cluster.secret) @@ -458,26 +460,26 @@ def _run_fio(mount_point) -> None: def add_cluster(blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn, prov_cap_crit, distr_ndcs, distr_npcs, distr_bs, distr_chunk_bs, ha_type, enable_node_affinity, qpair_count, - max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, fabric="tcp", + max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, cr_name=None, + cr_namespace=None, cr_plural=None, fabric="tcp", cluster_ip=None, grafana_secret=None, client_data_nic="", max_fault_tolerance=1, backup_config=None, nvmf_base_port=4420, rpc_base_port=8080, snode_api_port=50001) -> str: + + default_cluster = None + monitoring_secret = os.environ.get("MONITORING_SECRET", "") + enable_monitoring = os.environ.get("ENABLE_MONITORING", "") clusters = db_controller.get_clusters() - if not clusters: - raise ValueError("No previous clusters found!") + if clusters: + default_cluster = clusters[0] + else: + logger.info("No previous clusters found") if distr_ndcs == 0 and distr_npcs == 0: raise ValueError("both distr_ndcs and distr_npcs cannot be 0") - if max_fault_tolerance > 1: - if ha_type != "ha": - raise ValueError("max_fault_tolerance > 1 requires ha_type='ha'") - if distr_npcs < 2: - raise 
ValueError("max_fault_tolerance > 1 requires distr_npcs >= 2") - - monitoring_secret = os.environ.get("MONITORING_SECRET", "") - logger.info("Adding new cluster") + cluster = Cluster() cluster.uuid = str(uuid.uuid4()) cluster.cluster_name = name @@ -486,14 +488,40 @@ def add_cluster(blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn cluster.nqn = f"{constants.CLUSTER_NQN}:{cluster.uuid}" cluster.secret = utils.generate_string(20) cluster.strict_node_anti_affinity = strict_node_anti_affinity + if default_cluster: + cluster.mode = default_cluster.mode + cluster.db_connection = default_cluster.db_connection + cluster.grafana_secret = grafana_secret if grafana_secret else default_cluster.grafana_secret + cluster.grafana_endpoint = default_cluster.grafana_endpoint + else: + # creating first cluster on k8s + cluster.mode = "kubernetes" + logger.info("Retrieving foundationdb connection string...") + fdb_cluster_string = utils.get_fdb_cluster_string(constants.FDB_CONFIG_NAME, constants.K8S_NAMESPACE) + cluster.db_connection = fdb_cluster_string + if monitoring_secret: + cluster.grafana_secret = monitoring_secret + elif enable_monitoring != "true": + cluster.grafana_secret = "" + else: + raise Exception("monitoring_secret is required") + cluster.grafana_endpoint = constants.GRAFANA_K8S_ENDPOINT + if not cluster_ip: + cluster_ip = "0.0.0.0" + + # add mgmt node object + mgmt_node_ops.add_mgmt_node(cluster_ip, "kubernetes", cluster.uuid) + if enable_monitoring == "true": + graylog_endpoint = constants.GRAYLOG_K8S_ENDPOINT + os_endpoint = constants.OS_K8S_ENDPOINT + _create_update_user(cluster.uuid, cluster.grafana_endpoint, cluster.grafana_secret, cluster.secret) - default_cluster = clusters[0] - cluster.mode = default_cluster.mode - cluster.db_connection = default_cluster.db_connection - cluster.grafana_secret = monitoring_secret if default_cluster.mode == "kubernetes" else default_cluster.grafana_secret - cluster.grafana_endpoint = 
default_cluster.grafana_endpoint + _set_max_result_window(os_endpoint) - _create_update_user(cluster.uuid, cluster.grafana_endpoint, cluster.grafana_secret, cluster.secret) + _add_graylog_input(graylog_endpoint, monitoring_secret) + + if cluster.mode == "kubernetes": + utils.patch_prometheus_configmap(cluster.uuid, cluster.secret) cluster.distr_ndcs = distr_ndcs cluster.distr_npcs = distr_npcs @@ -505,6 +533,9 @@ def add_cluster(blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn cluster.qpair_count = qpair_count or constants.QPAIR_COUNT cluster.max_queue_size = max_queue_size cluster.inflight_io_threshold = inflight_io_threshold + cluster.cr_name = cr_name + cluster.cr_namespace = cr_namespace + cluster.cr_plural = cr_plural if cap_warn and cap_warn > 0: cluster.cap_warn = cap_warn if cap_crit and cap_crit > 0: @@ -529,7 +560,6 @@ def add_cluster(blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn cluster.create_dt = str(datetime.datetime.now()) cluster.write_to_db(db_controller.kv_store) cluster_events.cluster_create(cluster) - qos_controller.add_class("Default", 100, cluster.get_id()) return cluster.get_id() diff --git a/simplyblock_core/constants.py b/simplyblock_core/constants.py index 598e42248..c37b2a26a 100644 --- a/simplyblock_core/constants.py +++ b/simplyblock_core/constants.py @@ -162,6 +162,15 @@ def get_config_var(name, default=None): LVO_MAX_NAMESPACES_PER_SUBSYS=32 +CR_GROUP = "simplyblock.simplyblock.io" +CR_VERSION = "v1alpha1" + +GRAFANA_K8S_ENDPOINT = "http://simplyblock-grafana:3000" +GRAYLOG_K8S_ENDPOINT = "http://simplyblock-graylog:9000" +OS_K8S_ENDPOINT = "http://opensearch-cluster-master:9200" + +WEBAPI_K8S_ENDPOINT = "http://simplyblock-webappapi:5000/api/v2" + K8S_NAMESPACE = os.getenv('K8S_NAMESPACE', 'simplyblock') OS_STATEFULSET_NAME = "simplyblock-opensearch" MONGODB_STATEFULSET_NAME = "simplyblock-mongo" diff --git a/simplyblock_core/controllers/cluster_events.py 
b/simplyblock_core/controllers/cluster_events.py index e8e6c406e..e201c53a9 100644 --- a/simplyblock_core/controllers/cluster_events.py +++ b/simplyblock_core/controllers/cluster_events.py @@ -4,6 +4,7 @@ from simplyblock_core.controllers import events_controller as ec from simplyblock_core.db_controller import DBController from simplyblock_core.models.events import EventObj +from simplyblock_core import utils, constants logger = logging.getLogger() db_controller = DBController() @@ -39,6 +40,15 @@ def cluster_status_change(cluster, new_state, old_status): caused_by=ec.CAUSED_BY_CLI, message=f"Cluster status changed from {old_status} to {new_state}") + if cluster.mode == "kubernetes": + utils.patch_cr_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=cluster.cr_plural, + namespace=cluster.cr_namespace, + name=cluster.cr_name, + status_patch={"status": new_state}) + def _cluster_cap_event(cluster, msg, event_level): return ec.log_event_cluster( @@ -80,3 +90,21 @@ def cluster_delete(cluster): db_object=cluster, caused_by=ec.CAUSED_BY_CLI, message=f"Cluster deleted {cluster.get_id()}") + + +def cluster_rebalancing_change(cluster, new_state, old_status): + ec.log_event_cluster( + cluster_id=cluster.get_id(), + domain=ec.DOMAIN_CLUSTER, + event=ec.EVENT_STATUS_CHANGE, + db_object=cluster, + caused_by=ec.CAUSED_BY_CLI, + message=f"Cluster rebalancing changed from {old_status} to {new_state}") + if cluster.mode == "kubernetes": + utils.patch_cr_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=cluster.cr_plural, + namespace=cluster.cr_namespace, + name=cluster.cr_name, + status_patch={"rebalancing": new_state}) diff --git a/simplyblock_core/controllers/device_events.py b/simplyblock_core/controllers/device_events.py index f2e1e959d..1f5ee881a 100644 --- a/simplyblock_core/controllers/device_events.py +++ b/simplyblock_core/controllers/device_events.py @@ -3,6 +3,8 @@ from simplyblock_core.controllers import 
events_controller as ec from simplyblock_core.db_controller import DBController +from simplyblock_core.models.nvme_device import NVMeDevice +from simplyblock_core import utils, constants logger = logging.getLogger() @@ -20,6 +22,24 @@ def _device_event(device, message, caused_by, event): node_id=device.get_id(), storage_id=device.cluster_device_order) + cluster = db_controller.get_cluster_by_id(snode.cluster_id) + if cluster.mode == "kubernetes": + total_devices = len(snode.nvme_devices) + online_devices = 0 + for dev in snode.nvme_devices: + if dev.status == NVMeDevice.STATUS_ONLINE: + online_devices += 1 + utils.patch_cr_node_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=snode.cr_plural, + namespace=snode.cr_namespace, + name=snode.cr_name, + node_uuid=snode.get_id(), + node_mgmt_ip=snode.mgmt_ip, + updates={"devices": f"{total_devices}/{online_devices}"}, + ) + def device_create(device, caused_by=ec.CAUSED_BY_CLI): _device_event(device, f"Device created: {device.get_id()}", caused_by, ec.EVENT_OBJ_CREATED) diff --git a/simplyblock_core/controllers/lvol_events.py b/simplyblock_core/controllers/lvol_events.py index e5ece1a40..b0116061a 100644 --- a/simplyblock_core/controllers/lvol_events.py +++ b/simplyblock_core/controllers/lvol_events.py @@ -3,6 +3,7 @@ from simplyblock_core.controllers import events_controller as ec from simplyblock_core.db_controller import DBController +from simplyblock_core import utils, constants logger = logging.getLogger() @@ -10,6 +11,7 @@ def _lvol_event(lvol, message, caused_by, event): db_controller = DBController() snode = db_controller.get_storage_node_by_id(lvol.node_id) + cluster = db_controller.get_cluster_by_id(snode.cluster_id) ec.log_event_cluster( cluster_id=snode.cluster_id, domain=ec.DOMAIN_CLUSTER, @@ -18,7 +20,79 @@ def _lvol_event(lvol, message, caused_by, event): caused_by=caused_by, message=message, node_id=lvol.get_id()) - + if cluster.mode == "kubernetes": + pool = 
db_controller.get_pool_by_id(lvol.pool_uuid) + + if event == ec.EVENT_OBJ_CREATED: + crypto_key=( + (lvol.crypto_key1, lvol.crypto_key2) + if lvol.crypto_key1 and lvol.crypto_key2 + else None + ) + + node_urls = [ + f"{constants.WEBAPI_K8S_ENDPOINT}/clusters/{snode.cluster_id}/storage-nodes/{node_id}/" + for node_id in lvol.nodes + ] + + utils.patch_cr_lvol_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=pool.lvols_cr_plural, + namespace=pool.lvols_cr_namespace, + name=pool.lvols_cr_name, + add={ + "uuid": lvol.get_id(), + "lvolName": lvol.lvol_name, + "status": lvol.status, + "nodeUUID": node_urls, + "size": utils.humanbytes(lvol.size), + "health": lvol.health_check, + "isCrypto": crypto_key is not None, + "nqn": lvol.nqn, + "subsysPort": lvol.subsys_port, + "hostname": lvol.hostname, + "fabric": lvol.fabric, + "ha": lvol.ha_type == 'ha', + "poolUUID": lvol.pool_uuid, + "poolName": lvol.pool_name, + "PvcName": lvol.pvc_name, + "snapName": lvol.snapshot_name, + "clonedFromSnap": lvol.cloned_from_snap, + "stripeWdata": lvol.ndcs, + "stripeWparity": lvol.npcs, + "blobID": lvol.blobid, + "namespaceID": lvol.ns_id, + "qosClass": lvol.lvol_priority_class, + "maxNamespacesPerSubsystem": lvol.max_namespace_per_subsys, + "qosIOPS": lvol.rw_ios_per_sec, + "qosRWTP": lvol.rw_mbytes_per_sec, + "qosRTP": lvol.r_mbytes_per_sec, + "qosWTP": lvol.w_mbytes_per_sec, + }, + ) + + elif event == ec.EVENT_STATUS_CHANGE: + utils.patch_cr_lvol_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=pool.lvols_cr_plural, + namespace=pool.lvols_cr_namespace, + name=pool.lvols_cr_name, + lvol_uuid=lvol.get_id(), + updates={"status": lvol.status, "health": lvol.health_check}, + ) + elif event == ec.EVENT_OBJ_DELETED: + logger.info("Deleting lvol CR object") + utils.patch_cr_lvol_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=pool.lvols_cr_plural, + namespace=pool.lvols_cr_namespace, + name=pool.lvols_cr_name, + 
lvol_uuid=lvol.get_id(), + remove=True, + ) def lvol_create(lvol, caused_by=ec.CAUSED_BY_CLI): _lvol_event(lvol, f"LVol created, {lvol.lvol_bdev}", caused_by, ec.EVENT_OBJ_CREATED) diff --git a/simplyblock_core/controllers/pool_controller.py b/simplyblock_core/controllers/pool_controller.py index d7c21e9a2..5fc6b7d9d 100644 --- a/simplyblock_core/controllers/pool_controller.py +++ b/simplyblock_core/controllers/pool_controller.py @@ -23,7 +23,8 @@ def _generate_string(length): string.ascii_letters + string.digits) for _ in range(length)) -def add_pool(name, pool_max, lvol_max, max_rw_iops, max_rw_mbytes, max_r_mbytes, max_w_mbytes, cluster_id, qos_host=None, sec_options=None): +def add_pool(name, pool_max, lvol_max, max_rw_iops, max_rw_mbytes, max_r_mbytes, max_w_mbytes, cluster_id, + cr_name=None, cr_namespace=None, cr_plural=None, qos_host=None, sec_options=None): db_controller = DBController() if not name: logger.error("Pool name is empty!") @@ -71,6 +72,9 @@ def add_pool(name, pool_max, lvol_max, max_rw_iops, max_rw_mbytes, max_r_mbytes, pool.max_rw_mbytes_per_sec = max_rw_mbytes pool.max_r_mbytes_per_sec = max_r_mbytes pool.max_w_mbytes_per_sec = max_w_mbytes + pool.cr_name = cr_name + pool.cr_namespace = cr_namespace + pool.cr_plural = cr_plural if pool.has_qos() and not qos_host: next_nodes = lvol_controller._get_next_3_nodes(cluster_id) if next_nodes: @@ -129,7 +133,8 @@ def qos_exists_on_child_lvol(db_controller: DBController, pool_uuid): return False def set_pool(uuid, pool_max=0, lvol_max=0, max_rw_iops=0, - max_rw_mbytes=0, max_r_mbytes=0, max_w_mbytes=0, name=""): + max_rw_mbytes=0, max_r_mbytes=0, max_w_mbytes=0, name="", + lvols_cr_name="", lvols_cr_namespace="", lvols_cr_plural=""): db_controller = DBController() try: pool = db_controller.get_pool_by_id(uuid) @@ -151,6 +156,17 @@ def set_pool(uuid, pool_max=0, lvol_max=0, max_rw_iops=0, return False, msg pool.pool_name = name + if lvols_cr_name and lvols_cr_name != pool.lvols_cr_name: + for p in 
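The lvol event hook above calls `utils.patch_cr_lvol_status` in three shapes: `add=` with a full entry on create, `lvol_uuid=` plus `updates=` on status change, and `lvol_uuid=` plus `remove=True` on delete. A pure-Python sketch of the list manipulation such a helper has to perform on the CR's status (the actual helper also issues the Kubernetes API patch, omitted here):

```python
def patch_lvol_list(entries, add=None, lvol_uuid=None, updates=None, remove=False):
    # Maintain a list of lvol status entries keyed by "uuid".
    entries = [dict(e) for e in entries]  # work on a copy
    if add is not None:
        entries.append(dict(add))
    elif remove:
        entries = [e for e in entries if e.get("uuid") != lvol_uuid]
    elif updates is not None:
        for e in entries:
            if e.get("uuid") == lvol_uuid:
                e.update(updates)
    return entries
```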
db_controller.get_pools(): + if p.lvols_cr_name == lvols_cr_name: + msg = f"Pool found with the same lvol cr name: {lvols_cr_name}" + logger.error(msg) + return False, msg + pool.lvols_cr_name = lvols_cr_name + pool.lvols_cr_namespace = lvols_cr_namespace + pool.lvols_cr_plural = lvols_cr_plural + + # Normalize inputs max_rw_iops = max_rw_iops or 0 max_rw_mbytes = max_rw_mbytes or 0 @@ -303,8 +319,10 @@ def set_status(pool_id, status): except KeyError: logger.error(f"Pool not found {pool_id}") return False + old_status = pool.status pool.status = status pool.write_to_db(db_controller.kv_store) + pool_events.pool_status_change(pool, pool.status, old_status) logger.info("Done") diff --git a/simplyblock_core/controllers/pool_events.py b/simplyblock_core/controllers/pool_events.py index 2581d59b1..7d35d18e1 100644 --- a/simplyblock_core/controllers/pool_events.py +++ b/simplyblock_core/controllers/pool_events.py @@ -2,7 +2,8 @@ import logging from simplyblock_core.controllers import events_controller as ec - +from simplyblock_core.db_controller import DBController +from simplyblock_core import utils, constants logger = logging.getLogger() @@ -29,3 +30,25 @@ def pool_remove(pool): def pool_updated(pool): _add(pool, f"Pool updated {pool.pool_name}", event=ec.EVENT_STATUS_CHANGE) + +def pool_status_change(pool, new_state, old_status): + db_controller = DBController() + cluster = db_controller.get_cluster_by_id(pool.cluster_id) + ec.log_event_cluster( + cluster_id=pool.cluster_id, + domain=ec.DOMAIN_CLUSTER, + event=ec.EVENT_STATUS_CHANGE, + db_object=pool, + caused_by=ec.CAUSED_BY_CLI, + message=f"Pool status changed from {old_status} to {new_state}", + node_id=pool.cluster_id) + + if cluster.mode == "kubernetes": + utils.patch_cr_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=pool.cr_plural, + namespace=pool.cr_namespace, + name=pool.cr_name, + status_patch={"status": new_state}) + diff --git a/simplyblock_core/controllers/storage_events.py 
b/simplyblock_core/controllers/storage_events.py index bd5a9eb8d..f81558736 100644 --- a/simplyblock_core/controllers/storage_events.py +++ b/simplyblock_core/controllers/storage_events.py @@ -3,6 +3,8 @@ from simplyblock_core.controllers import events_controller as ec from simplyblock_core.models.events import EventObj +from simplyblock_core.db_controller import DBController +from simplyblock_core import utils, constants logger = logging.getLogger() @@ -19,6 +21,8 @@ def snode_add(node): def snode_delete(node): + db_controller = DBController() + cluster = db_controller.get_cluster_by_id(node.cluster_id) ec.log_event_cluster( cluster_id=node.cluster_id, domain=ec.DOMAIN_CLUSTER, @@ -27,9 +31,21 @@ def snode_delete(node): caused_by=ec.CAUSED_BY_CLI, message=f"Storage node deleted {node.get_id()}", node_id=node.get_id()) - + if cluster.mode == "kubernetes": + utils.patch_cr_node_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=node.cr_plural, + namespace=node.cr_namespace, + name=node.cr_name, + node_uuid=node.get_id(), + node_mgmt_ip=node.mgmt_ip, + remove=True, + ) def snode_status_change(node, new_state, old_status, caused_by=ec.CAUSED_BY_CLI): + db_controller = DBController() + cluster = db_controller.get_cluster_by_id(node.cluster_id) ec.log_event_cluster( cluster_id=node.cluster_id, domain=ec.DOMAIN_CLUSTER, @@ -38,9 +54,22 @@ def snode_status_change(node, new_state, old_status, caused_by=ec.CAUSED_BY_CLI) caused_by=caused_by, message=f"Storage node status changed from: {old_status} to: {new_state}", node_id=node.get_id()) + if cluster.mode == "kubernetes": + utils.patch_cr_node_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=node.cr_plural, + namespace=node.cr_namespace, + name=node.cr_name, + node_uuid=node.get_id(), + node_mgmt_ip=node.mgmt_ip, + updates={"status": new_state}, + ) def snode_health_check_change(node, new_state, old_status, caused_by=ec.CAUSED_BY_CLI): + db_controller = DBController() + 
cluster = db_controller.get_cluster_by_id(node.cluster_id) ec.log_event_cluster( cluster_id=node.cluster_id, domain=ec.DOMAIN_CLUSTER, @@ -49,6 +78,17 @@ def snode_health_check_change(node, new_state, old_status, caused_by=ec.CAUSED_B caused_by=caused_by, message=f"Storage node health check changed from: {old_status} to: {new_state}", node_id=node.get_id()) + if cluster.mode == "kubernetes": + utils.patch_cr_node_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=node.cr_plural, + namespace=node.cr_namespace, + name=node.cr_name, + node_uuid=node.get_id(), + node_mgmt_ip=node.mgmt_ip, + updates={"health": new_state}, + ) def snode_restart_failed(node): @@ -84,3 +124,28 @@ def jm_repl_tasks_found(node, jm_vuid, caused_by=ec.CAUSED_BY_MONITOR): event_level=EventObj.LEVEL_WARN, message=f"JM replication task found for jm {jm_vuid}", node_id=node.get_id()) + + +def node_ports_changed(node, caused_by=ec.CAUSED_BY_MONITOR): + db_controller = DBController() + cluster = db_controller.get_cluster_by_id(node.cluster_id) + ec.log_event_cluster( + cluster_id=node.cluster_id, + domain=ec.DOMAIN_CLUSTER, + event=ec.EVENT_STATUS_CHANGE, + db_object=node, + caused_by=caused_by, + event_level=EventObj.LEVEL_WARN, + message=f"Storage node ports set, LVol:{node.lvol_subsys_port} RPC:{node.rpc_port} Internal:{node.nvmf_port}", + node_id=node.get_id()) + if cluster.mode == "kubernetes": + utils.patch_cr_node_status( + group=constants.CR_GROUP, + version=constants.CR_VERSION, + plural=node.cr_plural, + namespace=node.cr_namespace, + name=node.cr_name, + node_uuid=node.get_id(), + node_mgmt_ip=node.mgmt_ip, + updates={"nvmf_port": node.nvmf_port, "rpc_port": node.rpc_port, "lvol_port": node.lvol_subsys_port}, + ) diff --git a/simplyblock_core/mgmt_node_ops.py b/simplyblock_core/mgmt_node_ops.py index 84375d819..b72cffbef 100644 --- a/simplyblock_core/mgmt_node_ops.py +++ b/simplyblock_core/mgmt_node_ops.py @@ -106,18 +106,13 @@ def deploy_mgmt_node(cluster_ip, 
cluster_id, ifname, mgmt_ip, cluster_secret, mo logger.info(f"Node IP: {dev_ip}") - hostname = utils.get_node_name_by_ip(dev_ip) - utils.label_node_as_mgmt_plane(hostname) db_connection = cluster_data['db_connection'] db_controller = DBController() nodes = db_controller.get_mgmt_nodes() if not nodes: logger.error("No mgmt nodes was found in the cluster!") return False - for node in nodes: - if node.hostname == hostname: - logger.error("Node already exists in the cluster") - return False + logger.info("Adding management node object") node_id = add_mgmt_node(dev_ip, mode, cluster_id) @@ -225,10 +220,9 @@ def deploy_mgmt_node(cluster_ip, cluster_id, ifname, mgmt_ip, cluster_secret, mo def add_mgmt_node(mgmt_ip, mode, cluster_id=None): db_controller = DBController() + hostname = None if mode == "docker": hostname = utils.get_hostname() - elif mode == "kubernetes": - hostname = utils.get_node_name_by_ip(mgmt_ip) try: node = db_controller.get_mgmt_node_by_hostname(hostname) if node: diff --git a/simplyblock_core/models/cluster.py b/simplyblock_core/models/cluster.py index d42b1c9c5..1f0588a1b 100644 --- a/simplyblock_core/models/cluster.py +++ b/simplyblock_core/models/cluster.py @@ -63,6 +63,9 @@ class Cluster(BaseModel): fabric_rdma: bool = False client_qpair_count: int = 3 secret: str = "" + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" disable_monitoring: bool = False strict_node_anti_affinity: bool = False tls: bool = False diff --git a/simplyblock_core/models/pool.py b/simplyblock_core/models/pool.py index ccc5affc5..dc5d9780f 100644 --- a/simplyblock_core/models/pool.py +++ b/simplyblock_core/models/pool.py @@ -29,6 +29,12 @@ class Pool(BaseModel): secret: str = "" # unused users: List[str] = [] qos_host: str = "" + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" + lvols_cr_name: str = "" + lvols_cr_namespace: str = "" + lvols_cr_plural: str = "" sec_options: dict = {} diff --git a/simplyblock_core/models/storage_node.py 
b/simplyblock_core/models/storage_node.py index 147db5a77..c9aedc8bc 100644 --- a/simplyblock_core/models/storage_node.py +++ b/simplyblock_core/models/storage_node.py @@ -100,6 +100,9 @@ class StorageNode(BaseNodeObject): subsystem: str = "" system_uuid: str = "" lvstore_status: str = "" + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" nvmf_port: int = 4420 physical_label: int = 0 hublvol: HubLVol = None # type: ignore[assignment] diff --git a/simplyblock_core/rpc_client.py b/simplyblock_core/rpc_client.py index 0ca22a6ce..cadbd12f1 100644 --- a/simplyblock_core/rpc_client.py +++ b/simplyblock_core/rpc_client.py @@ -342,7 +342,7 @@ def ultra21_alloc_ns_init(self, pci_addr): } return self._request2("ultra21_alloc_ns_init", params) - def nvmf_subsystem_add_ns(self, nqn, dev_name, uuid=None, nguid=None, nsid=None): + def nvmf_subsystem_add_ns(self, nqn, dev_name, uuid=None, nguid=None, nsid=None, eui64=None): params = { "nqn": nqn, "namespace": { @@ -359,6 +359,11 @@ def nvmf_subsystem_add_ns(self, nqn, dev_name, uuid=None, nguid=None, nsid=None) if nsid: params['namespace']['nsid'] = nsid + if eui64: + params['namespace']['eui64'] = eui64 + params['namespace']['ptpl_file'] = "/mnt/ns_resv"+eui64+".json" + + return self._request("nvmf_subsystem_add_ns", params) def nvmf_subsystem_remove_ns(self, nqn, nsid): diff --git a/simplyblock_core/storage_node_ops.py b/simplyblock_core/storage_node_ops.py index aeccb3941..7c5c50734 100644 --- a/simplyblock_core/storage_node_ops.py +++ b/simplyblock_core/storage_node_ops.py @@ -2,7 +2,6 @@ import datetime import json import math -import os import platform import socket @@ -38,6 +37,8 @@ from simplyblock_web import node_utils from simplyblock_core.utils import addNvmeDevices from simplyblock_core.utils import pull_docker_image_with_retry +import os + logger = utils.get_logger(__name__) @@ -691,7 +692,7 @@ def _prepare_cluster_devices_partitions(snode, devices): t = threading.Thread( 
target=_create_device_partitions, args=(snode.rpc_client(), nvme, snode, snode.num_partitions_per_dev, - snode.jm_percent, snode.partition_size, index+1,)) + snode.jm_percent, snode.partition_size, index + 1,)) thread_list.append(t) t.start() @@ -1118,8 +1119,8 @@ def add_node(cluster_id, node_addr, iface_name, data_nics_list, max_snap, spdk_image=None, spdk_debug=False, small_bufsize=0, large_bufsize=0, num_partitions_per_dev=0, jm_percent=0, enable_test_device=False, - namespace=None, enable_ha_jm=False, id_device_by_nqn=False, - partition_size="", ha_jm_count=3, format_4k=False, spdk_proxy_image=None): + namespace=None, enable_ha_jm=False, cr_name=None, cr_namespace=None, cr_plural=None, + id_device_by_nqn=False, partition_size="", ha_jm_count=3, format_4k=False, spdk_proxy_image=None): snode_api = SNodeClient(node_addr) node_info, _ = snode_api.info() if node_info.get("nodes_config") and node_info["nodes_config"].get("nodes"): @@ -1391,6 +1392,9 @@ def add_node(cluster_id, node_addr, iface_name, data_nics_list, snode.cloud_name = cloud_instance['cloud'] or "" snode.namespace = namespace + snode.cr_name = cr_name + snode.cr_namespace = cr_namespace + snode.cr_plural = cr_plural snode.ssd_pcie = ssd_pcie snode.hostname = hostname snode.host_nqn = subsystem_nqn @@ -2284,7 +2288,6 @@ def restart_storage_node( return False if snode.enable_ha_jm: snode.remote_jm_devices = _connect_to_remote_jm_devs(snode) - snode.health_check = True snode.lvstore_status = "" snode.write_to_db(db_controller.kv_store) @@ -3412,9 +3415,9 @@ def set_node_status(node_id, status, reconnect_on_online=True): return False if snode.enable_ha_jm: snode.remote_jm_devices = _connect_to_remote_jm_devs(snode) - snode.health_check = True snode.write_to_db(db_controller.kv_store) - distr_controller.send_cluster_map_to_node(snode) + for device in snode.nvme_devices: + distr_controller.send_dev_status_event(device, device.status, target_node=snode) for node in 
db_controller.get_storage_nodes_by_cluster_id(snode.cluster_id): if node.get_id() == snode.get_id(): @@ -3425,7 +3428,8 @@ def set_node_status(node_id, status, reconnect_on_online=True): node = db_controller.get_storage_node_by_id(node.get_id()) node.remote_devices = _connect_to_remote_devs(node) node.write_to_db() - distr_controller.send_cluster_map_to_node(node) + for device in node.nvme_devices: + distr_controller.send_dev_status_event(device, device.status, target_node=node) except RuntimeError: logger.error(f'Failed to connect to remote devices from node: {node.get_id()}') continue @@ -3837,8 +3841,8 @@ def add_lvol_thread(lvol, snode, lvol_ana_state="optimized"): logger.error(msg) return False, msg - logger.info("Add BDev to subsystem") - ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, nsid=lvol.ns_id) + logger.info(f"Add BDev to subsystem {lvol.vuid:016X}") + ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, nsid=lvol.ns_id, eui64=f"{lvol.vuid:016X}") # Use per-lvstore port for this lvol's lvstore listener_port = snode.get_lvol_subsys_port(lvol.lvs_name) for iface in snode.data_nics: @@ -4162,6 +4166,7 @@ def create_lvstore(snode, ndcs, npcs, distr_bs, distr_chunk_bs, page_size_in_blo logger.error("Error establishing hublvol: %s", e) # return False + storage_events.node_ports_changed(snode) return True diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py index 032b5cd36..c493448b7 100644 --- a/simplyblock_core/utils/__init__.py +++ b/simplyblock_core/utils/__init__.py @@ -11,7 +11,6 @@ import sys import uuid import time -import socket from typing import Union, Any, Optional, Tuple, List, Dict, Iterable from docker import DockerClient from kubernetes import client, config @@ -199,16 +198,8 @@ def get_k8s_node_ip(): logger.error("No mgmt nodes was found in the cluster!") return False - mgmt_ips = [node.mgmt_ip for node in nodes] - - for ip in mgmt_ips: 
- try: - with socket.create_connection((ip, 10250), timeout=2): - return ip - except Exception as e: - print(e) - raise e - return False + for node in nodes: + return node.mgmt_ip def dict_agg(data, mean=False, keys=None): diff --git a/simplyblock_web/api/v1/__init__.py b/simplyblock_web/api/v1/__init__.py index 4bcc5ba41..3659758e3 100644 --- a/simplyblock_web/api/v1/__init__.py +++ b/simplyblock_web/api/v1/__init__.py @@ -1,9 +1,13 @@ import logging +import fdb +from flask import jsonify from flask import Flask from simplyblock_web.auth_middleware import token_required from simplyblock_web import utils +from simplyblock_core import constants + from . import cluster from . import mgmt_node @@ -39,3 +43,26 @@ def before_request(): @api.route('/', methods=['GET']) def status(): return utils.get_response("Live") + + +@api.route('/health/fdb', methods=['GET']) +def health_fdb(): + try: + fdb.api_version(constants.KVD_DB_VERSION) + + db = fdb.open(constants.KVD_DB_FILE_PATH) + tr = db.create_transaction() + + tr.get(b"\x00") + tr.commit().wait() + + return jsonify({ + "fdb_connected": True + }), 200 + + except Exception as e: + return jsonify({ + "fdb_connected": False, + "error": str(e) + }), 503 + \ No newline at end of file diff --git a/simplyblock_web/api/v1/cluster.py b/simplyblock_web/api/v1/cluster.py index 532278b8d..d31d79582 100644 --- a/simplyblock_web/api/v1/cluster.py +++ b/simplyblock_web/api/v1/cluster.py @@ -47,6 +47,9 @@ def add_cluster(): qpair_count = cl_data.get('qpair_count', 256) name = cl_data.get('name', None) fabric = cl_data.get('fabric', "tcp") + cr_name = cl_data.get('cr_name', None) + cr_namespace = cl_data.get('cr_namespace', None) + cr_plural = cl_data.get('cr_plural', None) max_queue_size = cl_data.get('max_queue_size', 128) inflight_io_threshold = cl_data.get('inflight_io_threshold', 4) @@ -57,11 +60,62 @@ def add_cluster(): return utils.get_response(cluster_ops.add_cluster( blk_size, page_size_in_blocks, cap_warn, cap_crit, 
prov_cap_warn, prov_cap_crit, distr_ndcs, distr_npcs, distr_bs, distr_chunk_bs, ha_type, enable_node_affinity, - qpair_count, max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, fabric, - client_data_nic + qpair_count, max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, + cr_name, cr_namespace, cr_plural, fabric, client_data_nic )) +@bp.route('/cluster/create_first', methods=['POST']) +def create_first_cluster(): + cl_data = request.get_json() + + if db.get_clusters(): + return utils.get_response_error("A cluster already exists!", 400) + + blk_size = 512 + if 'blk_size' in cl_data: + if cl_data['blk_size'] not in [512, 4096]: + return utils.get_response_error("blk_size can be 512 or 4096", 400) + else: + blk_size = cl_data['blk_size'] + page_size_in_blocks = cl_data.get('page_size_in_blocks', 2097152) + distr_ndcs = cl_data.get('distr_ndcs', 1) + distr_npcs = cl_data.get('distr_npcs', 1) + distr_bs = cl_data.get('distr_bs', 4096) + distr_chunk_bs = cl_data.get('distr_chunk_bs', 4096) + ha_type = cl_data.get('ha_type', 'ha') + enable_node_affinity = cl_data.get('enable_node_affinity', False) + qpair_count = cl_data.get('qpair_count', 256) + name = cl_data.get('name', None) + fabric = cl_data.get('fabric', "tcp") + cap_warn = cl_data.get('cap_warn', 0) + cap_crit = cl_data.get('cap_crit', 0) + prov_cap_warn = cl_data.get('prov_cap_warn', 0) + prov_cap_crit = cl_data.get('prov_cap_crit', 0) + max_queue_size = cl_data.get('max_queue_size', 128) + inflight_io_threshold = cl_data.get('inflight_io_threshold', 4) + strict_node_anti_affinity = cl_data.get('strict_node_anti_affinity', False) + is_single_node = cl_data.get('is_single_node', False) + cr_name = cl_data.get('cr_name', None) + cr_namespace = cl_data.get('cr_namespace', None) + cr_plural = cl_data.get('cr_plural', None) + cluster_ip = cl_data.get('cluster_ip', None) + grafana_secret = cl_data.get('grafana_secret', None) + + try: + cluster_id = 
cluster_ops.add_cluster( + blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn, prov_cap_crit, + distr_ndcs, distr_npcs, distr_bs, distr_chunk_bs, ha_type, enable_node_affinity, + qpair_count, max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, + cr_name, cr_namespace, cr_plural, fabric, cluster_ip=cluster_ip, grafana_secret=grafana_secret) + if cluster_id: + return utils.get_response(db.get_cluster_by_id(cluster_id).to_dict()) + else: + return utils.get_response(False, "Failed to create cluster", 400) + except Exception as e: + return utils.get_response(False, str(e), 404) + + @bp.route('/cluster', methods=['GET'], defaults={'uuid': None}) @bp.route('/cluster/', methods=['GET']) def list_clusters(uuid): diff --git a/simplyblock_web/api/v2/__init__.py b/simplyblock_web/api/v2/__init__.py index 556c664f5..dca5f7dd3 100644 --- a/simplyblock_web/api/v2/__init__.py +++ b/simplyblock_web/api/v2/__init__.py @@ -11,6 +11,7 @@ from . import pool from . import snapshot from . import storage_node +from . import task from . 
import migration from simplyblock_core.db_controller import DBController @@ -40,6 +41,9 @@ def _verify_api_token( cluster.instance_api.include_router(storage_node.api) +task.api.include_router(task.instance_api) + +cluster.instance_api.include_router(task.api) volume.api.include_router(volume.instance_api) pool.instance_api.include_router(volume.api) diff --git a/simplyblock_web/api/v2/cluster.py b/simplyblock_web/api/v2/cluster.py index e8e937c6a..9f5b3cd87 100644 --- a/simplyblock_web/api/v2/cluster.py +++ b/simplyblock_web/api/v2/cluster.py @@ -2,7 +2,7 @@ from typing import Annotated, List, Literal, Optional from uuid import UUID -from fastapi import APIRouter, Depends, HTTPException, Request, Response +from fastapi import APIRouter, Depends, HTTPException, Response from pydantic import BaseModel, Field from simplyblock_core.db_controller import DBController @@ -29,7 +29,7 @@ class _UpdateParams(BaseModel): class ClusterParams(BaseModel): - name: Optional[str] = None + name: str = "" blk_size: Literal[512, 4096] = 512 page_size_in_blocks: int = Field(2097152, gt=0) cap_warn: util.Percent = 0 @@ -40,7 +40,7 @@ class ClusterParams(BaseModel): distr_npcs: int = 1 distr_bs: int = 4096 distr_chunk_bs: int = 4096 - ha_type: Literal['single', 'ha'] = 'single' + ha_type: Literal['single', 'ha'] = 'ha' qpair_count: int = 256 max_queue_size: int = 128 inflight_io_threshold: int = 4 @@ -58,21 +58,24 @@ class ClusterParams(BaseModel): @api.get('/', name='clusters:list') def list() -> List[ClusterDTO]: - return [ - ClusterDTO.from_model(cluster) - for cluster - in db.get_clusters() - ] + data = [] + for cluster in db.get_clusters(): + stat_obj = None + ret = db.get_cluster_capacity(cluster, 1) + if ret: + stat_obj = ret[0] + data.append(ClusterDTO.from_model(cluster, stat_obj)) + return data @api.post('/', name='clusters:create', status_code=201, responses={201: {"content": None}}) -def add(request: Request, parameters: ClusterParams): +def add(parameters: ClusterParams): 
cluster_id_or_false = cluster_ops.add_cluster(**parameters.model_dump()) if not cluster_id_or_false: raise ValueError('Failed to create cluster') - entity_url = request.app.url_path_for('get', cluster_id=cluster_id_or_false) - return Response(status_code=201, headers={'Location': entity_url}) + cluster = db.get_cluster_by_id(cluster_id_or_false) + return ClusterDTO.from_model(cluster) instance_api = APIRouter(prefix='/{cluster_id}') @@ -90,7 +93,11 @@ def _lookup_cluster(cluster_id: UUID): @instance_api.get('/', name='clusters:detail') def get(cluster: Cluster) -> ClusterDTO: - return ClusterDTO.from_model(cluster) + stat_obj = None + ret = db.get_cluster_capacity(cluster, 1) + if ret: + stat_obj = ret[0] + return ClusterDTO.from_model(cluster, stat_obj) class UpdatableClusterParameters(BaseModel): diff --git a/simplyblock_web/api/v2/device.py b/simplyblock_web/api/v2/device.py index 1c7b40d7e..f62a134fe 100644 --- a/simplyblock_web/api/v2/device.py +++ b/simplyblock_web/api/v2/device.py @@ -18,10 +18,14 @@ @api.get('/', name='clusters:storage_nodes:devices:list') def list(cluster: Cluster, storage_node: StorageNode) -> List[DeviceDTO]: - return [ - DeviceDTO.from_model(device) - for device in storage_node.nvme_devices - ] + data = [] + for device in storage_node.nvme_devices: + stat_obj = None + ret = db.get_device_stats(device, 1) + if ret: + stat_obj = ret[0] + data.append(DeviceDTO.from_model(device, stat_obj)) + return data instance_api = APIRouter(prefix='/{device_id}') @@ -38,16 +42,27 @@ def _lookup_device(storage_node: StorageNode, device_id: UUID) -> NVMeDevice: @instance_api.get('/', name='clusters:storage_nodes:devices:detail') def get(cluster: Cluster, storage_node: StorageNode, device: Device) -> DeviceDTO: - return DeviceDTO.from_model(device) + stat_obj = None + ret = db.get_device_stats(device, 1) + if ret: + stat_obj = ret[0] + return DeviceDTO.from_model(device, stat_obj) -@instance_api.delete('/', name='clusters:storage_nodes:devices:delete', 
status_code=204, responses={204: {"content": None}}) -def delete(cluster: Cluster, storage_node: StorageNode, device: Device) -> Response: - if not device_controller.device_remove(device.get_id()): +@instance_api.post('/remove', name='clusters:storage_nodes:devices:remove', status_code=204, responses={204: {"content": None}}) +def remove(cluster: Cluster, storage_node: StorageNode, device: Device, force: bool = False) -> Response: + if not device_controller.device_remove(device.get_id(), force): raise ValueError('Failed to remove device') return Response(status_code=204) +@instance_api.post('/restart', name='clusters:storage_nodes:devices:restart', status_code=204, responses={204: {"content": None}}) +def restart(cluster: Cluster, storage_node: StorageNode, device: Device, force: bool = False) -> Response: + if not device_controller.restart_device(device.get_id(), force): + raise ValueError('Failed to restart device') + + return Response(status_code=204) + @instance_api.get('/capacity', name='clusters:storage_nodes:devices:capacity') def capacity( diff --git a/simplyblock_web/api/v2/dtos.py b/simplyblock_web/api/v2/dtos.py index 1ad888351..0265c20e2 100644 --- a/simplyblock_web/api/v2/dtos.py +++ b/simplyblock_web/api/v2/dtos.py @@ -44,9 +44,10 @@ class ClusterDTO(BaseModel): name: Optional[str] nqn: str status: Literal['active', 'read_only', 'inactive', 'suspended', 'degraded', 'unready', 'in_activation', 'in_expansion'] - rebalancing: bool + is_re_balancing: bool block_size: util.Unsigned - coding: Tuple[util.Unsigned, util.Unsigned] + distr_ndcs: int + distr_npcs: int ha: bool utliziation_critical: util.Percent utilization_warning: util.Percent @@ -67,9 +68,10 @@ def from_model(model: Cluster, stat_obj: Optional[StatsObject]=None): name=model.cluster_name, nqn=model.nqn, status=model.status, # type: ignore - rebalancing=model.is_re_balancing, + is_re_balancing=model.is_re_balancing, block_size=model.blk_size, - coding=(model.distr_ndcs, model.distr_npcs), + 
distr_ndcs=model.distr_ndcs, + distr_npcs=model.distr_npcs, ha=model.ha_type == 'ha', utilization_warning=model.cap_warn, utliziation_critical=model.cap_crit, @@ -344,6 +346,9 @@ def from_model(model: LVol, request: Request, cluster_id: str, stat_obj: Optiona max_w_mbytes=model.w_mbytes_per_sec, allowed_hosts=[h["nqn"] for h in (model.allowed_hosts or [])], policy=eff_policy.policy_name if eff_policy else "", + capacity=CapacityStatDTO.from_model(stat_obj if stat_obj else StatsObject()), + rep_info=rep_info, + from_source=model.from_source, ) diff --git a/simplyblock_web/api/v2/pool.py b/simplyblock_web/api/v2/pool.py index 8ebc85639..cf9d8f882 100644 --- a/simplyblock_web/api/v2/pool.py +++ b/simplyblock_web/api/v2/pool.py @@ -20,12 +20,15 @@ @api.get('/', name='clusters:storage-pools:list') def list(cluster: Cluster) -> List[StoragePoolDTO]: - return [ - StoragePoolDTO.from_model(pool) - for pool - in db.get_pools() - if pool.cluster_id == cluster.get_id() - ] + data = [] + for pool in db.get_pools(): + if pool.cluster_id == cluster.get_id(): + stat_obj = None + ret = db.get_pool_stats(pool, 1) + if ret: + stat_obj = ret[0] + data.append(StoragePoolDTO.from_model(pool, stat_obj)) + return data class StoragePoolParams(BaseModel): @@ -37,6 +40,9 @@ class StoragePoolParams(BaseModel): max_r_mbytes: util.Unsigned = 0 max_w_mbytes: util.Unsigned = 0 sec_options: Optional[Dict[str, bool]] = None + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" @api.post('/', name='clusters:storage-pools:create', status_code=201, responses={201: {"content": None}}) @@ -51,14 +57,15 @@ def add(request: Request, cluster: Cluster, parameters: StoragePoolParams) -> Re id_or_false = pool_controller.add_pool( parameters.name, parameters.pool_max, parameters.volume_max_size, parameters.max_rw_iops, parameters.max_rw_mbytes, parameters.max_r_mbytes, parameters.max_w_mbytes, cluster.get_id(), + parameters.cr_name, parameters.cr_namespace, parameters.cr_plural, 
sec_options=parameters.sec_options, ) if not id_or_false: raise ValueError('Failed to create pool') - entity_url = request.app.url_path_for('clusters:storage-pools:detail', cluster_id=cluster.get_id(), pool_id=id_or_false) - return Response(status_code=201, headers={'Location': entity_url}) + pool = db.get_pool_by_id(id_or_false) + return pool.to_dict() instance_api = APIRouter(prefix='/{pool_id}') @@ -76,7 +83,11 @@ def _lookup_storage_pool(pool_id: UUID) -> PoolModel: @instance_api.get('/', name='clusters:storage-pools:detail') def get(cluster: Cluster, pool: StoragePool) -> StoragePoolDTO: - return StoragePoolDTO.from_model(pool) + stat_obj = None + ret = db.get_pool_stats(pool, 1) + if ret: + stat_obj = ret[0] + return StoragePoolDTO.from_model(pool, stat_obj) @instance_api.delete('/', name='clusters:storage-pools:delete', status_code=204, responses={204: {"content": None}}) @@ -98,6 +109,9 @@ class UpdatableStoragePoolParams(BaseModel): max_rw_mbytes: Optional[util.Unsigned] = None max_r_mbytes: Optional[util.Unsigned] = None max_w_mbytes: Optional[util.Unsigned] = None + lvols_cr_name: Optional[str] = None + lvols_cr_namespace: Optional[str] = None + lvols_cr_plural: Optional[str] = None @instance_api.put('/', name='clusters:storage-pools:update', status_code=204, responses={204: {"content": None}}) diff --git a/simplyblock_web/api/v2/storage_node.py b/simplyblock_web/api/v2/storage_node.py index 595e9a1f2..c62503b05 100644 --- a/simplyblock_web/api/v2/storage_node.py +++ b/simplyblock_web/api/v2/storage_node.py @@ -2,7 +2,7 @@ from typing import Annotated, List, Optional from uuid import UUID -from fastapi import APIRouter, Depends, HTTPException, Request, Response +from fastapi import APIRouter, Depends, HTTPException, Response from pydantic import BaseModel, Field from simplyblock_core.db_controller import DBController @@ -22,35 +22,41 @@ @api.get('/', name='clusters:storage-nodes:list') def list(cluster: Cluster) -> List[StorageNodeDTO]: - return [ - 
StorageNodeDTO.from_model(storage_node) - for storage_node - in db.get_storage_nodes_by_cluster_id(cluster.get_id()) - ] + data = [] + for storage_node in db.get_storage_nodes_by_cluster_id(cluster.get_id()): + node_stat_obj = None + ret = db.get_node_capacity(storage_node, 1) + if ret: + node_stat_obj = ret[0] + data.append(StorageNodeDTO.from_model(storage_node, node_stat_obj)) + return data class StorageNodeParams(BaseModel): node_address: Annotated[str, Field(web_utils.IP_PATTERN)] interface_name: str - max_snapshots: int = Field(500) - ha_jm: bool = Field(True) - test_device: bool = Field(False) - spdk_image: Optional[str] + max_snapshots: Optional[int] = Field(500) + ha_jm: Optional[bool] = Field(True) + test_device: Optional[bool] = Field(False) + spdk_image: Optional[str] = Field("") spdk_debug: bool = Field(False) - full_page_unmap: bool = Field(False) data_nics: List[str] = Field([]) namespace: str = Field('default') + id_device_by_nqn: Optional[bool] = Field(False) jm_percent: util.Percent = Field(3) partitions: int = Field(1) iobuf_small_pool_count: int = Field(0) iobuf_large_pool_count: int = Field(0) + cr_name: str = "" + cr_namespace: str = "" + cr_plural: str = "" ha_jm_count: int = Field(3) format_4k: bool = Field(False) - spdk_proxy_image: Optional[str] + spdk_proxy_image: Optional[str] = None @api.post('/', name='clusters:storage-nodes:create', status_code=201, responses={201: {"content": None}}) -def add(request: Request, cluster: Cluster, parameters: StorageNodeParams) -> Response: +def add(cluster: Cluster, parameters: StorageNodeParams): task_id_or_false = tasks_controller.add_node_add_task( cluster.get_id(), { @@ -68,7 +74,10 @@ def add(request: Request, cluster: Cluster, parameters: StorageNodeParams) -> Re 'enable_test_device': parameters.test_device, 'namespace': parameters.namespace, 'enable_ha_jm': parameters.ha_jm, - 'full_page_unmap': parameters.full_page_unmap, + 'id_device_by_nqn': parameters.id_device_by_nqn, + 'cr_name': 
parameters.cr_name, + 'cr_namespace': parameters.cr_namespace, + 'cr_plural': parameters.cr_plural, "ha_jm_count": parameters.ha_jm_count, "format_4k": parameters.format_4k, "spdk_proxy_image": parameters.spdk_proxy_image, @@ -76,9 +85,7 @@ def add(request: Request, cluster: Cluster, parameters: StorageNodeParams) -> Re ) if not task_id_or_false: raise ValueError('Failed to create add-node task') - - task_url = request.app.url_path_for('clusters:storage-nodes:detail', cluster_id=cluster.get_id(), task_id=task_id_or_false) - return Response(status_code=201, headers={'Location': task_url}) + return task_id_or_false instance_api = APIRouter(prefix='/{storage_node_id}') @@ -96,18 +103,29 @@ def _lookup_storage_node(storage_node_id: UUID) -> StorageNodeModel: @instance_api.get('/', name='clusters:storage-nodes:detail') def get(cluster: Cluster, storage_node: StorageNode): - return StorageNodeDTO.from_model(storage_node) + node_stat_obj = None + ret = db.get_node_capacity(storage_node, 1) + if ret: + node_stat_obj = ret[0] + return StorageNodeDTO.from_model(storage_node, node_stat_obj) @instance_api.delete('/', name='clusters:storage-nodes:delete') def delete( - cluster: Cluster, storage_node: StorageNode, force_remove: bool = False, force_migrate: bool = False) -> Response: + cluster: Cluster, storage_node: StorageNode, force_remove: bool = False, force_migrate: bool = False, force_delete: bool = False) -> Response: none_or_false = storage_node_ops.remove_storage_node( storage_node.get_id(), force_remove=force_remove, force_migrate=force_migrate ) if none_or_false == False: # noqa raise ValueError('Failed to remove storage node') + if force_delete: + none_or_false = storage_node_ops.delete_storage_node( + storage_node.get_id(), force=force_delete + ) + if none_or_false == False: # noqa + raise ValueError('Failed to delete storage node') + return Response(status_code=204) @@ -204,17 +222,19 @@ def shutdown(cluster: Cluster, storage_node: StorageNode, force: bool = False) 
-
 class _RestartParams(BaseModel):
     force: bool = False
     reattach_volume: bool = False
+    node_address: Optional[Annotated[str, Field(pattern=web_utils.IP_PATTERN)]] = None


 @instance_api.post('/start', name='clusters:storage-nodes:start', status_code=202, responses={202: {"content": None}})
 # Same as restart for now
 @instance_api.post('/restart', name='clusters:storage-nodes:restart', status_code=202, responses={202: {"content": None}})
-def restart(cluster: Cluster, storage_node: StorageNode, parameters: _RestartParams = _RestartParams()) -> Response:
+def restart(cluster: Cluster, storage_node: StorageNode, parameters: _RestartParams) -> Response:
     storage_node = storage_node
     Thread(
         target=storage_node_ops.restart_storage_node,
         kwargs={
             "node_id": storage_node.get_id(),
             "force": parameters.force,
+            "node_ip": parameters.node_address,
             "reattach_volume": parameters.reattach_volume,
         }
     ).start()
diff --git a/simplyblock_web/api/v2/task.py b/simplyblock_web/api/v2/task.py
index c17bec3b7..b6702181c 100644
--- a/simplyblock_web/api/v2/task.py
+++ b/simplyblock_web/api/v2/task.py
@@ -5,8 +5,6 @@
 from simplyblock_core.db_controller import DBController
 from simplyblock_core.models.job_schedule import JobSchedule

-from simplyblock_core.controllers import tasks_controller
-
 from .cluster import Cluster
 from .dtos import TaskDTO

@@ -16,12 +14,13 @@

 @api.get('/', name='clusters:tasks:list')
 def list(cluster: Cluster) -> List[TaskDTO]:
-    return [
-        TaskDTO.from_model(task)
-        for task
-        in tasks_controller.list_tasks(cluster.get_id())
-        if task.cluster_id == cluster.get_id()
-    ]
+    cluster_tasks = db.get_job_tasks(cluster.get_id(), limit=0)
+    data = []
+    for t in cluster_tasks:
+        if t.function_name == JobSchedule.FN_DEV_MIG:
+            continue
+        data.append(t)
+    return [TaskDTO.from_model(task) for task in data]


 instance_api = APIRouter(prefix='/{task_id}')
diff --git a/simplyblock_web/auth_middleware.py b/simplyblock_web/auth_middleware.py
index 8a1a9e83a..5e935d976 100644
--- a/simplyblock_web/auth_middleware.py
+++ b/simplyblock_web/auth_middleware.py
@@ -34,6 +34,10 @@ def decorated(*args: Any, **kwargs: Any) -> ResponseType:
         # Skip authentication for Swagger UI
         if request.method == "GET" and request.path.startswith("/swagger"):
             return cast(ResponseType, f(*args, **kwargs))
+        if request.method == "POST" and request.path.startswith("/cluster/create_first"):
+            return cast(ResponseType, f(*args, **kwargs))
+        if request.method == "GET" and request.path.startswith("/health/fdb"):
+            return cast(ResponseType, f(*args, **kwargs))

         cluster_id: str = ""
         cluster_secret: str = ""

From 16d444896c03360e31f47773e2f04c802861c64c Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 02:25:13 +0300
Subject: [PATCH 15/70] feat: update cluster image tag for snapshot replication

---
 simplyblock_core/scripts/charts/values.yaml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml
index 8ca7060b2..c622be010 100644
--- a/simplyblock_core/scripts/charts/values.yaml
+++ b/simplyblock_core/scripts/charts/values.yaml
@@ -244,7 +244,7 @@ simplyblock:

   storageNodes:
     name: simplyblock-node
-    clusterImage: simplyblock/simplyblock:main-snap-repl
+    clusterImage: simplyblock/simplyblock:main-snapshot-replication
     mgmtIfc: eth0
     maxLVol: 10
     maxSize: 0

From d9b56b4323cb9a6279bcc7adea2b18876300ca46 Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 02:34:00 +0300
Subject: [PATCH 16/70] feat: remove unused bdev_lvol_create_poller_group
 method from rpc_client

---
 simplyblock_core/rpc_client.py | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/simplyblock_core/rpc_client.py b/simplyblock_core/rpc_client.py
index cadbd12f1..28f1d6d33 100644
--- a/simplyblock_core/rpc_client.py
+++ b/simplyblock_core/rpc_client.py
@@ -1235,12 +1235,6 @@ def bdev_distrib_check_inflight_io(self, jm_vuid):
         }
         return self._request("bdev_distrib_check_inflight_io", params)

-    def bdev_lvol_create_poller_group(self, cpu_mask):
-        params = {
-            "cpu_mask": cpu_mask,
-        }
-        return self._request("bdev_lvol_create_poller_group", params)
-
     def bdev_lvol_transfer(self, lvol_name, offset, cluster_batch, gateway, operation):
         # --operation {migrate,replicate}
         params = {

From e2053bce085934002360b63d80f6a35220493981 Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 03:05:30 +0300
Subject: [PATCH 17/70] feat: refactor lvol cloning logic and remove unused RPC
 methods

---
 .../controllers/lvol_controller.py   | 43 +-----
 simplyblock_core/rpc_client.py       | 45 -------
 .../services/snapshot_replication.py |  4 +-
 simplyblock_core/utils/__init__.py   |  1 +
 4 files changed, 4 insertions(+), 89 deletions(-)

diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py
index 38db9867e..ac90a812c 100644
--- a/simplyblock_core/controllers/lvol_controller.py
+++ b/simplyblock_core/controllers/lvol_controller.py
@@ -2535,47 +2535,6 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None):

     return new_lvol.lvol_uuid

-
-def clone_lvol(lvol_id, clone_name):
-    # create snapshot and clone it
-    db_controller = DBController()
-    try:
-        lvol = db_controller.get_lvol_by_id(lvol_id)
-    except KeyError as e:
-        logger.error(e)
-        return False
-
-    try:
-        snapshot_uuid = None
-        for i in range(10):
-            snapshot_uuid, err = snapshot_controller.add(lvol_id, clone_name)
-            if err:
-                logger.error(err)
-                time.sleep(3)
-                continue
-            else:
-                if not snapshot_uuid:
-                    logger.error("Failed to create snapshot for clone after 10 attempts")
-                    return False
-        new_lvol_uuid = None
-        for i in range(10):
-            new_lvol_uuid, err = snapshot_controller.clone(snapshot_uuid, clone_name)
-            if err:
-                logger.error(err)
-                time.sleep(3)
-                continue
-            else:
-                if not new_lvol_uuid:
-                    logger.error("Failed to clone lvol after 10 attempts")
-                    if snapshot_uuid:
-                        snapshot_controller.delete(snapshot_uuid)
-                    return False
-
-        return new_lvol_uuid
-    except Exception as e:
-        logger.error(e)
-        return False
-
 def _build_host_entries(allowed_hosts, sec_options=None):
     """Build the allowed_hosts list with auto-generated keys.
@@ -2754,7 +2713,7 @@ def remove_host_from_lvol(lvol_id, host_nqn):
 def clone_lvol(lvol_id, clone_name, new_size=None, pvc_name=None):
     db_controller = DBController()
     try:
-        lvol = db_controller.get_lvol_by_id(lvol_id)
+        _ = db_controller.get_lvol_by_id(lvol_id)
     except KeyError as e:
         logger.error(e)
         return False
diff --git a/simplyblock_core/rpc_client.py b/simplyblock_core/rpc_client.py
index 28f1d6d33..12012a51b 100644
--- a/simplyblock_core/rpc_client.py
+++ b/simplyblock_core/rpc_client.py
@@ -1235,44 +1235,6 @@ def bdev_distrib_check_inflight_io(self, jm_vuid):
         }
         return self._request("bdev_distrib_check_inflight_io", params)

-    def bdev_lvol_transfer(self, lvol_name, offset, cluster_batch, gateway, operation):
-        # --operation {migrate,replicate}
-        params = {
-            "lvol_name": lvol_name,
-            "offset": offset,
-            "cluster_batch": cluster_batch,
-            "gateway": gateway,
-            "operation": operation,
-        }
-        return self._request("bdev_lvol_transfer", params)
-
-    def bdev_lvol_transfer_stat(self, lvol_name):
-        """
-        example:
-        ./rpc.py bdev_lvol_transfer_stat lvs_raid0_lvol/snapshot_1
-        {
-            "transfer_state": "No process",
-            "offset": 0
-        }
-        transfer_state values:
-        - No process
-        - In progress
-        - Failed
-        - Done
-        """
-        params = {
-            "lvol_name": lvol_name,
-        }
-        return self._request("bdev_lvol_transfer_stat", params)
-
-    def bdev_lvol_convert(self, lvol_name):
-        """
-        convert lvol to snapshot
-        """
-        params = {
-            "lvol_name": lvol_name,
-        }
-        return self._request("bdev_lvol_convert", params)

     def bdev_lvol_remove_from_group(self, group_id, lvol_name_list):
         params = {
@@ -1322,13 +1284,6 @@ def nvmf_port_unblock_rdma(self, port):

     def nvmf_get_blocked_ports_rdma(self):
         return self._request("nvmf_get_blocked_ports")

-    def bdev_lvol_add_clone(self, lvol_name, child_name):
-        params = {
-            "lvol_name": lvol_name,
-            "child_name": child_name,
-        }
-        return self._request("bdev_lvol_add_clone", params)
-
     def bdev_raid_get_bdevs(self):
         params = {
             "category": "online"
diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py
index 61e52d460..76b9ab84d 100644
--- a/simplyblock_core/services/snapshot_replication.py
+++ b/simplyblock_core/services/snapshot_replication.py
@@ -167,7 +167,7 @@ def process_snap_replicate_finish(task, snapshot):
             # chain snaps on primary
             if target_prev_snap:
                 logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}")
-                ret = remote_snode.rpc_client().bdev_lvol_add_clone(target_prev_snap.snap_bdev, remote_lv.top_bdev)
+                ret = remote_snode.rpc_client().bdev_lvol_add_clone(remote_lv.top_bdev, target_prev_snap.snap_bdev)
                 if not ret:
                     logger.error("Failed to chain replicated snapshot on primary node")
                     return False
@@ -183,7 +183,7 @@ def process_snap_replicate_finish(task, snapshot):
             if sec_node.status == StorageNode.STATUS_ONLINE:
                 if target_prev_snap:
                     logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}")
-                    ret = sec_node.rpc_client().bdev_lvol_add_clone(target_prev_snap.snap_bdev, remote_lv.top_bdev)
+                    ret = sec_node.rpc_client().bdev_lvol_add_clone(remote_lv.top_bdev, target_prev_snap.snap_bdev)
                     if not ret:
                         logger.error("Failed to chain replicated snapshot on secondary node")
                         return False
diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py
index c493448b7..f4eba9186 100644
--- a/simplyblock_core/utils/__init__.py
+++ b/simplyblock_core/utils/__init__.py
@@ -11,6 +11,7 @@
 import sys
 import uuid
 import time
+from datetime import datetime, timezone
 from typing import Union, Any, Optional, Tuple, List, Dict, Iterable
 from docker import DockerClient
 from kubernetes import client, config

From de3bf56a98fd270dce306327f6d48585ba9507b7 Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 05:12:10 +0300
Subject: [PATCH 18/70] fix code checks

---
 requirements.txt                              |  3 +-
 simplyblock_core/cluster_ops.py               |  4 +-
 .../controllers/lvol_controller.py            | 56 +++++++++++++++----
 .../controllers/migration_controller.py       |  2 +-
 .../controllers/snapshot_controller.py        | 31 ++++++++++
 simplyblock_core/mgmt_node_ops.py             |  2 +-
 simplyblock_core/services/lvol_monitor.py     |  1 -
 .../services/spdk_http_proxy_server.py        | 56 ++++++++---------
 .../services/tasks_runner_lvol_migration.py   |  4 +-
 simplyblock_core/utils/__init__.py            |  2 +-
 10 files changed, 113 insertions(+), 48 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 9ee458f00..2bd6493f1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -24,4 +24,5 @@ flask-openapi3
 jsonschema
 fastapi
 uvicorn
-prometheus_api_client
\ No newline at end of file
+prometheus_api_client
+paramiko
\ No newline at end of file
diff --git a/simplyblock_core/cluster_ops.py b/simplyblock_core/cluster_ops.py
index 9a2039440..016cb6f84 100644
--- a/simplyblock_core/cluster_ops.py
+++ b/simplyblock_core/cluster_ops.py
@@ -614,8 +614,8 @@ def cluster_activate(cl_id, force=False, force_lvstore_create=False) -> None:
     records = db_controller.get_cluster_capacity(cluster)
     max_size = records[0]['size_total']

-    used_nodes_as_sec = []
-    used_nodes_as_sec_2 = []
+    used_nodes_as_sec: list[str] = []
+    used_nodes_as_sec_2: list[str] = []
     snodes = db_controller.get_storage_nodes_by_cluster_id(cl_id)
     if cluster.ha_type == "ha":
         for snode in snodes:
diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py
index ac90a812c..20803274d 100644
--- a/simplyblock_core/controllers/lvol_controller.py
+++ b/simplyblock_core/controllers/lvol_controller.py
@@ -1458,12 +1458,12 @@ def get_replication_info(lvol_id_or_name):
     tasks = []
     snaps = []
     out = {
-        "last_snapshot_id": None,
-        "last_replication_time": None,
-        "last_replication_duration": None,
-        "replicated_count": None,
-        "snaps": None,
-        "tasks": None,
+        "last_snapshot_id": "",
+        "last_replication_time": "",
+        "last_replication_duration": 0,
+        "replicated_count": 0,
+        "snaps": [],
+        "tasks": [],
     }
     node = db_controller.get_storage_node_by_id(lvol.node_id)
     for task in db_controller.get_job_tasks(node.cluster_id):
@@ -2087,12 +2087,12 @@ def replication_trigger(lvol_id):
     snaps = []
     out = {
         "lvol": lvol,
-        "last_snapshot_id": None,
-        "last_replication_time": None,
-        "last_replication_duration": None,
-        "replicated_count": None,
-        "snaps": None,
-        "tasks": None,
+        "last_snapshot_id": "",
+        "last_replication_time": "",
+        "last_replication_duration": 0,
+        "replicated_count": 0,
+        "snaps": [],
+        "tasks": [],
     }
     for task in db_controller.get_job_tasks(node.cluster_id):
         if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION:
@@ -2172,6 +2172,38 @@ def replication_start(lvol_id, replication_cluster_id=None):
     return True


+def list_by_node(node_id=None, is_json=False):
+    db_controller = DBController()
+    lvols = db_controller.get_lvols()
+    lvols = sorted(lvols, key=lambda x: x.create_dt)
+    data = []
+    for lvol in lvols:
+        if node_id:
+            if lvol.node_id != node_id:
+                continue
+        logger.debug(lvol)
+        cloned_from_snap = ""
+        if lvol.cloned_from_snap:
+            snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap)
+            cloned_from_snap = snap.snap_uuid
+        data.append({
+            "UUID": lvol.uuid,
+            "BDdev UUID": lvol.lvol_uuid,
+            "BlobID": lvol.blobid,
+            "Name": lvol.lvol_name,
+            "Size": utils.humanbytes(lvol.size),
+            "LVS name": lvol.lvs_name,
+            "BDev": lvol.lvol_bdev,
+            "Node ID": lvol.node_id,
+            "Clone From Snap BDev": cloned_from_snap,
+            "Created At": lvol.create_dt,
+            "Status": lvol.status,
+        })
+    if is_json:
+        return json.dumps(data, indent=2)
+    return utils.print_table(data)
+
+
 def replication_stop(lvol_id, delete=False):
     db_controller = DBController()
     try:
diff --git a/simplyblock_core/controllers/migration_controller.py b/simplyblock_core/controllers/migration_controller.py
index 1bff23465..0180379ea 100644
--- a/simplyblock_core/controllers/migration_controller.py
+++ b/simplyblock_core/controllers/migration_controller.py
@@ -385,7 +385,7 @@ def get_snaps_to_delete_on_target(migration):
     preexisting = set(migration.snaps_preexisting_on_target)

     # Rule 2: protect snaps referenced by other target lvols
-    protected = set()
+    protected: set[str] = set()
     target_lvols = db.get_lvols_by_node_id(migration.target_node_id)
     for lvol in target_lvols:
         if lvol.uuid == migration.lvol_id:
diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py
index 7d6029932..57471501a 100644
--- a/simplyblock_core/controllers/snapshot_controller.py
+++ b/simplyblock_core/controllers/snapshot_controller.py
@@ -306,6 +306,7 @@ def list(all=False, cluster_id=None, with_details=False):
             "Created At": time.strftime("%H:%M:%S, %d/%m/%Y", time.gmtime(snap.created_at)),
             "Base Snapshot": snap.snap_ref_id,
             "Clones": clones,
+            "Status": snap.status,
         }
         if with_details:
             d["Replication target snap"] = snap.target_replicated_snap_uuid
@@ -816,3 +817,33 @@ def set(snapshot_uuid, attr, value) -> bool:
     snap.write_to_db()
     return True

+
+def list_by_node(node_id=None, is_json=False):
+    snaps = db_controller.get_snapshots()
+    snaps = sorted(snaps, key=lambda snap: snap.created_at)
+    data = []
+    for snap in snaps:
+        if node_id:
+            if snap.lvol.node_id != node_id:
+                continue
+        logger.debug(snap)
+        clones = []
+        for lvol in db_controller.get_lvols():
+            if lvol.cloned_from_snap and lvol.cloned_from_snap == snap.get_id():
+                clones.append(lvol.get_id())
+        data.append({
+            "UUID": snap.uuid,
+            "BDdev UUID": snap.snap_uuid,
+            "BlobID": snap.blobid,
+            "Name": snap.snap_name,
+            "Size": utils.humanbytes(snap.used_size),
+            "BDev": snap.snap_bdev.split("/")[1],
+            "Node ID": snap.lvol.node_id,
+            "LVol ID": snap.lvol.get_id(),
+            "Created At": time.strftime("%H:%M:%S, %d/%m/%Y", time.gmtime(snap.created_at)),
+            "Base Snapshot": snap.snap_ref_id,
+            "Clones": clones,
+            "Status": snap.status,
+        })
+    if is_json:
+        return json.dumps(data, indent=2)
+    return utils.print_table(data)
\ No newline at end of file
diff --git a/simplyblock_core/mgmt_node_ops.py b/simplyblock_core/mgmt_node_ops.py
index b72cffbef..6d752a86c 100644
--- a/simplyblock_core/mgmt_node_ops.py
+++ b/simplyblock_core/mgmt_node_ops.py
@@ -220,7 +220,7 @@ def deploy_mgmt_node(cluster_ip, cluster_id, ifname, mgmt_ip, cluster_secret, mo
 def add_mgmt_node(mgmt_ip, mode, cluster_id=None):
     db_controller = DBController()

-    hostname = None
+    hostname = ""
     if mode == "docker":
         hostname = utils.get_hostname()
         try:
diff --git a/simplyblock_core/services/lvol_monitor.py b/simplyblock_core/services/lvol_monitor.py
index f0d3eae01..3b99fe8e1 100644
--- a/simplyblock_core/services/lvol_monitor.py
+++ b/simplyblock_core/services/lvol_monitor.py
@@ -131,7 +131,6 @@ def process_lvol_delete_finish(lvol):
                 non_leader_nodes.append(db.get_storage_node_by_id(node_id))
         except KeyError:
             pass
-    sec_node = non_leader_nodes[0] if non_leader_nodes else None

     # 3-1 async delete lvol bdev from primary
     primary_node = db.get_storage_node_by_id(leader_node.get_id())
diff --git a/simplyblock_core/services/spdk_http_proxy_server.py b/simplyblock_core/services/spdk_http_proxy_server.py
index 6bef2246a..3cb2e684c 100644
--- a/simplyblock_core/services/spdk_http_proxy_server.py
+++ b/simplyblock_core/services/spdk_http_proxy_server.py
@@ -27,33 +27,35 @@ def print_stats():
         try:
             time.sleep(3)
             t = time.time_ns()
-            read_line_time_diff_max = max(list(read_line_time_diff.values()))
-            read_line_time_diff_avg = int(sum(list(read_line_time_diff.values()))/len(read_line_time_diff))
-            last_3_sec = []
-            for k,v in read_line_time_diff.items():
-                if k > t - 3*1000*1000*1000:
-                    last_3_sec.append(v)
-            if len(last_3_sec) > 0:
-                read_line_time_diff_avg_last_3_sec = int(sum(last_3_sec)/len(last_3_sec))
-            else:
-                read_line_time_diff_avg_last_3_sec = 0
-            logger.info(f"Periodic stats: {t}: read_line_time: max={read_line_time_diff_max} ns, avg={read_line_time_diff_avg} ns, last_3s_avg={read_line_time_diff_avg_last_3_sec} ns")
-            if len(read_line_time_diff) > 10000:
-                read_line_time_diff.clear()
-
-            recv_from_spdk_time_max = max(list(recv_from_spdk_time_diff.values()))
-            recv_from_spdk_time_avg = int(sum(list(recv_from_spdk_time_diff.values()))/len(recv_from_spdk_time_diff))
-            last_3_sec = []
-            for k,v in recv_from_spdk_time_diff.items():
-                if k > t - 3*1000*1000*1000:
-                    last_3_sec.append(v)
-            if len(last_3_sec) > 0:
-                recv_from_spdk_time_avg_last_3_sec = int(sum(last_3_sec)/len(last_3_sec))
-            else:
-                recv_from_spdk_time_avg_last_3_sec = 0
-            logger.info(f"Periodic stats: {t}: recv_from_spdk_time: max={recv_from_spdk_time_max} ns, avg={recv_from_spdk_time_avg} ns, last_3s_avg={recv_from_spdk_time_avg_last_3_sec} ns")
-            if len(recv_from_spdk_time_diff) > 10000:
-                recv_from_spdk_time_diff.clear()
+            if len(read_line_time_diff) > 0:
+                read_line_time_diff_max = max(list(read_line_time_diff.values()))
+                read_line_time_diff_avg = int(sum(list(read_line_time_diff.values()))/len(read_line_time_diff))
+                last_3_sec = []
+                for k,v in read_line_time_diff.items():
+                    if k > t - 3*1000*1000*1000:
+                        last_3_sec.append(v)
+                if len(last_3_sec) > 0:
+                    read_line_time_diff_avg_last_3_sec = int(sum(last_3_sec)/len(last_3_sec))
+                else:
+                    read_line_time_diff_avg_last_3_sec = 0
+                logger.info(f"Periodic stats: {t}: read_line_time: max={read_line_time_diff_max} ns, avg={read_line_time_diff_avg} ns, last_3s_avg={read_line_time_diff_avg_last_3_sec} ns")
+                if len(read_line_time_diff) > 10000:
+                    read_line_time_diff.clear()
+
+            if len(recv_from_spdk_time_diff) > 0:
+                recv_from_spdk_time_max = max(list(recv_from_spdk_time_diff.values()))
+                recv_from_spdk_time_avg = int(sum(list(recv_from_spdk_time_diff.values()))/len(recv_from_spdk_time_diff))
+                last_3_sec = []
+                for k,v in recv_from_spdk_time_diff.items():
+                    if k > t - 3*1000*1000*1000:
+                        last_3_sec.append(v)
+                if len(last_3_sec) > 0:
+                    recv_from_spdk_time_avg_last_3_sec = int(sum(last_3_sec)/len(last_3_sec))
+                else:
+                    recv_from_spdk_time_avg_last_3_sec = 0
+                logger.info(f"Periodic stats: {t}: recv_from_spdk_time: max={recv_from_spdk_time_max} ns, avg={recv_from_spdk_time_avg} ns, last_3s_avg={recv_from_spdk_time_avg_last_3_sec} ns")
+                if len(recv_from_spdk_time_diff) > 10000:
+                    recv_from_spdk_time_diff.clear()
         except Exception as e:
             logger.error(e)
diff --git a/simplyblock_core/services/tasks_runner_lvol_migration.py b/simplyblock_core/services/tasks_runner_lvol_migration.py
index bfe002a5e..1f906a137 100644
--- a/simplyblock_core/services/tasks_runner_lvol_migration.py
+++ b/simplyblock_core/services/tasks_runner_lvol_migration.py
@@ -595,7 +595,7 @@ def _handle_snap_copy(migration, src_node, tgt_node, src_rpc, tgt_rpc):
                 return False, True, _WAIT
             break  # one check is enough

-    transfers = []
+    transfers: list[dict] = []
     for snap_uuid in unprocessed:
         snap_index = plan.index(snap_uuid)
         try:
@@ -1474,7 +1474,7 @@ def task_runner(task):
     elif phase == LVolMigration.PHASE_CLEANUP_TARGET:
         done, suspend, error = _handle_cleanup_target(migration, tgt_node, tgt_rpc)
-        next_phase = None  # terminal failure path
+        next_phase = ""  # terminal failure path
     else:
         _fail_task(task, migration, f"unknown phase: {phase}")
diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py
index f4eba9186..7378ce9a8 100644
--- a/simplyblock_core/utils/__init__.py
+++ b/simplyblock_core/utils/__init__.py
@@ -902,7 +902,7 @@ def get_next_lvstore_ports(cluster_id):
     """Allocate two consecutive NVMe-oF ports for a new lvstore (lvol_subsys + hublvol)."""
     nvmf_base, _, _ = _get_cluster_port_config(cluster_id)
     used_ports = _get_all_nvmf_ports(cluster_id)
-    ports = []
+    ports: list[int] = []
     next_port = nvmf_base
     while len(ports) < 2:
         if next_port not in used_ports:

From bd74372abd754021c5cf93e9318879e1ef2a0612 Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 05:18:08 +0300
Subject: [PATCH 19/70] fix code checks

---
 tests/test_failover_failback_combinations.py | 12 ++++++------
 tests/test_rpc_client_cache.py               |  4 +---
 tests/test_spdk_proxy_e2e.py                 |  4 +---
 tests/test_spdk_proxy_unit.py                |  3 ++-
 4 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/tests/test_failover_failback_combinations.py b/tests/test_failover_failback_combinations.py
index 5e6f8c00f..ab993c10d 100644
--- a/tests/test_failover_failback_combinations.py
+++ b/tests/test_failover_failback_combinations.py
@@ -20,7 +20,7 @@
 """

 import unittest
-from unittest.mock import MagicMock, patch, call
+from unittest.mock import MagicMock, patch

 from simplyblock_core.models.cluster import Cluster
 from simplyblock_core.models.lvol_model import LVol
@@ -261,8 +261,8 @@ def test_ftt1_failover_primary_to_secondary(self):

         # First secondary should be set to optimized
         ana_calls = rpc.nvmf_subsystem_listener_set_ana_state.call_args_list
-        optimized_calls = [c for c in ana_calls if c[1].get('ana_state') == 'optimized'
-                           or (len(c[0]) > 0 and 'optimized' in str(c))]
+        # optimized_calls = [c for c in ana_calls if c[1].get('ana_state') == 'optimized'
+        #                    or (len(c[0]) > 0 and 'optimized' in str(c))]
         self.assertTrue(len(ana_calls) > 0, "Should have ANA state change calls")

     def test_ftt2_failover_primary_to_both_secondaries(self):
@@ -342,7 +342,7 @@ def test_ftt1_failback_secondary_to_primary(self):

         rpc = self._run_failback(nodes, "node-1", lvols)

-        ana_calls = rpc.nvmf_subsystem_listener_set_ana_state.call_args_list
+        # ana_calls = rpc.nvmf_subsystem_listener_set_ana_state.call_args_list
         # With FTT=1, no secondary_node_id_2, so _failback_primary_ana not called
         # (it requires secondary_node_id_2). No-op for FTT=1 via this path.
         # The actual failback for FTT=1 happens inside recreate_lvstore.
@@ -378,7 +378,7 @@ def test_ftt2_failback_first_sec_restarts_second_sec_offline(self):
         lvols = [_lvol("lv1", "node-1")]
         nodes["node-3"].status = StorageNode.STATUS_OFFLINE

-        rpc = self._run_failback(nodes, "node-2", lvols)
+        _ = self._run_failback(nodes, "node-2", lvols)

         # With second sec offline, failback for first sec role doesn't demote anyone
         # (second sec is not online so it's skipped)
@@ -619,7 +619,7 @@ def test_primary_offline_first_sec_restarts_drops_leadership_on_second_sec(
         nodes = _build_ftt2_nodes()
         nodes["node-1"].status = StorageNode.STATUS_OFFLINE  # primary offline
         secondary = nodes["node-2"]  # first secondary, restarting
-        second_sec = nodes["node-3"]  # second secondary, online
+        # second_sec = nodes["node-3"]  # second secondary, online
         lvols = [_lvol("lv1", "node-1")]

         db = _make_db_mock(nodes, lvols)
diff --git a/tests/test_rpc_client_cache.py b/tests/test_rpc_client_cache.py
index 3f8836110..1853cba48 100644
--- a/tests/test_rpc_client_cache.py
+++ b/tests/test_rpc_client_cache.py
@@ -4,13 +4,11 @@
 the cached wrappers get_bdevs / subsystem_list.
""" -import json import time import threading import unittest -from unittest.mock import patch, MagicMock +from unittest.mock import patch -from simplyblock_core import rpc_client as mod from simplyblock_core.rpc_client import RPCClient, _rpc_cache, _rpc_cache_lock diff --git a/tests/test_spdk_proxy_e2e.py b/tests/test_spdk_proxy_e2e.py index 419d1b5a9..c2379e56c 100644 --- a/tests/test_spdk_proxy_e2e.py +++ b/tests/test_spdk_proxy_e2e.py @@ -12,7 +12,6 @@ import base64 import json import os -import socket import socketserver import sys import tempfile @@ -264,7 +263,6 @@ class TestProxyReadinessGate(unittest.TestCase): def test_proxy_waits_for_spdk(self): """Proxy should not accept HTTP requests until SPDK responds.""" - import simplyblock_core.services.spdk_http_proxy_server as mod tmpdir = tempfile.mkdtemp() sock_path = os.path.join(tmpdir, "spdk_delayed.sock") @@ -282,7 +280,7 @@ def delayed_spdk(): spdk_thread = threading.Thread(target=delayed_spdk, daemon=True) spdk_thread.start() - start = time.monotonic() + _ = time.monotonic() _, stop_event, mod_ref = _start_proxy(sock_path, http_port, max_concurrent=4, timeout=5) # Wait for proxy to come up diff --git a/tests/test_spdk_proxy_unit.py b/tests/test_spdk_proxy_unit.py index 54db2bfcb..581ca3cbb 100644 --- a/tests/test_spdk_proxy_unit.py +++ b/tests/test_spdk_proxy_unit.py @@ -10,7 +10,8 @@ import unittest from unittest.mock import patch, MagicMock -import sys, os +import sys +import os sys.path.insert(0, os.path.dirname(__file__)) from conftest_proxy import import_proxy_module From e5df8ced87c8652c72562b9824bedcb348f2f807 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Sat, 28 Mar 2026 05:28:52 +0300 Subject: [PATCH 20/70] fix code checks --- simplyblock_core/cluster_ops.py | 4 ++-- simplyblock_core/controllers/lvol_controller.py | 8 ++++---- simplyblock_web/app.py | 16 ++++++++-------- tests/test_failover_failback_combinations.py | 2 +- 4 files changed, 15 insertions(+), 15 deletions(-) diff --git 
a/simplyblock_core/cluster_ops.py b/simplyblock_core/cluster_ops.py index 016cb6f84..d8438c183 100644 --- a/simplyblock_core/cluster_ops.py +++ b/simplyblock_core/cluster_ops.py @@ -614,8 +614,8 @@ def cluster_activate(cl_id, force=False, force_lvstore_create=False) -> None: records = db_controller.get_cluster_capacity(cluster) max_size = records[0]['size_total'] - used_nodes_as_sec: list[str] = [] - used_nodes_as_sec_2: list[str] = [] + used_nodes_as_sec: t.List[str] = [] + used_nodes_as_sec_2: t.List[str] = [] snodes = db_controller.get_storage_nodes_by_cluster_id(cl_id) if cluster.ha_type == "ha": for snode in snodes: diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 20803274d..bb1794cf7 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -1460,7 +1460,7 @@ def get_replication_info(lvol_id_or_name): out = { "last_snapshot_id": "", "last_replication_time": "", - "last_replication_duration": 0, + "last_replication_duration": "", "replicated_count": 0, "snaps": [], "tasks": [], @@ -1495,7 +1495,7 @@ def get_replication_info(lvol_id_or_name): elif "start_time" in last_task.function_params: duration = utils.strfdelta_seconds(int(time.time()) - last_task.function_params["start_time"]) else: - duration = 0 + duration = "" out["last_replication_duration"] = duration return out @@ -2089,7 +2089,7 @@ def replication_trigger(lvol_id): "lvol": lvol, "last_snapshot_id": "", "last_replication_time": "", - "last_replication_duration": 0, + "last_replication_duration": "", "replicated_count": 0, "snaps": [], "tasks": [], @@ -2117,7 +2117,7 @@ def replication_trigger(lvol_id): last_snap = db_controller.get_snapshot_by_id(last_task.function_params["snapshot_id"]) out["last_snapshot_id"] = last_snap.get_id() out["last_replication_time"] = last_task.updated_at - duration = 0 + duration = "" if "start_time" in last_task.function_params: if "end_time" 
in last_task.function_params: duration = utils.strfdelta_seconds( diff --git a/simplyblock_web/app.py b/simplyblock_web/app.py index 1e3311d1c..ca40c9b9b 100644 --- a/simplyblock_web/app.py +++ b/simplyblock_web/app.py @@ -25,14 +25,14 @@ app.mount('/api/v1', WSGIMiddleware(v1.api)) # For some reason this fails if done in `api/__init__.py` -@app.route('/', methods=['GET']) -@app.route('/cluster/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/mgmtnode/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/device/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/lvol/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/snapshot/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/storagenode/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) -@app.route('/pool/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) +@app.route('/', methods=['GET']) # type: ignore[attr-defined] +@app.route('/cluster/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/mgmtnode/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/device/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/lvol/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/snapshot/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/storagenode/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] +@app.route('/pool/{full_path:path}', methods=['GET', 'POST', 'PUT', 'DELETE']) # type: ignore[attr-defined] def redirect_legacy(request: Request) -> RedirectResponse: """ Redirect legacy API routes to their corresponding v1 endpoints. 
diff --git a/tests/test_failover_failback_combinations.py b/tests/test_failover_failback_combinations.py
index ab993c10d..8056afa5a 100644
--- a/tests/test_failover_failback_combinations.py
+++ b/tests/test_failover_failback_combinations.py
@@ -340,7 +340,7 @@ def test_ftt1_failback_secondary_to_primary(self):
         nodes = _build_ftt1_nodes()
         lvols = [_lvol("lv1", "node-1")]

-        rpc = self._run_failback(nodes, "node-1", lvols)
+        _ = self._run_failback(nodes, "node-1", lvols)

         # ana_calls = rpc.nvmf_subsystem_listener_set_ana_state.call_args_list
         # With FTT=1, no secondary_node_id_2, so _failback_primary_ana not called

From 2785fb77de863511584b9ecd6e506b689f562e5e Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 06:16:27 +0300
Subject: [PATCH 21/70] fix: comment out print_stats_thread.start() to prevent
 premature thread execution

---
 simplyblock_core/services/spdk_http_proxy_server.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/simplyblock_core/services/spdk_http_proxy_server.py b/simplyblock_core/services/spdk_http_proxy_server.py
index 5920fa973..e6a1a0279 100644
--- a/simplyblock_core/services/spdk_http_proxy_server.py
+++ b/simplyblock_core/services/spdk_http_proxy_server.py
@@ -249,7 +249,7 @@ def run_server(host, port, user, password, is_threading_enabled=False):
     # encoding user and password
     key = base64.b64encode((user+':'+password).encode(encoding='ascii')).decode('ascii')
     print_stats_thread = threading.Thread(target=print_stats, )
-    print_stats_thread.start()
+    # print_stats_thread.start()
     wait_for_spdk_ready()
     try:
         ServerHandler.key = key

From f159fbf8b18021318beea4c5cebafb03c3e94ffb Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 06:55:35 +0300
Subject: [PATCH 22/70] fix: modify print_stats_thread to control execution
 flow with do_run attribute

---
 simplyblock_core/services/spdk_http_proxy_server.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/simplyblock_core/services/spdk_http_proxy_server.py b/simplyblock_core/services/spdk_http_proxy_server.py
index e6a1a0279..57bdf533c 100644
--- a/simplyblock_core/services/spdk_http_proxy_server.py
+++ b/simplyblock_core/services/spdk_http_proxy_server.py
@@ -23,7 +23,8 @@
 read_line_time_diff: dict = {}
 recv_from_spdk_time_diff: dict = {}

 def print_stats():
-    while True:
+    t = threading.current_thread()
+    while getattr(t, "do_run", True):
         try:
             time.sleep(3)
             t = time.time_ns()
@@ -249,7 +250,8 @@ def run_server(host, port, user, password, is_threading_enabled=False):
     # encoding user and password
     key = base64.b64encode((user+':'+password).encode(encoding='ascii')).decode('ascii')
     print_stats_thread = threading.Thread(target=print_stats, )
-    print_stats_thread.start()
+    print_stats_thread.do_run = True
+    print_stats_thread.start()
     wait_for_spdk_ready()
     try:
         ServerHandler.key = key
@@ -260,6 +262,7 @@ def run_server(host, port, user, password, is_threading_enabled=False):
     except KeyboardInterrupt:
         logger.info('Shutting down server')
         httpd.socket.close()
+        print_stats_thread.do_run = False

 TIMEOUT = int(get_env_var("TIMEOUT", is_required=False, default=60*5))

From cf85f83498817121bfb7302ae8a39f21fab6d021 Mon Sep 17 00:00:00 2001
From: hamdykhader
Date: Sat, 28 Mar 2026 07:05:53 +0300
Subject: [PATCH 23/70] fix: refactor print_stats to remove do_run attribute
 and simplify execution flow

---
 simplyblock_core/services/spdk_http_proxy_server.py | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/simplyblock_core/services/spdk_http_proxy_server.py b/simplyblock_core/services/spdk_http_proxy_server.py
index 57bdf533c..5920fa973 100644
--- a/simplyblock_core/services/spdk_http_proxy_server.py
+++ b/simplyblock_core/services/spdk_http_proxy_server.py
@@ -23,8 +23,7 @@
 read_line_time_diff: dict = {}
 recv_from_spdk_time_diff: dict = {}

 def print_stats():
-    t = threading.current_thread()
-    while getattr(t, "do_run", True):
+    while True:
         try:
             time.sleep(3)
             t =
time.time_ns() @@ -250,7 +249,6 @@ def run_server(host, port, user, password, is_threading_enabled=False): # encoding user and password key = base64.b64encode((user+':'+password).encode(encoding='ascii')).decode('ascii') print_stats_thread = threading.Thread(target=print_stats, ) - print_stats_thread.do_run = True print_stats_thread.start() wait_for_spdk_ready() try: @@ -262,7 +260,6 @@ def run_server(host, port, user, password, is_threading_enabled=False): except KeyboardInterrupt: logger.info('Shutting down server') httpd.socket.close() - print_stats_thread.do_run = False TIMEOUT = int(get_env_var("TIMEOUT", is_required=False, default=60*5)) From 9eb303292294c927734b58064f7d0a12f638e029 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Sun, 29 Mar 2026 22:24:48 +0300 Subject: [PATCH 24/70] feat: add --allowed-hosts argument to CLI for volume access control --- simplyblock_cli/cli-reference.yaml | 4 ++++ simplyblock_cli/cli.py | 1 + 2 files changed, 5 insertions(+) diff --git a/simplyblock_cli/cli-reference.yaml b/simplyblock_cli/cli-reference.yaml index f7c3a9a64..910b79c64 100644 --- a/simplyblock_cli/cli-reference.yaml +++ b/simplyblock_cli/cli-reference.yaml @@ -1570,6 +1570,10 @@ commands: dest: replicate type: bool action: store_true + - name: "--allowed-hosts" + help: "Path to JSON file with host NQNs allowed to access this volume's subsystem" + dest: allowed_hosts + type: str - name: qos-set help: "Changes QoS settings for an active logical volume" arguments: diff --git a/simplyblock_cli/cli.py b/simplyblock_cli/cli.py index d8a3bb786..bf95ea662 100644 --- a/simplyblock_cli/cli.py +++ b/simplyblock_cli/cli.py @@ -641,6 +641,7 @@ def init_volume__add(self, subparser): argument = subcommand.add_argument('--data-chunks-per-stripe', help='Erasure coding schema parameter k (distributed raid), default: 1', type=int, default=0, dest='ndcs') argument = subcommand.add_argument('--parity-chunks-per-stripe', help='Erasure coding schema parameter n (distributed raid), 
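Patches 21–23 above first disable the stats thread, then attach an ad-hoc `do_run` attribute to it, and finally revert to a plain `while True` loop. A common stdlib alternative for this kind of stoppable background loop is `threading.Event`, which avoids racing on dynamic thread attributes and lets the wait double as the sleep. This is a minimal sketch, not the service's actual implementation; names are illustrative.

```python
import threading

def make_stats_thread(interval: float = 3.0):
    """Sketch of a stoppable stats loop using threading.Event instead of an
    ad-hoc `do_run` attribute. Returns the thread and its stop event."""
    stop = threading.Event()

    def print_stats():
        # wait() doubles as the sleep: it returns True as soon as stop is
        # set, so shutdown does not block for a full interval.
        while not stop.wait(interval):
            pass  # compute and log max/avg read-line and RPC latencies here

    thread = threading.Thread(target=print_stats, daemon=True)
    return thread, stop
```

Usage mirrors the reverted change: start with `thread.start()`, and on `KeyboardInterrupt` call `stop.set()` followed by `thread.join()`.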
default: 1', type=int, default=0, dest='npcs') argument = subcommand.add_argument('--replicate', help='Replicate LVol snapshot', dest='replicate', action='store_true') + argument = subcommand.add_argument('--allowed-hosts', help='Path to JSON file with host NQNs allowed to access this volume\'s subsystem', type=str, dest='allowed_hosts') def init_volume__add_host(self, subparser): subcommand = self.add_sub_command(subparser, 'add-host', 'Add an allowed host NQN to a volume\'s subsystem') From 56725d91abab3523e067b4047e428fac084665e1 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Sun, 29 Mar 2026 22:44:36 +0300 Subject: [PATCH 25/70] feat: prevent deletion of snapshots in deletion status unless forced (cherry picked from commit 53ddd4586c68ea6b32c625e8f9b82c95c3186d22) --- simplyblock_core/controllers/snapshot_controller.py | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py index 19e754254..3ed32c2b0 100644 --- a/simplyblock_core/controllers/snapshot_controller.py +++ b/simplyblock_core/controllers/snapshot_controller.py @@ -324,6 +324,11 @@ def delete(snapshot_uuid, force_delete=False): logger.error(f"Snapshot not found {snapshot_uuid}") return False + if snap.status == SnapShot.STATUS_IN_DELETION: + logger.error(f"Snapshot is in deletion {snapshot_uuid}") + if not force_delete: + return True + # Block deletion if the snapshot's parent volume is being migrated from simplyblock_core.controllers import migration_controller active_mig = migration_controller.get_active_migration_for_lvol( From 29dbe5cb95f7585f5a648f27487c319cc4ae8720 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 30 Mar 2026 18:25:06 +0100 Subject: [PATCH 26/70] made prometheus image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/simplyblock_core/scripts/charts/values.yaml 
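Patch 25 above makes snapshot deletion idempotent: a snapshot whose status is already "in deletion" is reported as success unless the caller forces the delete, in which case the normal path is retried. A condensed sketch of that guard, with a stand-in status constant and a generic `snap` object rather than the project's `SnapShot` model:

```python
STATUS_IN_DELETION = "in_deletion"  # stand-in for SnapShot.STATUS_IN_DELETION

def delete_snapshot(snap, force_delete: bool = False) -> bool:
    """Sketch of the guard added in snapshot_controller.delete()."""
    if snap.status == STATUS_IN_DELETION and not force_delete:
        # Another deletion is already in flight; treat the request as done
        # rather than failing the caller.
        return True
    # ... migration checks and the actual deletion proceed here ...
    return True
```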
b/simplyblock_core/scripts/charts/values.yaml index c622be010..916f14be0 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -110,6 +110,9 @@ prometheus: enabled: true name: simplyblock-prometheus replicaCount: 1 + image: + repository: quay.io/prometheus/prometheus + tag: "" podLabels: app: simplyblock-prometheus podAnnotations: @@ -194,6 +197,12 @@ prometheus: kube-state-metrics: enabled: false + configmapReload: + prometheus: + repository: quay.io/prometheus-operator/prometheus-config-reloader + tag: v0.72.0 + + ingress: enabled: false From 41fa41e6e46ffb454a9a38e435bd2e1f434859e8 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 30 Mar 2026 18:32:10 +0100 Subject: [PATCH 27/70] made reloader image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 916f14be0..33508635c 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -203,6 +203,12 @@ prometheus: tag: v0.72.0 +reloader: + nameOverride: simplyblock-reloader + fullnameOverride: simplyblock-reloader + image: + repository: ghcr.io/stakater/reloader + tag: v1.3.0 ingress: enabled: false From 459356d9e5d2ed6862d83e9e94822ab9c78b8ffd Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 30 Mar 2026 18:52:23 +0100 Subject: [PATCH 28/70] made opensearch image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 33508635c..6044d41cf 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -1,6 +1,6 @@ observability: - enabled: false + enabled: true secret: "sWbpOgba1bKnCfcPkVQi" 
deletionInterval: "3d" level: "DEBUG" @@ -73,9 +73,12 @@ opensearch: antiAffinity: "hard" persistence: enabled: true + image: busybox storageClass: local-hostpath size: 20Gi - + image: + repository: "opensearchproject/opensearch" + tag: "" resources: requests: cpu: "100m" @@ -85,8 +88,8 @@ opensearch: memory: "3Gi" extraEnvs: - - name: OPENSEARCH_JAVA_OPTS - value: "-Xms1g -Xmx1g" + # - name: OPENSEARCH_JAVA_OPTS + # value: "-Xms1g -Xmx1g" - name: bootstrap.memory_lock value: "true" - name: action.auto_create_index From 008655e56b2764c74928773b96893acd86ac5c1a Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 30 Mar 2026 19:17:16 +0100 Subject: [PATCH 29/70] made mongodb image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 6044d41cf..d249d7c7f 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -54,6 +54,14 @@ mongodb: limits: cpu: 250m memory: 1Gi + community: + mongodb: + repo: quay.io/mongodb + registry: + operator: quay.io/mongodb + agent: quay.io/mongodb + readinessProbe: quay.io/mongodb + affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: From 410df72cb2e5f785158b357b3de03e20ae7efcf4 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 30 Mar 2026 21:18:42 +0100 Subject: [PATCH 30/70] made ingress image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 3 +++ 1 file changed, 3 insertions(+) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index d249d7c7f..7af1c8bb7 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -231,6 +231,9 @@ ingress: hostNetwork: true dnsPolicy: ClusterFirstWithHostNet replicaCount: 2 + image: + image: ingress-nginx/controller + tag: 
"v1.10.1" service: type: ClusterIP extraArgs: From cea5313a957c69f2d93bf083d92dd1f34515fc67 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 31 Mar 2026 09:19:10 +0100 Subject: [PATCH 31/70] made reloader image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 7af1c8bb7..f71374af0 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -217,9 +217,11 @@ prometheus: reloader: nameOverride: simplyblock-reloader fullnameOverride: simplyblock-reloader - image: - repository: ghcr.io/stakater/reloader - tag: v1.3.0 + reloader: + deployment: + image: + name: ghcr.io/stakater/reloader + tag: v1.3.0 ingress: enabled: false From eec05aa804f7cea4690aa6d18d4102dc63106c6f Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 31 Mar 2026 10:18:38 +0100 Subject: [PATCH 32/70] made mongodb image repo and tag configurable --- simplyblock_core/scripts/charts/values.yaml | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index f71374af0..360a3860e 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -57,10 +57,18 @@ mongodb: community: mongodb: repo: quay.io/mongodb + imageType: ubi8 + agent: + name: mongodb-agent + version: 108.0.2.8729-1 + registry: + agent: quay.io/mongodb + registry: operator: quay.io/mongodb agent: quay.io/mongodb readinessProbe: quay.io/mongodb + versionUpgradeHook: quay.io/mongodb affinity: podAntiAffinity: @@ -234,7 +242,8 @@ ingress: dnsPolicy: ClusterFirstWithHostNet replicaCount: 2 image: - image: ingress-nginx/controller + registry: "quay.io/simplyblock-io" + image: ingress-nginx-controller tag: "v1.10.1" service: type: ClusterIP From 
3b60f40ca6def3e7b2bfb5c91f6698bb49699f72 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 31 Mar 2026 15:15:32 +0100 Subject: [PATCH 33/70] added cluster add param to api v1 and v2 --- simplyblock_web/api/v1/cluster.py | 17 +++++++++++++++-- simplyblock_web/api/v2/cluster.py | 4 ++++ 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/simplyblock_web/api/v1/cluster.py b/simplyblock_web/api/v1/cluster.py index d31d79582..c36cd2d72 100644 --- a/simplyblock_web/api/v1/cluster.py +++ b/simplyblock_web/api/v1/cluster.py @@ -56,12 +56,18 @@ def add_cluster(): strict_node_anti_affinity = cl_data.get('strict_node_anti_affinity', False) is_single_node = cl_data.get('is_single_node', False) client_data_nic = cl_data.get('client_data_nic', "") + max_fault_tolerance = cl_data.get('max_fault_tolerance', 1) + nvmf_base_port = cl_data.get('nvmf_base_port', 4420) + rpc_base_port = cl_data.get('rpc_base_port', 8080) + snode_api_port = cl_data.get('snode_api_port', 50001) return utils.get_response(cluster_ops.add_cluster( blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn, prov_cap_crit, distr_ndcs, distr_npcs, distr_bs, distr_chunk_bs, ha_type, enable_node_affinity, qpair_count, max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, - cr_name, cr_namespace, cr_plural, fabric, client_data_nic + cr_name, cr_namespace, cr_plural, fabric, client_data_nic=client_data_nic, + max_fault_tolerance=max_fault_tolerance, + nvmf_base_port=nvmf_base_port, rpc_base_port=rpc_base_port, snode_api_port=snode_api_port )) @@ -101,13 +107,20 @@ def create_first_cluster(): cr_plural = cl_data.get('cr_plural', None) cluster_ip = cl_data.get('cluster_ip', None) grafana_secret = cl_data.get('grafana_secret', None) + client_data_nic = cl_data.get('client_data_nic', "") + max_fault_tolerance = cl_data.get('max_fault_tolerance', 1) + nvmf_base_port = cl_data.get('nvmf_base_port', 4420) + rpc_base_port = cl_data.get('rpc_base_port', 8080) + 
snode_api_port = cl_data.get('snode_api_port', 50001) try: cluster_id = cluster_ops.add_cluster( blk_size, page_size_in_blocks, cap_warn, cap_crit, prov_cap_warn, prov_cap_crit, distr_ndcs, distr_npcs, distr_bs, distr_chunk_bs, ha_type, enable_node_affinity, qpair_count, max_queue_size, inflight_io_threshold, strict_node_anti_affinity, is_single_node, name, - cr_name, cr_namespace, cr_plural, fabric, cluster_ip=cluster_ip, grafana_secret=grafana_secret) + cr_name, cr_namespace, cr_plural, fabric, cluster_ip=cluster_ip, grafana_secret=grafana_secret, + client_data_nic=client_data_nic, max_fault_tolerance=max_fault_tolerance, + nvmf_base_port=nvmf_base_port, rpc_base_port=rpc_base_port, snode_api_port=snode_api_port) if cluster_id: return utils.get_response(db.get_cluster_by_id(cluster_id).to_dict()) else: diff --git a/simplyblock_web/api/v2/cluster.py b/simplyblock_web/api/v2/cluster.py index 9f5b3cd87..afd0d6f56 100644 --- a/simplyblock_web/api/v2/cluster.py +++ b/simplyblock_web/api/v2/cluster.py @@ -54,6 +54,10 @@ class ClusterParams(BaseModel): cluster_ip: str = "" grafana_secret: str = "" client_data_nic: str = "" + max_fault_tolerance: int = 1 + nvmf_base_port: int = 4420 + rpc_base_port: int = 8080 + snode_api_port: int = 50001 @api.get('/', name='clusters:list') From a27aa6ffb8f8e7fc2a856a07ce0694bdab2f319c Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Tue, 31 Mar 2026 17:29:29 +0300 Subject: [PATCH 34/70] Fix sfam-2663 --- simplyblock_core/services/storage_node_monitor.py | 1 + 1 file changed, 1 insertion(+) diff --git a/simplyblock_core/services/storage_node_monitor.py b/simplyblock_core/services/storage_node_monitor.py index 07b21f459..f1cf653c2 100644 --- a/simplyblock_core/services/storage_node_monitor.py +++ b/simplyblock_core/services/storage_node_monitor.py @@ -409,6 +409,7 @@ def check_node(snode): except Exception as e: logger.error("ANA failover for offline node %s failed: %s", snode.get_id(), e) tasks_controller.add_node_to_auto_restart(snode) 
+ return True # 1- check node ping ping_check = health_controller._check_node_ping(snode.mgmt_ip) From de85c8c5c65175a8dba0c51ea83763e9f0ea5d62 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 31 Mar 2026 15:31:02 +0100 Subject: [PATCH 35/70] updated the cluster crd --- ....simplyblock.io_simplyblockstorageclusters.yaml | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml index cfd99fdee..cfed158ff 100644 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml +++ b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml @@ -54,6 +54,8 @@ spec: capWarn: format: int32 type: integer + clientDataNic: + type: string clientQpairCount: format: int32 type: integer @@ -81,12 +83,18 @@ spec: type: integer isSingleNode: type: boolean + maxFaultTolerance: + format: int32 + type: integer maxQueueSize: format: int32 type: integer mgmtIfc: description: Create-only type: string + nvmfBasePort: + format: int32 + type: integer pageSizeInBlocks: format: int32 type: integer @@ -102,6 +110,12 @@ spec: qpairCount: format: int32 type: integer + rpcBasePort: + format: int32 + type: integer + snodeApiPort: + format: int32 + type: integer strictNodeAntiAffinity: type: boolean stripeWdata: From c4930d4b3e282cbc4068143db1638408c31abeea Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 31 Mar 2026 19:37:27 +0100 Subject: [PATCH 36/70] added helm value for cpu topology support --- .../charts/templates/simplyblock_customresource.yaml | 8 ++++++++ simplyblock_core/scripts/charts/values.yaml | 2 ++ 2 files changed, 10 insertions(+) diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml index 
4f2646365..e98680363 100644 --- a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml +++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml @@ -118,6 +118,14 @@ spec: coreIsolation: {{ .Values.simplyblock.storageNodes.coreIsolation }} {{- end }} + {{- if hasKey .Values.simplyblock.storageNodes "skipKubeletConfiguration" }} + skipKubeletConfiguration: {{ .Values.simplyblock.storageNodes.skipKubeletConfiguration }} + {{- end }} + + {{- if hasKey .Values.simplyblock.storageNodes "enableCpuTopology" }} + enableCpuTopology: {{ .Values.simplyblock.storageNodes.enableCpuTopology }} + {{- end }} + {{- if .Values.simplyblock.storageNodes.workerNodes }} workerNodes: {{- range .Values.simplyblock.storageNodes.workerNodes }} diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 360a3860e..60d794513 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -293,6 +293,8 @@ simplyblock: spdkDebug: false spdkImage: coreIsolation: false + skipKubeletConfiguration: false + enableCpuTopology: false workerNodes: From e3b3f6be8d28badd0a2545cf3d779b6043771778 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 1 Apr 2026 15:27:59 +0300 Subject: [PATCH 37/70] refactor: update parameters in bdev_lvol_transfer for clarity --- simplyblock_core/services/snapshot_replication.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py index 76b9ab84d..f8ba7d662 100644 --- a/simplyblock_core/services/snapshot_replication.py +++ b/simplyblock_core/services/snapshot_replication.py @@ -96,10 +96,10 @@ def process_snap_replicate_start(task, snapshot): offset = task.function_params["offset"] # 3 start replication snode.rpc_client().bdev_lvol_transfer( - lvol_name=snapshot.snap_bdev, + name=snapshot.snap_bdev, offset=offset, - 
cluster_batch=16, - gateway=f"{remote_lv.top_bdev}n1", + batch_size=16, + bdev_name=f"{remote_lv.top_bdev}n1", operation="replicate" ) task.status = JobSchedule.STATUS_RUNNING From d618ec2e2a3641d2b5a818d33f6cf56077c1f0c1 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 1 Apr 2026 18:25:02 +0300 Subject: [PATCH 38/70] feat: enhance volume deletion logic to handle already deleted volumes --- simplyblock_core/controllers/lvol_controller.py | 13 ++++++++----- simplyblock_web/api/v2/volume.py | 3 +++ 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 4242039bd..19195b1b5 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -1046,17 +1046,15 @@ def delete_lvol(id_or_name, force_delete=False): if lvol.status == LVol.STATUS_RESTORING and not force_delete: logger.error(f"Cannot delete lvol {lvol.uuid}: backup restore in progress") return False + if lvol.status == LVol.STATUS_DELETED: + logger.error(f"lvol {lvol.uuid}: deleted already") + return False if lvol.status == LVol.STATUS_IN_DELETION: logger.info(f"lvol:{lvol.get_id()} status is in deletion") if not force_delete: return True - pool = db_controller.get_pool_by_id(lvol.pool_uuid) - if pool.status == Pool.STATUS_INACTIVE: - logger.error("Pool is disabled") - return False - logger.debug(lvol) try: snode = db_controller.get_storage_node_by_id(lvol.node_id) @@ -1084,6 +1082,11 @@ def delete_lvol(id_or_name, force_delete=False): logger.info("Done") return True + pool = db_controller.get_pool_by_id(lvol.pool_uuid) + if pool.status == Pool.STATUS_INACTIVE: + logger.error("Pool is disabled") + return False + if lvol.ha_type == 'single': if snode.status != StorageNode.STATUS_ONLINE: logger.error(f"Node status is not online, node: {snode.get_id()}, status: {snode.status}") diff --git a/simplyblock_web/api/v2/volume.py 
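Patch 38 above reorders `delete_lvol` so that terminal and in-flight volume states short-circuit before the pool-status check; previously a disabled pool blocked even the idempotent "already being deleted" path. The resulting control flow can be sketched as follows, with illustrative status strings and stand-in objects in place of the `LVol` and `Pool` models:

```python
def delete_volume(lvol, pool, force_delete: bool = False) -> bool:
    """Condensed sketch of the status ordering after the fix."""
    if lvol.status == "deleted":
        return False  # already gone: report an error, not success
    if lvol.status == "in_deletion" and not force_delete:
        return True   # deletion in flight: idempotent success
    # The pool is only consulted for volumes that still need real work,
    # so a disabled pool no longer blocks cleanup of dying volumes.
    if pool.status == "inactive":
        return False
    # ... per-node teardown proceeds here ...
    return True
```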
b/simplyblock_web/api/v2/volume.py index 3f67e7d1e..1c642d982 100644 --- a/simplyblock_web/api/v2/volume.py +++ b/simplyblock_web/api/v2/volume.py @@ -174,6 +174,9 @@ def update(cluster: Cluster, pool: StoragePool, volume: Volume, body: UpdatableL @instance_api.delete('/', name='clusters:storage-pools:volumes:delete', status_code=204, responses={204: {"content": None}}) def delete(cluster: Cluster, pool: StoragePool, volume: Volume) -> Response: + if volume.status == LVol.STATUS_DELETED: + return Response(status_code=404) + if not lvol_controller.delete_lvol(volume.get_id()): raise ValueError('Failed to delete volume') From e8348a234a60f5110fac82bda8b72d83fb75c787 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 1 Apr 2026 19:24:24 +0300 Subject: [PATCH 39/70] fix: correct API decorator for replicate_lvol_on_source_cluster endpoint --- simplyblock_web/api/v2/volume.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_web/api/v2/volume.py b/simplyblock_web/api/v2/volume.py index 1c642d982..83eb096e6 100644 --- a/simplyblock_web/api/v2/volume.py +++ b/simplyblock_web/api/v2/volume.py @@ -313,7 +313,7 @@ class ReplicateLVolParams(BaseModel): lvol_id: Optional[str] = None -@instance_api.post('/replicate_lvol_on_source_cluster', name='clusters:storage-pools:replicate_lvol_on_source_cluster') +@api.post('/replicate_lvol_on_source_cluster', name='clusters:storage-pools:replicate_lvol_on_source_cluster') def replicate_lvol_on_source_cluster(cluster: Cluster, pool: StoragePool, body: ReplicateLVolParams): return lvol_controller.replicate_lvol_on_source_cluster(body.lvol_id, cluster.get_id(), pool.get_id()) From ca5bf76d0d21fe1904e5313d9cd1dd1e1d0838d2 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 1 Apr 2026 20:44:51 +0300 Subject: [PATCH 40/70] fix: handle KeyError when retrieving source node in replicate_lvol_on_source_cluster --- simplyblock_core/controllers/lvol_controller.py | 8 ++++++-- 1 file changed, 6 insertions(+), 2 
deletions(-) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 19195b1b5..ff34fc53e 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2487,9 +2487,13 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): logger.error(e) return False - source_node = db_controller.get_storage_node_by_id(lvol.node_id) + source_node = None new_source_cluster = None - if cluster_id and source_node.cluster_id == cluster_id: + try: + source_node = db_controller.get_storage_node_by_id(lvol.node_id) + except KeyError: + pass + if cluster_id and (source_node is None or source_node.cluster_id != cluster_id): new_source_cluster = db_controller.get_cluster_by_id(cluster_id) if new_source_cluster.status != Cluster.STATUS_ACTIVE: logger.error(f"Cluster is not active: {cluster_id}") From 29fc47f0fd892f36341012bbeb87294a1fbc3ac5 Mon Sep 17 00:00:00 2001 From: wmousa Date: Thu, 2 Apr 2026 01:03:31 +0200 Subject: [PATCH 41/70] Fix lvol_poller_mask issue --- simplyblock_core/utils/__init__.py | 3 ++- simplyblock_web/api/internal/storage_node/docker.py | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py index 8c2d0612e..874427700 100644 --- a/simplyblock_core/utils/__init__.py +++ b/simplyblock_core/utils/__init__.py @@ -3074,4 +3074,5 @@ def recalculate_cores_distribution(cores, number_of_alcemls): "alceml_cpu_cores": get_core_indexes(core_to_index, distribution[3]), "alceml_worker_cpu_cores": get_core_indexes(core_to_index, distribution[4]), "distrib_cpu_cores": get_core_indexes(core_to_index, distribution[5]), - "jc_singleton_core": get_core_indexes(core_to_index, distribution[6])} + "jc_singleton_core": get_core_indexes(core_to_index, distribution[6]), + "lvol_poller_core": get_core_indexes(core_to_index, distribution[7])} diff --git 
a/simplyblock_web/api/internal/storage_node/docker.py b/simplyblock_web/api/internal/storage_node/docker.py index f93cb925b..cd13dbe48 100644 --- a/simplyblock_web/api/internal/storage_node/docker.py +++ b/simplyblock_web/api/internal/storage_node/docker.py @@ -848,5 +848,6 @@ def recalculate_cores_distribution(body: CoresParams): "alceml_cpu_cores": distribution["alceml_cpu_cores"], "alceml_worker_cpu_cores": distribution["alceml_worker_cpu_cores"], "distrib_cpu_cores": distribution["distrib_cpu_cores"], - "jc_singleton_core": distribution["jc_singleton_core"]}) + "jc_singleton_core": distribution["jc_singleton_core"], + "lvol_poller_core": distribution["lvol_poller_core"]}) return resp From c03c8d38e14fbb9352b32347833bad1fa5f3e16b Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Thu, 2 Apr 2026 13:06:06 +0100 Subject: [PATCH 42/70] fix: move reloader annotations to Deployment metadata in app_k8s.yaml --- .../scripts/charts/templates/app_k8s.yaml | 20 +++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index ce0809db0..665d70018 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -5,6 +5,9 @@ kind: Deployment metadata: name: simplyblock-admin-control namespace: {{ .Release.Namespace }} + annotations: + reloader.stakater.com/auto: "true" + reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" spec: replicas: 2 selector: @@ -14,8 +17,6 @@ spec: metadata: annotations: log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" labels: app: simplyblock-admin-control spec: @@ -101,6 +102,9 @@ kind: Deployment metadata: name: simplyblock-webappapi namespace: {{ .Release.Namespace }} + annotations: + reloader.stakater.com/auto: "true" + 
reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" spec: replicas: 2 selector: @@ -110,8 +114,6 @@ spec: metadata: annotations: log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" labels: app: simplyblock-webappapi spec: @@ -223,6 +225,9 @@ kind: Deployment metadata: name: simplyblock-monitoring namespace: {{ .Release.Namespace }} + annotations: + reloader.stakater.com/auto: "true" + reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" spec: replicas: 1 selector: @@ -232,8 +237,6 @@ spec: metadata: annotations: log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" labels: app: simplyblock-monitoring spec: @@ -465,6 +468,9 @@ kind: Deployment metadata: name: simplyblock-tasks namespace: {{ .Release.Namespace }} + annotations: + reloader.stakater.com/auto: "true" + reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" spec: replicas: 1 selector: @@ -474,8 +480,6 @@ spec: metadata: annotations: log-collector/enabled: "true" - reloader.stakater.com/auto: "true" - reloader.stakater.com/configmap: "simplyblock-fdb-cluster-config" labels: app: simplyblock-tasks spec: From 607426b9e44afb3ebcec0b13924df0cc9dc30af0 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Thu, 2 Apr 2026 15:20:50 +0300 Subject: [PATCH 43/70] fix: remove unnecessary status check for storage node monitoring (cherry picked from commit 33ee82577b05861d25cbd938c21afa4dc22d40fe) --- simplyblock_core/services/storage_node_monitor.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_core/services/storage_node_monitor.py b/simplyblock_core/services/storage_node_monitor.py index f1cf653c2..351bf8f22 100644 --- a/simplyblock_core/services/storage_node_monitor.py +++ b/simplyblock_core/services/storage_node_monitor.py @@ -390,7 +390,7 @@ def check_node(snode): if snode.status not in 
[StorageNode.STATUS_ONLINE, StorageNode.STATUS_UNREACHABLE, StorageNode.STATUS_SCHEDULABLE, StorageNode.STATUS_DOWN, - StorageNode.STATUS_IN_SHUTDOWN, StorageNode.STATUS_OFFLINE]: + StorageNode.STATUS_OFFLINE]: logger.info(f"Node status is: {snode.status}, skipping") return False From f4c75227d7a9204785b6e07f8d504d353a4faa6a Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Thu, 2 Apr 2026 13:45:34 +0100 Subject: [PATCH 44/70] updated the webapi matchLabels --- simplyblock_core/scripts/charts/templates/app_k8s.yaml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_core/scripts/charts/templates/app_k8s.yaml b/simplyblock_core/scripts/charts/templates/app_k8s.yaml index 665d70018..8d08cd535 100644 --- a/simplyblock_core/scripts/charts/templates/app_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/app_k8s.yaml @@ -123,7 +123,7 @@ spec: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: - app: simplyblock-admin-control + app: simplyblock-webappapi topologyKey: kubernetes.io/hostname {{- if .Values.nodeSelector.create }} From 7e610b018a29c1f3f8e7d731c91ce43474595f74 Mon Sep 17 00:00:00 2001 From: RaunakJalan Date: Fri, 3 Apr 2026 03:33:07 +0530 Subject: [PATCH 45/70] Fixing security backup restore issue --- simplyblock_core/controllers/backup_controller.py | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/simplyblock_core/controllers/backup_controller.py b/simplyblock_core/controllers/backup_controller.py index e614fe411..82059c9f6 100644 --- a/simplyblock_core/controllers/backup_controller.py +++ b/simplyblock_core/controllers/backup_controller.py @@ -449,6 +449,7 @@ def restore_backup(backup_id, lvol_name, pool_id_or_name, cluster_id=None, if not target_node.lvstore: return None, f"Target node {restore_node_id} has no lvstore (S3 bdev requires lvstore)" + logger.info(f"Backup allowed hosts: {backup.allowed_hosts}") lvol_id, error = lvol_controller.add_lvol_ha( name=lvol_name, size=size, 
@@ -466,7 +467,8 @@ def restore_backup(backup_id, lvol_name, pool_id_or_name, cluster_id=None, use_comp=False, distr_vuid=0, lvol_priority_class=0, - allowed_hosts=backup.allowed_hosts, + allowed_hosts=[h["nqn"] if isinstance(h, dict) else h + for h in (backup.allowed_hosts or [])] or None, fabric="tcp", ) if error or not lvol_id: From 9834b62c07d92066b68c5c73d23e38bae618ca1e Mon Sep 17 00:00:00 2001 From: RaunakJalan Date: Fri, 3 Apr 2026 04:33:41 +0530 Subject: [PATCH 46/70] Revert "Fixing security backup restore issue" This reverts commit 7e610b018a29c1f3f8e7d731c91ce43474595f74. --- simplyblock_core/controllers/backup_controller.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/simplyblock_core/controllers/backup_controller.py b/simplyblock_core/controllers/backup_controller.py index 82059c9f6..e614fe411 100644 --- a/simplyblock_core/controllers/backup_controller.py +++ b/simplyblock_core/controllers/backup_controller.py @@ -449,7 +449,6 @@ def restore_backup(backup_id, lvol_name, pool_id_or_name, cluster_id=None, if not target_node.lvstore: return None, f"Target node {restore_node_id} has no lvstore (S3 bdev requires lvstore)" - logger.info(f"Backup allowed hosts: {backup.allowed_hosts}") lvol_id, error = lvol_controller.add_lvol_ha( name=lvol_name, size=size, @@ -467,8 +466,7 @@ def restore_backup(backup_id, lvol_name, pool_id_or_name, cluster_id=None, use_comp=False, distr_vuid=0, lvol_priority_class=0, - allowed_hosts=[h["nqn"] if isinstance(h, dict) else h - for h in (backup.allowed_hosts or [])] or None, + allowed_hosts=backup.allowed_hosts, fabric="tcp", ) if error or not lvol_id: From 596d2c3c7c08549566f04496c0e8db75514b12a6 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 3 Apr 2026 16:19:10 +0300 Subject: [PATCH 47/70] fix: handle KeyError when retrieving original snapshot for excluded nodes --- simplyblock_core/controllers/lvol_controller.py | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git 
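Patch 45 above (later reverted by patch 46) normalizes `backup.allowed_hosts`, which may mix plain NQN strings with `{"nqn": ...}` dicts, into a flat list of strings, or `None` when empty so downstream code treats it as "no restriction". The list comprehension from the diff, lifted into a standalone helper for clarity (the function name is illustrative):

```python
from typing import Optional

def normalize_allowed_hosts(allowed_hosts) -> Optional[list]:
    """Sketch of the normalization from patch 45: accept None, NQN strings,
    or {"nqn": ...} dicts, and return a flat list of NQN strings or None."""
    hosts = [h["nqn"] if isinstance(h, dict) else h
             for h in (allowed_hosts or [])]
    return hosts or None
```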
a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 6f3fed266..bb405e522 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2181,8 +2181,11 @@ def replication_start(lvol_id, replication_cluster_id=None): if lvol.cloned_from_snap: lvol_snap = db_controller.get_snapshot_by_id(lvol.cloned_from_snap) if lvol_snap.source_replicated_snap_uuid: - org_snap = db_controller.get_snapshot_by_id(lvol_snap.source_replicated_snap_uuid) - excluded_nodes.append(org_snap.lvol.node_id) + try: + org_snap = db_controller.get_snapshot_by_id(lvol_snap.source_replicated_snap_uuid) + excluded_nodes.append(org_snap.lvol.node_id) + except KeyError: + pass snode = db_controller.get_storage_node_by_id(lvol.node_id) cluster = db_controller.get_cluster_by_id(snode.cluster_id) if not replication_cluster_id: From 01a9c0a35f4c019c97fdb1bd4ba262f3b25e6509 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 3 Apr 2026 17:22:50 +0300 Subject: [PATCH 48/70] fix: remove unnecessary parameter from nvmf_subsystem_add_ns call --- simplyblock_core/storage_node_ops.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_core/storage_node_ops.py b/simplyblock_core/storage_node_ops.py index 753fd21dd..c328d4825 100644 --- a/simplyblock_core/storage_node_ops.py +++ b/simplyblock_core/storage_node_ops.py @@ -4074,7 +4074,7 @@ def add_lvol_thread(lvol, snode, lvol_ana_state="optimized"): return False, msg logger.info("Add BDev to subsystem "+f"{lvol.vuid:016X}") - ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, nsid=lvol.ns_id, eui64=f"{lvol.vuid:016X}") + ret = rpc_client.nvmf_subsystem_add_ns(lvol.nqn, lvol.top_bdev, lvol.uuid, lvol.guid, nsid=lvol.ns_id) # Use per-lvstore port for this lvol's lvstore listener_port = snode.get_lvol_subsys_port(lvol.lvs_name) for iface in snode.data_nics: From 
f59bb41a662bbedbbe7369a2ca34b0c060c24202 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Fri, 3 Apr 2026 18:29:37 +0300 Subject: [PATCH 49/70] fix: update snapshot instance access to use dictionary syntax --- simplyblock_core/controllers/tasks_controller.py | 2 +- simplyblock_core/services/snapshot_replication.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/simplyblock_core/controllers/tasks_controller.py b/simplyblock_core/controllers/tasks_controller.py index fb4966af1..8c6b0fc36 100644 --- a/simplyblock_core/controllers/tasks_controller.py +++ b/simplyblock_core/controllers/tasks_controller.py @@ -530,7 +530,7 @@ def add_backup_merge_task(cluster_id, node_id, keep_backup_id, old_backup_id): def _check_snap_instance_on_node(snapshot_id: str , node_id: str): snapshot = db.get_snapshot_by_id(snapshot_id) for sn_inst in snapshot.instances: - if sn_inst.lvol.node_id == node_id: + if sn_inst["lvol"]["node_id"] == node_id: logger.info("Snapshot instance found on node, skip adding replication task") return diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py index f8ba7d662..8c244c505 100644 --- a/simplyblock_core/services/snapshot_replication.py +++ b/simplyblock_core/services/snapshot_replication.py @@ -158,7 +158,7 @@ def process_snap_replicate_finish(task, snapshot): try: prev_snap = db.get_snapshot_by_id(snapshot.snap_ref_id) for sn_inst in prev_snap.instances: - if sn_inst.lvol.node_id == remote_snode.get_id(): + if sn_inst["lvol"]["node_id"] == remote_snode.get_id(): target_prev_snap = sn_inst break except KeyError as e: From e239b8744c4a6c8eda29ac2f8baed0cbd98a837d Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Sun, 5 Apr 2026 16:56:06 +0300 Subject: [PATCH 50/70] Fix sfam-2678 --- simplyblock_cli/clibase.py | 2 +- simplyblock_core/controllers/snapshot_controller.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/simplyblock_cli/clibase.py 
b/simplyblock_cli/clibase.py index d4ed091f0..4cb7e844f 100644 --- a/simplyblock_cli/clibase.py +++ b/simplyblock_cli/clibase.py @@ -814,7 +814,7 @@ def snapshot__get(self, sub_command, args): return snapshot_controller.get(args.snapshot_id) def snapshot__set(self, sub_command, args): - return snapshot_controller.set(args.snapshot_id, args.attr_name, args.attr_value) + return snapshot_controller.set_value(args.snapshot_id, args.attr_name, args.attr_value) def qos__add(self, sub_command, args): return qos_controller.add_class(args.name, args.weight, args.cluster_id) diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py index 141db2502..938b986ef 100644 --- a/simplyblock_core/controllers/snapshot_controller.py +++ b/simplyblock_core/controllers/snapshot_controller.py @@ -833,7 +833,7 @@ def get(snapshot_uuid): return json.dumps(snap.get_clean_dict(), indent=2) -def set(snapshot_uuid, attr, value) -> bool: +def set_value(snapshot_uuid, attr, value) -> bool: try: snap = db_controller.get_snapshot_by_id(snapshot_uuid) except KeyError: From 4f6a24bd10fb7df0d671056620a56d2f42791173 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 6 Apr 2026 09:59:05 +0100 Subject: [PATCH 51/70] added a fix for sfam-2677 --- simplyblock_core/controllers/lvol_events.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/simplyblock_core/controllers/lvol_events.py b/simplyblock_core/controllers/lvol_events.py index b0116061a..a5c079a2d 100644 --- a/simplyblock_core/controllers/lvol_events.py +++ b/simplyblock_core/controllers/lvol_events.py @@ -22,7 +22,10 @@ def _lvol_event(lvol, message, caused_by, event): node_id=lvol.get_id()) if cluster.mode == "kubernetes": pool = db_controller.get_pool_by_id(lvol.pool_uuid) - + + if not pool.lvols_cr_name: + return + if event == ec.EVENT_OBJ_CREATED: crypto_key=( (lvol.crypto_key1, lvol.crypto_key2) From 2a622c709f4b600767180aaa586a6ec18693f600 Mon Sep 17 00:00:00 
2001 From: geoffrey1330 Date: Mon, 6 Apr 2026 11:00:52 +0100 Subject: [PATCH 52/70] added check if master lvol is namespaced --- simplyblock_core/controllers/snapshot_controller.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_core/controllers/snapshot_controller.py b/simplyblock_core/controllers/snapshot_controller.py index 938b986ef..73e3d60d9 100644 --- a/simplyblock_core/controllers/snapshot_controller.py +++ b/simplyblock_core/controllers/snapshot_controller.py @@ -590,7 +590,7 @@ def clone(snapshot_id, clone_name, new_size=0, pvc_name=None, pvc_namespace=None else: master_lvol = source_lvol - if master_lvol.max_namespace_per_subsys > 1: + if master_lvol.namespace and master_lvol.max_namespace_per_subsys > 1 : # Count how many lvols currently share this master's subsystem ns_count = 0 for lv in db_controller.get_lvols(cluster.get_id()): From e4d84fe491ef584e7c032a9d783f36af812b262e Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Mon, 6 Apr 2026 14:29:05 +0100 Subject: [PATCH 53/70] updated logic for calculate_unisolated_cores --- simplyblock_core/utils/__init__.py | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/simplyblock_core/utils/__init__.py b/simplyblock_core/utils/__init__.py index 874427700..f92177d28 100644 --- a/simplyblock_core/utils/__init__.py +++ b/simplyblock_core/utils/__init__.py @@ -1436,8 +1436,7 @@ def calculate_unisolated_cores(cores, cores_percentage=0): total = len(cores) if cores_percentage: n = math.ceil(total * (100 - cores_percentage) / 100) - n_even = (n + 1) // 2 * 2 - return n_even + return n if total <= 10: return 2 if total <= 20: @@ -1445,8 +1444,7 @@ def calculate_unisolated_cores(cores, cores_percentage=0): if total <= 28: return 4 n = math.ceil(total * 0.15) - n_even = (n + 1) // 2 * 2 - return n_even + return n def get_core_indexes(core_to_index, list_of_cores): From f88574ae481652e8cd16f229d0d948ed8a26e9ba Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Tue, 7 Apr 
2026 12:10:02 +0100 Subject: [PATCH 54/70] make storageclass configurable and hostpath optional --- .../charts/templates/csi-hostpath-driverinfo.yaml | 3 ++- .../charts/templates/csi-hostpath-plugin.yaml | 3 ++- .../scripts/charts/templates/foundationdb.yaml | 4 +++- .../scripts/charts/templates/mongodb.yaml | 8 ++++++-- .../charts/templates/monitoring_configmap.yaml | 11 ----------- .../scripts/charts/templates/monitoring_k8s.yaml | 2 -- .../scripts/charts/templates/storage_class.yaml | 4 +++- simplyblock_core/scripts/charts/values.yaml | 13 ++++++++----- 8 files changed, 24 insertions(+), 24 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/csi-hostpath-driverinfo.yaml b/simplyblock_core/scripts/charts/templates/csi-hostpath-driverinfo.yaml index 2a9d7d044..d0c4adea5 100644 --- a/simplyblock_core/scripts/charts/templates/csi-hostpath-driverinfo.yaml +++ b/simplyblock_core/scripts/charts/templates/csi-hostpath-driverinfo.yaml @@ -1,3 +1,4 @@ +{{- if .Values.csiHostpathDriver.enabled }} apiVersion: storage.k8s.io/v1 kind: CSIDriver metadata: @@ -21,4 +22,4 @@ spec: # Kubernetes may use fsGroup to change permissions and ownership # of the volume to match user requested fsGroup in the pod's SecurityPolicy fsGroupPolicy: File - \ No newline at end of file +{{- end }} diff --git a/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml b/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml index aa645bff4..c685cb49f 100644 --- a/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml +++ b/simplyblock_core/scripts/charts/templates/csi-hostpath-plugin.yaml @@ -1,3 +1,4 @@ +{{- if .Values.csiHostpathDriver.enabled }} apiVersion: v1 kind: ServiceAccount metadata: @@ -229,4 +230,4 @@ spec: path: /dev type: Directory name: dev-dir - \ No newline at end of file +{{- end }} diff --git a/simplyblock_core/scripts/charts/templates/foundationdb.yaml b/simplyblock_core/scripts/charts/templates/foundationdb.yaml index 
96d1c1979..35ca075f3 100644 --- a/simplyblock_core/scripts/charts/templates/foundationdb.yaml +++ b/simplyblock_core/scripts/charts/templates/foundationdb.yaml @@ -284,7 +284,9 @@ spec: runAsUser: 0 volumeClaimTemplate: spec: - storageClassName: local-hostpath + {{- if .Values.storageclass.name }} + storageClassName: {{ .Values.storageclass.name }} + {{- end }} accessModes: - ReadWriteOnce resources: diff --git a/simplyblock_core/scripts/charts/templates/mongodb.yaml b/simplyblock_core/scripts/charts/templates/mongodb.yaml index 6c004f314..399768719 100644 --- a/simplyblock_core/scripts/charts/templates/mongodb.yaml +++ b/simplyblock_core/scripts/charts/templates/mongodb.yaml @@ -15,7 +15,9 @@ spec: name: data-volume spec: accessModes: [ "ReadWriteOnce" ] - storageClassName: local-hostpath + {{- if .Values.storageclass.name }} + storageClassName: {{ .Values.storageclass.name }} + {{- end }} resources: requests: storage: 5Gi @@ -23,7 +25,9 @@ spec: name: logs-volume spec: accessModes: [ "ReadWriteOnce" ] - storageClassName: local-hostpath + {{- if .Values.storageclass.name }} + storageClassName: {{ .Values.storageclass.name }} + {{- end }} resources: requests: storage: 5Gi diff --git a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml index bc20ffb9d..25ef6faab 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_configmap.yaml @@ -69,17 +69,6 @@ data: access: proxy uid: PBFA97CFB590B2093 editable: true - - name: GRAYLOG - type: elasticsearch - url: http://opensearch-cluster-master:9200 - isDefault: false - access: proxy - uid: graylog_uid - editable: true - jsonData: - index: '*' - interval: Hourly - timeField: 'timestamp' --- apiVersion: v1 diff --git a/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml b/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml index 
831c848d8..99814293b 100644 --- a/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml +++ b/simplyblock_core/scripts/charts/templates/monitoring_k8s.yaml @@ -432,8 +432,6 @@ spec: value: "true" - name: GF_PATHS_PROVISIONING value: "/etc/grafana/provisioning" - - name: GF_INSTALL_PLUGINS - value: "grafana-opensearch-datasource" - name: GF_SERVER_ROOT_URL value: "http://localhost/grafana" volumeMounts: diff --git a/simplyblock_core/scripts/charts/templates/storage_class.yaml b/simplyblock_core/scripts/charts/templates/storage_class.yaml index b23cb4a07..274527230 100644 --- a/simplyblock_core/scripts/charts/templates/storage_class.yaml +++ b/simplyblock_core/scripts/charts/templates/storage_class.yaml @@ -1,8 +1,9 @@ +{{- if .Values.csiHostpathDriver.enabled }} --- apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: - name: local-hostpath + name: {{ .Values.storageclass.name | default "local-hostpath" }} labels: app.kubernetes.io/instance: hostpath.csi.k8s.io app.kubernetes.io/part-of: csi-driver-host-path @@ -21,3 +22,4 @@ allowedTopologies: - {{ . 
}} {{- end }} {{- end }} +{{- end }} diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 60d794513..76de4ce82 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -1,4 +1,3 @@ - observability: enabled: true secret: "sWbpOgba1bKnCfcPkVQi" @@ -38,9 +37,14 @@ nodeSelector: ports: lvolNvmfPortStart: 9100 -storageclass: +csiHostpathDriver: + enabled: false + +storageclass: + name: allowedTopologyZones: [] + foundationdb: multiAZ: false @@ -90,7 +94,7 @@ opensearch: persistence: enabled: true image: busybox - storageClass: local-hostpath + storageClass: size: 20Gi image: repository: "opensearchproject/opensearch" @@ -164,7 +168,7 @@ prometheus: persistentVolume: enabled: true size: 5Gi - storageClass: local-hostpath + storageClass: extraArgs: storage.tsdb.min-block-duration: 2h storage.tsdb.max-block-duration: 2h @@ -297,7 +301,6 @@ simplyblock: enableCpuTopology: false workerNodes: - devices: name: simplyblock-devices From 6958d4a345a9a9c63b4d5c8a60650fe1fca7eeda Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Tue, 7 Apr 2026 21:15:41 +0300 Subject: [PATCH 55/70] fix: enhance snapshot retrieval for replication by checking target cluster tasks --- .../controllers/lvol_controller.py | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index bb405e522..630060318 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2534,6 +2534,25 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): snaps = sorted(snaps, key=lambda x: x.created_at) snapshot = snaps[-1] + if not snapshot: + target_node = db_controller.get_storage_node_by_id(lvol.replication_node_id) + logger.info(f"Looking for snapshot in target cluster: {target_node.cluster_id}") + for task in 
db_controller.get_job_tasks(target_node.cluster_id): + if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: + logger.debug(task) + try: + snap = db_controller.get_snapshot_by_id(task.function_params["snapshot_id"]) + except KeyError: + continue + + if snap.lvol.get_id() != lvol_id: + continue + snaps.append(snap) + + if snaps: + snaps = sorted(snaps, key=lambda x: x.created_at) + snapshot = snaps[-1] + if not snapshot: logger.error(f"Snapshot for replication not found for lvol: {lvol_id}") return False From a60e6b03a5e82c1155eddd0c9dbef9505fbbf606 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Wed, 8 Apr 2026 10:59:35 +0100 Subject: [PATCH 56/70] added script to collect k8s pod logs --- simplyblock_core/scripts/collect_pod_logs.sh | 160 +++++++++++++++++++ 1 file changed, 160 insertions(+) create mode 100755 simplyblock_core/scripts/collect_pod_logs.sh diff --git a/simplyblock_core/scripts/collect_pod_logs.sh b/simplyblock_core/scripts/collect_pod_logs.sh new file mode 100755 index 000000000..472f76a06 --- /dev/null +++ b/simplyblock_core/scripts/collect_pod_logs.sh @@ -0,0 +1,160 @@ +#!/bin/bash +set -euo pipefail + +usage() { + echo "Usage: $0 -n <namespace> -f <from> -d <duration> [-o <output_dir>]" + echo "" + echo " -n <namespace> Kubernetes namespace" + echo " -f <from> Start time (RFC3339 or relative, e.g. '2026-04-08T10:00:00Z' or '2h' or '30m')" + echo " -d <duration> Duration to collect from start time (e.g.
'30m', '1h', '2h30m')" + echo " -o <output_dir> Output directory (default: ./pod-logs)" + echo "" + echo "Examples:" + echo " $0 -n simplyblock -f 2026-04-08T10:00:00Z -d 1h" + echo " $0 -n simplyblock -f 2h -d 30m -o ./logs" + exit 1 +} + +parse_duration_seconds() { + local dur="$1" + local total=0 + local tmp="$dur" + + if [[ "$tmp" =~ ^([0-9]+)h ]]; then total=$((total + ${BASH_REMATCH[1]} * 3600)); tmp="${tmp#*h}"; fi + if [[ "$tmp" =~ ^([0-9]+)m ]]; then total=$((total + ${BASH_REMATCH[1]} * 60)); tmp="${tmp#*m}"; fi + if [[ "$tmp" =~ ^([0-9]+)s ]]; then total=$((total + ${BASH_REMATCH[1]})); fi + + echo "$total" +} + +to_epoch() { + local ts="$1" + # If it looks like a relative duration (e.g. "2h", "30m"), treat as "now minus that offset" + if [[ "$ts" =~ ^[0-9]+[hms] ]] || [[ "$ts" =~ ^[0-9]+h[0-9]+m ]]; then + local secs + secs=$(parse_duration_seconds "$ts") + echo $(( $(date -u +%s) - secs )) + else + date -u -d "$ts" +%s 2>/dev/null || date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$ts" +%s + fi +} + +NAMESPACE="" +FROM="" +DURATION="" +OUTPUT_DIR="./pod-logs" + +while getopts "n:f:d:o:" opt; do + case $opt in + n) NAMESPACE="$OPTARG" ;; + f) FROM="$OPTARG" ;; + d) DURATION="$OPTARG" ;; + o) OUTPUT_DIR="$OPTARG" ;; + *) usage ;; + esac +done + +[[ -z "$NAMESPACE" || -z "$FROM" || -z "$DURATION" ]] && usage + +FROM_EPOCH=$(to_epoch "$FROM") +DURATION_SECS=$(parse_duration_seconds "$DURATION") +UNTIL_EPOCH=$(( FROM_EPOCH + DURATION_SECS )) + +FROM_TS=$(date -u -d "@$FROM_EPOCH" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -r "$FROM_EPOCH" '+%Y-%m-%dT%H:%M:%SZ') +UNTIL_TS=$(date -u -d "@$UNTIL_EPOCH" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -r "$UNTIL_EPOCH" '+%Y-%m-%dT%H:%M:%SZ') + +mkdir -p "$OUTPUT_DIR" + +echo "Namespace : $NAMESPACE" +echo "From : $FROM_TS" +echo "Until : $UNTIL_TS" +echo "Output dir : $OUTPUT_DIR" +echo "" + +pods=$(kubectl get pods -n "$NAMESPACE" --no-headers -o custom-columns=":metadata.name" 2>/dev/null) + +if [[ -z "$pods" ]]; then + echo "No pods
found in namespace: $NAMESPACE" + exit 0 +fi + +for pod in $pods; do + containers=$(kubectl get pod "$pod" -n "$NAMESPACE" \ + -o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}{range .spec.containers[*]}{.name}{"\n"}{end}' 2>/dev/null) + + for container in $containers; do + log_file="${OUTPUT_DIR}/${pod}_${container}.log" + echo " -> $pod / $container" + + { + echo "=== Pod: $pod | Container: $container | Namespace: $NAMESPACE ===" + echo "=== From: $FROM_TS | Until: $UNTIL_TS ===" + echo "" + kubectl logs "$pod" -c "$container" -n "$NAMESPACE" \ + --timestamps \ + --since-time="$FROM_TS" 2>&1 \ + | awk -v until="$UNTIL_TS" ' + /^[0-9]{4}-[0-9]{2}-[0-9]{2}T/ { + split($1, a, /[TZ\+]/); + ts = a[1] "T" a[2] "Z"; + if (ts > until) exit + } + { print } + ' || true + } > "$log_file" + done +done + +# --- dmesg from simplyblock-csi-node pods (container: csi-node) --- +echo "" +echo "Collecting dmesg from simplyblock-csi-node pods..." + +csi_node_pods=$(kubectl get pods -n "$NAMESPACE" --no-headers \ + -o custom-columns=":metadata.name" 2>/dev/null \ + | grep '^simplyblock-csi-node' || true) + +if [[ -z "$csi_node_pods" ]]; then + echo " No simplyblock-csi-node pods found." +else + # dmesg timestamps are seconds since boot; compute the boot time of each pod's node + # to convert dmesg relative times to wall clock and filter by window. 
+ for pod in $csi_node_pods; do + dmesg_file="${OUTPUT_DIR}/${pod}_csi-node_dmesg.log" + echo " -> $pod / csi-node (dmesg)" + + { + echo "=== Pod: $pod | Container: csi-node | dmesg ===" + echo "=== From: $FROM_TS | Until: $UNTIL_TS ===" + echo "" + + # Get node boot epoch: current time minus kernel uptime + boot_epoch=$(kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ + awk '{print int(systime() - $1)}' /proc/uptime 2>/dev/null) || boot_epoch=0 + + kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ + dmesg --kernel --time-format=reltime --nopager 2>/dev/null \ + | awk -v boot="$boot_epoch" -v from="$FROM_EPOCH" -v until="$UNTIL_EPOCH" ' + /^\[/ { + # reltime format: [Mar 8 10:00:00.000000] + # fall back to using dmesg monotonic seconds if reltime unavailable + } + { print } + ' || \ + kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ + dmesg --kernel --nopager 2>/dev/null \ + | awk -v boot="$boot_epoch" -v from="$FROM_EPOCH" -v until="$UNTIL_EPOCH" ' + /^\[[ ]*([0-9]+\.[0-9]+)\]/ { + match($0, /\[[ ]*([0-9]+\.[0-9]+)\]/, a) + wall = boot + int(a[1]) + if (wall < from) next + if (wall > until) exit + } + { print } + ' || true + } > "$dmesg_file" + done +fi + +echo "" +echo "Done. 
Logs written to: $OUTPUT_DIR" +ls -lh "$OUTPUT_DIR" From a18fc929e0a7998846e8629bba3a198481e49078 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 8 Apr 2026 15:15:29 +0300 Subject: [PATCH 57/70] fix: update snapshot chaining to use dictionary syntax and add error handling --- .../services/snapshot_replication.py | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py index 8c244c505..6a88d6c47 100644 --- a/simplyblock_core/services/snapshot_replication.py +++ b/simplyblock_core/services/snapshot_replication.py @@ -166,8 +166,8 @@ def process_snap_replicate_finish(task, snapshot): # chain snaps on primary if target_prev_snap: - logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}") - ret = remote_snode.rpc_client().bdev_lvol_add_clone( remote_lv.top_bdev, target_prev_snap.snap_bdev) + logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap['snap_bdev']}") + ret = remote_snode.rpc_client().bdev_lvol_add_clone( remote_lv.top_bdev, target_prev_snap['snap_bdev']) if not ret: logger.error("Failed to chain replicated snapshot on primary node") return False @@ -182,8 +182,8 @@ def process_snap_replicate_finish(task, snapshot): sec_node = db.get_storage_node_by_id(remote_snode.secondary_node_id) if sec_node.status == StorageNode.STATUS_ONLINE: if target_prev_snap: - logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap.snap_bdev}") - ret = sec_node.rpc_client().bdev_lvol_add_clone(remote_lv.top_bdev, target_prev_snap.snap_bdev) + logger.info(f"Chaining replicated lvol: {remote_lv.top_bdev} to snap: {target_prev_snap['snap_bdev']}") + ret = sec_node.rpc_client().bdev_lvol_add_clone(remote_lv.top_bdev, target_prev_snap['snap_bdev']) if not ret: logger.error("Failed to chain replicated snapshot on secondary node") return False @@ 
-218,10 +218,13 @@ def process_snap_replicate_finish(task, snapshot): snapshot.target_replicated_snap_uuid = new_snapshot_uuid new_snapshot.source_replicated_snap_uuid = snapshot.uuid - if target_prev_snap: - new_snapshot.prev_snap_uuid = target_prev_snap.get_id() - target_prev_snap.next_snap_uuid = new_snapshot_uuid - target_prev_snap.write_to_db() + try: + if target_prev_snap: + new_snapshot.prev_snap_uuid = target_prev_snap.get_id() + target_prev_snap.next_snap_uuid = new_snapshot_uuid + target_prev_snap.write_to_db() + except Exception as e: + logger.error(e) new_snapshot.write_to_db() From 11b847829e40bccdfb61eb025e4b17705d3dd026 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 8 Apr 2026 16:18:13 +0300 Subject: [PATCH 58/70] fix: improve snapshot replication by verifying target LVol existence --- simplyblock_core/controllers/lvol_controller.py | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 630060318..4ab2d77d8 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2537,6 +2537,16 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): if not snapshot: target_node = db_controller.get_storage_node_by_id(lvol.replication_node_id) logger.info(f"Looking for snapshot in target cluster: {target_node.cluster_id}") + target_lvol_id = None + for lv in db_controller.get_lvols(target_node.cluster_id): + if lv.nqn == lvol.nqn: + logger.info(f"LVol with same nqn already exists on target cluster: {lv.get_id()}") + target_lvol_id = lv.get_id() + + if not target_lvol_id: + logger.error(f"LVol with same nqn does not exist on target cluster: {target_node.cluster_id}") + return False + for task in db_controller.get_job_tasks(target_node.cluster_id): if task.function_name == JobSchedule.FN_SNAPSHOT_REPLICATION: logger.debug(task) @@ -2545,7 +2555,7 @@ 
def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): except KeyError: continue - if snap.lvol.get_id() != lvol_id: + if snap.lvol.get_id() != target_lvol_id: continue snaps.append(snap) From 6c940b0b506a30d0e96c426bcb13df8641d93f39 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 8 Apr 2026 16:21:10 +0300 Subject: [PATCH 59/70] fix: update lvol NQN comparison for target cluster snapshot verification --- simplyblock_core/controllers/lvol_controller.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 4ab2d77d8..20d1c6875 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2538,9 +2538,10 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): target_node = db_controller.get_storage_node_by_id(lvol.replication_node_id) logger.info(f"Looking for snapshot in target cluster: {target_node.cluster_id}") target_lvol_id = None + lvol_id_in_nqn = lvol.nqn.split(":")[-1] for lv in db_controller.get_lvols(target_node.cluster_id): - if lv.nqn == lvol.nqn: - logger.info(f"LVol with same nqn already exists on target cluster: {lv.get_id()}") + if lv.nqn.split(":")[-1] == lvol_id_in_nqn: + logger.info(f"LVol with same lvol nqn already exists on target cluster: {lv.get_id()}") target_lvol_id = lv.get_id() if not target_lvol_id: From ed754ae145dfd86520bf4e136121a1178c930490 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Wed, 8 Apr 2026 18:24:41 +0300 Subject: [PATCH 60/70] fix: retrieve target replicated snapshot UUID for replication --- simplyblock_core/controllers/lvol_controller.py | 1 + 1 file changed, 1 insertion(+) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 20d1c6875..3a8e26727 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ 
b/simplyblock_core/controllers/lvol_controller.py @@ -2563,6 +2563,7 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): if snaps: snaps = sorted(snaps, key=lambda x: x.created_at) snapshot = snaps[-1] + snapshot = snapshot.target_replicated_snap_uuid if not snapshot: logger.error(f"Snapshot for replication not found for lvol: {lvol_id}") From e252771de424279d8aa29c55c38f4bd3ba9e71e8 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Thu, 9 Apr 2026 13:01:51 +0100 Subject: [PATCH 61/70] use alpine:3.21.3 image for init job container --- simplyblock_web/templates/storage_init_job.yaml.j2 | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/simplyblock_web/templates/storage_init_job.yaml.j2 b/simplyblock_web/templates/storage_init_job.yaml.j2 index 466b3dbbe..ca4a1e712 100644 --- a/simplyblock_web/templates/storage_init_job.yaml.j2 +++ b/simplyblock_web/templates/storage_init_job.yaml.j2 @@ -23,7 +23,7 @@ spec: path: /proc containers: - name: init-setup - image: simplyblock/ubuntu-tools:22.04 + image: alpine:3.21.3 securityContext: privileged: true volumeMounts: @@ -33,6 +33,7 @@ spec: args: - | set -e + apk add --no-cache curl iproute2 util-linux >/dev/null echo "--- Starting init setup ---" From f6034ec0298c7b1da1b211cfa40715ef3674b4a2 Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Thu, 9 Apr 2026 15:09:05 +0300 Subject: [PATCH 62/70] fix: retrieve snapshot details using database controller for replication --- simplyblock_core/controllers/lvol_controller.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/simplyblock_core/controllers/lvol_controller.py b/simplyblock_core/controllers/lvol_controller.py index 3a8e26727..1b1003a16 100644 --- a/simplyblock_core/controllers/lvol_controller.py +++ b/simplyblock_core/controllers/lvol_controller.py @@ -2563,7 +2563,7 @@ def replicate_lvol_on_source_cluster(lvol_id, cluster_id=None, pool_uuid=None): if snaps: snaps = sorted(snaps, key=lambda x: x.created_at) 
snapshot = snaps[-1] - snapshot = snapshot.target_replicated_snap_uuid + snapshot = db_controller.get_snapshot_by_id(snapshot.target_replicated_snap_uuid) if not snapshot: logger.error(f"Snapshot for replication not found for lvol: {lvol_id}") From 15b1c35a968edff3474571fa7ae3b6f43b43afdd Mon Sep 17 00:00:00 2001 From: hamdykhader Date: Thu, 9 Apr 2026 16:05:04 +0300 Subject: [PATCH 63/70] fix: add error handling for missing storage node during snapshot replication --- simplyblock_core/services/snapshot_replication.py | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/simplyblock_core/services/snapshot_replication.py b/simplyblock_core/services/snapshot_replication.py index 6a88d6c47..2527853b0 100644 --- a/simplyblock_core/services/snapshot_replication.py +++ b/simplyblock_core/services/snapshot_replication.py @@ -23,7 +23,15 @@ def process_snap_replicate_start(task, snapshot): if "remote_lvol_id" not in task.function_params or not task.function_params["remote_lvol_id"]: if replicate_to_source: org_snap = db.get_snapshot_by_id(snapshot.source_replicated_snap_uuid) - remote_node_uuid = db.get_storage_node_by_id(task.node_id) + try: + remote_node_uuid = db.get_storage_node_by_id(task.node_id) + except KeyError: + msg = f"Unable to find node: {task.node_id}, stopping task" + logger.error(msg) + task.function_result = msg + task.status = JobSchedule.STATUS_DONE + task.write_to_db() + return remote_pool_uuid = org_snap.lvol.pool_uuid else: # replicate to target remote_node_uuid = db.get_storage_node_by_id(snapshot.lvol.replication_node_id) From 58e7224cb8fa233333260c5db4160eed98b9b02a Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 10 Apr 2026 09:27:21 +0100 Subject: [PATCH 64/70] removed script collect_logs.sh --- simplyblock_core/scripts/collect_pod_logs.sh | 160 ------------------- 1 file changed, 160 deletions(-) delete mode 100755 simplyblock_core/scripts/collect_pod_logs.sh diff --git a/simplyblock_core/scripts/collect_pod_logs.sh 
b/simplyblock_core/scripts/collect_pod_logs.sh deleted file mode 100755 index 472f76a06..000000000 --- a/simplyblock_core/scripts/collect_pod_logs.sh +++ /dev/null @@ -1,160 +0,0 @@ -#!/bin/bash -set -euo pipefail - -usage() { - echo "Usage: $0 -n <namespace> -f <from> -d <duration> [-o <output_dir>]" - echo "" - echo " -n <namespace> Kubernetes namespace" - echo " -f <from> Start time (RFC3339 or relative, e.g. '2026-04-08T10:00:00Z' or '2h' or '30m')" - echo " -d <duration> Duration to collect from start time (e.g. '30m', '1h', '2h30m')" - echo " -o <output_dir> Output directory (default: ./pod-logs)" - echo "" - echo "Examples:" - echo " $0 -n simplyblock -f 2026-04-08T10:00:00Z -d 1h" - echo " $0 -n simplyblock -f 2h -d 30m -o ./logs" - exit 1 -} - -parse_duration_seconds() { - local dur="$1" - local total=0 - local tmp="$dur" - - if [[ "$tmp" =~ ^([0-9]+)h ]]; then total=$((total + ${BASH_REMATCH[1]} * 3600)); tmp="${tmp#*h}"; fi - if [[ "$tmp" =~ ^([0-9]+)m ]]; then total=$((total + ${BASH_REMATCH[1]} * 60)); tmp="${tmp#*m}"; fi - if [[ "$tmp" =~ ^([0-9]+)s ]]; then total=$((total + ${BASH_REMATCH[1]})); fi - - echo "$total" -} - -to_epoch() { - local ts="$1" - # If it looks like a relative duration (e.g.
"2h", "30m"), treat as "now minus that offset" - if [[ "$ts" =~ ^[0-9]+[hms] ]] || [[ "$ts" =~ ^[0-9]+h[0-9]+m ]]; then - local secs - secs=$(parse_duration_seconds "$ts") - echo $(( $(date -u +%s) - secs )) - else - date -u -d "$ts" +%s 2>/dev/null || date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$ts" +%s - fi -} - -NAMESPACE="" -FROM="" -DURATION="" -OUTPUT_DIR="./pod-logs" - -while getopts "n:f:d:o:" opt; do - case $opt in - n) NAMESPACE="$OPTARG" ;; - f) FROM="$OPTARG" ;; - d) DURATION="$OPTARG" ;; - o) OUTPUT_DIR="$OPTARG" ;; - *) usage ;; - esac -done - -[[ -z "$NAMESPACE" || -z "$FROM" || -z "$DURATION" ]] && usage - -FROM_EPOCH=$(to_epoch "$FROM") -DURATION_SECS=$(parse_duration_seconds "$DURATION") -UNTIL_EPOCH=$(( FROM_EPOCH + DURATION_SECS )) - -FROM_TS=$(date -u -d "@$FROM_EPOCH" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -r "$FROM_EPOCH" '+%Y-%m-%dT%H:%M:%SZ') -UNTIL_TS=$(date -u -d "@$UNTIL_EPOCH" '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -r "$UNTIL_EPOCH" '+%Y-%m-%dT%H:%M:%SZ') - -mkdir -p "$OUTPUT_DIR" - -echo "Namespace : $NAMESPACE" -echo "From : $FROM_TS" -echo "Until : $UNTIL_TS" -echo "Output dir : $OUTPUT_DIR" -echo "" - -pods=$(kubectl get pods -n "$NAMESPACE" --no-headers -o custom-columns=":metadata.name" 2>/dev/null) - -if [[ -z "$pods" ]]; then - echo "No pods found in namespace: $NAMESPACE" - exit 0 -fi - -for pod in $pods; do - containers=$(kubectl get pod "$pod" -n "$NAMESPACE" \ - -o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}{range .spec.containers[*]}{.name}{"\n"}{end}' 2>/dev/null) - - for container in $containers; do - log_file="${OUTPUT_DIR}/${pod}_${container}.log" - echo " -> $pod / $container" - - { - echo "=== Pod: $pod | Container: $container | Namespace: $NAMESPACE ===" - echo "=== From: $FROM_TS | Until: $UNTIL_TS ===" - echo "" - kubectl logs "$pod" -c "$container" -n "$NAMESPACE" \ - --timestamps \ - --since-time="$FROM_TS" 2>&1 \ - | awk -v until="$UNTIL_TS" ' - /^[0-9]{4}-[0-9]{2}-[0-9]{2}T/ { - split($1, a, 
/[TZ\+]/); - ts = a[1] "T" a[2] "Z"; - if (ts > until) exit - } - { print } - ' || true - } > "$log_file" - done -done - -# --- dmesg from simplyblock-csi-node pods (container: csi-node) --- -echo "" -echo "Collecting dmesg from simplyblock-csi-node pods..." - -csi_node_pods=$(kubectl get pods -n "$NAMESPACE" --no-headers \ - -o custom-columns=":metadata.name" 2>/dev/null \ - | grep '^simplyblock-csi-node' || true) - -if [[ -z "$csi_node_pods" ]]; then - echo " No simplyblock-csi-node pods found." -else - # dmesg timestamps are seconds since boot; compute the boot time of each pod's node - # to convert dmesg relative times to wall clock and filter by window. - for pod in $csi_node_pods; do - dmesg_file="${OUTPUT_DIR}/${pod}_csi-node_dmesg.log" - echo " -> $pod / csi-node (dmesg)" - - { - echo "=== Pod: $pod | Container: csi-node | dmesg ===" - echo "=== From: $FROM_TS | Until: $UNTIL_TS ===" - echo "" - - # Get node boot epoch: current time minus kernel uptime - boot_epoch=$(kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ - awk '{print int(systime() - $1)}' /proc/uptime 2>/dev/null) || boot_epoch=0 - - kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ - dmesg --kernel --time-format=reltime --nopager 2>/dev/null \ - | awk -v boot="$boot_epoch" -v from="$FROM_EPOCH" -v until="$UNTIL_EPOCH" ' - /^\[/ { - # reltime format: [Mar 8 10:00:00.000000] - # fall back to using dmesg monotonic seconds if reltime unavailable - } - { print } - ' || \ - kubectl exec "$pod" -c csi-node -n "$NAMESPACE" -- \ - dmesg --kernel --nopager 2>/dev/null \ - | awk -v boot="$boot_epoch" -v from="$FROM_EPOCH" -v until="$UNTIL_EPOCH" ' - /^\[[ ]*([0-9]+\.[0-9]+)\]/ { - match($0, /\[[ ]*([0-9]+\.[0-9]+)\]/, a) - wall = boot + int(a[1]) - if (wall < from) next - if (wall > until) exit - } - { print } - ' || true - } > "$dmesg_file" - done -fi - -echo "" -echo "Done. 
Logs written to: $OUTPUT_DIR" -ls -lh "$OUTPUT_DIR" From ff3c7869367d28009fe8b3cb4de8ebac2c2c2e47 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 10 Apr 2026 10:09:21 +0100 Subject: [PATCH 65/70] updated the cr kind and group name --- ...ock.simplyblock.io_simplyblockdevices.yaml | 135 -------- ...block.simplyblock.io_simplyblocklvols.yaml | 144 --------- ...block.simplyblock.io_simplyblockpools.yaml | 96 ------ ...lyblock.io_simplyblockstorageclusters.yaml | 187 ------------ .../crds/storage.simplyblock.io_devices.yaml | 192 ++++++++++++ .../crds/storage.simplyblock.io_lvols.yaml | 193 ++++++++++++ .../crds/storage.simplyblock.io_pools.yaml | 134 ++++++++ ....simplyblock.io_snapshotreplications.yaml} | 18 +- ...torage.simplyblock.io_storageclusters.yaml | 288 ++++++++++++++++++ ... storage.simplyblock.io_storagenodes.yaml} | 148 ++++++--- ...yaml => storage.simplyblock.io_tasks.yaml} | 36 ++- .../templates/simplyblock_customresource.yaml | 24 +- 12 files changed, 966 insertions(+), 629 deletions(-) delete mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml delete mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml delete mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml delete mode 100644 simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml create mode 100644 simplyblock_core/scripts/charts/crds/storage.simplyblock.io_devices.yaml create mode 100644 simplyblock_core/scripts/charts/crds/storage.simplyblock.io_lvols.yaml create mode 100644 simplyblock_core/scripts/charts/crds/storage.simplyblock.io_pools.yaml rename simplyblock_core/scripts/charts/crds/{simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml => storage.simplyblock.io_snapshotreplications.yaml} (90%) create mode 100644 simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storageclusters.yaml 
rename simplyblock_core/scripts/charts/crds/{simplyblock.simplyblock.io_simplyblockstoragenodes.yaml => storage.simplyblock.io_storagenodes.yaml} (55%) rename simplyblock_core/scripts/charts/crds/{simplyblock.simplyblock.io_simplyblocktasks.yaml => storage.simplyblock.io_tasks.yaml} (60%) diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml deleted file mode 100644 index 272030736..000000000 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockdevices.yaml +++ /dev/null @@ -1,135 +0,0 @@ ---- -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblockdevices.simplyblock.simplyblock.io -spec: - group: simplyblock.simplyblock.io - names: - kind: SimplyBlockDevice - listKind: SimplyBlockDeviceList - plural: simplyblockdevices - singular: simplyblockdevice - scope: Namespaced - versions: - - name: v1alpha1 - schema: - openAPIV3Schema: - description: SimplyBlockDevice is the Schema for the simplyblockdevices API - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. 
- More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of SimplyBlockDevice - properties: - action: - enum: - - remove - - restart - type: string - clusterName: - type: string - deviceID: - type: string - nodeUUID: - type: string - required: - - clusterName - type: object - status: - description: status defines the observed state of SimplyBlockDevice - properties: - actionStatus: - properties: - action: - type: string - message: - type: string - nodeUUID: - type: string - observedGeneration: - format: int64 - type: integer - state: - type: string - triggered: - type: boolean - updatedAt: - format: date-time - type: string - type: object - nodes: - items: - properties: - devices: - items: - properties: - health: - type: string - model: - type: string - size: - type: string - stats: - items: - properties: - capacityUtil: - format: int64 - type: integer - riops: - format: int64 - type: integer - rtp: - format: int64 - type: integer - wiops: - format: int64 - type: integer - wtp: - format: int64 - type: integer - type: object - type: array - status: - type: string - utilization: - format: int64 - type: integer - uuid: - type: string - type: object - type: array - nodeUUID: - type: string - type: object - type: array - type: object - required: - - spec - type: object - x-kubernetes-validations: - - message: nodeUUID and deviceID are required when action is specified - rule: '!(has(self.spec.action) && self.spec.action != "" && ((!has(self.spec.nodeUUID) - || self.spec.nodeUUID == "") || (!has(self.spec.deviceID) || self.spec.deviceID - == "")))' - served: true - storage: true - subresources: - status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml deleted file mode 100644 index 
8e44a687d..000000000 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocklvols.yaml +++ /dev/null @@ -1,144 +0,0 @@ ---- -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblocklvols.simplyblock.simplyblock.io -spec: - group: simplyblock.simplyblock.io - names: - kind: SimplyBlockLvol - listKind: SimplyBlockLvolList - plural: simplyblocklvols - singular: simplyblocklvol - scope: Namespaced - versions: - - additionalPrinterColumns: - - jsonPath: .status.lvols.length() - name: LVOLs - type: integer - name: v1alpha1 - schema: - openAPIV3Schema: - description: SimplyBlockLvol is the Schema for the simplyblocklvols API - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. 
- More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of SimplyBlockLvol - properties: - clusterName: - type: string - poolName: - type: string - required: - - clusterName - - poolName - type: object - status: - description: status defines the observed state of SimplyBlockLvol - properties: - configured: - type: boolean - lvols: - items: - properties: - blobID: - format: int64 - type: integer - clonedFromSnap: - type: string - createDt: - format: date-time - type: string - fabric: - type: string - ha: - type: boolean - health: - type: boolean - hostname: - type: string - isCrypto: - type: boolean - lvolName: - type: string - maxNamespacesPerSubsystem: - format: int64 - type: integer - namespaceID: - format: int64 - type: integer - nodeUUID: - items: - type: string - type: array - nqn: - type: string - poolName: - type: string - poolUUID: - type: string - pvcName: - type: string - qosClass: - format: int64 - type: integer - qosIOPS: - format: int64 - type: integer - qosRTP: - format: int64 - type: integer - qosRWTP: - format: int64 - type: integer - qosWTP: - format: int64 - type: integer - size: - type: string - snapName: - type: string - status: - type: string - stripeWdata: - format: int64 - type: integer - stripeWparity: - format: int64 - type: integer - subsysPort: - format: int64 - type: integer - updateDt: - format: date-time - type: string - uuid: - type: string - type: object - type: array - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml deleted file mode 100644 index 693322dc3..000000000 --- 
a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockpools.yaml +++ /dev/null @@ -1,96 +0,0 @@ ---- -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblockpools.simplyblock.simplyblock.io -spec: - group: simplyblock.simplyblock.io - names: - kind: SimplyBlockPool - listKind: SimplyBlockPoolList - plural: simplyblockpools - singular: simplyblockpool - scope: Namespaced - versions: - - name: v1alpha1 - schema: - openAPIV3Schema: - description: SimplyBlockPool is the Schema for the pools API - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. 
- More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of Pool - properties: - action: - type: string - capacityLimit: - type: string - clusterName: - type: string - name: - type: string - qosIOPSLimit: - format: int32 - type: integer - rLimit: - format: int32 - type: integer - rwLimit: - format: int32 - type: integer - status: - type: string - wLimit: - format: int32 - type: integer - required: - - clusterName - - name - type: object - status: - description: status defines the observed state of Pool - properties: - qosHost: - type: string - qosIOPSLimit: - format: int32 - type: integer - rLimit: - format: int32 - type: integer - rwLimit: - format: int32 - type: integer - status: - type: string - uuid: - type: string - wLimit: - format: int32 - type: integer - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml b/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml deleted file mode 100644 index cfed158ff..000000000 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstorageclusters.yaml +++ /dev/null @@ -1,187 +0,0 @@ ---- -apiVersion: apiextensions.k8s.io/v1 -kind: CustomResourceDefinition -metadata: - annotations: - controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblockstorageclusters.simplyblock.simplyblock.io -spec: - group: simplyblock.simplyblock.io - names: - kind: SimplyBlockStorageCluster - listKind: SimplyBlockStorageClusterList - plural: simplyblockstorageclusters - singular: simplyblockstoragecluster - scope: Namespaced - versions: - - name: v1alpha1 - schema: - openAPIV3Schema: - description: SimplyBlockStorageCluster is the Schema for the 
simplyblockstorageclusters - API - properties: - apiVersion: - description: |- - APIVersion defines the versioned schema of this representation of an object. - Servers should convert recognized schemas to the latest internal value, and - may reject unrecognized values. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources - type: string - kind: - description: |- - Kind is a string value representing the REST resource this object represents. - Servers may infer this from the endpoint the client submits requests to. - Cannot be updated. - In CamelCase. - More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds - type: string - metadata: - type: object - spec: - description: spec defines the desired state of SimplyBlockStorageCluster - properties: - action: - enum: - - activate - - expand - type: string - blkSize: - format: int32 - type: integer - capCrit: - format: int32 - type: integer - capWarn: - format: int32 - type: integer - clientDataNic: - type: string - clientQpairCount: - format: int32 - type: integer - clusterName: - type: string - distrBs: - format: int32 - type: integer - distrChunkBs: - format: int32 - type: integer - enableNodeAffinity: - type: boolean - eventLogEntries: - format: int32 - type: integer - fabric: - type: string - haType: - type: string - includeEventLog: - type: boolean - inflightIOThreshold: - format: int32 - type: integer - isSingleNode: - type: boolean - maxFaultTolerance: - format: int32 - type: integer - maxQueueSize: - format: int32 - type: integer - mgmtIfc: - description: Create-only - type: string - nvmfBasePort: - format: int32 - type: integer - pageSizeInBlocks: - format: int32 - type: integer - provCapCrit: - format: int32 - type: integer - provCapWarn: - format: int32 - type: integer - qosClasses: - description: Updatable - type: string - qpairCount: - format: int32 - type: integer - rpcBasePort: - format: int32 - type: 
integer - snodeApiPort: - format: int32 - type: integer - strictNodeAntiAffinity: - type: boolean - stripeWdata: - format: int32 - type: integer - stripeWparity: - format: int32 - type: integer - required: - - clusterName - type: object - status: - description: status defines the observed state of SimplyBlockStorageCluster - properties: - MOD: - type: string - NQN: - type: string - UUID: - type: string - actionStatus: - properties: - action: - type: string - message: - type: string - nodeUUID: - type: string - observedGeneration: - format: int64 - type: integer - state: - type: string - triggered: - type: boolean - updatedAt: - format: date-time - type: string - type: object - clusterName: - type: string - configured: - type: boolean - created: - format: date-time - type: string - lastUpdated: - format: date-time - type: string - mgmtNodes: - format: int32 - type: integer - rebalancing: - type: boolean - secretName: - type: string - status: - type: string - storageNodes: - format: int32 - type: integer - type: object - required: - - spec - type: object - served: true - storage: true - subresources: - status: {} diff --git a/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_devices.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_devices.yaml new file mode 100644 index 000000000..077b3d92d --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_devices.yaml @@ -0,0 +1,192 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: devices.storage.simplyblock.io +spec: + group: storage.simplyblock.io + names: + kind: Device + listKind: DeviceList + plural: devices + singular: device + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Device is the Schema for the devices API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this 
representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of Device + properties: + action: + description: Action triggers an imperative device operation. + enum: + - remove + - restart + type: string + clusterName: + description: ClusterName is the target storage cluster name. + type: string + deviceID: + description: DeviceID is the backend device identifier used for actions. + type: string + nodeUUID: + description: NodeUUID scopes operations to a single storage node when + set. + type: string + required: + - clusterName + type: object + status: + description: status defines the observed state of Device + properties: + actionStatus: + description: ActionStatus tracks the lifecycle of the latest device + action. + properties: + action: + description: Action is the requested action name. + type: string + message: + description: Message is a human-readable action result or error. + type: string + nodeUUID: + description: NodeUUID is the target node UUID for the action. + type: string + observedGeneration: + description: ObservedGeneration is the resource generation observed + by this status. + format: int64 + type: integer + state: + type: string + triggered: + description: Triggered indicates whether the underlying backend + action has been fired. + type: boolean + updatedAt: + description: UpdatedAt is the timestamp of the last status transition. 
+ format: date-time + type: string + type: object + nodes: + description: Nodes contains observed devices grouped by storage node. + items: + properties: + devices: + description: Devices is the observed device inventory for the + node. + items: + properties: + health: + description: Health is the backend health indicator for + the device. + type: string + model: + description: Model is the reported device model. + type: string + size: + description: Size is the formatted device capacity value. + type: string + stats: + description: Stats is the time-series/statistics collection + for the device. + items: + properties: + iops: + description: |- + IOPS contains read/write IOPS values. + FIXME: Unused for now + properties: + read: + description: |- + Read is the read IOPS metric. + FIXME: Unused for now + format: int64 + type: integer + write: + description: |- + Write is the write IOPS metric. + FIXME: Unused for now + format: int64 + type: integer + type: object + throughput: + description: |- + Throughput contains read/write throughput values. + FIXME: Unused for now + properties: + read: + description: |- + Read is the read throughput metric. + FIXME: Unused for now + format: int64 + type: integer + write: + description: |- + Write is the write throughput metric. + FIXME: Unused for now + format: int64 + type: integer + type: object + utilizedCapacity: + description: |- + UtilizedCapacity is the used-capacity metric for the device. + FIXME: Unused for now + format: int64 + type: integer + type: object + type: array + status: + description: Status is the backend lifecycle status of + the device. + type: string + utilization: + description: Utilization is the backend utilization metric. + format: int64 + type: integer + uuid: + description: UUID is the backend device UUID. + type: string + type: object + type: array + nodeUUID: + description: NodeUUID is the backend node UUID owning the listed + devices. 
+ type: string + type: object + type: array + type: object + required: + - spec + type: object + x-kubernetes-validations: + - message: nodeUUID and deviceID are required when action is specified + rule: '!(has(self.spec.action) && self.spec.action != "" && ((!has(self.spec.nodeUUID) + || self.spec.nodeUUID == "") || (!has(self.spec.deviceID) || self.spec.deviceID + == "")))' + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_lvols.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_lvols.yaml new file mode 100644 index 000000000..a1237f3d5 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_lvols.yaml @@ -0,0 +1,193 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: lvols.storage.simplyblock.io +spec: + group: storage.simplyblock.io + names: + kind: Lvol + listKind: LvolList + plural: lvols + singular: lvol + scope: Namespaced + versions: + - additionalPrinterColumns: + - jsonPath: .status.lvols.length() + name: LVOLs + type: integer + name: v1alpha1 + schema: + openAPIV3Schema: + description: Lvol is the Schema for the lvols API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. 
+ More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of Lvol + properties: + clusterName: + description: ClusterName is the target storage cluster name. + type: string + poolName: + description: PoolName is the target storage pool name. + type: string + required: + - clusterName + - poolName + type: object + status: + description: status defines the observed state of Lvol + properties: + configured: + description: Configured indicates whether initial Lvol reconciliation + has completed. + type: boolean + lvols: + description: Lvols contains observed logical volume entries. + items: + properties: + blobID: + description: BlobID is the backend blob identifier. + format: int64 + type: integer + clonedFromSnapshot: + description: ClonedFromSnapshot is the source snapshot name/ID + for clones. + type: string + created: + description: |- + Created is the backend creation timestamp. + FIXME: Unused for now + format: date-time + type: string + encrypted: + description: IsEncrypted indicates whether encryption is enabled + for the volume. + type: boolean + erasureCodingScheme: + description: ErasureCodingScheme is the erasure coding layout, + for example "2x1". + type: string + fabric: + description: Fabric is the storage fabric/protocol in use. + type: string + ha: + description: HA indicates whether high availability is enabled. + type: boolean + health: + description: Health indicates current health-check state. + type: boolean + hostname: + description: Hostname is the node hostname associated with the + volume. + type: string + lvolName: + description: LvolName is the logical volume name. + type: string + maxNamespacesPerSubsystem: + description: MaxNamespacesPerSubsystem is the max number of + namespaces per subsystem. 
+ format: int64 + type: integer + namespaceID: + description: NamespaceID is the NVMe namespace identifier. + format: int64 + type: integer + nodeUUID: + description: NodeUUID is the set of node UUIDs associated with + the volume. + items: + type: string + type: array + nqn: + description: NQN is the NVMe Qualified Name for the volume. + type: string + poolName: + description: PoolName is the storage pool name. + type: string + poolUUID: + description: PoolUUID is the backend storage pool UUID. + type: string + pvcName: + description: PvcName is the bound Kubernetes PVC name when applicable. + type: string + qos: + description: QoS contains quality-of-service limits/metrics. + properties: + class: + description: Class is the QosSpec class identifier. + format: int64 + type: integer + iops: + description: IOPS is the IOPS limit/metric. + format: int64 + type: integer + throughput: + description: Throughput contains throughput limits/metrics. + properties: + read: + description: Read is the read throughput limit/metric. + format: int64 + type: integer + readWrite: + description: ReadWrite is the combined read/write throughput + limit/metric. + format: int64 + type: integer + write: + description: Write is the write throughput limit/metric. + format: int64 + type: integer + type: object + type: object + size: + description: Size is the formatted volume size. + type: string + sourceSnapshotName: + description: SourceSnapshotName is the source snapshot name + used for this volume. + type: string + status: + description: Status is the backend lifecycle status. + type: string + subsysPort: + description: SubsysPort is the NVMe subsystem/listener port. + format: int64 + type: integer + updated: + description: |- + Updated is the backend last-update timestamp. + FIXME: Unused for now + format: date-time + type: string + uuid: + description: UUID is the backend logical volume UUID. 
+ type: string + type: object + type: array + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_pools.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_pools.yaml new file mode 100644 index 000000000..9eafa8e45 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_pools.yaml @@ -0,0 +1,134 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: pools.storage.simplyblock.io +spec: + group: storage.simplyblock.io + names: + kind: Pool + listKind: PoolList + plural: pools + singular: pool + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: Pool is the Schema for the pools API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of Pool + properties: + action: + description: Action triggers an imperative pool operation. + type: string + capacityLimit: + description: CapacityLimit is the maximum pool capacity. + type: string + clusterName: + description: ClusterName is the target storage cluster name. 
+ type: string + name: + description: Name is the backend pool name. + type: string + qos: + description: QosSpec defines QosSpec limits for the pool. + properties: + iops: + description: IOPS is the IOPS limit for the pool. + format: int32 + type: integer + throughput: + description: Throughput contains throughput limits for the pool. + properties: + read: + description: Read is the read throughput limit for the pool. + format: int32 + type: integer + readWrite: + description: ReadWrite is the combined read/write throughput + limit for the pool. + format: int32 + type: integer + write: + description: Write is the write throughput limit for the pool. + format: int32 + type: integer + type: object + type: object + status: + description: Status is an optional desired-status hint for backend + workflows. + type: string + required: + - clusterName + - name + type: object + status: + description: status defines the observed state of Pool + properties: + qos: + description: QoS contains observed/configured QoS values. + properties: + host: + description: Host is the backend host handling pool QosSpec enforcement. + type: string + iops: + description: IOPS is the observed/configured IOPS value. + format: int32 + type: integer + throughput: + description: Throughput contains observed/configured throughput + values. + properties: + read: + description: Read is the observed/configured read throughput + value. + format: int32 + type: integer + readWrite: + description: ReadWrite is the observed/configured combined + read/write throughput value. + format: int32 + type: integer + write: + description: Write is the observed/configured write throughput + value. + format: int32 + type: integer + type: object + type: object + status: + description: Status is the backend lifecycle status. + type: string + uuid: + description: UUID is the backend pool UUID. 
+ type: string + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_snapshotreplications.yaml similarity index 90% rename from simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml rename to simplyblock_core/scripts/charts/crds/storage.simplyblock.io_snapshotreplications.yaml index 730881591..e87b7b321 100644 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocksnapshotreplications.yaml +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_snapshotreplications.yaml @@ -4,20 +4,20 @@ kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblocksnapshotreplications.simplyblock.simplyblock.io + name: snapshotreplications.storage.simplyblock.io spec: - group: simplyblock.simplyblock.io + group: storage.simplyblock.io names: - kind: SimplyBlockSnapshotReplication - listKind: SimplyBlockSnapshotReplicationList - plural: simplyblocksnapshotreplications - singular: simplyblocksnapshotreplication + kind: SnapshotReplication + listKind: SnapshotReplicationList + plural: snapshotreplications + singular: snapshotreplication scope: Namespaced versions: - name: v1alpha1 schema: openAPIV3Schema: - description: SimplyBlockSnapshotReplication is the Schema for the simplyblocksnapshotreplications + description: SnapshotReplication is the Schema for the snapshotreplications API properties: apiVersion: @@ -38,7 +38,7 @@ spec: metadata: type: object spec: - description: spec defines the desired state of SimplyBlockSnapshotReplication + description: spec defines the desired state of SnapshotReplication properties: action: enum: @@ -88,7 +88,7 @@ spec: - targetPool type: object status: - description: status defines 
the observed state of SimplyBlockSnapshotReplication + description: status defines the observed state of SnapshotReplication properties: configured: type: boolean diff --git a/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storageclusters.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storageclusters.yaml new file mode 100644 index 000000000..5ecee49a6 --- /dev/null +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storageclusters.yaml @@ -0,0 +1,288 @@ +--- +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + controller-gen.kubebuilder.io/version: v0.19.0 + name: storageclusters.storage.simplyblock.io +spec: + group: storage.simplyblock.io + names: + kind: StorageCluster + listKind: StorageClusterList + plural: storageclusters + singular: storagecluster + scope: Namespaced + versions: + - name: v1alpha1 + schema: + openAPIV3Schema: + description: StorageCluster is the Schema for the storageclusters API + properties: + apiVersion: + description: |- + APIVersion defines the versioned schema of this representation of an object. + Servers should convert recognized schemas to the latest internal value, and + may reject unrecognized values. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources + type: string + kind: + description: |- + Kind is a string value representing the REST resource this object represents. + Servers may infer this from the endpoint the client submits requests to. + Cannot be updated. + In CamelCase. + More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds + type: string + metadata: + type: object + spec: + description: spec defines the desired state of StorageCluster + properties: + action: + description: Action triggers a cluster-level action. 
+ enum: + - activate + - expand + type: string + backup: + description: Backup specifies the configuration for backup to S3 + properties: + credentialsSecretRef: + description: CredentialsSecretRef points to the Secret holding + access_key_id and secret_access_key. + properties: + name: + description: Name is the name of the Secret in the same namespace + as the cluster CR. + type: string + required: + - name + type: object + localEndpoint: + type: string + localTesting: + type: boolean + secondaryTarget: + format: int32 + type: integer + snapshotBackups: + type: boolean + withCompression: + type: boolean + required: + - credentialsSecretRef + type: object + blockSize: + description: BlockSize defines the logical block size in bytes. + format: int32 + type: integer + clientDataNic: + description: ClientDataNic defines the client data network interface. + type: string + clientQpairCount: + description: |- + ClientQpairCount defines client-side queue-pair count. + FIXME: Unused for now (API update required?) + format: int32 + type: integer + clusterName: + description: ClusterName is the user-facing cluster identifier. + type: string + criticalThreshold: + description: CriticalThresholdSpec defines critical-level capacity + thresholds. + properties: + capacity: + description: Capacity defines the absolute capacity threshold + value. + format: int32 + type: integer + provisionedCapacity: + description: ProvisionedCapacity defines the provisioned-capacity + threshold value. + format: int32 + type: integer + type: object + enableNodeAffinity: + description: EnableNodeAffinity enables node-affinity placement for + storage components. + type: boolean + eventLogEntries: + description: EventLogEntries limits the number of event-log entries + returned/retained. + format: int32 + type: integer + fabric: + description: Fabric defines the storage fabric type. + type: string + haType: + description: HAType defines the backend high-availability mode.
+ type: string + includeEventLog: + description: IncludeEventLog controls whether event logs are included + in responses/exports. + type: boolean + inflightIOThreshold: + description: InflightIOThreshold defines the inflight I/O threshold. + format: int32 + type: integer + isSingleNode: + description: IsSingleNode enables single-node cluster mode. + type: boolean + maxFaultTolerance: + description: MaxFaultTolerance defines the maximum tolerated concurrent + faults. + format: int32 + type: integer + maxQueueSize: + description: MaxQueueSize defines the maximum backend queue size. + format: int32 + type: integer + mgmtIfname: + description: MgmtIfname is the management network interface name used + for cluster communication. + type: string + nvmfBasePort: + description: NvmfBasePort defines the base NVMf service port. + format: int32 + type: integer + pageSizeInBlocks: + description: PageSizeInBlocks defines page size expressed in blocks. + format: int32 + type: integer + qosClasses: + description: QoSClasses defines backend QosSpec class configuration. + type: string + qpairCount: + description: QpairCount defines the NVMe queue-pair count used by + the cluster. + format: int32 + type: integer + rpcBasePort: + description: RpcBasePort defines the base RPC service port. + format: int32 + type: integer + snodeApiPort: + description: SnodeApiPort defines the storage-node API port. + format: int32 + type: integer + strictNodeAntiAffinity: + description: StrictNodeAntiAffinity enforces strict anti-affinity + between storage nodes. + type: boolean + stripe: + description: StripeSpec configures erasure-coding data/parity chunk + counts. + properties: + dataChunks: + description: DataChunks defines the number of data chunks in the + erasure-coding layout. + format: int32 + type: integer + parityChunks: + description: ParityChunks defines the number of parity chunks + in the erasure-coding layout. 
+ format: int32 + type: integer + type: object + warningThreshold: + description: WarningThresholdSpec defines warning-level capacity thresholds. + properties: + capacity: + description: Capacity defines the absolute capacity threshold + value. + format: int32 + type: integer + provisionedCapacity: + description: ProvisionedCapacity defines the provisioned-capacity + threshold value. + format: int32 + type: integer + type: object + required: + - clusterName + type: object + status: + description: status defines the observed state of StorageCluster + properties: + actionStatus: + description: ActionStatus tracks the most recent action execution + state. + properties: + action: + description: Action is the requested action name. + type: string + message: + description: Message is a human-readable action result or error. + type: string + nodeUUID: + description: NodeUUID is the target node UUID for the action. + type: string + observedGeneration: + description: ObservedGeneration is the resource generation observed + by this status. + format: int64 + type: integer + state: + type: string + triggered: + description: Triggered indicates whether the underlying backend + action has been fired. + type: boolean + updatedAt: + description: UpdatedAt is the timestamp of the last status transition. + format: date-time + type: string + type: object + clusterName: + description: ClusterName is the resolved backend cluster name. + type: string + configured: + description: Configured indicates whether initial cluster setup completed. + type: boolean + created: + description: Created is the backend creation timestamp. + format: date-time + type: string + erasureCodingScheme: + description: ErasureCodingScheme is the active erasure-coding layout, + for example "2x1". + type: string + lastUpdated: + description: LastUpdated is the last backend update timestamp. + format: date-time + type: string + mgmtNodes: + description: MgmtNodes is the number of management nodes. 
+ format: int32 + type: integer + nqn: + description: NQN is the cluster NVM subsystem qualified name. + type: string + rebalancing: + description: Rebalancing indicates whether cluster rebalancing is + currently active. + type: boolean + secretName: + description: SecretName is the Kubernetes Secret containing cluster + credentials. + type: string + status: + description: Status is the backend-reported lifecycle status. + type: string + storageNodes: + description: StorageNodes is the number of storage nodes. + format: int32 + type: integer + uuid: + description: UUID is the backend cluster UUID. + type: string + type: object + required: + - spec + type: object + served: true + storage: true + subresources: + status: {} diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storagenodes.yaml similarity index 55% rename from simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml rename to simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storagenodes.yaml index 559b6afa6..6d34717fd 100644 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblockstoragenodes.yaml +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_storagenodes.yaml @@ -4,20 +4,20 @@ kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblockstoragenodes.simplyblock.simplyblock.io + name: storagenodes.storage.simplyblock.io spec: - group: simplyblock.simplyblock.io + group: storage.simplyblock.io names: - kind: SimplyBlockStorageNode - listKind: SimplyBlockStorageNodeList - plural: simplyblockstoragenodes - singular: simplyblockstoragenode + kind: StorageNode + listKind: StorageNodeList + plural: storagenodes + singular: storagenode scope: Namespaced versions: - name: v1alpha1 schema: openAPIV3Schema: - description: SimplyBlockStorageNode is the 
Schema for the storagenodes API + description: StorageNode is the Schema for the storagenodes API properties: apiVersion: description: |- @@ -40,6 +40,7 @@ spec: description: spec defines the desired state of StorageNode properties: action: + description: Action triggers an imperative node operation. enum: - shutdown - restart @@ -48,89 +49,134 @@ spec: - remove type: string addPcieToAllowList: - description: restart params + description: AddPcieToAllowList appends devices to the allow-list + during restart actions. items: type: string type: array clusterImage: + description: ClusterImage is the container image used for storage-node + workloads. type: string clusterName: + description: ClusterName is the target storage cluster name. type: string coreIsolation: + description: CoreIsolation enables CPU core isolation mode. type: boolean - coreMask: - type: string corePercentage: + description: CorePercentage is the percentage of cores to be used + for spdk (0-99). format: int32 type: integer - dataNIC: + dataIfname: + description: DataIfname lists data-plane network interfaces. items: type: string type: array deviceNames: + description: DeviceNames explicitly defines a comma separated list + of nvme namespace names like nvme0n1,nvme1n1... items: type: string type: array driveSizeRange: + description: DriveSizeRange filters devices by size range. type: string enableCpuTopology: + description: EnableCpuTopology enables topology-aware CPU handling. type: boolean force: + description: Force enables forced action execution where supported. type: boolean - format4k: - type: boolean - haJM: - type: boolean - haJmCount: - format: int32 - type: integer - idDeviceByNQN: + forceFormat4K: + description: ForceFormat4K forces 4K blocksize formatting of the NVMe + device where supported. type: boolean - jmPercent: - format: int32 - type: integer - maxLVol: + journalManager: + description: JournalManagerSpec configures journal manager behavior. 
+ properties: + count: + description: Count is the number of journal managers to configure. + format: int32 + type: integer + percentPerDevice: + description: PercentPerDevice is the journal manager capacity + percentage per device. + format: int32 + type: integer + useSeparateJournalDevice: + description: UseSeparateJournalDevice enables using separate journal + devices. + type: boolean + type: object + maxLogicalVolumeCount: + description: MaxLogicalVolumeCount is the maximum number of logical + volumes per node. format: int32 type: integer maxSize: + description: MaxSize is the maximum allocatable size of the storage + node. type: string - mgmtIfc: + mgmtIfname: + description: MgmtIfname is the management interface name used by storage + nodes. type: string nodeAddr: + description: NodeAddr is the explicit node address used by action + flows. type: string nodeUUID: description: NodeUUID is required when action is specified type: string nodesPerSocket: + description: NodesPerSocket defines how many storage nodes are created + per NUMA socket. format: int32 type: integer openShiftCluster: + description: OpenShiftCluster indicates OpenShift-specific behavior + should be enabled. type: boolean partitions: + description: Partitions is the number of partitions created per backend + storage device. format: int32 type: integer pcieAllowList: + description: PcieAllowList is the list of PCI addresses allowed for + use. items: type: string type: array pcieDenyList: + description: PcieDenyList is the list of PCI addresses excluded from + use. items: type: string type: array pcieModel: + description: PcieModel filters devices by PCI model. type: string reservedSystemCPU: + description: ReservedSystemCPU defines CPUs reserved for system workloads. type: string skipKubeletConfiguration: + description: SkipKubeletConfiguration skips kubelet configuration + changes. 
type: boolean socketsToUse: - format: int32 - type: integer - spdkDebug: - type: boolean + description: SocketsToUse restricts deployment to selected NUMA sockets. + items: + type: string + type: array spdkImage: + description: SpdkImage is the SPDK image reference used by node services. type: string tolerations: + description: Tolerations configures pod tolerations for storage-node + pods. items: description: |- The pod this Toleration is attached to tolerates any taint that matches @@ -169,12 +215,14 @@ spec: type: object type: array ubuntuHost: - type: boolean - useSeparateJournalDevice: + description: UbuntuHost indicates the node host OS is Ubuntu. type: boolean workerNode: + description: WorkerNode is a single worker node used by action flows. type: string workerNodes: + description: WorkerNodes is the set of Kubernetes worker nodes to + manage. items: type: string type: array @@ -185,56 +233,82 @@ spec: description: status defines the observed state of StorageNode properties: actionStatus: + description: ActionStatus tracks the latest action execution status. properties: action: + description: Action is the requested action name. type: string message: + description: Message is a human-readable action result or error. type: string nodeUUID: + description: NodeUUID is the target node UUID for the action. type: string observedGeneration: + description: ObservedGeneration is the resource generation observed + by this status. format: int64 type: integer state: type: string triggered: + description: Triggered indicates whether the underlying backend + action has been fired. type: boolean updatedAt: + description: UpdatedAt is the timestamp of the last status transition. format: date-time type: string type: object nodes: + description: Nodes is the observed state of each managed storage node. items: properties: cpu: + description: CPU is the reported CPU allocation/count for the + node. 
format: int32 type: integer devices: + description: Devices is the backend summary of devices on this + node. type: string health: + description: Health indicates whether health checks are currently + passing. type: boolean hostname: + description: Hostname is the Kubernetes node hostname. type: string - lvol_port: + lvolPort: + description: LvolPort is the logical-volume subsystem port. format: int32 type: integer memory: + description: Memory is the reported memory value. type: string mgmtIp: + description: MgmtIp is the management IP address for the node. type: string - nvmf_port: + nvmfPort: + description: NvmfPort is the NVMf service port. format: int32 type: integer - rpc_port: + rpcPort: + description: RpcPort is the node RPC service port. format: int32 type: integer status: + description: Status is the backend lifecycle state for the node. type: string uptime: + description: Uptime is the reported node uptime value. type: string uuid: + description: UUID is the backend node UUID. type: string volumes: + description: Volumes is the current logical volume count. 
format: int32 type: integer type: object @@ -247,11 +321,11 @@ spec: - message: nodeUUID is required when action is specified rule: '!(has(self.spec.action) && self.spec.action != "" && (!has(self.spec.nodeUUID) || self.spec.nodeUUID == ""))' - - message: clusterImage, maxLVol, and workerNodes are required when action - is not specified + - message: clusterImage, maxLogicalVolumeCount, and workerNodes are required + when action is not specified rule: (has(self.spec.action) && self.spec.action != "") || (has(self.spec.clusterImage) - && self.spec.clusterImage != "" && has(self.spec.maxLVol) && has(self.spec.workerNodes) - && size(self.spec.workerNodes) > 0) + && self.spec.clusterImage != "" && has(self.spec.maxLogicalVolumeCount) + && has(self.spec.workerNodes) && size(self.spec.workerNodes) > 0) served: true storage: true subresources: diff --git a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_tasks.yaml similarity index 60% rename from simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml rename to simplyblock_core/scripts/charts/crds/storage.simplyblock.io_tasks.yaml index 2d25e21e1..be80851cf 100644 --- a/simplyblock_core/scripts/charts/crds/simplyblock.simplyblock.io_simplyblocktasks.yaml +++ b/simplyblock_core/scripts/charts/crds/storage.simplyblock.io_tasks.yaml @@ -4,20 +4,20 @@ kind: CustomResourceDefinition metadata: annotations: controller-gen.kubebuilder.io/version: v0.19.0 - name: simplyblocktasks.simplyblock.simplyblock.io + name: tasks.storage.simplyblock.io spec: - group: simplyblock.simplyblock.io + group: storage.simplyblock.io names: - kind: SimplyBlockTask - listKind: SimplyBlockTaskList - plural: simplyblocktasks - singular: simplyblocktask + kind: Task + listKind: TaskList + plural: tasks + singular: task scope: Namespaced versions: - name: v1alpha1 schema: openAPIV3Schema: - description: SimplyBlockTask is the 
Schema for the simplyblocktasks API + description: Task is the Schema for the tasks API properties: apiVersion: description: |- @@ -37,40 +37,58 @@ spec: metadata: type: object spec: - description: spec defines the desired state of SimplyBlockTask + description: spec defines the desired state of Task properties: clusterName: + description: ClusterName is the target storage cluster name. type: string subtasks: + description: Subtasks includes related child subtasks when supported + by the backend. type: boolean taskID: + description: TaskID filters results to a specific backend task when + set. type: string required: - clusterName type: object status: - description: status defines the observed state of SimplyBlockTask + description: status defines the observed state of Task properties: tasks: + description: Tasks is the currently reported task list for the query + scope. items: properties: canceled: + description: Canceled indicates whether the task was canceled. type: boolean parentTask: + description: ParentTask is the parent task UUID when this task + is a subtask. type: string retried: + description: Retried is the number of retry attempts made for + the task. format: int32 type: integer startedAt: + description: StartedAt is the backend-reported task start timestamp. format: date-time type: string taskResult: + description: TaskResult is the backend result payload/message. type: string taskStatus: + description: TaskStatus is the backend lifecycle status for + the task. type: string taskType: + description: TaskType is the backend task function/type name. type: string uuid: + description: UUID is the backend task UUID. 
type: string type: object type: array diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml index e98680363..d9e614ada 100644 --- a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml +++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml @@ -1,6 +1,6 @@ {{- if .Values.simplyblock.cluster }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockStorageCluster +apiVersion: storage.simplyblock.io/v1alpha1 +kind: StorageCluster metadata: name: {{ .Values.simplyblock.cluster.clusterName }} namespace: {{ .Release.Namespace }} @@ -46,8 +46,8 @@ spec: --- {{- if .Values.simplyblock.pool }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockPool +apiVersion: storage.simplyblock.io/v1alpha1 +kind: Pool metadata: name: {{ .Values.simplyblock.pool.name }} namespace: {{ .Release.Namespace }} @@ -62,8 +62,8 @@ spec: --- {{- if .Values.simplyblock.lvol }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockLvol +apiVersion: storage.simplyblock.io/v1alpha1 +kind: Lvol metadata: name: {{ .Values.simplyblock.lvol.name }} namespace: {{ .Release.Namespace }} @@ -74,8 +74,8 @@ spec: --- {{- if .Values.simplyblock.storageNodes }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockStorageNode +apiVersion: storage.simplyblock.io/v1alpha1 +kind: StorageNode metadata: name: {{ .Values.simplyblock.storageNodes.name }} namespace: {{ .Release.Namespace }} @@ -136,8 +136,8 @@ spec: --- {{- if .Values.simplyblock.devices }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockDevice +apiVersion: storage.simplyblock.io/v1alpha1 +kind: Device metadata: name: {{ .Values.simplyblock.devices.name }} namespace: {{ .Release.Namespace }} @@ -147,8 +147,8 @@ spec: --- {{- if .Values.simplyblock.tasks }} -apiVersion: simplyblock.simplyblock.io/v1alpha1 -kind: SimplyBlockTask 
+apiVersion: storage.simplyblock.io/v1alpha1 +kind: Task metadata: name: {{ .Values.simplyblock.tasks.name }} namespace: {{ .Release.Namespace }} From 5975d8b6bbc798cefa0e83685bb8b38d5550b905 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 10 Apr 2026 10:32:48 +0100 Subject: [PATCH 66/70] use simplyblock/alpine-tools:3.21.3 image for init job container --- simplyblock_web/templates/storage_init_job.yaml.j2 | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/simplyblock_web/templates/storage_init_job.yaml.j2 b/simplyblock_web/templates/storage_init_job.yaml.j2 index ca4a1e712..14494d3f8 100644 --- a/simplyblock_web/templates/storage_init_job.yaml.j2 +++ b/simplyblock_web/templates/storage_init_job.yaml.j2 @@ -23,7 +23,7 @@ spec: path: /proc containers: - name: init-setup - image: alpine:3.21.3 + image: simplyblock/alpine-tools:3.21.3 securityContext: privileged: true volumeMounts: @@ -33,7 +33,6 @@ spec: args: - | set -e - apk add --no-cache curl iproute2 util-linux >/dev/null echo "--- Starting init setup ---" From 709ab455f5666438ddcf84320bae3424f57a85d2 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 10 Apr 2026 10:42:52 +0100 Subject: [PATCH 67/70] updated custom resource helm value --- .../templates/simplyblock_customresource.yaml | 16 ++++++---------- simplyblock_core/scripts/charts/values.yaml | 6 +++--- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml index d9e614ada..0b2fc1790 100644 --- a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml +++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml @@ -7,8 +7,8 @@ metadata: spec: clusterName: {{ .Values.simplyblock.cluster.clusterName }} - {{- if .Values.simplyblock.cluster.mgmtIfc }} - mgmtIfc: {{ .Values.simplyblock.cluster.mgmtIfc }} + {{- if 
.Values.simplyblock.cluster.MgmtIfname }} + MgmtIfname: {{ .Values.simplyblock.cluster.MgmtIfname }} {{- end }} {{- if .Values.simplyblock.cluster.fabric }} @@ -86,12 +86,12 @@ spec: clusterImage: {{ .Values.simplyblock.storageNodes.clusterImage }} {{- end }} - {{- if .Values.simplyblock.storageNodes.mgmtIfc }} - mgmtIfc: {{ .Values.simplyblock.storageNodes.mgmtIfc }} + {{- if .Values.simplyblock.storageNodes.MgmtIfname }} + MgmtIfname: {{ .Values.simplyblock.storageNodes.MgmtIfname }} {{- end }} - {{- if .Values.simplyblock.storageNodes.maxLVol }} - maxLVol: {{ .Values.simplyblock.storageNodes.maxLVol }} + {{- if .Values.simplyblock.storageNodes.MaxLogicalVolumeCount }} + MaxLogicalVolumeCount: {{ .Values.simplyblock.storageNodes.MaxLogicalVolumeCount }} {{- end }} {{- if .Values.simplyblock.storageNodes.maxSize }} @@ -106,10 +106,6 @@ spec: corePercentage: {{ .Values.simplyblock.storageNodes.corePercentage }} {{- end }} - {{- if hasKey .Values.simplyblock.storageNodes "spdkDebug" }} - spdkDebug: {{ .Values.simplyblock.storageNodes.spdkDebug }} - {{- end }} - {{- if .Values.simplyblock.storageNodes.spdkImage }} spdkImage: {{ .Values.simplyblock.storageNodes.spdkImage }} {{- end }} diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml index 76de4ce82..46bd172a7 100644 --- a/simplyblock_core/scripts/charts/values.yaml +++ b/simplyblock_core/scripts/charts/values.yaml @@ -269,7 +269,7 @@ ingress: simplyblock: cluster: clusterName: simplyblock-cluster - mgmtIfc: eth0 + MgmtIfname: eth0 fabric: tcp isSingleNode: false enableNodeAffinity: false @@ -289,8 +289,8 @@ simplyblock: storageNodes: name: simplyblock-node clusterImage: simplyblock/simplyblock:main-snapshot-replication - mgmtIfc: eth0 - maxLVol: 10 + MgmtIfname: eth0 + MaxLogicalVolumeCount: 10 maxSize: 0 partitions: 0 corePercentage: 65 From df409e9c8c2ac430756db3f546c0f6cf6dbfd1f8 Mon Sep 17 00:00:00 2001 From: geoffrey1330 Date: Fri, 10 Apr 2026 10:58:41 +0100 
Subject: [PATCH 68/70] updated custom resource helm value

---
 .../templates/simplyblock_customresource.yaml | 27 +++++++------------
 simplyblock_core/scripts/charts/values.yaml   | 12 ++++-----
 2 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
index 0b2fc1790..25460b148 100644
--- a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
+++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
@@ -7,8 +7,8 @@ metadata:
 spec:
   clusterName: {{ .Values.simplyblock.cluster.clusterName }}
 
-  {{- if .Values.simplyblock.cluster.MgmtIfname }}
-  MgmtIfname: {{ .Values.simplyblock.cluster.MgmtIfname }}
+  {{- if .Values.simplyblock.cluster.mgmtIfname }}
+  mgmtIfname: {{ .Values.simplyblock.cluster.mgmtIfname }}
   {{- end }}
 
   {{- if .Values.simplyblock.cluster.fabric }}
@@ -27,21 +27,14 @@ spec:
   strictNodeAntiAffinity: {{ .Values.simplyblock.cluster.strictNodeAntiAffinity }}
   {{- end }}
 
-  {{- if .Values.simplyblock.cluster.capWarn }}
-  capWarn: {{ .Values.simplyblock.cluster.capWarn }}
+  {{- if .Values.simplyblock.cluster.warningThreshold }}
+  warningThreshold: {{ .Values.simplyblock.cluster.warningThreshold }}
   {{- end }}
 
-  {{- if .Values.simplyblock.cluster.capCrit }}
-  capCrit: {{ .Values.simplyblock.cluster.capCrit }}
+  {{- if .Values.simplyblock.cluster.criticalThreshold }}
+  criticalThreshold: {{ .Values.simplyblock.cluster.criticalThreshold }}
   {{- end }}
 
-  {{- if .Values.simplyblock.cluster.provCapWarn }}
-  provCapWarn: {{ .Values.simplyblock.cluster.provCapWarn }}
-  {{- end }}
-
-  {{- if .Values.simplyblock.cluster.provCapCrit }}
-  provCapCrit: {{ .Values.simplyblock.cluster.provCapCrit }}
-  {{- end }}
 {{- end }}
 
 ---
@@ -86,12 +79,12 @@ spec:
   clusterImage: {{ .Values.simplyblock.storageNodes.clusterImage }}
   {{- end }}
 
-  {{- if .Values.simplyblock.storageNodes.MgmtIfname }}
-  MgmtIfname: {{ .Values.simplyblock.storageNodes.MgmtIfname }}
+  {{- if .Values.simplyblock.storageNodes.mgmtIfname }}
+  mgmtIfname: {{ .Values.simplyblock.storageNodes.mgmtIfname }}
   {{- end }}
 
-  {{- if .Values.simplyblock.storageNodes.MaxLogicalVolumeCount }}
-  MaxLogicalVolumeCount: {{ .Values.simplyblock.storageNodes.MaxLogicalVolumeCount }}
+  {{- if .Values.simplyblock.storageNodes.maxLogicalVolumeCount }}
+  maxLogicalVolumeCount: {{ .Values.simplyblock.storageNodes.maxLogicalVolumeCount }}
   {{- end }}
 
   {{- if .Values.simplyblock.storageNodes.maxSize }}
diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml
index 46bd172a7..2595fd212 100644
--- a/simplyblock_core/scripts/charts/values.yaml
+++ b/simplyblock_core/scripts/charts/values.yaml
@@ -269,15 +269,13 @@ ingress:
 simplyblock:
   cluster:
     clusterName: simplyblock-cluster
-    MgmtIfname: eth0
+    mgmtIfname: eth0
     fabric: tcp
     isSingleNode: false
     enableNodeAffinity: false
     strictNodeAntiAffinity: false
-    capWarn: 80
-    capCrit: 90
-    provCapWarn: 120
-    provCapCrit: 150
+    warningThreshold: 80
+    criticalThreshold: 90
 
   pool:
     name: simplyblock-pool
@@ -289,8 +287,8 @@ simplyblock:
   storageNodes:
     name: simplyblock-node
     clusterImage: simplyblock/simplyblock:main-snapshot-replication
-    MgmtIfname: eth0
-    MaxLogicalVolumeCount: 10
+    mgmtIfname: eth0
+    maxLogicalVolumeCount: 10
     maxSize: 0
     partitions: 0
     corePercentage: 65

From 9aab98f4fba4112e0245896c42eb983ad957e66f Mon Sep 17 00:00:00 2001
From: geoffrey1330
Date: Fri, 10 Apr 2026 11:05:44 +0100
Subject: [PATCH 69/70] updated custom resource helm value

---
 .../templates/simplyblock_customresource.yaml | 16 ++++++++++++++--
 simplyblock_core/scripts/charts/values.yaml   |  8 ++++++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
index 25460b148..a44d06ee5 100644
--- a/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
+++ b/simplyblock_core/scripts/charts/templates/simplyblock_customresource.yaml
@@ -28,11 +28,23 @@ spec:
   {{- end }}
 
   {{- if .Values.simplyblock.cluster.warningThreshold }}
-  warningThreshold: {{ .Values.simplyblock.cluster.warningThreshold }}
+  warningThreshold:
+    {{- if .Values.simplyblock.cluster.warningThreshold.capacity }}
+    capacity: {{ .Values.simplyblock.cluster.warningThreshold.capacity }}
+    {{- end }}
+    {{- if .Values.simplyblock.cluster.warningThreshold.provisionedCapacity }}
+    provisionedCapacity: {{ .Values.simplyblock.cluster.warningThreshold.provisionedCapacity }}
+    {{- end }}
   {{- end }}
 
   {{- if .Values.simplyblock.cluster.criticalThreshold }}
-  criticalThreshold: {{ .Values.simplyblock.cluster.criticalThreshold }}
+  criticalThreshold:
+    {{- if .Values.simplyblock.cluster.criticalThreshold.capacity }}
+    capacity: {{ .Values.simplyblock.cluster.criticalThreshold.capacity }}
+    {{- end }}
+    {{- if .Values.simplyblock.cluster.criticalThreshold.provisionedCapacity }}
+    provisionedCapacity: {{ .Values.simplyblock.cluster.criticalThreshold.provisionedCapacity }}
+    {{- end }}
   {{- end }}
 
 {{- end }}
diff --git a/simplyblock_core/scripts/charts/values.yaml b/simplyblock_core/scripts/charts/values.yaml
index 2595fd212..e7e4b28a0 100644
--- a/simplyblock_core/scripts/charts/values.yaml
+++ b/simplyblock_core/scripts/charts/values.yaml
@@ -274,8 +274,12 @@ simplyblock:
     isSingleNode: false
     enableNodeAffinity: false
     strictNodeAntiAffinity: false
-    warningThreshold: 80
-    criticalThreshold: 90
+    warningThreshold:
+      capacity: 80
+      provisionedCapacity: 80
+    criticalThreshold:
+      capacity: 90
+      provisionedCapacity: 90
 
   pool:
     name: simplyblock-pool

From 332fda12f3d4644c8e55577682970b41c5f302fc Mon Sep 17 00:00:00 2001
From: geoffrey1330
Date: Fri, 10 Apr 2026 11:14:53 +0100
Subject: [PATCH 70/70] updated the simplyblock-manager clusterrole

---
 .../charts/templates/simplyblock-manager.yaml | 49 +++++++++----------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml
index db52951cb..ed38517ef 100644
--- a/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml
+++ b/simplyblock_core/scripts/charts/templates/simplyblock-manager.yaml
@@ -156,15 +156,15 @@ rules:
   - update
   - patch
 - apiGroups:
-  - simplyblock.simplyblock.io
+  - storage.simplyblock.io
   resources:
-  - simplyblockpools
-  - simplyblocklvols
-  - simplyblockstorageclusters
-  - simplyblockstoragenodes
-  - simplyblockdevices
-  - simplyblocktasks
-  - simplyblocksnapshotreplications
+  - pools
+  - lvols
+  - storageclusters
+  - storagenodes
+  - devices
+  - tasks
+  - snapshotreplications
   verbs:
   - create
   - delete
@@ -174,28 +174,28 @@ rules:
   - update
   - watch
 - apiGroups:
-  - simplyblock.simplyblock.io
+  - storage.simplyblock.io
   resources:
-  - simplyblockpools/finalizers
-  - simplyblocklvols/finalizers
-  - simplyblockstorageclusters/finalizers
-  - simplyblockstoragenodes/finalizers
-  - simplyblockdevices/finalizers
-  - simplyblocktasks/finalizers
-  - simplyblocksnapshotreplications/finalizers
+  - pools/finalizers
+  - lvols/finalizers
+  - storageclusters/finalizers
+  - storagenodes/finalizers
+  - devices/finalizers
+  - tasks/finalizers
+  - snapshotreplications/finalizers
   verbs:
   - update
  - delete
 - apiGroups:
-  - simplyblock.simplyblock.io
+  - storage.simplyblock.io
   resources:
-  - simplyblockpools/status
-  - simplyblocklvols/status
-  - simplyblockstorageclusters/status
-  - simplyblockstoragenodes/status
-  - simplyblockdevices/status
-  - simplyblocktasks/status
-  - simplyblocksnapshotreplications/status
+  - pools/status
+  - lvols/status
+  - storageclusters/status
+  - storagenodes/status
+  - devices/status
+  - tasks/status
+  - snapshotreplications/status
   verbs:
   - get
   - patch
@@ -215,4 +215,3 @@ subjects:
 - kind: ServiceAccount
   name: simplyblock-manager
   namespace: {{ .Release.Namespace }}
-
\ No newline at end of file
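
The nested threshold keys introduced in PATCH 69 can be exercised through a user-supplied values file. A minimal sketch — the key paths come from the diffs above, but the file name and the override percentages are illustrative, not chart defaults:

```yaml
# my-values.yaml -- illustrative overrides for the nested threshold keys
# from PATCH 69. The numbers below are example values, not chart defaults.
simplyblock:
  cluster:
    warningThreshold:
      capacity: 75              # warn when used physical capacity reaches 75%
      provisionedCapacity: 110  # warn when provisioned capacity reaches 110%
    criticalThreshold:
      capacity: 90
      provisionedCapacity: 140
```

Because the template wraps each leaf in its own `{{- if }}` guard, omitting a key (say, `provisionedCapacity`) simply drops that field from the rendered custom resource instead of emitting an empty value.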