8 changes: 6 additions & 2 deletions docs/modules/ROOT/nav.adoc
@@ -1,5 +1,9 @@
* xref:configuration.adoc[]
* Concepts
** xref:discovery.adoc[]
* xref:usage-guide/index.adoc[]
** xref:usage-guide/resources.adoc[]
** xref:usage-guide/logging-log-aggregation.adoc[]
** xref:usage-guide/monitoring.adoc[]
** xref:usage-guide/scaling.adoc[]
** xref:usage-guide/configuration-environment-overrides.adoc[]
32 changes: 0 additions & 32 deletions docs/modules/ROOT/pages/implementation.adoc

This file was deleted.

32 changes: 28 additions & 4 deletions docs/modules/ROOT/pages/index.adoc
@@ -1,18 +1,42 @@
= Stackable Operator for Apache HDFS

The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS] is used to set up HDFS in high-availability mode. It depends on the xref:zookeeper:ROOT:index.adoc[] to operate a ZooKeeper cluster that coordinates the active and standby NameNodes.

NOTE: This operator only works with images from the https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop[Stackable] repository.

== Roles

Three xref:home:concepts:roles-and-role-groups.adoc[roles] of the HDFS cluster are implemented:

* DataNode - responsible for storing the actual data.
* JournalNode - responsible for keeping a shared edit log so that a standby NameNode can take over when the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
* NameNode - responsible for keeping track of the file system metadata, including the locations of HDFS blocks, and providing access to the data.

== Kubernetes objects

The operator creates the following K8S objects per role group defined in the custom resource.

* Service - ClusterIP used for intra-cluster communication.
* ConfigMap - HDFS configuration files like `core-site.xml`, `hdfs-site.xml` and `log4j.properties` are defined here and mounted in the pods.
* StatefulSet - where the replica count, volume mounts and more for each role group are defined.

In addition, for each pod labeled with `hdfs.stackable.tech/pod-service=true`, a `NodePort` service is created that exposes all container ports of that pod to the outside world (from the perspective of K8S).

In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode). A minimal working configuration (sketched after this list) requires:

* 2 NameNodes (HA)
* 1 JournalNode
* 1 DataNode (the number of DataNodes should be at least the configured `dfsReplication` factor)
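
The following is a minimal sketch of such a cluster definition. Treat it as an illustration only: the resource name, the ZooKeeper discovery ConfigMap name and the product version are assumptions, and field names can differ between operator versions; the xref:getting_started:index.adoc[] guide contains an exact, version-matched example.

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs # assumed name
spec:
  image:
    productVersion: 3.3.4 # placeholder, use one of the supported versions listed below
  zookeeperConfigMapName: simple-znode # assumed discovery ConfigMap of the ZooKeeper cluster
  dfsReplication: 1
  nameNodes:
    roleGroups:
      default:
        replicas: 2 # HA requires two NameNodes
  dataNodes:
    roleGroups:
      default:
        replicas: 1 # at least the dfsReplication factor
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----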

== Supported Versions

The Stackable Operator for Apache HDFS currently supports the following versions of HDFS:

include::partial$supported-versions.adoc[]

== Docker image

[source]
----
docker pull docker.stackable.tech/stackable/hadoop:<version>
----
75 changes: 75 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/configuration-environment-overrides.adoc
@@ -0,0 +1,75 @@

= Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties can lead to faulty clusters. In general, do not change ports, hostnames or properties related to data directories, high availability or security.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for `hdfs-site.xml` and `core-site.xml`. For example, if you want to set additional properties on the NameNodes, adapt the `nameNodes` section of the cluster resource like so:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        core-site.xml:
          fs.trash.interval: "5"
        hdfs-site.xml:
          dfs.namenode.num.checkpoints.retained: "3"
      replicas: 2
----

Just as with `config`, it is possible to specify this at role level as well:

[source,yaml]
----
nameNodes:
  configOverrides:
    core-site.xml:
      fs.trash.interval: "5"
    hdfs-site.xml:
      dfs.namenode.num.checkpoints.retained: "3"
  roleGroups:
    default:
      config: [...]
      replicas: 2
----

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.

For a full list of configuration options, refer to the Apache HDFS documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].


== Environment Variables

In a similar fashion, environment variables can be (over)written. For example, per role group:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
----

or per role:

[source,yaml]
----
nameNodes:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

IMPORTANT: Some environment variables will be overridden by the operator and cannot be set manually by the user. These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
3 changes: 3 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/index.adoc
@@ -0,0 +1,3 @@
= Usage Guide

This section helps you use and configure the Stackable Operator for Apache HDFS in various ways. You should already be familiar with setting up a basic instance; the xref:getting_started:index.adoc[] guide shows how to do this, including all required dependencies (for example ZooKeeper).
25 changes: 25 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/logging-log-aggregation.adoc
@@ -0,0 +1,25 @@
= Logging & log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  vectorAggregatorConfigMapName: vector-aggregator-discovery
  nameNodes:
    config:
      logging:
        enableVectorAgent: true
  dataNodes:
    config:
      logging:
        enableVectorAgent: true
  journalNodes:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
9 changes: 9 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,9 @@
= Monitoring

The cluster can be monitored with Prometheus from inside or outside the K8S cluster.

All services (with the exception of the ZooKeeper daemon on the NameNodes) run with the JMX exporter agent enabled and expose metrics on the `metrics` port. This port is available from the container level up to the NodePort services.

The metrics endpoints are also used as liveness probes by K8S.
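
As an illustration, a Prometheus scrape job that discovers these endpoints via the Kubernetes API could look like the following sketch (the job name and pod-based discovery are assumptions, not something the operator creates):

[source,yaml]
----
scrape_configs:
  - job_name: stackable-hdfs # assumed job name
    kubernetes_sd_configs:
      - role: pod # discover all pods in the Kubernetes cluster
    relabel_configs:
      # Keep only targets whose container port is named "metrics"
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: metrics
----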

See xref:home:operators:monitoring.adoc[] for more details.
87 changes: 87 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/resources.adoc
@@ -0,0 +1,87 @@
= Resources

== Storage for data volumes

You can mount volumes where data is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role group:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              capacity: 128Gi
----

In the above example, all DataNodes in the default group will store data (the location set by `dfs.datanode.data.dir`) on a `128Gi` volume.

By default, if nothing is configured in the custom resource for a certain role group, each Pod has a `5Gi` volume mount for the data location.

=== Multiple storage volumes

DataNodes can have multiple disks attached to increase storage capacity as well as speed.
The disks can be of different types, e.g. HDDs or SSDs.

You can configure multiple https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims[PersistentVolumeClaims] (PVCs) for the DataNodes as follows:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data: # Overwrite the data PVCs coming from the default value
              count: 0
            my-disks:
              count: 3
              capacity: 12Ti
              hdfsStorageType: Disk
            my-ssds:
              count: 2
              capacity: 5Ti
              storageClass: premium-ssd
              hdfsStorageType: SSD
----

This will create the following PVCs:

1. `my-disks-hdfs-datanode-default-0` (12Ti)
2. `my-disks-1-hdfs-datanode-default-0` (12Ti)
3. `my-disks-2-hdfs-datanode-default-0` (12Ti)
4. `my-ssds-hdfs-datanode-default-0` (5Ti)
5. `my-ssds-1-hdfs-datanode-default-0` (5Ti)

By creating and using a dedicated https://kubernetes.io/docs/concepts/storage/storage-classes/[StorageClass] you can configure HDFS to use local disks attached to Kubernetes nodes.
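
For example, a StorageClass for statically provisioned local volumes might look like the following sketch (the name is an assumption; alternatively, a local-volume provisioner can manage the PersistentVolumes):

[source,yaml]
----
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hdfs # assumed name, reference it via storageClass in the PVC config above
provisioner: kubernetes.io/no-provisioner # static provisioning: PVs must be created manually
volumeBindingMode: WaitForFirstConsumer # bind only once a pod is scheduled to a node
----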

[NOTE]
====
You might need to re-create the StatefulSet to apply the new PVC configuration because of https://github.com/kubernetes/kubernetes/issues/68737[this Kubernetes issue].
You can delete the StatefulSet using `kubectl delete sts --cascade=false <statefulset>`.
The hdfs-operator will re-create the StatefulSet automatically.
====

== Resource Requests

include::home:concepts:stackable_resource_requests.adoc[]

If no resource requests are configured explicitly, the HDFS operator uses the following defaults:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            max: '4'
            min: '100m'
          storage:
            data:
              capacity: 2Gi
----
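
To override these defaults, set your own values in the `resources` section at role or role group level, as in this sketch (the numbers are examples only, and the `memory.limit` field is assumed to follow the common Stackable resources structure):

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '500m'
            max: '2'
          memory:
            limit: 3Gi
          storage:
            data:
              capacity: 64Gi
----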
3 changes: 3 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/scaling.adoc
@@ -0,0 +1,3 @@
= Scaling

When scaling NameNodes up, make sure to increase the replica count by only one node at a time, never by more.
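
For example, to grow a role group from two to three NameNodes, change the replica count by a single node and wait for the cluster to settle before scaling further:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      replicas: 3 # previously 2: add at most one NameNode per step
----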