8 changes: 6 additions & 2 deletions docs/modules/ROOT/nav.adoc
@@ -1,5 +1,9 @@
* xref:configuration.adoc[]
* Concepts
** xref:discovery.adoc[]
* xref:usage-guide/index.adoc[]
** xref:usage-guide/resources.adoc[]
** xref:usage-guide/logging-log-aggregation.adoc[]
** xref:usage-guide/monitoring.adoc[]
** xref:usage-guide/scaling.adoc[]
** xref:usage-guide/configuration-environment-overrides.adoc[]
32 changes: 0 additions & 32 deletions docs/modules/ROOT/pages/implementation.adoc

This file was deleted.

32 changes: 28 additions & 4 deletions docs/modules/ROOT/pages/index.adoc
@@ -1,18 +1,42 @@
= Stackable Operator for Apache HDFS

The Stackable Operator for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html[Apache HDFS] is used to set up HDFS in high-availability mode. It depends on the xref:zookeeper:ROOT:index.adoc[] to operate a ZooKeeper cluster that coordinates the active and standby NameNodes.

NOTE: This operator only works with images from the https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop[Stackable] repository.

== Roles

Three xref:home:concepts:roles-and-role-groups.adoc[roles] of the HDFS cluster are implemented:

* DataNode - responsible for storing the actual data.
* JournalNode - responsible for keeping a shared edit log so that a standby NameNode can take over when the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
* NameNode - responsible for keeping track of the file system metadata, including the locations of HDFS blocks, and providing access to the data.

== Kubernetes objects

The operator creates the following K8S objects per role group defined in the custom resource.

* Service - ClusterIP used for intra-cluster communication.
* ConfigMap - HDFS configuration files like `core-site.xml`, `hdfs-site.xml` and `log4j.properties` are defined here and mounted in the pods.
* StatefulSet - where the replica count, volume mounts and more for each role group are defined.

In addition, for each pod labeled with `hdfs.stackable.tech/pod-service=true`, a `NodePort` service is created that exposes all container ports of that pod to the outside world (from the perspective of K8S).

In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode). A minimal working configuration (sketched after this list) requires:

* 2 NameNodes (HA)
* 1 JournalNode
* 1 DataNode (the number of DataNodes should be at least the configured `dfsReplication` factor)
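
The following is a minimal sketch of such a cluster definition. Treat it as an illustration only: the resource name, the ZooKeeper discovery ConfigMap name and the product version are assumptions, and field names can differ between operator versions; the xref:getting_started:index.adoc[] guide contains an exact, version-matched example.

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs # assumed name
spec:
  image:
    productVersion: 3.3.4 # placeholder, use one of the supported versions listed below
  zookeeperConfigMapName: simple-znode # assumed discovery ConfigMap of the ZooKeeper cluster
  dfsReplication: 1
  nameNodes:
    roleGroups:
      default:
        replicas: 2 # HA requires two NameNodes
  dataNodes:
    roleGroups:
      default:
        replicas: 1 # at least the dfsReplication factor
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----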

== Supported Versions

The Stackable Operator for Apache HDFS currently supports the following versions of HDFS:

include::partial$supported-versions.adoc[]

== Docker image

[source]
----
docker pull docker.stackable.tech/stackable/hadoop:<version>
----
75 changes: 75 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/configuration-environment-overrides.adoc
@@ -0,0 +1,75 @@

= Configuration & Environment Overrides

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

IMPORTANT: Overriding certain properties can lead to faulty clusters. In general, do not change ports, hostnames or properties related to data directories, high availability or security.

== Configuration Properties

For a role or role group, at the same level as `config`, you can specify `configOverrides` for `hdfs-site.xml` and `core-site.xml`. For example, if you want to set additional properties on the NameNodes, adapt the `nameNodes` section of the cluster resource like so:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      config: [...]
      configOverrides:
        core-site.xml:
          fs.trash.interval: "5"
        hdfs-site.xml:
          dfs.namenode.num.checkpoints.retained: "3"
      replicas: 2
----

Just as with `config`, it is possible to specify this at role level as well:

[source,yaml]
----
nameNodes:
  configOverrides:
    core-site.xml:
      fs.trash.interval: "5"
    hdfs-site.xml:
      dfs.namenode.num.checkpoints.retained: "3"
  roleGroups:
    default:
      config: [...]
      replicas: 2
----

All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.

For a full list of configuration options, refer to the Apache HDFS documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].


== Environment Variables

In a similar fashion, environment variables can be (over)written. For example, per role group:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      config: {}
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
      replicas: 1
----

or per role:

[source,yaml]
----
nameNodes:
  envOverrides:
    MY_ENV_VAR: "MY_VALUE"
  roleGroups:
    default:
      config: {}
      replicas: 1
----

IMPORTANT: Some environment variables will be overridden by the operator and cannot be set manually by the user. These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
3 changes: 3 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/index.adoc
@@ -0,0 +1,3 @@
= Usage Guide

This section helps you use and configure the Stackable Operator for Apache HDFS in various ways. You should already be familiar with setting up a basic instance; the xref:getting_started:index.adoc[] guide shows how to do this, including all required dependencies (for example ZooKeeper).
25 changes: 25 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/logging-log-aggregation.adoc
@@ -0,0 +1,25 @@
= Logging & log aggregation

The logs can be forwarded to a Vector log aggregator by providing a discovery
ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
spec:
  vectorAggregatorConfigMapName: vector-aggregator-discovery
  nameNodes:
    config:
      logging:
        enableVectorAgent: true
  dataNodes:
    config:
      logging:
        enableVectorAgent: true
  journalNodes:
    config:
      logging:
        enableVectorAgent: true
----

Further information on how to configure logging can be found in xref:home:concepts:logging.adoc[].
9 changes: 9 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/monitoring.adoc
@@ -0,0 +1,9 @@
= Monitoring

The cluster can be monitored with Prometheus from inside or outside the K8S cluster.

All services (with the exception of the ZooKeeper daemon on the NameNodes) run with the JMX exporter agent enabled and expose metrics on the `metrics` port. This port is available from the container level up to the NodePort services.

The metrics endpoints are also used as liveness probes by K8S.
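
As an illustration, a Prometheus scrape job that discovers these endpoints via the Kubernetes API could look like the following sketch (the job name and pod-based discovery are assumptions, not something the operator creates):

[source,yaml]
----
scrape_configs:
  - job_name: stackable-hdfs # assumed job name
    kubernetes_sd_configs:
      - role: pod # discover all pods in the Kubernetes cluster
    relabel_configs:
      # Keep only targets whose container port is named "metrics"
      - source_labels: [__meta_kubernetes_pod_container_port_name]
        action: keep
        regex: metrics
----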

See xref:home:operators:monitoring.adoc[] for more details.
87 changes: 87 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/resources.adoc
@@ -0,0 +1,87 @@
= Resources

== Storage for data volumes

You can mount volumes where data is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role group:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              capacity: 128Gi
----

In the above example, all DataNodes in the default group will store data (the location set by `dfs.datanode.data.dir`) on a `128Gi` volume.

By default, if nothing is configured in the custom resource for a certain role group, each Pod has a `5Gi` volume mount for the data location.

=== Multiple storage volumes

DataNodes can have multiple disks attached to increase storage capacity as well as speed.
The disks can be of different types, e.g. HDDs or SSDs.

You can configure multiple https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims[PersistentVolumeClaims] (PVCs) for the DataNodes as follows:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data: # Overwrite the data PVCs coming from the default value
              count: 0
            my-disks:
              count: 3
              capacity: 12Ti
              hdfsStorageType: Disk
            my-ssds:
              count: 2
              capacity: 5Ti
              storageClass: premium-ssd
              hdfsStorageType: SSD
----

This will create the following PVCs:

1. `my-disks-hdfs-datanode-default-0` (12Ti)
2. `my-disks-1-hdfs-datanode-default-0` (12Ti)
3. `my-disks-2-hdfs-datanode-default-0` (12Ti)
4. `my-ssds-hdfs-datanode-default-0` (5Ti)
5. `my-ssds-1-hdfs-datanode-default-0` (5Ti)

By creating and using a dedicated https://kubernetes.io/docs/concepts/storage/storage-classes/[StorageClass] you can configure HDFS to use local disks attached to Kubernetes nodes.
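
For example, a StorageClass for statically provisioned local volumes might look like the following sketch (the name is an assumption; alternatively, a local-volume provisioner can manage the PersistentVolumes):

[source,yaml]
----
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-hdfs # assumed name, reference it via storageClass in the PVC config above
provisioner: kubernetes.io/no-provisioner # static provisioning: PVs must be created manually
volumeBindingMode: WaitForFirstConsumer # bind only once a pod is scheduled to a node
----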

[NOTE]
====
You might need to re-create the StatefulSet to apply the new PVC configuration because of https://github.com/kubernetes/kubernetes/issues/68737[this Kubernetes issue].
You can delete the StatefulSet using `kubectl delete sts --cascade=false <statefulset>`.
The hdfs-operator will re-create the StatefulSet automatically.
====

== Resource Requests

include::home:concepts:stackable_resource_requests.adoc[]

If no resource requests are configured explicitly, the HDFS operator uses the following defaults:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            max: '4'
            min: '100m'
          storage:
            data:
              capacity: 2Gi
----
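
To override these defaults, set your own values in the `resources` section at role or role group level, as in this sketch (the numbers are examples only, and the `memory.limit` field is assumed to follow the common Stackable resources structure):

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          cpu:
            min: '500m'
            max: '2'
          memory:
            limit: 3Gi
          storage:
            data:
              capacity: 64Gi
----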
3 changes: 3 additions & 0 deletions docs/modules/ROOT/pages/usage-guide/scaling.adoc
@@ -0,0 +1,3 @@
= Scaling

When scaling NameNodes up, make sure to increase the replica count by only one node at a time, never by more.
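
For example, to grow a role group from two to three NameNodes, change the replica count by a single node and wait for the cluster to settle before scaling further:

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      replicas: 3 # previously 2: add at most one NameNode per step
----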