Allow enabling secure mode with Kerberos (#334)
# Description

Closes #178
Fixes #338

TODOs

- [x] Release new Hadoop image with openssl and Kerberos clients to use in docs and tests
- [x] Release and use operator-rs change
- [x] Fix hardcoded `kinit nn/simple-hdfs-namenode-default.default.svc.cluster.local@CLUSTER.LOCAL -kt /stackable/kerberos/keytab` in entrypoints
- [x] Go through all hadoop settings and see if they can be improved
- [x] Test different realms
- [x] Discuss CRD change
- [x] Discuss how to expose this in Discovery CM -> During on-site 2023/05 we have decided to ship this feature without exposing it via discovery *for now*
- [x] Implement discovery
- [x] Tests
- [x] Docs
- [x] Let @maltesander have a look at how we can better include the init container in the code structure
- [x] Test long-running cluster (maybe turn down ticket lifetime for that)
sbernauer committed Jun 14, 2023
1 parent 5d02c7f commit ef4d433
Showing 35 changed files with 1,597 additions and 227 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@ All notable changes to this project will be documented in this file.

### Added

- Add support for enabling secure mode with Kerberos ([#334]).
- Generate OLM bundle for Release 23.4.0 ([#350]).
- Missing CRD defaults for `status.conditions` field ([#354]).

@@ -16,6 +17,7 @@ All notable changes to this project will be documented in this file.
- Use testing-tools 0.2.0 ([#351])
- Run as root group ([#353]).

[#334]: https://github.com/stackabletech/hdfs-operator/pull/334
[#349]: https://github.com/stackabletech/hdfs-operator/pull/349
[#350]: https://github.com/stackabletech/hdfs-operator/pull/350
[#351]: https://github.com/stackabletech/hdfs-operator/pull/351
60 changes: 33 additions & 27 deletions Cargo.lock


5 changes: 3 additions & 2 deletions Cargo.toml
@@ -4,5 +4,6 @@ members = [
"rust/crd", "rust/operator", "rust/operator-binary"
]

#[patch."https://github.com/stackabletech/operator-rs.git"]
#stackable-operator = { git = "https://github.com/stackabletech//operator-rs.git", branch = "main" }
# [patch."https://github.com/stackabletech/operator-rs.git"]
# stackable-operator = { path = "/home/sbernauer/stackabletech/operator-rs" }
# stackable-operator = { git = "https://github.com/stackabletech//operator-rs.git", branch = "main" }
20 changes: 20 additions & 0 deletions deploy/helm/hdfs-operator/crds/crds.yaml
@@ -26,6 +26,26 @@ spec:
      properties:
        clusterConfig:
          properties:
            authentication:
              description: Configuration to set up a cluster secured using Kerberos.
              nullable: true
              properties:
                kerberos:
                  description: Kerberos configuration
                  properties:
                    secretClass:
                      description: Name of the SecretClass providing the keytab for the HDFS services.
                      type: string
                  required:
                  - secretClass
                  type: object
                tlsSecretClass:
                  default: tls
                  description: Name of the SecretClass providing the tls certificates for the WebUIs.
                  type: string
              required:
              - kerberos
              type: object
            autoFormatFs:
              nullable: true
              type: boolean
Binary file added docs/modules/hdfs/images/hdfs_webui_kerberos.png
78 changes: 78 additions & 0 deletions docs/modules/hdfs/pages/usage-guide/security.adoc
@@ -0,0 +1,78 @@
= Security

== Authentication
Currently, the only supported authentication mechanism is Kerberos, which is disabled by default.
For Kerberos to work, a Kerberos KDC is needed, which the user needs to provide.
The xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation] states which kind of Kerberos servers are supported and how they can be configured.

IMPORTANT: Kerberos is supported starting from HDFS version 3.3.x.

=== 1. Prepare Kerberos server
To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. hostname and port.
Additionally, you need a service user, which the secret-operator uses to create principals for the HDFS services.

=== 2. Create Kerberos SecretClass
Afterwards, you need to enter all the required information into a SecretClass, as described in the xref:home:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation].
The following guide assumes you have named your SecretClass `kerberos-hdfs`.
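
A SecretClass using the `kerberosKeytab` backend might look roughly like the following sketch.
Treat it as an illustration only: the KDC address, realm, admin principal and admin keytab Secret are placeholders, and the exact field names should be verified against the linked secret-operator documentation.

[source,yaml]
----
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: kerberos-hdfs
spec:
  backend:
    kerberosKeytab:
      realmName: CLUSTER.LOCAL                        # your Kerberos realm
      kdc: krb5-kdc.default.svc.cluster.local         # hostname of your KDC
      admin:
        mit:                                          # assuming an MIT Kerberos KDC
          kadminServer: krb5-kdc.default.svc.cluster.local
      adminPrincipal: stackable-secret-operator       # the service user mentioned above
      adminKeytabSecret:                              # Secret containing the service user's keytab
        namespace: default
        name: secret-operator-keytab
----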

=== 3. Configure HDFS to use SecretClass
The last step is to configure your HdfsCluster to use the newly created SecretClass.

[source,yaml]
----
spec:
  clusterConfig:
    authentication:
      tlsSecretClass: tls # Optional, defaults to "tls"
      kerberos:
        secretClass: kerberos-hdfs # Put your SecretClass name in here
----

The `kerberos.secretClass` enables HDFS to request keytabs from the secret-operator.

The `tlsSecretClass` is needed to request TLS certificates, used e.g. for the Web UIs.


=== 4. Verify that Kerberos is used
Use `stackablectl services list --all-namespaces` to get the endpoints where the HDFS namenodes are reachable.
Open the link (note that the namenode is now using https).
You should see a Web UI similar to the following:

image:hdfs_webui_kerberos.png[]

The important part is

> Security is on.

You can also shell into the namenode and try to access the file system:
`kubectl exec -it hdfs-namenode-default-0 -c namenode -- bash -c 'kdestroy && bin/hdfs dfs -ls /'`

You should get the error message `org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]`.
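
For comparison, obtaining a ticket from the service keytab first should make the same listing succeed.
A minimal sketch, assuming the cluster is named `hdfs`, runs in the `default` namespace and uses the realm `CLUSTER.LOCAL` (run `klist -kt /stackable/kerberos/keytab` to see the principals actually available):

[source,bash]
----
# Sketch only: adjust the principal to whatever klist shows for your cluster.
kubectl exec -it hdfs-namenode-default-0 -c namenode -- bash -c \
  'kinit -kt /stackable/kerberos/keytab nn/hdfs-namenode-default.default.svc.cluster.local@CLUSTER.LOCAL \
   && bin/hdfs dfs -ls /'
----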

=== 5. Access HDFS
If you want to access your HDFS cluster, it is recommended to start a client Pod that connects to HDFS, rather than shelling into the namenode.
We have an https://github.com/stackabletech/hdfs-operator/blob/main/tests/templates/kuttl/kerberos/20-access-hdfs.yaml.j2[integration test] for this exact purpose, where you can see how to connect and get a valid keytab.
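
Once such a client Pod has the keytab and `krb5.conf` mounted (the integration test linked above shows how to request them via the SecretClass), the workflow inside the Pod looks roughly like the following sketch.
The mount path `/stackable/kerberos` and the principal `testuser@CLUSTER.LOCAL` are assumptions matching the examples in this guide.

[source,bash]
----
# Point kinit and the Hadoop JVM at the mounted krb5.conf (paths are assumptions).
export KRB5_CONFIG=/stackable/kerberos/krb5.conf
export HADOOP_OPTS="-Djava.security.krb5.conf=/stackable/kerberos/krb5.conf"

# Obtain a ticket for the client principal, then use HDFS as usual.
kinit -kt /stackable/kerberos/keytab testuser@CLUSTER.LOCAL
bin/hdfs dfs -ls /
bin/hdfs dfs -mkdir -p /tmp/demo
----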

== Authorization
Authorization is not supported yet.
In the future, support will be added by writing an opa-authorizer to match our general xref:home:concepts:opa.adoc[] mechanisms.

In the meantime, a very basic level of authorization can be reached by using `configOverrides` to set the `hadoop.user.group.static.mapping.overrides` property.
In the following example, the `dr.who=;nn=;nm=;jn=;` part is needed for HDFS internal operations and the user `testuser` is granted admin permissions.

[source,yaml]
----
spec:
  nameNodes:
    configOverrides: &configOverrides
      core-site.xml:
        hadoop.user.group.static.mapping.overrides: "dr.who=;nn=;nm=;jn=;testuser=supergroup;"
  dataNodes:
    configOverrides: *configOverrides
  journalNodes:
    configOverrides: *configOverrides
----
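
With that mapping in place, a quick way to check the effect is to run an operation only a superuser may perform.
A small sketch, reusing the client setup from the previous section and the hypothetical `testuser` principal:

[source,bash]
----
# As testuser (mapped into supergroup above) this should succeed,
# while a principal not covered by the mapping should be denied.
bin/hdfs dfs -mkdir /only-admins
----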

== Wire encryption
If Kerberos is enabled, `Privacy` mode is used for best security.
Wire encryption without Kerberos, as well as https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Data_confidentiality[other wire encryption modes], are *not* supported.
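
In Hadoop terms, `Privacy` roughly corresponds to setting the standard protection properties to `privacy`.
A sketch of the effective values (these are standard Hadoop settings, shown only for illustration; the operator applies them automatically, no overrides are needed):

[source,yaml]
----
# Illustration of the effective settings when Kerberos is enabled; do not set these yourself.
core-site.xml:
  hadoop.rpc.protection: privacy         # RPC traffic is authenticated, integrity-checked and encrypted
hdfs-site.xml:
  dfs.data.transfer.protection: privacy  # same guarantees for the block data transfer protocol
----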
1 change: 1 addition & 0 deletions docs/modules/hdfs/partials/nav.adoc
@@ -7,6 +7,7 @@
* xref:hdfs:usage-guide/index.adoc[]
** xref:hdfs:usage-guide/resources.adoc[]
** xref:hdfs:usage-guide/logging-log-aggregation.adoc[]
** xref:hdfs:usage-guide/security.adoc[]
** xref:hdfs:usage-guide/monitoring.adoc[]
** xref:hdfs:usage-guide/scaling.adoc[]
** xref:hdfs:usage-guide/configuration-environment-overrides.adoc[]
6 changes: 6 additions & 0 deletions rust/crd/src/constants.rs
@@ -12,6 +12,9 @@ pub const LABEL_STS_POD_NAME: &str = "statefulset.kubernetes.io/pod-name";

pub const HDFS_SITE_XML: &str = "hdfs-site.xml";
pub const CORE_SITE_XML: &str = "core-site.xml";
pub const HADOOP_POLICY_XML: &str = "hadoop-policy.xml";
pub const SSL_SERVER_XML: &str = "ssl-server.xml";
pub const SSL_CLIENT_XML: &str = "ssl-client.xml";
pub const LOG4J_PROPERTIES: &str = "log4j.properties";

pub const SERVICE_PORT_NAME_RPC: &str = "rpc";
@@ -23,10 +26,12 @@ pub const SERVICE_PORT_NAME_METRICS: &str = "metrics";

pub const DEFAULT_NAME_NODE_METRICS_PORT: u16 = 8183;
pub const DEFAULT_NAME_NODE_HTTP_PORT: u16 = 9870;
pub const DEFAULT_NAME_NODE_HTTPS_PORT: u16 = 9871;
pub const DEFAULT_NAME_NODE_RPC_PORT: u16 = 8020;

pub const DEFAULT_DATA_NODE_METRICS_PORT: u16 = 8082;
pub const DEFAULT_DATA_NODE_HTTP_PORT: u16 = 9864;
pub const DEFAULT_DATA_NODE_HTTPS_PORT: u16 = 9865;
pub const DEFAULT_DATA_NODE_DATA_PORT: u16 = 9866;
pub const DEFAULT_DATA_NODE_IPC_PORT: u16 = 9867;

@@ -40,6 +45,7 @@ pub const DFS_NAMENODE_NAME_DIR: &str = "dfs.namenode.name.dir";
pub const DFS_NAMENODE_SHARED_EDITS_DIR: &str = "dfs.namenode.shared.edits.dir";
pub const DFS_NAMENODE_RPC_ADDRESS: &str = "dfs.namenode.rpc-address";
pub const DFS_NAMENODE_HTTP_ADDRESS: &str = "dfs.namenode.http-address";
pub const DFS_NAMENODE_HTTPS_ADDRESS: &str = "dfs.namenode.https-address";
pub const DFS_DATANODE_DATA_DIR: &str = "dfs.datanode.data.dir";
pub const DFS_JOURNALNODE_EDITS_DIR: &str = "dfs.journalnode.edits.dir";
pub const DFS_JOURNALNODE_RPC_ADDRESS: &str = "dfs.journalnode.rpc-address";
