This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

feat(data-catalog): Adding Hue
tumido committed Dec 9, 2020
1 parent bd82e90 commit adf80aa
Showing 19 changed files with 768 additions and 0 deletions.
49 changes: 49 additions & 0 deletions hue/README.md
@@ -0,0 +1,49 @@
# Cloudera Hue

Deploys the Cloudera Hue server, allowing data exploration of Hive tables and S3 buckets.

Cloudera Hue is expected to be deployed alongside a HiveServer2-type service. In Open Data Hub a [Spark SQL Thrift Server](../thriftserver) is used. Without a Thrift Server deployment, Hue won't be able to execute any SQL queries; however, it can still serve as an S3 browser.

### Folders

The Hue component consists of a single main folder, `hue`, which contains the kustomize manifests.

### Installation

To install Hue, add the following to the `kfctl` YAML file.

```yaml
- kustomizeConfig:
    repoRef:
      name: manifests
      path: hue/hue
  name: hue
```
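For context, here is a minimal sketch of how that entry might sit inside a full KfDef resource. The surrounding fields follow the standard KfDef v1 schema; the metadata name and the repo URI are illustrative assumptions, not part of the Hue manifests:

```yaml
apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: example-opendatahub   # hypothetical name
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: hue/hue
      name: hue
  repos:
    - name: manifests
      # assumed location of the manifests repo; point this at your actual source
      uri: https://github.com/opendatahub-io/odh-manifests/tarball/master
```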

### Overlays

The Hue component provides a single overlay.

#### storage-class

Customizes Hue's database to use a specific `StorageClass` for its PVC; see the `storage_class` parameter.
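As an illustration, enabling this overlay together with its companion parameter might look like the following in the KFDef application entry (the `gp2` storage class name is a placeholder assumption; use a `StorageClass` available in your cluster):

```yaml
- kustomizeConfig:
    overlays:
      - storage-class
    parameters:
      - name: storage_class
        value: gp2   # placeholder StorageClass name
    repoRef:
      name: manifests
      path: hue/hue
  name: hue
```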

### Parameters

There are four parameters exposed via KFDef.

#### storage_class

Name of the storage class to be used for the PVC created by Hue's database. The `storage-class` **overlay** must also be enabled for this parameter to take effect.

#### hue_secret_key

Sets the session store secret key for the Hue web server.

#### s3_endpoint_url

HTTP endpoint exposed by your S3 object storage solution. It is made available to Hue as the default S3 filesystem location.

#### s3_credentials_secret

Along with `s3_endpoint_url`, this parameter configures Hue's access to S3 object storage. Setting this parameter to the name of a local OpenShift/Kubernetes Secret resource allows Hue to consume S3 credentials from it. The chosen Secret must contain `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` keys. Keep in mind that the Spark cluster must use the same credentials for this value to work properly. If not set, credentials from [`hue-sample-s3-secret`](hue/base/hue-sample-s3-secret.yaml) are used instead.
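As a sketch, a compatible Secret might look like the following. The Secret name and credential values are placeholders; you would reference the name via the `s3_credentials_secret` parameter:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-hue-s3-credentials   # hypothetical name for s3_credentials_secret
type: Opaque
stringData:
  # Replace with valid credentials for your S3 endpoint
  AWS_ACCESS_KEY_ID: REPLACE_ME
  AWS_SECRET_ACCESS_KEY: REPLACE_ME
```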
68 changes: 68 additions & 0 deletions hue/hue/base/hive-site-xml-secret.yaml
@@ -0,0 +1,68 @@
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-hive-site-xml
type: Opaque
stringData:
  hive-site.xml: |
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>hive.server2.transport.mode</name>
        <value>binary</value>
        <description>Server transport mode. binary or http.</description>
      </property>
      <property>
        <name>hive.server2.thrift.http.port</name>
        <value>10000</value>
        <description>Port number when in HTTP mode.</description>
      </property>
      <property>
        <name>fs.s3a.aws.credentials.provider</name>
        <value>com.amazonaws.auth.EnvironmentVariableCredentialsProvider</value>
        <description>
          Comma-separated class names of credential provider classes which implement
          com.amazonaws.auth.AWSCredentialsProvider.
          These are loaded and queried in sequence for a valid set of credentials.
          Each listed class must implement one of the following means of
          construction, which are attempted in order:
          1. a public constructor accepting java.net.URI and
             org.apache.hadoop.conf.Configuration,
          2. a public static method named getInstance that accepts no
             arguments and returns an instance of
             com.amazonaws.auth.AWSCredentialsProvider, or
          3. a public default constructor.
          Specifying org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider allows
          anonymous access to a publicly accessible S3 bucket without any credentials.
          Please note that allowing anonymous access to an S3 bucket compromises
          security and therefore is unsuitable for most use cases. It can be useful
          for accessing public data sets without requiring AWS credentials.
          If unspecified, then the default list of credential provider classes,
          queried in sequence, is:
          1. org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider:
             Uses the values of fs.s3a.access.key and fs.s3a.secret.key.
          2. com.amazonaws.auth.EnvironmentVariableCredentialsProvider: supports
             configuration of AWS access key ID and secret access key in
             environment variables named AWS_ACCESS_KEY_ID and
             AWS_SECRET_ACCESS_KEY, as documented in the AWS SDK.
          3. com.amazonaws.auth.InstanceProfileCredentialsProvider: supports use
             of instance profile credentials if running in an EC2 VM.
        </description>
      </property>
      <property>
        <name>fs.s3a.endpoint</name>
        <value>$(s3_endpoint_url)</value>
        <description>AWS S3 endpoint to connect to. An up-to-date list is
          provided in the AWS Documentation: regions and endpoints. Without this
          property, the standard region (s3.amazonaws.com) is assumed.
        </description>
      </property>
    </configuration>
210 changes: 210 additions & 0 deletions hue/hue/base/hue-ini-secret.yaml
@@ -0,0 +1,210 @@
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-ini
type: Opaque
stringData:
  hue.ini: |
    # Full configuration:
    # https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini
    [desktop]
    # Hide unused apps
    app_blacklist=impala,security,jobbrowser,jobsub,pig,hbase,sqoop,zookeeper,spark,oozie,search
    secret_key=$(hue_secret_key)
    http_host=0.0.0.0
    http_port=8000
    time_zone=America/Los_Angeles
    django_debug_mode=false
    dev=false
    database_logging=false
    send_dbug_messages=false
    http_500_debug_mode=false
    enable_prometheus=true
    [[django_admins]]
    [[custom]]
    [[auth]]
    [[ldap]]
    [[[users]]]
    [[[groups]]]
    [[[ldap_servers]]]
    [[vcs]]
    [[database]]
    engine=mysql
    host=hue-mysql.$(namespace).svc
    port=3306
    user=$(database_user)
    password=$(database_password)
    name=$(database_name)
    [[session]]
    [[smtp]]
    host=localhost
    port=25
    user=
    password=
    tls=no
    [[knox]]
    [[kerberos]]
    [[oauth]]
    [[oidc]]
    [[metrics]]
    [[tracing]]
    [[task_server]]
    [[gc_accounts]]
    [[[default]]]
    [notebook]
    [[interpreters]]
    [[[hive]]]
    name=Hive
    interface=hiveserver2
    [[[impala]]]
    name=Impala
    interface=hiveserver2
    [[[sparksql]]]
    name=SparkSql
    interface=hiveserver2
    [[[text]]]
    name=Text
    interface=text
    [[[markdown]]]
    name=Markdown
    interface=text
    [dashboard]
    is_enabled=true
    has_sql_enabled=true
    has_report_enabled=true
    use_gridster=true
    has_widget_filter=false
    has_tree_widget=false
    [[engines]]
    [[[solr]]]
    analytics=true
    nesting=true
    [[[sql]]]
    analytics=true
    nesting=false
    [hadoop]
    [[hdfs_clusters]]
    [[[default]]]
    [[yarn_clusters]]
    [[[default]]]
    [beeswax]
    hive_server_host=thriftserver.$(namespace).svc
    hive_server_port=10000
    hive_conf_dir=/etc/hive/conf
    thrift_version=7
    [[ssl]]
    [metastore]
    enable_new_create_table=true
    force_hs2_metadata=false
    [impala]
    [[ssl]]
    [spark]
    [oozie]
    [filebrowser]
    [pig]
    [sqoop]
    [proxy]
    [hbase]
    [search]
    [libsolr]
    [indexer]
    [jobsub]
    [jobbrowser]
    [security]
    [zookeeper]
    [[clusters]]
    [[[default]]]
    [useradmin]
    [[password_policy]]
    [liboozie]
    oozie_url=
    [aws]
    [[aws_accounts]]
    [[[default]]]
    host=$(s3_endpoint_url)
    is_secure=$(s3_is_secure)
    calling_format=boto.s3.connection.OrdinaryCallingFormat
    access_key_id_script=/opt/hue/aws_access_key_id.sh
    secret_access_key_script=/opt/hue/aws_secret_access_key.sh
    [azure]
    [[azure_accounts]]
    [[[default]]]
    [[adls_clusters]]
    [[[default]]]
    [[abfs_clusters]]
    [[[default]]]
    [libsentry]
    [libzookeeper]
    [librdbms]
    [[databases]]
    [libsaml]
    [liboauth]
    [kafka]
    [[kafka]]
    [metadata]
    [[manager]]
    [[optimizer]]
    [[catalog]]
    [[navigator]]
    [[prometheus]]
11 changes: 11 additions & 0 deletions hue/hue/base/hue-mysql-pvc.yaml
@@ -0,0 +1,11 @@
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: hue-mysql
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: "1Gi"
10 changes: 10 additions & 0 deletions hue/hue/base/hue-mysql-secret.yaml
@@ -0,0 +1,10 @@
---
apiVersion: v1
kind: Secret
metadata:
  name: hue-mysql
stringData:
  database-user: datacatalog
  database-password: datacatalog
  database-name: datacatalog
  database-root-password: root
24 changes: 24 additions & 0 deletions hue/hue/base/hue-mysql-service.yaml
@@ -0,0 +1,24 @@
---
kind: Service
apiVersion: v1
metadata:
  name: hue-mysql
  annotations:
    template.openshift.io/expose-uri: |
      'mysql://{.spec.clusterIP}:{.spec.ports[?(.name=="mysql")].port}'
spec:
  ports:
    - name: mysql
      protocol: TCP
      port: 3306
      targetPort: 3306
    - name: exporter
      protocol: TCP
      port: 9104
      targetPort: 9104
  selector:
    deployment: hue-mysql
  type: ClusterIP
  sessionAffinity: None
status:
  loadBalancer: {}
