This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

feat(data-catalog): Adding Hue #227

Merged
merged 1 commit into opendatahub-io:master on Dec 15, 2020

Conversation

tumido
Contributor

@tumido tumido commented Nov 3, 2020

Adding the Hue component to the Data Catalog.

Part of: https://github.com/opendatahub-io/odh-manifests/issues/222, https://github.com/opendatahub-io/odh-manifests/issues/105, DATAHUB-2294

Based on reference implementation in AICoE#27 for Internal DH.

It will need further cleanup, and some parts will need to be extracted into overlays later on (storage class, externalizing the database). This is expected to happen in follow-up PRs.

@openshift-ci-robot
Collaborator

Hi @tumido. Thanks for your PR.

I'm waiting for an opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tumido changed the title from "[wip] feat(data-catalog): Adding Hue" to "feat(data-catalog): Adding Hue" on Nov 3, 2020
@tumido
Contributor Author

tumido commented Nov 4, 2020

cc @maulikjs, @rimolive, @accorvin

@tumido
Contributor Author

tumido commented Nov 5, 2020

/retest

@tumido
Contributor Author

tumido commented Nov 10, 2020

/retest

test failures not related to the code change

@anishasthana
Member

/retest

@tumido
Copy link
Contributor Author

tumido commented Nov 10, 2020

/retest

hue/hue/base/hive-site-xml-secret.yaml (outdated, resolved)
hue/hue/base/hive-site-xml-secret.yaml (outdated, resolved)
hue/hue/base/hue-ini-secret.yaml (outdated, resolved)
hue/hue/base/hue-ini-secret.yaml (outdated, resolved)
hue/hue/base/params.env (outdated, resolved)
hue/hue/base/hue-ini-secret.yaml (outdated, resolved)
@tumido tumido force-pushed the datacatalog-hue branch 2 times, most recently from bfb99e2 to 3834da7 Compare November 12, 2020 10:11
hue/hue/base/hue-dc.yaml (outdated, resolved)
hue/hue/base/hue-dc.yaml (outdated, resolved)
    - ReadWriteOnce
  resources:
    requests:
      storage: "1Gi"
Contributor

I have very little understanding of what the DB is actually used for. Do you know if 1Gi is a reasonable size?

Contributor

It's used to store tables and metadata. We have been running with 1Gi in prod for some time now and it hasn't complained.

Contributor Author

Same answer as in the Thrift server PR. I've used values based on the internal Data Hub team's expertise. 🙂

Hue mainly stores its "user" data and preferences (it's an old-school Django app), query history, a sample data cache for individual tables, data autocomplete and Hive metadata caches, etc. In our experience this value is enough, though I don't really know how much lower we can go.

@@ -0,0 +1,47 @@
# Cloudera Hue
Contributor

It seems like this does not work without thriftserver, right? If so, can you mention it in the readme and maybe even as a comment in the KFDef?

Contributor Author

It can work without thriftserver, but you'd need to connect it to something else (YARN, Hive directly, etc.) by modifying hue.ini. Alternatively, you can use it as an S3 explorer, which is standalone functionality. In our scenario and setup, it requires the Thrift server to be available. I'll explain it in the readme. 👍
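
A minimal sketch of the kind of hue.ini change mentioned above, assuming Hue's `[beeswax]` section with its `hive_server_host` and `hive_server_port` keys is what points Hue at a HiveServer2-compatible endpoint. The `render_beeswax_section` helper, the `thriftserver` host name, and port 10000 are illustrative assumptions, not values taken from this PR.

```python
import configparser
import io


def render_beeswax_section(host: str, port: int) -> str:
    """Render a hue.ini [beeswax] section for a HiveServer2-compatible endpoint."""
    config = configparser.ConfigParser()
    config["beeswax"] = {
        "hive_server_host": host,
        "hive_server_port": str(port),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical service name and port: substitute the values from your deployment.
print(render_beeswax_section("thriftserver", 10000))
```

The rendered section would still have to be merged into the full hue.ini that the deployment mounts; only the flat `[beeswax]` section is handled here, since `configparser` does not understand Hue's nested `[[...]]` sections.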

Contributor Author

I've added a paragraph in there. WDYT? 🙂

Member

This clarification looks good to me.

hue/hue/base/params.yaml (outdated, resolved)
Contributor

@vpavlin vpavlin left a comment

It looks good if the comments I added get fixed!

Thanks!

hue/README.md (resolved)
@LaVLaS
Member

LaVLaS commented Dec 11, 2020

I only tested the S3 browser and it worked perfectly to view & upload files.

Contributor

@crobby crobby left a comment

This is working well for me. I'm not a Hue expert, so I'm sure I haven't exercised it much, but I approve.

Member

@LaVLaS LaVLaS left a comment

/lgtm

I tested the S3 browser with no issues reading/writing files to ceph-nano.
I tested Hive SQL with the thriftserver component added in #228, using the opendatahub.io data exploration tutorial, without any major issues.

@openshift-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crobby, LaVLaS, tumido

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit bbe2b59 into opendatahub-io:master Dec 15, 2020
dlabaj pushed a commit to dlabaj/odh-manifests that referenced this pull request Sep 21, 2022
Update anaconda-ce-validator cronjob API level
Jooho pushed a commit to Jooho/odh-manifests that referenced this pull request May 16, 2023
…ndatahub-io#227)

#### Motivation

It may be useful for some built-in runtime adapters to have the model server's inferencing endpoint information in addition to the existing "management" port number that's passed (in case it is different).

#### Modifications

- Set a new `RUNTIME_DATA_ENDPOINT` env var on the built-in adapter container
- Parse the `ADAPTER_PORT` env var value from the ServingRuntime `grpcEndpoint` field instead of hardcoding to 8085
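
The modifications above could look roughly like the following Python sketch. The actual change is not shown in this thread and may well be written in another language. The sketch assumes the `grpcEndpoint` value has the form `port:<number>`; the function names and the `grpc_data_endpoint` parameter are hypothetical.

```python
from typing import Dict, Optional

# Previously hardcoded value, kept here as the fallback.
DEFAULT_ADAPTER_PORT = 8085


def adapter_port_from_endpoint(
    grpc_endpoint: Optional[str], default: int = DEFAULT_ADAPTER_PORT
) -> int:
    """Return the port from a 'port:<number>' endpoint, or the default otherwise."""
    if not grpc_endpoint:
        return default
    scheme, sep, value = grpc_endpoint.partition(":")
    if sep and scheme == "port" and value.isdigit():
        return int(value)
    # Anything else (for example a unix socket path) carries no TCP port.
    return default


def adapter_env(
    grpc_endpoint: Optional[str], grpc_data_endpoint: Optional[str]
) -> Dict[str, str]:
    """Build the env vars for the adapter container (hypothetical helper)."""
    env = {"ADAPTER_PORT": str(adapter_port_from_endpoint(grpc_endpoint))}
    if grpc_data_endpoint:
        # Pass the inferencing endpoint through alongside the management port.
        env["RUNTIME_DATA_ENDPOINT"] = grpc_data_endpoint
    return env
```

Falling back to 8085 keeps the earlier behaviour for runtimes that do not set the field or use a non-port endpoint.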


Signed-off-by: Nick Hill <nickhill@us.ibm.com>