Cannot create tables in Hue #131

Closed
hemajv opened this issue Mar 17, 2021 · 10 comments
Labels
lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@hemajv
Member

hemajv commented Mar 17, 2021

Describe the bug
I have data stored in the black-flake ceph bucket and I am trying to create a table for it in Hue so that I can visualize the data using Superset.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://hue-opf-datacatalog.apps.zero.massopen.cloud/
  2. Log in as generic_user:operatefirst
  3. Try to create a table by executing the following query:
CREATE EXTERNAL TABLE IF NOT EXISTS ocp_ci_analysis.flakes(
timstamp TIMESTAMP,
tab STRING,
grid STRING,
test STRING,
flake BOOLEAN
)
STORED AS PARQUET
LOCATION
's3a://<access key>:<secret key>@black-flake/metrics/flake';
  4. See error:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.amazonaws.AmazonClientException: Unable to execute HTTP request: black-flake.s3.openshift-storage.svc: Name or service not known);

Expected behaviour
The table should be successfully created with the parquet file contents loaded into it.

Is Hue set up to connect to Ceph?

@hemajv
Member Author

hemajv commented Mar 17, 2021

cc @MichaelClifford @4n4nd @Shreyanand

@hemajv
Member Author

hemajv commented Mar 17, 2021

(as per conversation in chat)
It seems that the opf-datacatalog bucket is configured for Hue by default. However, I stored a CSV file in this bucket, tried to create a table for it in Hue, and still ended up with the same error:

org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.amazonaws.AmazonClientException: Unable to execute HTTP request: opf-datacatalog.s3.openshift-storage.svc: Name or service not known);

@tumido any idea what the issue might be?

also, since both the black-flake and opf-datacatalog buckets have different access/secret keys, is it possible to configure multiple buckets to be used in Hue?

@tumido
Member

tumido commented Mar 18, 2021

Yes, this is a duplicate of #117.

The problem is that s3.openshift-storage.svc is not recognised as a proper hostname by boto in Hue, and Thriftserver is not using port 443. I'll experiment with the external route for this, but that one was problematic in Argo due to some SSL errors. I'll look into it.
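
The hostname in the reported error, black-flake.s3.openshift-storage.svc, is the bucket name prepended to the endpoint host, which is the form produced by virtual-hosted-style S3 addressing. The following is a minimal Python sketch for comparing which hostname each addressing style would contact and whether it resolves from the machine running the script. The function names `s3_hosts` and `resolves` are hypothetical helpers, not part of Hue, boto, or Hadoop. Hadoop S3A does have a `fs.s3a.path.style.access` option that switches to path-style addressing, but nothing in this thread confirms that it fixes this issue.

```python
import socket


def s3_hosts(endpoint_host: str, bucket: str) -> dict:
    """Return the hostname each S3 addressing style would contact."""
    return {
        # Virtual-hosted style: the bucket becomes a DNS label.
        "virtual_hosted": f"{bucket}.{endpoint_host}",
        # Path style: the bucket goes into the URL path, the host is unchanged.
        "path_style": endpoint_host,
    }


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can look up the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


hosts = s3_hosts("s3.openshift-storage.svc", "black-flake")
# To check resolution, run from a pod inside the cluster:
#   for style, host in hosts.items():
#       print(style, host, resolves(host))
```

If only the path-style host resolves inside the cluster, the "Name or service not known" error would be consistent with the client building a virtual-hosted-style hostname.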

@tumido
Member

tumido commented Mar 18, 2021

> It seems that the opf-datacatalog bucket is configured for Hue by default. However, I stored a CSV file in this bucket, tried to create a table for it in Hue, and still ended up with the same error:

The default bucket doesn't change anything. This is a hostname/port problem. And OpenShift Container Storage is not helping us here.

> also, since both the black-flake and opf-datacatalog buckets have different access/secret keys, is it possible to configure multiple buckets to be used in Hue?

Of course they can! They must be available on the same S3 cluster though - and the connection to S3 is the problem here. Maybe we can even make Hue/Hive connect to multiple Ceph endpoints? I don't know, we can also try that...
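
One way buckets with different keys (and possibly different endpoints) could be configured is Hadoop S3A's per-bucket options, which take the form `fs.s3a.bucket.<bucket>.<option>`. The sketch below builds such properties as Spark settings using the `spark.hadoop.` prefix. It assumes the deployed Hadoop release supports S3A per-bucket configuration, which this thread does not state. The helper name `per_bucket_s3a_conf` is hypothetical, and the key values are placeholders.

```python
def per_bucket_s3a_conf(buckets: dict) -> dict:
    """Build Spark properties that set S3A options separately per bucket."""
    conf = {}
    for bucket, settings in buckets.items():
        for option, value in settings.items():
            # e.g. spark.hadoop.fs.s3a.bucket.black-flake.access.key
            conf[f"spark.hadoop.fs.s3a.bucket.{bucket}.{option}"] = value
    return conf


conf = per_bucket_s3a_conf({
    "black-flake": {
        "access.key": "<access key>",   # placeholder
        "secret.key": "<secret key>",   # placeholder
    },
    "opf-datacatalog": {
        "access.key": "<access key>",   # placeholder
        "secret.key": "<secret key>",   # placeholder
    },
})
```

With credentials supplied this way, the table LOCATION could be written as a plain s3a://black-flake/metrics/flake URI without keys embedded in it. An `endpoint` option can be set per bucket in the same way if the buckets live on different Ceph endpoints.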

@tumido
Member

tumido commented Mar 29, 2021

So, after quite some time on this, I've managed to fix the sibling issue #117, but this one still persists. I need to raise this one upstream. I can't make Thriftserver connect to OpenShift Container Storage properly.

@hemajv
Member Author

hemajv commented Mar 29, 2021

ack. @tumido, would you happen to know of any workaround we could look into in the meantime, such as manually attaching the table to the Superset/Hue pod somehow?

@tumido
Member

tumido commented Mar 29, 2021

Nope, I don't know of any workaround as of now. Maybe @rimolive would be able to help...

In general, if you want to work with Hue or Superset, all the tables have to be loaded and their metadata stored in Hive. And currently Thriftserver is the only interface for Hive we've got. 😞

@sesheta
Member

sesheta commented Oct 13, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale label Oct 13, 2021
@HumairAK
Member

No one is using Hue at the moment, so I think this is no longer relevant.

/close

@sesheta
Member

sesheta commented Oct 14, 2021

@HumairAK: Closing this issue.

In response to this:

No one is using hue atm so I think this is no longer relevant.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sesheta sesheta closed this as completed Oct 14, 2021

4 participants