FileHiveMetastore example/documentation? #11943

Closed
ankon opened this issue Nov 16, 2018 · 3 comments

Comments

@ankon

ankon commented Nov 16, 2018

I'm trying to use Presto to query data in ORC files produced by pinterest/secor and stored in S3. My environment doesn't have any Hadoop/Hive setup; we use Kafka directly. Running Presto against Kafka isn't workable (it is too slow, and it would require infinite retention of the data inside Kafka). My table descriptions are produced automatically from application data, and I want my users to interact only with Presto.

I got things to work with Hive 3.1.1, but the requirements for running a Hive metastore are "scary" (I have no other Hive knowledge, so this adds quite some operational complexity).
I saw in the source code that there is a FileHiveMetastore, which looks like it would remove the need to configure and run an actual Hive metastore server, and it should be easy enough to convert my application knowledge into suitable schemas.

What I would need for that is basically some form of example of how to describe a table for this meta-store, or ideally some pointers to documentation.

My table definition right now looks like this:

create external table if not exists events (
  `type` string,
  `_message` string,
  `_meta` struct<`id`:string,...,`timestamp`:string>
)
partitioned by (`dt` string)
stored as orc
location 's3a://bucket/raw_logs/secor_backup/events/';

msck repair table events;
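
Since the table descriptions are generated from application data, a statement like the one above could be produced by a small script. The following is only a minimal sketch under stated assumptions: the helper names `hive_type` and `create_table_ddl` and the dict-based schema format are hypothetical, and the elided fields of `_meta` are simply left out rather than guessed.

```python
def hive_type(spec):
    """Render a type spec: a str is a primitive type, a dict is a struct."""
    if isinstance(spec, dict):
        fields = ",".join(f"`{name}`:{hive_type(t)}" for name, t in spec.items())
        return f"struct<{fields}>"
    return spec


def create_table_ddl(table, columns, partitions, location):
    """Build a Hive 'create external table' statement for ORC data (hypothetical helper)."""
    cols = ", ".join(f"`{n}` {hive_type(t)}" for n, t in columns.items())
    parts = ", ".join(f"`{n}` {hive_type(t)}" for n, t in partitions.items())
    return (
        f"create external table if not exists {table} ({cols}) "
        f"partitioned by ({parts}) stored as orc location '{location}'"
    )


# Assumed schema format: column name -> type, nested dicts for structs.
# Only the two _meta fields visible in the statement above are listed.
ddl = create_table_ddl(
    "events",
    {
        "type": "string",
        "_message": "string",
        "_meta": {"id": "string", "timestamp": "string"},
    },
    {"dt": "string"},
    "s3a://bucket/raw_logs/secor_backup/events/",
)
print(ddl + ";")
```

The generated text still has to be submitted to whatever metastore ends up being used; the sketch only covers turning application knowledge into DDL.
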
@ryanrupp
Contributor

Since you're using AWS, Presto can be configured to use AWS Glue as a Hive Metastore. This is specific to using EMR but the general idea is the same, see here.

You can launch an EMR Presto cluster with the checkbox enabled to use Glue as the Hive metastore to see how this ends up being configured on the EC2 instances it uses. If you're not using EMR, it should essentially just be a matter of configuring Presto's hive.properties to use Glue, e.g.:

hive.metastore = glue

and using an EC2 instance profile that can read from AWS Glue.
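
As a rough illustration of the above, the catalog file could be generated rather than written by hand. This is a sketch, not a verified setup: `render_properties` is a hypothetical helper, `connector.name=hive-hadoop2` is assumed to be the connector name for the Presto version in use, and any further Glue settings (such as the region) are deliberately left out.

```python
def render_properties(props):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Assumed minimal catalog settings; credentials are expected to come from
# the EC2 instance profile, so none are listed here.
glue_catalog = {
    "connector.name": "hive-hadoop2",
    "hive.metastore": "glue",
}

properties_text = render_properties(glue_catalog)
print(properties_text, end="")
```

The resulting text would go into the catalog's hive.properties file on each Presto node.
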

Also, if you use AWS Athena (managed Presto), you can issue create external table statements through it, and it delegates those to the Glue metastore. So you could just do this in the AWS Athena console to get started. You can then also query the data via AWS Athena, as it's basically Presto against S3 (it doesn't have other connectors though, e.g. Kafka). Depending on your use case, that may be sufficient as an alternative to running your own Presto. Running your own Presto on EC2 or via EMR would give you more control over optimizing performance etc., whereas Athena will be more convenient and potentially more cost-effective depending on your query patterns.

@ankon
Author

ankon commented Nov 20, 2018

Using Glue/Athena is actually an interesting idea, and I'll likely try that, thanks!

I think it could still be valuable, though, to have some examples of using the FileHiveMetastore, so I'd like to keep this request active/open.

@findepi
Contributor

findepi commented Nov 20, 2018

@ankon AFAIK, FileHiveMetastore should be viewed as a helper class that allows us to run tests without an external Hive Metastore process.
For example, com.facebook.presto.hive.metastore.file.FileHiveMetastore#isDatabaseOwner has some hard-coded logic that gives any user owner privileges in the default schema, which is OK in tests but not OK in any real-life usage. For these reasons, we're not planning to document it, nor do we encourage people to use this class directly.

@findepi findepi closed this as completed Nov 20, 2018