
Support Hive #41

Closed
rstrickland opened this issue Nov 9, 2015 · 12 comments

@rstrickland

I really like this concept, but it's critical for us to be able to create permanent tables via a Hive metastore. Would it be possible to support Hive at some point?

@velvia
Member

velvia commented Nov 9, 2015

@rstrickland welcome! Would you mind painting a picture of your setup? Is the Hive metastore part of your Hadoop installation, or DSE? I don't quite understand: table metadata stored in Hive, but actual tables stored in C*? Thanks!

@rstrickland
Author

We use a centralized Hive metastore that's shared by multiple Spark and EMR clusters that all serve different purposes. The data is stored in multiple places, including Cassandra. However, with the lack of a good open source Cassandra-Hive driver we've had to resort to creating temp tables every time we want to get at Cassandra data. It would be awesome if Filo supported legitimate Hive tables so we could bypass this step.
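For context, the temp-table workaround described above typically looks something like the following in Spark 1.x with the spark-cassandra-connector (keyspace and table names here are hypothetical):

```scala
// In spark-shell, sqlContext is already provided.
// Re-registering a Cassandra table for Spark SQL: this must be repeated
// in every new session, because temp tables are not persisted to any
// metastore.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics", "table" -> "events"))
  .load()
df.registerTempTable("events")
```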

@velvia
Member

velvia commented Nov 10, 2015

Got it. Would it be fine to do this through Spark? That is, you couldn't use the filo-cli, but would use the Spark API to create tables, etc. (since Spark can already connect to the Hive metastore).
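The Spark-API route proposed here would look roughly like the sketch below; the option names are illustrative of the FiloDB Spark data source of that era and should be treated as assumptions, not a definitive API:

```scala
import org.apache.spark.sql.SaveMode

// Hypothetical sketch: writing a DataFrame out as a FiloDB table via the
// Spark data source API instead of filo-cli. Column names and options
// are placeholders.
df.write.format("filodb.spark")
  .option("dataset", "gdelt")
  .option("row_keys", "GLOBALEVENTID")
  .mode(SaveMode.Overwrite)
  .save()
```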


@rstrickland
Author

We could, but we do have BI tools that use Hive proper (i.e. not the Spark SQL thrift server). Ideally it would be great if that would work as well, but I know that's a bigger effort.

@velvia
Member

velvia commented Nov 10, 2015

Okay, I think I understand now: you are looking for a proper FiloDB driver for Hive that lets you query FiloDB from Hive itself.


@velvia
Member

velvia commented Dec 17, 2015

@rstrickland ok so to break this up into two steps:

  1. Have a Hive driver (like DSE's) that automatically lets you query tables from Spark without having to do a CREATE EXTERNAL TABLE
  2. Actually support queries directly from Hive without Spark. Hmmmm.... I think this involves yucky input formats and Hive SerDes, etc.
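Step 2 would ultimately surface in Hive DDL via a custom storage handler, along these lines (the handler class name and table properties are hypothetical; nothing like them exists yet):

```sql
-- Hypothetical: what true Hive-only support might look like, using
-- Hive's STORED BY clause with a custom storage handler + SerDe.
CREATE EXTERNAL TABLE gdelt_events (
  globaleventid BIGINT,
  actor1name    STRING,
  numarticles   INT
)
STORED BY 'filodb.hive.FiloDBStorageHandler'   -- does not exist yet
TBLPROPERTIES ('filodb.dataset' = 'gdelt');
```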

@velvia
Member

velvia commented Jan 10, 2016

Ok, scoped out the work for Hive metastore support of FiloDB tables for querying in Spark. Spark has a HiveMetadataCatalog class with a createDataSourceTable method. So one possibility is that when the FiloDB daemon/library spins up, it automatically resolves differences between FiloDB tables and the Hive catalog. In theory this sync could also happen when a user requests tables or a schema, but that would require a custom Hive plugin in Spark. Need to think about how to automate the syncing.

@rstrickland it appears that in Hive you either have to register a table as Hive-supported (i.e. using Hadoop InputFormats) or as non-Hive-supported (for Spark data sources, for example). Thus there might need to be some hack for namespacing the tables. What do you think?
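A minimal sketch of the startup sync being described, assuming a FiloDB-side list of datasets (names like `filoDatasets` and `syncFiloTables` are placeholders, not real FiloDB API):

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical sketch: on startup, register any FiloDB dataset that is
// missing from the Hive metastore as a Spark data-source table, so
// Spark SQL sessions see it without a manual CREATE TABLE.
def syncFiloTables(hive: HiveContext, filoDatasets: Seq[String]): Unit = {
  val existing = hive.tableNames().toSet
  for (ds <- filoDatasets if !existing.contains(ds)) {
    hive.sql(s"""CREATE TABLE $ds USING filodb.spark OPTIONS (dataset "$ds")""")
  }
}
```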

@rstrickland
Author

So are you suggesting creating temp tables on startup that would be accessible via Spark only? I think the only way to get bona fide Hive support is to create a Hive SerDe. But your solution (if I'm understanding correctly) would solve the issue of having to recreate temp tables on startup for Spark or Spark SQL jobs, but would not allow for Hive queries via a BI tool.

@velvia
Member

velvia commented Jan 10, 2016

@rstrickland you are right, the above proposed solution would not enable true Hive-only queries, though you can still connect BI tools to Spark SQL / Thrift server via the JDBC/ODBC drivers. The SerDe/InputFormats required for true Hive-only operation would come as a second step.

Would you guys be willing to test out the Spark-only solution, before the full Hive solution comes? What would the timeframe look like?
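For reference, connecting a BI tool through the Spark SQL Thrift server boils down to a standard Hive JDBC connection, something like the following (host, port, and master URL are placeholders):

```shell
# Start Spark's Thrift server, then connect any HiveServer2-compatible
# client (e.g. beeline) over JDBC.
sbin/start-thriftserver.sh --master spark://host:7077
beeline -u jdbc:hive2://localhost:10000
```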

@rstrickland
Author

We would definitely test it whenever it's ready.


@velvia
Member

velvia commented Feb 19, 2016

@rstrickland check out #63
and LMK if this is roughly what you guys are looking for as a first step. Would like some feedback first.

Thanks!

velvia added a commit that referenced this issue Mar 2, 2016: Semi-automatic Hive Metastore sync (#41)
@velvia
Member

velvia commented Mar 3, 2016

So the initial support has been merged. Let's close this and open a new ticket for any issues or changes desired.

@velvia velvia closed this as completed Mar 3, 2016