
Support Hive #41

Closed
rstrickland opened this issue Nov 9, 2015 · 12 comments

@rstrickland

I really like this concept, but it's critical for us to be able to create permanent tables via a Hive metastore. Would it be possible to support Hive at some point?

@velvia
Member

velvia commented Nov 9, 2015

@rstrickland welcome! Would you mind painting a picture of your setup? Is the Hive metastore part of your Hadoop installation, or DSE? I don't quite understand: table metadata stored in Hive, but actual tables stored in C*? Thanks!

@rstrickland
Author

We use a centralized Hive metastore that's shared by multiple Spark and EMR clusters that all serve different purposes. The data is stored in multiple places, including Cassandra. However, with the lack of a good open source Cassandra-Hive driver we've had to resort to creating temp tables every time we want to get at Cassandra data. It would be awesome if Filo supported legitimate Hive tables so we could bypass this step.
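For context, the temp-table workaround described above typically looks something like the following in Spark 1.x with the spark-cassandra-connector (keyspace and table names here are hypothetical):

```scala
// In spark-shell, sqlContext is already provided.
// Re-registering a Cassandra table for Spark SQL: this must be repeated
// in every new session, because temp tables are not persisted to any
// metastore.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "analytics", "table" -> "events"))
  .load()
df.registerTempTable("events")
```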

@velvia
Member

velvia commented Nov 10, 2015

Got it. Would it be fine to do this through Spark? That is, you couldn't use the filo-cli, but would use the Spark API to create tables, etc. (since Spark can already connect to the Hive metastore).
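The Spark-API route proposed here would look roughly like the sketch below; the option names are illustrative of the FiloDB Spark data source of that era and should be treated as assumptions, not a definitive API:

```scala
import org.apache.spark.sql.SaveMode

// Hypothetical sketch: writing a DataFrame out as a FiloDB table via the
// Spark data source API instead of filo-cli. Column names and options
// are placeholders.
df.write.format("filodb.spark")
  .option("dataset", "gdelt")
  .option("row_keys", "GLOBALEVENTID")
  .mode(SaveMode.Overwrite)
  .save()
```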


@rstrickland
Author

We could, but we do have BI tools that use Hive proper (i.e. not the Spark SQL thrift server). Ideally it would be great if that would work as well, but I know that's a bigger effort.

@velvia
Member

velvia commented Nov 10, 2015

Okay, I think I understand now: you are looking for a proper FiloDB driver for Hive that lets you query FiloDB from Hive itself.


@velvia
Member

velvia commented Dec 17, 2015

@rstrickland ok so to break this up into two steps:

  1. Have a Hive driver (like DSE's) that automatically lets you query tables from Spark without having to do a CREATE EXTERNAL TABLE
  2. Actually support queries directly from Hive without Spark. Hmmmm.... I think this involves yucky input formats and Hive SerDes, etc.
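Step 2 would ultimately surface in Hive DDL via a custom storage handler, along these lines (the handler class name and table properties are hypothetical; nothing like them exists yet):

```sql
-- Hypothetical: what true Hive-only support might look like, using
-- Hive's STORED BY clause with a custom storage handler + SerDe.
CREATE EXTERNAL TABLE gdelt_events (
  globaleventid BIGINT,
  actor1name    STRING,
  numarticles   INT
)
STORED BY 'filodb.hive.FiloDBStorageHandler'   -- does not exist yet
TBLPROPERTIES ('filodb.dataset' = 'gdelt');
```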

@velvia
Member

velvia commented Jan 10, 2016

Ok, scoped out the work for Hive metastore support of FiloDB tables for querying in Spark. Spark has a HiveMetadataCatalog class with a createDataSourceTable method. So one possibility is that when the FiloDB daemon/library spins up, it automatically resolves differences between FiloDB tables and the Hive catalog. In theory this sync could also happen when a user requests tables or a schema, but that would require a custom Hive plugin in Spark. Need to think about how to automate the syncing.

@rstrickland it appears that in Hive you either have to register a table as Hive-supported (i.e. using Hadoop InputFormats) or as non-Hive-supported (for Spark data sources, for example). Thus there might need to be some hack for namespacing the tables. What do you think?
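A minimal sketch of the startup sync being described, assuming a FiloDB-side list of datasets (names like `filoDatasets` and `syncFiloTables` are placeholders, not real FiloDB API):

```scala
import org.apache.spark.sql.hive.HiveContext

// Hypothetical sketch: on startup, register any FiloDB dataset that is
// missing from the Hive metastore as a Spark data-source table, so
// Spark SQL sessions see it without a manual CREATE TABLE.
def syncFiloTables(hive: HiveContext, filoDatasets: Seq[String]): Unit = {
  val existing = hive.tableNames().toSet
  for (ds <- filoDatasets if !existing.contains(ds)) {
    hive.sql(s"""CREATE TABLE $ds USING filodb.spark OPTIONS (dataset "$ds")""")
  }
}
```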

@rstrickland
Author

So are you suggesting creating temp tables on startup that would be accessible via Spark only? I think the only way to get bona fide Hive support is to create a Hive SerDe. But your solution (if I'm understanding correctly) would solve the issue of having to recreate temp tables on startup for Spark or Spark SQL jobs, but would not allow for Hive queries via a BI tool.

@velvia
Member

velvia commented Jan 10, 2016

@rstrickland you are right, the above proposed solution would not enable true Hive-only queries, though you can still connect BI tools to Spark SQL / Thrift server via the JDBC/ODBC drivers. The SerDe/InputFormats required for true Hive-only operation would come as a second step.

Would you guys be willing to test out the Spark-only solution, before the full Hive solution comes? What would the timeframe look like?
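For reference, connecting a BI tool through the Spark SQL Thrift server boils down to a standard Hive JDBC connection, something like the following (host, port, and master URL are placeholders):

```shell
# Start Spark's Thrift server, then connect any HiveServer2-compatible
# client (e.g. beeline) over JDBC.
sbin/start-thriftserver.sh --master spark://host:7077
beeline -u jdbc:hive2://localhost:10000
```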

@rstrickland
Author

We would definitely test it whenever it's ready.


@velvia
Member

velvia commented Feb 19, 2016

@rstrickland check out #63
and LMK if this is roughly what you guys are looking for as a first step. Would like some feedback first.

Thanks!

velvia added a commit that referenced this issue Mar 2, 2016: Semi-automatic Hive Metastore sync (#41)
@velvia
Member

velvia commented Mar 3, 2016

So the initial support has been merged. Let's close this and open a new ticket for any issues or changes desired.

@velvia velvia closed this as completed Mar 3, 2016