Support Hive #41
@rstrickland welcome! Would you mind painting a picture of your setup? Is the Hive metastore part of your Hadoop installation, or of DSE? I don't quite understand: table metadata stored in Hive, but the actual tables stored in C*? Thanks!
We use a centralized Hive metastore that's shared by multiple Spark and EMR clusters that all serve different purposes. The data is stored in multiple places, including Cassandra. However, with the lack of a good open source Cassandra-Hive driver, we've had to resort to creating temp tables every time we want to get at Cassandra data. It would be awesome if Filo supported legitimate Hive tables so we could bypass this step.
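For context, the per-session workaround described above looks roughly like the following Spark 1.x SQL. This is a sketch only: the keyspace and table names are hypothetical, and it assumes the spark-cassandra-connector data source is on the classpath. The temporary table disappears when the session ends, which is exactly why it has to be recreated for every job.

```sql
-- Hypothetical names; requires spark-cassandra-connector on the classpath.
-- TEMPORARY means this registration lives only for the current session.
CREATE TEMPORARY TABLE events
USING org.apache.spark.sql.cassandra
OPTIONS (keyspace "analytics", table "events");

-- Once registered, the Cassandra data is queryable like any other table:
SELECT count(*) FROM events;
```

A table registered in the Hive metastore instead of as a temporary table would make this one-time setup rather than per-session boilerplate.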
Got it. Would it be fine to do this through Spark? That is, you couldn't use the filo-cli, but would use the Spark API to create tables, etc. (since Spark can already connect to the Hive metastore).
We could, but we do have BI tools that use Hive proper (i.e. not the Spark SQL Thrift server). Ideally it would be great if that worked as well, but I know that's a bigger effort.
Okay, I think I understand now: you are looking for a proper FiloDB driver for Hive that lets you query FiloDB from Hive itself.
@rstrickland ok so to break this up into two steps:
1. Hive metastore support for FiloDB tables, queryable through Spark / Spark SQL.
2. The SerDe/InputFormats needed to query FiloDB from Hive itself.
Ok, scoped out the work for Hive metastore support of FiloDB tables for querying in Spark. @rstrickland it appears that in Hive you either have to register a table as Hive-supported (i.e. using Hadoop InputFormats) or as non-Hive-supported (for Spark data sources, for example). Thus there might need to be some hack for namespacing the tables. What do you think?
So are you suggesting creating temp tables on startup that would be accessible via Spark only? I think the only way to get bona fide Hive support is to create a Hive SerDe. But your solution (if I'm understanding correctly) would solve the issue of having to recreate temp tables on startup for Spark or Spark SQL jobs; it would not allow for Hive queries via a BI tool.
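To make the distinction concrete, true Hive-side support would mean a registration along these lines. This is purely a hypothetical sketch: FiloDB ships no such storage handler or SerDe in this thread's timeframe, the class name and table properties below are invented, and the shape is modeled on how other NoSQL storage handlers (e.g. the HBase one) plug into Hive.

```sql
-- Hypothetical: 'FiloStorageHandler' does not exist; this only illustrates
-- the general shape a native Hive integration would take.
CREATE EXTERNAL TABLE filo_events (ts BIGINT, part STRING, value DOUBLE)
STORED BY 'filodb.hive.FiloStorageHandler'
TBLPROPERTIES ('filodb.dataset' = 'events');
```

A table defined this way would be visible to HiveServer2 and therefore to BI tools speaking to Hive directly, with no Spark session involved.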
@rstrickland you are right, the above proposed solution would not enable true Hive-only queries, though you can still connect BI tools to Spark SQL / the Thrift server via the JDBC/ODBC drivers. The SerDe/InputFormats required for true Hive-only operation would come as a second step. Would you guys be willing to test out the Spark-only solution before the full Hive solution comes? What would the timeframe look like?
We would definitely test it whenever it's ready.
@rstrickland check out #63. Thanks!
…ue-41 Semi-automatic Hive Metastore sync (#41)
So the initial support has been merged. Let's close this and open a new ticket for any issues or changes desired.
I really like this concept, but it's critical for us to be able to create permanent tables via a Hive metastore. Would it be possible to support Hive at some point?