Communicating with the data warehouse and creating custom Hive views

Introduction

Below is a brief overview of how HSLynk creates custom Hive/Impala views based on the HMIS, CES, and general human services data for each customer.

Technical Overview

The primary technology behind the data warehouse is Hadoop. We currently use a Cloudera Hadoop cluster secured with LDAP and Sentry. The data is stored in HBase (on HDFS), and we run real-time analytics on it by creating external tables in Hive/Impala over the loaded data.
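
As a rough illustration of the external-table approach, the Python sketch below builds a HiveQL statement that maps an HBase table to an external Hive table through Hive's HBase storage handler. The table, column family, and column names (client, client_hbase, cf, first_name, dob) are hypothetical and are not taken from the HSLynk schema.

```python
def hbase_external_table_ddl(hive_table, hbase_table, key_column, columns, family="cf"):
    """Build a CREATE EXTERNAL TABLE statement backed by an HBase table.

    columns is a list of (name, hive_type) pairs, all assumed to live in one
    column family. Every name passed in is an illustrative assumption.
    """
    # The HBase row key becomes the first Hive column.
    col_defs = [f"{key_column} STRING"]
    col_defs += [f"{name} {hive_type}" for name, hive_type in columns]

    # ":key" maps the row key; "family:qualifier" maps each remaining column.
    mapping = [":key"] + [f"{family}:{name}" for name, _ in columns]

    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {hive_table} (\n  "
        + ",\n  ".join(col_defs)
        + "\n)\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{','.join(mapping)}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


if __name__ == "__main__":
    print(hbase_external_table_ddl(
        "client", "client_hbase", "id",
        [("first_name", "STRING"), ("dob", "STRING")],
    ))
```

The generated statement would be run in Hive (for example through beeline); the external table only describes the HBase data and does not copy it.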

Custom Hive/Impala View for HSLynk and CES

The following project contains the code that populates data for our custom HSLynk and CES views, including two frequently used views, VI-SPDAT and CES Active List: https://github.com/servinglynk/hslynk-open-source/tree/master/sync-general
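
At its core, a custom view of this kind is a CREATE VIEW statement over the external tables. The Python sketch below is not the code from sync-general; it only shows the general shape of such a statement, plus the Impala statement that reloads metadata for an object created in Hive. The view, table, and column names (ces_active_list, client, active) are hypothetical.

```python
def create_view_ddl(view_name, select_sql):
    """Return a HiveQL statement that defines a view over an arbitrary SELECT."""
    return f"CREATE VIEW IF NOT EXISTS {view_name} AS\n{select_sql.strip()}"


def impala_invalidate(object_name):
    """Return the Impala statement that reloads metadata for one table or view."""
    return f"INVALIDATE METADATA {object_name}"


if __name__ == "__main__":
    # Hypothetical view: the real view definitions live in sync-general.
    select_sql = """
    SELECT id, first_name, dob
    FROM client
    WHERE active = true
    """
    print(create_view_ddl("ces_active_list", select_sql))
    print(impala_invalidate("ces_active_list"))
```

The first statement would be executed in Hive and the second in Impala, so that Impala picks up the newly created view.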

Conclusion

Although we use Impala to populate the data in HBase, we usually create the views in Hive. Because Impala and Hive share the same metadata, views defined in Hive can also be queried from Impala.
