Support for HBase Data Source #720

Closed
mramos1004 opened this issue Jun 2, 2017 · 2 comments

@mramos1004

I've been looking for a way to connect to HBase through sparklyr, but I can't find any examples. Is it correct that, as things stand, the only option is to export the HBase tables to HDFS files?

@kevinykuo
Collaborator

https://github.com/hortonworks-spark/shc

@javierluraschi
Collaborator

javierluraschi commented Feb 17, 2020

You should be able to use sparklyr with something like this: https://therinspark.com/data.html#cassandra

But instead of connecting to Cassandra, you would use the Spark-HBase connector from Hortonworks: https://github.com/hortonworks-spark/shc

Something like the following might just work...

library(sparklyr)

# Connect to Spark, pulling in the shc (Spark-HBase connector) package and
# shipping the cluster's HBase configuration to the session
sc <- spark_connect(master = "local", version = "2.3", config = list(
  sparklyr.connect.packages = "com.hortonworks:shc-core:1.1.1-2.1-s_2.11",
  sparklyr.shell.repositories = "http://repo.hortonworks.com/content/groups/public/",
  sparklyr.shell.files = "/etc/hbase/conf/hbase-site.xml"))

# Map the HBase table into Spark through the shc data source;
# memory = FALSE avoids caching the table eagerly
spark_read_source(
  sc,
  name = "<table>",
  source = "org.apache.spark.sql.execution.datasources.hbase",
  options = list("HBaseTableCatalog.tableCatalog" = "<catalog>"),
  memory = FALSE)
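
For reference, the <catalog> placeholder is a JSON string that tells shc how the HBase row key, column families, and qualifiers map to DataFrame columns. A minimal sketch, reusing the sc connection from above and assuming a hypothetical HBase table mytable in the default namespace with one column family cf1 (the table and column names are made up for illustration, following the catalog format in the shc README):

# Hypothetical catalog: the row key becomes the "id" column and
# cf1:col1 becomes the "value" column
catalog <- '{
  "table": {"namespace": "default", "name": "mytable"},
  "rowkey": "key",
  "columns": {
    "id":    {"cf": "rowkey", "col": "key",  "type": "string"},
    "value": {"cf": "cf1",    "col": "col1", "type": "string"}
  }
}'

hbase_tbl <- spark_read_source(
  sc,
  name = "mytable",
  source = "org.apache.spark.sql.execution.datasources.hbase",
  options = list("HBaseTableCatalog.tableCatalog" = catalog),
  memory = FALSE)

# The result is a lazy table reference, so it can be queried with dplyr verbs
dplyr::count(hbase_tbl)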
