Querying a cassandra DB via spark #38

Open
Enilia opened this issue May 11, 2016 · 3 comments

Comments

@Enilia

Enilia commented May 11, 2016

Hey there,

As the title says, I am trying to query an existing Cassandra DB from Node.js using your library. I am using a Spark cluster on a LAN.

Here's what I have done so far, using:

  • CentOS 7
  • node 4.4.4
  • apache-spark-node@0.3.3
  • spark 1.6.1
  • cassandra 2.2.5
  • spark-cassandra-connector 1.6.0-M1

From the root of my project :

ASSEMBLY_JAR=/usr/share/spark/lib/spark-assembly-1.6.1-hadoop2.6.0.jar node_modules/apache-spark-node/bin/spark-node \
--master spark://192.168.1.101:7077 --conf spark.cores.max=4 \
--jars /root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar

Once I had access to the spark-node prompt, I tried:

spark-node> sqlContext.sql("Select count(*) from mykeyspace.mytable")

but of course I get:

Error: Error creating class
org.apache.spark.sql.AnalysisException: Table not found: `mykeyspace`.`mytable`;

I then tried to adapt a Scala snippet I had seen in a Stack Overflow post:

var df = sqlContext
  .read()
  .format("org.apache.spark.sql.cassandra")
  .option("table", "mytable")
  .option("keyspace", "mykeyspace")
  .load(null, function(err, res) { console.log(err); console.log(res) }) 

but all I get is:

Error: Error running instance method
java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.cassandra. Please find packages at http://spark-packages.org

The problem surely comes from the fact that I don't understand half of how everything is linked together, which is why I'm here asking for help with this issue. All I need is a way to run basic SQL queries (with only WHERE clauses) against one Cassandra table.

I reckon this project is no longer maintained, but as far as I can see it is the simplest solution I have found so far (solutions like eclairJS have far more functionality than I need, at the cost of increased complexity and maybe lower performance), and it would cover my needs.

@tobilg
Contributor

tobilg commented May 11, 2016

You should post your complete code. According to the docs you need to set up the SparkContext with the right configuration properties.

Furthermore, there is an example on how to use SparkSQL.

Basically, this is not an issue of apache-spark-node and should be closed accordingly.
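
To illustrate the configuration point: the spark-cassandra-connector reads its Cassandra contact point from the `spark.cassandra.connection.host` property, which can be passed with `--conf` in the same way as `spark.cores.max` in the launch command above. Below is a minimal sketch of assembling such a launch line. The helper name `buildSparkNodeArgs` is hypothetical and the Cassandra address 192.168.1.102 is an assumption (neither comes from this thread), and it has not been verified that this property alone resolves the error reported above.

```javascript
// Sketch: assemble the spark-node launch arguments, including the
// connector's connection property. The helper name and the Cassandra
// host address are assumptions for illustration only.
function buildSparkNodeArgs(options) {
  var args = ['--master', options.master];

  // Each configuration property becomes its own "--conf key=value" pair.
  Object.keys(options.conf || {}).forEach(function (key) {
    args.push('--conf', key + '=' + options.conf[key]);
  });

  // --jars is passed as a single comma-separated list.
  if (options.jars && options.jars.length > 0) {
    args.push('--jars', options.jars.join(','));
  }
  return args;
}

var args = buildSparkNodeArgs({
  master: 'spark://192.168.1.101:7077',
  conf: {
    'spark.cores.max': 4,
    // Assumed Cassandra node address; replace with a real contact point.
    'spark.cassandra.connection.host': '192.168.1.102'
  },
  jars: ['/root/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.6.0-M1-36-g220aa37.jar']
});

// Print the arguments to append after bin/spark-node.
console.log(args.join(' '));
```

The printed arguments would follow `node_modules/apache-spark-node/bin/spark-node` on the command line, with `ASSEMBLY_JAR` set as in the original command.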

@henridf
Owner

henridf commented May 12, 2016

Hi @Enilia - as @tobilg answers this doesn't appear to be an issue but if we're missing something please post a more complete description and I'll do my best to help. (This project is still maintained btw.)

@Enilia
Author

Enilia commented May 12, 2016

Hi and thanks for the quick reply,

I'm sorry for assuming the project was no longer maintained; I got that impression from the low activity on the repo over the last few months :s .
Anyway, I'm glad you're still active on this project. I'll take a look at the links @tobilg gave here and post a more complete issue if there's something missing.
I'm still new to the Cassandra/Spark/Java/Scala universe, so I'm a bit lost here tbh ^^

Best regards,
Eni
