
Issue with Eigenvector Centrality?! #9

Closed
blacknred0 opened this issue Jan 23, 2017 · 10 comments

@blacknred0

hi -
I have a question or issue with Eigenvector Centrality. I have a graph for which this calculation produces results just fine, but when I create a subgraph (or even manually import it as a new graph), the calculation takes much longer and never seems to finish.

The only thing I've noticed is that before subgraphing, Freeman's centrality is <1, and after subgraphing it is >1.

Not sure if anybody has any pointers.

val eic = graph.eigenvectorCentrality().vertices
val evusers = users.join(eic).map {
  case (id, (username, eic)) => (username, eic)
}
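
For context, a minimal sketch of the kind of subgraph call meant above; the keepVertex predicate is hypothetical, since the actual filtering criterion is not shown in this thread:

import org.apache.spark.graphx.VertexId

// Hypothetical vertex filter; GraphLoader.edgeListFile yields Graph[Int, Int].
val keepVertex: (VertexId, Int) => Boolean = (id, attr) => attr > 0
val sub = graph.subgraph(vpred = keepVertex)
// This is the call that takes much longer on the subgraph and never finishes.
val subEic = sub.eigenvectorCentrality().vertices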

Thanks in advance!

@riomus
Member

riomus commented Jan 23, 2017

Hi,

Could you provide an exact data example or a repo with code to reproduce the problem?

@riomus
Member

riomus commented Jan 23, 2017

You could also check whether increasing the convergence threshold that is passed as the continue predicate solves your issue. If that predicate returns false, the computation ends. By default, convergence is checked: if the average difference between the results of consecutive iterations is too small, the computation ends. Failing to converge is a known issue of the iterative approach that we use. I see that it is not mentioned in the documentation; that should be fixed.
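
For illustration, a minimal sketch of what a custom continue predicate could look like; the parameter order (iteration, previous value, current value) is inferred from the call shown later in this thread, and epsilon is a hypothetical name:

// Assumed signature: continue while the change between consecutive
// iterations is still larger than a chosen threshold.
val epsilon = 1e-3 // hypothetical convergence threshold
def continueByDelta(iteration: Long, oldValue: Double, newValue: Double): Boolean =
  math.abs(newValue - oldValue) > epsilon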

@blacknred0
Author

Thanks for the quick response. While I narrow down the code and increase the convergence predicate, is there a maximum value that I should not go over? I see that right now it is at 1e-6; what if I do 1e-9 or 1e-12, or should I go higher than that?

@riomus
Member

riomus commented Jan 24, 2017

It depends on your use case. The lower the convergence threshold is (remember that 1e-6 > 1e-12), the longer the computation will take, but the results will be more exact. Try setting it to 1e-3, or you can limit the number of iterations using the iteration count that is passed to the predicate. For some specific graphs, the centrality can start to oscillate between two values in consecutive iterations; to solve that, you can limit the number of iterations (that is probably your case). See the sketch below.
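
A minimal sketch of a predicate combining both ideas: keep iterating while the values have not converged and an iteration cap has not been reached (maxIterations and epsilon are hypothetical names; the parameter order is inferred from the call shown later in this thread):

val epsilon = 1e-3      // hypothetical convergence threshold
val maxIterations = 100 // hypothetical cap to break oscillation
def continueComputing(iteration: Long, oldValue: Double, newValue: Double): Boolean =
  iteration < maxIterations && math.abs(newValue - oldValue) > epsilon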

@blacknred0
Author

Thanks for the pointers... so, here is the scoop. I think the problem is that I am running out of RAM; I have allocated 10G each to two executors and one driver. It usually stalls at around 2.6K completed RDDs. I would have thought that was enough RAM for this relatively small data set (see below for details).

Here is the core code that I am using.

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
// Sparkling Graph functions
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = graph.eigenvectorCentrality().vertices

Here is the dataset for your test. Note that it is a relatively small file of 2.4MB (214,366 rows).
Followers.txt dataset for test.

I'm new to scala, so any pointers as of what to change would be helpful!

thanks for your help in advance!

@riomus
Member

riomus commented Jan 26, 2017

You can try to repartition your graph using a bigger number of partitions than the default (that is the first thing that I would do); see the sketch below. I will try to look at the issue.
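
A minimal sketch of that suggestion; the partition count of 20 and the choice of RandomVertexCut are only examples:

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Load with more edge partitions than the default...
val graph = GraphLoader.edgeListFile(sc, "followers.txt", numEdgePartitions = 20)
// ...and/or repartition the edges across 20 partitions with an explicit strategy.
val repartitioned = graph.partitionBy(PartitionStrategy.RandomVertexCut, 20)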

@blacknred0
Author

blacknred0 commented Jan 26, 2017

Thanks for the suggestion and for offering to take a look at this issue... I will give it a try! :)


Here are my updates...
I am trying this, but getting the following error. Off the top of your head, is there something that I should change in the "eigenvectorCentrality()" call? I will keep digging.

import org.apache.spark.storage.StorageLevel

val graph = GraphLoader.edgeListFile(sc, "followers.txt", edgeStorageLevel = StorageLevel.DISK_ONLY)
//val graph = GraphLoader.edgeListFile(sc, "followers.txt", numEdgePartitions = 20, edgeStorageLevel = StorageLevel.DISK_ONLY) // <- I tried this as well and it didn't help.
val eic = graph.eigenvectorCentrality().vertices.persist(StorageLevel.DISK_ONLY) // ERROR when I run this. See below.
java.lang.UnsupportedOperationException: Cannot change storage level of an RDD after it was already assigned a level
  at org.apache.spark.rdd.RDD.persist(RDD.scala:169)
  at org.apache.spark.rdd.RDD.persist(RDD.scala:194)
  at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:57)
  at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:27)
  ... 50 elided

@blacknred0
Author

I think I have narrowed it down. The problem is the Spark driver. I actually upped it to 15G and it was still not enough to process the graph. I kept track of the utilization (through top) and saw when it ran out of memory.

This happens even after changing the edgeListFile call to store everything on disk (both edges and vertices). That, by the way, fixes the error from above, but not the driver's memory problem.

val graph = GraphLoader.edgeListFile(sc, "followers.txt", 
  vertexStorageLevel = StorageLevel.DISK_ONLY, 
  edgeStorageLevel = StorageLevel.DISK_ONLY)

Could you point me to what I would need to change on the function to reduce the number of loop iterations? Maybe that will help.

Thanks in advance.

@riomus
Member

riomus commented Jan 31, 2017

You can pass your own implementation as the continue predicate; it just has to be a function with the appropriate signature.

I think that you should pass something like:

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
import ml.sparkling.graph.api.operators.measures.VertexMeasureConfiguration
import ml.sparkling.graph.operators.measures.vertex.eigenvector.EigenvectorCentrality
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = EigenvectorCentrality.computeEigenvector(graph, VertexMeasureConfiguration(), (iteration, _, _) => iteration < 999).vertices

Just replace the 999 with an appropriate value.
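
As a usage sketch, you can then join the result back to usernames as in the first comment, or simply inspect it; for example, the ten most central vertices:

// Sort by centrality value (descending) and take the top 10.
eic.sortBy(_._2, ascending = false).take(10).foreach(println)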

@blacknred0
Author

ok, thanks mate! I can confirm that after using VertexMeasureConfiguration() and passing the iteration value it works like a charm :)
