Issue with Eigenvector Centrality?! #9
Hi,

I have a question or issue with Eigenvector Centrality. I have a graph for which this calculation produces results, but when I create a subgraph (or even manually import it as a new graph), the calculation seems to take longer and never seems to finish.

The only thing that I've noticed is that before subgraphing, Freeman's centrality is <1, and after the subgraph, Freeman's centrality is >1.

Not sure if anybody has any pointers.

Thanks in advance!
Comments
Hi, could you provide some exact data example, or a repo with code, to reproduce the problem?
You could also check whether increasing the convergence predicate that is passed as the continue predicate solves your issue. If that predicate returns false, the computation ends. By default, convergence is checked: if the average difference between the results of consecutive loops is too small, the computation ends. Not converging is a known issue of the iterative approach that we use. I see that it is not mentioned in the documentation; that should be fixed.
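To make that convergence check concrete, here is a minimal sketch (my own illustration, not SparklingGraph's actual implementation) of stopping when the average difference between consecutive results falls below a threshold:

```scala
// Sketch of the convergence test described above (illustrative only):
// stop when the average per-vertex difference between the results of
// two consecutive iterations drops below some epsilon.
def hasConverged(previous: Map[Long, Double],
                 current: Map[Long, Double],
                 epsilon: Double): Boolean = {
  val avgDiff = current.map { case (id, value) =>
    math.abs(value - previous.getOrElse(id, 0.0))
  }.sum / current.size
  avgDiff < epsilon
}
```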
Thanks for the quick response. While I narrow down the code and increase the convergence predicate, is there a maximum convergence predicate value that I should not go over? I can see what it is currently set at.
It depends on your use case. The lower the convergence predicate is (remember that 1e-6 > 1e-12), the longer the computation will take, but the results are also more exact. Try setting it to 1e-3, or you can limit the number of iterations using the iteration count that is passed to the predicate. For some specific graphs, centrality can start to oscillate between two values in consecutive iterations; to solve that, you can try to limit the number of iterations (that is probably your case).
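To see that oscillation concretely, here is a toy example (my own illustration): power iteration on the two-node cycle graph, whose adjacency matrix is [[0,1],[1,0]], never settles, because multiplying by that matrix just swaps the two entries of the vector:

```scala
// Power iteration on the adjacency matrix [[0,1],[1,0]]:
// each step swaps the two entries, so the iterates flip between
// (1,0) and (0,1) forever and never converge.
var v = Array(1.0, 0.0)
for (i <- 1 to 6) {
  v = Array(v(1), v(0)) // multiply v by [[0,1],[1,0]]
  println(s"iteration $i: (${v(0)}, ${v(1)})")
}
```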
Thanks for the pointers... so, here is the scoop. I think the problem is that I am running out of RAM, even though I have allocated 10G on each of two executors and one driver. It usually stops at around 2.6K completed RDDs. I think that should have been enough RAM for this relatively small data set (see below for details). Here is the core code that I am using:

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
// Sparkling Graph functions
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = graph.eigenvectorCentrality().vertices
```

Here is the dataset for your test. Note that it is a relatively small file at 2.4 MB (214,366 rows). I'm new to Scala, so any pointers as to what to change would be helpful! Thanks for your help in advance!
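For reference, a sketch of how that memory allocation would typically be configured (the values are just the ones mentioned above; note that on most cluster managers the driver memory has to be set on the launch command, e.g. `spark-shell --driver-memory 10G`, because the driver JVM is already running by the time session code executes):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative executor settings matching the allocation described above;
// driver memory cannot be set here and must go on the launch command.
val spark = SparkSession.builder()
  .appName("eigenvector-centrality")
  .config("spark.executor.memory", "10g")
  .config("spark.executor.instances", "2")
  .getOrCreate()
val sc = spark.sparkContext
```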
You can try to repartition your graph using a bigger number of partitions than the default (that's the first thing that I would do). I will try to look at that issue.
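A sketch of what that repartitioning could look like with plain GraphX (the strategy and the count of 64 are arbitrary examples, not recommendations from this thread):

```scala
import org.apache.spark.graphx.PartitionStrategy

// Redistribute the edges across more partitions; EdgePartition2D is one
// of GraphX's built-in strategies, and 64 is an arbitrary example count.
val repartitioned = graph.partitionBy(PartitionStrategy.EdgePartition2D, 64)
```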
Thanks for the suggestion, and for offering to take a look at this issue... I will give it a try! :) Here are my updates:

```scala
import org.apache.spark.storage.StorageLevel

val graph = GraphLoader.edgeListFile(sc, "followers.txt", edgeStorageLevel = StorageLevel.DISK_ONLY)
// val graph = GraphLoader.edgeListFile(sc, "followers.txt", numEdgePartitions = 20, edgeStorageLevel = StorageLevel.DISK_ONLY) // <- I tried this as well and it didn't help.
val eic = graph.eigenvectorCentrality().vertices.persist(StorageLevel.DISK_ONLY) // ERROR when I run this. See below.
```
I think I have narrowed it down. The problem is the Spark driver. I actually upped it to 15G and it was still not enough to process the graph. I kept track of the utilization (through top) and saw when it was running out of memory. This happens even after storing both the vertices and the edges on disk:

```scala
val graph = GraphLoader.edgeListFile(sc, "followers.txt",
  vertexStorageLevel = StorageLevel.DISK_ONLY,
  edgeStorageLevel = StorageLevel.DISK_ONLY)
```

Could you point me to what I would need to change on the function to decrease the loop? Maybe that would help. Thanks in advance.
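One general Spark technique that sometimes helps with driver memory in long iterative GraphX jobs (an aside, not something proposed in this thread) is checkpointing, which truncates the RDD lineage that grows with every iteration:

```scala
// Checkpointing truncates RDD lineage; very long lineages from many
// iterations can weigh on the driver. The path is just an example.
sc.setCheckpointDir("/tmp/spark-checkpoints")
```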
You can pass your own implementation to continuePredicate; it just needs to be a function with the appropriate signature. I think that you should pass something like:

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
import ml.sparkling.graph.api.operators.measures.VertexMeasureConfiguration
import ml.sparkling.graph.operators.measures.vertex.eigenvector.EigenvectorCentrality
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = EigenvectorCentrality.computeEigenvector(graph, VertexMeasureConfiguration(), (iteration, _, _) => iteration < 999).vertices
```

Just replace the 999 with an appropriate value.
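For example, here is a variant of that call that caps the iterations and prints progress as it goes (the cap of 200 and the logging are my own illustration; the predicate's shape is taken from the snippet above):

```scala
// Same call as above, but with a named iteration cap and a progress println.
val maxIter = 200
val eic = EigenvectorCentrality.computeEigenvector(
  graph,
  VertexMeasureConfiguration(),
  (iteration, _, _) => {
    println(s"eigenvector centrality: iteration $iteration")
    iteration < maxIter
  }
).vertices
```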
Ok, thanks mate! I can confirm that after limiting the number of iterations, it works.