
Issue with Eigenvector Centrality?! #9

Closed
blacknred0 opened this issue Jan 23, 2017 · 10 comments

@blacknred0

hi -
I have a question or issue with Eigenvector Centrality. I have a graph for which this calculation produces results just fine, but when I create a subgraph (or even manually import it as a new graph), the calculation takes much longer and never seems to finish.

The only thing I've noticed is that before subgraphing, Freeman's centrality is <1, and after subgraphing it is >1.

Not sure if anybody has any pointers.

val eic = graph.eigenvectorCentrality().vertices
val evusers = users.join(eic).map {
  case (id, (username, eic)) => (username, eic)
}
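
For context, a minimal sketch of the kind of subgraph call meant above; the keepVertex predicate is hypothetical, since the actual filtering criterion is not shown in this thread:

import org.apache.spark.graphx.VertexId

// Hypothetical vertex filter; GraphLoader.edgeListFile yields Graph[Int, Int].
val keepVertex: (VertexId, Int) => Boolean = (id, attr) => attr > 0
val sub = graph.subgraph(vpred = keepVertex)
// This is the call that takes much longer on the subgraph and never finishes.
val subEic = sub.eigenvectorCentrality().vertices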

Thanks in advance!

@riomus
Member

riomus commented Jan 23, 2017

Hi,

Could you provide an exact data example or a repo with code to reproduce the problem?

@riomus
Member

riomus commented Jan 23, 2017

You could also check whether increasing the convergence threshold that is passed as the continue predicate solves your issue. If that predicate returns false, the computation ends. By default, convergence is checked: if the average difference between the results of consecutive iterations is too small, the computation ends. Failing to converge is a known issue of the iterative approach that we use. I see that it is not mentioned in the documentation; that should be fixed.
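
For illustration, a minimal sketch of what a custom continue predicate could look like; the parameter order (iteration, previous value, current value) is inferred from the call shown later in this thread, and epsilon is a hypothetical name:

// Assumed signature: continue while the change between consecutive
// iterations is still larger than a chosen threshold.
val epsilon = 1e-3 // hypothetical convergence threshold
def continueByDelta(iteration: Long, oldValue: Double, newValue: Double): Boolean =
  math.abs(newValue - oldValue) > epsilon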

@blacknred0
Author

Thanks for the quick response. While I narrow down the code and increase the convergence predicate, is there a maximum value that I should not go over? I see that right now it is at 1e-6; what if I do 1e-9 or 1e-12, or should I go higher than that?

@riomus
Member

riomus commented Jan 24, 2017

It depends on your use case. The lower the convergence threshold is (remember that 1e-6 > 1e-12), the longer the computation will take, but the results will be more exact. Try setting it to 1e-3, or you can limit the number of iterations using the iteration count that is passed to the predicate. For some specific graphs, the centrality can start to oscillate between two values in consecutive iterations; to solve that, you can limit the number of iterations (that is probably your case). See the sketch below.
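
A minimal sketch of a predicate combining both ideas: keep iterating while the values have not converged and an iteration cap has not been reached (maxIterations and epsilon are hypothetical names; the parameter order is inferred from the call shown later in this thread):

val epsilon = 1e-3      // hypothetical convergence threshold
val maxIterations = 100 // hypothetical cap to break oscillation
def continueComputing(iteration: Long, oldValue: Double, newValue: Double): Boolean =
  iteration < maxIterations && math.abs(newValue - oldValue) > epsilon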

@blacknred0
Author

Thanks for the pointers... so, here is the scoop. I think the problem is that I am running out of RAM; I have allocated 10G each to two executors and one driver. It usually stalls at around 2.6K completed RDDs. I would have thought that was enough RAM for this relatively small data set (see below for details).

Here is the core code that I am using.

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
// Sparkling Graph functions
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = graph.eigenvectorCentrality().vertices

Here is the dataset for your test. Note that it is a relatively small file of 2.4MB (214,366 rows).
Followers.txt dataset for test.

I'm new to scala, so any pointers as of what to change would be helpful!

thanks for your help in advance!

@riomus
Member

riomus commented Jan 26, 2017

You can try to repartition your graph using a bigger number of partitions than the default (that is the first thing that I would do); see the sketch below. I will try to look at the issue.
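
A minimal sketch of that suggestion; the partition count of 20 and the choice of RandomVertexCut are only examples:

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Load with more edge partitions than the default...
val graph = GraphLoader.edgeListFile(sc, "followers.txt", numEdgePartitions = 20)
// ...and/or repartition the edges across 20 partitions with an explicit strategy.
val repartitioned = graph.partitionBy(PartitionStrategy.RandomVertexCut, 20)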

@blacknred0
Author

blacknred0 commented Jan 26, 2017

Thanks for the suggestion and for offering to take a look at this issue... I will give it a try! :)


Here are my updates...
I am trying this, but getting the following error. Off the top of your head, is there something that I should change in the "eigenvectorCentrality()" call? I will keep digging.

import org.apache.spark.storage.StorageLevel

val graph = GraphLoader.edgeListFile(sc, "followers.txt", edgeStorageLevel = StorageLevel.DISK_ONLY)
//val graph = GraphLoader.edgeListFile(sc, "followers.txt", numEdgePartitions = 20, edgeStorageLevel = StorageLevel.DISK_ONLY) // <- I tried this as well and it didn't help.
val eic = graph.eigenvectorCentrality().vertices.persist(StorageLevel.DISK_ONLY) // ERROR when I run this. See below.
java.lang.UnsupportedOperationException: Cannot change storage level of an RDD after it was already assigned a level
  at org.apache.spark.rdd.RDD.persist(RDD.scala:169)
  at org.apache.spark.rdd.RDD.persist(RDD.scala:194)
  at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:57)
  at org.apache.spark.graphx.impl.VertexRDDImpl.persist(VertexRDDImpl.scala:27)
  ... 50 elided

@blacknred0
Author

I think I have narrowed it down. The problem is the Spark driver. I actually upped it to 15G and it was still not enough to process the graph. I kept track of the utilization (through top) and saw when it ran out of memory.

This happens even after changing the edgeListFile call to store everything on disk (both edges and vertices). That, by the way, fixes the error from above, but not the driver's memory problem.

val graph = GraphLoader.edgeListFile(sc, "followers.txt", 
  vertexStorageLevel = StorageLevel.DISK_ONLY, 
  edgeStorageLevel = StorageLevel.DISK_ONLY)

Could you point me to what I would need to change on the function to reduce the number of loop iterations? Maybe that will help.

Thanks in advance.

@riomus
Member

riomus commented Jan 31, 2017

You can pass your own implementation as the continue predicate; it just has to be a function with the appropriate signature.

I think that you should pass something like:

import org.apache.spark.graphx.GraphLoader
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
import ml.sparkling.graph.api.operators.measures.VertexMeasureConfiguration
import ml.sparkling.graph.operators.measures.vertex.eigenvector.EigenvectorCentrality
import ml.sparkling.graph.operators.OperatorsDSL._

val graph = GraphLoader.edgeListFile(sc, "followers.txt").cache()
val eic = EigenvectorCentrality.computeEigenvector(graph, VertexMeasureConfiguration(), (iteration, _, _) => iteration < 999).vertices

Just replace the 999 with an appropriate value.
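
As a usage sketch, you can then join the result back to usernames as in the first comment, or simply inspect it; for example, the ten most central vertices:

// Sort by centrality value (descending) and take the top 10.
eic.sortBy(_._2, ascending = false).take(10).foreach(println)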

@blacknred0
Author

ok, thanks mate! I can confirm that after using VertexMeasureConfiguration() and passing the iteration value it works like a charm :)
