
Graph Partitioning Strategy on GraphX (Spark)

We implemented several partitioning algorithms on GraphX and evaluated them. The results suggest that GraphX's partitioning has room for improvement. We welcome opinions and advice.

Real-world graphs follow a power-law degree distribution. For example, on Twitter, 1% of the vertices are adjacent to nearly half of the edges.
For high-degree vertices, a single vertex concentrates a large share of the work, so the workload of these few high-degree vertices should be distributed across all machines.
For low-degree vertices, the computation per vertex is small, so partitioning should exploit the locality of that computation.
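A minimal sketch of this degree-aware idea follows. It is written in Python for brevity, not against GraphX's Scala API. It follows the "hybrid-cut" rule popularized by PowerLyra:

- Edges into a high-degree vertex are hashed by their source, so that vertex's work spreads across partitions.
- Edges into a low-degree vertex are hashed by their destination, so that vertex's in-edges stay together.

The function name `hybrid_cut`, the in-degree `threshold`, and the use of Python's `hash()` are illustrative assumptions. This is not BiCut or the implementation in this repository.

```python
from collections import Counter


def hybrid_cut(edges, num_parts, threshold):
    """Return a partition id for each (src, dst) edge, in input order.

    Hypothetical helper: the threshold on in-degree is an assumption.
    """
    # In-degree of every destination vertex.
    in_degree = Counter(dst for _, dst in edges)
    parts = []
    for src, dst in edges:
        if in_degree[dst] > threshold:
            # High-degree target: hash on the source so this vertex's
            # in-edges (and its workload) spread across partitions.
            key = src
        else:
            # Low-degree target: hash on the target so all of its
            # in-edges stay together in one partition (locality).
            key = dst
        # hash() on Python ints is deterministic (unlike on str).
        parts.append(hash(key) % num_parts)
    return parts


# A hub vertex 0 with in-degree 8, plus a low-degree vertex 2.
edges = [(i, 0) for i in range(1, 9)] + [(1, 2), (3, 2)]
parts = hybrid_cut(edges, num_parts=4, threshold=3)
# The hub's in-edges land on partitions 0..3; both in-edges of vertex 2 share one partition.
```

The threshold trades off the two goals. Lowering it splits more vertices (better balance, more replicas), and raising it keeps more vertices whole (better locality).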

- The left Y axis shows the replication factor. The right Y axis shows the balance of either vertices or edges across all partitions, measured as the CV (coefficient of variation). Edge balance indicates computation balance, and vertex balance indicates communication balance.
- This is an example of a balanced partitioning that achieves a 20% saving in communication.
- This is a simple partitioning result from BiCut.
in-2.0-1m is a generated power-law graph with alpha = 2.0.
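The metrics plotted above can be computed from an edge-to-partition assignment, as sketched below. It assumes two common definitions:

- The replication factor is the total number of vertex copies across partitions divided by the number of distinct vertices.
- The CV is the population standard deviation divided by the mean of the per-partition counts.

The function name `partition_metrics` is hypothetical, and the exact definitions used for the figures may differ.

```python
from statistics import mean, pstdev


def partition_metrics(edges, parts, num_parts):
    """Return (replication_factor, edge_cv, vertex_cv).

    edges: list of (src, dst); parts: partition id per edge, same order.
    """
    edge_counts = [0] * num_parts
    vertex_sets = [set() for _ in range(num_parts)]
    for (src, dst), p in zip(edges, parts):
        edge_counts[p] += 1
        # A vertex is replicated in every partition holding one of its edges.
        vertex_sets[p].update((src, dst))

    vertex_counts = [len(s) for s in vertex_sets]
    num_vertices = len(set().union(*vertex_sets))
    replication_factor = sum(vertex_counts) / num_vertices

    def cv(xs):
        # Coefficient of variation: population std divided by mean.
        return pstdev(xs) / mean(xs)

    # Edge CV ~ computation balance; vertex CV ~ communication balance.
    return replication_factor, cv(edge_counts), cv(vertex_counts)


edges = [(1, 2), (2, 3), (3, 1), (1, 4)]
print(partition_metrics(edges, [0, 0, 1, 1], num_parts=2))  # (1.5, 0.0, 0.0)
```

A lower replication factor means fewer vertex copies to synchronize. A CV closer to 0 means the partitions are more evenly loaded.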

https://github.com/larryxiao/spark/blob/GraphX/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L173
Because this implementation breaks the current separation from PartitionStrategy.scala, we need a way to give partitioning strategies access to the graph.
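GraphX's `PartitionStrategy.getPartition(src, dst, numParts)` sees only the two endpoint ids, not the graph. One possible workaround is a two-pass adapter, sketched below in Python as an assumption:

1. Read the graph once to compute vertex degrees.
2. Return a function with the same `(src, dst, num_parts)` shape that closes over those degrees.

To illustrate a degree-dependent rule, the sketch swaps in degree-based hashing (DBH), which hashes each edge on its lower-degree endpoint. This is not the authors' algorithm. The name `make_degree_aware_strategy` is hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Edge = Tuple[int, int]
PartitionFn = Callable[[int, int, int], int]


def make_degree_aware_strategy(edges: Iterable[Edge]) -> PartitionFn:
    """Hypothetical adapter: pass 1 reads the graph to count degrees,
    then returns a getPartition-shaped function bound to those degrees."""
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    def get_partition(src: int, dst: int, num_parts: int) -> int:
        # Degree-based hashing (DBH): hash on the lower-degree endpoint,
        # so high-degree vertices are the ones that get replicated.
        key = src if degree[src] <= degree[dst] else dst
        return hash(key) % num_parts

    return get_partition


edges = [(1, 0), (2, 0), (3, 0), (4, 0), (1, 2)]
get_partition = make_degree_aware_strategy(edges)
print([get_partition(s, d, 3) for s, d in edges])
```

In GraphX, a similar route might be:

1. Compute `graph.degrees` (or `graph.inDegrees`) before calling `partitionBy`.
2. Ship the result to executors, for example as a broadcast variable.
3. Have a custom `PartitionStrategy` close over it.

For very large graphs the degree table may be too large to broadcast, which is part of why graph access remains an open design question here.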
