holgerbrandl/spark_image_labeling

Image Component Labeling Using Apache Spark

Idea: rephrase the labeling problem as a graph problem and use the GraphX connected-components search to label image components in a distributed (Spark-based) manner.
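
To make the idea concrete, here is a minimal plain-Java sketch (no Spark involved) of the reformulation: every foreground pixel becomes a vertex, adjacent foreground pixels are joined by an edge, and labels are propagated until each pixel carries the smallest vertex id of its component. The class and method names (`PixelGraphLabeling`, `vertexId`, `buildEdges`, `label`), the 4-neighbourhood, and the convention that non-zero pixels are foreground are assumptions made for illustration; none of them are taken from this repository. The sequential loop only mimics the result that the GraphX connected-components search is documented to produce (each vertex labeled with the lowest vertex id in its component), not the way GraphX computes it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of this repository.
class PixelGraphLabeling {

    /** Vertex id of pixel (x, y) in a row-major image of the given width. */
    static long vertexId(int x, int y, int width) {
        return (long) y * width + x;
    }

    /**
     * Edges between 4-connected foreground (non-zero) pixels.
     * Each edge is a two-element array {srcId, dstId}.
     * Assumes a non-empty, rectangular image.
     */
    static List<long[]> buildEdges(int[][] image) {
        int height = image.length;
        int width = image[0].length;
        List<long[]> edges = new ArrayList<>();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (image[y][x] == 0) continue; // background pixels are not vertices
                // Looking only right and down visits every neighbour pair once.
                if (x + 1 < width && image[y][x + 1] != 0) {
                    edges.add(new long[]{vertexId(x, y, width), vertexId(x + 1, y, width)});
                }
                if (y + 1 < height && image[y + 1][x] != 0) {
                    edges.add(new long[]{vertexId(x, y, width), vertexId(x, y + 1, width)});
                }
            }
        }
        return edges;
    }

    /**
     * Labels every foreground pixel with the smallest vertex id in its
     * component; background pixels get -1.
     */
    static long[][] label(int[][] image) {
        int height = image.length;
        int width = image[0].length;
        long[][] labels = new long[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                labels[y][x] = image[y][x] == 0 ? -1L : vertexId(x, y, width);
            }
        }
        List<long[]> edges = buildEdges(image);
        boolean changed = true;
        while (changed) { // repeat until no label changes any more
            changed = false;
            for (long[] e : edges) {
                int sy = (int) (e[0] / width), sx = (int) (e[0] % width);
                int dy = (int) (e[1] / width), dx = (int) (e[1] % width);
                long min = Math.min(labels[sy][sx], labels[dy][dx]);
                if (labels[sy][sx] != min) { labels[sy][sx] = min; changed = true; }
                if (labels[dy][dx] != min) { labels[dy][dx] = min; changed = true; }
            }
        }
        return labels;
    }
}
```

In a distributed setting, the edge list from `buildEdges` would be what gets handed to the graph library, and the propagation loop would be replaced by its connected-components search.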

Spark Graph API pointers

https://spark.apache.org/docs/latest/graphx-programming-guide.html#connected-components

ImgLib2 API pointers

Image Open & Display

Accessors & Cursors

ImagePlusImg image = ImagePlusAdapter.wrap(img);
Img<UnsignedByteType> image = ImageJFunctions.wrap(img);

Try ImageJ Ops: http://mvnrepository.com/artifact/net.imagej/imagej-ops/0.38.0

Example image after contrast enhancement

Benchmarking

Available Libs

@State Model

When multiple `@Param`s are needed for the benchmark run, JMH will compute the outer product of all the parameters in the run.
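
As a rough illustration of what "outer product" means here, the following plain-Java sketch enumerates every combination of parameter values, which is the set of configurations a run would have to cover. It does not use JMH itself. The class name `ParamOuterProduct`, the method `combinations`, and the parameter names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only; JMH does this internally.
class ParamOuterProduct {

    /**
     * Cartesian (outer) product of the given value lists.
     * Returns one map per configuration, in declaration order.
     */
    static List<Map<String, String>> combinations(Map<String, List<String>> params) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start with a single empty configuration
        for (Map.Entry<String, List<String>> param : params.entrySet()) {
            List<Map<String, String>> extended = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String value : param.getValue()) {
                    // Extend each partial configuration with every value of this parameter.
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(param.getKey(), value);
                    extended.add(next);
                }
            }
            result = extended;
        }
        return result;
    }
}
```

With two assumed parameters, say `imageSize` with three values and `numThreads` with two, `combinations` returns 3 × 2 = 6 configurations, so the number of benchmark runs grows multiplicatively with every added parameter.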

Run with JSON as the output format:

com.github.holgerbrandl.spark.misc.ExampleBenchmark.init -rf json -rff results.json

The error reported in the CSV output is the 99.9% confidence interval.

If an execution plan is injected into the benchmark method, the run traverses the outer product of the state parameters.

Run with

cd /Users/brandl/projects/spark/component_labeling

#sbt package
#appJar=/Users/brandl/projects/spark/component_labeling/target/scala-2.11/component_labeling_2.11-0.1.jar
#ll $appJar
#java -jar "target/scala-2.11/component_labeling_2.11-0.1.jar"
# -> won't work without a fat jar

## use sbt plugin from https://github.com/ktoso/sbt-jmh
# Write your benchmarks in `src/main/scala`. They will be picked up and instrumented by the plugin.

## run all
sbt 'jmh:run *' ## works only if no `extends App` objects are present in the code (see https://github.com/ktoso/sbt-jmh/pull/117#issuecomment-331255198)
#sbt jmh:run -i 3 -wi 3 -f1 -t1 .*FalseSharing.*

# run test benchmark
sbt 'jmh:run -rf json -rff ExampleBenchmark.results.json com.github.holgerbrandl.spark.misc.ExampleBenchmark'

# run local benchmarking
sbt 'jmh:run -rf json -rff threaded_results.json com.github.holgerbrandl.spark.components.ThreadedLabelBM' 



## Benchmark distributed labeling 

## fur; cd /home/brandl/misc/spark_image_labeling; sparkcluster start  --walltime 05:00 --memory-per-core 2000 100

# dedicated local cluster 
export SPARK_CLUSTER_URL="spark://scicomp-mac-12-usb.mpi-cbg.de:7077"

#ssh forwarded remote cluster
# export SPARK_CLUSTER_URL="spark://localhost:10100"

## actual cluster url when running benchmark on the cluster itself
export SPARK_CLUSTER_URL="spark://c03n01:7077"

# test spark cluster connectivity
$SPARK_HOME/bin/spark-shell --master ${SPARK_CLUSTER_URL} 

sbt package # don't forget this, or you'll see a proxy-cast exception
sbt 'jmh:run -rf json -rff cluster_results.json com.github.holgerbrandl.spark.components.ClusterLabelBenchmark' 

# print spark logs

Next Steps

  1. Fix ImgLib2 generics by converting the Interval back into a byte/int image

  2. Simplify deployment

Rebuild using Gradle (see https://docs.gradle.org/current/userguide/scala_plugin.html) and do profiling with https://github.com/melix/jmh-gradle-plugin, which also supports better combined fat-jar packaging. That would allow running java -jar build/libs/benchmarking-experiments-0.1.0-all.jar, which seems to accept args because of https://github.com/danielmitterdorfer/benchmarking-experiments/blob/master/pom.xml#L65

An example project that also uses Gradle Shadow, without using jmh-gradle: https://github.com/danielmitterdorfer/benchmarking-experiments

Alternatively, use the Maven archetype as described on http://openjdk.java.net/projects/code-tools/jmh/

## jar building: follow advice from http://openjdk.java.net/projects/code-tools/jmh/
cd ~/Desktop/
mvn archetype:generate \
          -DinteractiveMode=false \
          -DarchetypeGroupId=org.openjdk.jmh \
          -DarchetypeArtifactId=jmh-scala-benchmark-archetype \
          -DgroupId=org.sample \
          -DartifactId=test \
          -Dversion=1.0

Other References

Array DBs
