Strong Scaling Parallelized Tool Paper
Markus Kühbach committed Jan 17, 2019
1 parent b7a848c commit d20e679
Showing 420 changed files with 79,881 additions and 2,799 deletions.
Empty file added .Rhistory
Empty file.
440 changes: 264 additions & 176 deletions CMakeLists.txt

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion README.md
@@ -1,2 +1,2 @@
# PARAPROBE
Scalable Parallelized Tools for Mining Atom Probe Tomography Data
Strong Scaling Parallelized Tools for Mining Atom Probe Tomography Data
12 changes: 12 additions & 0 deletions build/PARAPROBE.Paper14.RangingTest.rrng
@@ -0,0 +1,12 @@
[Ions]
Number=4
Ion1=Al
Ion2=Fe
Ion3=Si
Ion4=Sc
[Ranges]
Number=4
Range1=26.7 26.9 Vol:0.01661 Al:1 Color:33FFFF
Range2=55.9 56.0 Vol:0.01661 Fe:1 Color:33FFFF
Range3=27.8 28.0 Vol:0.01661 Si:1 Color:33FFFF
Range4=44.8 50.0 Vol:0.01661 Sc:1 Color:33FFFF
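
A minimal sketch of how a range file with the `[Ions]` and `[Ranges]` layout shown above could be read and used for ranging. The helper names `parse_rrng` and `identify` are hypothetical and are not taken from PARAPROBE; the sample text mirrors the entries above.

```python
SAMPLE_RRNG = """[Ions]
Number=4
Ion1=Al
Ion2=Fe
Ion3=Si
Ion4=Sc
[Ranges]
Number=4
Range1=26.7 26.9 Vol:0.01661 Al:1 Color:33FFFF
Range2=55.9 56.0 Vol:0.01661 Fe:1 Color:33FFFF
Range3=27.8 28.0 Vol:0.01661 Si:1 Color:33FFFF
Range4=44.8 50.0 Vol:0.01661 Sc:1 Color:33FFFF
"""


def parse_rrng(text):
    """Return (ions, ranges) parsed from RRNG-style text (illustrative only)."""
    ions, ranges, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, _, value = line.partition("=")
        if key == "Number":
            continue  # the entry count is implied by the entries themselves
        if section == "Ions":
            ions.append(value)
        elif section == "Ranges":
            fields = value.split()
            entry = {"lo": float(fields[0]), "hi": float(fields[1]), "species": {}}
            for field in fields[2:]:
                name, _, val = field.partition(":")
                if name == "Vol":
                    entry["vol"] = float(val)
                elif name == "Color":
                    entry["color"] = val
                else:
                    entry["species"][name] = int(val)  # element and multiplicity
            ranges.append(entry)
    return ions, ranges


def identify(mass_to_charge, ranges):
    """Return the species dict of the first range containing the value, else None."""
    for entry in ranges:
        if entry["lo"] <= mass_to_charge <= entry["hi"]:
            return entry["species"]
    return None


ions, ranges = parse_rrng(SAMPLE_RRNG)
```

Because the parser ignores the numeric suffix of each `RangeN` key, it reads the entries in file order; a mass-to-charge value outside every interval is left unranged.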
107 changes: 77 additions & 30 deletions build/Test.xml → build/PARAPROBE.Paper14.SyntheticTest.xml
@@ -1,55 +1,70 @@
<?xml version="1.0" encoding="utf-8"?>
<Parameters>
<!--for binary choice 0 specifies off, 1 on-->
<!--for binary choices, 0 specifies off, 1 on-->

<!--WHAT TYPE OF ANALYSIS TO DO-->
<InputFileformat>2</InputFileformat>
<InputFileformat>4</InputFileformat>
<!--0 nothing-->
<!--1 pos-->
<!--2 epos-->
<!--3 hdf5-->
<!--4 synthetic-->
<RAWFilenameIn>R25_33096-v02.epos</RAWFilenameIn>
<RAWFilenameIn>Synthetic.pos</RAWFilenameIn>
<!--if InputFileformat is not 4 (synthetic), the file ending needs to match the measured APT data-->
<!--the reader is adaptive: if the name ends with epos it reads binary epos and changes endianness, otherwise the file is rejected-->

<AnalysisMode>0</AnalysisMode>
<!--0 default-->

<ReconstructionAlgorithm>3</ReconstructionAlgorithm>
<AnalysisMode>1</AnalysisMode>
<!--0 nothing-->
<!--1 work in reconstruction space ONLY THIS AT THE MOMENT-->

<ReconstructionAlgorithm>1</ReconstructionAlgorithm>
<!--0 no reconstruction-->
<!--1 accept from synthetic dataset-->
<!--2 accept from pos ie x,y,z from pos are taken as ion coor in recon space-->
<!--3 accept from epos ie x,y,z from epos are taken as ion coor in recon space-->
<!--4 Bas et al. reconstruction requires epos file-->
<IdentifyIonType>1</IdentifyIonType>
<!--aka ranging based on rrng file specified below-->
<RRNGFilenameIn>R25_33096-v02.rrng</RRNGFilenameIn>
<RRNGFilenameIn>PARAPROBE.Paper14.Ranging.rrng</RRNGFilenameIn>

<SurfaceReconstructionType>0</SurfaceReconstructionType>
<SurfaceReconstructionType>1</SurfaceReconstructionType>
<!--0 no surface recon-->
<!--1 CGAL alphashape ONLY THIS CURRENTLY-->
<!--2 CGAL convex hull-->
<!--3 marching cube IGL-->
<!--4 read existent triangle hull-->
<SurfaceFilenameIn>PARAPROBE.SimID.0.TipSurface.vtk</SurfaceFilenameIn>
<SurfaceFilenameIn>PARAPROBE.Paper14.SimID.XXX1.vtk</SurfaceFilenameIn>
<!--raw file name in and surface need to be same-->
<AlphaShapeAlphaValueChoice>0</AlphaShapeAlphaValueChoice>
<!--0 default smallest alpha value to get a solid through data points-->
<!--1 use value which CGAL considers to be the optimal value-->

<AnalysisCrystallographicInfo>0</AnalysisCrystallographicInfo>
<!--0 no crystallography analysis-->
<!--1 perform Vicente AraulloPeters et al method to extract crystallographic signal-->

<AnalysisSpatDistrType>52</AnalysisSpatDistrType>
<AnalysisSpatDistrType>5</AnalysisSpatDistrType>
<!--0 no spatial statistics-->
<!--1 radial distribution function-->
<!--2 nearest neighbor 1NN-->
<!--3 RipleyK-->
<!--4 k nearest neighbor with k larger one-->
<!--5 multiple k nearest neighbors will allow to compute with one call for instance 2th, 5th, 10th, 50th, 100th, 3333th nearest neighbor-->
<!--list all single character numbers of tasks without space in one string to instruct all task at once 2413 will perform RDF 1NN RIPK and KNN as it will 1234-->
<!--4 multiple k nearest neighbors, computes several orders with one call, for instance the 2nd, 5th, 10th, 50th, 100th, and 3333rd nearest neighbor-->
<!--5 n-point spatial correlations for k nearest neighbor-->
<!--6 distribution number of local neighbors within spherical environment rmax beta stage this is a marginal distribution of a twodimensional statistics-->
<!--namely how many local neighbors to an ion within distance bin r, this is not rdf because in rdf local neighbor counts in r bin are summed for all atoms-->
<!--while in mode 6 discrete distribution of counts are given how the local density scatters this should allow to filter out spinodally decomposed regions-->
<!--list all single character numbers of tasks without space in one string to instruct all task at once 216 will perform 1NN RDF and NP correlations-->

<AnalysisClusteringType>0</AnalysisClusteringType>
<!--0 no clustering-->
<!--1 DBScan-->
<!--2 MaximumSeparationMethod ONLY THIS CURRENTLY-->
<!--3 Isosurface-based-->

<AnalysisVolumeTessellation>0</AnalysisVolumeTessellation>
<!--0 no tessellation is constructed-->
<!--1 a Voronoi tessellation is generated but not stored-->
<!--2 a Voronoi tessellation is generated and all cells are written to an H5 file (IMPLEMENTED in tessHdl but NOT ACTIVATED)-->

<!--RECONSTRUCTION PARAMETER-->
<FlightLength>80.0</FlightLength>
@@ -72,23 +87,45 @@
<ICFMax>1.65</ICFMax>

<!--SURFACE RECONSTRUCTION PARAMETER if not otherwise mentioned in nanometer-->
<AdvIonPruneBinWidthMin>1.0</AdvIonPruneBinWidthMin>
<AdvIonPruneBinWidthIncr>1.0</AdvIonPruneBinWidthIncr>
<AdvIonPruneBinWidthMax>1.0</AdvIonPruneBinWidthMax>
<DebugComputeDistance>0</DebugComputeDistance>
<AdvIonPruneBinWidthMin>0.5</AdvIonPruneBinWidthMin>
<AdvIonPruneBinWidthIncr>0.5</AdvIonPruneBinWidthIncr>
<AdvIonPruneBinWidthMax>0.5</AdvIonPruneBinWidthMax>
<DebugComputeDistance>1</DebugComputeDistance>
<!--0 off, 1 on-->

<!--probing to at least max if max is integer multiple of radius 0 plus integer times incr otherwise probing to next integer-->

<!--CRYSTALLOGRAPHY ANALYSIS PARAMETER-->
<CrystalloRadiusMax>2.000</CrystalloRadiusMax>
<!--defines spherical region about grid points probed-->
<SamplingGridBinWidthX>2.0</SamplingGridBinWidthX>
<SamplingGridBinWidthY>2.0</SamplingGridBinWidthY>
<SamplingGridBinWidthZ>2.0</SamplingGridBinWidthZ>
<ElevationAngleMin>-90.0</ElevationAngleMin>
<ElevationAngleIncr>1.0</ElevationAngleIncr>
<ElevationAngleMax>90.0</ElevationAngleMax>
<AzimuthAngleMin>0.0</AzimuthAngleMin>
<AzimuthAngleIncr>1.0</AzimuthAngleIncr>
<AzimuthAngleMax>360.0</AzimuthAngleMax>
<!--scans according to methods detailed in the paper spherical environment of atom projects candidates on hkl planes, bins, fft of histogram-->
<WindowingAlpha>8.0</WindowingAlpha>
<CrystalloHistoM>9</CrystalloHistoM>
<!--2 to power of m used for binning with addition of padding-->
<WindowingMethod>0</WindowingMethod>
<!--0 - rectangular window, high frequency resolution but also high side lobes-->
<!--1 - Kaiser window with window parameter alpha, lower frequency resolution but much lower side lobes-->



<!--DESCRIPTIVE SPATIAL STATISTICS ANALYSIS PARAMETER-->
<DescrStatTasksCode>Al,Al;AlMn,AlMn</DescrStatTasksCode>
<DescrStatTasksCode>Al-Al;Sc-Sc;Al-Sc;Sc-Al</DescrStatTasksCode>
<SpatStatRadiusMin>0.00</SpatStatRadiusMin>
<SpatStatRadiusIncr>0.1</SpatStatRadiusIncr>
<SpatStatRadiusIncr>0.05</SpatStatRadiusIncr>
<SpatStatRadiusMax>2.50</SpatStatRadiusMax>
<!--values in nanometer ie 0.0, 0.1, 10.0 means bin on 0.0 to at most 10.0 nm with 0.1 nm step-->
<SpatStatKNNOrder>10</SpatStatKNNOrder>
<!--which kth order nearest neighbor is desired if mode is nearest neighbor KNNOrder always one else picks kth if existent-->
<SpatStatMKNNCode>1;10;50;100;1000</SpatStatMKNNCode>
<SpatStatKNNOrder>100</SpatStatKNNOrder>
<!--which kth order nearest neighbor is desired for two point spatial statistics-->
<SpatStatMKNNCode>1;10;100</SpatStatMKNNCode>
<!--semicolon separated code of which kth order nearest neighbor is desired if mode is multi KNN-->
<SpatStatAdditionalLabelRandomization>0</SpatStatAdditionalLabelRandomization>
<!--if 1 rerun all task with labels randomized-->
@@ -97,24 +134,34 @@
<!--ie when FeMnC set several NN (Fe against Fe only, against all, against Mn, against C only one needs Fe,Fe;Fe,X;Fe,Mn;Fe,C-->

<!--CLUSTERING ANALYSIS PARAMETER-->
<ClusteringTasksCode>Al,Al</ClusteringTasksCode>
<ClusteringTasksCode>Sc-Sc</ClusteringTasksCode>
<!--see comment above for DescrStatTaskCode-->
<!--minimum number of ions included for cluster to be counted as valid Nmin includes central point-->
<ClustMaxSepDmaxMin>0.50</ClustMaxSepDmaxMin>
<ClustMaxSepDmaxIncr>0.10</ClustMaxSepDmaxIncr>
<ClustMaxSepDmaxMax>0.50</ClustMaxSepDmaxMax>
<ClustMaxSepDmaxMin>0.10</ClustMaxSepDmaxMin>
<ClustMaxSepDmaxIncr>0.05</ClustMaxSepDmaxIncr>
<ClustMaxSepDmaxMax>0.70</ClustMaxSepDmaxMax>
<!--in nm for each clustering task individual clustering analyses with different dmax in the aforementioned range are done-->
<ClustMaxSepNmin>5</ClustMaxSepNmin>
<!--currently neither dilatation nor erosion is applied-->
<ClustAPosterioriSpatStat>0</ClustAPosterioriSpatStat>
<!--if 1 perform a spatial distribution analysis on the clustered ions afterwards-->

<!--TESSELLATION PARAMETER-->
<SurfaceCellsCarvingRadius>1.0</SurfaceCellsCarvingRadius>
<!--in nm, specifies which cells in the analysis are discarded because they are too close to the tip surface-->
<!--PLOTTING AND IO OPTIONS switched on if 1 switched off for all other values-->
<IOReconstruction>0</IOReconstruction>
<IOTriangulation>0</IOTriangulation>
<IOTriangulationBVH>0</IOTriangulationBVH>
<IOKDTreePartitioning>0</IOKDTreePartitioning>
<IOHKFilteredIons>0</IOHKFilteredIons>
<IOHKRawClusterID>0</IOHKRawClusterID>
<IOIonTipSurfDistances>0</IOIonTipSurfDistances>
<IOVoronoiDescrStats>0</IOVoronoiDescrStats>
<IOVoronoiCellPositions>0</IOVoronoiCellPositions>
<IOVoronoiTopoGeom>0</IOVoronoiTopoGeom>
<IOCrystallography>0</IOCrystallography>

<!--TIP SYNTHESIS-->
<SimRelBottomRadius>0.10</SimRelBottomRadius>
@@ -124,19 +171,19 @@
<!--in multiples of desired tip conical frustum height H which is computed automatically based on tip relative spacing-->
<SimMatrixLatticeConstant>0.404</SimMatrixLatticeConstant>
<!--in nanometer, currently fcc single crystalline lattice supported only-->
<SimNumberOfAtoms>2.0e6</SimNumberOfAtoms>
<SimNumberOfAtoms>2.0e9</SimNumberOfAtoms>
<!--how many atoms should be contained in the synthesized tip not accounting for detector eff-->
<SimDetectionEfficiency>1.0</SimDetectionEfficiency>
<SimFiniteSpatResolutionX>0.000</SimFiniteSpatResolutionX>
<SimFiniteSpatResolutionY>0.000</SimFiniteSpatResolutionY>
<SimFiniteSpatResolutionZ>0.000</SimFiniteSpatResolutionZ>
<!--variance of normal distributed scatter about ideal lattice position in nanometer-->
<SimNumberOfCluster>269.0</SimNumberOfCluster>
<SimNumberOfCluster>236570.0</SimNumberOfCluster>
<SimClusterRadiusMean>2.0</SimClusterRadiusMean>
<SimClusterRadiusSigmaSqr>0.1</SimClusterRadiusSigmaSqr>
<!--BE CAREFUL currently distribution parameter not the mean and variance in nanometer-->

<!--PERFORMANCE-->
<UseNUMABinding>0</UseNUMABinding>
<UseNUMABinding>1</UseNUMABinding>
<!--if 1 uses the numa library to bind threads to specific cores preventing context switching and improving locality-->
</Parameters>
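
The digit-string and semicolon-separated codes in the settings file above can be decoded along the following lines. This is a sketch under assumptions: the mode names follow the added comments for `AnalysisSpatDistrType`, the `Min`/`Incr`/`Max` sweep is assumed to stop at the last value not exceeding the maximum, and `decode_settings` and `sweep` are hypothetical helper names rather than PARAPROBE functions.

```python
import math
import xml.etree.ElementTree as ET

# Mode numbers as read from the AnalysisSpatDistrType comments (assumed reading).
SPATSTAT_MODES = {
    "1": "RDF",
    "2": "1NN",
    "3": "RipleyK",
    "4": "multiple kNN",
    "5": "n-point correlations",
    "6": "local neighbor count distribution",
}

# Reduced sample; the multi-digit mode string is for illustration only.
SAMPLE_SETTINGS = """<Parameters>
  <AnalysisSpatDistrType>125</AnalysisSpatDistrType>
  <DescrStatTasksCode>Al-Al;Sc-Sc;Al-Sc;Sc-Al</DescrStatTasksCode>
  <SpatStatMKNNCode>1;10;100</SpatStatMKNNCode>
  <ClustMaxSepDmaxMin>0.10</ClustMaxSepDmaxMin>
  <ClustMaxSepDmaxIncr>0.05</ClustMaxSepDmaxIncr>
  <ClustMaxSepDmaxMax>0.70</ClustMaxSepDmaxMax>
</Parameters>"""


def sweep(vmin, vincr, vmax):
    """Return vmin, vmin + vincr, ... up to the last value not exceeding vmax."""
    steps = int(math.floor((vmax - vmin) / vincr + 1e-9))
    # Integer stepping avoids accumulating floating point error.
    return [round(vmin + i * vincr, 6) for i in range(steps + 1)]


def decode_settings(xml_text):
    """Decode a few coded fields of a settings file into Python values."""
    root = ET.fromstring(xml_text)

    def get(tag):
        return root.findtext(tag).strip()

    return {
        "modes": [SPATSTAT_MODES[d] for d in get("AnalysisSpatDistrType") if d != "0"],
        "pairs": [tuple(p.split("-")) for p in get("DescrStatTasksCode").split(";")],
        "knn_orders": [int(k) for k in get("SpatStatMKNNCode").split(";")],
        "dmax": sweep(
            float(get("ClustMaxSepDmaxMin")),
            float(get("ClustMaxSepDmaxIncr")),
            float(get("ClustMaxSepDmaxMax")),
        ),
    }


settings = decode_settings(SAMPLE_SETTINGS)
```

With the values from this file, the `dmax` sweep yields 13 clustering runs per task, from 0.10 nm to 0.70 nm in 0.05 nm steps.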
Binary file added docs/Thumbs.db
Binary file not shown.
Binary file modified docs/build/doctrees/basics.doctree
Binary file not shown.
Binary file added docs/build/doctrees/changelog.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/executing.doctree
Binary file not shown.
Binary file added docs/build/doctrees/gui.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/input.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/licence.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/refs.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/setup.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/tutorials.doctree
Binary file not shown.
Binary file added docs/build/html/_images/PARAPROBEWorkflow_02.png
55 changes: 26 additions & 29 deletions docs/build/html/_sources/basics.rst.txt
@@ -3,45 +3,40 @@

What is PARAPROBE?
^^^^^^^^^^^^^^^^^^
| A software for data mining Atom Probe Tomography (APT) experiment data which sets
| prime focus on scalable spatial range querying and computational geometry tasks
| making use of scalable hierarchical parallelism.
| A software tool for mining Atom Probe Tomography (APT) experiment data. Its prime focus is on
| applying scalable hierarchical parallelism to spatial range querying, clustering, atom probe
| crystallography, and computational geometry tasks.
What are the user benefits?
^^^^^^^^^^^^^^^^^^^^^^^^^^^
| **Unbiased descriptive spatial statistics**
| Enabled by state of the art tip surface reconstruction surplus ion to surface distancing.
| **Scalable performance**
| Owing to parallel implementation with rigorous hierarchical spatial data partitioning
| strategy to improve the utilization of fast caches and ccNUMA-aware data placement policy.
| **Open source software**
| Therefore no usage restriction, no limited licences when running in parallel
| No usage restrictions and no licence limits when running in parallel,
| plus full functional transparency and modifiability.
| **Reduced analysis bias**
| Enabled by state-of-the-art tip surface reconstruction plus ion-to-surface distancing.
| **Scalable performance, large datasets**
| Thanks to parallel implementation with rigorous hierarchical spatial data partitioning
| strategy to improve the utilization of fast caches and ccNUMA-aware data placement policy.
Which parallelization concepts are employed?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**Process data parallelism** --- the Message passing interface (MPI_) is utilized.
It processes each individual measurement via a single process. This enables to either distribute parameter sweeping studies of the same tip on practically hundred thousands of processes or to process trivially in parallel multiple tips using the same automatized analysis protocol. At runtime, MPI invokes library calls to communicate pieces of information between physically disjoint computers. Therefore, it requires installation and linking.

**Shared memory thread data parallelism** --- the Open Multiprocessing (OpenMP_) is used.
It allows to distribute the point data of each measurement spatially into disjoint logical chunks.
These chunks are mapped to and stored in thread-local memory when processed in parallel.
For some tasks the threads update explicitly data in the memory of other threads processing spatially adjacent points.
In such cases, explicit care is taken to prevent data invalidation and reduce false sharing.
OpenMP does not require installation because it builds on preprocessor directives, which get evaluated during the compilation stage.

**Streaming instruction data parallelism aka SIMD** --- the BSIMD_ portable vector intrinsics template library is used.
Some core geometrical and numerical tasks can be accelerated further within each thread using vectorization.
Such vectorization is realized via Single Instruction Multiple Data (SIMD) which makes use of highly problem-and-CPU-specific
instructions, the so-called intrinsics. Their key idea is to apply processing operations on a packet of multiple data elements
at once instead of sequentially. PARAPROBE employs BSIMD_ in order to improve code portability by solving the problem that
intrinsics have usually different names and implementation syntax for different CPUs. Upon compilation, the abstract BSIMD intrinsics
commands are encoded into the specific CPU command realization available on the target architecture.
**Process data parallelism** via the Message Passing Interface (MPI_) API
PARAPROBE processes each individual measurement through a single process. This makes it possible either to distribute parameter sweeping studies of the same tip over hundreds of thousands of processes or to process multiple tips trivially in parallel using the same automated analysis protocol. At runtime, MPI library calls communicate pieces of information between physically disjoint computers if necessary.
As MPI is a library, it requires installation and linking.

**Shared memory thread data parallelism** via the Open Multi-Processing (OpenMP_) API.
PARAPROBE partitions the point data of each measurement into spatially disjoint chunks. Explicit strategies are applied to map and place the data chunks in thread-local memory to reduce false sharing and performance degradation on resources with multiple ccNUMA layers. OpenMP builds on compiler directives (pragmas) that are translated during compilation. As such, OpenMP needs no separate installation.

.. **Streaming instruction data parallelism (SIMD)** via portable vector intrinsics template libraries (e.g. bSIMD_ or Vc_) is used.
.. At the level of each thread some core geometrical and numerical tasks can be accelerated further through vectorization. The key idea is to apply vectorized operation which applies to a packet of multiple data elements of the same kind rather than to process single data elements one after another. Technically, this is implementable through highly operation-, problem-, and-CPU-specific instructions of the CPU, the so-called intrinsics.
.. Unfortunately, this renders the code non-portable. Better portability is achieved through portable vector intrinsics. These wrap the individual intrinsics into more abstract commands and compile time instructions with which the choice for the specific realization is delegated to the compiler.
.. _MPI: http://www.mcs.anl.gov/research/projects/mpi/
.. _OpenMP: https://www.openmp.org/
.. _BSIMD: https://developer.numscale.com/bsimd/documentation/v1.17.6.0/faq.html
.. _BSIMD: https://developer.numscale.com/bsimd
.. _Vc: https://github.com/VcDevel/Vc
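
A rough illustration of partitioning point data into spatially disjoint chunks with near-equal ion counts, one chunk per thread. It assumes slabs along z as the splitting rule, which is an illustrative choice only; the text above does not state which decomposition PARAPROBE uses. The name `partition_by_z` is hypothetical.

```python
def partition_by_z(points, n_chunks):
    """Split point indices into n_chunks z-ordered slabs of near-equal size."""
    # Sort indices by z so that each chunk covers a contiguous z interval.
    order = sorted(range(len(points)), key=lambda i: points[i][2])
    base, extra = divmod(len(order), n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        size = base + (1 if c < extra else 0)  # spread the remainder over the first chunks
        chunks.append(order[start:start + size])
        start += size
    return chunks


# Ten points with shuffled z coordinates, split for three workers.
points = [(0.0, 0.0, float(z)) for z in (5, 1, 9, 3, 7, 0, 8, 2, 6, 4)]
chunks = partition_by_z(points, 3)
```

Each index appears in exactly one chunk, so a worker that owns a chunk can process its points without touching the interior of another chunk; only points near slab boundaries need neighbor data from adjacent chunks.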
Solid HPC background literature
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| J. L. Hennessy and D. A. Patterson
@@ -54,9 +49,11 @@ Solid HPC background literature
| http://dx.doi.org/10.1007/978-3-642-37801-0
| J. Reinders and J. Jeffers
| High Performance Parallelism Pearls Volume One: Multicore and Many-Core Programming Approaches
| High Performance Parallelism Pearls Volume One:
| Multicore and Many-Core Programming Approaches
| 1st edition, 2014, Morgan Kaufmann
| J. Jeffers and J. Reinders
| High Performance Parallelism Pearls Volume Two: Multicore and Many-Core Programming Approaches
| High Performance Parallelism Pearls Volume Two:
| Multicore and Many-Core Programming Approaches
| 1st edition, 2015, Morgan Kaufmann
27 changes: 27 additions & 0 deletions docs/build/html/_sources/changelog.rst.txt
@@ -0,0 +1,27 @@
v0.1
^^^^
* **Initial implementation**
* POS, EPOS reading, RRNG range file parsing
* Bas et al. reconstruction, supports tips of up to 4.2 billion ions
* Generation of synthetic single-crystalline tip structures
* MPI/OpenMP thread parallelized spatial range querying and indexing tasks
* Tip surface extraction through alpha shapes for entire datasets
* Surface extraction made efficient through smart ion pruning pre-processing algorithm
* Floating point precision exact distancing of ions to the alpha shape triangle hull
* This reduces bias in descriptive statistics and tessellation by excluding ions close to the surface from the analyses
* Thread parallel radial distribution function (RDF), k nearest neighbor (kNN), Ripley K
* Thread parallel 2-point descriptive spatial statistics
* In-built batch processing capability for fully automatic processing of such statistics
* Allows for arbitrary single and molecular ion type combinations
* Optional ion type label randomization
* Thread parallel maximum separation clustering method with parameter space sweeping capability
* This can also be combined with the batch processing functionality
* Thread parallel implementation of V. Araullo-Peters et al. reconstruction-space-based method for quantifying crystallographic signal through discrete Fourier analysis
* Thread parallel wrapper around C. Rycroft's Voro++ library to enable hitherto impossible computations of volume tessellations of the entire tip
* Characterize the cell volume to obtain atomic scale concentration values and topology through nearest neighbor analysis and p-vectors
* Hierarchical Data Format (HDF5) / eXtensible Data Model and Format (XDMF) powered results reporting
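
The maximum separation method listed above can be sketched as follows: ions closer than `dmax` are linked, and connected groups with at least `nmin` members count as clusters. This is a simplified brute-force version with quadratic cost for illustration, not PARAPROBE's thread parallel implementation; `max_separation_clusters` is a hypothetical name.

```python
import math


def max_separation_clusters(points, dmax, nmin):
    """Group points linked by distances <= dmax; keep groups with >= nmin members."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pair test; a spatial index would replace this for real data.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= dmax:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted((g for g in groups.values() if len(g) >= nmin), key=lambda g: g[0])


# A chain of three points, a separate pair, and an isolated point (coordinates in nm).
demo_points = [(0, 0, 0), (0.2, 0, 0), (0.4, 0, 0), (5, 5, 5), (5.2, 5, 5), (9, 9, 9)]
clusters_n3 = max_separation_clusters(demo_points, dmax=0.3, nmin=3)
clusters_n2 = max_separation_clusters(demo_points, dmax=0.3, nmin=2)
```

A parameter sweep then amounts to calling the function once per `dmax` value in the configured range, which is why the runs are independent and easy to batch.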

Beta-stage functionality
^^^^^^^^^^^^^^^^^^^^^^^^
* Optional a posteriori relabeling of ions after each maximum separation clustering run to perform descriptive spatial statistics in population of remaining non-clustered ions using guard zones
