Adds auction algorithm computations and distance matrix filter. #1

Open
wants to merge 57 commits into
base: master
Commits
5f90c6f
Committing initial code change for auction algorithms.
rachellevanger Nov 6, 2015
f5bbbd5
Troubleshooting ANN.h linking.
rachellevanger Nov 6, 2015
99f82b5
Shaun fixed CMakeLists.txt for proper flags/includes.
rachellevanger Nov 6, 2015
14a6082
Adding Auction and subsample namespaces.
rachellevanger Nov 7, 2015
cc1658c
Cleaning up auction algorithm namespaces for bottleneck.
rachellevanger Nov 9, 2015
bb4be70
Adding libraries and fixing compilation errors.
rachellevanger Nov 10, 2015
112437b
Added subsample namespace; quarantined old auction algorithm code; pu…
rachellevanger Nov 18, 2015
8d21e2b
Updating paths.
rachellevanger Nov 18, 2015
1563552
Troubleshooting bottleneck.cpp::main.
rachellevanger Nov 18, 2015
eac5b33
Syncing for stopping point on auction algorithms.
rachellevanger Nov 21, 2015
da14bbf
Adding ability to accept approximation_error as json parameter.
rachellevanger Nov 21, 2015
8144129
Adding ability to accept approximation_error as json parameter.
rachellevanger Nov 21, 2015
6582890
Accepting approximation parameter into distance computation code.
rachellevanger Nov 21, 2015
44171aa
Accepting approximation parameter into distance computation code.
rachellevanger Nov 21, 2015
7dd9764
Pushing approximation parameter to distance computations.
rachellevanger Nov 21, 2015
8490189
Pushed approximate json parameter through to distance code.
rachellevanger Nov 21, 2015
a963277
Removed unnecessary test data.
rachellevanger Nov 21, 2015
b4c2a6d
Updating test files to include errors.
rachellevanger Nov 21, 2015
88ec750
Cleaning up branch.
rachellevanger Nov 30, 2015
29f9094
Cleaning up branch.
rachellevanger Nov 30, 2015
8f25f4f
Cleaning up branch.
rachellevanger Nov 30, 2015
0e598c3
Cleaning up branch.
rachellevanger Nov 30, 2015
1947e66
Cleaning up branch.
rachellevanger Nov 30, 2015
52c6e38
Cleaning up branch.
rachellevanger Nov 30, 2015
ab7ddda
Removing build files.
rachellevanger Dec 2, 2015
c6c5c19
Updating .gitignore
rachellevanger Dec 2, 2015
ea63b7d
Cleaning up.
rachellevanger Dec 2, 2015
0b61c3b
Cleaning up.
rachellevanger Dec 2, 2015
2edff2e
Updating .gitignore.
rachellevanger Dec 2, 2015
8feaaaf
Merging branches.
rachellevanger Dec 2, 2015
03bac4f
Cleaning up.
rachellevanger Dec 2, 2015
2103108
Cleaning up.
rachellevanger Dec 2, 2015
cd57b91
Initial changes for wasserstein approximations
rachellevanger Dec 2, 2015
6836787
Getting wasserstein to work. Removing approximation parameter from su…
rachellevanger Dec 2, 2015
559b24d
Adding cmakelists.
rachellevanger Dec 2, 2015
20a2dc2
Troubleshooting wasserstein approximation issues.
rachellevanger Dec 2, 2015
43b8308
Integrating new versions of auction algorithms into the code base.
rachellevanger Jan 4, 2016
051060b
Updating references for new source files.
rachellevanger Jan 4, 2016
df2eb56
Trying to get new auction code to compile with subsample code.
rachellevanger Jan 16, 2016
e4ae110
Finishing up auction code integration.
rachellevanger Jan 16, 2016
5177fab
Cleaning up tests and final run-through.
rachellevanger Jan 16, 2016
a2d3650
Removing unnecessary file.
rachellevanger Jan 16, 2016
145e6d5
Putting quotes around inf in test script
rachellevanger Feb 6, 2016
7da0afa
Staging changes for filtering the distance matrix computations.
rachellevanger Feb 6, 2016
eca003c
Adding debug messages.
rachellevanger Feb 6, 2016
22db219
Debugging.
rachellevanger Feb 6, 2016
6e39438
Debugging.
rachellevanger Feb 6, 2016
5f4bfd6
Debugging.
rachellevanger Feb 6, 2016
8149726
Debugging.
rachellevanger Feb 6, 2016
9eb4d7f
Debugging.
rachellevanger Feb 6, 2016
f670613
Debugging.
rachellevanger Feb 6, 2016
7115dc2
Updating CMakeLists.txt and test script.
rachellevanger Feb 7, 2016
d315c54
Removing debug statements.
rachellevanger Feb 7, 2016
0ab2569
Debugging segfault issues.
rachellevanger Feb 9, 2016
3153ddf
Removing debug statements and adding check for sample/subsample sizes.
rachellevanger Feb 11, 2016
e34d9a1
Updating docs for distance filter and relative error for auction algo…
rachellevanger Feb 13, 2016
640c52e
Updating docs.
rachellevanger Feb 13, 2016
10 changes: 8 additions & 2 deletions .gitignore
@@ -1,4 +1,10 @@
ComputeDistanceMatrix
.DS_Store
*.o

/build
tests/data/
tests/distance*
tests/subsample*
tests/sample.json
tests/filter*
bin/ComputeSubsample
bin/ComputeDistances
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -21,6 +21,7 @@ endif("${isSystemDir}" STREQUAL "-1")
#########

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native -mtune=native -ffast-math")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -O0 -fprofile-arcs -ftest-coverage")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3 -DNDEBUG" )

@@ -37,6 +38,9 @@ find_package(MPI REQUIRED)

include_directories (
./include
./include/persistence/approximatedistances/geom_bottleneck/bottleneck/include/ANN
./include/persistence/approximatedistances/geom_bottleneck/bottleneck/include
./include/persistence/approximatedistances/geom_matching/wasserstein/include
/usr/local/include
/opt/local/include
${Boost_INCLUDE_DIRS}
@@ -64,6 +68,7 @@ set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
# Recurse #
###########

add_subdirectory (include/persistence/approximatedistances)
add_subdirectory (source)

#########
2 changes: 0 additions & 2 deletions bin/.gitignore

This file was deleted.

4 changes: 2 additions & 2 deletions doc/index.raw
@@ -62,8 +62,8 @@ The output of the subsample program will be stored in the supplied filename `/pa

The input to the distance program is the output from the subsample program. The arguments are
```bash
/path/to/subsample.json /path/to/distance.txt
error /path/to/subsample.json /path/to/distance.txt /path/to/distance_filter.txt
```
where the first is a path to the subsample (which contains a path to the original sample), and the second path is the location the distance matrix is to be stored.
where the first argument is a nonnegative number giving the relative error allowed when approximating the distance between persistence diagrams (supply 0.0 for exact distance computation), the second is a path to the subsample (which contains a path to the original sample), the third is the location where the distance matrix is to be stored, and the fourth is an optional path to a filter for the distance matrix computations. If supplied, the filter should be a sequence of entries in {0,1} of the same size as the expected output in /path/to/distance.txt. If not supplied, all pairwise distances are computed.
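
Review sketch for the optional filter argument: the snippet below shows one way such a filter could be read and queried. It assumes the filter file holds whitespace-separated 0/1 entries in the same order as the distance output, and that 1 means "compute this entry"; the documentation does not state which value selects an entry, so that polarity is an assumption. `parse_filter` and `should_compute` are hypothetical names, not functions from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read a whitespace-separated sequence of 0/1 flags.
// Throws if an entry is not 0 or 1, or if the count differs from `expected`.
inline std::vector<bool> parse_filter(std::istream& in, std::size_t expected) {
  std::vector<bool> flags;
  flags.reserve(expected);
  int value = 0;
  while (in >> value) {
    if (value != 0 && value != 1) {
      throw std::runtime_error("filter entries must be 0 or 1");
    }
    flags.push_back(value == 1);
  }
  if (flags.size() != expected) {
    throw std::runtime_error("filter size does not match distance output size");
  }
  return flags;
}

// Assumed polarity: an empty filter (none supplied) means compute everything;
// otherwise entry k is computed only when flags[k] is true.
inline bool should_compute(const std::vector<bool>& flags, std::size_t k) {
  assert(flags.empty() || k < flags.size());
  return flags.empty() || flags[k];
}
```

Rejecting a filter whose size does not match the expected output mirrors the size requirement stated in the documentation and catches a stale filter before any distances are computed.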


79 changes: 79 additions & 0 deletions include/persistence/BottleneckApproximateDistance.h
@@ -0,0 +1,79 @@
/// BottleneckApproximateDistance.h
/// Author: Rachel Levanger
/// Date: Nov 6, 2015
/// Edit history:
/// - Created from BottleneckDistance.h to interface approximate algorithms with Subsample code (Rachel Levanger, Nov 6, 2015)


#ifndef BOTTLENECKAPPROXIMATEDISTANCE_H
#define BOTTLENECKAPPROXIMATEDISTANCE_H

#include <vector>
#include "persistence/PersistenceDiagram.h"
#include "persistence/approximatedistances/geom_bottleneck/bottleneck/include/bottleneck.h"

double
BottleneckApproximateDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
const double epsilon );


namespace BottleneckApproximateDistance_wrapper {

struct BottleneckApproximationWrapper {

/* Generators for two persistence diagrams which are going to be compared */
std::vector<subsample::Generator> Generators1;
std::vector<subsample::Generator> Generators2;

/* Read the generators into the approximate algorithm class. */
bool populateDiagramPointSets(std::vector<std::pair<double, double>>& A,
std::vector<std::pair<double, double>>& B)
{
A.clear();
B.clear();

/* Read in generators from Generators1 to A */
for ( std::vector<subsample::Generator>::const_iterator cur = Generators1.begin();
cur != Generators1.end(); ++cur ) {
A.push_back(std::make_pair(cur->birth, cur->death));
}
/* Read in generators from Generators2 to B */
for ( std::vector<subsample::Generator>::const_iterator cur = Generators2.begin();
cur != Generators2.end(); ++cur ) {
B.push_back(std::make_pair(cur->birth, cur->death));
}
return true;
}

};

} //namespace

inline double
BottleneckApproximateDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
const double epsilon ) {

using namespace BottleneckApproximateDistance_wrapper;

BottleneckApproximationWrapper bw;

std::vector<subsample::Generator> & Generators1 = bw . Generators1;
std::vector<subsample::Generator> & Generators2 = bw . Generators2;
Generators1 . assign ( diagram_1 . begin (), diagram_1 . end () );
Generators2 . assign ( diagram_2 . begin (), diagram_2 . end () );

std::vector<std::pair<double, double>> A, B;
if (!bw.populateDiagramPointSets(A, B)) {
std::cout << "Could not convert PersistenceDiagrams to DiagramPointSets.\n";
return -1;
}

double distance;
distance = geom_bt::bottleneckDistApprox(A, B, epsilon);
return distance;

}

#endif
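
Review sketch: the two copy loops in `populateDiagramPointSets` could be folded into one small helper, as outlined below. This is only an illustration. It uses a stand-in `Generator` struct with just `birth` and `death` (the real `subsample::Generator` is defined in PersistenceDiagram.h), and `to_point_set` is a hypothetical name that does not appear in this PR.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Stand-in for subsample::Generator; only the two fields the wrapper reads.
struct Generator {
  double birth;
  double death;
};

// Hypothetical helper: copy (birth, death) pairs into the vector-of-pairs
// format that the approximate-distance routines in this PR take as input.
inline std::vector<std::pair<double, double>>
to_point_set(const std::vector<Generator>& generators) {
  std::vector<std::pair<double, double>> points;
  points.reserve(generators.size());  // one allocation instead of repeated growth
  for (const Generator& g : generators) {
    points.emplace_back(g.birth, g.death);
  }
  return points;
}
```

Returning the vector by value removes the need for the `bool` return and the `clear()` calls, since the conversion as written cannot fail.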
31 changes: 16 additions & 15 deletions include/persistence/BottleneckDistance.h
@@ -13,8 +13,8 @@
#include "persistence/PersistenceDiagram.h"

double
BottleneckDistance( PersistenceDiagram const& diagram_1,
PersistenceDiagram const& diagram_2 );
BottleneckDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2 );


namespace BottleneckDistance_detail {
@@ -31,8 +31,8 @@ class Edge{
struct BottleneckProblem {

/* Generators for two persistence diagrams which are going to be compared */
std::vector<Generator> Generators1;
std::vector<Generator> Generators2;
std::vector<subsample::Generator> Generators1;
std::vector<subsample::Generator> Generators2;
/* Number of generators in Generator1 and Generator2 */
unsigned int Max_Size;
/* Edges between all the nodes given by generators of the persistence diagrams */
@@ -50,7 +50,7 @@ struct BottleneckProblem {
std::vector<int> Layers;

void PrepareEdges( void ){
Generator::Distance distance;
subsample::Generator::Distance distance;
/*Set the number of generators */
Max_Size = Generators1.size() + Generators2.size();
/*Clear edges */
@@ -61,22 +61,22 @@ struct BottleneckProblem {
Edges.push_back(Edge(i, j, 0));
/* Edges between real points */
unsigned int i = 0;
for ( std::vector<Generator>::const_iterator cur1 = Generators1.begin();
for ( std::vector<subsample::Generator>::const_iterator cur1 = Generators1.begin();
cur1 != Generators1.end(); ++cur1 ) {
unsigned int j = Max_Size;
for ( std::vector<Generator>::const_iterator cur2 = Generators2.begin();
for ( std::vector<subsample::Generator>::const_iterator cur2 = Generators2.begin();
cur2 != Generators2.end(); ++cur2 ) {
Edges.push_back(Edge(i,j++, distance(*cur1, *cur2)));
}
++i;
}
/* Edges between real points and their corresponding diagonal points */
i = 0;
for ( std::vector<Generator>::const_iterator cur1 = Generators1.begin();
for ( std::vector<subsample::Generator>::const_iterator cur1 = Generators1.begin();
cur1 != Generators1.end(); ++cur1, ++i)
Edges.push_back( Edge( i, Max_Size + Generators2.size() + i, distance . diagonal ( *cur1 ) ) );
i = Max_Size;
for ( std::vector<Generator>::const_iterator cur2 = Generators2.begin();
for ( std::vector<subsample::Generator>::const_iterator cur2 = Generators2.begin();
cur2 != Generators2.end(); ++cur2, ++i)
Edges.push_back( Edge( Generators1.size() + (i - Max_Size), i, distance . diagonal ( *cur2 ) ) );
std::sort(Edges.begin(), Edges.end());
@@ -156,23 +156,24 @@ struct BottleneckProblem {
} //namespace

inline double
BottleneckDistance( PersistenceDiagram const& diagram_1,
PersistenceDiagram const& diagram_2 ) {
BottleneckDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2 ) {
using namespace BottleneckDistance_detail;
BottleneckProblem bp;
std::vector<int> & Pair = bp.Pair;
std::vector<int> & Layers = bp.Layers;
std::vector< std::vector < int > > & Connections = bp.Connections;
std::vector<Generator> & Generators1 = bp . Generators1;
std::vector<Generator> & Generators2 = bp . Generators2;
std::vector< std::vector < int > > & Connections = bp . Connections;
std::vector<subsample::Generator> & Generators1 = bp . Generators1;
std::vector<subsample::Generator> & Generators2 = bp . Generators2;
unsigned int & Max_Size = bp . Max_Size;
std::vector<Edge> & Edges = bp . Edges;
Generators1 . assign ( diagram_1 . begin (), diagram_1 . end () );
Generators2 . assign ( diagram_2 . begin (), diagram_2 . end () );
/* If both diagrams are empty the distance is 0 */
if( Generators1.size() == 0 && Generators2.size() == 0 ) return 0;
bp.PrepareEdges ();

bp.PrepareEdges ();

/* Clear the pairing */
Pair.clear( );
Pair.assign( 2*Max_Size, -1);
12 changes: 10 additions & 2 deletions include/persistence/PersistenceDiagram.h
@@ -20,6 +20,9 @@
#include <boost/serialization/base_object.hpp>
#include <boost/serialization/vector.hpp>

namespace subsample
{

int64_t generator_distance_count = 0;
struct Generator {
double birth;
@@ -80,10 +83,12 @@ class PersistenceDiagram : private std::vector<Generator> {
using Base::const_iterator;
using Base::operator[];
void load ( std::string const& filename ) {

clear ();
std::ifstream infile ( filename );
if ( not infile . good () ) {
throw std::runtime_error("PersistenceDiagram::load. File not found: " + filename );
if (not infile . good ()) {
std::cout << "PersistenceDiagram::load. File not found. \n";
throw std::runtime_error("PersistenceDiagram::load. File not found.");
}
std::string line;
while ( std::getline ( infile, line ) ) {
@@ -98,6 +103,7 @@ class PersistenceDiagram : private std::vector<Generator> {
}
infile . close ();
// Replace -1's with something sensible

double max_entry = 0;
for ( int i = 0; i < size (); ++ i ) {
max_entry = std::max ( (*this)[i].birth, max_entry );
@@ -121,4 +127,6 @@ class PersistenceDiagram : private std::vector<Generator> {
}
};

}

#endif
76 changes: 76 additions & 0 deletions include/persistence/WassersteinApproximateDistance.h
@@ -0,0 +1,76 @@
#ifndef WASSERSTEINAPPROXIMATEDISTANCE_H
#define WASSERSTEINAPPROXIMATEDISTANCE_H
#include <cmath>
#include <cstring>
#include <vector>
#include <algorithm>

#include "persistence/PersistenceDiagram.h"
#include "persistence/approximatedistances/geom_matching/wasserstein/include/wasserstein.h"

double
WassersteinApproximateDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
const double p,
const double epsilon );


namespace WassersteinApproximateDistance_wrapper {

struct WassersteinApproximationWrapper {
/* Generators for two persistence diagrams which are going to be compared */
std::vector<subsample::Generator> Generators1;
std::vector<subsample::Generator> Generators2;

/* Read the generators into the approximate algorithm class. */
bool populateDiagramPointSets(std::vector<std::pair<double, double>>& A,
std::vector<std::pair<double, double>>& B)
{
A.clear();
B.clear();
/* Read in generators from Generators1 to A */
for ( std::vector<subsample::Generator>::const_iterator cur = Generators1.begin();
cur != Generators1.end(); ++cur ) {
A.push_back(std::make_pair(cur->birth, cur->death));
}
/* Read in generators from Generators2 to B */
for ( std::vector<subsample::Generator>::const_iterator cur = Generators2.begin();
cur != Generators2.end(); ++cur ) {
B.push_back(std::make_pair(cur->birth, cur->death));
}
return true;
}

};

} //namespace


inline double
WassersteinApproximateDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
const double p,
const double epsilon ) {

using namespace WassersteinApproximateDistance_wrapper;

WassersteinApproximationWrapper ww;

std::vector<subsample::Generator> & Generators1 = ww . Generators1;
std::vector<subsample::Generator> & Generators2 = ww . Generators2;
Generators1 . assign ( diagram_1 . begin (), diagram_1 . end () );
Generators2 . assign ( diagram_2 . begin (), diagram_2 . end () );

std::vector<std::pair<double, double>> A, B;
if (!ww.populateDiagramPointSets(A, B)) {
std::cout << "Could not convert PersistenceDiagrams to DiagramPointSets.\n";
return -1;
}

double distance;
distance = geom_ws::wassersteinDist(A, B, p, epsilon);
return distance;

}

#endif
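
Review sketch: doc/index.raw says a relative error of 0.0 requests the exact distance. How the PR routes between the exact and approximate routines is not visible in this part of the diff, so the dispatcher below is only an assumption about one reasonable shape for it. `compute_distance` is a hypothetical name, and the two callables stand in for an exact routine and an approximate routine with their diagram arguments already bound.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical dispatcher: `exact` and `approx` stand in for an exact and an
// approximate distance routine with their diagram arguments already bound.
inline double compute_distance(double relative_error,
                               const std::function<double()>& exact,
                               const std::function<double(double)>& approx) {
  // The documentation requires a nonnegative relative error.
  if (relative_error < 0.0) {
    throw std::invalid_argument("relative error must be nonnegative");
  }
  // Per doc/index.raw, 0.0 requests the exact computation.
  if (relative_error == 0.0) {
    return exact();
  }
  return approx(relative_error);
}
```

Validating the sign once at the dispatch point keeps a negative tolerance from reaching either distance routine.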
10 changes: 5 additions & 5 deletions include/persistence/WassersteinDistance.h
@@ -7,8 +7,8 @@
#include "PersistenceDiagram.h"

double
WassersteinDistance( PersistenceDiagram const& diagram_1,
PersistenceDiagram const& diagram_2,
WassersteinDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
double p );


@@ -142,11 +142,11 @@ namespace WassersteinDistance_detail {
}

inline double
WassersteinDistance( PersistenceDiagram const& diagram_1,
PersistenceDiagram const& diagram_2,
WassersteinDistance( subsample::PersistenceDiagram const& diagram_1,
subsample::PersistenceDiagram const& diagram_2,
double p ) {
using namespace WassersteinDistance_detail;
Generator::Distance distance;
subsample::Generator::Distance distance;
// Distance matrix the generators in diagram_1 and diagram_2
double* distanceMatrix;
// Distance matrix has (diagram_1.size + diagram_2.size())^2 elements
7 changes: 7 additions & 0 deletions include/persistence/approximatedistances/CMakeLists.txt
@@ -0,0 +1,7 @@
###########
# Recurse #
###########

add_subdirectory (geom_bottleneck)
add_subdirectory (geom_matching)

@@ -0,0 +1,6 @@
###########
# Recurse #
###########

add_subdirectory (bottleneck)
