Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many improvements to RIVET #115

Merged
merged 164 commits into from
Jun 22, 2018
Merged

Many improvements to RIVET #115

merged 164 commits into from
Jun 22, 2018

Conversation

mlesnick
Copy link
Contributor

@mlesnick mlesnick commented May 10, 2018

This pull request encompasses my work to the code, as well as Roy Zhao's work and Simon Segert's work.

Here is a summary:

-Roy's work (and my edits to it) introduce the ability to handle multicritical filtrations, including the construction of the degree-Rips bifiltration.

-To improve the structure of the code, Roy introduced the BifiltrationData and FIRep classes. He got rid of the simplex tree representation of a simplicial complex, replacing it with a more straightforward list-of-simplices representations of a simplicial complex in BifiltrationData. Note that this stores only simplices in dimension j-1, j, and j+1, where j is the homology dimension of interest.

-Roy's work also introduces the firep input file type, which allows for presentations and arbitrary boundary matrices (e.g., cellular boundary matrices) as input.

-Roy's work changes the input format for the "bifiltration" input file. In particular, files of this type now need to include simplices of lower dimension, not just maximal simplices.

-My work replaces the linked list column data structures that used to underly RIVET with (optimized versions of) PHAT's lazy heaps.

-I added the ability to compute Betti numbers via computation of a minimal presentation of a 2-D persistence module. This replaces the earlier Koszul Homology algorithm for computing Betti numbers as the default, though the older algorithm is still available in the rivet_console via the --koszul flag. The performance of the two algorithms is comparable for computing Betti numbers, but the new approach makes barcode template computation much, much faster, since a minimal presentation (which is usually quite small) can now be given as input to the barcode template computations. The speed of minimal presentation / Betti number computation will continue to improve significantly in the future as we do more optimizations.

-The minimal presentation can be printed via the --minpres flag.

-I have fixed issue #76, which was the last remaining major bug.

-Simon made improvements to the interface which allow us to adjust the bounds of the viewable window. This allows us to compare two different RIVET visualizations on the same scale, and to zoom in on the interesting parts of the output.

-Simon also added the ability to handle descending coordinates gracefully in RIVET. One can now specify in an input file that a coordinate direction is descending by adding "[-]" in front of the axis label, e.g. "[-]" density. The prefix "[+]" to specify an ascending coordinate direction is optional.

-I have updated the data folder in RIVET to include more examples, including lots of small toy examples that I have accumulated over time.

Note that the RIVET documentation is only partially caught up with these improvements and extensions, and in any case ought to be consolidated into a single file, instead of multiple web pages.
We will work on this in the next weeks.

There is still more to do in many of areas where this pull request provides improvements, but I believe that we are currently at a nice local optimum, and that this is a good time to incorporate these improvements into RIVET.

Roy Zhao and others added 30 commits March 29, 2017 15:50
# Conflicts:
#	interface/input_manager.cpp
Vectors sorted by comparing lowest index, then if those are equal, the second lowest and so on.
… Moved it to private and altered the comparison method to decrease the number of row operations done in test examples.
Also, code seems not to be building bifiltrations correctly.
… a bigrade.

Speeds up persistence computations a lot, as expected, but slows down't FIRep computation too much.

Will continue to optimize.
…these more efficient.

Draft of changes is done and compiles, but is exhibiting some errors.
…tly on numerous examples. Still, more testing is needed.

The edited code is consistently faster than Matthew's code on 1-critical examples, which in turn was faster than Roy's old code.  Further optimizations are surely possible, but as far as I know, I've addressed the biggest issues.

A brief summary of improvements / changes since Roy's last commit:
-In Roy's old code, there was a hash table storing (grade,simplex) pairs that was being constructed but not used.  I have removed this.

-The old code stored the relevant part of the bifiltration in hash tables (std::unordered_map), as member data of the bifiltration_data object.  To build the FI-Reps, the code iterated through the hash table, which is inefficient.  Moreover, a hash table representation of the simplices in the high dimension is never needed.  In my changes, bifiltration_data stores the simplices as lists.  The firep class also builds the hash tables in low and mid dimension, but not in the high dimension.  The next bullet gives further motivation for this.

-For Vietoris-Rips bifiltrations, within a given grade, Matthew's code naturally orders simplices lexicographically by vertex label.  Experiments have suggested that this ordering works much better for the later computations (e.g., barcode templates) than alternatives.  By storing simplices only in a hash table, Roy's code also naturally constructed simplices of a given dimension in the lex order on vertices, but by putting these in a hash table, it then discarded that order.  One can do sorting to recover the order, but this incurs a time cost.  The new code directly builds lists of simplices in lex order (without sorting).  A stable sort is then used to sort these by grade, so that the lex order is preserved within a grade.

-To build the boundary matrices, we traverse the columns in colex order.  In the multicritical case, when building the high boundary matrix, we also look for a bigrade for a boundary simplex in colex order.  Given this, it follows that if a (simplex, grade) pair is rejected as we build the boundary matrix because the grade is not compatible, then we never have to consider that pair again.  This can save a bit of work.

-The earlier code of both Matthew and Roy accepts as text input for the "bifiltration" format just the maximal simplices in a simplicial complex.  I have done away with this convention; I now require that all simplices are given directly.  (TODO: check whether these simplices need to be given in filtration order.)  We could reintroduce the earlier convention, but this seems awkward in the current framework, and will incur a cost.

-Generally, I tried to avoid copying simplices and other data.  There probably is still be room for improvement in this regard, and where I have a doubt about efficiency, I have left a TODO comment.
@mlesnick
Copy link
Contributor Author

mlesnick commented Jun 19, 2018 via email

@mlwright84
Copy link
Contributor

I just pushed two commits that modify VisualizationWindow. Most notably, I moved two text fields (that display the homology dimension and input file name) to below the control elements. This reduces the minimum width of the window, to allow it to fit on screens as narrow as 1000 pixels or so. (Previously, its minimum width was greater than 1280 pixels, the width of a screen I was testing it on.) I also removed the status bar, since we weren't really using it.

@mlwright84
Copy link
Contributor

This pull request looks good to me. I've tested RIVET on several data sets, and it works well. I like the improvements to the interface. Also, rivet_console is much faster than before -- e.g. I did a H_0 calculation on 3600 points in 90 seconds, rather than a few hours in the old RIVET.

Copy link
Contributor

@xoltar xoltar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the whole this is a big improvement! It's a little too big to actually review in any useful way, but I think we should move forward with it since the benefits far outweigh any of the possible downsides. I would appreciate some fixes to meet naming standards as I pointed out in a few places, and please run clang-format as specified in the README.

computation.cpp Outdated
: params(params)
, progress(progress)
, verbosity(params.verbosity)
Computation::Computation(int vrbsty, Progress& progress)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use real words, including the vowels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Bryn, broadly that seems like a good suggestion, but in this case, how do you want it implemented? The natural name for this argument is "verbosity," but that is already taken here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameter names and initializer names resolve as you would hope in initializer lists, so you can actually do verbosity(verbosity) and it does the right thing.

computation.cpp Outdated
@@ -55,26 +56,127 @@ std::unique_ptr<ComputationResult> Computation::compute_raw(ComputationInput& in
}
}

//STAGE 3: COMPUTE MULTIGRADED BETTI NUMBERS
//STAGE 3: COMPUTE MINIMAL PRESENTTION AND MULTIGRADED BETTI NUMBERS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in PRESENTATION

// If the --koszul flag is not given, then we compute Betti numbers by
//computing a minimal presentation
if (!koszul)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is enough code that it might make sense to extract a separate method for it.

the MST before finding the path... By changing the output format of Krustal
we should be able to eliminate the first of these.
*/
std::vector<NodeAdjacencyList> adjList(arrangement.faces.size(), NodeAdjacencyList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be adj_list

@@ -610,3 +638,77 @@ void ArrangementBuilder::find_subpath(Arrangement& arrangement,
}

} //end find_subpath()

void ArrangementBuilder::treeToDirectedTree(std::vector<NodeAdjacencyList>& adjList, unsigned start, std::vector<std::vector<unsigned>>& children)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be tree_to_directed_tree

void ArrangementBuilder::treeToDirectedTree(std::vector<NodeAdjacencyList>& adjList, unsigned start, std::vector<std::vector<unsigned>>& children)
{
std::vector<bool> discovered(adjList.size(),false);// c++ vector for keeping track of which nodes have been visited
std::vector<unsigned> branchWeight(adjList.size(),0); // this will contain the weight of the edges "hanging" from the node represented by its index in branchWeight
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables use snake_case not camelCase.

math/firep.cpp Outdated
#include <stdexcept>

//constructor; requires verbosity parameter
FIRep::FIRep(Presentation pres, int vbsty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vowels again please.

@mlesnick
Copy link
Contributor Author

@xoltar @mlwright84 thanks for testing and for the input. I will take care of the changes Bryn requested.

@SimonSegert is now finishing a fix to issue #120 that I would like to incorporate before merging.

Also, I discovered a bug with the visualization in that arises in the current branch when "fit to window" is deselected. This occurs on my MacBook, but neither on Simon's machine nor on my Linux machine. I'm sure we could track this bug down, but I wonder if it would be better to just remove "fit to window" altogether; this feature has never really been of any use to me.

If we do this, I suppose it would suffice to move "show barcode" to halfway between where it is now and where "fit to window" is. Thoughts?

After we resolve these issues, we should be ready to merge.

@mlwright84
Copy link
Contributor

I'm sorry to hear about the bug involving "fit to window." I think the "fit to window" checkbox is worth preserving in RIVET -- my students and I have occasionally selected and deselected it in our use of RIVET. I think it's OK to merge the pull request, note this "fit to window" issue as a bug, and then fix it later.

@mlesnick
Copy link
Contributor Author

Ok--let's keep the fit to window feature then. And I agree, the fit to window bug need not hold up the merge.

@mlesnick
Copy link
Contributor Author

I did some cosmetic work on the code to address the style issues Bryn mentioned. There still exist some short variable names in the code, though I changed many of these. I suggest we take care of the others later. Most notably, some short member names that Bryn would not like are used in the structs defined in bifiltration_data.h. But I have some changes in progress to bifiltration_data.h to make the code more memory efficient. We can clean up the variable names as part of those changes.

The only two issues that remain on the table are the minor bugs in the visualization mentioned above, (apparently) related to Simon's work. But it doesn't seem to make sense to hold up the PR for these.

@xoltar
Copy link
Contributor

xoltar commented Jun 22, 2018 via email

@mlesnick mlesnick merged commit ed4afc9 into rivetTDA:master Jun 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants