
Developer Notes

Mo Tiwari edited this page Jan 26, 2022 · 17 revisions

Welcome to the BanditPAM wiki!

This is a space for code contributors to keep track of notes and learnings that don't belong in GitHub issues.

Highly Requested Features that Mo won't have time to work on:

  • An R implementation of BanditPAM
  • A MATLAB implementation of BanditPAM

Less Requested Features that Mo won't have the time to work on:

  • An integration with PySpark

Gotchas:

  • setuptools will always, at least partly, use the compiler that Python was compiled with. This causes problems: for example, installing clang-compiled BanditPAM on gcc-compiled Python resulted in errors. This CANNOT be fixed by modifying the CC environment variable. See https://github.com/pypa/setuptools/issues/1732
  • You may occasionally get an error like (Producer: 'LLVM13.0.0' Reader: 'LLVM 12.0.0'). This happened in the base environment after uninstalling and reinstalling some brew packages. Weirdly, it was resolved by creating a new Python 3.8 conda environment, in which BanditPAM could be installed successfully; after that, the error was somehow (?!) gone in base as well.
  • Building the PyPy wheels on macOS via cibuildwheel does not work properly; see install_mac.md

Potential Cache Improvements:

  • Potentially transpose the cache to avoid false sharing
  • Move to a multi-producer, single-consumer queue for the cache so that the cache can be dynamically resized
  • Give each thread a local copy of the cache
  • Helpful resource: Lecture 9 of series in OMP
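The per-thread cache idea above could look roughly like the following. This is a minimal sketch, not BanditPAM's actual cache: `LocalCache`, `cached_distance`, `expensive_distance`, and the 64-byte cache-line size are all assumptions made for illustration. Each thread would own one slot, and the `alignas` padding keeps two slots from sharing a cache line.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Assumed cache-line size. 64 bytes is common on x86-64, but this is
// hardware dependent and should be checked for the target machine.
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical per-thread cache slot. The alignment rounds sizeof(LocalCache)
// up to a multiple of the cache-line size, so adjacent slots in an array
// never share a line and updates to the counters do not cause false sharing.
struct alignas(kCacheLineBytes) LocalCache {
    std::unordered_map<std::size_t, float> distances;  // key: hypothetical pair index
    std::size_t hits = 0;
    std::size_t misses = 0;
};

// Stand-in for an expensive distance computation (hypothetical).
inline float expensive_distance(std::size_t key) {
    return static_cast<float>(key) * 0.5f;
}

// Looks up `key` in the given thread-local cache; on a miss, computes the
// value, stores it, and returns it. No locking is needed as long as each
// thread only ever touches its own LocalCache.
inline float cached_distance(LocalCache& cache, std::size_t key) {
    auto it = cache.distances.find(key);
    if (it != cache.distances.end()) {
        ++cache.hits;
        return it->second;
    }
    ++cache.misses;
    const float value = expensive_distance(key);
    cache.distances.emplace(key, value);
    return value;
}
```

In use, one would allocate something like `std::vector<LocalCache> caches(num_threads)` and index it by thread id. The trade-off is that threads no longer benefit from each other's cached values, so some distances may be computed more than once.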

Potential OpenMP Improvements:

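  • It is good practice to put default(none) on all omp parallel constructs (including combined ones such as parallel for), so that every variable's data-sharing attribute must be stated explicitly
  • Prevent false sharing among threads for better speedups (this depends on the local cache line size and datatype sizes)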
  • Consider using loop reductions via OpenMP
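As an illustration of the first and last points above, here is a minimal sketch of a loop reduction with `default(none)`. The function `sum_of_squares` is a hypothetical example and is not taken from the BanditPAM code base. If the file is compiled without OpenMP enabled, the pragma is ignored and the loop simply runs serially.

```cpp
#include <vector>

// Hypothetical helper: returns the sum of squares of `values`.
double sum_of_squares(const std::vector<double>& values) {
    double total = 0.0;
    // Copy what the loop needs into plain local variables so that each one
    // can be listed explicitly in the data-sharing clauses below.
    const double* data = values.data();
    long n = static_cast<long>(values.size());

    // default(none) forces every variable used in the region to be listed.
    // reduction(+ : total) gives each thread a private partial sum and
    // combines the partial sums when the loop finishes.
    #pragma omp parallel for default(none) shared(data, n) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

The reduction clause also sidesteps false sharing on the accumulator, since no two threads write to the same `total` inside the loop. Note that floating-point reductions may combine partial sums in a different order from run to run, so results can differ in the last bits.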

GitHub Actions:

  • Right now, we compile with the system Python on the macOS GitHub runners. It appears to work, though I'm not sure whether the runners are using gcc or clang, or whether it matters, since setup.py should detect the compiler properly.
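One way to find out which compiler actually built a given translation unit is to check the compiler's predefined macros. The helper below is a sketch for debugging purposes; `compiler_description` is a hypothetical name, and it only reports the compiler used for the C++ code itself, not the one Python was built with.

```cpp
#include <string>

// Hypothetical debugging helper: describes the compiler that built this file.
// clang also defines __GNUC__ for compatibility, so __clang__ is checked first.
std::string compiler_description() {
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__);
#elif defined(_MSC_VER)
    return "msvc " + std::to_string(_MSC_VER);
#else
    return "unknown";
#endif
}
```

Printing this string from a CI build step (or exposing it through the Python bindings) would show what the runner used.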

Potential frameworks to investigate:

C++ frameworks to investigate:

  • Eigen (pybind11 supports it out of the box, and we will likely no longer need carma or armadillo)
  • Boost
  • Folly
