
Developer Notes

Mo Tiwari edited this page Jan 26, 2022 · 17 revisions

Welcome to the BanditPAM wiki!

This is a space for code contributors to keep track of notes and learnings that don't belong in GitHub issues.

Highly Requested Features that Mo won't have time to work on:

  • An R implementation of BanditPAM
  • A MATLAB implementation of BanditPAM

Less Requested Features that Mo won't have the time to work on:

  • An integration with PySpark

Gotchas:

  • setuptools will always use, at least in part, the compiler that Python itself was compiled with. This causes errors when, e.g., trying to install clang-compiled BanditPAM on a gcc-compiled Python. This CANNOT be fixed by modifying the CC environment variable. See https://github.com/pypa/setuptools/issues/1732
  • You may occasionally get an error like (Producer: 'LLVM13.0.0' Reader: 'LLVM 12.0.0'). This appeared in the base conda environment after uninstalling and reinstalling some brew packages. Weirdly, it was resolved by creating a new Python 3.8 conda environment, in which BanditPAM installed successfully; after that, the problem was somehow (?!) fixed in base as well
  • Building the PyPy wheels on macOS via cibuildwheel does not work properly; see install_mac.md. We get an error in GitHub Actions like the one below. I separately tried adding this gist to the .yml, as well as this suggestion, but neither worked. Future possibilities are to a) upgrade the Accelerate framework on the runner, b) avoid using the Accelerate framework for the PyPy builds, c) try a macOS version on the runner that's later than macOS 10.15 (but this might hurt backwards compatibility), or d) try to modify the PyPy build's numpy installation once it has been instantiated
RuntimeError: Polyfit sanity test emitted a warning, most likely due to using a buggy Accelerate backend. If you compiled yourself, more information is available at https://numpy.org/doc/stable/user/building.html#accelerated-blas-lapack-libraries Otherwise report this to the vendor that provided NumPy.
    RankWarning: Polyfit may be poorly conditioned

Potential Cache Improvements:

  • Potentially transpose the cache to avoid false sharing
  • Move to a multi-producer, single-consumer queue for the cache so that the cache can be dynamically resized
  • Give each thread a local copy of the cache
  • Helpful resource: Lecture 9 of series in OMP
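
The false-sharing and thread-local ideas above can be sketched as follows. This is a minimal illustration, not BanditPAM's actual cache layout: PaddedSlot, padded_sum, and the 64-byte cache line are all assumptions made for the example. Each thread accumulates into its own slot, and each slot is padded to a full cache line so that two threads never write to the same line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Assumption: 64-byte cache lines (typical on x86-64, not guaranteed elsewhere).
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical per-thread slot. alignas pads each slot to a full cache line,
// so neighbouring threads never write to the same line.
struct alignas(kCacheLineBytes) PaddedSlot {
  double partial_sum = 0.0;
};
static_assert(sizeof(PaddedSlot) == kCacheLineBytes,
              "slot must fill exactly one cache line");

// Thread helpers that fall back to a single thread when OpenMP is disabled.
inline int max_threads() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}

inline int thread_id() {
#ifdef _OPENMP
  return omp_get_thread_num();
#else
  return 0;
#endif
}

// Sums `values`, letting each thread accumulate into its own padded slot.
// Note: std::vector only guarantees the 64-byte alignment from C++17 on;
// the slots are 64 bytes apart either way.
double padded_sum(const std::vector<double>& values) {
  std::vector<PaddedSlot> slots(static_cast<std::size_t>(max_threads()));
  const long long n = static_cast<long long>(values.size());
#pragma omp parallel for
  for (long long i = 0; i < n; ++i) {
    const std::size_t t = static_cast<std::size_t>(thread_id());
    assert(t < slots.size());
    slots[t].partial_sum += values[static_cast<std::size_t>(i)];
  }
  // Combine the per-thread partial results serially.
  double total = 0.0;
  for (const PaddedSlot& slot : slots) total += slot.partial_sum;
  return total;
}
```

Without -fopenmp (or the equivalent flag) the pragma is ignored and the loop runs on one thread, which makes it easy to compare against the parallel build. Whether padding actually helps depends on the real cache line size and access pattern, so it is worth benchmarking before adopting.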

Potential OpenMP Improvements:

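  • It is good practice to specify default(none) on all omp parallel constructs (including combined parallel worksharing constructs), so that every variable's data-sharing attribute has to be stated explicitly
  • Prevent false sharing among threads for better speedups (this depends on the local cache line size and datatype sizes)
  • Consider using OpenMP reduction clauses for loop reductions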
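
As a rough sketch of the default(none) and reduction suggestions above: total_loss below is a hypothetical helper, not a function from the BanditPAM codebase. It sums the absolute differences between a candidate value and every point. default(none) forces every variable used in the loop to be listed explicitly, and reduction(+ : loss) gives each thread a private accumulator that OpenMP combines at the end.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: total absolute distance from `candidate` to all points.
double total_loss(const std::vector<double>& points, double candidate) {
  assert(!std::isnan(candidate));

  // Plain (non-const-qualified) locals keep the shared(...) list portable:
  // compilers differ on whether const variables may or must appear in it.
  const double* data = points.data();
  long long n = static_cast<long long>(points.size());

  double loss = 0.0;
  // default(none): every variable used inside must be listed explicitly.
  // reduction(+ : loss): each thread sums privately; results are added at the end.
#pragma omp parallel for default(none) shared(data, n, candidate) reduction(+ : loss)
  for (long long i = 0; i < n; ++i) {
    loss += std::fabs(data[i] - candidate);
  }
  return loss;
}
```

Because each thread gets its own copy of loss, the reduction also sidesteps false sharing on the accumulator. Floating-point addition is not associative, so parallel results may differ from serial ones in the last bits.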

GitHub Actions:

  • Right now, we compile with the system Python on the macOS GitHub runners. It appears to work, though I'm not sure whether the runners are using gcc or clang, or whether it matters, since setup.py should detect the compiler properly.

Potential frameworks to investigate:

C++ frameworks to investigate:

  • Eigen (pybind11 supports it out of the box, so we would likely no longer need carma or Armadillo)
  • Boost
  • Folly
