
Major performance improvement by exchanging vectorized NumPy code for Numba #40

Merged: 27 commits merged into rigdenlab:master from fsimkovic:master on May 14, 2018


fsimkovic (Contributor) commented May 4, 2018

Added

  • numba added as dependency

Changed

  • SequenceFile.calculate_freq backend changed from numpy to numba for faster computation
  • SequenceFile.calculate_weights backend changed from numpy to numba for faster computation
  • SequenceFile.filter backend changed from numpy to numba for faster computation
  • SequenceFile.filter_gapped backend changed from numpy to numba for faster computation
  • SequenceFile.calculate_weights renamed to SequenceFile.get_weights
  • SequenceFile.compute_freq renamed to SequenceFile.get_frequency
  • ContactMap.singletons backend changed from numpy to numba for faster computation
  • Bandwidth backend changed from numpy to numba for faster computation
  • ContactMap.short_range_contacts renamed to ContactMap.short_range
  • ContactMap.medium_range_contacts renamed to ContactMap.medium_range
  • ContactMap.long_range_contacts renamed to ContactMap.long_range
  • ContactMap.calculate_scalar_score renamed to ContactMap.set_scalar_score
  • ContactMap.calculate_contact_density renamed to ContactMap.get_contact_density
  • ContactMap.calculate_jaccard_index renamed to ContactMap.get_jaccard_index
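As a rough illustration of what moving `calculate_weights` (renamed `get_weights`) from vectorized NumPy to Numba can look like, here is a minimal sketch. It is not ConKit's implementation: the function name `sequence_weights`, the integer-encoded alignment `X` (one row per sequence) and the 0.8 identity default are assumptions. It uses the common scheme in which each sequence's weight is 1 divided by the number of sequences (itself included) that share at least that fraction of identical positions with it. Because each outer iteration writes only `counts[i]`, the outer loop can safely use `numba.prange`. If Numba is not installed, the sketch falls back to plain Python so it still runs, only slower.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # fallback so the sketch still runs without Numba
    prange = range

    def njit(*args, **kwargs):
        # Support both bare @njit and @njit(parallel=True) usage
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def sequence_weights(X, identity=0.8):
    """Hypothetical sketch: weight_i = 1 / #{j : identity(X[i], X[j]) >= threshold}."""
    n, length = X.shape
    counts = np.zeros(n, dtype=np.float64)
    for i in prange(n):
        # Each iteration writes only counts[i], so iterations are independent
        c = 0.0
        for j in range(n):
            matches = 0
            for k in range(length):
                if X[i, k] == X[j, k]:
                    matches += 1
            if matches / length >= identity:
                c += 1.0
        counts[i] = c
    return 1.0 / counts


# Toy alignment: rows 0 and 1 are 75% identical, row 2 is unrelated
X = np.array([[0, 1, 2, 3], [0, 1, 2, 4], [5, 6, 7, 8]])
weights = sequence_weights(X, 0.7)  # weights == [0.5, 0.5, 1.0]
```

Explicit loops like these avoid materializing an N × N × L comparison array, which is one reason a compiled loop can outperform a broadcast NumPy expression on large alignments.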

Fixed

  • Bug fix in SequenceFile.filter to remove Sequence entries reliably
  • Bug fix in ContactMapMatrixFigure when gap variable was less than 1

fsimkovic requested a review from hlasimpk on May 4, 2018 at 10:30
coveralls commented May 4, 2018

Coverage decreased (-0.9%) to 79.632% when pulling 7852b98 on fsimkovic:master into e511aac on rigdenlab:master.

# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Contributor commented:

Something I'm curious about: every single file has a copy of the licence. What's the point? You already maintain a copy of the licence under conkit/LICENCE.txt, and surely having a copy everywhere means that once a year you need to go through and change it everywhere.

Contributor (author) replied:

It's just in case the license is not copied across when the package is installed via PyPI; this way it's explicit on both a per-repository and a per-file basis.

    for i in range(X.shape[0]):
-       for j in numba.prange(i + 1, X.shape[0]):
+       for j in range(i + 1, X.shape[0]):
Contributor commented:

Is numba.prange not an advantage here?

Contributor (author) replied:

numba.prange assumes that each iteration can be computed independently of the previous ones. Since I'm skipping every throwables[j] that is already flagged for removal, parallelizing this loop might trigger computations that are not needed.
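The loop under discussion can be sketched as a greedy redundancy filter. This is a hypothetical reconstruction rather than the code in this PR: `filter_redundant`, the integer-encoded alignment `X` and the identity threshold are assumptions. Row `j` is flagged when it is too similar to an earlier row `i` that is being kept, and rows already flagged are skipped in both loops. Later iterations therefore depend on flags written by earlier ones, so the sketch keeps both loops as plain `range`. Without Numba it falls back to plain Python.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback so the sketch still runs without Numba
    def njit(func):
        return func


@njit
def filter_redundant(X, identity=0.8):
    """Hypothetical sketch: return a boolean mask of rows to remove."""
    n, length = X.shape
    throwables = np.zeros(n, dtype=np.bool_)
    for i in range(n):
        # A row already flagged for removal cannot eliminate others
        if throwables[i]:
            continue
        for j in range(i + 1, n):
            # Skip rows that an earlier comparison already flagged
            if throwables[j]:
                continue
            matches = 0
            for k in range(length):
                if X[i, k] == X[j, k]:
                    matches += 1
            if matches / length >= identity:
                throwables[j] = True
    return throwables


# Toy alignment: rows 1 and 2 are each 75% identical to row 0
X = np.array([[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 2, 5], [5, 6, 7, 8]])
mask = filter_redundant(X, 0.7)  # mask == [False, True, True, False]
kept = X[~mask]                  # rows 0 and 3 survive
```

In this toy example, rows 1 and 2 are both flagged by row 0, so the row 1 versus row 2 comparison is never performed; that skipped work is exactly what depends on iteration order.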

fsimkovic merged commit ca31433 into rigdenlab:master on May 14, 2018