One of a few experiments I did with moving libnabo to CUDA #33

Open
wants to merge 4 commits into
base: master

Commits on Apr 12, 2015

  1. Started work on the CUDA implementation.

    As of right now it works, and is about 20% faster than the CPU implementation.
    My code is very poorly optimized and should be taken with a grain of salt.
    
    I'll be working on it some more later next week.
    
    In the best case, this function will run in O(n log n) time. In the worst case
    it will be O(n^2); the worst case occurs when two points within the
    same cluster are on exactly opposite sides of the tree.
    
    Points should be organized as follows:
    
    [Cluster|Cluster|Cluster|Cluster]
    Each cluster holds 32 points, ordered from least to greatest
    by their distance from the center of the cluster.
    The farthest point should be no more than max_rad from the center.
    
    Eventually this code, if my predictions hold up, should perform 7-9x faster than the CPU implementation.
    Obviously that's a long way off, will take countless days of work, and will never be optimized perfectly,
    but you get the idea.
    
    This code is highly unstable and has a very large number of bugs (I counted 20+). DO NOT use this
    code in any application yet. It will almost certainly crash either your GPU driver or the application.
    
    This code was initially written to look as close to the OpenCL code as possible. That said, the amount
    of branching that currently occurs is huge, since I directly copied the traversal patterns
    of the OpenCL version. I will be reducing the amount of branching soon.
    
    -Louis
    LouisCastricato committed Apr 12, 2015
    6b0168c
  2. Improved readability.

    Optimized branching.
    
    Reduced memory overhead.
    
    First tests with dynamic parallelism failed. I'll try again tomorrow.
    
    Began to diverge from libnabo's default traversal patterns. I'm using my own, as they
    seem better for CUDA. This may have a negative impact later on. I don't know yet.
    
    Improved comments.
    LouisCastricato committed Apr 12, 2015
    7f27d72
  3. Syntax fix

    Fixed some syntax errors that were preventing it from compiling.
    LouisCastricato committed Apr 12, 2015
    ad270ac
  4. Update README.md

    Added a best-practices guide and an installation guide for CUDA.
    LouisCastricato committed Apr 12, 2015
    448b008
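
The cluster layout described in the first commit message (consecutive blocks of 32 points, each block sorted by distance from its center, with the farthest point within max_rad) can be sketched on the host side in C++. This is only an illustration under stated assumptions, not the code from this pull request: `Point`, `dist2`, `centroid`, and `layoutClusters` are hypothetical names, the "center of the cluster" is assumed to be the centroid of the block, and the points are assumed to be already grouped into consecutive blocks by some earlier clustering step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D point type; the real code may use a different layout.
struct Point { float x, y, z; };

// Cluster size stated in the commit message (matches a CUDA warp of 32 threads).
constexpr std::size_t kClusterSize = 32;

// Squared Euclidean distance between two points.
inline float dist2(const Point& a, const Point& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Centroid of pts[begin, end). Assumption: "center of the cluster" means centroid.
inline Point centroid(const std::vector<Point>& pts, std::size_t begin, std::size_t end) {
    assert(end > begin && end <= pts.size());
    Point c{0.f, 0.f, 0.f};
    for (std::size_t i = begin; i < end; ++i) {
        c.x += pts[i].x;
        c.y += pts[i].y;
        c.z += pts[i].z;
    }
    const float n = static_cast<float>(end - begin);
    c.x /= n;
    c.y /= n;
    c.z /= n;
    return c;
}

// Sorts every consecutive block of kClusterSize points by ascending distance
// from the block's centroid. Returns false if the input size is not a multiple
// of kClusterSize or if any block has a point farther than max_rad from its center.
bool layoutClusters(std::vector<Point>& pts, float max_rad) {
    if (pts.size() % kClusterSize != 0) return false;
    bool ok = true;
    for (std::size_t begin = 0; begin < pts.size(); begin += kClusterSize) {
        const std::size_t end = begin + kClusterSize;
        const Point c = centroid(pts, begin, end);
        std::sort(pts.begin() + static_cast<std::ptrdiff_t>(begin),
                  pts.begin() + static_cast<std::ptrdiff_t>(end),
                  [&c](const Point& a, const Point& b) {
                      return dist2(a, c) < dist2(b, c);
                  });
        // After sorting, the last point of the block is the farthest from the center.
        if (dist2(pts[end - 1], c) > max_rad * max_rad) ok = false;
    }
    return ok;
}
```

A caller would fill a `std::vector<Point>` whose size is a multiple of 32, call `layoutClusters(points, max_rad)`, and upload the buffer to the GPU only if it returns `true`. Comparing squared distances avoids a square root per comparison without changing the ordering.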