SummerOfCodeIdeas


Ideas for Google Summer of Code 2018

This is a list of ideas compiled by mlpack developers; they range from simpler code maintenance tasks to difficult machine learning algorithm implementations, which means that there are suitable ideas for a wide range of student abilities and interests. The "necessary knowledge" sections can often be replaced with "willing to learn" for the easier projects, and for some of the more difficult problems, a full understanding of the description statement and some coding knowledge is sufficient.

In addition, these are not all of the possible projects which could be undertaken; new, interesting ideas are also suitable.

A list of projects from 2013, 2014, 2015, 2016, and 2017 that aren't really applicable to this year can be found here.

For more information on any of these projects, visit #mlpack on freenode (IRC) or email mlpack@lists.mlpack.org (see also http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack). It is probably worth consulting the mailing list archives first; many of these projects have been discussed extensively in previous years. Also, it is probably not a great idea to contact mentors directly (this is why their emails are not given here); a public discussion on the mailing list is preferred if you need help or more information.

If you are looking for how to get started with mlpack, see the GSoC page, the get involved page, and the tutorials, and try to compile and run simple mlpack programs. Then you could look at the list of issues on GitHub; maybe you can find an easy and interesting bug to solve.

For an application template, see the Application Guide.

Particle swarm optimization

Particle swarm optimization (PSO) is a population-based stochastic optimization technique developed by Eberhart and Kennedy in 1995, inspired by the social behavior of bird flocking or fish schooling.

[Image: PSO illustration. Walter Baxter, "A murmuration of starlings at Gretna".]

In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles to find a solution; a sketch of the canonical update equations is given after the list below. Several variants of PSO have been proposed to date, e.g.:

  • global best (gbest) PSO: uses a star neighborhood topology, where each particle has the entire swarm as its neighborhood (all particles are attracted to one global best position).

  • local best (lbest) PSO: uses a ring topology, where each particle’s neighborhood consists of itself and its immediate two neighbors (each particle is attracted to a different neighborhood position).
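To make this concrete, here is a minimal sketch of the canonical gbest update equations, written with Armadillo; all names (inertia, c1, c2, UpdateSwarm) are illustrative only and not part of any existing mlpack API.

#include <armadillo>

// Minimal sketch of one gbest PSO iteration (illustrative only; not an
// mlpack API).  Each column of 'positions' is one particle.
void UpdateSwarm(arma::mat& positions,
                 arma::mat& velocities,
                 const arma::mat& personalBests, // best position per particle
                 const arma::vec& globalBest,    // best position of the swarm
                 const double inertia,           // w: momentum of the particle
                 const double c1,                // cognitive (personal) weight
                 const double c2)                // social (swarm) weight
{
  for (size_t i = 0; i < positions.n_cols; ++i)
  {
    // Random coefficients in [0, 1] make the attraction terms stochastic.
    const arma::vec r1(positions.n_rows, arma::fill::randu);
    const arma::vec r2(positions.n_rows, arma::fill::randu);

    // v <- w * v + c1 * r1 o (pbest - x) + c2 * r2 o (gbest - x).
    velocities.col(i) = inertia * velocities.col(i)
        + c1 * (r1 % (personalBests.col(i) - positions.col(i)))
        + c2 * (r2 % (globalBest - positions.col(i)));

    // x <- x + v.
    positions.col(i) += velocities.col(i);
  }
}

The lbest variant differs only in that globalBest is replaced by the best position found in each particle's ring neighborhood.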

In recent years, Particle Swarm Optimization (PSO) methods have gained popularity for solving single-objective problems, but solving constrained optimization problems using PSO, though attempted in the past, arguably remains a challenging issue. For unconstrained problems the interface looks something like this:

class ObjectiveFunction
{
  // Evaluates the actual optimization problem.
  double Evaluate(...);
};

ObjectiveFunction f;
Optimizer optimizer(...);
optimizer.Optimize(f, f.GetInitialPoint());

This idea takes a closer look at using PSO for constrained optimization problems, which are encountered in numerous applications.

So this project is divided into two parts: first, implement one or two unconstrained methods; afterwards, take a look at one or two constrained methods. Note that mlpack already has an optimization infrastructure for handling unconstrained problems; however, there is no infrastructure to deal with constrained problems. That is a detail that will need to be worked out in the proposal. For constrained PSO, we wish to pass a number of constraints; one could imagine an API something like this:

class ConstrainedFunction
{
  // Evaluates the actual optimization problem.
  double Evaluate(...) {}

  // Returns all equality constraints of the given problem.
  // Maybe it makes sense to express the constraints in matrix form, or
  // perhaps we can use C++11 lambda functions; maybe a combination of both.
  void EqualityConstraint(...) {}
};

Note that this is just an idea---there are many details left to be figured out.
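For instance, one common way an optimizer could consume such a function is a penalty approach, which folds constraint violations back into the objective so that the existing unconstrained machinery can be reused. A rough sketch, in which every name and signature is hypothetical:

#include <armadillo>

// Hypothetical sketch: reduce a constrained problem to an unconstrained one
// with a quadratic penalty on the equality constraints.  None of these names
// exist in mlpack; this only illustrates one possible design direction.
template<typename ConstrainedFunctionType>
double PenalizedEvaluate(ConstrainedFunctionType& function,
                         const arma::mat& coordinates,
                         const double penalty)
{
  // Assume EqualityConstraint() fills a vector with h_i(coordinates), where
  // each constraint is satisfied when h_i(x) = 0.
  arma::vec violations;
  function.EqualityConstraint(coordinates, violations);

  // Each violation contributes penalty * h_i(x)^2 to the objective.
  return function.Evaluate(coordinates) +
      penalty * arma::dot(violations, violations);
}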

difficulty: 7/10

deliverable: working PSO optimizer for unconstrained and constrained problems.

necessary knowledge: basic data science concepts, good familiarity with C++ and template metaprogramming (since that will probably be necessary), familiarity with the mlpack optimizer API (see src/mlpack/core/optimizers/*/)

relevant tickets: none are open at this time

potential mentor(s): Marcus Edel

recommendations for preparing an application: This project can't really be designed on-the-fly, so a good proposal will have already gone through the existing codebase and identified what parts of the API will need to change (if any).

Robotic Arm

You are likely pretty good at picking things up. That’s really great. Part of the reason that you’re pretty good at picking things up is that when you were young, you spent a lot of time trying to pick things up and learning from your experiences. For an industrial task, say, sorting objects, we don't want to wait through the equivalent of an entire robotic childhood; we want robots to operate in unstructured environments while learning many tasks on the fly, either by themselves or with the assistance of a human teacher.

[Image: robotic arm]

This project takes up this idea and explores different approaches to teaching manipulation tasks to robots by applying (already implemented) learning methods. To show the learning process, we can provide access to a standard robot arm that can be used to demonstrate your work.

The idea behind this project would be to use mlpack for various robotic manipulation tasks. The completed project would include an implementation of a simulator based on the existing robot arm, plus (and this is the real focus of the project) a pipeline (with tests and documentation) that uses different mlpack methods to solve the chosen task over the course of the summer. For this project, it would be up to you to observe the current mlpack API and methods, devise a design for the pipeline, and then implement everything. One idea would be to implement the simulator using Processing, use OpenCV for the image processing, and at the end use mlpack's reinforcement learning framework for the model optimization. Note that this is just an idea---there are many details left to be figured out.

difficulty: 8/10

deliverable: working simulator and pipeline

necessary knowledge: basic data science concepts, good familiarity with C++, Python and template metaprogramming (since that will probably be necessary), familiarity with mlpack methods API (see src/mlpack/methods/*/)

relevant tickets: none are open at this time

potential mentor(s): Marcus Edel

recommendations for preparing an application: To be able to work on this you should be familiar with the source code of mlpack. We suggest that everyone who would like to apply for this idea try to compile and explore the source code, especially the neural network and reinforcement learning code.

Reinforcement learning

Back in 2015, DeepMind published their landmark Nature article on training deep neural networks to learn to play Atari games using raw pixels and a score function as inputs. Since then there has been a surge of interest in the capabilities of reinforcement learning using deep neural network policies; and honestly, who hasn't always wanted to play around with RL on their own domain-specific automation tasks? This project revitalizes that desire and combines it with recent developments in reinforcement learning to train neural networks to play some of those unforgettable games. In more detail, this project involves implementing different reinforcement learning methods over the course of the summer. A good project will select one or two algorithms and implement them (with tests and documentation). Note that while, in principle, the implementation of an agent that can play one of the Atari games is quite simple, this project concentrates more on recent ideas, e.g.:

  • Double DQN: The Double DQN algorithm was released by Google’s DeepMind group back in 2015; it adds stability to the learning by addressing the fact that standard Q evaluation is known to overestimate action values under certain conditions, decoupling action selection from action evaluation in the TD-error formula so that the target is less prone to such overestimation (see the sketch after this list).

  • Proximal Policy Optimization Algorithms: The PPO method was released by OpenAI in 2017, and it made a splash with its state-of-the-art performance while being much simpler to implement than comparable methods.

  • Persistent Advantage Learning DQN: Google’s DeepMind group presented a novel RL exploration bonus based on an adaptation of count-based exploration for high-dimensional spaces. One of the main benefits is that the agent is able to recognize and adjust its behavior efficiently in response to salient events.
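To illustrate the Double DQN point above: the core change is only in how the TD target is computed; the online network selects the greedy action while the target network evaluates it. A minimal Armadillo sketch (illustrative names only, no mlpack API implied):

#include <armadillo>

// Sketch of the Double DQN target for a single transition (illustrative
// only).  qOnlineNext and qTargetNext hold Q(s', a) for every action a, as
// produced by the online and target networks respectively.
double DoubleDQNTarget(const double reward,
                       const bool terminal,
                       const double discount,
                       const arma::vec& qOnlineNext,
                       const arma::vec& qTargetNext)
{
  if (terminal)
    return reward;

  // The online network picks the action; the target network scores it.  This
  // decoupling is what reduces the overestimation of action values.
  const arma::uword bestAction = qOnlineNext.index_max();
  return reward + discount * qTargetNext(bestAction);
}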

The algorithms must be implemented according to mlpack's neural network interface and the existing reinforcement learning structure, so that they work interchangeably. In addition, this project could possibly contain a research component: benchmarking runtimes of the different algorithms against other existing implementations.

[Images: Breakout sample image sequences]

Note: We already have written code to communicate with the OpenAI Gym toolkit. You can find the code here: https://github.com/zoq/gym_tcp_api

difficulty: 7/10

deliverable: Implemented algorithms and proof (via tests) that the algorithms work with mlpack's code base.

necessary knowledge: a working knowledge of neural networks and reinforcement learning, willingness to dive into some literature on the topic, basic C++

recommendations for preparing an application: To be able to work on this you should be familiar with the source code of mlpack, especially: src/mlpack/methods/reinforcement_learning/. We suggest that everyone who would like to apply for this idea compile and explore the source code, including the tests: src/mlpack/tests/. If you have more time, try to review the documents linked below, and in your application provide comments/questions/ideas/tradeoffs/considerations based on your brainstorming.

relevant tickets: none are open at this time

references: Deep learning reading list, Deep learning bibliography, Playing Atari with deep reinforcement learning

potential mentor(s): Marcus Edel

Variational Autoencoders

Variational Autoencoders (VAEs) are widely used for unsupervised learning of complicated distributions. More classical generative models depend upon sampling techniques such as MCMC; these sampling techniques are unable to scale to high-dimensional spaces, for example a distribution over a set of images. For this reason, VAEs avoid such sampling by introducing gradient-based optimization.

This project involves implementing the basic framework of VAEs as described in this paper. For validity, the implementation should reproduce the results shown in the same paper. The paper shows experiments on the MNIST dataset and uses fully connected networks for modelling. The completed project should also test the de-noising ability of VAEs shown in this paper. This task involves adding noise to the data and using the same VAE network to reconstruct the original image.
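The central quantity to implement is the (negative) ELBO: a reconstruction term plus the KL divergence between the approximate posterior and the prior. For a diagonal Gaussian encoder and a standard normal prior, the KL term has a closed form. Below is a minimal sketch with illustrative names only; the paper uses a Bernoulli reconstruction likelihood for binarized MNIST, while a squared error is used here for brevity.

#include <armadillo>

// Sketch of the negative ELBO for q(z|x) = N(mean, diag(exp(logVar))) and a
// standard normal prior p(z) = N(0, I), following Kingma & Welling.
// Illustrative only; not part of mlpack's ANN module.
double NegativeELBO(const arma::vec& reconstruction, // decoder output
                    const arma::vec& target,         // original input
                    const arma::vec& mean,           // encoder mean
                    const arma::vec& logVar)         // encoder log-variance
{
  // Reconstruction term (squared error for simplicity).
  const double reconstructionLoss =
      arma::accu(arma::square(reconstruction - target));

  // Closed-form KL(q(z|x) || N(0, I)) for diagonal Gaussians:
  //   -0.5 * sum(1 + logVar - mean^2 - exp(logVar)).
  const double klLoss = -0.5 * arma::accu(1.0 + logVar
      - arma::square(mean) - arma::exp(logVar));

  return reconstructionLoss + klLoss;
}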

[Image: samples generated by a VAE trained on MNIST]

The above image shows samples generated by VAEs after training on the MNIST dataset. The images correspond to the different latent spaces used while training the VAE: 2D 5, 2D 10, and 2D 15, respectively.

You are also encouraged to add more functionality to the VAE implementation. Some suggestions:

  • Conditional modelling: Use the same VAE network to reconstruct a whole MNIST digit given only half of the digit. Refer to this paper.

  • Regularization: Add functionality for regularizing the network. Refer to this paper.

  • Beta VAE: Beta VAEs differ from pure VAEs only in their loss function, where the KL term is weighted by a factor beta, and have been shown to learn better representations. Refer to this paper.

You can also suggest your own ideas.

difficulty: 8/10

deliverable: Basic VAE framework with rigorous tests, able to reproduce results shown in the paper

necessary knowledge: a good knowledge of C++ and Armadillo, fair understanding of generative models

recommendations for preparing an application: You should get familiar with the ANN module of mlpack. Spend some time reading this paper. This paper is a tutorial on VAE and explains the idea in depth. By understanding the framework beforehand we can concentrate more on implementation issues and testing.

potential mentor(s): Sumedh Ghaisas

Algorithm Optimization

mlpack is renowned for having high-quality, fast implementations of machine learning algorithms. To this end, since 2013 we have maintained a benchmarking system that helps us demonstrate the efficiency of mlpack's implementations. However, it is always possible to make these implementations a bit faster. Below is some example output from a recent run of the benchmarking system on some logistic regression problems.

[Image: logistic regression benchmark example]

Your goal in this project would be to choose a machine learning algorithm that you are interested in and knowledgeable about, then set up a series of datasets and configurations that allows you to provide a nice benchmarking view like the one above, and iteratively improve the mlpack implementation, hopefully until it is the fastest. This could involve high-level optimizations, like simply avoiding unnecessary calculations or changing default parameters when needed; lower-level optimizations, like checking that SIMD autovectorization is happening and avoiding small memory allocations and copies; and even algorithm-level optimizations, like clever approximations. Almost certainly any successful project here will involve a lot of time with a profiler.

The finished product should include mergeable code and, more importantly, benchmarking output that allows users to quickly see and understand that mlpack's implementation is good.

difficulty: 3/10-10/10, depending on the algorithm chosen and the level of interest

deliverable: improved implementation and benchmarking configuration providing a relevant and clear comparison of mlpack to other implementations

necessary knowledge: a good knowledge of C++ and Armadillo, knowledge of how to use a profiler, in-depth knowledge of the algorithm to be optimized, strong familiarity with the existing implementation of the algorithm to be optimized

recommendations for preparing an application: You should definitely be familiar with the source code of the algorithm you are interested in working with, and it would not be a bad idea to do some preliminary benchmarking and possibly submit a simple pull request with an improvement (if there is low-hanging fruit that could be easily optimized). When we evaluate these applications it will be important to see a clear plan for progress, so that we can avoid the project "getting stuck".

potential mentor(s): Ryan Curtin

String Processing Utilities

description: mlpack and many other machine learning libraries are built upon a core of LAPACK- and BLAS-like algorithms that expect numeric data (specifically, single- or double-precision numeric data). But much of today's interesting data is in other formats, such as dates or strings. Some machine learning libraries handle this by writing generic algorithms that can also support types like strings, but this can cause huge runtime penalties. Instead, it is better to provide some utilities that can convert string datatypes (or other datatypes) to the numeric datatypes required by the algorithms implemented in mlpack.

For the sake of simplicity, let's assume that we have some string "hello world hello everyone". Typically a user may want to encode this in many different ways. Below are two simple possibilities.

  • dictionary encoding: here we simply assign a word (or a character) to a numeric index and treat the dataset as categorical. So our example string "hello world hello everyone" would map to [0 1 0 2] where 0 = "hello", 1 = "world", and 2 = "everyone" if we were encoding words.

  • one-hot dictionary encoding: here, we encode each word as a k-dimensional vector where only one dimension has a nonzero value, and k is the number of words in the dictionary. So our example string "hello world hello everyone" would map to

    [[1 0 1 0]
     [0 1 0 0]
     [0 0 0 1]]

where the first row is 1 if the word is "hello" and 0 otherwise, the second row is 1 if the word is "world" and 0 otherwise, and the third row is 1 if the word is "everyone" and 0 otherwise.

Other useful encodings might be TF-IDF or related techniques.

We want users to be able to convert seamlessly between these representations, in different languages. Therefore, we might imagine some functionality like the following:

class DictionaryEncoding
{
 public:
  // ...

  /**
   * Use dictionary encoding to fill the 'output' matrix with numeric data.
   *
   * ...
   */
  template<typename eT>
  void Encode(const std::vector<std::string>& strings,
              std::map<size_t, std::string>& mappings,
              arma::Mat<eT>& output);
};

This has some similarity to the NormalizeLabels() function found in src/mlpack/core/data/normalize_labels.hpp, but takes the idea somewhat further in that it makes the encoding and decoding process classes of their own.
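A hypothetical usage of such a class, continuing the example string from above (all of these names are assumptions from the sketch; nothing here exists in mlpack yet):

#include <map>
#include <string>
#include <vector>

#include <armadillo>

int main()
{
  // "hello world hello everyone", already split into words.
  std::vector<std::string> strings = { "hello", "world", "hello", "everyone" };

  std::map<size_t, std::string> mappings;
  arma::Mat<size_t> output;

  DictionaryEncoding encoding;
  encoding.Encode(strings, mappings, output);

  // Now output holds [0 1 0 2], and mappings maps 0 -> "hello",
  // 1 -> "world", and 2 -> "everyone", so the process can be reversed.
}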

On top of this, this could be integrated with the preprocessing utilities to provide a command-line program and Python binding that can convert a dataset of labels to a numeric dataset that can then be used with mlpack algorithms (or other algorithms). Below is a demo of what we might expect a user to do when they interact with mlpack's tools.

$ echo "Let's do nearest neighbor search on a dataset of sentences."
$ head -4 sentences.csv
A tetrad is a set of four notes in music theory.
Notable people with the surname include:
Gunspinning is a western art such as trick roping.
The play is a tragicomedy about a small rural town in Ireland.
$ mlpack_preprocess_encode -i sentences.csv -o sentences-numeric.csv -a "tf-idf" -e encoding.bin --strip-nonalnum --lowercase -v
[INFO ] Loading 'sentences.csv' as string data... X columns.
[INFO ] 1533 different words in the dataset.
[INFO ] Output one-hot encoded matrix is of size 1533 x X.
[INFO ] Saving output to 'sentences-numeric.csv'...
[INFO ] Saving output model to 'encoding.bin'...
$ mlpack_knn -r sentences-numeric.csv -k 1 -n neighbors.csv

Or they might want to use Python:

>>> print("Let's do nearest neighbor search on a dataset of sentences.")
Let's do nearest neighbor search on a dataset of sentences.
>>> print(sentences[0:4])
['A tetrad is a set of four notes in music theory.', 'Notable people with the surname include:', 'Gunspinning is a western art such as trick roping.', 'The play is a tragicomedy about a small rural town in Ireland']
>>> from mlpack import preprocess_encode, knn
>>> encoding = preprocess_encode(input=sentences, algorithm="tf-idf", strip_nonalnum=True, lowercase=True, verbose=True)
[INFO ] Input string data 'sentences' as string data contains 1000 columns.
[INFO ] 1533 different words in the dataset.
[INFO ] Output one-hot encoded matrix is of size 1533 x X.
[INFO ] Saving output to 'sentences-numeric.csv'...
[INFO ] Saving output model to 'encoding.bin'...
>>> knn_results = knn(reference=encoding['output'], k=1)

recommendations for preparing an application: The key to a successful project here is first defining the API that will be provided to the user. So in your application, it should be completely clear how the user will input their strings, how they will be converted to numeric values, and then how the resulting numeric values can be converted back to strings. It would be a good idea to make sure that your design is flexible enough to allow other conversion strategies, as well as handling data with other formats (graph data, for instance).

difficulty: 5/10

deliverable: Implemented design and algorithms, fully tested and ready for end users.

necessary knowledge: a working knowledge of data science and typical data processing steps, some knowledge of the mlpack automatic bindings system, C++ knowledge, and familiarity with Armadillo and string processing in C++.

potential mentor(s): Ryan Curtin

Automatic bindings to new languages

As of last year, mlpack now has an automatic bindings system documented here. So far, bindings exist for Python and command-line programs, but it would be really useful to add bindings for other languages.

A couple of ideas for languages are given here---MATLAB, Octave, R, Java, Scala, and Julia---but these are certainly not the only languages for which bindings could (and should) be provided.

For this project, once you select your target language, you will have to design a handful of components:

  • The build system workflow and CMake infrastructure so that the bindings are properly built. E.g., for the command-line bindings, this is a single-step compilation, but for Python, it involves generating a Cython .pyx file, generating a setuptools setup.py file, and then compiling the Cython bindings.

  • Handling and conversion of the different types from the input language. It's extremely important that matrices aren't copied during this step, so this should be thought through beforehand, as it may guide the design of the bindings in other ways.

  • Generation of the code for the target language from the PARAM_*() macros and PROGRAM_INFO() macros.

Any design process should certainly start by looking at the existing bindings and thinking about how to make a proof-of-concept for the new language. It's important to use the existing automatic bindings system, so that the bindings we do provide in files like logistic_regression_main.cpp and others can work for all languages. So, unfortunately, hand-written bindings for each language will not suffice here, since they are difficult to maintain and keep in sync.
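As a rough reminder of what the generator consumes, a binding's _main.cpp is built from these macros. The sketch below is paraphrased from memory; the exact macro names and signatures should be checked against the real sources (e.g., pca_main.cpp).

// Paraphrased sketch of the shape of a binding definition; check the actual
// mlpack sources for exact macro names and signatures.
PROGRAM_INFO("PCA", "This program performs principal components analysis...");

PARAM_MATRIX_IN("input", "Input dataset to perform PCA on.", "i");
PARAM_MATRIX_OUT("output", "Matrix to store the transformed data in.", "o");
PARAM_INT_IN("new_dimensionality", "Desired output dimensionality.", "d", 0);

// The binding generator for each target language turns these declarations
// into type conversions plus a call into the actual mlpack implementation.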

difficulty: 6/10

deliverable: working and tested binding generator to the target language, plus some simple examples of how they can be used for user documentation

necessary knowledge: a strong knowledge of the details of the target language is necessary, as well as familiarity with the existing automatic bindings system

recommendations for preparing an application: Start with the basics; think about how you would hand-write an efficient binding to the target language, and perhaps prepare a simple proof-of-concept for a binding with the same functionality as one of the simple programs, like src/mlpack/methods/pca_main.cpp. Then, think about how you would automatically generate that hand-written binding from the sources in pca_main.cpp; this will guide the design that you will use for the automatic binding generator for your target language. In your application, be sure to include a reasonable level of detail about the implementation; specifically, how the overall generator will work, how unnecessary copies will be avoided, how users will interact with the bindings, and so forth.

references: automatic bindings documentation

potential mentor(s): Ryan Curtin

Profiling for further optimization

description: mlpack could run even faster if it used profiling information during compilation. This entails adding extra steps to the build process: first, run a subset of mlpack programs to collect profiling information, and second, rebuild mlpack using that profiling information. The difficulty here is choosing datasets which reflect the general characteristics of the datasets that users will run mlpack methods on. If an unusual dataset is chosen, then mlpack will be compiled to run very quickly on that type of dataset -- but it will not run as well on other datasets. Another issue is that the profiling process should not take extremely long (that is, longer than an hour or so), so we cannot choose a huge variety of large datasets.
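With GCC, the two-phase workflow might look roughly like the following; -fprofile-generate and -fprofile-use are standard GCC profile-guided optimization flags, while the choice of datasets and the exact integration into CMake are the open design questions of this project.

$ # Phase 1: build with instrumentation and run a representative workload.
$ g++ -O3 -fprofile-generate ...
$ mlpack_knn -r dataset.csv -k 5 -n neighbors.csv
$ # Phase 2: rebuild using the collected profile data.
$ g++ -O3 -fprofile-use ...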

deliverable: a 'make profile' build option which performs the profiling as described above

difficulty: 6/10

necessary knowledge: some machine learning familiarity, some CMake scripting

relevant tickets: #48

potential mentor(s): Ryan Curtin

Improvement of tree traversers

description: Many of mlpack's machine learning methods are dual-tree algorithms. These dual-tree algorithms are abstracted in such a way that they all use the same tree traversal methods. If these traversers could be improved, this would result in runtime gains for all tree-based mlpack methods. This requires a lot of abstract thought in very weird ways and often debugging and profiling tree traversers is fraught with sadness.

deliverable: demonstrably improved runtime for tree-based methods

difficulty: 9/10

necessary knowledge: very in-depth C++ knowledge and understanding of tree structures; familiarity with machine learning methods is helpful

relevant tickets: #235

potential mentor(s): Ryan Curtin

LMNN with LRSDP implementation

description: mlpack has a working LRSDP (low-rank semidefinite program) implementation, which gives faster solutions to SDPs. This could give good speedups for existing SDP-based machine learning methods, such as LMNN (large margin nearest neighbor). This project should only entail the expression of LMNN as a low-rank SDP; then the implementation should be straightforward, because the mlpack LRSDP API is straightforward. The difficulty is still quite high, though, because debugging LRSDPs is complex.
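For reference, the standard SDP formulation of LMNN (Weinberger and Saul) that would need to be expressed in low-rank form looks roughly like this, where j ~> i denotes that x_j is a target neighbor of x_i:

\begin{aligned}
\min_{M \succeq 0} \quad & \sum_{j \rightsquigarrow i} d_M(x_i, x_j)
    + c \sum_{j \rightsquigarrow i,\, l} (1 - y_{il})\, \xi_{ijl} \\
\text{s.t.} \quad & d_M(x_i, x_l) - d_M(x_i, x_j) \ge 1 - \xi_{ijl}, \qquad
    \xi_{ijl} \ge 0,
\end{aligned}

with d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j). The low-rank trick is to replace the full positive semidefinite matrix M with a factorization M = L^T L; see the LMNN paper for the exact form.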

deliverable: working, tested LMNN implementation with extensive documentation

difficulty: 9/10

necessary knowledge: in-depth C++ knowledge, understanding of semidefinite programs; familiarity with LMNN is helpful

relevant tickets: none open at this time

potential mentor(s): Ryan Curtin

Fixes to MVU and low-rank semidefinite programs

description: This project is not for the faint of heart. For some time now, MVU (maximum variance unfolding), a dimensionality reduction technique, has not been converging even on simple datasets. mlpack's implementation of MVU uses LRSDP (low-rank semidefinite programs); there is no other existing implementation of MVU+LRSDP. Many tears will be shed trying to debug this. A good approach will probably be to compare mlpack's MVU results on exceedingly simple problems with the results of other MVU implementations. The final outcome of this project may not even be a successfully converging algorithm, but instead more information on what is going wrong.

Here is a good list of papers to read or at least be familiar with:

deliverable: working MVU implementation, or, further details and information on the problem

difficulty: 10/10

necessary knowledge: understanding of convexity and duality, knowledge of semidefinite programs, incredible determination and perseverance

potential mentor(s): Ryan Curtin

Essential Deep Learning Modules

description: In the past years, deep learning has attracted a lot of attention in the machine learning community for its ability to learn features that allow high performance in a variety of tasks. For example, DeepMind has shown that neural networks can learn to play Atari games just by observing large amounts of images, without being trained explicitly on how to play games. This project involves implementing essential building blocks of deep learning algorithms based on the existing neural network codebase. A good project will select some of the architectures and implement them (with tests and documentation) over the course of the summer. This could include Restricted Boltzmann Machines (RBMs) and training algorithms, Deep Belief Networks (DBNs), Radial Basis Function Networks (RBFNs), Bidirectional Recurrent Networks (BRNs), and Generative Adversarial Networks (GANs). The architecture should be designed to build a foundation for integrating many more models, including support for other state-of-the-art deep learning techniques. Note that this project aims to revisit some of the traditional models from a more modern perspective.

deliverable: Implemented deep learning modules and proof (via tests) that the code works.

difficulty: 5/10

necessary knowledge: a working knowledge of what neural networks are, willingness to dive into some literature on the topic, basic C++

recommendations for preparing an application: Being familiar with the mlpack codebase, especially with existing neural network code is the first step if you wish to take up this task. We suggest that you build mlpack on your system and explore the functionalities. Take a look at the different layers and basic network structures. When you prepare your application, provide some comments/ideas/tradeoffs/considerations about your decision process, when choosing the models you want to implement over the summer.

relevant tickets: #1206, #1207

references: Deep learning reading list, Deep learning bibliography

potential mentor(s): Marcus Edel, Mikhail Lozhnikov

Alternatives to neighborhood-based collaborative filtering

description: The past two years of Summer of Code have seen the addition of a highly flexible collaborative filtering framework to mlpack. This framework offers numerous different types of matrix factorizations, and is likely the most flexible package for this (with respect to matrix factorizations). However, the implementation uses k-nearest-neighbors to select its recommendations, whereas there are other options. This project entails the investigation of alternatives (for example, weighted nearest neighbors and regression techniques) and the implementation of these alternatives in a flexible manner.

deliverable: Implemented alternatives to k-NN for recommendation selection

difficulty: 7/10

necessary knowledge: experience with C++ and templates, knowledge of recommendation systems, willingness to read the literature

relevant tickets: #406

references: this paper describes an alternative to k-NN recommendation selection: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.129.4662&rep=rep1&type=pdf

potential mentor(s): Ryan Curtin, Sumedh Ghaisas

Profiling for Parallelization

Efficiency and scalability are very important to the users of mlpack.

Some of our algorithms, such as the DTree (methods/det) and LSH (methods/lsh), already employ parallelism through OpenMP to speed things up without complicating the codebase too much. Unfortunately, there are still a lot of algorithms implemented in mlpack that currently only utilize a single core.

For this project, you will explore the mlpack codebase, and identify algorithms that you can parallelize with OpenMP so we can take advantage of the multiple cores most machines have.

There are multiple ways mlpack could employ parallelism. Some of these are:

  • Replacing the embarrassingly parallel sections, for example code that performs the same computation for multiple data points, with parallel code (see the sketch after this list).
  • Substituting the algorithm itself with a parallel version that works for one or more threads (since some users might still want to use the single-threaded code).
  • Identifying and parallelizing bottlenecks in the core libraries of mlpack. This might offer a significant performance boost for many algorithms; however, it may be difficult to test, and might end up being underwhelming in most situations, since mlpack relies on OpenBLAS (under Armadillo) for all of our matrix computations, and OpenBLAS is already multi-threaded.
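To illustrate the first approach: an embarrassingly parallel loop over data points can often be parallelized with a single OpenMP pragma, as in this illustrative sketch (not taken from the mlpack codebase):

#include <armadillo>

// Sketch: each column (data point) is processed independently, so the
// iterations can be distributed across threads with one pragma.
void ComputeDistances(const arma::mat& data,
                      const arma::vec& query,
                      arma::vec& distances)
{
  distances.set_size(data.n_cols);

  #pragma omp parallel for
  for (arma::uword i = 0; i < data.n_cols; ++i)
    distances(i) = arma::norm(data.col(i) - query, 2);
}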

Furthermore, another contribution could be identifying thread-unsafe sections of code in the core libraries and modifying them to make them thread-safe. This will have a significant impact on future developers, since it will be easier for them to write parallel code (once the sequential version has been thoroughly debugged).

difficulty: 5/10 - 9/10, depending on the chosen approach(es).

deliverable: Implemented parallel versions of the selected algorithms in OpenMP. Correctness tests. Benchmarking of the parallel algorithm versions for comparison with other libraries and past performance. You can take a look at the Better benchmarking project description and https://github.com/zoq/benchmarks for more information on the benchmarking tool mlpack uses.

necessary knowledge: Methods for profiling source code. C++ working knowledge. Parallelization techniques (preferably OpenMP).

relevant tickets: #173

potential mentor(s): Yannis Mentekidis

recommendations for preparing an application: The project description is vague. A good proposal will solidify some of the ideas, and propose both what parts of the codebase need changing and what needs to be added to the API; as mentioned, users might not want to employ this feature.
