Automatic python bindings #880

Merged
merged 136 commits into from Aug 11, 2017

rcurtin
Member

@rcurtin rcurtin commented Feb 21, 2017

I've spent a long time working on this and there is still a lot to do before merge, but the bindings should at least be in a usable state and can be played with now.

There are some major changes in this PR:

  • Instead of mlpack::util::Option being used with CLI::Add(), now Option is a template typedef specified in <mlpack/core/mlpack_main.hpp> and controlled by the MLPACK_BINDING_TYPE macro.

  • CLI has been revamped (and should probably be renamed) and no longer does the actual parsing of the command line. It is simply a singleton that holds options and their current values, and it also handles the timers.

  • Each option that CLI holds has a type, but since we can't deduce the type from a boost::any, we now use the CLI::functionMap structure along with the C++ typeid to call the appropriate functions for a given option. Before calling CLI::Add(), an option is responsible for adding all of the function mappings that will be used by that program/binding type.

  • src/mlpack/bindings/cli/ is a reimplementation of the command-line program, but with the refactored CLI code. Here, CLIOption defines the functions that we need for command-line programs.

  • src/mlpack/bindings/python/ contains automatically-generated Python bindings controlled by the add_python_binding() CMake macro.
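
To make the functionMap idea above a bit more concrete, here is a rough Python analogue of the dispatch pattern. It is only a sketch and not mlpack code: the real mechanism is C++ built on boost::any and typeid, and the names `function_map`, `register`, `Param`, and `call` are made up for illustration. Only GetPrintableParam is a name that actually appears in this PR. Python objects already know their own types, so the explicit `type_key` here just stands in for the typeid that the C++ code has to carry around.

```python
from collections import defaultdict

# Hypothetical stand-in for CLI::functionMap:
#   type key -> {function name -> callable}
function_map = defaultdict(dict)


def register(type_key, func_name, func):
    """Add one function mapping for a parameter type (done before adding the option)."""
    function_map[type_key][func_name] = func


class Param(object):
    """An option whose value is stored untyped, with its type recorded separately."""

    def __init__(self, name, value, type_key):
        self.name = name
        self.value = value        # plays the role of the boost::any
        self.type_key = type_key  # plays the role of the typeid


def call(param, func_name):
    """Look up and run the function registered for this parameter's type."""
    funcs = function_map.get(param.type_key, {})
    if func_name not in funcs:
        raise KeyError("no %r registered for type %r" % (func_name, param.type_key))
    return funcs[func_name](param)


# Each binding type would register its own implementations.
register(int, "GetPrintableParam", lambda p: str(p.value))
register(list, "GetPrintableParam", lambda p: "%d-element list" % len(p.value))

print(call(Param("new_dimensionality", 2, int), "GetPrintableParam"))
print(call(Param("input", [1.0, 2.0, 3.0], list), "GetPrintableParam"))
```

The point of the indirection is that the holder of the options never needs to know the concrete types; a different binding type can register different functions under the same names.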

If you have cython installed, you can check this branch out and build it, and then you can actually use the python bindings:

$ pwd
/home/ryan/src/mlpack-rc/build/
$ cd src/mlpack/python/bindings/build/lib.linux-x86_64-2.7/
$ export LD_LIBRARY_PATH=/home/ryan/src/mlpack-rc/build/
$ export PYTHONPATH=.
$ python
Python 2.7.13 (default, Dec 18 2016, 20:19:42) 
[GCC 6.2.1 20161215] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from mlpack import pca
>>> import numpy as np
>>> x = np.random.rand(5, 5)
>>> x
array([[ 0.1659253 ,  0.44932094,  0.89157297,  0.94657701,  0.84910244],
       [ 0.99556297,  0.15514302,  0.52927208,  0.52319716,  0.73886948],
       [ 0.80643328,  0.04821054,  0.97173154,  0.65464406,  0.48690407],
       [ 0.29296537,  0.7519181 ,  0.33787851,  0.72429684,  0.13147376],
       [ 0.01446388,  0.04444607,  0.99994722,  0.33920724,  0.46390401]])
>>> y = pca.pca(input=x, new_dimensionality=2)
>>> y
{'output': array([[ 0.23981584, -0.20722268],
       [-0.53626128,  0.30452866],
       [-0.4147487 , -0.05555313],
       [ 0.55255013,  0.4990668 ],
       [ 0.15864401, -0.54081965]])}
>>> 
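
For anyone who wants to sanity-check output like the above without the bindings, a plain NumPy PCA can be sketched as follows. This is not what mlpack does internally; it is a minimal sketch that assumes each row of the input is a point, does no scaling, and uses a hypothetical helper named `pca_sketch`. The signs of the columns can legitimately differ between implementations. One thing that can be compared directly: each column of the 'output' matrix in the session above sums to (approximately) zero, which is what a projection of centered data should give.

```python
import numpy as np


def pca_sketch(x, new_dimensionality):
    """Project the rows of x onto the top principal components (no scaling)."""
    x = np.asarray(x, dtype=float)
    # Center each dimension (column) at zero.
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the leading directions and project onto them.
    return np.dot(centered, vt[:new_dimensionality].T)


rng = np.random.RandomState(0)
x = rng.rand(5, 5)
y = pca_sketch(x, new_dimensionality=2)
print(y.shape)  # (5, 2)
```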

There are still many things to do before merging this, although some of them could probably be split into separate tickets to be resolved after merge:

  • Update CMake configuration so that Python isn't required for the build.
  • Handle Python binding generation via a CMake variable.
  • Install Python bindings to the appropriate place on make install.
  • Test the components of the Python bindings automatically, instead of by hand.
  • Test the components of the CLI bindings automatically, instead of by hand.
  • Figure out a way to write tests for the command-line programs, which might require some kind of DSL, I'm not sure yet.
  • Handle DatasetInfo/matrix objects in the Python bindings; right now they are ignored. For this I need to see what the "typical" way of describing your dimensions is.
  • Update PROGRAM_INFO strings to be correct for both CLI and Python bindings.
  • Check that these bindings can work with pandas dataframe objects and other commonly-used Python data science tools (I'm pretty sure they don't, so I'll need to fix this).

… and an other load.

If we're not using the command-line interface we don't need to load matrices.
Otherwise, matrices and models aren't automatically loaded.
This will allow different bindings access to the typename of the parameter
itself.
This is a big set of changes.  Highlights:

 * ParamData is far simpler.
 * CLI now holds a function pointer map for custom binding type functionality.
 * GetUnmappedParam is now GetPrintableParam.

Still a few steps away from working compilation for all of the library.
In some cases there are regressions because hacks are no longer possible, like
with k-means and mean_shift.
@rcurtin
Member Author

rcurtin commented Aug 8, 2017

Ok, I think this is ready to merge; I've been sitting on it for quite a while but any outstanding issues have been fixed. I'll let this sit for another 3 days in case anyone has any comments.

@zoq
Member

zoq commented Aug 8, 2017

Sounds good, can't wait to work with the bindings.

@rcurtin rcurtin merged commit 05b4333 into mlpack:master Aug 11, 2017