any progress of python wrapper #6
Comments
There was a student scoping out the Python bridge last fall. He became busy with other commitments, though, so the project has moved back onto the TODO list. We could definitely use help with the Python wrapper. The current plan is to implement the bridge using Cython, compiling it together with the Core code into a single shared library. This is how the R bridge is implemented - C++/Rcpp in that case - and we would like to follow the same model for all supported front ends, if possible. The bridge is a collection of fairly boring subroutines whose only task is to massage front-end data structures to and from their Core counterparts. We keep a lean front-end interface, as well, whose main task is argument checking. Let me know if that sounds interesting, and thank you for the help. |
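The split described above (a lean front end that only checks arguments, and a bridge that only converts data) might look roughly like the sketch below. It is an illustration, not the project's code: the function names, the `n_tree` parameter, and the `core_train` callback are all hypothetical, and a flat typed buffer stands in for whatever layout the Core actually expects.

```python
from array import array


def check_training_args(x, y, n_tree):
    """Front end: validate arguments only; no numerical work happens here."""
    if n_tree < 1:
        raise ValueError("n_tree must be positive")
    if not x or len(x) != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    n_col = len(x[0])
    if n_col == 0 or any(len(row) != n_col for row in x):
        raise ValueError("every row of x must have the same, non-zero width")
    return len(x), n_col


def pack_for_core(x, y):
    """Bridge: copy front-end lists into flat, typed buffers."""
    flat_x = array("d", (value for row in x for value in row))
    flat_y = array("d", y)
    return flat_x, flat_y


def train(x, y, n_tree=500, core_train=None):
    """Check arguments, pack the data, then hand off to the Core.

    `core_train` is a hypothetical stand-in for the compiled entry point.
    """
    n_row, n_col = check_training_args(x, y, n_tree)
    flat_x, flat_y = pack_for_core(x, y)
    if core_train is None:
        # No compiled Core in this sketch: report what would be passed.
        return {"n_row": n_row, "n_col": n_col, "n_tree": n_tree,
                "x": flat_x, "y": flat_y}
    return core_train(flat_x, flat_y, n_row, n_col, n_tree)
```

Keeping the checks and the packing in separate functions means the same front-end checks could sit in front of a Cython or a CFFI bridge without change.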
That's exactly what I am thinking: a lean front end and a language-independent back end. I believe the Python bridge could be done via cffi (though a C wrapper around the C++ interface would be needed), and I think it would also be nice to follow the interface style of I will investigate the project more deeply over the next few days, and hopefully I will be able to get my hands dirty ASAP (because I am more familiar with Python than C++). :-) |
CFFI is a possibility. The interface to the Core code employs STL vectors, so probably leans more toward C++ than it does toward C. We had at one point planned to use CFFI but Cython seemed to make more sense at the time. We can certainly revisit this. My only real concern is that the interface compiles with the Core into a single shared object, and does not make demands of the Core implementation. If you can do this while honoring the scikit-learn interface, then so much the better. |
So now I am actually trying to do it (slowly though). How about calling it I have walked through some of your source code (core and Rcpp), and now I want to implement the Python regression interface first. Of course the most important functions are
Is that sufficient? But obviously your R interface does something more than that... |
On 05/03/2016 10:15 PM, fyears wrote:
Let's take this offline, in future, but here are some quick responses.
Thanks for your help. Please feel free to follow off-line.
|
Did it again! Thunderbird displays the quoted material in the sent mail, but the Github version does not display it. Repeating the last mail, with quotes intact:
All sounds doable within Python bridge, Why not just use Cython, which
Core builds several vectors, some consisting of packed structures and some containing basic types, that is, types supported by the front end. This will vary, of course, with the front end. See class ForestNode, for example. In particular, when there is no equivalent front-end type, Rcpp casts via its 'XPtr' mechanism, exposing it to R as an externally-defined class. We need something analogous from Python and the bridge.
Rborist (i.e., the R implementation) stores the trained model in memory
By 'X_new', I assume you mean separately-tested data?
Sounds analogous to what you want to do for training. Again, though, the "ArboristModel" would need to be allocated and populated from the Python front end as a collection of packed and basic vectors.
Same argument as above: Python supplies the raw vectors filled in by the Core, not the other way around.
The R interface does some argument checking and computes default values. There is a piece that breaks apart the R "data.frame"; this piece will eventually be implemented by the bridge code instead. In Python there will be the analogous problem of processing Pandas data frames. Thanks for your help. Please feel free to follow off-line. |
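To make the data-frame point concrete, here is a minimal sketch of breaking a column-oriented table into a numeric block and a factor block. It assumes a plain dict of column name to list as a stand-in for a data frame, and the name `split_columns` and the zero-based integer coding of factor levels are illustrative choices, not the project's actual scheme.

```python
from array import array


def split_columns(frame):
    """Split a column-oriented table into numeric and factor blocks.

    `frame` is a plain dict of column name -> list of values, standing in
    for a data frame (hypothetical layout, for illustration only).
    """
    numeric, factors, levels = {}, {}, {}
    for name, values in frame.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Numeric columns become typed double-precision buffers.
            numeric[name] = array("d", values)
        else:
            # Encode each distinct value as a zero-based integer code.
            col_levels = sorted(set(str(v) for v in values))
            code_of = {level: code for code, level in enumerate(col_levels)}
            factors[name] = array("i", (code_of[str(v)] for v in values))
            levels[name] = col_levels
    return numeric, factors, levels
```

For example, a column of carrier codes such as `["AA", "UA", "AA"]` would come back as the integer codes 0, 1, 0 together with the level list `["AA", "UA"]`, so the front end can map predictions back to the original labels.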
|
Yes, please start a thread on info@suiji.org, and I will cc to an MPL-2 is fine for the bridge, as is GPL. I am not as familiar with the On 05/04/2016 02:47 PM, fyears wrote:
|
Please, please keep discussion relevant to this thread online and not FWIW, Cython makes more sense to me. C- On Thu, May 5, 2016 at 9:24 AM, suiji notifications@github.com wrote:
|
O.k., will do. Thanks. On 05/05/2016 10:29 AM, Christopher Brown wrote:
|
Hi, The current (20160519) progress: Currently I could feed numpy arrays I have some questions:
|
It sounds like you are making good progress. Concerning the 'feNum' array, storage should be row-major, as in native Regards, On 2016-05-19 20:40, fyears wrote:
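As a small illustration of what row-major storage means for a flat buffer like the one discussed here: element (i, j) of a table with `n_col` columns lives at offset `i * n_col + j`. The helper names below are hypothetical and the sketch uses only the standard library.

```python
from array import array


def to_row_major(rows):
    """Flatten a list of equal-length rows into one row-major buffer."""
    n_row = len(rows)
    n_col = len(rows[0]) if rows else 0
    if any(len(row) != n_col for row in rows):
        raise ValueError("all rows must have the same length")
    flat = array("d", (value for row in rows for value in row))
    return flat, n_row, n_col


def at(flat, n_col, i, j):
    """Read element (i, j) from a row-major buffer with n_col columns."""
    return flat[i * n_col + j]
```

With this layout, all values of one observation are contiguous, which is the opposite of R's column-major matrices.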
|
Hi, Just to let you know that it's possible to run the Python version now! 😄 If you are interested, please look at my repo and:
Then open from pyborist import PyboristClassifier, PyboristRegressor So there you go. Some issues:
Best wishes. |
Fyears, Thanks very much for your contribution. We will review the changes but, Let me see whether including <algorithm.h> causes any trouble in the Do you want your real name included in our acknowledgments, or do you Regards, On 2016-05-21 21:38, fyears wrote:
|
I will attempt to run it with some smaller examples. If you have been able to run the flight-delay data sets successfully, though, then we are likely to be in good shape.
I changed the core files to include the header, then recast the calls as std::min() and std::max() to be clear about which ones are being invoked. It remains to be seen how this builds with msvc. I should have an msvc development system available some time in the next month, but don't have anything handy right now. I do expect to push another round of changes to Github in the next week to ten days, however, and these changes will be included.
Those changes have also been included in the Core, and are subject to the same delayed verification on msvc mentioned above. You had suggested moving the Core to an MIT license. Believe it or not, this is out of my hands. The code was open-sourced following an agreement between a former business associate and me. The written agreement was that it be released under the MPL-2 license. My unilaterally altering the terms of the license could be viewed as bad faith. |
I have been traveling all week, and have not had an opportunity to review your changes yet. You should expect some feedback over the long weekend. One quick question: You mentioned that validation was not ready yet. Did you mean that the trained forest cannot be tested on the out-of-bag samples? |
The Oh one more thing, I treated all the inputs as numeric (usual way in |
Yes, this makes sense: reasonable stopping point.
We have planned all along to accommodate the Pandas DataFrame feature. This was mentioned in the PyData talk. It is also pointed out in an earlier message in this thread. It is perfectly reasonable to emulate Scikit-Learn as an initial implementation, but we would ultimately like to provide full functionality irrespective of the language front end. |
Your initial release looks pretty good. I just have a couple of questions:
ii) The call-back pre-defines a random-variate generator: Regards, |
Makes sense. We can make more plans as the work progresses.
It is no doubt unusual from the perspective of a single language. We are maintaining a multiple-language project, however, in which each spin is built by compiling a common core together with selected elements from a front-end bridge. Since this pattern will likely be repeated in every front end we support, I believe that the FrontEnd/Shared model for the respective bridges is highly intuitive. In particular, this is the sort of approach I have encountered in nearly every compiler project I worked on over a span of 20+ years: various companies, various languages. I am not wedded to the style, but have found it simple and useful.
Yes, this was my understanding of the Python ecosystem, as well: i.e., Numpy and Pandas are not core. For more R-like functionality, though, we will probably need to support them. BTW: Your pull request is accepted. Thank you. |
Merging to commence a new thread. |
Hi,
I just discovered your project and I am interested in it. I notice that you provide an R wrapper but no Python wrapper at the moment.
Is there any Python wrapper available now? Or is it not written yet? Actually, I am interested in participating in this project and writing the Python wrapper, if you are still focusing on the R wrapper now. :-)