Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Stan Math Support to Shogun #4216

Merged
merged 13 commits into from May 6, 2018

Conversation

FaroukY
Copy link
Contributor

@FaroukY FaroukY commented Mar 25, 2018

I've added the ability to now use the Stan Math library in Shogun. I've also written a unit test to make sure it compiles and works properly (The unit test is a perceptron learning algorithm with gradient descent)

The CmakeLists.txt was changed to add the include directories of stan headers, and the last file is the unit test.

@@ -0,0 +1,18 @@
include(ExternalProject)
ExternalProject_Add(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should fix the commit sha, instead of following the master HEAD. basically set
GIT_TAG and set it to the last stable release of stan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set up the GIT_TAG as required

include(external/StanMath)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_STAN_MATH}>
$<INSTALL_INTERFACE:include/shogun/lib/external/stan1>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the stan1 stan2 stan3 stands for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the names to be more descriptive.

@vigsterkr
Copy link
Member

great! there are some minor improvements needed

@FaroukY
Copy link
Contributor Author

FaroukY commented Mar 29, 2018

@vigsterkr Done with the required improvements.

@vigsterkr
Copy link
Member

@FaroukY lemme check why do the CIs fail

@vigsterkr
Copy link
Member

@FaroukY does this work for you locally? maybe because boost is in your include paths? as you can see from here this does not compile:
https://travis-ci.org/shogun-toolbox/shogun/jobs/359680795#L1970

stan misses boost... lemme try to pull these commits and see locally what's happening and get back to you

@vigsterkr
Copy link
Member

indeed the problem is that - i've just tested this with a machine that has boost installed and it worked fine... i'll test and try to come up with a fix in a clean docker container

Copy link
Member

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's definitely a problem with INSTALL files missing.

include(external/StanMath)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_STAN_MATH}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who and what will install those the folder here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would actually start a new dir: include/shogun/third_party and start putting stuff there that are external... we have too many things in lib

)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_BOOST}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan_Boost>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who and what will install those the folder here?

)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_CVODES}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan_Cvodes>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

who and what will install those the folder here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please rebase this over latest develop and change this to include/shogun/third_party/Stan_Cvodes

add_dependencies(libshogun StanMath)

set(STAN_INCLUDE_DIR_STAN_MATH ${CMAKE_BINARY_DIR}/StanMath/src/StanMath)
set(STAN_INCLUDE_DIR_BOOST ${CMAKE_BINARY_DIR}/StanMath/src/StanMath/lib/boost_1.66.0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be boost_1.64.0

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 1, 2018

@vigsterkr I think the problem is indeed that I had boost in my path. Can you share the Dockerfile that you used to test this? I want to make sure that I get it working on a general machine before the I push my commits.

Since we can find weights from [1 1 1]^T to [1 1]^T in a perceptron,
this error should be very close to zero after 100 epochs.
*/
TEST(StanPerceptronTest, sample_perceptron)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I like this test, I think we should have something super simple. Like the gradient of the gaussian kernel wrt to bandwidth, left/right argument. Something that we know in closed form.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But please de-prioritise the test until all the cmake stuff is correct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is very fine, lets concentrate on the actual inclusion of the lib

}

//Error should be very close to 0.0
EXPECT_NEAR(last_error, 0.0, 1e-6);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might fail randomly if the data is not separable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but in this case, there is only one data point ( [1 1 1]^T) so the data is linearly separable (in an infinite number of ways) so it shouldn't fail no?

std::random_device rd{};
std::mt19937 gen{rd()};
normal_distribution<> d{0,1};
Matrix<var, 2, 3> W1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use N and D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what you mean here. What's N and D? Do you mean the variable names?

std::mt19937 gen{rd()};
normal_distribution<> d{0,1};
Matrix<var, 2, 3> W1;
for(int i=0; i<2; ++i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change it.

{
error += (outputs(i,0)-1)*(outputs(i,0)-1); //square error
}
error.grad(); //Calculate the gradient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superfluous comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will get rid of it.

@karlnapf
Copy link
Member

karlnapf commented Apr 2, 2018

Quite cool to have stan in. Let's get going with this patch!
There are some open requests by viktor @FaroukY

@vigsterkr
Copy link
Member

@FaroukY i've use the same docker image that is being used for travis CI jobs: shogun/shogun-dev

@vigsterkr
Copy link
Member

@FaroukY basically the problems are the following:

  1. you are using a system where you already have libboost installed (probably under /usr/include...)
  2. you have not tested the cmake script with the given GIT_TAG (the boost versions are different)
  3. the way you handle STAN_INCLUDE_DIR*... those are not going to automagically install themselves under include/shogun/..... you need to take care of that.

@@ -0,0 +1,73 @@
#include <gtest/gtest.h>
#include <stan/math.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not so sure if this is the best way... namely that you are including stan like this.
meaning what if for some reason stan/math.hpp is in the include path already... probably the -isystem inclusion will take care of this, but maybe it'd be better that we explicitly specify in the include path that it's shogun's third-party include path... on the other hand probably this will actually break everything else within stan's headers....

it's more important actually that stan usage (or a model that is actually using stan) is being tested from a libshogun example. namely that the cmake target flags are set correctly when shogun is being used from a third-party lib, or just a simple CLI. note that it requires that either you:

  • either include stan only in cpp files of shogun, or
  • you expose the include flags in ShogunTarget.cmake; currently it's PUBLIC SYSTEM which should (? does it really? should be tested) reflect the correct exposure of include flags.... but i'm not so sure if we want to expose stan to the public, i would presume it's safe to think that we only use stan in the implementation of the models.

@vigsterkr
Copy link
Member

@FaroukY any update?

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 5, 2018

@vigsterkr In progress. I just set up my environment with Docker (was new to it) and started changing the cmake and fixing the problems. I am a bit tied down these coming few days but I am working a bit on it everyday. Also was handling the other prs.

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 5, 2018

@vigsterkr It appears that the only thing that was wrong here is that I miswrote the boost directory from 1.66 to 1.64. Once I changed 1.66 to 1.64, the build compiled and finished. For now, I will:

  1. Change the directory of Stan from include/shogun/lib/external/Stan to include/shogun/third_party/Stan

  2. Retest and push again on this branch.

  3. Address the changes in the test mentioned above.

Also, it looks like there are two jobs in Travis which timed out (one during the testing phase) so we can assume the build worked for it.

@vigsterkr
Copy link
Member

@FaroukY there's still a major part missing: installation of the Stan headers in the right folder. the tests are working because you rely on the fact that everything is available in build time. but once you ship this, the stan related directories are not going to be available as you dont copy them with make install

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 6, 2018

@vigsterkr I am new to cmake so I'm a bit confused here. When I make and make install on my Docker machine, this seems to build and install correctly. What do you mean by "but once you ship this, the stan related directories are not going to be available" and how can I test this.

Also, any guidance on what the next steps might be would be super helpful since I've been reading on Cmake for quite a few days now and I'm not sure where to go from here.

@vigsterkr
Copy link
Member

@FaroukY ok so when you do make install do you have files under /usr/local/include/shogun/lib/external/Stan ? because i doubt so... as there's nothing instructing cmake to install those files anywhere.

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 16, 2018

@vigsterkr Yes, just tried it. I have the file directories /usr/local/include/shogun/lib/external/Stan, /usr/local/include/shogun/lib/external/Stan_Boost, and /usr/local/include/shogun/lib/external/Stan_Cvodes so I think cmake automatically puts them there.

@FaroukY
Copy link
Contributor Author

FaroukY commented Apr 16, 2018

@vigsterkr Actually it was my mistake. I forgot to commit the changes to this branch facepalm so now it has the Install commands. Should be okay now. Installs the 3 libraries into /usr/local/include/shogun/lib/external/*

Copy link
Member

@vigsterkr vigsterkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there has been some changes in develop... see my comments


SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_STAN_MATH}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please rebase this over latest develop and change this to include/shogun/third_party/Stan

)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_BOOST}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan_Boost>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please rebase this over latest develop and change this to include/shogun/third_party/Stan_Boost

)
SHOGUN_INCLUDE_DIRS(SCOPE PUBLIC SYSTEM
$<BUILD_INTERFACE:${STAN_INCLUDE_DIR_CVODES}>
$<INSTALL_INTERFACE:include/shogun/lib/external/Stan_Cvodes>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please rebase this over latest develop and change this to include/shogun/third_party/Stan_Cvodes

set(STAN_INCLUDE_DIR_CVODES ${CMAKE_BINARY_DIR}/StanMath/src/StanMath/lib/cvodes_2.9.0/include)


INSTALL( DIRECTORY ${STAN_INCLUDE_DIR_STAN_MATH} DESTINATION include/shogun/lib/external/Stan )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of this you should actually copy these files into ${THIRD_PARTY_INCLUDE_DIR}/Stan

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove these installs.... if you do a cmake copy thing then you will not have problems with the logs in travis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check how COMMAND ${CMAKE_COMMAND} -E copy_if_different is being used in shogun

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the best place to do this copy would be ExternalProject_Add's INSTALL_COMMAND

@FaroukY
Copy link
Contributor Author

FaroukY commented May 4, 2018

@vigsterkr Did the required changes and its not rebased with develop. However, I had a look at travis and its failed because the log messages of installing Stan is exceeding 4Mb (Since there are many files that are being installed). But the build itself is building fine.

@vigsterkr
Copy link
Member

@FaroukY yeah first we need a rebase over develop and then fix those things i've mentioned i'll think about how to fix travis limitations.

@FaroukY
Copy link
Contributor Author

FaroukY commented May 4, 2018

@vigsterkr Did the required changes and the problem is not there anymore. However, I'm failing one of the tests: 1 - unit-TrainedModelSerializationTest (SEGFAULT) even though my PR has no relation to that test. Any ideas here?

@vigsterkr
Copy link
Member

@FaroukY the segfault is unrelated, there's currently a bug in develop branch, so we can ignore that

@vigsterkr
Copy link
Member

@FaroukY please check/fix the errors that clang-format reports, see: https://travis-ci.org/shogun-toolbox/shogun/jobs/375012262#L722

@FaroukY
Copy link
Contributor Author

FaroukY commented May 5, 2018

@vigsterkr Done, clang-format passes now.

In this example, we want to learn the weights W such that the square
Error loss from the output of the perceptron to [1 1]^T is minimized.
Since we can find weights from [1 1 1]^T to [1 1]^T in a perceptron,
this error should be very close to zero after 100 epochs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat job documenting!

std::mt19937 gen{rd()};
normal_distribution<> d{0, 1};
Matrix<var, 2, 3> W1;
for (int i = 0; i < 2; ++i)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to update, just informational. A small improvement would be to query the matrix dimensions instead of hardcoding them in the loops.

@vigsterkr
Copy link
Member

@FaroukY perfect! merging!

@vigsterkr vigsterkr merged commit 49e35df into shogun-toolbox:develop May 6, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants