Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vectorized RNG #4437

Closed

Conversation

saatvikshah
Copy link
Contributor

@saatvikshah saatvikshah commented Dec 15, 2018

Addresses #4430

Only adds Uniform Random capability for now:

  • SGMatrix
  • SGVector
  • Benchmark

Edit: Will add Normal random in a different PR.

template <typename U>
using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>;
template <typename U = T>
auto random() -> enable_simd_float<U>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separating declaration and definition would need explicit specialization of each template method signature - which would end up being really long. Since the methods are relatively short it seemed ok to keep both in the header file.

The only disadvantage is that an additional include of Math.h has to be added here

Copy link
Member

@gf712 gf712 Dec 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is this the same as writing:

template <typename T>
typename std::enable_if_t<is_any<T, float64_t>::value, void>>
random()
{
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - Actually 3 versions are possible which mean the same:

  1. https://coliru.stacked-crooked.com/a/bc08035924d3582e
  2. https://coliru.stacked-crooked.com/a/7cc61a4adcdd5ded
  3. https://coliru.stacked-crooked.com/a/45dba741ddf2d4a0

Mine is a slightly more readable version of (2) :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK! It is more readable indeed! Why do you need the template <typename U = T> though? Doesn't template <typename T> work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SFINAE only works on templated functions. A method inside a template class doesnt count. The function itself has to also be templated. As mentioned in CPPReference This rule applies during overload resolution of function templates

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw I wrote up a short blog post on this topic - more as a point of reference but here it is in case anyone is interested :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true! Thanks, I'll have a look!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah cool!

@@ -1193,6 +1192,7 @@ SGVector<T> SGMatrix<T>::get_diagonal_vector() const
return diag;
}

// Explicit Specialization of Templates
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unneeded - will revert

Copy link
Member

@gf712 gf712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! @karlnapf It might be worth putting the alias definitions in another file and then reuse them no? I am assuming these are the same for the random number generation of the other containers?

{
CRandom r{};
r.fill_array_co(
static_cast<float64_t*>(matrix), num_rows * num_cols);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you could write static_cast<U*>(matrix), num_rows * num_cols) instead, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I could. Will do so!

Yea I wanted to get the Thumbs up on this version. Adding it for SGVector shouldnt take too long on top of this. Dont think there's any other container right?

Also the type trait is_any <- I think this should go in some other file where it can be reused. Maybe common.h?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, you will need @karlnapf to get back to you on that one though. Btw I think you will need to rename is_any, because there is an Any class in shogun, which might cause some confusion. Is is_any_of too verbose?
Btw if you always use the value of is_any, instead of its type, you could write a is_any_v as a short cut to is_any<...>::value. I think that is more conforming with the direction in which c++ is going. (see for example std::is_void_v)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for the rename...
Actually, we might even go explicit for now as as there are not THAT many combinations...I like it on the other hand, so could also keep it as a generic tool within the lib (under a different name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Checks if any one of the types matches
// Ref: https://stackoverflow.com/a/17032517/3656081
template <typename T, typename... Rest>
struct is_any : std::false_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether this is the best place to put this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth having a file with all the utility type_traits, no? Something like sg_type_traits.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

src/shogun/lib/SGMatrix.h Outdated Show resolved Hide resolved
**/
// vectorized available
template <typename U>
using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cute!
I am wondering whether we should maybe move all the template magic out of SG* since that is supposed to only be a wrapper class. Maybe Random itself.... or linalg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should stay here - Ive noticed that macros have been used at come places to replicate this functionality which imho is a bad practice. eg. here. Keeping such examples would help push towards using SFINAE/tag dispatch for overload resolution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally right!
If you want to have a stab at doing some of this in a better way, feel free to submit a draft PR for discussing it! Fixing the equals would be a start

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I could look up and try to submit a few more once this is done.

CRandom r{};
// Casting floats upwards or downwards is safe
// Ref: https://stackoverflow.com/a/36840390/3656081
for (index_t i = 0; i < num_rows * num_cols; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::for_each ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::for_each wont work with a stateful lambda which would be needed here [&r]{...r.random...}?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use std::transform?

std::transform(matrix, matrix+(num_rows * num_cols), matrix, [&r](auto a){return r.random_half_open();});

Not sure if this causes a performance penalty though.. I think the point of using STL over loops is that we are telling the compiler that we are never going to do something weird in the loop and break out of it. You could benchmark this and check the performance..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried this out on my machine, and both methods seem to take the same amount of time, I am not sure there is a way to speed up either methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And btw, I wouldn't recommend this, but you could do this with std::for_each:

std::for_each(matrix, matrix+N, 
	[&r, &matrix](auto a){
		static int i = 0; 
		matrix[i] = r.random_half_open(); 
		++i; 
	});

Copy link
Member

@gf712 gf712 Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or if you're not a fan of static members in lambdas you could do this:

auto lambda = [&r, &matrix, i=0](auto a) mutable {
		matrix[i] = r.random_half_open(); 
		++i; 
	};
std::for_each(matrix, matrix+N, lambda);

In any case, I think std::transform is the best STL algorithm for this case.

Btw, I don't know if you know this, but you can capture class members with a lambda like this &matrix=matrix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any thoughts on this @saatvikshah1994 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, std::transform is indeed equally fast(or even faster at times especially at lower optimization levels). This is done with std::transform - the for_each version looked a bit hacky :/

}
}

// disabled for the remaining - will throw no matching fn call compile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

TEST(SGMatrixTest, vectorized_rng)
{
auto rng_check_float = [](auto elem) {
SGMatrix<decltype(elem)> sg_mat(10, 5);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there be a away to check it against the individual calling of random with some seed fixing magic?
If that's a pain, dont bother, this test is more to check that no memory errors happen I guess

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be a bit of a chore to implement - especially because I'd have to based on the type use the random_half_open() vs. fill_array_oc version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah don't worry then

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice! :)

I put some points that should be discussed in order to make this a bit more maintainable. I love bringing modern C++ to shogun!

Just curious did you benchmark any of this? We have a benchmark folder where programs for such things can be added using a google framework

@saatvikshah
Copy link
Contributor Author

@karlnapf I've added the task to benchmark - definitely intend to and will look into it after adding the SGVector version.

src/shogun/lib/SGMatrix.h Outdated Show resolved Hide resolved
template <typename U = T>
auto random() -> enable_simd_float<U>
{
CRandom r{};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing: we usually allocate sgobjects using new for several reasons. It is a style thing but let's adopt that here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, sure, I was trying following the CPP Core guideline on this.
Also I guess instead of new I would need to std::make_unique since we want to avoid SG_UNREF?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious - why arent SG* objects implemented like STL style containers which have an inbuilt allocator(eg. like std::vector), thus avoiding the use of new. I'm not exactly sure about the complexity involved with that approach. Just curious as I imagine it would make API users especially for oft used containers like SGMatrix quite happy I presume.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice questions, @saatvikshah1994 :-)

I'd say the actual reasons are mostly due to history. Shogun has been around for long time, longer than what we consider today modern C++. Also, I personally think the focus and interests have also moved a bit from "hardcore" machine learning programming to software engineering.

It is of course always good to follow the latest trends and guidelines of C++. I've found however that as a project grows and gets bigger such redesigns and/or refactorings become too time consuming without having a clear value in terms of bringing new features. Technical debt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About your comment on SGMatrix's API. Note that the main SG* guys, SGMatrix and SGVector are referenced data, so there's no need for handmade memory management with those ;-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the initialization of CRandom that Heiko mentioned. It has to do with objects residing in the stack vs the heap. The rule-of-thumb I started using at some moment was that any SGObject has to be heap-allocated.

I forgot the exact technical reason. Stepping through a minimal program with gdb could help understanding. But anyway, very vaguely from what I can remember: it had to do with the reference counting mechanism taking place as scopes end.

Anyhow, I think should be able to follow the CPP core guidelines for this initialization: just wrap your SGObject inside a shogun::Some (aka std::shared_ptr).

src/shogun/lib/SGVector.h Outdated Show resolved Hide resolved

template <typename T, typename First, typename... Rest>
struct is_any_of<T, First, Rest...>
: std::integral_constant<bool, std::is_same<T, First>::value ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw I think I defined is_sg_base somewhere...we could move that into this file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should definitely be grouped a bit, did you have a look at the is_sg_base?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, ive moved it over

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost mergable from my side.
I would love to get @vigsterkr or @lisitsyn comments here before we do this

And maybe we can start cleaning up macro definitions that were done pre c++11 as a side/next step.

@karlnapf
Copy link
Member

@iglesias maybe you also have comments

**/
// vectorized available
template <typename U>
using enable_simd_float = std::enable_if_t<is_any_of_v<U, float64_t>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, std::is_same_v would make it here as well instead of is_any_of_v, yes? No need to update anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it would, I just kept it for consistency. Also is_same_v isnt part of STL till C++17. So I would have to add that too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a sg_is_same_v for now and have used that. Guess it might be useful at other places too and can be replaced by the STL equivalent when you'll decide to bump up the C++ version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thx for that, maybe (if not yet done) put a comment that indicated that this can be removed/replaced

@iglesias
Copy link
Collaborator

iglesias commented Jan 4, 2019

Nice work @saatvikshah1994, thank you.

I was wondering, would you like to experiment with concepts to implement what you have done here with traits and sfinae in sgvector and sgmatrix? If you're interested we can discuss further what would be the possible advantages and gains. This could be done in a separate branch and discussed in another pull request.

@karlnapf
Copy link
Member

karlnapf commented Jan 4, 2019

Will merge this tomorrow if no more input from @lisitsyn @vigsterkr thx!

@vigsterkr
Copy link
Member

vigsterkr commented Jan 4, 2019 via email

@saatvikshah
Copy link
Contributor Author

Nice work @saatvikshah1994, thank you.

I was wondering, would you like to experiment with concepts to implement what you have done here with traits and sfinae in sgvector and sgmatrix? If you're interested we can discuss further what would be the possible advantages and gains. This could be done in a separate branch and discussed in another pull request.

Sure, I'm game! Do you want me to create an issue and we can discuss there? Or on IRC first?

@saatvikshah
Copy link
Contributor Author

@iglesias the column explanations can be found here: google/benchmark#397 (comment)
And yes, theres a fix needed in the bmark - I should be multiplying by sizeof(float64_t) in the SetBytesProcessed call everywhere.

@karlnapf
Copy link
Member

@karlnapf Im currently blocked on the benchmark by #4452 .

noted! Looks like some sanitizer updates necessary....nice ! :)

@karlnapf
Copy link
Member

Really like the results table! Well well done! So useful and a great example of how we can make things faster with some clever thinking about low hanging fruits.

@saatvikshah
Copy link
Contributor Author

@karlnapf Im currently blocked on the benchmark by #4452 .

noted! Looks like some sanitizer updates necessary....nice ! :)

Actually, I was blocked on Google benchmark not building on Mac/Ubuntu. This has been resolved and hence the analysis table above. The sanitizer problem is a bit tangential and Ive worked around it by using a VM/ @gf712 's docker image. I think this PR is ready to go unless there are some pending comments?

@karlnapf
Copy link
Member

@karlnapf Im currently blocked on the benchmark by #4452 .

noted! Looks like some sanitizer updates necessary....nice ! :)

Actually, I was blocked on Google benchmark not building on Mac/Ubuntu. This has been resolved and hence the analysis table above. The sanitizer problem is a bit tangential and Ive worked around it by using a VM/ @gf712 's docker image. I think this PR is ready to go unless there are some pending comments?

Yes well there are @vigsterkr comments about the seed. Was there any discussion regarding that? I am not updated to the latest I guess

@iglesias
Copy link
Collaborator

iglesias commented Jan 14, 2019 via email

@iglesias
Copy link
Collaborator

iglesias commented Jan 14, 2019 via email

@saatvikshah
Copy link
Contributor Author

@iglesias the column explanations can be found here: google/benchmark#397 (comment)
And yes, theres a fix needed in the bmark - I should be multiplying by sizeof(float64_t) in the SetBytesProcessed call everywhere.

@iglesias had linked to an answer here by the Google benchmark devs about the time columns. In summary:

  1. Time: Avg Wall time per iteration(This is generally a combination of time taken by the CPU + IO)
  2. CPU: This is indicative of time taken by the CPU alone.

Looking at (1) and (2) we can determine where we are IO bound and where CPU bound.

And yes, what you've written about the last column is correct.

@karlnapf
Copy link
Member

Hi!
While the concepts PR is great to get going on this new pattern, I still would like to get this one here merged soon. There is the issue with the seed. Any comments on this anyone?

@iglesias
Copy link
Collaborator

About the seed, I think that was already solved in the latest commit in the PR: 6a97846. See also comments above from @saatvikshah1994 saying it was done.

@karlnapf
Copy link
Member

karlnapf commented Jan 23, 2019

Ok let's merge this then :) I will wait one more day and then do it if nobody else complains @lisitsyn @vigsterkr

I think cool next steps would be:

  • actually replace looped rng usage in shogun with this new API
  • or write an entrance task for GSoC students to do this

EDIT: actually reliased the randn is not yet included so I guess there will be little usage in loops as it is now ... We can do this once the normal random is in

@vigsterkr
Copy link
Member

@karlnapf i wouldnt put more effort into this for the time being as we wanna drop some of the PRNG stuff as they are flawed by design atm. see feature/random-refactor

@karlnapf
Copy link
Member

I guess the effort is already done? So this is still better than what we have atm. We could stop pushing this forward then of course. But I think can still go in as an example of the overload dispatching?

@vigsterkr
Copy link
Member

a) where is this code being used? meaning the SGVector/Matrix.random()
b) how do you set the seed of this?

@karlnapf
Copy link
Member

Alright, agreed actually

@karlnapf
Copy link
Member

@saatvikshah1994 maybe have a look at the feature branch to get a feeling for what @vigsterkr is talking about

@saatvikshah
Copy link
Contributor Author

Sure, I'll take a look

@iglesias
Copy link
Collaborator

@vigsterkr @karlnapf please guys have a look at the commit 6a97846. I think that one is addressing the seed concern.

@saatvikshah
Copy link
Contributor Author

@vigsterkr @karlnapf please guys have a look at the commit 6a97846. I think that one is addressing the seed concern.

@vigsterkr @karlnapf The commit mentioned by @iglesias does seem to be correctly setting the seed. I was able to reproduce correct seed behavior by this code snippet:

auto rng_check_float = [](auto elem) {
		SGVector<decltype(elem)> sg_vec(50);
		sg_rand->set_seed(200);
		sg_vec.random();
		for (int i = 0; i < sg_vec.size(); i++)
		{
			std::cout << sg_vec[i] << " ";
			EXPECT_LE(0.0, sg_vec[i]);
			EXPECT_GT(1.0, sg_vec[i]);
		}
		std::cout << std::endl;
	};
	rng_check_float(float32_t{0.0});
// Seed 0 always produces: 0.437919 0.339959 0.733059 0.113451 0.624103 0.895501 0.638084 0.84145 0.608526 0.546932 0.620833 0.611356 ...
// Seed 200 always produces: 0.917502 0.500624 0.971984 0.736531 0.613905 0.559259 0.781844 0.910769 0.432283 0.646343 0.689275

@saatvikshah saatvikshah mentioned this pull request Feb 4, 2019
@karlnapf
Copy link
Member

karlnapf commented Feb 5, 2019

@saatvikshah1994 thanks for the ping. The point beside the sees is that the whole design of the random class in shogun is flawed. This is why Viktor mentioned to not put any more effort into it. That said, we could of course still use your updates until we do a change, but the second point is that the uniform random number generator is not really used within shogun. This doesn't mean the PR is not useful. Quite the opposite is true, it is a really nice demonstration on how to gain speedups with vectorization as well as using better design patterns, so it is very useful for future reference. As said, for now, check the feature branch on how we can improve the random number generation in shogun in general. We should get the same ideas that you developed you in this PR into the feature branch and then put the efforts into pushing this forward rather than tuning deprecated designs.

@stale
Copy link

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
@gf712
Copy link
Member

gf712 commented Feb 26, 2020

@vigsterkr is this equivalent to what @theartful did?

@stale stale bot removed the stale label Feb 26, 2020
@stale
Copy link

stale bot commented Aug 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Aug 24, 2020
@karlnapf
Copy link
Member

Closing due as it is outdated and also we had some discussions whether we want to use this old design in the past

@karlnapf karlnapf closed this Aug 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants