Vectorized RNG #4437

saatvikshah · 2018-12-15T21:03:23Z

Addresses #4430

Only adds Uniform Random capability for now:

SGMatrix
SGVector
Benchmark

Edit: Will add Normal random in a different PR.

saatvikshah · 2018-12-15T21:08:04Z

src/shogun/lib/SGMatrix.h

+		template <typename U>
+		using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>;
+		template <typename U = T>
+		auto random() -> enable_simd_float<U>


Separating declaration and definition would need explicit specialization of each template method signature, which would end up being really long. Since the methods are relatively short, it seemed OK to keep both in the header file.

The only disadvantage is that an additional include of Math.h has to be added here.

So is this the same as writing:

template <typename T> std::enable_if_t<is_any<T, float64_t>::value, void> random() { ... }

Yep - Actually 3 versions are possible which mean the same:

https://coliru.stacked-crooked.com/a/bc08035924d3582e

https://coliru.stacked-crooked.com/a/7cc61a4adcdd5ded

https://coliru.stacked-crooked.com/a/45dba741ddf2d4a0

Mine is a slightly more readable version of (2) :D

OK! It is more readable indeed! Why do you need the template <typename U = T> though? Doesn't template <typename T> work?

SFINAE only works on templated functions. A method inside a class template doesn't count: the method itself also has to be a template. As CPPReference puts it, "This rule applies during overload resolution of function templates".
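A minimal sketch of the point above, assuming a hypothetical `Box<T>` class in place of SGMatrix (the class and the `kind()` method are illustrative names, not Shogun code). Because `U` is a template parameter of the member itself, the `enable_if_t` condition is only checked when the call is resolved, so the non-matching overload is silently dropped:

```cpp
#include <type_traits>

// Hypothetical stand-in for SGMatrix<T>; names are illustrative only.
template <typename T>
struct Box
{
	// U defaults to T, so callers never spell it out, but the condition
	// now depends on a template parameter of the member function.
	template <typename U = T>
	constexpr auto kind() const
	    -> std::enable_if_t<std::is_floating_point<U>::value, int>
	{
		return 1; // selected when T is a floating-point type
	}

	template <typename U = T>
	constexpr auto kind() const
	    -> std::enable_if_t<!std::is_floating_point<U>::value, int>
	{
		return 2; // selected for every other T
	}
};
```

With these definitions `Box<double>{}.kind()` yields 1 and `Box<int>{}.kind()` yields 2. Writing the condition directly on `T` in a non-template member would instead be a hard error as soon as `Box<int>` is instantiated, since no substitution takes place at the call.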

Btw I wrote up a short blog post on this topic - more as a point of reference but here it is in case anyone is interested :)

Ah true! Thanks, I'll have a look!

saatvikshah · 2018-12-15T21:08:54Z

src/shogun/lib/SGMatrix.cpp

@@ -1193,6 +1192,7 @@ SGVector<T> SGMatrix<T>::get_diagonal_vector() const
 	return diag;
 }

+// Explicit Specialization of Templates


unneeded - will revert

gf712

Looks good! @karlnapf It might be worth putting the alias definitions in another file and then reuse them no? I am assuming these are the same for the random number generation of the other containers?

gf712 · 2018-12-17T19:21:53Z

src/shogun/lib/SGMatrix.h

+		{
+			CRandom r{};
+			r.fill_array_co(
+			    static_cast<float64_t*>(matrix), num_rows * num_cols);


Here you could write static_cast<U*>(matrix), num_rows * num_cols) instead, no?

Yep I could. Will do so!

Yeah, I wanted to get the thumbs up on this version first. Adding it for SGVector shouldn't take too long on top of this. I don't think there's any other container, right?

Also the type trait is_any <- I think this should go in some other file where it can be reused. Maybe common.h?

I think so, you will need @karlnapf to get back to you on that one though. Btw I think you will need to rename is_any, because there is an Any class in shogun, which might cause some confusion. Is is_any_of too verbose?
Btw if you always use the value of is_any instead of its type, you could write an is_any_v as a shortcut for is_any<...>::value. I think that conforms better with the direction in which C++ is going (see for example std::is_void_v).
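For reference, a sketch of what the renamed trait plus the `_v` shortcut could look like. The names `is_any_of` and `is_any_of_v` follow the suggestions in this thread; the body is an assumption modelled on the snippets in the diff, not a copy of the final Shogun header, and it needs C++14 for the variable template:

```cpp
#include <type_traits>

// Primary template: with no candidates left, T matches nothing.
template <typename T, typename... Rest>
struct is_any_of : std::false_type
{
};

// Recursive case: T matches if it equals First or any of the Rest.
template <typename T, typename First, typename... Rest>
struct is_any_of<T, First, Rest...>
    : std::integral_constant<
          bool, std::is_same<T, First>::value || is_any_of<T, Rest...>::value>
{
};

// Shortcut in the style of std::is_void_v (variable template, C++14).
template <typename T, typename... Ts>
constexpr bool is_any_of_v = is_any_of<T, Ts...>::value;
```

An alias such as `enable_simd_float` can then be written against `is_any_of_v<U, float64_t>` without the trailing `::value`.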

+1 for the rename...
Actually, we might even go explicit for now as there are not THAT many combinations... On the other hand I like it, so we could also keep it as a generic tool within the lib (under a different name).

karlnapf · 2018-12-18T11:45:56Z

src/shogun/lib/SGMatrix.h

+// Checks if any one of the types matches
+// Ref: https://stackoverflow.com/a/17032517/3656081
+template <typename T, typename... Rest>
+struct is_any : std::false_type


I wonder whether this is the best place to put this?

It might be worth having a file with all the utility type_traits, no? Something like sg_type_traits.h

karlnapf · 2018-12-18T11:47:56Z

src/shogun/lib/SGMatrix.h

+		**/
+		// vectorized available
+		template <typename U>
+		using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>;


Cute!
I am wondering whether we should maybe move all the template magic out of SG* since that is supposed to only be a wrapper class. Maybe Random itself.... or linalg?

I think it should stay here. I've noticed that macros have been used in some places to replicate this functionality, which IMHO is bad practice (e.g. here). Keeping such examples would help push towards using SFINAE/tag dispatch for overload resolution.

Totally right!
If you want to have a stab at doing some of this in a better way, feel free to submit a draft PR for discussing it! Fixing the equals would be a start

Sure, I could look up and try to submit a few more once this is done.

karlnapf · 2018-12-18T11:49:52Z

src/shogun/lib/SGMatrix.h

+			CRandom r{};
+			// Casting floats upwards or downwards is safe
+			// Ref: https://stackoverflow.com/a/36840390/3656081
+			for (index_t i = 0; i < num_rows * num_cols; i++)


std::for_each ?

std::for_each won't work with a stateful lambda, which would be needed here: [&r]{...r.random...}?

Why not use std::transform?

std::transform(matrix, matrix+(num_rows * num_cols), matrix, [&r](auto a){return r.random_half_open();});

Not sure if this causes a performance penalty though.. I think the point of using STL over loops is that we are telling the compiler that we are never going to do something weird in the loop and break out of it. You could benchmark this and check the performance..

I just tried this out on my machine, and both methods seem to take the same amount of time. I am not sure there is a way to speed up either method.

And btw, I wouldn't recommend this, but you could do this with std::for_each:

std::for_each(matrix, matrix+N, [&r, &matrix](auto a){ static int i = 0; matrix[i] = r.random_half_open(); ++i; });

Or if you're not a fan of static members in lambdas you could do this:

auto lambda = [&r, &matrix, i=0](auto a) mutable { matrix[i] = r.random_half_open(); ++i; }; std::for_each(matrix, matrix+N, lambda);

In any case, I think std::transform is the best STL algorithm for this case.

Btw, I don't know if you know this, but you can capture class members with a lambda like this &matrix=matrix

any thoughts on this @saatvikshah1994 ?

Yep, std::transform is indeed equally fast (or even faster at times, especially at lower optimization levels). This is now done with std::transform; the for_each version looked a bit hacky :/
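To make the options discussed above concrete, here is a small sketch that fills a buffer three ways. It assumes a hypothetical `UniformSource` built on `std::mt19937_64` as a stand-in for `CRandom::random_half_open()`; `std::generate` is not mentioned in the thread and is included only as another standard algorithm that fits this pattern:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for CRandom: draws doubles from [0, 1).
struct UniformSource
{
	explicit UniformSource(unsigned seed) : engine(seed), dist(0.0, 1.0) {}
	double next() { return dist(engine); }
	std::mt19937_64 engine;
	std::uniform_real_distribution<double> dist;
};

inline std::vector<double> fill_transform(std::size_t n, unsigned seed)
{
	UniformSource r(seed);
	std::vector<double> data(n);
	// The input value is ignored; only the output slot matters.
	std::transform(data.begin(), data.end(), data.begin(),
	    [&r](double) { return r.next(); });
	return data;
}

inline std::vector<double> fill_for_each(std::size_t n, unsigned seed)
{
	UniformSource r(seed);
	std::vector<double> data(n);
	// Taking the element by reference avoids tracking an index by hand.
	std::for_each(data.begin(), data.end(), [&r](double& x) { x = r.next(); });
	return data;
}

inline std::vector<double> fill_generate(std::size_t n, unsigned seed)
{
	UniformSource r(seed);
	std::vector<double> data(n);
	std::generate(data.begin(), data.end(), [&r] { return r.next(); });
	return data;
}
```

Note that the `std::for_each` variant needs neither a static counter nor a mutable capture once the lambda takes its argument by reference. Whether any of these differs in speed from the plain loop would still have to be measured.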

karlnapf · 2018-12-18T11:50:09Z

src/shogun/lib/SGMatrix.h

+			}
+		}
+
+		// disabled for the remaining - will throw no matching fn call compile


karlnapf · 2018-12-18T11:51:30Z

tests/unit/lib/SGMatrix_unittest.cc

+TEST(SGMatrixTest, vectorized_rng)
+{
+	auto rng_check_float = [](auto elem) {
+		SGMatrix<decltype(elem)> sg_mat(10, 5);


Would there be a way to check it against individual calls of random, with some seed-fixing magic?
If that's a pain, don't bother; this test is more to check that no memory errors happen, I guess.

Might be a bit of a chore to implement, especially because I'd have to pick between random_half_open() and the fill_array_oc version based on the type.

yeah don't worry then

karlnapf

Really nice! :)

I put some points that should be discussed in order to make this a bit more maintainable. I love bringing modern C++ to shogun!

Just curious did you benchmark any of this? We have a benchmark folder where programs for such things can be added using a google framework

saatvikshah · 2018-12-19T05:38:39Z

@karlnapf I've added the task to benchmark - definitely intend to and will look into it after adding the SGVector version.

karlnapf · 2018-12-20T16:08:29Z

src/shogun/lib/SGMatrix.h

+		template <typename U = T>
+		auto random() -> enable_simd_float<U>
+		{
+			CRandom r{};


one thing: we usually allocate sgobjects using new for several reasons. It is a style thing but let's adopt that here as well

Umm, sure, I was trying to follow the CPP Core Guidelines on this.
Also, I guess instead of new I would need to use std::make_unique, since we want to avoid SG_UNREF?

I'm also curious: why aren't SG* objects implemented like STL-style containers, which have an inbuilt allocator (e.g. like std::vector), thus avoiding the use of new? I'm not exactly sure about the complexity involved with that approach. I imagine it would make API users quite happy, especially for often-used containers like SGMatrix.

Nice questions, @saatvikshah1994 :-)

I'd say the actual reasons are mostly due to history. Shogun has been around for long time, longer than what we consider today modern C++. Also, I personally think the focus and interests have also moved a bit from "hardcore" machine learning programming to software engineering.

It is of course always good to follow the latest trends and guidelines of C++. I've found however that as a project grows and gets bigger such redesigns and/or refactorings become too time consuming without having a clear value in terms of bringing new features. Technical debt.

About your comment on SGMatrix's API. Note that the main SG* guys, SGMatrix and SGVector are referenced data, so there's no need for handmade memory management with those ;-)

About the initialization of CRandom that Heiko mentioned. It has to do with objects residing in the stack vs the heap. The rule-of-thumb I started using at some moment was that any SGObject has to be heap-allocated.

I forgot the exact technical reason. Stepping through a minimal program with gdb could help understanding. But anyway, very vaguely from what I can remember: it had to do with the reference counting mechanism taking place as scopes end.

Anyhow, I think you should be able to follow the CPP Core Guidelines for this initialization: just wrap your SGObject inside a shogun::Some (aka std::shared_ptr).

karlnapf · 2018-12-20T16:10:46Z

src/shogun/lib/sg_type_traits.h

+
+template <typename T, typename First, typename... Rest>
+struct is_any_of<T, First, Rest...>
+    : std::integral_constant<bool, std::is_same<T, First>::value ||


btw I think I defined is_sg_base somewhere...we could move that into this file

should definitely be grouped a bit, did you have a look at the is_sg_base?

yep, ive moved it over

karlnapf

Almost mergeable from my side.
I would love to get @vigsterkr or @lisitsyn comments here before we do this

And maybe we can start cleaning up macro definitions that were done pre c++11 as a side/next step.

karlnapf · 2018-12-23T09:50:15Z

@iglesias maybe you also have comments

iglesias · 2019-01-04T13:07:06Z

src/shogun/lib/SGMatrix.h

+		**/
+		// vectorized available
+		template <typename U>
+		using enable_simd_float = std::enable_if_t<is_any_of_v<U, float64_t>>;


Just out of curiosity, std::is_same_v would make it here as well instead of is_any_of_v, yes? No need to update anything.

Yep it would, I just kept it for consistency. Also, std::is_same_v isn't part of the standard library until C++17, so I would have to add that too.

I added a sg_is_same_v for now and have used that. I guess it might be useful in other places too, and it can be replaced by the standard equivalent when you decide to bump up the C++ version.

Yes, thx for that. Maybe (if not yet done) put a comment that indicates that this can be removed/replaced.

iglesias · 2019-01-04T13:17:18Z

Nice work @saatvikshah1994, thank you.

I was wondering, would you like to experiment with concepts to implement what you have done here with traits and sfinae in sgvector and sgmatrix? If you're interested we can discuss further what would be the possible advantages and gains. This could be done in a separate branch and discussed in another pull request.

karlnapf · 2019-01-04T16:22:18Z

Will merge this tomorrow if no more input from @lisitsyn @vigsterkr thx!

vigsterkr · 2019-01-04T16:23:20Z

Hold your horses plz needs some changes....

saatvikshah · 2019-01-05T07:58:42Z

> Nice work @saatvikshah1994, thank you.
>
> I was wondering, would you like to experiment with concepts to implement what you have done here with traits and sfinae in sgvector and sgmatrix? If you're interested we can discuss further what would be the possible advantages and gains. This could be done in a separate branch and discussed in another pull request.

Sure, I'm game! Do you want me to create an issue and we can discuss there? Or on IRC first?

saatvikshah · 2019-01-12T10:36:29Z

@iglesias the column explanations can be found here: google/benchmark#397 (comment)
And yes, theres a fix needed in the bmark - I should be multiplying by sizeof(float64_t) in the SetBytesProcessed call everywhere.

karlnapf · 2019-01-12T18:04:52Z

> @karlnapf Im currently blocked on the benchmark by #4452 .

noted! Looks like some sanitizer updates necessary....nice ! :)

karlnapf · 2019-01-12T18:06:50Z

Really like the results table! Well well done! So useful and a great example of how we can make things faster with some clever thinking about low hanging fruits.

saatvikshah · 2019-01-13T00:28:12Z

> > @karlnapf Im currently blocked on the benchmark by #4452 .
>
> noted! Looks like some sanitizer updates necessary....nice ! :)

Actually, I was blocked on Google benchmark not building on Mac/Ubuntu. This has been resolved and hence the analysis table above. The sanitizer problem is a bit tangential and Ive worked around it by using a VM/ @gf712 's docker image. I think this PR is ready to go unless there are some pending comments?

karlnapf · 2019-01-13T08:55:39Z

Yes well there are @vigsterkr comments about the seed. Was there any discussion regarding that? I am not updated to the latest I guess

iglesias · 2019-01-14T12:21:38Z

I didn’t see an explanation there about the two results. Maybe I missed something. Anyhow, it is not important.

iglesias · 2019-01-14T12:21:59Z

The two time* results.

saatvikshah · 2019-01-15T00:21:02Z

> @iglesias the column explanations can be found here: google/benchmark#397 (comment)
> And yes, theres a fix needed in the bmark - I should be multiplying by sizeof(float64_t) in the SetBytesProcessed call everywhere.

@iglesias had linked to an answer here by the Google benchmark devs about the time columns. In summary:

1. Time: average wall-clock time per iteration (generally a combination of the time taken by the CPU plus I/O).
2. CPU: the time taken by the CPU alone.

Comparing (1) and (2) shows where we are I/O bound and where we are CPU bound.
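The difference between the two columns can be illustrated outside of Google benchmark with plain standard-library clocks. This is only a sketch: the `Timing` struct and `time_sleep` helper are hypothetical names, and `std::clock` reports processor time on POSIX systems but may behave differently elsewhere:

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper type: elapsed wall-clock and CPU time in seconds.
struct Timing
{
	double wall_seconds;
	double cpu_seconds;
};

// Sleeping consumes wall-clock time but (almost) no CPU time, so the two
// measurements diverge, much like an I/O-bound benchmark iteration.
inline Timing time_sleep(std::chrono::milliseconds pause)
{
	const auto wall_start = std::chrono::steady_clock::now();
	const std::clock_t cpu_start = std::clock();

	std::this_thread::sleep_for(pause);

	const std::clock_t cpu_end = std::clock();
	const auto wall_end = std::chrono::steady_clock::now();

	return {std::chrono::duration<double>(wall_end - wall_start).count(),
	        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}
```

A CPU-bound loop in place of the sleep would typically bring the two numbers close together, which is the pattern to look for in the results table.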

And yes, what you've written about the last column is correct.

karlnapf · 2019-01-21T14:20:36Z

Hi!
While the concepts PR is great to get going on this new pattern, I still would like to get this one here merged soon. There is the issue with the seed. Any comments on this anyone?

iglesias · 2019-01-21T16:21:31Z

About the seed, I think that was already solved in the latest commit in the PR: 6a97846. See also comments above from @saatvikshah1994 saying it was done.

karlnapf · 2019-01-23T14:41:20Z

Ok let's merge this then :) I will wait one more day and then do it if nobody else complains @lisitsyn @vigsterkr

I think cool next steps would be:

- actually replace looped rng usage in shogun with this new API
- or write an entrance task for GSoC students to do this

EDIT: actually realised that randn is not yet included, so I guess there will be little usage in loops as it is now ... We can do this once the normal random is in.

vigsterkr · 2019-01-23T14:42:36Z

@karlnapf i wouldnt put more effort into this for the time being as we wanna drop some of the PRNG stuff as they are flawed by design atm. see feature/random-refactor

karlnapf · 2019-01-23T14:44:00Z

I guess the effort is already done? So this is still better than what we have atm. We could stop pushing this forward then, of course. But I think it can still go in as an example of the overload dispatching?

vigsterkr · 2019-01-23T14:45:34Z

a) where is this code being used? meaning the SGVector/Matrix.random()
b) how do you set the seed of this?

karlnapf · 2019-01-23T14:47:59Z

Alright, agreed actually

karlnapf · 2019-01-23T14:48:33Z

@saatvikshah1994 maybe have a look at the feature branch to get a feeling for what @vigsterkr is talking about

saatvikshah · 2019-01-24T15:56:28Z

Sure, I'll take a look

iglesias · 2019-01-24T15:58:06Z

@vigsterkr @karlnapf please guys have a look at the commit 6a97846. I think that one is addressing the seed concern.

saatvikshah · 2019-02-03T14:22:07Z

> @vigsterkr @karlnapf please guys have a look at the commit 6a97846. I think that one is addressing the seed concern.

@vigsterkr @karlnapf The commit mentioned by @iglesias does seem to be correctly setting the seed. I was able to reproduce correct seed behavior by this code snippet:

auto rng_check_float = [](auto elem) {
	SGVector<decltype(elem)> sg_vec(50);
	sg_rand->set_seed(200);
	sg_vec.random();
	for (int i = 0; i < sg_vec.size(); i++)
	{
		std::cout << sg_vec[i] << " ";
		EXPECT_LE(0.0, sg_vec[i]);
		EXPECT_GT(1.0, sg_vec[i]);
	}
	std::cout << std::endl;
};
rng_check_float(float32_t{0.0});
// Seed 0 always produces: 0.437919 0.339959 0.733059 0.113451 0.624103 0.895501 0.638084 0.84145 0.608526 0.546932 0.620833 0.611356 ...
// Seed 200 always produces: 0.917502 0.500624 0.971984 0.736531 0.613905 0.559259 0.781844 0.910769 0.432283 0.646343 0.689275
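The property being checked here (same seed, same sequence) can also be expressed as an assertion rather than by eyeballing printed values. The sketch below uses `std::mt19937` as a stand-in for `sg_rand`, and `draw_uniform` is a hypothetical helper, so the numbers it produces will not match the ones listed above:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: n uniform draws from [0, 1) for a given seed.
// std::mt19937 is only a stand-in for Shogun's sg_rand.
inline std::vector<double> draw_uniform(unsigned seed, std::size_t n)
{
	std::mt19937 engine(seed);
	std::uniform_real_distribution<double> dist(0.0, 1.0);
	std::vector<double> out;
	out.reserve(n);
	for (std::size_t i = 0; i < n; ++i)
		out.push_back(dist(engine));
	return out;
}
```

Comparing `draw_uniform(200, 8)` against a second call with the same seed, and against `draw_uniform(0, 8)`, gives a reproducibility check that could be adapted into the unit test.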

karlnapf · 2019-02-05T17:23:03Z

@saatvikshah1994 thanks for the ping. The point besides the seed is that the whole design of the random class in shogun is flawed. This is why Viktor mentioned not to put any more effort into it. That said, we could of course still use your updates until we do a change, but the second point is that the uniform random number generator is not really used within shogun. This doesn't mean the PR is not useful. Quite the opposite: it is a really nice demonstration of how to gain speedups with vectorization as well as of better design patterns, so it is very useful for future reference. As said, for now, check the feature branch on how we can improve the random number generation in shogun in general. We should get the ideas that you developed in this PR into the feature branch and then put the effort into pushing that forward rather than tuning deprecated designs.

stale · 2020-02-26T15:52:25Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

gf712 · 2020-02-26T16:39:17Z

@vigsterkr is this equivalent to what @theartful did?

stale · 2020-08-24T17:19:01Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

karlnapf · 2020-08-28T14:16:23Z

Closing as it is outdated, and we also had some discussions in the past about whether we want to use this old design.

Vectorized RNG in SGMatrix (424e61e)
saatvikshah added 2 commits December 19, 2018 01:41: SGVector Vectorized RNG (0c59ece), Apply Formatting changes (8c7fe7f)
karlnapf requested a review from vigsterkr December 20, 2018 16:06
karlnapf mentioned this pull request Dec 23, 2018: clean up explicit template specializations using SFINAE #4449 (Open)
Merge remote-tracking branch 'upstream/develop' into feature/vectoriz…ed_rng (e0bd50e)
sg_rand usage for SGVector (6a97846)
saatvikshah mentioned this pull request Feb 4, 2019: Concepts POC #4477 (Closed)
stale bot added the stale label Feb 26, 2020
stale bot removed the stale label Feb 26, 2020
stale bot added the stale label Aug 24, 2020
karlnapf closed this Aug 28, 2020
