New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorized RNG #4437
Vectorized RNG #4437
Conversation
template <typename U> | ||
using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>; | ||
template <typename U = T> | ||
auto random() -> enable_simd_float<U> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separating declaration and definition would need explicit specialization of each template method signature - which would end up being really long. Since the methods are relatively short it seemed ok to keep both in the header file.
The only disadvantage is that an additional include of Math.h has to be added here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is this the same as writing:
template <typename T>
typename std::enable_if_t<is_any<T, float64_t>::value, void>>
random()
{
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - Actually 3 versions are possible which mean the same:
- https://coliru.stacked-crooked.com/a/bc08035924d3582e
- https://coliru.stacked-crooked.com/a/7cc61a4adcdd5ded
- https://coliru.stacked-crooked.com/a/45dba741ddf2d4a0
Mine is a slightly more readable version of (2) :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK! It is more readable indeed! Why do you need the template <typename U = T>
though? Doesn't template <typename T>
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SFINAE only works on templated functions. A method inside a template class doesnt count. The function itself has to also be templated. As mentioned in CPPReference This rule applies during overload resolution of function templates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw I wrote up a short blog post on this topic - more as a point of reference but here it is in case anyone is interested :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true! Thanks, I'll have a look!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool!
@@ -1193,6 +1192,7 @@ SGVector<T> SGMatrix<T>::get_diagonal_vector() const | |||
return diag; | |||
} | |||
|
|||
// Explicit Specialization of Templates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unneeded - will revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! @karlnapf It might be worth putting the alias definitions in another file and then reuse them no? I am assuming these are the same for the random number generation of the other containers?
src/shogun/lib/SGMatrix.h
Outdated
{ | ||
CRandom r{}; | ||
r.fill_array_co( | ||
static_cast<float64_t*>(matrix), num_rows * num_cols); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you could write static_cast<U*>(matrix), num_rows * num_cols)
instead, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I could. Will do so!
Yea I wanted to get the Thumbs up on this version. Adding it for SGVector
shouldnt take too long on top of this. Dont think there's any other container right?
Also the type trait is_any
<- I think this should go in some other file where it can be reused. Maybe common.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, you will need @karlnapf to get back to you on that one though. Btw I think you will need to rename is_any
, because there is an Any class in shogun, which might cause some confusion. Is is_any_of
too verbose?
Btw if you always use the value of is_any
, instead of its type, you could write a is_any_v
as a short cut to is_any<...>::value
. I think that is more conforming with the direction in which c++ is going. (see for example std::is_void_v)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the rename...
Actually, we might even go explicit for now as as there are not THAT many combinations...I like it on the other hand, so could also keep it as a generic tool within the lib (under a different name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/shogun/lib/SGMatrix.h
Outdated
// Checks if any one of the types matches | ||
// Ref: https://stackoverflow.com/a/17032517/3656081 | ||
template <typename T, typename... Rest> | ||
struct is_any : std::false_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether this is the best place to put this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth having a file with all the utility type_traits, no? Something like sg_type_traits.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep!
src/shogun/lib/SGMatrix.h
Outdated
**/ | ||
// vectorized available | ||
template <typename U> | ||
using enable_simd_float = std::enable_if_t<is_any<U, float64_t>::value>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cute!
I am wondering whether we should maybe move all the template magic out of SG* since that is supposed to only be a wrapper class. Maybe Random itself.... or linalg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should stay here - Ive noticed that macros have been used at come places to replicate this functionality which imho is a bad practice. eg. here. Keeping such examples would help push towards using SFINAE/tag dispatch for overload resolution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally right!
If you want to have a stab at doing some of this in a better way, feel free to submit a draft PR for discussing it! Fixing the equals would be a start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I could look up and try to submit a few more once this is done.
src/shogun/lib/SGMatrix.h
Outdated
CRandom r{}; | ||
// Casting floats upwards or downwards is safe | ||
// Ref: https://stackoverflow.com/a/36840390/3656081 | ||
for (index_t i = 0; i < num_rows * num_cols; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::for_each ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::for_each
wont work with a stateful lambda which would be needed here [&r]{...r.random...}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use std::transform
?
std::transform(matrix, matrix+(num_rows * num_cols), matrix, [&r](auto a){return r.random_half_open();});
Not sure if this causes a performance penalty though.. I think the point of using STL over loops is that we are telling the compiler that we are never going to do something weird in the loop and break out of it. You could benchmark this and check the performance..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried this out on my machine, and both methods seem to take the same amount of time, I am not sure there is a way to speed up either methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And btw, I wouldn't recommend this, but you could do this with std::for_each
:
std::for_each(matrix, matrix+N,
[&r, &matrix](auto a){
static int i = 0;
matrix[i] = r.random_half_open();
++i;
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if you're not a fan of static members in lambdas you could do this:
auto lambda = [&r, &matrix, i=0](auto a) mutable {
matrix[i] = r.random_half_open();
++i;
};
std::for_each(matrix, matrix+N, lambda);
In any case, I think std::transform
is the best STL algorithm for this case.
Btw, I don't know if you know this, but you can capture class members with a lambda like this &matrix=matrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any thoughts on this @saatvikshah1994 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, std::transform is indeed equally fast(or even faster at times especially at lower optimization levels). This is done with std::transform
- the for_each
version looked a bit hacky :/
src/shogun/lib/SGMatrix.h
Outdated
} | ||
} | ||
|
||
// disabled for the remaining - will throw no matching fn call compile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
TEST(SGMatrixTest, vectorized_rng) | ||
{ | ||
auto rng_check_float = [](auto elem) { | ||
SGMatrix<decltype(elem)> sg_mat(10, 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would there be a away to check it against the individual calling of random with some seed fixing magic?
If that's a pain, dont bother, this test is more to check that no memory errors happen I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a bit of a chore to implement - especially because I'd have to based on the type use the random_half_open()
vs. fill_array_oc
version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah don't worry then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice! :)
I put some points that should be discussed in order to make this a bit more maintainable. I love bringing modern C++ to shogun!
Just curious did you benchmark any of this? We have a benchmark folder where programs for such things can be added using a google framework
@karlnapf I've added the task to benchmark - definitely intend to and will look into it after adding the |
src/shogun/lib/SGMatrix.h
Outdated
template <typename U = T> | ||
auto random() -> enable_simd_float<U> | ||
{ | ||
CRandom r{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing: we usually allocate sgobjects using new
for several reasons. It is a style thing but let's adopt that here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm, sure, I was trying following the CPP Core guideline on this.
Also I guess instead of new
I would need to std::make_unique
since we want to avoid SG_UNREF
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also curious - why arent SG* objects implemented like STL style containers which have an inbuilt allocator(eg. like std::vector
), thus avoiding the use of new
. I'm not exactly sure about the complexity involved with that approach. Just curious as I imagine it would make API users especially for oft used containers like SGMatrix
quite happy I presume.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice questions, @saatvikshah1994 :-)
I'd say the actual reasons are mostly due to history. Shogun has been around for long time, longer than what we consider today modern C++. Also, I personally think the focus and interests have also moved a bit from "hardcore" machine learning programming to software engineering.
It is of course always good to follow the latest trends and guidelines of C++. I've found however that as a project grows and gets bigger such redesigns and/or refactorings become too time consuming without having a clear value in terms of bringing new features. Technical debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About your comment on SGMatrix
's API. Note that the main SG* guys, SGMatrix
and SGVector
are referenced data, so there's no need for handmade memory management with those ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the initialization of CRandom
that Heiko mentioned. It has to do with objects residing in the stack vs the heap. The rule-of-thumb I started using at some moment was that any SGObject
has to be heap-allocated.
I forgot the exact technical reason. Stepping through a minimal program with gdb could help understanding. But anyway, very vaguely from what I can remember: it had to do with the reference counting mechanism taking place as scopes end.
Anyhow, I think should be able to follow the CPP core guidelines for this initialization: just wrap your SGObject
inside a shogun::Some
(aka std::shared_ptr
).
src/shogun/lib/sg_type_traits.h
Outdated
|
||
template <typename T, typename First, typename... Rest> | ||
struct is_any_of<T, First, Rest...> | ||
: std::integral_constant<bool, std::is_same<T, First>::value || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw I think I defined is_sg_base
somewhere...we could move that into this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should definitely be grouped a bit, did you have a look at the is_sg_base
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, ive moved it over
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost mergable from my side.
I would love to get @vigsterkr or @lisitsyn comments here before we do this
And maybe we can start cleaning up macro definitions that were done pre c++11 as a side/next step.
@iglesias maybe you also have comments |
src/shogun/lib/SGMatrix.h
Outdated
**/ | ||
// vectorized available | ||
template <typename U> | ||
using enable_simd_float = std::enable_if_t<is_any_of_v<U, float64_t>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, std::is_same_v
would make it here as well instead of is_any_of_v
, yes? No need to update anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep it would, I just kept it for consistency. Also is_same_v
isnt part of STL till C++17. So I would have to add that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a sg_is_same_v
for now and have used that. Guess it might be useful at other places too and can be replaced by the STL equivalent when you'll decide to bump up the C++ version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thx for that, maybe (if not yet done) put a comment that indicated that this can be removed/replaced
Nice work @saatvikshah1994, thank you. I was wondering, would you like to experiment with concepts to implement what you have done here with traits and sfinae in sgvector and sgmatrix? If you're interested we can discuss further what would be the possible advantages and gains. This could be done in a separate branch and discussed in another pull request. |
Will merge this tomorrow if no more input from @lisitsyn @vigsterkr thx! |
Hold your horses plz needs some changes....
… On 4 Jan 2019, at 17:22, Heiko Strathmann ***@***.***> wrote:
Will merge this tomorrow if no more input from @lisitsyn @vigsterkr thx!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Sure, I'm game! Do you want me to create an issue and we can discuss there? Or on IRC first? |
@iglesias the column explanations can be found here: google/benchmark#397 (comment) |
Really like the results table! Well well done! So useful and a great example of how we can make things faster with some clever thinking about low hanging fruits. |
Actually, I was blocked on Google benchmark not building on Mac/Ubuntu. This has been resolved and hence the analysis table above. The sanitizer problem is a bit tangential and Ive worked around it by using a VM/ @gf712 's docker image. I think this PR is ready to go unless there are some pending comments? |
Yes well there are @vigsterkr comments about the seed. Was there any discussion regarding that? I am not updated to the latest I guess |
I didn’t see an explanation there about the two results. Maybe I missed
something.
Anyhow, it is not important.
…On Sat, 12 Jan 2019 at 11:36, Saatvik Shah ***@***.***> wrote:
@iglesias <https://github.com/iglesias> the column explanations can be
found here: google/benchmark#397 (comment)
<google/benchmark#397 (comment)>
And yes, theres a fix needed in the bmark - I should be multiplying by
sizeof(float64_t) in the SetBytesProcessed call everywhere.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#4437 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABGrdkfFt8dUOlvk4Jqut4cnWZaKbf1jks5vCbqzgaJpZM4ZU5h7>
.
|
The two time* results.
On Mon, 14 Jan 2019 at 13:21, Fernando J. Iglesias García <
fernando.iglesiasg@gmail.com> wrote:
… I didn’t see an explanation there about the two results. Maybe I missed
something.
Anyhow, it is not important.
On Sat, 12 Jan 2019 at 11:36, Saatvik Shah ***@***.***>
wrote:
> @iglesias <https://github.com/iglesias> the column explanations can be
> found here: google/benchmark#397 (comment)
> <google/benchmark#397 (comment)>
> And yes, theres a fix needed in the bmark - I should be multiplying by
> sizeof(float64_t) in the SetBytesProcessed call everywhere.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <#4437 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ABGrdkfFt8dUOlvk4Jqut4cnWZaKbf1jks5vCbqzgaJpZM4ZU5h7>
> .
>
|
@iglesias had linked to an answer here by the Google benchmark devs about the time columns. In summary:
Looking at (1) and (2) we can determine where we are IO bound and where CPU bound. And yes, what you've written about the last column is correct. |
Hi! |
About the seed, I think that was already solved in the latest commit in the PR: 6a97846. See also comments above from @saatvikshah1994 saying it was done. |
Ok let's merge this then :) I will wait one more day and then do it if nobody else complains @lisitsyn @vigsterkr I think cool next steps would be:
EDIT: actually reliased the randn is not yet included so I guess there will be little usage in loops as it is now ... We can do this once the normal random is in |
@karlnapf i wouldnt put more effort into this for the time being as we wanna drop some of the PRNG stuff as they are flawed by design atm. see |
I guess the effort is already done? So this is still better than what we have atm. We could stop pushing this forward then of course. But I think can still go in as an example of the overload dispatching? |
a) where is this code being used? meaning the SGVector/Matrix.random() |
Alright, agreed actually |
@saatvikshah1994 maybe have a look at the feature branch to get a feeling for what @vigsterkr is talking about |
Sure, I'll take a look |
@vigsterkr @karlnapf please guys have a look at the commit 6a97846. I think that one is addressing the seed concern. |
@vigsterkr @karlnapf The commit mentioned by @iglesias does seem to be correctly setting the seed. I was able to reproduce correct seed behavior by this code snippet: auto rng_check_float = [](auto elem) {
SGVector<decltype(elem)> sg_vec(50);
sg_rand->set_seed(200);
sg_vec.random();
for (int i = 0; i < sg_vec.size(); i++)
{
std::cout << sg_vec[i] << " ";
EXPECT_LE(0.0, sg_vec[i]);
EXPECT_GT(1.0, sg_vec[i]);
}
std::cout << std::endl;
};
rng_check_float(float32_t{0.0});
// Seed 0 always produces: 0.437919 0.339959 0.733059 0.113451 0.624103 0.895501 0.638084 0.84145 0.608526 0.546932 0.620833 0.611356 ...
// Seed 200 always produces: 0.917502 0.500624 0.971984 0.736531 0.613905 0.559259 0.781844 0.910769 0.432283 0.646343 0.689275 |
@saatvikshah1994 thanks for the ping. The point beside the sees is that the whole design of the random class in shogun is flawed. This is why Viktor mentioned to not put any more effort into it. That said, we could of course still use your updates until we do a change, but the second point is that the uniform random number generator is not really used within shogun. This doesn't mean the PR is not useful. Quite the opposite is true, it is a really nice demonstration on how to gain speedups with vectorization as well as using better design patterns, so it is very useful for future reference. As said, for now, check the feature branch on how we can improve the random number generation in shogun in general. We should get the same ideas that you developed you in this PR into the feature branch and then put the efforts into pushing this forward rather than tuning deprecated designs. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@vigsterkr is this equivalent to what @theartful did? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Closing due as it is outdated and also we had some discussions whether we want to use this old design in the past |
Addresses #4430
Only adds Uniform Random capability for now:
Edit: Will add Normal random in a different PR.