[MRG+2] switch to multinomial composition for mixture sampling #7702
What does this implement/fix? Explain your changes.
This changes the way mixture models construct the composition of new samples. Specifically, any subclass deriving from
Instead of rounding the composition vector from
In addition, this adds tests to ensure that
This may affect scaling for mixture models with a very large number of dimensions, since the multinomial composition draw may be slow. However, this composition draw occurs only once during sampling.
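To illustrate the difference described above, here is a minimal sketch in plain NumPy. It does not reproduce the scikit-learn code touched by this PR; the names `weights`, `n_samples`, `rounded` and `drawn` are made up for the example. It shows that rounding the per-component counts can miss the requested total, while a multinomial draw always sums to it.

```python
import numpy as np

# Hypothetical mixture weights and sample count, chosen only for illustration.
weights = np.array([1 / 3, 1 / 3, 1 / 3])
n_samples = 10

# Rounding approach: each component gets round(weight * n_samples) samples.
# Here every component is rounded down to 3, so one sample is lost.
rounded = np.round(weights * n_samples).astype(int)

# Multinomial approach: draw the per-component counts in a single call.
# The counts are random but always add up to n_samples.
rng = np.random.RandomState(0)
drawn = rng.multinomial(n_samples, weights)

print("rounded:", rounded, "total:", rounded.sum())
print("multinomial:", drawn, "total:", drawn.sum())
```

The rounded total depends on the weights and can be either too small or too large; the multinomial total is exact by construction.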
Any other comments?
None. Thanks for the great package!
This may be a problem, not sure.
This reminds me of the issue we had in StratifiedShuffleSplit #6472, where the samples drawn for each group did not add up to the total number of samples. Although I didn't really understand the details, I believe @amueller used some kind of approximation to avoid random sampling.
It's not a problem. The function
On my computer, for a weights array of shape (1000000,):
%timeit np.random.multinomial(100000000, weights).astype(int)
10 loops, best of 3: 135 ms per loop
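For anyone who wants to repeat this measurement outside IPython, a rough sketch follows. The comment above does not say how `weights` was built, so the normalized random vector here is an assumption, and `time_multinomial` is a hypothetical helper, not part of any library. Timings will vary by machine.

```python
import timeit

import numpy as np


def time_multinomial(n_weights, n_draws, repeats=3, seed=0):
    """Return (counts, best_seconds) for one multinomial composition draw."""
    rng = np.random.RandomState(seed)

    # Assumption: random positive weights, normalized so they sum to 1.
    weights = rng.rand(n_weights)
    weights /= weights.sum()

    # One draw kept for inspection; its counts always sum to n_draws.
    counts = rng.multinomial(n_draws, weights).astype(int)

    # Best of `repeats` single-call timings, similar in spirit to %timeit.
    best_seconds = min(
        timeit.repeat(
            lambda: rng.multinomial(n_draws, weights).astype(int),
            number=1,
            repeat=repeats,
        )
    )
    return counts, best_seconds


# Same sizes as in the comment above: 1,000,000 weights, 100,000,000 draws.
counts, best_seconds = time_multinomial(1000000, 100000000)
print("best of 3: %.1f ms per call" % (best_seconds * 1e3))
```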
Looks like you can only allow squash and merge if you want to:
The settings are available from: https://github.com/scikit-learn/scikit-learn/settings
Should we do that?