Skip to content
This repository has been archived by the owner on Dec 22, 2018. It is now read-only.

stat/distmv: nil checks for sigma race against setSigma in Normal and Student's T #163

Closed
btracey opened this issue Feb 22, 2017 · 9 comments

Comments

@btracey
Copy link
Member

btracey commented Feb 22, 2017

We protect setSigma with a once so it only gets set once, but this still races against checks that it is non-nil, for example in normal.ConditionNormalSingle

@kortschak
Copy link
Member

We don't say that it is safe for concurrent use do we? Since we don't, it's actually probably overkill to use the sync.Once; we could just rewrite setSigma as

func (n *Normal) setSigma() {
	if n.sigma != nil {
		return
	}
	n.sigma = mat64.NewSymDense(n.Dim(), nil)
	n.sigma.FromCholesky(&n.chol)
}

and

func (s *StudentsT) setSigma() {
	if s.sigma != nil {
		return
	}
	s.sigma = mat64.NewSymDense(s.Dim(), nil)
	s.sigma.FromCholesky(&s.chol)
}

@btracey
Copy link
Member Author

btracey commented Feb 22, 2017

We don't say that, but it should all be save for concurrent usage. One of Go's strengths is concurrency, and we should work with it as much as possible. I came across this bug while trying to use these methods concurrently, for instance.

@kortschak
Copy link
Member

The general practice is to not make thing safe for concurrent use unless specifically designed for that. Here if a Normal or StudentsT is intended to be used concurrently the client can either wrap it in a struct with a mutex, or pass it through a channel.

@btracey
Copy link
Member Author

btracey commented Feb 22, 2017 via email

@kortschak
Copy link
Member

The standard practice in std is that things are not safe for concurrent use unless marked as such. The rationale for this is that mostly things are either not used concurrently, or are used concurrently under a system of user control of synchronisation. Adding synchronisation to types in the first case would mean that you may end up with unnecessary synchronisation and in the second case doubled layers of synchronisation, both resulting in an unnecessary performance hit. In the standard library types that are very likely to be used concurrently, or are difficult to protect in a useful way have sychronisation built in.

@kortschak
Copy link
Member

Random number generators, computing probabilities, these are things that one might want to do many of using the same probability distribution, and so they are rightly thread safe.

This is not true. The rand.Func functions are thread safe, but that is done by wrapping the global rand.Rand variable in a locked wrapper, var globalRand = New(&lockedSource{src: NewSource(1).(Source64)}). This is done for the very reasons I describe above; the rand.Rand types are not protected since that would in most cases cause an unnecessary performance hit, hence the existence of rand.lockedSource.

@btracey
Copy link
Member Author

btracey commented Feb 22, 2017

I have never run into an issue with a race condition from calling a standard library function concurrent when I thought there shouldn't be any concurrency issues.

The rand.Rand types aren't concurrent, but if you know how random number generators work, you know that they have to have some state that gets modified when you call one of the methods. If the user supplies their own random number generator to Normal, then of course Rand is not going to be thread safe unless the random number generator is also thread safe.

The performance hit of not making Marginal thread safe is likely much higher than making it thread safe. Rather than generating one *Normal, I now would have to copy the mean and cholesky decomposition many times. That's much higher than checking a read lock that can only be locked once.

Right now, in all of Gonum, it's easy to predict when code is thread safe, and when it is not. This is a great property, and not one we should break.

@btracey
Copy link
Member Author

btracey commented Feb 22, 2017

I also opened golang/go#19245 which would be an efficient fix to this issue.

@kortschak
Copy link
Member

kortschak commented Feb 22, 2017

The performance hit of not making Marginal thread safe is likely much higher than making it thread safe. Rather than generating one *Normal, I now would have to copy the mean and cholesky decomposition many times. That's much higher than checking a read lock that can only be locked once.

Not really,

type lockedNormal struct {
    mu sync.Mutex
    n  *distmv.Normal
}

// relevant methods for client, e.g....
func (n *lockedNormal) Rand(x []float64) []float64 {
    n.mu.Lock()
    defer n.mu.Unlock()
    return n.n.Rand(x)
}

The setup of the Normal needs only be done once and does not need protection because of that.

Right now, in all of Gonum, it's easy to predict when code is thread safe, and when it is not. This is a great property, and not one we should break.

I think this is not a safe assumption to make. There should be no need to predict, the documentation should say when things are thread safe.

Having said all this, it is clear that in Normal and StudentsT there should be an accessor for the sigma field that is used instead of direct access. With that the exact semantics of synchronisation (of whatever type) of that field are easily reasoned about.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants