Pynbody
is built on top of numpy
, which means that learning how to optimize numpy
array manipulations is the most important route to writing efficient code; see http://scipy-lectures.github.com/advanced/optimizing/index.html for an introduction.
However there are a few issues which are specific to pynbody
. First, a large number of pynbody
's most common operations are parallelized: make sure you have set up these routines to behave in a way that matches your needs by reading the page on threads
.
Other than that, there are some more subtle issues. The most important of these that we have come across is the overheads incurred by using SubSnap
s which are explained below.
To cut a long story short, if your routine does a lot of array access on an object which might be a SubSnap
of a certain flavour (explained further below), you will find that wrapping your code as follows speeds it up.
def my_expensive_operation(sim_or_subsim) :
with sim_or_subsim.immediate_mode :
mass_array = sim_or_subsim['mass']
other_array = sim_or_subsim['other']
# ... get other arrays required...
#
# perform multiple operations on arrays
#
# At end, copy back results if the arrays have
# changed
sim_or_subsim['other'] = other_array
The remainder of this document unpacks what this does and why it should be necessary.
When you construct a SubSnap
, the framework records which particles of the underlying SimSnap
are included and which are not. Thereafter, if you access an array from the SubSnap
, it is constructed in one of two ways.
- If the set of particles indexed by the
SubSnap
is expressable a slice, the arrays constructed are stillnumpy
arrays. This will be the case if you explicitly slice the simulation (e.g.f[2:100:3]
), or if you ask for a specific particle family (e.g.f.dm
).- If the set of particles is not expressable in this way, the arrays constructed are emulating
numpy
arrays and this can become expensive (see below). This will be the cae if you ask for a list of particles (e.g.f[[2,10,15,22]]
) use anumpy
-like indexing trick (e.g.f[f['x']>10]
orf[numpy.where(f['x']>10)]
) or use a :pyfilter <pynbody.filt>
.
In the first case, the optimization is unchanged from the raw SimSnap
case. In the second case, the situation is different. To understand how and why, we need to look at the difference between an indexed and a sliced numpy
array.
This section explains why the reason for the slightly awkward design of IndexedSubArray
. The pynbody
framework requires all sub-arrays to continue pointing to the original data but a simple experiment with numpy shows that it does not enable this behaviour in all cases that we want to cover.
Here's what happens when you use a slice of an existing numpy
array.
In [2]: import numpy as np
In [3]: a = np.zeros(10)
In [4]: b = a[3:5]
In [5]: b[1] = 100
In [6]: a Out[6]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
The a
array has been updated as required, because the b
and a
objects actually point back to the same part of the computer memory.
On the other hand, when you index a numpy
array, the behaviour is different.
In [7]: c = a[[4,5,6]]
In [8]: c[1] = 200
In [9]: a Out[9]: array([ 0., 0., 0., 0., 100., 0., 0., 0., 0., 0.])
Here changing c
has not updated a
. That's because the construction of c
actually copied the relevant data instead of just pointing back at it. This is necessitated by the underlying design of numpy
arrays requiring the data to lie in a predictable pattern in the memory.
The IndexedSubArray
class fixes this problem:
Note that a
has been updated correctly. This is achieved by the IndexedSimArray
emulating, rather than wrapping, a numpy
array; internally the syntax d[1]=200
is then translated into a[[4,5,6][1]]=200
.
The cost of this is that each time you call a function that requires a numpy
array as an input, the IndexedSimArray
has to generate a proxy for this purpose. This can become slow. Have a look at the following timings:
Adding to the subarray is slower than adding to the entire array! This is because of the overheads of continually constructing proxy numpy
arrays to pass to the __add__
method.
We should emphasize that the example above is quite contrived, since it forces re-construction of the numpy
proxy 10000 times. In user code, the number of numpy
proxies that have to be constructed will be vastly smaller, so the fractional overheads are normally quite small.
Nonetheless, it does sometimes become a problem for performance-critical code. For that reason, it's possible to avoid constructing IndexedSimArray
s altogether and force only numpy
arrays to be returned. This means you must take responsibility for understanding which operations copy, as opposed to referencing, data.
This is known as immediate mode
and is activated using python's with
mechanism. Let's create a test snapshot and a subview into that snapshot to try it out.
Under normal conditions, the type of arrays returned from sub_f
is IndexedSimArray
. Updating one of these arrays will transparently update the main snapshot.
Conversely, in immediate mode
, the type of arrays returned from sub_f
is SimArray
(so just a wrapper round a real numpy
array). But updating that returned numpy
array has no effect on the parent snapshot.
So it becomes your responsibility to copy the results back in this case, if required. A template for performance critical code which might be operating on a SubSnap
follows.
Note
with f_sub.immediate_mode
is equivalent to with f.immediate_mode
where f_sub
is any SubSnap
of f
.
So in summary, the template code at the start of this document advocates:
- storing a copy of the data for the subset of particles
- working on the copy
- (if necessary) updating the main snapshot data explicitly before returning
Note
This information is provided for interest. We have never come across a realistic use case where the following is necessary.
In pynbody
, arrays are implemented by the class ~pynbody.array.SimArray
. This is a wrapper around a numpy
array. There is a small extra cost associated with every operation to allow units to be matched and updated. For long arrays such as those found in typical simulations, this is usually a tiny fraction of the actual computation time. We have never found it to be a problem, but if you want to disable the unit tracking you can always do so using numpy
's view mechanism to get a raw numpy
array. Suppose you have a SimSnap
f
; then pos = f['pos'].view(numpy.ndarray)
(for example) will return the position array without any of the SimArray
wrapping. The new pos
variable can be manipulated without any unit handling code being called.