Known Python Issues

megies edited this page Oct 13, 2012 · 5 revisions

Performance Issues

np.arange numerically unstable

For floating point numbers and given increment numpy's np.arange is dangerous / numerical unstable (further reading, or #395). Use np.linspace instead.

Allocating numpy arrays

Allocation of numpy arrays is much faster using the numpy.empty instead of the numpy.zeros function. The by far worst method is to use the numpy.array function on an list object. For lists the data type must be stored for each element, in contrast to numpy arrays where the data type is stored only once for the whole array. The following ipython lines show some performance tests. The differences in speed are in the order of 100.

In [1]: import numpy as np
In [2]: N = int(1e7)
In [3]: %timeit np.empty(N)
100000 loops, best of 3: 12.2 µs per loop
In [4]: %timeit np.zeros(N)
10 loops, best of 3: 70.4 ms per loop

In [5]: %timeit x = np.empty(N); x[:] = np.NaN
10 loops, best of 3: 86.1 ms per loop
In [6]: %timeit x = np.array([np.NaN]*N)
10 loops, best of 3: 1.57 s per loop

NumPy vs. math

numpy is designed for vector operations. Thus for each expression it needs to check whether the argument is a vector or not. This consumes time and is one of the reasons that the library math is faster for non-vector arguments.

In [1]: import math as M
In [2]: import numpy as np
In [3]: %timeit M.floor(3.5)
1000000 loops, best of 3: 396 ns per loop
In [4]: %timeit np.floor(3.5)
100000 loops, best of 3: 3.72 us per loop

Note: math cannot handle vectors

CTypes, Tips and Tricks

  • NULL pointers in ctypes are generally of None type. However sometimes the C functions need directly given addresses. In such cases a NULL pointer to an e.g. integer must be passed with the construct ctypes.POINTER(ctypes.c_int)()
  • For cross platform compilation of C code the #ifdef statement can be really useful. To see all available variables for #ifdef type in a shell
echo "" | gcc -E -dM -c -

64 Bit Platforms

In order to compile the C extensions on 64bit platforms add the compiler option -m64 -fPIC. The data types on 64bit platforms have a different type. Avoid using the type long in Python, better use int32, int64 or for variable size int. long changes a lot between 32bit and 64bit platforms. [http://www.unix.org/whitepapers/64bit.html Read more ...]

Formats floats in a fixed exponential format

Different operation systems are delivering different output for the exponential format of floats. Here we ensure to deliver in a for SEED valid format independent of the OS. For speed issues we simple cut any number ending with E+0XX or E-0XX down to E+XX or E-XX. This fails for numbers XX>99, but should not occur, because the SEED standard does not allow this values either.

Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) 
[MSC v.1310 32 bit (Intel)] on win32
>>> '%E' % 2.5
'2.500000E+000'
Python 2.5.2 (r252:60911, Apr  2 2008, 18:38:52)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
>>> '%E' % 2.5
'2.500000E+00'

Default Parameter Values in Python

Python’s handling of default parameter values is one of a few things that tends to trip up most new Python programmers (but usually only once).

What causes the confusion is the behaviour you get when you use a “mutable” object as a default value; that is, a value that can be modified in place, like a list or a dictionary.

An example:

>>> def function(data=[]):
...     data.append(1)
...     return data
...
>>> function()
[1]
>>> function()
[1, 1]
>>> function()
[1, 1, 1]

Read more ...

Calculating Micro Seconds

Micro seconds cannot be calculated by the modulo operator, as the modulo operator returns only positive results.

An example:

>>> -0.5 % 1
0.5
>>> -0.2 % 1
0.80000000000000004

A preferred way is to use the modf function:

>>> from math import modf
>>> sec = 1.5
>>> msec, dsec = modf(sec)
>>> msec *= 1000
>>> print dsec, msec
1 500.0

Writing and Reading Sampling Intervals

The following example shows that the effect of casting (of the sampling interval 0.01) can be circumvented by dividing with a number which got the same casting effect.

>>> import numpy as np
>>> np.float32(0.01)
0.0099999998
>>> 1.0 / np.float32(0.01)
100.00000223517424
>>> np.float32(1.0) / np.float32(0.01)
100.0

floor vs int

Just to avoid problems with negative numbers:

>>> import math as M
>>> M.floor(3.5)
3.0
>>> int(3.5)
3
>>> M.floor(-3.5)
-4.0
>>> int(-3.5)
-3

Comma & Dot; Locale Settings

A known problem are locale settings so that the Python shell uses comma instead of dot as decimal separator. In this case the ctypes library could cause problems (Read more ...). As soon as this problem occurs with !ObsPy please let us know.

Writing Data from Numpy Arrays after Indexing Operations (#192, #193)

Using convenient indexing operations on Numpy ndarrays can lead to problems when writing data via external functions with ctypes (C/Fortran). Consider the following example...

import numpy as np
from obspy.core import read, Trace

x = np.arange(10)
y = x[:5]
z = x[::2]
tr1 = Trace(data=y)
tr2 = Trace(data=z)
print tr1.data
print tr2.data

...which shows that in Python the data of Traces 1 and 2 is:

[0 1 2 3 4]
[0 2 4 6 8]

But after writing and reading the data again...

tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]
tr2 = read("/tmp/tr2.tmp")[0]
print tr1.data
print tr2.data

...it is obvious that there was a problem with using the reference to the original data array. During the write operation not the correct data got written to file:

[0 1 2 3 4]
[0 1 2 3 4]

This can be avoided by ensuring that the data is C-contiguous. Use either np.require(data, dtype=data.dtype, requirements='C_CONTIGUOUS') or np.ascontiguousarray(data) for this purpose:

z_safe = np.ascontiguousarray(z)
# or: z_safe = np.require(z, dtype=z.dtype, requirements='C_CONTIGUOUS')
tr1 = Trace(data=y)
tr2 = Trace(data=z_safe)
tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]
tr2 = read("/tmp/tr2.tmp")[0]
print tr1.data
print tr2.data

which gives the expected result:

[0 1 2 3 4]
[0 2 4 6 8]

Whether the data is safe for operations with ctypes or not can be checked looking at the ndarray flags:

print y.flags
print z.flags
print z_safe.flags
True
False
True

To summarize: When data arrays are created via e.g. indexing operations on other arrays it should be checked if the correct data get passed on during ctypes calls. Also refer to bugs #192 and #193.

Floating Point Arithmetic Issues