Known Python Issues
np.arange numerically unstable
For floating point numbers and a given increment, numpy's
np.arange is dangerous and numerically unstable (further reading, or #395). Use
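A minimal sketch of the instability. The start, stop and step values are illustrative, and np.linspace is only one commonly used alternative, chosen here for the example. Because (1.3 - 1.0) / 0.1 evaluates to slightly more than 3 in binary floating point, np.arange produces one element more than intended:

```python
import numpy as np

# the stop value 1.3 should be excluded, but rounding makes
# (1.3 - 1.0) / 0.1 slightly larger than 3, so a 4th element appears
a = np.arange(1.0, 1.3, 0.1)
print(len(a), a)

# np.linspace takes the number of samples instead of a float step,
# so the length is exact and the endpoint is hit exactly
b = np.linspace(1.0, 1.2, 3)
print(len(b), b)
```

Specifying the number of samples avoids having the array length depend on how the step size rounds.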
Allocating numpy arrays
Allocating numpy arrays is much faster with the
numpy.empty function than with the
numpy.zeros function, because numpy.empty does not initialize the memory (every element must therefore be written before it is read). By far the worst method is to call the
numpy.array function on a list object. A list stores a full Python object, including its type, for each element, whereas a numpy array stores the data type only once for the whole array. The following
IPython lines show some performance tests; the timings differ by several orders of magnitude.
In [1]: import numpy as np
In [2]: N = int(1e7)
In [3]: %timeit np.empty(N)
100000 loops, best of 3: 12.2 µs per loop
In [4]: %timeit np.zeros(N)
10 loops, best of 3: 70.4 ms per loop
In [5]: %timeit x = np.empty(N); x[:] = np.nan
10 loops, best of 3: 86.1 ms per loop
In [6]: %timeit x = np.array([np.nan]*N)
10 loops, best of 3: 1.57 s per loop
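The same comparison can be reproduced as a plain script with the timeit module. This is a sketch: N is reduced to keep the run short, the function names are hypothetical, and np.full is an extra variant not mentioned above. Absolute timings will depend on the machine and NumPy version:

```python
import timeit

import numpy as np

N = 10**6  # smaller than the 1e7 above to keep the run short

def via_empty_fill():
    # np.empty only reserves memory; fill() then writes every element
    x = np.empty(N)
    x.fill(np.nan)
    return x

def via_full():
    # np.full allocates and initialises in a single call
    return np.full(N, np.nan)

def via_list():
    # builds a list of N references to a Python float first, then converts it
    return np.array([np.nan] * N)

for func in (via_empty_fill, via_full, via_list):
    seconds = timeit.timeit(func, number=5) / 5
    print(f"{func.__name__:15s} {seconds * 1e3:8.2f} ms per call")
```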
NumPy vs. math
NumPy is designed for vector operations, so for every call it has to check whether the argument is an array or a scalar. This overhead is one of the reasons why the math module is faster for scalar arguments.
In [1]: import math as M
In [2]: import numpy as np
In [3]: %timeit M.floor(3.5)
1000000 loops, best of 3: 396 ns per loop
In [4]: %timeit np.floor(3.5)
100000 loops, best of 3: 3.72 us per loop
Note: the math functions cannot handle vectors (arrays), only scalars.
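A small sketch of the trade-off, with illustrative values: math wins for single scalars, while for many values NumPy processes the whole array in one call and math needs a Python loop. Timings are machine dependent:

```python
import math
import timeit

import numpy as np

values = [0.5 * i - 10.0 for i in range(40)]  # plain Python floats
array = np.array(values)

# scalar calls: math.floor skips NumPy's array handling overhead
t_math = timeit.timeit(lambda: math.floor(3.5), number=100_000)
t_np = timeit.timeit(lambda: np.floor(3.5), number=100_000)
print(f"math.floor: {t_math:.3f} s, np.floor: {t_np:.3f} s per 100000 calls")

# many values: math needs an explicit loop, NumPy handles the array at once
floored_math = [math.floor(v) for v in values]
floored_np = np.floor(array)
print(floored_math[:5], floored_np[:5])
```

Note that in Python 3 math.floor returns an int, while np.floor keeps the float dtype.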
CTypes, Tips and Tricks
- NULL pointers in ctypes are generally represented by
None (NoneType). However, sometimes a C function needs an explicitly given address. In such cases a NULL pointer to, e.g., an integer must be passed with the construct
- For cross-platform compilation of C code the
#ifdef statement can be really useful. To see all predefined macros available for
#ifdef, type in a shell:
echo "" | gcc -E -dM -c -
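For the NULL pointer tip in the list above, here is a minimal sketch. It assumes ctypes.POINTER(ctypes.c_int) as the pointer type; the exact construct used originally is not preserved here. Calling a ctypes pointer type without arguments creates a typed NULL pointer, and NULL pointers are falsy:

```python
import ctypes

IntPtr = ctypes.POINTER(ctypes.c_int)

null_ptr = IntPtr()                # typed NULL pointer, like: int *p = NULL;
value = ctypes.c_int(42)
valid_ptr = ctypes.pointer(value)  # like: int *p = &value;

print(bool(null_ptr), bool(valid_ptr))  # NULL pointers have a False boolean value
print(valid_ptr.contents.value)
```

Unlike a bare None, such a pointer carries its type, so it can be passed where a function's argtypes declares an int pointer.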
64 Bit Platforms
To compile the C extensions on 64-bit platforms, add the compiler options
-m64 -fPIC. Some data types have a different size on 64-bit platforms. Avoid using the type
long in Python, better use
int64 or for variable size
The size of long differs between platforms: it is 32 bits on 32-bit systems and on 64-bit Windows, but 64 bits on most 64-bit Unix-like systems.
Read more: http://www.unix.org/whitepapers/64bit.html
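A quick way to inspect the type sizes on the current platform is to ask ctypes and NumPy directly. This sketch only reports sizes; np.intp is included as one pointer-sized integer type and is our choice, not necessarily the one meant above:

```python
import ctypes

import numpy as np

sizes = {
    "c_long": ctypes.sizeof(ctypes.c_long),      # 4 or 8 bytes, platform dependent
    "c_int64": ctypes.sizeof(ctypes.c_int64),    # always 8 bytes
    "c_void_p": ctypes.sizeof(ctypes.c_void_p),  # pointer size: 4 or 8 bytes
    "np.int64": np.dtype(np.int64).itemsize,     # always 8 bytes
    "np.intp": np.dtype(np.intp).itemsize,       # integer as wide as a pointer
}
for name, nbytes in sizes.items():
    print(f"{name:9s} {nbytes} bytes")
```

Running it on a 64-bit Linux machine and on 64-bit Windows shows the different size of long.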
Formatting floats in a fixed exponential format
Different operating systems deliver different output for the exponential format of floats, as the two sessions below show. Here we ensure a SEED-valid format independent of the OS. For speed reasons we simply cut any number ending with E+0XX or E-0XX down to E+XX or E-XX. This fails for exponents XX > 99, but these should not occur because the SEED standard does not allow such values either.
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32
>>> '%E' % 2.5
'2.500000E+000'
Python 2.5.2 (r252:60911, Apr 2 2008, 18:38:52) [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
>>> '%E' % 2.5
'2.500000E+00'
Default Parameter Values in Python
Python’s handling of default parameter values is one of a few things that tends to trip up most new Python programmers (but usually only once).
What causes the confusion is the behaviour you get when you use a “mutable” object as a default value; that is, a value that can be modified in place, like a list or a dictionary.
>>> def function(data=[]):
...     data.append(1)
...     return data
...
>>> function()
[1]
>>> function()
[1, 1]
>>> function()
[1, 1, 1]
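The usual way around this is to use None as the default and create the mutable object inside the function. A minimal sketch of that common idiom:

```python
def function(data=None):
    # None is immutable, so a fresh list is created on every call
    if data is None:
        data = []
    data.append(1)
    return data

print(function())     # [1]
print(function())     # [1]
print(function([0]))  # [0, 1]
```

The default value is evaluated only once, when the def statement runs, which is why the list in the session above keeps growing.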
Calculating Microseconds
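Microseconds cannot be calculated with the modulo operator, because for a positive divisor Python's modulo operator always returns a non-negative result, even for negative times:
>>> -0.5 % 1
0.5
>>> -0.2 % 1
0.8
A preferred way is to use the math.modf function, which returns the fractional and the integer part, both with the sign of the input (the example scales the fraction to milliseconds; multiply by 1e6 instead for microseconds):
>>> from math import modf
>>> sec = 1.5
>>> msec, dsec = modf(sec)
>>> msec *= 1000
>>> print(dsec, msec)
1.0 500.0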
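A small sketch that contrasts the two approaches for negative times. The helper split_seconds is hypothetical; rounding the scaled fraction guards against representation errors such as 499999.99999999994:

```python
import math

def split_seconds(seconds):
    """Hypothetical helper: split a (possibly negative) time in seconds into
    whole seconds and microseconds, both with the sign of the input."""
    frac, whole = math.modf(seconds)
    microseconds = int(round(frac * 1e6))
    return int(whole), microseconds

for t in (1.5, -1.5, -0.2):
    # t % 1 is always non-negative here, modf keeps the sign
    print(t, t % 1, split_seconds(t))
```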
Writing and Reading Sampling Intervals
The following example shows that the rounding error introduced by casting the sampling interval 0.01 to float32 can be circumvented by dividing by a number that has undergone the same cast.
>>> import numpy as np
>>> np.float32(0.01)
0.0099999998
>>> np.float64(1.0) / np.float32(0.01)
100.00000223517424
>>> np.float32(1.0) / np.float32(0.01)
100.0
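The same effect as a script, assuming the interval was stored as float32 (for example in a file header). The names delta, delta32, rate_mixed and rate_same are illustrative:

```python
import numpy as np

delta = 0.01                 # sampling interval as a Python float (float64)
delta32 = np.float32(delta)  # the interval after a round trip through float32

# mixed precision: the float32 rounding error shows up in the float64 result
rate_mixed = np.float64(1.0) / np.float64(delta32)

# same precision: both operands are float32, and the quotient rounds back to 100.0
rate_same = np.float32(1.0) / delta32

print(rate_mixed, rate_same)
```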
floor vs int
math.floor rounds towards negative infinity, while int truncates towards zero, so the two differ for negative numbers:
>>> import math as M
>>> M.floor(3.5)
3
>>> int(3.5)
3
>>> M.floor(-3.5)
-4
>>> int(-3.5)
-3
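The same distinction applies to NumPy arrays: astype(int) truncates like int(), so np.floor has to be applied first when flooring is intended. A short sketch with illustrative values:

```python
import math

import numpy as np

for x in (3.5, -3.5):
    # floor rounds towards -inf, int() and math.trunc() round towards zero
    print(x, math.floor(x), int(x), math.trunc(x))

samples = np.array([3.5, -3.5])
floored = np.floor(samples).astype(int)  # floor first, then convert
truncated = samples.astype(int)          # truncates towards zero like int()
print(floored, truncated)
```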
Comma & Dot; Locale Settings
A known problem is a locale setting that makes the Python shell use a comma instead of a dot as the decimal separator. In this case the ctypes library can cause problems (Read more ...). If this problem occurs with ObsPy, please let us know.
Writing Data from Numpy Arrays after Indexing Operations (#192, #193)
Convenient indexing operations on NumPy
ndarrays, such as slicing with a step, return views that are not necessarily C-contiguous. This can lead to problems when writing data via external functions called with ctypes (C/Fortran). Consider the following example...
import numpy as np
from obspy.core import read, Trace
x = np.arange(10)
y = x[:5]   # contiguous view of the first five elements
z = x[::2]  # strided view of every second element
tr1 = Trace(data=y)
tr2 = Trace(data=z)
print(tr1.data)
print(tr2.data)
...which shows that in Python the data of Traces 1 and 2 is:
[0 1 2 3 4]
[0 2 4 6 8]
But after writing and reading the data again...
tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]  # read() returns a Stream; take its first Trace
tr2 = read("/tmp/tr2.tmp")[0]
print(tr1.data)
print(tr2.data)
...it is obvious that there was a problem with using the reference to the original data array: during the write operation, the wrong data was written to file for the second trace:
[0 1 2 3 4]
[0 1 2 3 4]
This can be avoided by ensuring that the data is C-contiguous. Use either
np.require(data, dtype=data.dtype, requirements='C_CONTIGUOUS') or
np.ascontiguousarray(data) for this purpose:
z_safe = np.ascontiguousarray(z)
# or: z_safe = np.require(z, dtype=z.dtype, requirements='C_CONTIGUOUS')
tr1 = Trace(data=y)
tr2 = Trace(data=z_safe)
tr1.write("/tmp/tr1.tmp", "MSEED")
tr2.write("/tmp/tr2.tmp", "MSEED")
tr1 = read("/tmp/tr1.tmp")[0]
tr2 = read("/tmp/tr2.tmp")[0]
print(tr1.data)
print(tr2.data)
which gives the expected result:
[0 1 2 3 4]
[0 2 4 6 8]
Whether the data is safe for operations with ctypes can be checked by looking at the C_CONTIGUOUS entry of the ndarray flags:
print(y.flags['C_CONTIGUOUS'])
print(z.flags['C_CONTIGUOUS'])
print(z_safe.flags['C_CONTIGUOUS'])
True
False
True
To summarize: when data arrays are created, e.g. via indexing operations on other arrays, check that the correct data gets passed on during
ctypes calls. Also refer to bugs #192 and #193.
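The underlying cause can be reproduced without ObsPy. This sketch uses a hypothetical helper, read_via_pointer, that mimics a C function which assumes a dense buffer: it receives only the start address of the array, so for the strided view it reads the first elements of the parent array instead of every second one:

```python
import ctypes

import numpy as np

x = np.arange(10, dtype=np.int32)
z = x[::2]  # strided view: every second element, no copy

def read_via_pointer(arr):
    # hypothetical stand-in for a C routine: it only gets the start address
    # and assumes len(arr) consecutive int32 values
    ptr = arr.ctypes.data_as(ctypes.POINTER(ctypes.c_int32))
    return [ptr[i] for i in range(len(arr))]

print(z.flags["C_CONTIGUOUS"])                     # False
print(read_via_pointer(z))                         # reads x[0:5], not z
print(read_via_pointer(np.ascontiguousarray(z)))   # reads the intended values
```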
Floating Point Arithmetic Issues
- Python documentation page on floating point arithmetic
- Numpy information on machine limits for floating point types:
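As a small illustration of the resources listed above: exact comparisons of floats fail because most decimal fractions are not exactly representable, and np.finfo reports the precision limits of each float type. The use of math.isclose and np.finfo here is our choice of example:

```python
import math

import numpy as np

print(0.1 + 0.2 == 0.3)              # False: neither side is exactly representable
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare with a relative tolerance

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    # eps: gap between 1.0 and the next representable number
    print(dtype.__name__, info.eps, info.max, info.tiny)
```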