
# 08 Invalid Values and Masked Arrays

In [5]:
import numpy as np

## Inf and NaN

In [None]:
a = np.arange(3)
a

In [None]:
b = 1./a
b
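Dividing by zero, as in `1./a`, also emits a `RuntimeWarning`. One way to silence that warning for a single deliberate computation is the `np.errstate` context manager. This is a minimal sketch that reuses the cell above:

```python
import numpy as np

a = np.arange(3)

# Suppress the "divide by zero" RuntimeWarning only inside this block
with np.errstate(divide='ignore'):
    b = 1. / a

print(b)            # the first element is inf, the others are ordinary floats
print(np.isinf(b))  # element-wise test for infinity
```

Outside the `with` block the default warning behaviour applies again.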

`inf` = **inf**inity (here the result of `1./0`)

In [None]:
np.inf == b[0]

In [None]:
type(np.inf)

In [None]:
np.inf * 1.0

In [None]:
np.inf - 1000000000

In [None]:
1./np.inf

In [None]:
b.max()

In [None]:
b.min()

In [None]:
b.mean()

In [None]:
- np.inf
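Not every operation involving `inf` has a meaningful result. The following sketch shows two cases that yield `nan` (Not a Number), which is introduced in the next cells:

```python
import numpy as np

# Operations on inf without a meaningful result give nan
diff = np.inf - np.inf
prod = 0 * np.inf
print(diff, prod)  # nan nan

# whereas adding a finite number leaves inf unchanged
print(np.inf + 1 == np.inf)  # True
```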

-------------------------------------------------------

In [None]:
a = np.arange(-1, 3, 0.001)
b = np.sqrt(a)
b

`nan` = **N**ot **a** **N**umber (NaN): the result of an undefined operation, here the square root of a negative number

In [None]:
b * 100  # NaN propagates: the result is NaN wherever b is NaN

#### Testing for NaNs

In [None]:
np.isnan(b)  # there are also isinf(), isneginf(), isposinf(), and more
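A function such as `np.isnan` is needed because NaN compares unequal to every value, including itself, so an equality test with `==` never finds it. A small sketch on a hand-made array:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)  # False: NaN is not equal to itself
print(x == np.nan)       # all False, so == cannot locate NaNs
print(np.isnan(x))       # True only at index 1
```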


## Exercise
<div class="alert alert-success">
<li>Test `isinf` and `isfinite` </li>
</div>

# Working with invalid data: masked arrays

Invalid values can be excluded from computations by masking them; an array that carries such a mask is called a *masked array*.

In [None]:
import numpy.ma as ma  # ma = masked array

In [None]:
a = np.arange(5.)
a[3] = np.nan
a

In [None]:
a.mean()

In [None]:
b = ma.masked_array(a, mask=[0, 0, 0, 1, 0])
b

In [None]:
b.mean()
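The masked mean is simply the mean over the unmasked entries. This sketch rebuilds the arrays from the cells above and checks that by hand: (0 + 1 + 2 + 4) / 4 = 1.75.

```python
import numpy as np
import numpy.ma as ma

a = np.arange(5.)
a[3] = np.nan
b = ma.masked_array(a, mask=[0, 0, 0, 1, 0])

# Mean of only the entries that are not masked (indices 0, 1, 2, 4)
valid = a[[0, 1, 2, 4]]
print(b.mean(), valid.mean())  # 1.75 1.75
```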

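`masked_array` (an alias of the class `ma.MaskedArray`) is a subclass of *ndarray*, so it provides all ndarray methods; many of them, such as `mean`, are overridden to ignore masked entries.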
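A quick check of the subclass relationship and of an overridden method, on a small hand-made array:

```python
import numpy as np
import numpy.ma as ma

b = ma.masked_array([1., 2., 3.], mask=[0, 1, 0])

print(isinstance(b, np.ndarray))          # True: MaskedArray subclasses ndarray
print(ma.masked_array is ma.MaskedArray)  # True: the lowercase name is an alias
print(b.sum())                            # 4.0: the masked 2.0 is skipped
```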

### Creating MaskedArrays

In [None]:
a = ma.masked_array(np.arange(5))  # no mask
a

In [None]:
a = ma.masked_array(np.arange(5), mask=[0, 0, 0, 1, 0])  # created with a mask
a

In [None]:
a = np.sin(np.arange(0, 100, 10))
a

In [None]:
b = ma.masked_greater(a, 0.5)
b

In [None]:
a = np.arange(5.)
a[3] = np.nan
a

In [None]:
b = ma.masked_invalid(a)
b

### masked arrays can be masked further

In [None]:
c = ma.masked_equal(b, 1.0)
c

Single elements can be masked by assigning `ma.masked`:

In [None]:
c[4] = ma.masked
c

Analogous to *masked_greater(a, v)* there are:

 * *masked_less(a, v)*
 * *masked_less_equal(a, v)*
 * *masked_equal(a, v)*
 * …
 * *masked_where(condition, a)*
 * *masked_inside(a, v1, v2)*
 * …
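Two of these functions in a short sketch: `masked_where` takes an arbitrary boolean condition, while `masked_inside` masks the closed interval `[v1, v2]`.

```python
import numpy as np
import numpy.ma as ma

x = np.array([-2., -1., 0., 1., 2.])

# Mask wherever the condition is True (here: all negative values)
w = ma.masked_where(x < 0, x)

# Mask all values v with -1 <= v <= 1
i = ma.masked_inside(x, -1, 1)

print(w)
print(i)
```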

### masked arrays support (?) ndarray ufuncs

In [None]:
b

In [None]:
b.max()

In [None]:
b / b.mean()
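Plain NumPy ufuncs also accept masked arrays: the result is again a masked array, and the input mask is carried over. A minimal sketch with `np.sqrt` on a hand-made array:

```python
import numpy as np
import numpy.ma as ma

x = ma.masked_invalid(np.array([0., 1., np.nan, 4.]))

r = np.sqrt(x)  # ordinary NumPy ufunc applied to a masked array

print(type(r).__name__)  # MaskedArray
print(r.mask)            # index 2 is still masked
print(r.max())           # 2.0, the masked entry is ignored
```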

### Indexing and Slicing

In [None]:
b

In [None]:
b[0]

In [None]:
b[3]

In [None]:
b[3] is ma.masked  # ma.masked is a singleton, so an identity test works (for NaN use np.isnan instead)

### Refill Masked Arrays

In [None]:
b

In [None]:
c = b.filled()
c

In [None]:
c = b.filled(b.mean())
c


In [None]:
b.fill_value = 3
b  # fill value is 3.0 now
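How `fill_value` interacts with `filled()`, sketched on a small array: without an argument `filled()` uses the array's `fill_value` (by default `1e+20` for float data), and an explicit argument overrides it for that call only.

```python
import numpy as np
import numpy.ma as ma

x = ma.masked_invalid(np.array([1., np.nan, 3.]))

print(x.fill_value)   # 1e+20, the default for float64 data
print(x.filled())     # masked entry replaced by the default fill value

x.fill_value = 3      # stored as 3.0 because the data are float64
print(x.filled())     # masked entry replaced by 3.0
print(x.filled(-1.))  # explicit argument wins for this call only
```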

In [None]:
b

### Accessing Data and Mask

In [None]:
c = b.data
c

In [None]:
type(c)

In [None]:
b.mask  # True marks a masked entry

#### Accessing valid data

In [None]:
b[~b.mask]  # careful: the result is again a masked array

In [None]:
b[~b.mask].data
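`MaskedArray.compressed()` returns the valid entries directly as a plain 1-D ndarray (multi-dimensional input is flattened). A short sketch using the same values as `b`:

```python
import numpy as np
import numpy.ma as ma

b = ma.masked_invalid(np.array([0., 1., 2., np.nan, 4.]))

valid = b.compressed()        # plain ndarray holding only the unmasked values
print(type(valid).__name__)   # ndarray
print(valid)                  # [0. 1. 2. 4.]
```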

In [None]:
b[0] = ma.masked

In [None]:
b

In [None]:
ma.masked_equal(a, 2.0)

In [None]:
b

### Hard and Soft Masks

In [None]:
b

In [None]:
b[0] = 3.0  # assigning a value unmasks b[0] (masks are soft by default)
b

If a mask is *hard*, assigning a value to a masked entry does not unmask it.

In [None]:
b = ma.masked_invalid(a)
b

In [None]:
b.hardmask

In [None]:
b.harden_mask()
b.hardmask

In [None]:
b[3] = 3.0  # careful: silently ignored, no exception is raised
b

In [None]:
b.data
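Soft and hard masks side by side in one self-contained sketch. Each masked array is built from its own copy of the data so the two experiments cannot affect each other:

```python
import numpy as np
import numpy.ma as ma

a = np.array([0., 1., 2., np.nan, 4.])

soft = ma.masked_invalid(a.copy())
soft[3] = 3.0                  # soft mask (default): assignment unmasks the entry
print(soft[3])                 # 3.0

hard = ma.masked_invalid(a.copy()).harden_mask()
hard[3] = 3.0                  # hard mask: assignment to a masked entry is ignored
print(hard[3] is ma.masked)    # True
```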

In [1]:
b.soften_mask()
b.hardmask


# Structured data types
* each element is a record whose named fields can have different data types
* string fields have a fixed length, e.g. `'S4'` holds at most 4 bytes
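The fixed length matters: a string that is too long for its `'S4'` dtype is truncated without any warning. A minimal sketch with a plain array of the same string dtype that the structured example below uses for `sensor_code`:

```python
import numpy as np

codes = np.array([b'TAU', b'GAMMA'], dtype='S4')  # 'GAMMA' has 5 bytes

print(codes)  # the second entry is silently cut to 4 bytes
```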

In [17]:
samples = np.zeros((6,), dtype=[('sensor_code', 'S4'), ('position', float), ('value', float)])
samples


array([(b'',  0.,  0.), (b'',  0.,  0.), (b'',  0.,  0.), (b'',  0.,  0.),
       (b'',  0.,  0.), (b'',  0.,  0.)], 
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

In [19]:
samples.ndim

1

In [20]:
samples.shape

(6,)

In [21]:
samples.dtype.names

('sensor_code', 'position', 'value')

In [22]:
samples[:] = [('ALFA',   1, 0.37), ('BETA', 1, 0.11), ('TAU', 1,   0.13),
              ('ALFA', 1.5, 0.37), ('ALFA', 3, 0.11), ('TAU', 1.2, 0.13)]
samples

array([(b'ALFA',  1. ,  0.37), (b'BETA',  1. ,  0.11),
       (b'TAU',  1. ,  0.13), (b'ALFA',  1.5,  0.37),
       (b'ALFA',  3. ,  0.11), (b'TAU',  1.2,  0.13)], 
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])
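Fields are accessed by name, and each field behaves like an ordinary array, so it can be used for boolean selection of whole records. A sketch rebuilding `samples` from the cells above:

```python
import numpy as np

samples = np.zeros((6,), dtype=[('sensor_code', 'S4'), ('position', float), ('value', float)])
samples[:] = [('ALFA',   1, 0.37), ('BETA', 1, 0.11), ('TAU', 1,   0.13),
              ('ALFA', 1.5, 0.37), ('ALFA', 3, 0.11), ('TAU', 1.2, 0.13)]

values = samples['value']                    # one field as a plain float array
is_alfa = samples['sensor_code'] == b'ALFA'  # compare against bytes, not str
alfa = samples[is_alfa]                      # selection keeps whole records

print(values.mean())
print(alfa['position'])  # positions of the three ALFA records
print(samples[0])        # a single record, printed like a tuple
```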


# Exercises
<div class="alert alert-success">
<li>load *files/invalid_array.npy*, mask all NaN values and calculate the mean</li>
<li>additionally mask all values `< -50` and re-calculate the mean</li>
<li>fill all masked values with the array's mean</li>
<li>how sensible is it to calculate the mean only along one axis?</li>
</div>

In [None]:
data = np.load('files/invalid_array.npy')
data = ma.masked_invalid(data)
data.mean()

In [None]:
data = ma.masked_less(data, -50)
data.mean()

In [None]:
data = data.filled(data.mean())  # plain ndarray: masked entries replaced by the mean of the valid values

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.imshow(data)
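For the last exercise question, one aspect worth checking is how many valid values stand behind each mean along an axis. This sketch uses a small synthetic array (an assumption, not the contents of `files/invalid_array.npy`): a column without any valid value gets a masked mean, and the remaining column means rest on different sample sizes.

```python
import numpy as np
import numpy.ma as ma

# Synthetic 2x3 example with invalid values (not the exercise file)
x = np.array([[1., np.nan, 5.],
              [3., np.nan, np.nan]])
m = ma.masked_invalid(x)

col_means = m.mean(axis=0)
print(col_means)        # column 1 has no valid values, so its mean is masked
print(m.count(axis=0))  # valid values per column: 2, 0 and 1
```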