# Working with embedded quality control variables

This example shows how to use existing quality control variables or create new ones. All the tests are located in the act/qc/qctests.py file but are called through qcfilter on the dataset object.

In [1]:
import numpy as np

from act.io.armfiles import read_netcdf
from act.qc.qcfilter import parse_bit
from act.tests import EXAMPLE_IRT25m20s

Read a data file that does not have any embedded quality control variables. The data come from the example dataset within ACT. We could also read data that already has quality control variables and add to, manipulate, or use those variables in the same way.

In [2]:
var_name = 'inst_up_long_dome_resist'  # The name of the data variable we wish to work with
ds_object = read_netcdf(EXAMPLE_IRT25m20s, keep_variables=[var_name, 'lat', 'lon'])
ds_object



Output (xarray Dataset repr, abbreviated): each displayed variable is a float32 Dask array of shape (4320,) occupying 16.88 kiB, stored as 1 chunk in 2 graph layers.


Since there is no embedded quality control variable, one will be created for us. We can start by adding a test for where the data are set to the missing value. First we change the first value to NaN to simulate a missing value in the data file.

In [3]:
data = ds_object[var_name].values
data[0] = np.nan
ds_object[var_name].values = data

Add a test that flags where the data are set to the missing value. Since a quality control variable does not exist in the file, one will be created as part of adding this test.

In [17]:
result = ds_object.qcfilter.add_missing_value_test(var_name)
ds_object

The returned value from the _add_missing_value_test()_ method is a dictionary that contains relevant quality control information, including the name of the corresponding quality control variable.

In [19]:
qc_var_name = result['qc_variable_name']
result

{'test_number': 6,
 'test_meaning': 'Value is set to missing_value.',
 'test_assessment': 'Bad',
 'qc_variable_name': 'qc_inst_up_long_dome_resist',
 'variable_name': 'inst_up_long_dome_resist'}

We can add a second test that flags data less than a specified value.

In [6]:
result = ds_object.qcfilter.add_less_test(var_name, 7.8)

In [7]:
ds_object[qc_var_name].attrs

{'long_name': 'Quality check results on field: Instantaneous Upwelling Pyrgeometer Dome Thermistor Resistance, Pyrgeometer',
 'units': '1',
 'flag_masks': [1, 2],
 'flag_meanings': ['Value is set to missing_value.',
  'Data value less than fail_min.'],
 'flag_assessments': ['Bad', 'Bad'],
 'standard_name': 'quality_flag',
 'fail_min': array(7.8, dtype=float32)}
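
To make the _flag_masks_ attribute more concrete, here is a minimal sketch of how a bit-packed quality control array could record the two tests above. It uses plain NumPy with made-up sample values and is only an illustration of the idea, not ACT's implementation.

```python
import numpy as np

# Hypothetical sample values; not taken from the example file.
values = np.array([np.nan, 7.9, 7.6, 8.1], dtype=np.float32)
fail_min = 7.8

# One integer per sample; each bit records the result of one test.
qc = np.zeros(values.shape, dtype=np.int32)

# Test 1 -> flag mask 1: value is missing (NaN in this sketch).
qc[np.isnan(values)] |= 1

# Test 2 -> flag mask 2: value is below fail_min.
# NaN < fail_min is False, so the missing sample is not flagged here.
with np.errstate(invalid='ignore'):
    qc[values < fail_min] |= 2

print(qc.tolist())  # [1, 0, 2, 0]
```

A sample that fails both tests would hold the value 3, since both bits would be set in the same integer.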

Next we add a test to indicate where a value is greater than or equal to a specified number. We also set the assessment to a user-defined word. The default assessment is "Bad".

In [8]:
result = ds_object.qcfilter.add_greater_equal_test(var_name, 12, test_assessment='Suspect')

We can now get the data as a NumPy masked array, with the mask set wherever the third test we added (greater than or equal to) was tripped. The result dictionary provides the test number that was created for us.

In [20]:
data = ds_object.qcfilter.get_masked_data(var_name, rm_tests=result['test_number'])
print('data:', data)
print('Data type =', type(data))
print('data.data:', data.data)
print('data.mask:', data.mask)

data: [-- 7.877699851989746 7.896500110626221 ... 7.670499801635742
 7.689199924468994 7.689199924468994]
Data type = <class 'numpy.ma.core.MaskedArray'>
data.data: [   nan 7.8777 7.8965 ... 7.6705 7.6892 7.6892]
data.mask: [ True False False ... False False False]
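
Conceptually, masking by test number means checking one bit of the quality control variable. The sketch below shows that idea with plain NumPy and hypothetical arrays; it assumes test N is stored in bit N-1 and is not how _get_masked_data()_ is actually written.

```python
import numpy as np

# Hypothetical data and bit-packed QC values (bits for tests 2 and 3 set).
values = np.array([7.9, 7.6, 12.5, 8.1], dtype=np.float32)
qc = np.array([0, 2, 4, 0], dtype=np.int32)

test_number = 3
flag_mask = 1 << (test_number - 1)  # test 3 -> flag mask 4

# True wherever the bit for the requested test is set.
tripped = (qc & flag_mask) > 0
masked = np.ma.masked_array(values, mask=tripped)

print(masked.mask.tolist())  # [False, False, True, False]
```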


Or we can get the masked array for all tests that use the assessment set to "Bad".

In [21]:
data = ds_object.qcfilter.get_masked_data(var_name, rm_assessments=['Bad'])
data

masked_array(data=[--, 7.877699851989746, 7.896500110626221, ..., --, --,
                   --],
             mask=[ True, False, False, ...,  True,  True,  True],
       fill_value=1e+20,
            dtype=float32)

If we prefer to mask all data flagged as either Bad or Suspect, we can list as many assessments as needed.

In [22]:
data = ds_object.qcfilter.get_masked_data(var_name, rm_assessments=['Suspect', 'Bad'])
data

masked_array(data=[--, 7.877699851989746, 7.896500110626221, ..., --, --,
                   --],
             mask=[ True, False, False, ...,  True,  True,  True],
       fill_value=1e+20,
            dtype=float32)
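
Selecting by assessment can be thought of as collecting every flag mask whose assessment matches and combining them. The sketch below assumes the attributes hold three tests (two "Bad", one "Suspect"); the helper name `combined_mask` is hypothetical and is not part of ACT.

```python
# Assumed attribute values after adding the three tests above.
flag_masks = [1, 2, 4]
flag_assessments = ['Bad', 'Bad', 'Suspect']


def combined_mask(rm_assessments):
    """OR together the flag masks whose assessment was requested."""
    combined = 0
    for mask, assessment in zip(flag_masks, flag_assessments):
        if assessment in rm_assessments:
            combined |= mask
    return combined


print(combined_mask(['Bad']))             # 3
print(combined_mask(['Suspect', 'Bad']))  # 7
```

Any sample whose quality control value shares a bit with the combined mask would then be masked.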

We can request that the data be returned as a normal NumPy array with NaN values used to fill in the bad values.

In [23]:
data = ds_object.qcfilter.get_masked_data(var_name, rm_assessments=['Suspect', 'Bad'], return_nan_array=True)
data

array([   nan, 7.8777, 7.8965, ...,    nan,    nan,    nan], dtype=float32)
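
The same kind of conversion can be done by hand on any masked array with the _filled()_ method. This is a small standalone sketch with made-up values, shown only to illustrate the relationship between the two return types.

```python
import numpy as np

# Hypothetical masked array with the middle value flagged.
masked = np.ma.masked_array(
    np.array([7.9, 7.6, 8.1], dtype=np.float32),
    mask=[False, True, False],
)

# Replace masked entries with NaN and return a plain ndarray.
as_nan = masked.filled(np.nan)

print(type(as_nan).__name__)      # ndarray
print(np.isnan(as_nan).tolist())  # [False, True, False]
```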

We can create our own test by building an array of indexes where we want the test to be set and calling the method that adds a test. We can let the method pick the test number (the next available) or set the test number we want to use. This example uses test number 5 to demonstrate that test numbers do not need to be used in order.

In [24]:
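data = ds_object.qcfilter.get_masked_data(var_name)
diff = np.diff(data)  # Difference between consecutive samples
max_difference = 0.04
# Mask the differences larger than the threshold and get their indexes
data = np.ma.masked_greater(diff, max_difference)
index = data.mask.nonzero()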
result = ds_object.qcfilter.add_test(
    var_name,
    index=index,
    test_meaning=f'Difference is greater than {max_difference}',
    test_assessment='Suspect',
    test_number=5,
)
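
One detail worth keeping in mind is that _np.diff()_ returns one fewer element than its input, so index i of the result describes the step from sample i to sample i + 1. The cell above passes the difference indexes straight through, which flags the earlier sample of each pair. The sketch below uses made-up values to show the alignment; which sample to flag is a design choice.

```python
import numpy as np

# Hypothetical values with one large step between index 1 and index 2.
values = np.array([7.80, 7.81, 7.90, 7.91])
max_difference = 0.04

diff = np.diff(values)  # length is len(values) - 1
jump = np.nonzero(diff > max_difference)[0]

print(len(values), len(diff))  # 4 3
print(jump.tolist())           # [1] -> earlier sample of the pair
print((jump + 1).tolist())     # [2] -> later sample of the pair
```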

The test numbers are not the _flag_masks_ numbers. The flag mask numbers are bit-packed values that record which bit is set. To see the test numbers we can unpack the bits.

In [14]:
print('\nmask : test')
print('-' * 11)
qc_variable = ds_object[qc_var_name]
for mask in qc_variable.attrs['flag_masks']:
    print(mask, ' : ', parse_bit(mask))


mask : test
-----------
1  :  [1]
2  :  [2]
4  :  [3]
16  :  [5]
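
The relationship between a flag mask and its test number is simply the position of the set bit. As a rough sketch of what unpacking involves, the hypothetical helper `unpack_tests` below walks the bits of an integer; it is written for illustration and is not the source of ACT's _parse_bit()_.

```python
def unpack_tests(value):
    """Return the 1-based test numbers whose bits are set in value."""
    tests = []
    test_number = 1
    while value > 0:
        if value & 1:
            tests.append(test_number)
        value >>= 1
        test_number += 1
    return tests


for mask in [1, 2, 4, 16]:
    print(mask, ':', unpack_tests(mask))

# A value with more than one bit set unpacks to several test numbers.
print(unpack_tests(18))  # [2, 5]
```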


We can also use the _get_masked_data()_ method to get the same data as the ".values" attribute on the xarray dataset variable. If we don't request any tests or assessments to mask, the returned masked array will not have any mask set. The returned value is a NumPy masked array whose raw NumPy array is accessible through the .data property.

In [15]:
data = ds_object.qcfilter.get_masked_data(var_name)
print('Normal numpy array data values:', data.data)
print('Mask associated with values:', data.mask)

Normal numpy array data values: [   nan 7.8777 7.8965 ... 7.6705 7.6892 7.6892]
Mask associated with values: [False False False ... False False False]


We can use the _get_masked_data()_ method to return a masked array where the test is set in the quality control variable, and use the _any()_ method on the mask to see if any of the values have the test set.

In [16]:
data = ds_object.qcfilter.get_masked_data(var_name, rm_tests=3)  # Test 3: greater than or equal
print('At least one greater than or equal test set =', data.mask.any())
data = ds_object.qcfilter.get_masked_data(var_name, rm_tests=4)  # Test 4 was never added
print('At least one test number 4 set =', data.mask.any())

At least one greater than or equal test set = True
At least one test number 4 set = False
