### Investigating relative humidity data for dry aerosol absorption coefficients in the EBAS database

Retrieval of dry absorption coefficients from EBAS data requires information about the relative humidity (RH) in the EBAS data files. This notebook investigates how many of the EBAS data files with aerosol absorption data contain information about the relative humidity during the measurement.

In [1]:
import os

import numpy as np  # needed for the NaN check on the RH column further below
import pyaerocom as pya

DATA_DIR = pya.const.EBASMC_DATA_DIR + '/'

Init data paths for lustre


0.026566028594970703 s


In [2]:
print('Last EBAS database update: {}'.format(open(DATA_DIR + '../Revision.txt').readline()))

Last EBAS database update: 20190115



#### Get all files that contain aerosol absorption data

In [3]:
req = pya.io.EbasSQLRequest(variables=['aerosol_absorption_coefficient'])
print(req)


Pyaerocom EbasSQLRequest
------------------------
variables: ['aerosol_absorption_coefficient']
start_date: None
stop_date: None
station_names: None
matrices: None
altitude_range: None
lon_range: None
lat_range: None
instrument_types: None
statistics: None
datalevel: None
Filename request string:
select distinct filename from variable join station on station.station_code=variable.station_code where comp_name in ('aerosol_absorption_coefficient');


In [4]:
db = pya.io.EbasFileIndex()
files = db.get_file_names(request=req)
len(files)

868
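The filename request string printed by `EbasSQLRequest` above is plain SQL. Assuming the EBAS file index is an SQLite database (an assumption here, not checked in this notebook), the following minimal sketch runs the same statement against a hypothetical in-memory toy database that only contains the tables and columns named in that request string. It shows the two effects of the query: `distinct` removes duplicate filenames (a file may appear in several rows of `variable`), and the inner join drops rows whose `station_code` has no match in `station`. All file names and station codes are invented.

```python
import sqlite3

# Hypothetical toy database: only the tables/columns referenced in the request string
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (station_code TEXT PRIMARY KEY);
CREATE TABLE variable (filename TEXT, station_code TEXT, comp_name TEXT);
INSERT INTO station VALUES ('ST01'), ('ST02');
INSERT INTO variable VALUES
    ('file_a.nas', 'ST01', 'aerosol_absorption_coefficient'),
    ('file_a.nas', 'ST01', 'aerosol_absorption_coefficient'),
    ('file_b.nas', 'ST02', 'aerosol_light_scattering_coefficient'),
    ('file_c.nas', 'ST02', 'aerosol_absorption_coefficient'),
    ('file_d.nas', 'ST99', 'aerosol_absorption_coefficient');
""")

# Same statement as the "Filename request string" printed above
query = ("select distinct filename from variable join station "
         "on station.station_code=variable.station_code "
         "where comp_name in ('aerosol_absorption_coefficient');")

# SQL guarantees no row order without ORDER BY, so sort for a stable result
files_toy = sorted(row[0] for row in conn.execute(query))
conn.close()
print(files_toy)  # ['file_a.nas', 'file_c.nas']
```

`file_b.nas` is excluded by the `comp_name` filter and `file_d.nas` by the join, since station `ST99` is missing from the `station` table.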

#### Read all files and check if they contain a relative humidity column

##### Define search strings for the RH column (*RH_NAME*) and the metadata key with RH info (*META_NAME*)

In [5]:
RH_NAME = 'relative_humidity'
META_NAME = 'humidity/temperature_control'

##### Init empty objects that are filled below

In [6]:
contain_rh_data = [] # will be filled with all files that have relative_humidity column with actual data
no_rh_col = [] # will be filled with all files that have NO relative_humidity column 
contain_rh_col_allnans = [] # will be filled with all files that have relative_humidity column but all values are NaN

contain_rh_meta_info = {} # dictionary that contains all files that have META_NAME key in their metadata dictionary
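The per-file decision made in the loop below can be isolated as a small pure function, which makes the three RH categories testable without reading any EBAS files. This is a sketch: `classify_rh` and its category labels are hypothetical names, and it only assumes that a file provides a list of column names and a 2D float array with one column per variable, which is how the `col_names` and `data` attributes are used in this notebook.

```python
import numpy as np


def classify_rh(col_names, data, rh_name="relative_humidity"):
    """Hypothetical helper: classify one data table by its RH column.

    col_names: list of column names; data: 2D float array (rows = samples).
    Returns 'no_rh_col', 'rh_all_nan' or 'rh_data'.
    """
    if rh_name not in col_names:
        return "no_rh_col"
    rh = np.asarray(data, dtype=float)[:, col_names.index(rh_name)]
    # An empty column also counts as "all NaN", as in the loop below
    if np.all(np.isnan(rh)):
        return "rh_all_nan"
    return "rh_data"


cols = ["aerosol_absorption_coefficient", "relative_humidity"]
table = np.array([[2.5, np.nan],
                  [3.1, 35.0]])
print(classify_rh(cols, table))                                       # rh_data
print(classify_rh(cols, np.full((2, 2), np.nan)))                     # rh_all_nan
print(classify_rh(["aerosol_absorption_coefficient"], table[:, :1]))  # no_rh_col
```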

##### Read all files and check RH availability

In [7]:
for i, file in enumerate(files):
    if i % 100 == 0:
        print(i)  # progress indicator
    ebas_file = pya.io.EbasNasaAmesFile(DATA_DIR + file)
    if RH_NAME in ebas_file.col_names:
        idx = ebas_file.col_names.index(RH_NAME)
        rh = ebas_file.data[:, idx]
        # RH column exists: check whether it holds any valid (non-NaN) value
        if np.all(np.isnan(rh)):
            contain_rh_col_allnans.append(file)
        else:
            contain_rh_data.append(file)
    else:
        no_rh_col.append(file)

    # Independent of the RH column: store the humidity/temperature control metadata, if present
    if META_NAME in ebas_file.meta:
        contain_rh_meta_info[file] = ebas_file.meta[META_NAME]

0
100
200
300
400
500
600
700
800


#### Print search results

In [8]:
print('Total # of files: {}'.format(len(files)))
print('No RH column: {}'.format(len(no_rh_col)))
print('RH column but all NaN: {}'.format(len(contain_rh_col_allnans)))
print('RH column with data: {}'.format(len(contain_rh_data)))
print('Files with RH info in meta data: {}'.format(len(contain_rh_meta_info)))

Total # of files: 868
No RH column: 825
RH column but all NaN: 26
RH column with data: 17
Files with RH info in meta data: 629
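The three RH-column categories are mutually exclusive and together cover every file, whereas the metadata check runs independently of them. A quick consistency check on the counts printed above (copied in as literals so the snippet runs on its own) confirms this and expresses each count as a share of all files.

```python
# Counts copied from the output above
n_total = 868
rh_counts = {"No RH column": 825, "RH column but all NaN": 26, "RH column with data": 17}
n_meta = 629

# The three RH-column categories must partition the set of files
assert sum(rh_counts.values()) == n_total

for label, n in rh_counts.items():
    print(f"{label}: {n} ({100 * n / n_total:.1f} %)")
print(f"RH info in metadata: {n_meta} ({100 * n_meta / n_total:.1f} %)")
```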


#### Investigate the additional meta information

In [9]:
from collections import Counter

# Count how often each distinct metadata entry occurs (insertion order is preserved)
info = Counter(contain_rh_meta_info.values())

for k, v in info.items():
    print('Info: {} ({} times)'.format(k, v))

Info: None (183 times)
Info: Heating to 40% RH, limit 40 deg. C (280 times)
Info: Other (see metadata) (51 times)
Info: Diffusion dryer (41 times)
Info: Temperature conditioning at 25 deg. C (25 times)
Info: Nafion dryer (32 times)
Info: Temperature conditioning at 30 deg. C (8 times)
Info: Dilution drying (2 times)
Info: Nafion dryer with temperature conditioning at 30 deg. C (1 times)
Info: Temperature controlled (6 times)
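The metadata entries can be grouped roughly by whether they describe active drying. The sketch below uses a hypothetical keyword heuristic (`drying_category` is not an EBAS or pyaerocom convention): entries mentioning a dryer, drying, or heating to 40% RH count as dried, entries that only mention temperature count as temperature control only, and `None` as not specified. The counts are copied from the output above; whether the printed `None` is Python's `None` or the string `'None'` cannot be told from the output, so both are handled.

```python
from collections import Counter

# Counts copied from the output above
meta_counts = {
    None: 183,
    "Heating to 40% RH, limit 40 deg. C": 280,
    "Other (see metadata)": 51,
    "Diffusion dryer": 41,
    "Temperature conditioning at 25 deg. C": 25,
    "Nafion dryer": 32,
    "Temperature conditioning at 30 deg. C": 8,
    "Dilution drying": 2,
    "Nafion dryer with temperature conditioning at 30 deg. C": 1,
    "Temperature controlled": 6,
}


def drying_category(info):
    """Hypothetical keyword heuristic for one humidity/temperature_control entry."""
    if info is None or str(info).strip().lower() == "none":
        return "not specified"
    text = str(info).lower()
    # Check drying keywords first, so "Nafion dryer with temperature ..." counts as dried
    if "dryer" in text or "drying" in text or "heating to 40% rh" in text:
        return "dried"
    if "temperature" in text:
        return "temperature control only"
    return "other"


grouped = Counter()
for info, n in meta_counts.items():
    grouped[drying_category(info)] += n
print(dict(grouped))
```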


#### Print all stations where actual RH column data was found

In [10]:
for file in contain_rh_data:
    data = pya.io.EbasNasaAmesFile(DATA_DIR + file)
    print(data.station_name)

Mt Cimone
Montseny
Hyytiälä
DEM_Athens
DEM_Athens
DEM_Athens
Helmos Mountain
Preila
Preila
Preila
Preila
Preila
Preila
Preila
Preila
Preila
Preila
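The 17 files come from only a few stations. Counting the station names printed above (copied in as a literal list here) with `collections.Counter` gives the number of distinct stations and the number of files per station; in the notebook itself the list could instead be collected from `data.station_name` inside the loop.

```python
from collections import Counter

# Station names copied from the output above (one entry per file)
station_names = (["Mt Cimone", "Montseny", "Hyytiälä"]
                 + ["DEM_Athens"] * 3
                 + ["Helmos Mountain"]
                 + ["Preila"] * 10)

files_per_station = Counter(station_names)
print(f"{len(files_per_station)} stations, {len(station_names)} files")
for name, n in files_per_station.most_common():
    print(f"{name}: {n}")
```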


### Summary and Discussion

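 - Currently, only 17 absorption coefficient files, from 6 stations (Mt Cimone, Montseny, Hyytiälä, DEM_Athens, Helmos Mountain, Preila), contain an RH column with actual values; another 26 files have an RH column that contains only NaN values.
 - 629 out of 868 files contain the `humidity/temperature_control` metadata key (183 of them with the value None). The most frequent entry, "Heating to 40% RH, limit 40 deg. C" (280 files), describes dry measurement conditions at 40% RH that apply to the whole file; a further 76 files report a dryer (diffusion, Nafion or dilution drying).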
 
### Conclusion

Since aerosol absorption does not depend strongly on RH, the missing RH information is considered of minor importance here, and no changes to the current handling are needed.