### Investigating relative humidity data for dry scattering coefficients in EBAS database

Retrieval of dry scattering coefficients from EBAS data requires information about relative humidity (RH) in the EBAS data files. The first version of the EBAS level 3 product that we created (today: 5/12/2018) used only datapoints from files that include a data column ``relative_humidity``, and kept only those points where the RH value was not NaN and below 50%. However, a comparison of the resulting timeseries with EBAS-RAW suggests that this discards too many datapoints (e.g. at the Gosan station). In addition, for the absorption coefficients [we found that it might be required to consider humidity information in the NASA Ames meta headers](http://localhost:8889/notebooks/issues/ebas_absc550dryaer_RHdata.ipynb). 

Here, we use the example of the Gosan station to show that relying on the humidity information in the meta headers may give unreliable results.
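
The per-datapoint selection rule described above (RH value not NaN and below a threshold) can be sketched with NumPy as follows. The function name `dry_mask` and the sample values are hypothetical and only illustrate the rule; this is not the pyaerocom implementation.

```python
import numpy as np

def dry_mask(rh, rh_max=50.0):
    """Return a boolean mask of datapoints considered dry.

    Hypothetical helper: a point is kept only if its RH value is
    not NaN and strictly below rh_max (in %).
    """
    rh = np.asarray(rh, dtype=float)
    valid = ~np.isnan(rh)
    mask = np.zeros(rh.shape, dtype=bool)
    # Compare only the valid entries, so NaNs never enter the comparison
    mask[valid] = rh[valid] < rh_max
    return mask

# Made-up RH values (in %) and scattering values, for illustration only
rh_example = np.array([12.0, np.nan, 49.9, 50.0, 73.5])
scat_example = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

keep = dry_mask(rh_example)
# Rejected points are set to NaN instead of being dropped, so the time axis is preserved
scat_dry = np.where(keep, scat_example, np.nan)
```

Note that a file whose RH column is entirely NaN loses all of its datapoints under this rule, regardless of what the meta header states.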

In [1]:
import pyaerocom as pya

DATA_DIR = pya.const.EBASMC_DATA_DIR + '/'

Init data paths for lustre


0.011529207229614258 s


#### Get all files that contain scattering coefficient data

In [2]:
req = pya.io.EbasSQLRequest(variables=['aerosol_light_scattering_coefficient'],
                            station_names='Gosan')
print(req)


Pyaerocom EbasSQLRequest
------------------------
variables: ['aerosol_light_scattering_coefficient']
start_date: None
stop_date: None
station_names: Gosan
matrices: None
altitude_range: None
lon_range: None
lat_range: None
instrument_types: None
statistics: None
datalevel: None
Filename request string:
select distinct filename from variable join station on station.station_code=variable.station_code where station_name in ('Gosan') and comp_name in ('aerosol_light_scattering_coefficient');


In [3]:
db = pya.io.EbasFileIndex()
files = db.get_file_names(req)
len(files)

10

In [4]:
import numpy as np
for i, file in enumerate(files):
    data = pya.io.EbasNasaAmesFile(DATA_DIR + file)
    # Locate the RH data column (raises ValueError if the file has none)
    rh_col = data.col_names.index('relative_humidity')
    rh = data.data[:, rh_col]
    # Count missing RH values and values above the 40% RH stated in the metadata
    nanvals = np.isnan(rh).sum()
    highrh = (rh > 40).sum()
    print('File {} ({} measurements):\t{}\tRH-column: NaNs: {}, RH>40: {}'.format(
        i, data.shape[0], data.meta['humidity/temperature_control'],
        nanvals, highrh))
    

File 0 (8784 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 8784, RH>40: 0
File 1 (8760 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 8760, RH>40: 0
File 2 (8760 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 8760, RH>40: 0
File 3 (5688 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 5688, RH>40: 0
File 4 (9 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 0, RH>40: 0
File 5 (1 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 0, RH>40: 0
File 6 (8784 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 828, RH>40: 1466
File 7 (8760 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 886, RH>40: 1645
File 8 (8760 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 983, RH>40: 207
File 9 (8760 measurements):	Heating to 40% RH, limit 40 deg. C	RH-column: NaNs: 4514, RH>40: 25
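
To put these counts into perspective, the following sketch computes how many datapoints per file would be rejected by a rule that requires a valid RH value of at most 40%. The numbers are copied from the output above. The 40% limit is an assumption taken from the metadata string (the level 3 product described in the introduction used 50%).

```python
# (total measurements, NaN RH values, RH > 40 values) per file, from the output above
counts = [
    (8784, 8784, 0),
    (8760, 8760, 0),
    (8760, 8760, 0),
    (5688, 5688, 0),
    (9, 0, 0),
    (1, 0, 0),
    (8784, 828, 1466),
    (8760, 886, 1645),
    (8760, 983, 207),
    (8760, 4514, 25),
]

rejected = []
for i, (total, n_nan, n_high) in enumerate(counts):
    # NaN and RH > 40 are mutually exclusive, so the two counts can be added
    n_rej = n_nan + n_high
    rejected.append(n_rej)
    print('File {}: {} of {} rejected ({:.1f}%)'.format(
        i, n_rej, total, 100 * n_rej / total))
```

Files 0 to 3 would be rejected entirely, because their RH column contains only NaNs.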


### Summary and Discussion

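All ten Gosan files state ``Heating to 40% RH, limit 40 deg. C`` in the meta field ``humidity/temperature_control``. Nevertheless, files 6 to 9 contain between 25 and 1645 measurements with RH above 40% in the ``relative_humidity`` column, and in files 0 to 3 the RH column consists entirely of NaNs. The meta information in ``humidity/temperature_control`` is therefore not a reliable indicator of the actual RH of individual measurements.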
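
One way to automate this kind of consistency check is to parse the RH limit stated in the meta field and compare it with the RH column. The sketch below assumes the string format seen in the Gosan files (``Heating to 40% RH, ...``). The helper names and the regular expression are hypothetical, and other stations may use different wording.

```python
import re
import numpy as np

def stated_rh_limit(meta_str):
    """Extract the RH limit (in %) from a string like 'Heating to 40% RH, ...'.

    Hypothetical helper; returns None if no '<number>% RH' pattern is found.
    """
    match = re.search(r'(\d+(?:\.\d+)?)\s*%\s*RH', meta_str)
    return float(match.group(1)) if match else None

def count_rh_violations(rh, meta_str):
    """Count valid RH values that exceed the limit stated in the metadata."""
    limit = stated_rh_limit(meta_str)
    if limit is None:
        return None
    rh = np.asarray(rh, dtype=float)
    valid = rh[~np.isnan(rh)]  # ignore missing values
    return int((valid > limit).sum())

# Metadata string as found in the Gosan files; the RH values are made up
meta_example = 'Heating to 40% RH, limit 40 deg. C'
rh_example = [35.0, float('nan'), 41.2, 40.0, 55.0]
n_viol = count_rh_violations(rh_example, meta_example)
```

A non-zero count indicates that the metadata and the measured RH disagree for that file, so the metadata should not be used on its own to classify the data as dry.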