Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBAS data reading: concoc reading problems #592

Closed
jgriesfeller opened this issue Mar 4, 2022 · 3 comments
Closed

EBAS data reading: concoc reading problems #592

jgriesfeller opened this issue Mar 4, 2022 · 3 comments
Assignees

Comments

@jgriesfeller
Copy link
Member

Birkenes 2 data for year 2009 is wrong (far too low):
Screenshot from 2022-03-04 11-57-11
Analysis shows that the current SQL queries need to be extended:

(base) jang@ppi-clogin-a1:.../EBASMultiColumn/data$ pwd
/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/EBASMultiColumn/data
(base) jang@ppi-clogin-a1:.../EBASMultiColumn/data$ sqlite3 ebas_file_index.sqlite3
SQLite version 3.36.0 2021-06-18 18:36:39
Enter ".help" for usage hints.
sqlite> select distinct filename from variable where station_code = 'NO0001R' and comp_name in ('organic_carbon') and matrix in ('pm25') and last_start >= '2009-01-01' and first_end < '2010-01-01' and statistics = 'arithmetic mean';
NO0001R.20090107070000.20171010195500.low_vol_sampler.organic_carbon..52w.1w.NO01L_kfg_no01.NO01L_Thermo_Optical-Sunset_Lab.lev2.nas
NO0001R.20090107070000.20171010195500.low_vol_sampler..pm25.52w.1w.NO01L_kfg_no01.NO01L_Thermo_Optical-Sunset_Lab.lev2.nas
sqlite> 

Right now the data is taken from the first file which does not include the needed arithmetic mean. Because of

IGNORE_COLS_CONTAIN = ["fraction", "artifact"]

all fields containing the string 'fraction' are ignored leaving only the following data field:

(base) jang@ppi-clogin-a1:.../data/data$ cat /lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/EBASMultiColumn/HTAP_OBS_1980-now_Rev20220101_NasaAmesMulticolumn/data/NO0001R.20090107070000.20171010195500.low_vol_sampler.organic_carbon..52w.1w.NO01L_kfg_no01.NO01L_Thermo_Optical-Sunset_Lab.lev2.nas | grep pm25 | grep -iv fraction | grep organic_carbon
organic_carbon, ug C/m3, Statistics=uncertainty, Matrix=pm25
numflag organic_carbon, no unit, Statistics=uncertainty, Matrix=pm25

Because it's the only data field, it's assumed to be correct.

to be continued...

@jgriesfeller jgriesfeller self-assigned this Mar 4, 2022
@jgriesfeller
Copy link
Member Author

jgriesfeller commented Mar 4, 2022

The solution is to extend the SQL statement:

sqlite> select distinct filename from variable join station on station.station_code=variable.station_code where station.station_code = 'NO0001R' and comp_name in ('organic_carbon') and matrix in ('pm25') and last_start >= '2009-01-01' and first_end < '2010-01-01' and statistics = 'arithmetic mean' and  not exists (select * from characteristic where var_id=variable.var_id and ct_type='Fraction');
NO0001R.20090107070000.20171010195500.low_vol_sampler..pm25.52w.1w.NO01L_kfg_no01.NO01L_Thermo_Optical-Sunset_Lab.lev2.nas

The extension and not exists (select * from characteristic where var_id=variable.var_id and ct_type='Fraction') excludes files that do not provide exactly what we need (usually the arithmetic mean)

Unfortunately queries that are actually used are broader than in this example, so I need to dig where to use the extension of the SQL command pyaerocom code.

@jgriesfeller
Copy link
Member Author

The smoking gun for the extension might be in the last line of this:

if self.datalevel is not None:
req += " and " if add_cond else " where "
req += f"datalevel={self.datalevel}"
add_cond += 1
return req + ";"

@jgriesfeller
Copy link
Member Author

corrected via #595

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant