### Tutorial showing how to read EBAS NASA Ames files

**Note**: this notebook is currently under development

Please see [here](https://ebas-submit.nilu.no/Submit-Data/Getting-started) for information related to the EBAS NASA Ames file format.

**Further links**:
 - [Pyaerocom website](http://aerocom.met.no/pyaerocom/)
 - [Pyaerocom installation instructions](http://aerocom.met.no/pyaerocom/readme.html#installation)
 - [Getting started](http://aerocom.met.no/pyaerocom/notebooks.html#getting-started)

In [1]:
import os 
from pyaerocom.io import EbasNasaAmesFile

ebasdir = "/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/EBASMultiColumn/data/data/"
filename = "DE0043G.20080101000000.20160708144500.nephelometer..aerosol.1y.1h.DE09L_tsi_neph_3563.DE09L_nephelometer.lev2.nas"

mc = EbasNasaAmesFile(file=os.path.join(ebasdir, filename),
                      only_head=False,          # set True to import only the header
                      replace_invalid_nan=True, # replace invalid values with NaNs
                      convert_timestamps=True,  # compute datetime64 timestamps from numerical values
                      decode_flags=True,        # decode all flags (e.g. 0.111222333 -> 111 222 333)
                      verbose=False)
print(mc)

Pyaerocom EbasNasaAmesFile
--------------------------
num_head_lines: 60
num_head_fmt: 1001
data_originator: Flentje, Harald
sponsor_organisation: DE09L, Deutscher Wetterdienst, DWD, Met. Obs., Hohenspeissenberg, , 82283, Hohenspeissenberg, Germany
submitter: Flentje, Harald
project_association: EUSAAR GAW-WDCA
vol_num: 1
vol_totnum: 1
ref_date: 2008 01 01 2016 07 08
revision_date: nan
freq: 0.041667
descr_time_unit: days from file reference point
num_cols_dependent: 11
mul_factors: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
vals_invalid: [999.999999, 999.999, 999.999, 999.999, 9999.999, 9999.999, 9999.999, 9999.0, 999.9, 9999.9, 9.999999999]
descr_first_col: end_time of measurement, days from the file reference point

Column variable definitions
---------------------------
name: starttime
unit: days
is_var: False
is_flag: False
flag_id: 

name: endtime
unit: days
is_var: False
is_flag: False
flag_id: 

name: aerosol_light_backscattering_coefficient
unit: 1/Mm
is_var: True
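
The header states that time values are given as `days from file reference point`, and `ref_date` begins with 2008 01 01. To illustrate what `convert_timestamps=True` has to achieve, here is a minimal standalone sketch that turns such day offsets into datetimes using only the standard library. The name `days_to_datetime` and the hard-coded reference date are assumptions made for this example; this is not pyaerocom's implementation.

```python
from datetime import datetime, timedelta

# Reference date taken from the first three fields of ref_date in the header above
REF_DATE = datetime(2008, 1, 1)

def days_to_datetime(offset_days, ref_date=REF_DATE):
    """Convert a fractional day offset into an absolute datetime."""
    return ref_date + timedelta(days=offset_days)

start = days_to_datetime(0.0)
end = days_to_datetime(0.041667)  # freq from the header, roughly one hour
print(start, end)
```

Because `freq` is rounded to six decimals in the header, offsets converted this way do not fall exactly on full hours.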

#### Column information

In [2]:
mc.print_col_info()

Column 0
name: starttime
unit: days
is_var: False
is_flag: False
flag_id: 

Column 1
name: endtime
unit: days
is_var: False
is_flag: False
flag_id: 

Column 2
name: aerosol_light_backscattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 450.0 nm

Column 3
name: aerosol_light_backscattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 550.0 nm

Column 4
name: aerosol_light_backscattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 700.0 nm

Column 5
name: aerosol_light_scattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 450.0 nm

Column 6
name: aerosol_light_scattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 550.0 nm

Column 7
name: aerosol_light_scattering_coefficient
unit: 1/Mm
is_var: True
is_flag: False
flag_id: numflag
wavelength: 700.0 nm

Column 8
name: pressure
unit: hPa
is_var: True
is_flag: Fa

You can see that all variable columns were assigned the same flag column, since there is only one. This would be different if there were multiple flag columns (e.g. one for each variable).
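
How the assignment would differ with several flag columns can be sketched as follows. The sketch assumes the convention that a flag column applies to all variable columns preceding it, back to the previous flag column. The function `assign_flag_ids` and the dictionary layout are hypothetical and only mimic the `is_var` / `is_flag` / `flag_id` attributes printed above.

```python
def assign_flag_ids(columns):
    """Map the index of each variable column to the name of its flag column.

    Assumption: a flag column applies to every variable column that precedes
    it, back to the previous flag column.
    """
    assigned = {}
    pending = []  # variable columns still waiting for a flag column
    for idx, col in enumerate(columns):
        if col["is_flag"]:
            for var_idx in pending:
                assigned[var_idx] = col["name"]
            pending = []
        elif col["is_var"]:
            pending.append(idx)
    return assigned

# One flag column at the end, as in the file above (shortened)
single = [
    {"name": "starttime", "is_var": False, "is_flag": False},
    {"name": "endtime", "is_var": False, "is_flag": False},
    {"name": "pressure", "is_var": True, "is_flag": False},
    {"name": "temperature", "is_var": True, "is_flag": False},
    {"name": "numflag", "is_var": False, "is_flag": True},
]

# Hypothetical layout with one flag column per variable
multi = [
    {"name": "pressure", "is_var": True, "is_flag": False},
    {"name": "numflag_pressure", "is_var": False, "is_flag": True},
    {"name": "temperature", "is_var": True, "is_flag": False},
    {"name": "numflag_temperature", "is_var": False, "is_flag": True},
]

print(assign_flag_ids(single))
print(assign_flag_ids(multi))
```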

#### Access flag information

You can access the flags for each column using the ``flags`` attribute of the file.

In [3]:
mc.flags

OrderedDict([('numflag',
              <pyaerocom.io.ebas_nasa_ames.EbasFlagCol at 0x7f0602d94ac8>)])

In [4]:
flagcol = mc.flags["numflag"]

The raw flags can be accessed via:

In [5]:
flagcol.raw_data

array([0.394999, 0.394999, 0.394999, ..., 0.247   , 0.247   , 0.      ])

The processed flags are stored in an (N x 3) NumPy array, where N is the total number of timestamps.

In [6]:
flagcol.flags

array([[394, 999,   0],
       [394, 999,   0],
       [394, 999,   0],
       ...,
       [247,   0,   0],
       [247,   0,   0],
       [  0,   0,   0]])
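
The raw and decoded values above are consistent with reading the nine decimal digits of a raw flag as three 3-digit codes (0.394999 becomes 394, 999, 0), which also matches the `decode_flags` comment in the first cell (0.111222333 -> 111 222 333). A minimal sketch of that decoding is shown below; `decode_flag` is a hypothetical helper written for this illustration, not the routine pyaerocom uses.

```python
def decode_flag(raw_value):
    """Split a raw flag value such as 0.394999 into three 3-digit codes."""
    # Format with exactly nine decimals to avoid floating point artefacts
    # that repeated multiplication by 1000 could introduce.
    digits = f"{raw_value:.9f}".split(".")[1]
    return [int(digits[i:i + 3]) for i in range(0, 9, 3)]

for raw in (0.394999, 0.247, 0.0, 0.111222333):
    print(raw, "->", decode_flag(raw))
```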

For instance, access the flags of the 5th timestamp (index 4):

In [7]:
flagcol.flags[4]

array([394, 999,   0])

#### Convert object to pandas DataFrame

The conversion currently excludes all flag columns.

In [8]:
df = mc.to_dataframe()
df

| | aerosol_light_backscattering_coefficient | aerosol_light_backscattering_coefficient.1 | aerosol_light_backscattering_coefficient.2 | aerosol_light_scattering_coefficient | aerosol_light_scattering_coefficient.1 | aerosol_light_scattering_coefficient.2 | pressure | relative_humidity | temperature |
|---|---|---|---|---|---|---|---|---|---|
| 2008-01-01 00:30:00.000 | | | | | | | | | |
| 2008-01-01 01:29:59.500 | | | | | | | | | |
| 2008-01-01 02:29:59.500 | | | | | | | | | |
| 2008-01-01 03:30:00.000 | | | | | | | | | |
| 2008-01-01 04:29:59.500 | | | | | | | | | |
| 2008-01-01 05:29:59.500 | | | | | | | | | |
| 2008-01-01 06:30:00.000 | | | | | | | | | |
| 2008-01-01 07:29:59.500 | | | | | | | | | |
| 2008-01-01 08:29:59.500 | | | | | | | | | |
| 2008-01-01 09:30:00.000 | | | | | | | | | |
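
The file was read with `replace_invalid_nan=True`, and the header lists one invalid-value marker per dependent column in `vals_invalid`. That would explain why the first rows of the preview above contain no valid numbers. The replacement itself can be sketched in plain Python as follows; `replace_invalid` and the sample values are made up for illustration and do not come from the file or from pyaerocom.

```python
import math

def replace_invalid(values, invalid_marker):
    """Return a copy of values with the invalid marker replaced by NaN."""
    return [math.nan if math.isclose(v, invalid_marker) else v for v in values]

# Made-up sample values; 999.999 is one of the markers listed in vals_invalid
raw_column = [12.5, 999.999, 13.1]
cleaned = replace_invalid(raw_column, 999.999)
print(cleaned)
```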


#### Performance

Read only the header:

In [9]:
%%timeit
EbasNasaAmesFile(file=ebasdir+filename,
                 only_head=True, verbose=False)

3.85 ms ± 98.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Read raw:

In [10]:
%%timeit
EbasNasaAmesFile(file=ebasdir+filename,
                 only_head=False,           # read header and data
                 replace_invalid_nan=False, # keep invalid values as they are
                 convert_timestamps=False,  # do not compute datetime64 timestamps
                 decode_flags=False,        # do not decode flags
                 verbose=False)

65.3 ms ± 11.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


Perform all operations:

In [11]:
%%timeit
EbasNasaAmesFile(file=ebasdir+filename,
                 only_head=False,          # set True if you only want to import the header
                 replace_invalid_nan=True, # replace invalid values with NaNs
                 convert_timestamps=True,  # compute datetime64 timestamps from numerical values
                 decode_flags=True,        # decode all flags (e.g. 0.111222333 -> 111 222 333)
                 verbose=False)

68.9 ms ± 9.25 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
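
`%%timeit` is an IPython cell magic and is only available inside a notebook or IPython session. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. In the sketch below, `workload` is a hypothetical stand-in for the `EbasNasaAmesFile(...)` call, and `time_call` is a helper written for this example.

```python
import statistics
import timeit

def workload():
    # Hypothetical stand-in for EbasNasaAmesFile(file=..., verbose=False)
    return sum(i * i for i in range(10_000))

def time_call(func, runs=7, loops=10):
    """Return mean and standard deviation of the per-loop time in seconds."""
    totals = timeit.repeat(func, repeat=runs, number=loops)
    per_loop = [total / loops for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)

mean_s, std_s = time_call(workload)
print(f"{mean_s * 1e3:.3f} ms ± {std_s * 1e3:.3f} ms per loop "
      f"(7 runs, 10 loops each)")
```

The numbers depend on the machine and file system, so they are only meaningful relative to each other, as in the three measurements above.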
