# Observations

In [None]:
import warnings

import pandas as pd

warnings.filterwarnings("ignore", category=FutureWarning, message="The warn_bad_lines argument has been deprecated")
warnings.filterwarnings("ignore", category=FutureWarning, message="The error_bad_lines argument has been deprecated")
warnings.filterwarnings("ignore", category=FutureWarning, message="Argument `closed` is deprecated in favor of `inclusive`")

## AERONET

See {mod}`~monetio.aeronet` for more information about AERONET.

In [None]:
from monetio import aeronet

AERONET data are global, so we will look at a single day to keep the download fast.
First we need to create an array of datetimes.

In [None]:
dates = pd.date_range(start='2017-09-25', freq='H', periods=24)
dates

Now let's assume that we want to read the Aerosol Optical Depth Level 1.5 data, which is
cloud-screened and quality controlled.
To request AERONET data we use the {func}`~monetio.aeronet.add_data` function.

In [None]:
df = aeronet.add_data(dates=dates, product='AOD15')
df.head()

Sometimes you only want data over a specific region. To do this let's define a
latitude-longitude box
```python
[latmin, lonmin, latmax, lonmax]
```
over northern Africa.

In [None]:
df = aeronet.add_data(dates=dates, product='AOD15', latlonbox=[2., -21, 38, 37])
df[['latitude', 'longitude']].describe()
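To make the box ordering concrete, here is a minimal sketch that applies the same `[latmin, lonmin, latmax, lonmax]` bounds to a small, made-up site table using plain pandas. The site IDs and coordinates are hypothetical. This only illustrates what the box means; it is not how `monetio` builds or filters the AERONET request.

```python
import pandas as pd

# Hypothetical site table; IDs and coordinates are invented for illustration.
sites = pd.DataFrame(
    {
        "siteid": ["A", "B", "C", "D"],
        "latitude": [14.0, 30.5, 45.0, -5.0],
        "longitude": [-17.0, 31.0, 10.0, 20.0],
    }
)

# Same ordering as the box above: [latmin, lonmin, latmax, lonmax]
latmin, lonmin, latmax, lonmax = [2.0, -21, 38, 37]

# Keep rows whose latitude AND longitude both fall inside the box (bounds inclusive).
in_box = sites["latitude"].between(latmin, latmax) & sites["longitude"].between(lonmin, lonmax)
subset = sites[in_box]
print(subset)
```

Sites `C` and `D` fall outside the latitude range, so only `A` and `B` remain.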

To download inversion products, you must supply the `inv_type` keyword argument. It accepts
`'ALM15'`, `'ALM20'`, `'HYB15'`, or `'HYB20'`, as described in the [AERONET inversion web service help](https://aeronet.gsfc.nasa.gov/print_web_data_help_v3_inv_new.html). Let's get the size distribution
data over northern Africa.

In [None]:
df = aeronet.add_data(dates=dates, product='SIZ', latlonbox=[2., -21, 38, 37], inv_type='ALM15')
df.head()

You can also:
* request daily-average data instead
* request a specific site ID from [the list](https://aeronet.gsfc.nasa.gov/aeronet_locations_v3.txt)

## AirNow

{func}`monetio.airnow.add_data` downloads data from AirNow's Amazon S3 bucket and aggregates it, returning a {class}`~pandas.DataFrame`. For example, let's say that we want to look at data from 2018-05-01 up to, but not including, 2018-05-03.

In [None]:
from monetio import airnow

In [None]:
dates = pd.date_range(start='2018-05-01', end='2018-05-03', freq='H', closed='left')

In [None]:
%%time

df = airnow.add_data(dates)
df.head()

Use the `n_procs` keyword argument to control the maximum number of workers used by Dask. By default, it is set to 1. Note the shorter wall time in the cell below compared with the one above.

In [None]:
%%time

df = airnow.add_data(dates, n_procs=4)

To keep local copies of the files downloaded from AirNow, supply `download=True`.
```python
df = airnow.add_data(dates, download=True)
```

By default, the returned frame is in "wide" format, with columns for each separate variable (OZONE, PM2.5, etc.). It is possible to return the original AirNow "long" format (where each row is a single record with a single variable and value) by supplying `wide_fmt=False`.

In [None]:
df = airnow.add_data(dates, wide_fmt=False, n_procs=4)
df.head()
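The relationship between the two layouts can be sketched with plain pandas on a tiny, made-up long-format table. The column names (`variable`, `obs`) and values here are assumptions for illustration, and `pivot_table` is simply one way to reshape; this is not necessarily how `monetio` performs the conversion.

```python
import pandas as pd

# Hypothetical long-format records: one row per (time, site, variable).
long_df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2018-05-01 00:00"] * 2 + ["2018-05-01 01:00"] * 2),
        "siteid": ["000000001"] * 4,
        "variable": ["OZONE", "PM2.5", "OZONE", "PM2.5"],
        "obs": [31.0, 7.2, 29.0, 8.1],
    }
)

# Reshape to wide format: one row per (time, site), one column per variable.
wide_df = long_df.pivot_table(
    index=["time", "siteid"], columns="variable", values="obs"
).reset_index()
wide_df.columns.name = None  # drop the leftover "variable" label on the columns axis
print(wide_df)
```

The four long rows collapse into two wide rows, with `OZONE` and `PM2.5` as separate columns.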

The `daily` option will download AirNow's daily-average data. Our example `dates` array has two unique days, which will be detected automatically. If you are only interested in the daily statistics, this is a much faster method for obtaining them.

In [None]:
%%time

df = airnow.add_data(dates, daily=True, n_procs=2)
df.head()

In [None]:
df.time.unique()
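To see why the hourly `dates` array corresponds to two days, the sketch below rebuilds the same 48 hourly timestamps and reduces them to unique calendar days. The use of `normalize()` is an assumption for illustration; the library may detect the days differently.

```python
import pandas as pd

# 48 hourly timestamps: 2018-05-01 00:00 through 2018-05-02 23:00.
# A Timedelta is used for `freq` to avoid version-specific frequency aliases.
hourly = pd.date_range(start="2018-05-01", periods=48, freq=pd.Timedelta(hours=1))

# Truncate each timestamp to midnight, then keep the distinct days.
days = hourly.normalize().unique()
print(days)
```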

## AQS

We will begin by loading hourly ozone concentrations from 2018.

In [None]:
from monetio import aqs

In [None]:
dates = pd.date_range(start='2018-01-01', end='2018-12-31', freq='H', closed='left')
dates

Retrieving such a file can take a few minutes, and the resulting DataFrame could use about 1 GB of memory.
This is how we would do it:
```ipython
%%time

df = aqs.add_data(dates, param="OZONE")
df.head()
```
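If memory is a concern, you can measure the footprint of any DataFrame after loading it. The sketch below does this for a small synthetic frame (the column names and values are made up) so that it runs without downloading anything; the same `memory_usage` call applies to a frame returned by `add_data`.

```python
import numpy as np
import pandas as pd

n = 1_000  # number of synthetic hourly records

# Hypothetical stand-in for a loaded frame; not real AQS data.
df_demo = pd.DataFrame(
    {
        "time": pd.date_range("2018-01-01", periods=n, freq=pd.Timedelta(hours=1)),
        "siteid": ["000000001"] * n,
        "OZONE": np.linspace(0.02, 0.06, n),
    }
)

# deep=True also counts the contents of object/string columns, not just pointers.
n_bytes = df_demo.memory_usage(deep=True).sum()
print(f"{n_bytes / 1e6:.2f} MB")
```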

Obtaining daily data is considerably quicker.

In [None]:
%%time

df = aqs.add_data(dates, param="OZONE", daily=True)
df.head()

We can get multiple variables by setting `param` to a list of strings. Since this requires loading multiple files, we benefit from setting `n_procs`.

In [None]:
%%time

df = aqs.add_data(dates, param=["SO2", "PM2.5"], daily=True, n_procs=2)
df.head()

As with AirNow, the returned DataFrames are converted to wide format by default, but we can skip this step by supplying `wide_fmt=False`. Here we also demonstrate `meta=True`, which adds site metadata columns.

In [None]:
%%time

df = aqs.add_data(dates, param="OZONE", daily=True, wide_fmt=False, meta=True)
print(len(df.columns), "columns")  # len(df) would count rows, not columns
df.head()

In [None]:
df.networks.unique()

Use the `network` keyword argument to subset the data by EPA measurement network before returning. This requires `meta=True`.

In [None]:
%%time

df = aqs.add_data(dates, param="OZONE", daily=True, meta=True, network="NCORE")
print(len(df.columns), "columns")  # len(df) would count rows, not columns
df.head()

In [None]:
df.networks.unique().tolist()

Supply `download=True` to download the AQS file to disk. Then, `local=True` can be used to load from the local directory instead of downloading the files from the AQS server.

Let's load speciated PM data from the Chemical Speciation Network (CSN; <https://www.epa.gov/amtic/chemical-speciation-network-csn>).

In [None]:
%%time

df = aqs.add_data(dates, param="SPEC", daily=True, meta=True, network="CSN")

## NADP

In [None]:
from monetio import nadp

In [None]:
dates = pd.date_range(start='2018-01-01', end='2018-12-31', freq='H', closed='left')
dates

```python
df = nadp.add_data(dates, network="NTN")
```
TODO: not working