# Observations

In [None]:
import warnings

import pandas as pd

warnings.filterwarnings("ignore", category=FutureWarning, message="The warn_bad_lines argument has been deprecated")
warnings.filterwarnings("ignore", category=FutureWarning, message="The error_bad_lines argument has been deprecated")
warnings.filterwarnings("ignore", category=FutureWarning, message="Argument `closed` is deprecated in favor of `inclusive`")

## AERONET

See {mod}`~monetio.aeronet` for more information about AERONET.

In [None]:
from monetio import aeronet

AERONET data are global, so we will look at a single day to keep the download quick.
First, we need to create a datetime array.

In [None]:
dates = pd.date_range(start='2017-09-25', freq='H', periods=24)
dates

Now let's assume that we want to read the Aerosol Optical Depth Level 1.5 data, which is
cloud-screened and quality controlled.
To request AERONET data we use the {func}`~monetio.aeronet.add_data` function.

In [None]:
df = aeronet.add_data(dates=dates, product='AOD15')
df.head()

Sometimes you only want data over a specific region. To do this let's define a
latitude-longitude box
```python
[latmin, lonmin, latmax, lonmax]
```
over northern Africa.

In [None]:
df = aeronet.add_data(dates=dates, product='AOD15', latlonbox=[2., -21, 38, 37])
df[['latitude', 'longitude']].describe()
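To make the `[latmin, lonmin, latmax, lonmax]` ordering concrete, here is a minimal sketch that applies the same box to a small, made-up table of sites using plain pandas. The `sites` frame, its site IDs, and its coordinates are hypothetical and only illustrate the bounds; this is not how `monetio` applies the box internally.

```python
import pandas as pd

# Hypothetical site table; names and coordinates are made up for illustration.
sites = pd.DataFrame(
    {
        "siteid": ["A", "B", "C", "D"],
        "latitude": [14.4, 51.5, 30.0, -5.0],
        "longitude": [-17.0, 0.1, 31.2, 20.0],
    }
)

# Same ordering as in the text: [latmin, lonmin, latmax, lonmax]
latmin, lonmin, latmax, lonmax = [2.0, -21, 38, 37]

# Keep only rows whose coordinates fall inside the box (bounds inclusive).
in_box = sites["latitude"].between(latmin, latmax) & sites["longitude"].between(
    lonmin, lonmax
)
subset = sites[in_box]
print(subset)
```

Only sites `A` and `C` fall inside the box; `B` is too far north and `D` too far south.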

To download inversion products, you must supply the `inv_type` keyword argument. It accepts one of
`'ALM15'`, `'ALM20'`, `'HYB15'`, or `'HYB20'`, as described in the [AERONET inversion web service help](https://aeronet.gsfc.nasa.gov/print_web_data_help_v3_inv_new.html). Let's get the size distribution
from data over northern Africa.

In [None]:
df = aeronet.add_data(dates=dates, product='SIZ', latlonbox=[2., -21, 38, 37], inv_type='ALM15')
df.head()

You can also:
* request daily-average data instead
* request a specific site ID from [the list](https://aeronet.gsfc.nasa.gov/aeronet_locations_v3.txt)

## AirNow

{func}`monetio.airnow.add_data` downloads data from the AirNow Amazon S3 bucket and aggregates it, returning a {class}`~pandas.DataFrame`. For example, let's say that we want to look at data from 2018-05-01 up to, but not including, 2018-05-03.

In [None]:
from monetio import airnow

In [None]:
dates = pd.date_range(start='2018-05-01', end='2018-05-03', freq='H', closed='left')
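The `closed` argument of `pd.date_range` was deprecated in pandas 1.4 in favor of `inclusive` and removed in pandas 2.0. If you need the same half-open range on any pandas version, one option is to build the full range and drop the end point yourself. The sketch below uses a separate, hypothetical name (`dates_half_open`) so it does not overwrite `dates`.

```python
import pandas as pd

start = pd.Timestamp("2018-05-01")
end = pd.Timestamp("2018-05-03")

# Build the range including both end points, then drop the right end point
# to get the half-open interval [start, end).
# "h" is the hourly alias used here; older pandas versions also accept "H".
full = pd.date_range(start=start, end=end, freq="h")
dates_half_open = full[full < end]

print(len(dates_half_open), dates_half_open[0], dates_half_open[-1])
```

This yields 48 hourly time stamps covering 2018-05-01 and 2018-05-02.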

In [None]:
%%time

df = airnow.add_data(dates)
df.head()

Use the `n_procs` keyword argument to control the maximum number of workers used by Dask. By default, it is set to 1. Note the shorter wall time for the cell below.

In [None]:
%%time

df = airnow.add_data(dates, n_procs=4)

To keep local copies of the files downloaded from AirNow, supply `download=True`:
```python
df = airnow.add_data(dates, download=True)
```

By default, the returned frame is in "wide" format, with columns for each separate variable (OZONE, PM2.5, etc.). It is possible to return the original AirNow "long" format (where each row is a single record with a single variable and value) by supplying `wide_fmt=False`.

In [None]:
df = airnow.add_data(dates, wide_fmt=False, n_procs=4)
df.head()
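To illustrate the difference between the two layouts, the sketch below pivots a tiny, made-up long-format table into wide format with pandas. The column names (`time`, `siteid`, `variable`, `obs`) and the values are assumptions chosen for illustration and may not match the actual AirNow columns exactly.

```python
import pandas as pd

# Hypothetical long-format records: one row per (time, site, variable).
long_df = pd.DataFrame(
    {
        "time": pd.to_datetime(["2018-05-01 00:00"] * 4),
        "siteid": ["S1", "S1", "S2", "S2"],
        "variable": ["OZONE", "PM2.5", "OZONE", "PM2.5"],
        "obs": [31.0, 8.2, 28.0, 11.5],
    }
)

# Pivot so that each variable becomes its own column ("wide" format).
wide_df = long_df.pivot_table(
    index=["time", "siteid"], columns="variable", values="obs"
).reset_index()
wide_df.columns.name = None  # drop the leftover "variable" axis label

print(wide_df)
```

The long table has four rows (two sites times two variables), while the wide table has two rows, one per site and time, with `OZONE` and `PM2.5` as separate columns.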

The `daily` option will download AirNow's daily-average data. Our example `dates` array has two unique days, which will be detected automatically. If you are only interested in the daily statistics, this is a much faster method for obtaining them.

In [None]:
%%time

df = airnow.add_data(dates, daily=True, n_procs=2)
df.head()

In [None]:
df.time.unique()
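As a rough illustration of how an hourly array reduces to unique days, the sketch below normalizes each time stamp to midnight and keeps the distinct values. This is only a pandas illustration under the assumption of a 48-hour range like the one above; it is not necessarily how `monetio` detects the days internally, and `hourly` and `unique_days` are hypothetical names.

```python
import pandas as pd

# 48 hourly time stamps starting at 2018-05-01 00:00.
hourly = pd.date_range(start="2018-05-01", periods=48, freq="h")

# Truncate each time stamp to midnight and keep the distinct days.
unique_days = hourly.normalize().unique()

print(unique_days)
```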