# 5 minute tutorial

The easiest way to use `pydap` is to access remote data hosted on [OPeNDAP](https://www.opendap.org/) servers. You can call `pydap`'s `open_url` directly or, better, use `pydap` as an engine for `xarray`. Both approaches access the same remote data, but `xarray` lets [OPeNDAP](https://www.opendap.org/) users exploit many of [Pangeo](https://pangeo.io/)'s modern capabilities for scalable computing.


## OPeNDAP - the vision
The original vision of [OPeNDAP](https://www.opendap.org/) ([Cornillon et al., 1993](https://zenodo.org/records/10610992)) was to establish the equivalence:

$$ \boxed{\text{URL} \approx \text{Remote Dataset}} $$


Furthermore, 

$$ \boxed{\text{URL + Constraints} \approx \text{Subset of Remote Dataset}} $$


This tutorial demonstrates both equivalences by accessing a remote dataset hosted on [OPeNDAP's Hyrax server](https://www.opendap.org/software/hyrax-data-server/). For more information about [OPeNDAP](https://www.opendap.org/) and Hyrax, see the official [OPeNDAP documentation](https://opendap.github.io/documentation/UserGuideComprehensive.html).

The remote dataset used in this tutorial can be inspected in the browser [HERE](http://test.opendap.org:8080/opendap/tutorials/20220531090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc.dmr.html).


In [None]:
from pydap.client import open_url
import xarray as xr
import numpy as np

We define a URL pointing to a remote dataset.

In [None]:
url = "http://test.opendap.org:8080/opendap/tutorials/20220531090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"

### pydap approach
We access the remote dataset via `pydap`.

In [None]:
pydap_ds = open_url(url, protocol='dap4')

Note the extra argument `protocol='dap4'`. One could also pass `protocol='dap2'`. We can inspect the contents of the dataset as follows:

In [None]:
pydap_ds.tree()

In [None]:
pydap_ds.attributes

```{note}
No data has been downloaded yet. `pydap` reads the metadata at the URL to create the `Dataset`.
```
We can further inspect individual variables, for example their shape and size.

In [None]:
pydap_ds['sst_anomaly'].shape

In [None]:
print('This array occupies: ', pydap_ds['sst_anomaly'].nbytes/1e9, '[GBs] in memory')
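The size reported above follows from the array's shape and data type alone, which is why it is available before any data is downloaded. A minimal sketch of that arithmetic, assuming an illustrative shape and 16-bit integers (both are assumptions here, not values read from the server):

```python
import math

# Assumed values for illustration only; in the tutorial they would come from
# pydap_ds['sst_anomaly'].shape and the variable's data type.
shape = (1, 17999, 36000)
itemsize = 2  # bytes per element for a 16-bit integer

# Total bytes = number of elements * bytes per element.
nbytes = math.prod(shape) * itemsize
size_gb = nbytes / 1e9
print(f"This array would occupy {size_gb:.2f} GB in memory")
```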

Each variable carries CF-compliant metadata, such as `scale_factor`, `add_offset` and `_FillValue`, that various APIs can recognize. These attributes are needed to mask missing values (for example over land) and to scale the stored values. Some APIs, like `xarray`, apply them automatically, while with others the user must transform the data manually.



In [None]:
pydap_ds['sst_anomaly'].attributes

You can read more about the `NetCDF Climate and Forecast (CF) Metadata Conventions` [HERE](https://cfconventions.org/cf-conventions/cf-conventions.html).
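For APIs that do not decode these attributes automatically, the manual transformation can be sketched as follows. The packed values and attribute values below are hypothetical and chosen only for illustration; in practice they would come from the variable's data and its `attributes` dictionary. The offset is stored under the CF attribute name `add_offset`.

```python
import numpy as np

# Hypothetical packed values and attributes, for illustration only.
raw = np.array([-32768, 0, 1500], dtype=np.int16)
attrs = {"_FillValue": -32768, "scale_factor": 0.001, "add_offset": 0.0}

# Work in floating point so that missing values can be represented as NaN.
values = raw.astype(np.float64)
values[raw == attrs["_FillValue"]] = np.nan

# CF unpacking rule: unpacked = packed * scale_factor + add_offset.
decoded = values * attrs["scale_factor"] + attrs["add_offset"]
```

Masking before scaling matters: the fill value is defined in terms of the packed values, so it must be compared against `raw`, not against the scaled result.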


### **Downloading the Array**


You can trigger a download on the fly as needed. **However**, in almost all cases only a subset of the dataset is needed. You can download just the piece you want by slicing the array as follows:

In [None]:
%%time
array = pydap_ds['sst_anomaly'][0, 0:10, 0:10]

In [None]:
np.shape(array)

With the above command, only the requested subset of the array has been downloaded into memory and assigned to the variable `array`. However, `array` is not a numpy array, but rather a `BaseType` from `pydap`'s data model:

In [None]:
type(array)

To extract the numpy array from `pydap`'s `BaseType` do:

In [None]:
data = array.data

In [None]:
type(data)

### Using server-side processing

Because data is hosted on Hyrax, you can exploit server-side processing that occurs local to the data to perform subsetting. [OPeNDAP](https://www.opendap.org/) servers support subsetting by adding `Constraint Expressions` to the `URL`.

In this scenario, where we want a subset of the variable `sst_anomaly`, we can request it directly from [OPeNDAP](https://www.opendap.org/)'s Hyrax server using the following syntax:



```python
<OPeNDAP_URL> + "?dap4.ce=/sst_anomaly[0][0:1:9][0:1:9]"
```
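Note that a Python slice such as `0:10` excludes its end point, while the constraint expression `[0:1:9]` is written as `[start:step:stop]` with `stop` included. The translation between the two can be sketched with a small helper. The function `dap4_constraint` is hypothetical and written only for this tutorial; it is not part of `pydap`.

```python
def dap4_constraint(name, *indexers):
    """Build a DAP4 constraint expression from integers and Python slices."""
    parts = []
    for ix in indexers:
        if isinstance(ix, int):
            # A single index selects one element along that dimension.
            parts.append(f"[{ix}]")
        else:
            if ix.stop is None:
                raise ValueError("open-ended slices are not handled in this sketch")
            start = ix.start or 0
            step = ix.step or 1
            # Python's stop is exclusive; the constraint's stop is inclusive.
            parts.append(f"[{start}:{step}:{ix.stop - 1}]")
    return f"?dap4.ce=/{name}" + "".join(parts)


CE = dap4_constraint("sst_anomaly", 0, slice(0, 10), slice(0, 10))
```

The resulting string can be appended to the dataset URL in the same way as a hand-written constraint expression.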


In [None]:
CE = "?dap4.ce=/sst_anomaly[0][0:1:9][0:1:9]"

In [None]:
pydap_ds = open_url(url+CE, protocol='dap4')

In [None]:
pydap_ds.tree()

In [None]:
pydap_ds['sst_anomaly'].shape

### xarray approach

`xarray` can use `pydap`'s `open_url` internally. To enable this, pass the following extra parameter when creating an `xarray` Dataset:

```python
engine='pydap'
```

Moreover, we can combine `xarray` with the server-side processing that occurs **local to the data on the OPeNDAP server**.

### DAP2 vs DAP4

The `DAP2` and `DAP4` [OPeNDAP](https://www.opendap.org/) data models differ in ways that go beyond this 5 minute intro. Here we only note that `DAP4` is the newer protocol and is the focus of this short tutorial. `pydap` accepts a `protocol` argument that selects `"dap2"` or `"dap4"`; `xarray`, however, does not.

We can still specify `DAP4` as the protocol used to request data and server-side processing from the [OPeNDAP](https://www.opendap.org/) server. We do this by passing a URL whose scheme is `dap4` instead of `http`. For example, the following URL can be passed to `xarray`, and `pydap` will recognize the `DAP4` protocol specification:



In [None]:
'dap4'+url[4:]

In [None]:
dataset = xr.open_dataset('dap4'+url[4:], engine='pydap')
dataset
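The string slicing `'dap4'+url[4:]` works because the URL starts with `http`; for an `https` URL it would leave a stray `s` in the scheme. A slightly more robust way to swap the scheme, sketched here with the standard library, is to parse the URL first. The helper name `with_dap4_scheme` is hypothetical and not part of `pydap` or `xarray`.

```python
from urllib.parse import urlsplit, urlunsplit


def with_dap4_scheme(url):
    """Return the same URL with its scheme replaced by 'dap4'."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(scheme="dap4"))


url = "http://test.opendap.org:8080/opendap/tutorials/20220531090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"
dap4_url = with_dap4_scheme(url)
```

The result can be passed to `xr.open_dataset(dap4_url, engine='pydap')` exactly as in the cell above.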

Lastly, we can also pass a URL with a `Constraint Expression` directly to `xarray` as follows:

In [None]:
CE = "?dap4.ce=/time;/lat;/lon;/sst_anomaly"

In [None]:
dataset_ce = xr.open_dataset(url+CE, engine='pydap')
dataset_ce
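The constraint expression above lists several variables, each prefixed with `/` and separated by `;`. When the list of variables is built programmatically, the string can be assembled as in the following sketch. The helper `dap4_projection` is hypothetical and only illustrates the syntax already used in this tutorial.

```python
def dap4_projection(names):
    """Build a DAP4 constraint expression that requests whole variables."""
    # Each variable is referenced by its path from the root group ("/name").
    return "?dap4.ce=" + ";".join(f"/{name}" for name in names)


CE = dap4_projection(["time", "lat", "lon", "sst_anomaly"])
```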