# FAQs





### Why does `pydap` take so long to download, and how can I improve it?
$\textbf{Answer}:$ `pydap` downloads content at two broad stages: a) during dataset creation, and b) when fetching numerical (array) data. These look like this:

**a) Metadata / dataset creation**
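```python
from pydap.client import open_url

# `my_session` is an (optional) session object; `protocol` is "dap2" or "dap4"
pyds = open_url("<opendap_url>", session=my_session, protocol="dap4")
```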
This stage is usually fast, but the speed depends on various factors:
* **Authentication**. Authentication can involve many redirects. When possible, use token authentication, which reduces the number of redirects.
  
* **Hierarchical metadata**. Some datasets, in particular Level 2 data, contain nested `Groups` (`Groups` are part of the `DAP4` protocol), and parsing this complex metadata during dataset creation can be time-consuming. To reduce this time, you can use the [Data Request Form](https://www.opendap.org/support/online-help-files/) to construct a `Constraint Expression` (`CE`) that limits the `Groups` and variables included in your dataset. That is, an `<opendap_url>` with a `CE` lets you discard variables before the dataset is created. (To open the Data Request Form of a dataset on a DAP4 server, append `.dmr.html` to the `<opendap_url>`; appending `.dmr` alone returns the raw DMR metadata.) The documentation on [Constraint expressions](ConstraintExpressions) has an example demonstrating how to use `CE`s to reduce the size of the dataset before it is created.

* **Cache the Session**. Starting with version `3.5.4`, `pydap` can use `requests-cache` to cache sessions. Caching the session means that `pydap` stores the `dmr` (i.e. the metadata) after the first download, for later reuse. The cached session can also recover credentials from the `~/.netrc` file and handle token authentication.
```python
from pydap.client import open_url
from pydap.net import create_session

my_session = create_session(use_cache=True)  # use_cache=False is the default

pyds = open_url("<opendap_url>", session=my_session, protocol="dap4")  # or "dap2"
```
Caching the session also caches the `dap` / `dods` responses (the numerical array values), which further speeds up the analysis. The documentation section on [Pydap as a Client](PydapAsClient) has a short example demonstrating how to cache the `dmr` during dataset creation.
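Returning to the **Hierarchical metadata** point: a `CE` is simply appended to the `<opendap_url>` as a query string. The sketch below assumes the DAP4 syntax `?dap4.ce=` with fully qualified variable names separated by `;`, and a URL that has no query string yet. `build_dap4_ce_url` is a hypothetical helper for illustration, not part of `pydap`; the Data Request Form builds the same kind of URL for you.

```python
from typing import Iterable


def build_dap4_ce_url(opendap_url: str, variables: Iterable[str]) -> str:
    """Return `opendap_url` with a DAP4 constraint expression keeping `variables`.

    Assumes `opendap_url` has no query string yet. DAP4 uses fully qualified
    names (e.g. "/TIME" or "/Group1/lat"), separated by ";" in the CE.
    """
    # Prefix a "/" where missing so every name is fully qualified
    names = [v if v.startswith("/") else "/" + v for v in variables]
    return f"{opendap_url}?dap4.ce={';'.join(names)}"


# Hypothetical usage (requires network access):
# from pydap.client import open_url
# url = build_dap4_ce_url(
#     "http://test.opendap.org/opendap/data/nc/coads_climatology.nc", ["TIME"]
# )
# pyds = open_url(url, protocol="dap4")  # dataset built from the constrained metadata
```

Keeping the CE in the URL means the server only sends metadata for the listed variables, so `pydap` has less to parse when creating the dataset.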


**b) Fetching numerical data**

`pydap` downloads array data in the form of a `.dap` (DAP4) or `.dods` (DAP2) response when slicing an array. That is, when:
```python
pyds["VarName"][:]  # downloads the entire array; a narrower index downloads only that subset
```
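or when accessing the data via `xarray` (with `engine="pydap"`), where indexing is lazy and the download happens once the values are loaded:
```python
import xarray as xr

ds = xr.open_dataset("<opendap_url>", engine="pydap")
ds["varName"].isel(dim1=dim1_slice, dim2=dim2_slice).load()  # e.g.; only the subset is downloaded
```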
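Because only the requested subset is transferred, you can roughly estimate the size of a download before slicing, from the variable's shape and the size in bytes of one element (pydap array variables expose `.shape` and `.dtype`). The helper `estimated_download_bytes` below is hypothetical and counts only the raw values; the actual `.dap`/`.dods` response adds framing and metadata, so treat the result as an approximation.

```python
import math
from typing import Sequence, Tuple, Union

Index = Union[int, slice]


def estimated_download_bytes(
    shape: Sequence[int], itemsize: int, index: Tuple[Index, ...]
) -> int:
    """Roughly estimate the bytes of raw values a subset request transfers.

    Counts only the selected values (number of elements x itemsize); the real
    response adds framing and metadata, so treat this as approximate.
    """
    counts = []
    for size, idx in zip(shape, index):
        if isinstance(idx, slice):
            # slice.indices() resolves None / negative bounds against the dim size
            counts.append(len(range(*idx.indices(size))))
        else:
            counts.append(1)  # an integer index selects a single element
    # Trailing dimensions not covered by `index` are downloaded in full
    counts.extend(shape[len(index):])
    return math.prod(counts) * itemsize


# Hypothetical usage with a pydap variable (requires network access):
# var = pyds["VarName"]
# nbytes = estimated_download_bytes(var.shape, var.dtype.itemsize, (slice(0, 100), slice(None)))
```

Comparing the estimate against a server's download limit (for example `2 * 1024**3` bytes for 2 GB) gives a quick indication of whether a slice should be split into smaller requests.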
Download speed depends on many factors: the chunking of the remote dataset, the size of the download, your internet connection, the remote server, etc. We recommend:

* **Subset the Variable**. Subsetting limits the size of the download (especially when the remote dataset is a virtual aggregation of many remote files). Some organizations impose a 2 GB limit on a single download. The example on using pydap with [PACE](notebooks/PACE) data nicely demonstrates downloading the coordinate arrays (`lat` and `lon`) first, to identify the subset of the 2D array of interest.

* **Cache the Session**. As with dataset creation, a cached session can also store `.dap`/`.dods` responses, which reduces the number of repeated download requests sent to the server.

* **Diagnosing**. The remote dataset may have many small chunks, resulting in very slow performance. This, along with a slow internet connection, is a performance problem outside the scope of `pydap`. A useful way to diagnose whether the issue lies with `pydap` or with the remote server is to download the same response with `curl`:

```shell
curl -L -n "<opendap_url_with_constraint_expression>"
```
Here `-L` tells `curl` to follow redirects, and `-n` tells it to read credentials from the `~/.netrc` file; `-n` is only necessary when authentication is required. For example, to download a `.dap` (DAP4) response from a DAP4 server that requires no authentication:

```shell
curl -L -o output.dap "http://test.opendap.org/opendap/data/nc/coads_climatology.nc.dap?dap4.ce=/TIME"
```
The command above downloads only the variable `TIME` from [this test](http://test.opendap.org/opendap/data/nc/coads_climatology.nc.dmr) dataset, and the download should be very fast. When slicing an array, `pydap` does something very similar: it downloads a `.dap` response for a single variable, in this case `TIME`. `pydap` should therefore not take much longer than `curl` to download the `.dap` response.
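The same comparison can be made from Python. A minimal sketch, assuming `requests` is installed: `timed` is a hypothetical helper that measures wall-clock time, and the commented lines time a raw `.dap` download against slicing the same variable with `pydap` (opening the dataset is kept outside the timed call so only the array download is measured).

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[[], Any]) -> Tuple[Any, float]:
    """Call `fn` once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Hypothetical comparison (requires network access, not run here):
# import requests
# from pydap.client import open_url
# base = "http://test.opendap.org/opendap/data/nc/coads_climatology.nc"
# _, t_raw = timed(lambda: requests.get(base + ".dap?dap4.ce=/TIME").content)
# pyds = open_url(base, protocol="dap4")
# _, t_pydap = timed(lambda: pyds["TIME"][:])
# print(f"raw .dap: {t_raw:.2f}s, pydap slice: {t_pydap:.2f}s")
```

If the two timings are similar, the bottleneck is most likely the server or the network rather than `pydap`. Repeat each measurement a few times, since a single request can be affected by transient network conditions.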
