# ERDDAP FUNCTIONALITY

## Table of contents
### [Functions](#functions)
* [erddap_py](#erddappy)
* [erddap_pull](#erddappull)
* [create_erddap_url](#createerddapurl)



# TO DO
* tabledap --> set it up so you don't need to input all variables individually
  * look into incorporating ERDDAPY here
* incorporating other servers
  * do we want 1 really flexible function that pulls from ERDDAP/THREDDS/etc
  * or do we want separate functions depending on the server type? 

In [1]:
import xarray as xr
import requests
import pandas as pd
import json
import erddapy
from erddapy import ERDDAP

# <font color='black'>Functions</font> <a class="anchor" id="functions"></a>

### <font color='blue'>```erddap_py```</font> <a class="anchor" id="erddappy"></a>
**TO DO:**
* add functionality for datasets that don't have time explicitly named (i.e. ecomon --> datetimeUTC)

**Purpose:** 
* use ERDDAPY package to search, subset, and stream from ERDDAP
* supports griddap and tabledap

**Arguments:** 
* <u>protocol</u> (string): either ```'tabledap'``` if using point data, or ```'griddap'``` if using gridded data
* <u>datasetID</u> (string): dataset ID from ERDDAP
* <u>date_start</u> (string): start date for data slice. Should be formatted as ```'yyyy-mm-dd'```
* <u>date_end</u> (string): end date for data slice. Should be formatted as ```'yyyy-mm-dd'```
* <u>latmin</u> (int/float) *optional*: minimum latitude for bounding box. Defaults to ```34``` (NWA subset)
* <u>latmax</u> (int/float) *optional*: maximum latitude for bounding box. Defaults to ```46``` (NWA subset)
* <u>lonmin</u> (int/float) *optional*: minimum longitude for bounding box. Defaults to ```-77``` (NWA subset)
* <u>lonmax</u> (int/float) *optional*: maximum longitude for bounding box. Defaults to ```-63``` (NWA subset)
* <u>baseurl</u> (string) *optional*: ERDDAP server url. Defaults to ```'https://comet.nefsc.noaa.gov/erddap'```

**Sample Usage:** <br>
### Tabledap
```
erddap_py('tabledap','cfrf_temps','2020-01-01','2020-12-31')
```
### Griddap
```
erddap_py('griddap','noaacwBLENDEDsstDLDaily','2020-07-10','2020-07-12',baseurl='https://coastwatch.noaa.gov/erddap')
```

**History:** <br>
>* 2/27/25 function initialized

In [6]:
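def erddap_py(protocol, datasetID, date_start, date_end, latmin=34, latmax=46, lonmin=-77, lonmax=-63, baseurl="https://comet.nefsc.noaa.gov/erddap"):
    # Connect to the ERDDAP server using the requested protocol
    e = ERDDAP(
        server=baseurl,
        protocol=protocol,
    )
    e.dataset_id = datasetID

    # Time window (midnight UTC on both dates) and bounding box
    constraints = {
        "time>=": date_start + "T00:00:00Z",
        "time<=": date_end + "T00:00:00Z",
        "latitude>=": latmin,
        "latitude<=": latmax,
        "longitude>=": lonmin,
        "longitude<=": lonmax,
    }
    if protocol == 'griddap':
        # griddap_initialize() fills in the dataset's default constraints; override them with ours
        e.griddap_initialize()
        e.constraints.update(constraints)
        data = e.to_xarray()
        #e.variables=e.variables[:2]
    elif protocol == 'tabledap':
        e.constraints = constraints
        data = e.to_pandas()
    else:
        # Without this branch, an unknown protocol would fail on an undefined `data`
        raise ValueError("protocol must be 'griddap' or 'tabledap'")
    return data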
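
The TO DO above mentions datasets whose time variable is not named ```time``` (ecomon uses ```datetimeUTC```). One possible approach is to build the constraints dictionary in a small helper that takes the time variable name as an argument. This is only a sketch: ```build_constraints``` and its ```time_var``` argument are hypothetical and not part of this notebook, the latitude/longitude names are still assumed to be ```latitude``` and ```longitude```, and whether ERDDAPY treats a non-```time``` key the same way as ```time``` has not been checked here.

```python
from datetime import datetime


def build_constraints(date_start, date_end, latmin=34, latmax=46,
                      lonmin=-77, lonmax=-63, time_var="time"):
    """Build an ERDDAP constraints dict (hypothetical helper, not in the notebook)."""
    # strptime raises ValueError if a string cannot be parsed as year-month-day
    start = datetime.strptime(date_start, "%Y-%m-%d")
    end = datetime.strptime(date_end, "%Y-%m-%d")
    if start > end:
        raise ValueError(f"date_start {date_start} is after date_end {date_end}")
    return {
        # time_var lets the caller name the time column, e.g. "datetimeUTC"
        f"{time_var}>=": start.strftime("%Y-%m-%d") + "T00:00:00Z",
        f"{time_var}<=": end.strftime("%Y-%m-%d") + "T00:00:00Z",
        "latitude>=": latmin,
        "latitude<=": latmax,
        "longitude>=": lonmin,
        "longitude<=": lonmax,
    }


constraints = build_constraints("2020-01-01", "2020-12-31", time_var="datetimeUTC")
```

Checking the dates up front also catches a swapped ```date_start```/```date_end``` before any request is sent.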
        

### <font color='blue'>```erddap_pull```</font> <a class="anchor" id="erddappull"></a>
**Purpose:** 
* load data from an ERDDAP url: ```'nc'``` files are downloaded and opened with xarray, ```'csv'``` files are read with pandas

**Arguments:** 
* <u>file_type</u> (string): file type. either ```'nc'``` or ```'csv'```
* <u>url</u> (string): erddap url. Either directly pulled from erddap's website or created using ```create_erddap_url```

**Sample Usage:** <br>
```
data = erddap_pull('nc',url)
```

**History:** <br>
>* 2/26/25 function initialized

In [None]:
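import io
import requests
def erddap_pull(file_type, url):
    if file_type == 'nc':
        # Download the whole file into memory, then open it with xarray
        # NOTE: verify=False skips TLS certificate verification
        response = requests.get(url, verify=False)
        response.raise_for_status()
        data = xr.open_dataset(io.BytesIO(response.content))
    elif file_type == 'csv':
        # pandas reads directly from the url
        data = pd.read_csv(url)
    else:
        raise ValueError("file_type must be 'nc' or 'csv'")
    return data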
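
Since an ERDDAP request url already ends its path with the file extension (```datasetID.nc?...``` or ```datasetID.csv?...```), the ```file_type``` argument could in principle be read from the url instead of being passed in. A minimal sketch using only the standard library; ```infer_file_type``` is a hypothetical helper, not part of this notebook, and the query string in the example url is made up for illustration:

```python
import posixpath
from urllib.parse import urlparse


def infer_file_type(url):
    """Return the file extension of an ERDDAP request url (hypothetical helper)."""
    # urlparse separates the path from the "?query" part of the url
    path = urlparse(url).path
    # splitext splits "cfrf_temps.csv" into ("cfrf_temps", ".csv")
    extension = posixpath.splitext(path)[1]
    return extension.lstrip(".")


example_url = "https://comet.nefsc.noaa.gov/erddap/tabledap/cfrf_temps.csv?time&time%3E=2020-01-01"
file_type = infer_file_type(example_url)
```

The result could then be passed straight to ```erddap_pull(file_type, url)```.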

### <font color='blue'>```create_erddap_url```</font> <a class="anchor" id="createerddapurl"></a>
## THIS IS OUTDATED PLZ IGNORE
**Purpose:** 
* Create an ERDDAP url for subsetting a dataset
* NOTE: must have the variable names that you want to grab from the dataset

**Arguments:** 
* <u>dataID</u> (string): dataset ID from ERDDAP
* <u>file_type</u> (string): either ```'nc'``` or ```'csv'```
* <u>daptype</u> (string): either ```'tabledap'``` if using point data, or ```'griddap'``` if using gridded data
* <u>var</u> (list of strings): list of variables from dataset. For ```'tabledap'```, the list must include the latitude, longitude, and time variables, since the function uses them to build the constraints
* <u>latmin</u> (int/float) *optional*: minimum latitude for bounding box. Defaults to ```35``` (NWA subset)
* <u>latmax</u> (int/float) *optional*: maximum latitude for bounding box. Defaults to ```46``` (NWA subset)
* <u>lonmin</u> (int/float) *optional*: minimum longitude for bounding box. Defaults to ```-76``` (NWA subset)
* <u>lonmax</u> (int/float) *optional*: maximum longitude for bounding box. Defaults to ```-63``` (NWA subset)
* <u>date_end</u> (string): end date for data slice. Should be formatted as ```'yyyy-mm-dd'```
* <u>date_start</u> (string): start date for data slice. Should be formatted as ```'yyyy-mm-dd'```
* <u>base_url</u> (string) *optional*: beginning part of erddap url. Defaults to ```'https://comet.nefsc.noaa.gov/'```

**Sample Usage:** <br>
### Tabledap
```
url =create_erddap_url(dataID='ocdbs_v_erddap1', file_type='csv', daptype='tabledap', var=['sea_surface_temperature'], date_end='2024-06-01', date_start='2024-01-31')
```
### Griddap
```
url = create_erddap_url(dataID='noaa_coastwatch_acspo_v2_reanalysis', file_type='nc', daptype='griddap', var=['sea_surface_temperature','sst_dtime'], date_end='2024-01-31', date_start='2024-01-01')
erddap_pull('nc',url)
```

**History:** <br>
>* 2/26/25 function initialized

In [1]:
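def create_erddap_url(dataID, file_type, daptype, var, latmin=35, latmax=46, lonmin=-76, lonmax=-63, date_end=None, date_start=None, base_url='https://comet.nefsc.noaa.gov/'):
    if date_start is None or date_end is None:
        raise ValueError("date_start and date_end are required ('yyyy-mm-dd')")
    base = base_url + 'erddap/' + daptype + '/' + dataID + '.' + file_type + '?'
    if daptype == 'griddap':
        # Each variable gets [(time range)][(lat range)][(lon range)] with stride 1;
        # %5B and %5D are the percent-encoded '[' and ']'
        time_idx = '%5B(' + date_start + 'T00:00:00Z):1:(' + date_end + 'T00:00:00Z)%5D'
        lat_idx = '%5B(' + str(latmin) + '):1:(' + str(latmax) + ')%5D'
        lon_idx = '%5B(' + str(lonmin) + '):1:(' + str(lonmax) + ')%5D'
        varz = [v + time_idx + lat_idx + lon_idx for v in var]
        url = base + ",".join(varz)
    elif daptype == 'tabledap':
        # Find the latitude, longitude, and time variables by name
        latvar = lonvar = timevar = None
        for v in var:
            if 'lat' in v:
                latvar = v
            elif 'lon' in v:
                lonvar = v
            elif 'time' in v or 'TIME' in v:
                timevar = v
        if None in (latvar, lonvar, timevar):
            raise ValueError("var must include latitude, longitude, and time variables for tabledap")
        # %2C is ',', %3E= is '>=', %3C= is '<='
        url = (base + "%2C".join(var)
               + '&' + timevar + '%3E=' + date_start + '&' + timevar + '%3C=' + date_end
               + '&' + latvar + '%3E=' + str(latmin) + '&' + latvar + '%3C=' + str(latmax)
               + '&' + lonvar + '%3E=' + str(lonmin) + '&' + lonvar + '%3C=' + str(lonmax))
    else:
        raise ValueError("daptype must be 'griddap' or 'tabledap'")
    return url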
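
The percent-escapes used in the tabledap branch (```%2C``` for a comma, ```%3E``` for ```>```, ```%3C``` for ```<```) can also be generated with ```urllib.parse.quote``` rather than typed by hand, which makes it harder to mix up the ```>=``` and ```<=``` operators. A minimal sketch, assuming a hypothetical helper ```tabledap_query``` that takes constraints as ```(variable, operator, value)``` tuples; neither the helper nor that tuple format is part of this notebook:

```python
from urllib.parse import quote


def tabledap_query(variables, constraints):
    """Build the part of a tabledap url after the '?' (hypothetical helper)."""
    # Encode the comma-separated variable list; "," becomes "%2C"
    parts = [quote(",".join(variables), safe="")]
    for name, op, value in constraints:
        # Encode the operator but keep "=", so ">=" becomes "%3E="
        encoded_op = quote(op, safe="=")
        encoded_value = quote(str(value), safe="-.:")
        parts.append(f"{name}{encoded_op}{encoded_value}")
    return "&".join(parts)


query = tabledap_query(
    ["time", "latitude"],
    [("time", ">=", "2024-01-31"), ("latitude", "<=", 46)],
)
# query == "time%2Clatitude&time%3E=2024-01-31&latitude%3C=46"
```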