<a name="connecting"></a>
# Connecting to ERDDAP 
We use the [`erddapy`](https://pypi.org/project/erddapy/) library to connect to the ERDDAP server at [`https://data.permafrostnet.ca/erddap`](https://data.permafrostnet.ca/erddap).

For a more thorough tutorial on the erddapy library, see the [erddapy documentation](https://ioos.github.io/erddapy/master/01-longer_intro-output.html).

In [64]:
from erddapy import ERDDAP
import pandas as pd
pd.set_option('display.max_colwidth', 0)

erddap = ERDDAP(
  server='https://data.permafrostnet.ca/erddap',
  protocol='tabledap',
)


<a name="access_temperature"></a>
# Accessing ground temperature data
First, we search for all datasets that match a set of search criteria. The `standard_name` parameter specifies the kind of data we are looking for; the [CF standard name table](https://cfconventions.org/Data/cf-standard-names/74/build/cf-standard-name-table.html) lists the valid standard names. For ground temperature, the two most relevant are `soil_temperature` and `solid_earth_subsurface_temperature`. In this example, we search for `soil_temperature`.



In [65]:
search_parameters = {
                    "standard_name": "soil_temperature",
                    "cdm_data_type": "timeseriesprofile",
                    "min_lon": -122.0,
                    "max_lon": -120.0,
                    "min_lat": 60.0,
                    "max_lat": 70.0
}

search_url = erddap.get_search_url(response="csv", **search_parameters)
search_result = pd.read_csv(search_url)

search_result[['Dataset ID', 'Title']]



| | Dataset ID | Title |
| --- | --- | --- |
| 0 | pfnetGrndTmpAll | Aggregated Ground Temperature Datasets |
| 1 | 2019-007-FEN_DP | Ground temperature at FEN_DP, Scotty Creek Research Station, Northwest Territories |
| 2 | 2019-007-FEN_SHLW | Ground temperature at FEN_SHLW, Scotty Creek Research Station, Northwest Territories |
| 3 | 2019-007-PLT1_DP | Ground temperature at PLT1_DP, Scotty Creek Research Station, Northwest Territories |
| 4 | 2019-007-PLT1_SHLW | Ground temperature at PLT1_SHLW, Scotty Creek Research Station, Northwest Territories |
| 5 | 2019-007-PLT2_SHLW | Ground temperature at PLT2_SHLW, Scotty Creek Research Station, Northwest Territories |
| 6 | 2019-007-PLT3_SHLW | Ground temperature at PLT3_SHLW, Scotty Creek Research Station, Northwest Territories |
| 7 | 2019-007-PLT4_SHLW | Ground temperature at PLT4_SHLW, Scotty Creek Research Station, Northwest Territories |
| 8 | 2019-007-PLT5_SHLW | Ground temperature at PLT5_SHLW, Scotty Creek Research Station, Northwest Territories |
| 9 | 2019-007-SL_DP | Ground temperature at SL_DP, Scotty Creek Research Station, Northwest Territories |


Now that we have a list of datasets matching our search criteria, we want to get the data from a site. The variable names in a dataset aren't standardized, but we can use the `standard_name` and `axis` attributes (which are standardized!) to find what we need. 

In [66]:
coordinates = erddap.get_var_by_attr(
            dataset_id="2019-007-PLT1_SHLW",
            axis=lambda v: v in ["X", "Y", "Z", "T"]
        )

temperature = erddap.get_var_by_attr(
            dataset_id="2019-007-PLT1_SHLW",
            standard_name="soil_temperature"
        )

print(f"The variables of interest are named: {coordinates + temperature}")


The variables of interest are named: ['latitude', 'longitude', 'time', 'depth', 'soil_temperature']
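
The variable names found above are all that is needed to write a tabledap request. As a rough illustration of what such a request looks like, the sketch below assembles one by hand with the standard library, following ERDDAP's `datasetID.fileType?variables&constraints` layout. The helper `tabledap_url` and the time and depth limits are hypothetical, chosen only for this example; erddapy's `get_download_url` also accepts a `constraints` dictionary (for example `{"time>=": "2015-01-01T00:00:00Z"}`) that serves the same purpose.

```python
from urllib.parse import quote

SERVER = "https://data.permafrostnet.ca/erddap"


def tabledap_url(dataset_id, variables, constraints=(), response="csv"):
    """Build a tabledap request URL by hand (illustrative helper, not part of erddapy)."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as ">" and ":" but keep "=" readable.
        query += "&" + quote(constraint, safe="=")
    return f"{SERVER}/tabledap/{dataset_id}.{response}?{query}"


# The limits below are arbitrary values chosen for illustration.
url = tabledap_url(
    "2019-007-PLT1_SHLW",
    ["time", "depth", "soil_temperature"],
    constraints=["time>=2015-01-01T00:00:00Z", "depth<=0.5"],
)
print(url)
```

Restricting the request on the server side keeps the download small, which matters for long records.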


In [67]:


search_parameters = {'variables' : coordinates + temperature,
                    'response' : 'csv',
                    'dataset_id' : "2019-007-PLT1_SHLW"
                    }

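data_url = erddap.get_download_url(**search_parameters)
# The second line of an ERDDAP CSV holds units rather than data, so skip it,
# then drop any rows with missing values.
data = pd.read_csv(data_url, skiprows=(1,)).dropna()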

data

| | latitude | longitude | time | depth | soil_temperature |
| --- | --- | --- | --- | --- | --- |
| 126 | 61.3084 | -121.3079 | 2012-08-01T03:00:00Z | 0.05 | 16.392 |
| 127 | 61.3084 | -121.3079 | 2012-08-01T05:00:00Z | 0.05 | 18.747 |
| 128 | 61.3084 | -121.3079 | 2012-08-01T07:00:00Z | 0.05 | 19.294 |
| 129 | 61.3084 | -121.3079 | 2012-08-01T09:00:00Z | 0.05 | 18.961 |
| 130 | 61.3084 | -121.3079 | 2012-08-01T11:00:00Z | 0.05 | 18.461 |
| ... | ... | ... | ... | ... | ... |
| 78989 | 61.3084 | -121.3079 | 2017-01-29T19:00:00Z | 1.00 | -0.004 |
| 78990 | 61.3084 | -121.3079 | 2017-01-29T21:00:00Z | 1.00 | -0.004 |
| 78991 | 61.3084 | -121.3079 | 2017-01-29T23:00:00Z | 1.00 | -0.004 |
| 78992 | 61.3084 | -121.3079 | 2017-01-30T01:00:00Z | 1.00 | -0.004 |
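
The download is in long format: one row per time step and depth. For plotting or comparing sensors it is often handier to have one column per depth. The following is a minimal sketch of that reshaping, assuming the column names shown above; a few rows copied from the output stand in for the full download so that the example runs without a network connection.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the full download.
sample_csv = """latitude,longitude,time,depth,soil_temperature
61.3084,-121.3079,2012-08-01T03:00:00Z,0.05,16.392
61.3084,-121.3079,2012-08-01T05:00:00Z,0.05,18.747
61.3084,-121.3079,2017-01-29T19:00:00Z,1.00,-0.004
61.3084,-121.3079,2017-01-29T21:00:00Z,1.00,-0.004
"""
data = pd.read_csv(io.StringIO(sample_csv))

# Convert the ISO 8601 strings into timezone-aware timestamps.
data["time"] = pd.to_datetime(data["time"], utc=True)

# One row per time step, one column per sensor depth.
wide = data.pivot_table(index="time", columns="depth", values="soil_temperature")
print(wide)
```

Time steps at which a given depth has no reading show up as `NaN` in that depth's column.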


<a name="access_profile"></a>
# Accessing geotechnical/borehole profile data
 

There aren't yet `standard_name`s defined for a number of geotechnical variables (this is in the works), so for now, you can search for datasets that are of the type `profile`:

In [68]:
search_parameters = {
                    "cdm_data_type": "profile"
}

search_url = erddap.get_search_url(response="csv", **search_parameters)
search_result = pd.read_csv(search_url)

search_result[['Dataset ID', 'Title']]

| | Dataset ID | Title |
| --- | --- | --- |
| 0 | NTGS-2021-xx-6176e9b97f35c33b | Cryostratigraphic data for borehole 170-1-10 |
| 1 | NTGS-2021-xx-4496b36c8d0632ce | Cryostratigraphic data for borehole 170-1-12 |
| 2 | NTGS-2021-xx-320e06c998f5e7f9 | Cryostratigraphic data for borehole 170-1-17 |
| 3 | NTGS-2021-xx-09bffbaba33f1360 | Cryostratigraphic data for borehole 170-1-18 |
| 4 | NTGS-2021-xx-0184a6177a0f9aaf | Cryostratigraphic data for borehole 170-1-19 |
| ... | ... | ... |
| 570 | NTGS-2021-xx-f182689849cc17f1 | Cryostratigraphic data for borehole W14103137-S6-BH13 |
| 571 | NTGS-2021-xx-f0ee1aab4be085ec | Cryostratigraphic data for borehole W14103137-S6-BH14 |
| 572 | NTGS-2021-xx-b8f016462faa0219 | Cryostratigraphic data for borehole W14103137-S6-BH15 |
| 573 | NTGS-2021-xx-194c985bea623339 | Cryostratigraphic data for borehole W14103137-S6-BH16 |


Because most geotechnical variables do not yet have a `standard_name`, some of the variable names must be hard-coded for the time being. The coordinate variables can still be found through their `axis` attribute.

In [86]:
coordinates = erddap.get_var_by_attr(
            dataset_id="ntgs-AC",
            axis=lambda v: v in ["X", "Y", "T"]
        )

interval = ['top_of_interval', 'bottom_of_interval']
data_variables = ['borehole', 'cryostructures', 'visible_ice', 'ASTM_2488']

f"variables of interest are {coordinates + interval + data_variables}"

"variables of interest are ['latitude', 'longitude', 'time', 'top_of_interval', 'bottom_of_interval', 'borehole', 'cryostructures', 'visible_ice', 'ASTM_2488']"

In [87]:
search_parameters = {'variables' : coordinates + interval + data_variables,
                    'response' : 'csv',
                    'dataset_id' : "ntgs-AC"
                    }

data_url = erddap.get_download_url(**search_parameters)
# Skip the units line of the ERDDAP CSV, then drop any rows with missing values.
data = pd.read_csv(data_url, skiprows=(1,)).dropna()
data

| | latitude | longitude | time | top_of_interval | bottom_of_interval | borehole | cryostructures | visible_ice | ASTM_2488 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 69.16162 | -133.08682 | 2012-03-21T00:00:00Z | 1.4 | 2.40 | 170-1-10 | Nf | No visible ice | SW-SM |
| 3 | 69.16162 | -133.08682 | 2012-03-21T00:00:00Z | 2.4 | 8.40 | 170-1-10 | Nf | No visible ice | GW-GM |
| 4 | 69.16105 | -133.08880 | 2012-03-21T00:00:00Z | 0.0 | 2.40 | 170-1-12 | Nf | No visible ice | GP-GM |
| 5 | 69.16105 | -133.08880 | 2012-03-21T00:00:00Z | 2.4 | 5.50 | 170-1-12 | Nf | No visible ice | SM |
| 6 | 69.16105 | -133.08880 | 2012-03-21T00:00:00Z | 5.5 | 6.70 | 170-1-12 | Nf | No visible ice | ICE |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2656 | 69.10868 | -133.08197 | 2013-04-20T00:00:00Z | 0.2 | 0.75 | W14103137-CRA12N | Vr | Medium to high | PEAT |
| 2666 | 69.10802 | -133.08230 | 2013-04-20T00:00:00Z | 0.3 | 0.95 | W14103137-CRA12S | Vx/Vr | Medium to high | PEAT |
| 2676 | 69.16827 | -133.03706 | 2013-04-20T00:00:00Z | 0.1 | 1.40 | W14103137-CRA3N | Vx/Vr | Medium to high | PEAT |
| 2678 | 69.16827 | -133.03706 | 2013-04-20T00:00:00Z | 2.3 | 2.40 | W14103137-CRA3N | Vx/Vr | Medium to high | PEAT |
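
Each row describes one logged interval of a borehole, so per-borehole quantities follow from a simple group-by. The sketch below is one way this might be done, assuming the column names shown above; it uses the first five rows of the output as stand-in data and reports thickness in whatever units the interval depths use.

```python
import io

import pandas as pd

# Rows copied from the output above, standing in for the full download.
sample_csv = """top_of_interval,bottom_of_interval,borehole,visible_ice,ASTM_2488
1.4,2.40,170-1-10,No visible ice,SW-SM
2.4,8.40,170-1-10,No visible ice,GW-GM
0.0,2.40,170-1-12,No visible ice,GP-GM
2.4,5.50,170-1-12,No visible ice,SM
5.5,6.70,170-1-12,No visible ice,ICE
"""
data = pd.read_csv(io.StringIO(sample_csv))

# Thickness of each logged interval, in the same units as the interval depths.
data["thickness"] = data["bottom_of_interval"] - data["top_of_interval"]

# Total logged thickness and deepest logged depth for each borehole.
summary = data.groupby("borehole").agg(
    logged_thickness=("thickness", "sum"),
    max_depth=("bottom_of_interval", "max"),
)
print(summary)
```

Where `logged_thickness` is smaller than `max_depth`, as for borehole 170-1-10 here, part of the borehole is not covered by the rows in hand, possibly because `dropna()` removed intervals with missing values.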


<a name="citation"></a>
# Citing a dataset

If you intend to use data for a project or publication, it is important that you cite it properly. Dataset attributes can be used to find the publication(s) and author(s) responsible for creating the dataset.


In [69]:
info_url = erddap.get_info_url(dataset_id="ntgs-AC", response='csv')

info = pd.read_csv(info_url)
info.loc[info["Attribute Name"].isin(["references", "publisher_name", "creator_name", "contributor_name", "contributor_role"]) ,
         ["Attribute Name", "Value"]]

| | Attribute Name | Value |
| --- | --- | --- |
| 2 | contributor_name | Ariane Castagner,Steve Kokelj,Stephan Gruber,Kiggiak-EBA Consulting Ltd.,Kavik-Stantec Inc. |
| 3 | contributor_role | coAuthor,coAuthor,coAuthor,contributor,contributor |
| 7 | creator_name | Ariane Castagner |
| 34 | publisher_name | Northwest Territories Geological Survey |
| 37 | references | Castagner, A., Kokelj, S.V., Gruber, S., 2021. Permafrost Geotechnical Data Report: Cryostratigraphic Synthesis of Inuvik to Tuktoyaktuk Highway Corridor Geotechnical Boreholes (2012-2017). NWT Open Report 2021-XXX, Northwest Territories Geological Survey, Yellowknife, NT, 17 p. |
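
In the output above, `contributor_name` and `contributor_role` are parallel comma-separated lists. The sketch below pairs them up to show who did what. It assumes that the two lists are in the same order and that no individual name contains a comma; the attribute values are copied from the output rather than fetched from the server.

```python
# Attribute values copied from the output above.
contributor_name = (
    "Ariane Castagner,Steve Kokelj,Stephan Gruber,"
    "Kiggiak-EBA Consulting Ltd.,Kavik-Stantec Inc."
)
contributor_role = "coAuthor,coAuthor,coAuthor,contributor,contributor"

names = contributor_name.split(",")
roles = contributor_role.split(",")

# The pairing only makes sense if both lists have the same length.
if len(names) != len(roles):
    raise ValueError("contributor_name and contributor_role differ in length")

# Group names by role, preserving their original order.
by_role = {}
for name, role in zip(names, roles):
    by_role.setdefault(role, []).append(name)

for role, people in by_role.items():
    print(f"{role}: {'; '.join(people)}")
```

For the citation itself, the `references` attribute already holds the full text and can be used as is.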
