# Tutorial

Intake-thredds provides an interface that combines [`siphon`](https://github.com/Unidata/siphon) and `intake` to retrieve data from THREDDS data servers. This tutorial introduces the API and features of intake-thredds. Let's begin by importing `intake`.

In [1]:
import intake

## Loading a catalog

You can open a THREDDS catalog by providing its URL:

In [2]:
cat_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/noaa.ersst/catalog.xml'

In [3]:
catalog = intake.open_thredds_cat(cat_url)
print(catalog)
print(type(catalog))

<Intake catalog: No name found>
<class 'intake_thredds.cat.ThreddsCatalog'>


## Using the catalog

Once you've loaded a catalog, you can display its contents by iterating over its entries:

In [4]:
list(catalog)

['err.mnmean.v3.nc',
 'sst.mnmean.v3.nc',
 'sst.mnmean.v4.nc',
 'sst.mon.1971-2000.ltm.v4.nc',
 'sst.mon.19712000.ltm.v3.nc',
 'sst.mon.1981-2010.ltm.v3.nc',
 'sst.mon.1981-2010.ltm.v4.nc']
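
Because iterating over the catalog yields plain entry names, ordinary string filtering is enough to narrow the list. The sketch below works on a copy of the names listed above so that it runs offline; with a live catalog, `list(catalog)` would supply them. The `*.v4.nc` pattern is only an illustrative choice.

```python
from fnmatch import fnmatch

# Entry names copied from the listing above; with a live catalog,
# `entries = list(catalog)` would produce the same kind of list.
entries = [
    'err.mnmean.v3.nc',
    'sst.mnmean.v3.nc',
    'sst.mnmean.v4.nc',
    'sst.mon.1971-2000.ltm.v4.nc',
    'sst.mon.19712000.ltm.v3.nc',
    'sst.mon.1981-2010.ltm.v3.nc',
    'sst.mon.1981-2010.ltm.v4.nc',
]

# Keep only the version 4 files (illustrative pattern).
v4_entries = [name for name in entries if fnmatch(name, '*.v4.nc')]
print(v4_entries)
```

Any of the resulting names can then be used as a key, as in `catalog[name]`.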

Once you've identified a dataset of interest, you can access it as follows:

In [5]:
source = catalog['err.mnmean.v3.nc']
print(source)

sources:
  err.mnmean.v3.nc:
    args:
      chunks: {}
      urlpath: https://psl.noaa.gov/thredds/dodsC/Datasets/noaa.ersst/err.mnmean.v3.nc
    description: THREDDS data
    driver: intake_xarray.opendap.OpenDapSource
    metadata:
      catalog_dir: null
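
The `urlpath` printed above mirrors the catalog URL, with the `catalog` path segment replaced by `dodsC` and `catalog.xml` replaced by the dataset name. The sketch below reproduces that mapping with string handling only. `opendap_url_from_catalog` is a hypothetical helper, not part of intake-thredds, and the `/thredds/catalog/` to `/thredds/dodsC/` layout is an assumption about this server. A THREDDS server declares its access URLs in the catalog XML, so other servers may differ.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def opendap_url_from_catalog(cat_url: str, dataset_name: str) -> str:
    """Hypothetical helper: guess the OPeNDAP URL of a dataset in a catalog.

    Assumes the server mirrors `/thredds/catalog/...` under
    `/thredds/dodsC/...`, as the output above suggests for this server.
    """
    parts = urlsplit(cat_url)
    # Drop the trailing `catalog.xml` to get the catalog directory.
    directory = posixpath.dirname(parts.path)
    # Swap the catalog service segment for the OPeNDAP one.
    data_dir = directory.replace('/thredds/catalog/', '/thredds/dodsC/', 1)
    path = posixpath.join(data_dir, dataset_name)
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


cat_url = 'https://psl.noaa.gov/thredds/catalog/Datasets/noaa.ersst/catalog.xml'
url = opendap_url_from_catalog(cat_url, 'err.mnmean.v3.nc')
print(url)
```

In practice the catalog entry already carries the resolved `urlpath`, so a helper like this is useful mainly for understanding where the URL comes from.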



In [6]:
print(type(source))

<class 'intake_xarray.opendap.OpenDapSource'>


## Loading a dataset

To load a dataset of interest, use the `to_dask()` method, which is available on a **source** object:

In [7]:
%%time
ds = source().to_dask()
ds

CPU times: user 719 ms, sys: 185 ms, total: 904 ms
Wall time: 9.44 s


| | Array | Chunk |
| --- | --- | --- |
| Bytes | 31.90 kB | 31.90 kB |
| Shape | (1994, 2) | (1994, 2) |
| Count | 2 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 127.78 MB | 127.78 MB |
| Shape | (1994, 89, 180) | (1994, 89, 180) |
| Count | 2 Tasks | 1 Chunks |
| Type | float32 | numpy.ndarray |


The `to_dask()` method reads only the metadata needed to construct an `xarray.Dataset`. The actual data are streamed over the network when computation routines are invoked on the dataset.
By default, intake-thredds uses `chunks={}`, which loads the dataset with dask using a single chunk per array. You can use a different chunking scheme by providing a custom value of `chunks` before calling `.to_dask()`:

In [8]:
%%time
# Use a custom chunking scheme
ds = source(chunks={'time': 100, 'lon': 90}).to_dask()
ds

CPU times: user 218 ms, sys: 18.3 ms, total: 237 ms
Wall time: 8.3 s


| | Array | Chunk |
| --- | --- | --- |
| Bytes | 31.90 kB | 1.60 kB |
| Shape | (1994, 2) | (100, 2) |
| Count | 21 Tasks | 20 Chunks |
| Type | float64 | numpy.ndarray |

| | Array | Chunk |
| --- | --- | --- |
| Bytes | 127.78 MB | 3.20 MB |
| Shape | (1994, 89, 180) | (100, 89, 90) |
| Count | 41 Tasks | 40 Chunks |
| Type | float32 | numpy.ndarray |
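
The chunk counts and sizes reported for `chunks={'time': 100, 'lon': 90}` follow from simple arithmetic, which can help when choosing a scheme before opening a dataset. The sketch below is a plain-Python estimate, not a call into dask. `chunk_layout` is a hypothetical helper, and the middle dimension name `lat` is an assumption (only `time` and `lon` appear in the call above).

```python
import math


def chunk_layout(shape, dims, chunks, itemsize):
    """Hypothetical helper: estimate (number of chunks, bytes per full chunk).

    Dimensions missing from `chunks` are kept as a single chunk.
    """
    n_chunks = 1
    chunk_elems = 1
    for size, dim in zip(shape, dims):
        # Chunk length along this dimension, capped at the dimension size.
        length = min(chunks.get(dim, size), size)
        n_chunks *= math.ceil(size / length)
        chunk_elems *= length
    return n_chunks, chunk_elems * itemsize


n_chunks, chunk_bytes = chunk_layout(
    shape=(1994, 89, 180),
    dims=('time', 'lat', 'lon'),  # 'lat' is an assumed name
    chunks={'time': 100, 'lon': 90},
    itemsize=4,  # float32
)
print(n_chunks, chunk_bytes / 1e6, 'MB')
```

With these inputs the estimate gives 40 chunks of about 3.20 MB each, matching the display for the three-dimensional array. The last chunk along `time` is smaller than the others because 1994 is not a multiple of 100.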
