<a href="https://colab.research.google.com/github/matt-gn/amrdc-jupyterlab/blob/main/introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AMRDC Repository API Introduction

In this tutorial we will learn how to access, interact with, and aggregate AMRDC data and metadata via the public API (application programming interface). Along the way, we will learn a bit about the architecture of APIs, HTTP requests, and wrangling data streams in Python.

This tutorial assumes you have Python 3.xx installed, along with either the `pip` tool or an active `conda` environment.

Alternatively, you can execute the code directly in this notebook by opening it in Google Colab or downloading the `.ipynb` file and opening it in your favorite interactive notebook environment.

For a more technical treatment of this topic, refer to the [Official CKAN API Guide](https://docs.ckan.org/en/2.10/api/index.html).

## Accessing the repository using the `requests` library

We'll start by importing the `requests` library, which we can use to make simple HTTP GET and POST requests.

In [None]:
import requests

We're ready to make our first API call.

In [None]:
## Request a list of datasets from the repository
## requests.get() returns a `Response` object; its .json() method parses the JSON body into a dict
response = requests.get('https://amrdcdata.ssec.wisc.edu/api/action/package_list').json()

## We then access the results using the 'result' key.
amrdc_datasets = response['result']

## Let's count how many datasets are in the repo.
len(amrdc_datasets)

4507
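The `result` key used above is part of the envelope that CKAN's Action API wraps around every response: a JSON object with `help`, `success`, and `result` keys. As a minimal sketch, a small helper can check the `success` flag before unwrapping the result. The helper name `unwrap_ckan_response` and the sample payload are illustrative assumptions, not part of the AMRDC or CKAN tooling.

```python
def unwrap_ckan_response(payload):
    """Return payload['result'] if the CKAN envelope reports success."""
    if not payload.get("success", False):
        # On failure, CKAN places the details under the 'error' key.
        raise RuntimeError(f"CKAN API call failed: {payload.get('error')}")
    return payload["result"]


# Hand-written sample envelope (illustrative, not real repository output).
sample_payload = {
    "help": "(URL of the help text for this action)",
    "success": True,
    "result": ["example-dataset-a", "example-dataset-b"],
}
sample_datasets = unwrap_ckan_response(sample_payload)
print(len(sample_datasets))
```

With a live call, the same helper could wrap the parsed response, for example `unwrap_ckan_response(requests.get(url, timeout=30).json())`, where `url` is the `package_list` endpoint used above.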

In [None]:
## Okay, that worked. Now let's use this list to look for Byrd AWS datasets.
byrd_aws_datasets = [dataset for dataset in amrdc_datasets if "byrd" in dataset]
len(byrd_aws_datasets)

91

In [None]:
## Great. Now say we want only Byrd AWS datasets from 1999....
## `amrdc_datasets` is already the list of dataset names, so we iterate over it directly.
byrd_aws_1999_datasets = [dataset for dataset in amrdc_datasets if all(word in dataset for word in ["byrd", "1999"])]
byrd_aws_1999_datasets

['byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data',
 'byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data']
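The keyword filter above generalizes to any set of search terms. The sketch below wraps it in a helper; the name `filter_datasets` is hypothetical, and the case-insensitive matching is an added assumption rather than something the repository requires. The first two sample names come from the output above, while the third is made up for contrast.

```python
def filter_datasets(dataset_names, keywords):
    """Keep only the names that contain every keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        name for name in dataset_names
        if all(keyword in name.lower() for keyword in lowered)
    ]


sample_names = [
    "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
    "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
    "example-station-2005-observational-data",  # made-up name for illustration
]
matches = filter_datasets(sample_names, ["Byrd", "1999"])
print(len(matches))
```

Against the live list, the call would be `filter_datasets(amrdc_datasets, ["byrd", "1999"])`.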

You can use these results as static URLs by appending them to `https://amrdcdata.ssec.wisc.edu/dataset/`. Visit [this link](https://amrdcdata.ssec.wisc.edu/dataset/byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data) in your browser to access the data.
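Building these URLs can be automated. The following is a small sketch; the function name `dataset_url` is hypothetical, and percent-encoding the name with `urllib.parse.quote` is a defensive assumption (the dataset names shown above contain only URL-safe characters).

```python
from urllib.parse import quote

DATASET_BASE_URL = "https://amrdcdata.ssec.wisc.edu/dataset/"


def dataset_url(dataset_name):
    """Build the browser URL for a dataset from its name."""
    # quote() leaves letters, digits, and hyphens untouched.
    return DATASET_BASE_URL + quote(dataset_name, safe="")


url = dataset_url(
    "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data"
)
print(url)
```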

This method works, but it's inefficient and unintuitive. Surely there has to be a better way....

## Accessing the repository using the `ckanapi` library

The `ckanapi` library is a simple, powerful wrapper for programmatically communicating with the CKAN API.

Install it with `pip install ckanapi` or `conda install -c conda-forge ckanapi`.

In [None]:
%pip install ckanapi
import ckanapi

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ckanapi
  Downloading ckanapi-4.7.tar.gz (33 kB)
  Preparing metadata (setup.py) ... done
Collecting docopt
  Downloading docopt-0.6.2.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ckanapi, docopt
  Building wheel for ckanapi (setup.py) ... done
  Created wheel for ckanapi: filename=ckanapi-4.7-py3-none-any.whl size=43312 sha256=fb9372e76637198fec9b771c8b7f8c66a1add4046849ceef00cd4d352add4594
  Stored in directory: /root/.cache/pip/wheels/0d/b2/c7/219cd5a752c2ff4fb9809216307d26f6421f6711e0f4e010ff
  Building wheel for docopt (setup.py) ... done
  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13723 sha256=85e4cfbd5c340713928ce2089ab86a645def2b7439dc0c07328e94ed142329a4
  Stored in directory: /root/.cache/pip/wheels/56/ea/58/ead137b087d9e326852a851351d1deb

Let's import the library and execute another search for the Byrd 1999 data.

In [None]:
from ckanapi import RemoteCKAN
amrdc_repository = RemoteCKAN('https://amrdcdata.ssec.wisc.edu/')

In [None]:
byrd_aws_1999_datasets2 = amrdc_repository.action.package_search(q="Byrd 1999")['results']

That's it! `package_search` returns a dictionary whose `results` key holds a list of dictionaries, one per matching dataset, this time containing all of the metadata. Let's make sure we got the same datasets as last time.

In [None]:
print([dataset['title'] for dataset in byrd_aws_1999_datasets2])

['Byrd Automatic Weather Station, 1999 unmodified ten-minute observational data.', 'Byrd Automatic Weather Station, 1999 Reader format three-hour observational data.']
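The titles match, but the order differs from the first search. As a hedged sketch, the comparison can be made order-independent by comparing sets of dataset names, since each CKAN dataset dictionary carries a `name` field. The helper `same_datasets` is a hypothetical name, and the sample records below are hand-built from the outputs above (the pairing of each title with its name is assumed) rather than fetched from the API.

```python
def same_datasets(names_from_list, records_from_search):
    """Compare a list of dataset names with search records, ignoring order."""
    return set(names_from_list) == {record["name"] for record in records_from_search}


# Names as returned by the first, list-based search.
sample_list_names = [
    "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
    "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
]
# Trimmed stand-ins for the metadata dictionaries, in the second search's order.
sample_search_records = [
    {
        "name": "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
        "title": "Byrd Automatic Weather Station, 1999 unmodified ten-minute observational data.",
    },
    {
        "name": "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
        "title": "Byrd Automatic Weather Station, 1999 Reader format three-hour observational data.",
    },
]
datasets_match = same_datasets(sample_list_names, sample_search_records)
print(datasets_match)
```

In the notebook, the equivalent check would be `same_datasets(byrd_aws_1999_datasets, byrd_aws_1999_datasets2)`.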

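CKAN's `package_search` action returns results in pages: it accepts `rows` and `start` parameters and reports the total number of matches under `count`. A broad query can therefore match more datasets than a single call returns. The sketch below shows one way to collect every page. The function name `search_all` and the offline stand-in `fake_search` are hypothetical, included so the example runs without network access.

```python
def search_all(search, query, page_size=100):
    """Collect every match by paging through `search` with rows/start."""
    results = []
    start = 0
    while True:
        page = search(q=query, rows=page_size, start=start)
        results.extend(page["results"])
        start += len(page["results"])
        # Stop on an empty page or once every reported match is collected.
        if not page["results"] or start >= page["count"]:
            break
    return results


def fake_search(q, rows, start):
    """Offline stand-in for package_search that serves five fake records."""
    records = [{"name": f"example-dataset-{i}"} for i in range(5)]
    return {"count": len(records), "results": records[start:start + rows]}


all_records = search_all(fake_search, "example", page_size=2)
print(len(all_records))
```

Against the live repository, the call would be `search_all(amrdc_repository.action.package_search, "Byrd 1999")`.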