<a href="https://colab.research.google.com/github/matt-gn/amrdc-jupyterlab/blob/main/introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# AMRDC Repository API Introduction

In this tutorial we will learn how to access, interact with, and aggregate AMRDC data and metadata via the public API (application programming interface). Along the way, we will learn a bit about the architecture of APIs, HTTP requests, and wrangling JSON and other data streams in Python.

This tutorial assumes you have Python 3 installed, along with either the `pip` tool or an active `conda` environment.

These examples carry over to any programming language or utility that can send an HTTP request and receive the response. In fact, we will even learn how to execute API requests in your web browser and get formatted JSON back as a response.

Alternatively, you can execute the code directly in this notebook by opening it in Google Colab or downloading the `.ipynb` file and opening it in your favorite interactive notebook environment.

For a more technical treatment of this topic, refer to the [Official CKAN API Guide](https://docs.ckan.org/en/2.10/api/index.html).
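
Because every API call is ultimately just a URL, it can help to see how one is assembled before we start. The sketch below builds the address for a CKAN action using the same base URL as the requests later in this tutorial. `build_action_url` is a hypothetical helper written for illustration, and `limit` is a parameter the CKAN documentation lists for `package_list`; we assume the AMRDC deployment honors it.

```python
from urllib.parse import urlencode

# Base URL of the action API, as used by the requests in this tutorial.
API_BASE = "https://amrdcdata.ssec.wisc.edu/api/action/"


def build_action_url(action, **params):
    """Return the URL for a CKAN action (hypothetical helper, for illustration)."""
    url = API_BASE + action
    if params:
        # urlencode escapes the values and joins them as key=value pairs.
        url += "?" + urlencode(params)
    return url


# Assumption: the server accepts CKAN's documented `limit` parameter.
print(build_action_url("package_list", limit=5))
```

Pasting the printed URL into a browser's address bar should return the JSON response directly, with no Python involved.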

## Accessing the repository using the `requests` library

We'll start by importing the `requests` library, which we can use to make simple HTTP GET and POST requests.

In [None]:
import requests

We're ready to make our first API call.

In [None]:
## Request a list of datasets from the repository
## requests.get() returns a `Response` object; its .json() method parses the JSON body into a dict
response = requests.get('https://amrdcdata.ssec.wisc.edu/api/action/package_list').json()

## We then access the results using the 'result' key.
amrdc_datasets = response['result']

## Let's see how many datasets are in the repo.
len(amrdc_datasets)

4507
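
The cell above trusts that the request succeeded. CKAN wraps every answer in a JSON envelope with a `success` flag, a `result` on success, and an `error` on failure, so a slightly more defensive version can check the flag before using the data. The sketch below is one way to do that; `unwrap_ckan_payload` is a hypothetical helper name, and the sample dictionaries are hand-written stand-ins rather than real server output.

```python
def unwrap_ckan_payload(payload):
    """Return payload['result'], or raise if CKAN reported a failure."""
    if not payload.get("success"):
        raise RuntimeError(f"CKAN API call failed: {payload.get('error')}")
    return payload["result"]


# Hand-written stand-ins for a successful and a failed response body.
ok_payload = {"success": True, "result": ["dataset-a", "dataset-b"]}
bad_payload = {"success": False, "error": {"message": "Not found"}}

print(unwrap_ckan_payload(ok_payload))

try:
    unwrap_ckan_payload(bad_payload)
except RuntimeError as err:
    print(err)
```

With a live request, this could be called as `unwrap_ckan_payload(requests.get(url, timeout=30).json())`; passing `timeout` keeps the call from hanging indefinitely if the server is unreachable.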

Okay, that worked. Now let's use this list to look for Byrd AWS datasets.


In [None]:
byrd_aws_datasets = [dataset for dataset in amrdc_datasets if "byrd" in dataset]
len(byrd_aws_datasets)

91

Great. Say we want only Byrd AWS datasets from 1999....

In [None]:
byrd_aws_1999_datasets = [dataset for dataset in amrdc_datasets if "byrd" in dataset and "1999" in dataset]
byrd_aws_1999_datasets

['byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data',
 'byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data']

Each result is a dataset's URL identifier, so appending it to `https://amrdcdata.ssec.wisc.edu/dataset/` gives a static URL for that dataset. Visit [this link](https://amrdcdata.ssec.wisc.edu/dataset/byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data) in your browser to access the data.
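
The URL construction described above can be sketched in a couple of lines. `dataset_url` is a hypothetical helper name, and the list is copied from the output of the previous cell so that the block runs on its own.

```python
DATASET_BASE_URL = "https://amrdcdata.ssec.wisc.edu/dataset/"


def dataset_url(identifier):
    """Return the landing-page URL for a dataset identifier (hypothetical helper)."""
    return DATASET_BASE_URL + identifier


# Copied from the output of the previous cell.
byrd_aws_1999_datasets = [
    "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
    "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
]

for url in map(dataset_url, byrd_aws_1999_datasets):
    print(url)
```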

This method works, but it's inefficient and verbose: we download the name of every dataset in the repository and then filter the list ourselves. Surely there has to be a better way....

## Accessing the repository using the `ckanapi` library

The `ckanapi` library is a simple, powerful Python wrapper for programmatically communicating with the CKAN API.

Install it with `pip install ckanapi` or `conda install -c conda-forge ckanapi`.

In [None]:
%pip install ckanapi
import ckanapi

Let's import the library and execute another search for the Byrd 1999 data.

In [None]:
from ckanapi import RemoteCKAN
amrdc_repository = RemoteCKAN('https://amrdcdata.ssec.wisc.edu/')

In [None]:
byrd_aws_1999_datasets2 = amrdc_repository.action.package_search(q="Byrd 1999")['results']

That's it! `package_search` returns a dictionary, and its `results` key holds a list of dictionaries, one per matching dataset, this time containing all of the metadata. Let's make sure we found the same datasets as last time.

In [None]:
[dataset['title'] for dataset in byrd_aws_1999_datasets2]

['Byrd Automatic Weather Station, 1999 unmodified ten-minute observational data.',
 'Byrd Automatic Weather Station, 1999 Reader format three-hour observational data.']
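
The titles above are human-readable, while the first search returned URL identifiers, so the two outputs are not directly comparable by eye. In CKAN, each dataset record stores its identifier under the `name` key, which allows an exact check. The sketch below uses trimmed, hand-written stand-ins for the two records (real records carry many more fields), and it assumes each title belongs to the identifier paired with it here.

```python
def result_slugs(results):
    """Collect the `name` field (the URL identifier) from package_search results."""
    return {dataset["name"] for dataset in results}


# Identifiers found earlier by filtering the full package list.
slugs_from_list = [
    "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
    "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
]

# Trimmed stand-ins for the package_search records (assumed title/name pairing).
sample_results = [
    {
        "name": "byrd-automatic-weather-station-1999-unmodified-ten-minute-observational-data",
        "title": "Byrd Automatic Weather Station, 1999 unmodified ten-minute observational data.",
    },
    {
        "name": "byrd-automatic-weather-station-1999-reader-format-three-hour-observational-data",
        "title": "Byrd Automatic Weather Station, 1999 Reader format three-hour observational data.",
    },
]

# Comparing as sets ignores the different ordering of the two searches.
print(result_slugs(sample_results) == set(slugs_from_list))
```

In the notebook itself, the same check would be `result_slugs(byrd_aws_1999_datasets2) == set(byrd_aws_1999_datasets)`.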

Looks good! Let's see how we can leverage the API for other use cases.
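
One such use case is collecting every match for a broad query. According to the CKAN documentation, `package_search` returns results one page at a time (controlled by `rows` and `start`) and reports the total number of matches under `count`, so a two-result query fits in one call but a larger one may not. The sketch below pages through all results; `search_all` is a hypothetical helper, and `fake_search` is a stand-in that mimics the response shape so the block runs without a network connection.

```python
def search_all(search, query, page_size=100):
    """Collect every result for `query` by paging with `rows` and `start`."""
    results = []
    while True:
        page = search(q=query, rows=page_size, start=len(results))
        results.extend(page["results"])
        # Stop when a page comes back empty or the reported total is reached.
        if not page["results"] or len(results) >= page["count"]:
            return results


# Stand-in data and search function (not real AMRDC records).
fake_records = [{"name": f"dataset-{i}"} for i in range(7)]


def fake_search(q, rows, start):
    """Mimic the shape of a package_search response for the stand-in data."""
    return {"count": len(fake_records), "results": fake_records[start:start + rows]}


all_records = search_all(fake_search, "anything", page_size=3)
print(len(all_records))
```

Against the live repository, the call would look like `search_all(amrdc_repository.action.package_search, "Byrd")`.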