## Ingesting Data

Ingest is the process of transforming ARD tiles into smaller chips of data. To trigger ingest, send an HTTP PUT request to the Aardvark REST API with a `source` that contains the URL of the data and a checksum used to verify its integrity.

In [117]:
import requests
import io
import string

## Building a Collection of Sources

ARD tiles are organized by area using an HXXVXX naming convention. Each area's directory should contain a text file named `manifest.txt` with one or more lines, each holding an ID, a URL, and a checksum for an archive, separated by tabs. The `get_manifest` function transforms the entries in a manifest into a list of dictionaries.

In [118]:
def source(line):
    id, uri, checksum = line.split("\t")
    return {'id': id, 'uri': uri, 'checksum': checksum.strip() }

def get_manifest(url):
    # Download the manifest and parse each non-blank line into a source dict.
    res = requests.get(url)
    buffer = io.StringIO(res.text)
    return [source(line) for line in buffer if line.strip()]

def put_source(base_url, source):
    url = base_url.format(**source)
    return requests.put(url, source)
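
To make the manifest format concrete, here is a minimal sketch that parses manifest text held in a string instead of downloading it. The IDs, URLs, and checksums in `sample_manifest` are made up for illustration; only the tab-separated layout and the `source` function come from the cell above.

```python
import io

def source(line):
    # Same parsing rule as above: ID, URI, and checksum separated by tabs.
    id, uri, checksum = line.split("\t")
    return {'id': id, 'uri': uri, 'checksum': checksum.strip()}

# Hypothetical manifest content; these values are not real archives.
sample_manifest = (
    "tile-001.tar\thttp://example.com/tile-001.tar\tabc123\n"
    "tile-002.tar\thttp://example.com/tile-002.tar\tdef456\n"
)

# Skip blank lines so a trailing newline does not break the parser.
sample_sources = [source(line) for line in io.StringIO(sample_manifest) if line.strip()]

# The {id} placeholder in the base URL is filled in from the source dict.
sample_url = "http://localhost:5678/source/{id}".format(**sample_sources[0])
print(sample_sources[0])
print(sample_url)
```

Each dictionary carries the keys `id`, `uri`, and `checksum`, which is why `put_source` can format the base URL with `**source`.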

## Ingesting Sources

The `get_manifest` and `put_source` functions are used together to trigger ingest. You will need to change the URL of the manifest and the URL of the Aardvark REST API to match your environment.

In [120]:
manifest_url = "https://edclpdsftp.cr.usgs.gov/downloads/collections/tiles-l2-20170403/manifest.txt"
sources = get_manifest(manifest_url)
source_url = "http://localhost:5678/source/{id}"
res = put_source(source_url, sources[0])
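
The cell above submits only the first source. A sketch of submitting every source in the manifest might look like the following. The helper name `ingest_all` is hypothetical, and treating any status code of 400 or above as a rejection is an assumption, since the status codes Aardvark returns are not described here. The `put` argument is any callable shaped like `requests.put(url, data)`.

```python
def ingest_all(base_url, sources, put):
    """PUT every source and return the IDs of those that were not accepted.

    `put` is a callable with the same shape as `requests.put(url, data)`.
    """
    failed = []
    for source in sources:
        url = base_url.format(**source)
        res = put(url, source)
        # Assumption: a status code of 400 or above means the request was rejected.
        if res.status_code >= 400:
            failed.append(source['id'])
    return failed
```

With the names from the cell above, the call would be `ingest_all(source_url, sources, requests.put)`. Passing the HTTP function in as an argument also makes it easy to try the loop against a stub before pointing it at a live service.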

## Checking Progress

The progress of ingest for a single scene can be obtained by performing an HTTP GET request using the URL of the derived source. The response contains a list of `progress` entries sorted from oldest to newest that describe what has happened. You can use these to count the number of missing/pending/started/finished/failed sources.

In [121]:
def get_source(base_url, source):
    url = base_url.format(**source)
    return requests.get(url, source)

In [125]:
res = get_source(source_url, sources[22])
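
Counting sources by state could be sketched as below. The exact shape of the response is not shown in this notebook, so the code assumes each `progress` entry is a dictionary with a `state` key; that key name and the helper names `latest_state` and `count_states` are hypothetical. A source with no progress entries is counted as `missing`.

```python
from collections import Counter

def latest_state(progress, default="missing"):
    # Entries are sorted oldest to newest, so the last one is the current state.
    # Assumption: each entry is a dict with a `state` key.
    return progress[-1]["state"] if progress else default

def count_states(progress_lists):
    # Tally the current state of each source.
    return Counter(latest_state(progress) for progress in progress_lists)

# Hypothetical progress lists for three sources.
example = [
    [{"state": "pending"}, {"state": "started"}, {"state": "finished"}],
    [],
    [{"state": "pending"}, {"state": "failed"}],
]
counts = count_states(example)
print(dict(counts))
```

In practice each progress list would come from the body of a `get_source` response, once its actual structure has been confirmed against the API.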