## Overview

This notebook is used to:

* Request PyCCD execution
* Retrieve results directly from Cassandra
* Check on progress (or failures)
* Produce JSON files

### Chip Points in a Tile

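In order to request or retrieve change results, you will need a list of chip points. The following function generates a row-major list of the upper-left coordinates of every chip in a tile. With 30 m pixels, 100 × 100-pixel chips, and 5000 × 5000-pixel tiles, that is 50 × 50 = 2,500 points. These points are used to request execution, retrieve results, and produce JSON files.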

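```python
def make_points(start_x, start_y):
    # Chip corners are 30 m * 100 px = 3000 m apart; a tile spans 30 m * 5000 px.
    xr = range(start_x, start_x+(30*5000),  (30*100))
    yr = range(start_y, start_y-(30*5000), -(30*100))  # y decreases going south
    return [(x,y) for y in yr for x in xr]              # row-major: x varies fastest
```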
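The chip grid follows from three constants hard-coded in `make_points`: 30 m pixels, 100-pixel chips, and 5000-pixel tiles. The sketch below names those constants (`PIXEL_SIZE`, `CHIP_PIXELS`, and `TILE_PIXELS` are illustrative names, not part of any LCMAP library) so you can check the size and ordering of the grid before sending any requests.

```python
# Illustrative names for the constants hard-coded in make_points.
PIXEL_SIZE = 30      # metres per pixel
CHIP_PIXELS = 100    # pixels along one side of a chip
TILE_PIXELS = 5000   # pixels along one side of a tile

def make_points(start_x, start_y,
                pixel=PIXEL_SIZE, chip=CHIP_PIXELS, tile=TILE_PIXELS):
    step = pixel * chip      # 3000 m between neighbouring chip corners
    extent = pixel * tile    # 150000 m across one tile
    xs = range(start_x, start_x + extent, step)
    ys = range(start_y, start_y - extent, -step)  # y decreases going south
    return [(x, y) for y in ys for x in xs]       # row-major: x varies fastest

points = make_points(-1815585, 2864805)
print(len(points))   # 2500
print(points[:2])    # [(-1815585, 2864805), (-1812585, 2864805)]
print(points[50])    # (-1815585, 2861805), first chip of the second row
```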

### Requesting Execution

To execute an algorithm, you make an HTTP request for each point. **You will need to change this URL.**

```python
import requests

# CHANGE THIS URL...
URL = "http://lcmap-test.cr.usgs.gov/results/{algorithm}/{x}/{y}?refresh={refresh}"

# ...or else this won't work!
def request_execution(algorithm, x, y, refresh=False):
    # Every placeholder in URL, including {algorithm}, must be supplied.
    url = URL.format(algorithm=algorithm, x=x, y=y, refresh=refresh)
    return requests.get(url).json()
```
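A typo in the URL template only shows up as a failed request, so it can help to build the URL without sending it. Here is a sketch using a hypothetical `build_url` helper that formats the same template as `request_execution`:

```python
URL = "http://lcmap-test.cr.usgs.gov/results/{algorithm}/{x}/{y}?refresh={refresh}"

def build_url(algorithm, x, y, refresh=False):
    """Return the URL that request_execution would fetch, without sending it."""
    return URL.format(algorithm=algorithm, x=x, y=y, refresh=refresh)

print(build_url('lcmap-pyccd:1.4.0rc1', -1815585, 2864805))
# http://lcmap-test.cr.usgs.gov/results/lcmap-pyccd:1.4.0rc1/-1815585/2864805?refresh=False
```

Python formats the boolean as `False` or `True`. Whether the service expects that spelling (rather than, say, `false`) is not stated here, so check before relying on `refresh`.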

These are the upper-left coordinates of several ARD tiles; uncomment the tile you want to process.

```python
# x,y = -1815585, 3014805 # H05V02
x,y = -1815585, 2864805 # H05V03
# x,y = -1965585, 3014805 # H04V02
# x,y = -1965585, 2864805 # H04V03

points = make_points(x,y)
```
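The four tiles listed above sit on a regular 150 km grid (5000 pixels × 30 m), so their corners can be derived from the horizontal (H) and vertical (V) tile indices. The origin below is back-computed from those four coordinates; `tile_upper_left` is a hypothetical helper, and it assumes that other tiles follow the same spacing.

```python
TILE_SIZE = 30 * 5000                    # metres per tile side (150 km)
ORIGIN_X, ORIGIN_Y = -2565585, 3314805   # back-computed from the four tiles listed above

def tile_upper_left(h, v):
    """Upper-left (x, y) of tile HhhVvv, assuming the regular 150 km grid."""
    return ORIGIN_X + h * TILE_SIZE, ORIGIN_Y - v * TILE_SIZE

x, y = tile_upper_left(5, 3)   # H05V03 -> (-1815585, 2864805)
```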

Now you can request execution using that set of points.

```python
algorithm = 'lcmap-pyccd:1.4.0rc1'
# results = [request_execution(algorithm, x,y) for (x,y) in points]
```
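Requesting 2,500 chips one at a time is slow, and the commented-out list comprehension above would stop at the first exception. Here is a minimal sketch of one alternative, assuming the service tolerates a few concurrent requests: `submit_all` (a hypothetical helper) fans the calls out over a small thread pool and records per-chip exceptions instead of raising them. The worker count of 4 is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

def submit_all(request_fn, algorithm, points, workers=4):
    """Call request_fn(algorithm, x, y) for every point.

    Returns (ok, failed): dicts keyed by (x, y) holding either the
    response or the exception raised for that chip.
    """
    def attempt(point):
        x, y = point
        try:
            return point, request_fn(algorithm, x, y), None
        except Exception as exc:  # record the failure and keep going
            return point, None, exc

    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(attempt, points))

    ok = {p: r for p, r, e in outcomes if e is None}
    failed = {p: e for p, r, e in outcomes if e is not None}
    return ok, failed

# ok, failed = submit_all(request_execution, algorithm, points)
```

Chips listed in `failed` can then be retried on their own.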

### Retrieving Data

You can retrieve results from the Clownfish REST API, or you can access Cassandra directly. Direct access to Cassandra avoids the overhead of HTTP. In general, bypassing the API is not a good idea, but we know what we're doing... ;-) ...so it's ok.

```python
from cassandra.cluster import Cluster, ExecutionProfile
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import ordered_dict_factory
```

This function creates a session that is used to execute CQL.

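```python
from cassandra import ConsistencyLevel
from cassandra.cluster import EXEC_PROFILE_DEFAULT

def setup_session(username, password, hosts, keyspace):
    auth_provider = PlainTextAuthProvider(username=username, password=password)
    # Keep row_factory and consistency in execution profiles instead of mixing
    # profiles with the legacy session.row_factory setting.
    profiles = {
        # Default profile: rows come back as OrderedDicts keyed by column name.
        EXEC_PROFILE_DEFAULT: ExecutionProfile(row_factory=ordered_dict_factory),
        # Named profile for QUORUM reads; consistency_level needs a ConsistencyLevel
        # constant, not a string. Use it with execution_profile="quorum_profile".
        "quorum_profile": ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM,
                                           row_factory=ordered_dict_factory),
    }
    cluster = Cluster(hosts, auth_provider=auth_provider, execution_profiles=profiles)
    return cluster.connect(keyspace)
```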

You will need credentials. If you commit changes to the notebook, please remove these values.

```python
# Fill these in locally, and clear them again before committing.
username=''
password=''
hosts=[]
keyspace=''
```
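Rather than typing credentials into the cell (and remembering to clear them before committing), you could read them from environment variables. This is a minimal sketch: `load_credentials` and the variable names are hypothetical, and `CASSANDRA_HOSTS` is assumed to hold a comma-separated list of contact points.

```python
import os

def load_credentials(env=None):
    """Read Cassandra connection settings from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    # CASSANDRA_HOSTS is assumed to be comma-separated, e.g. "10.0.0.1,10.0.0.2".
    hosts = [h.strip() for h in env.get("CASSANDRA_HOSTS", "").split(",") if h.strip()]
    return (env.get("CASSANDRA_USERNAME", ""),
            env.get("CASSANDRA_PASSWORD", ""),
            hosts,
            env.get("CASSANDRA_KEYSPACE", ""))

# username, password, hosts, keyspace = load_credentials()
```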

Create a session.

```python
session = setup_session(username, password, hosts, keyspace)
```

Here is a utility function for _counting_ results.

```python
# Count the results stored for one chip. Useful for finding chips where algorithm execution failed to complete.

status_result = session.prepare("SELECT count(result_ok) as count FROM results WHERE chip_x=? AND chip_y=? AND algorithm=?")

def count_results(x, y, algorithm):
    result = session.execute(status_result, (x, y, algorithm))
    return (x,y,algorithm,result[0]['count'])
```

### Counting Results

You may find it useful to count the number of results produced for each chip. Each count is a separate query, so you can run them in parallel with a thread pool:

```python
from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)
these = [(x,y,algorithm) for (x,y) in points]
counts = pool.starmap(count_results, these)
```

You can sort chips into complete, incomplete, and missing groups like this (a complete chip has 10,000 results, one per pixel of a 100 × 100 chip):

```python
complete   = [(x,y) for (x,y,a,c) in counts if (c == 10000)]
incomplete = [(x,y) for (x,y,a,c) in counts if (c < 10000 and c > 0)]
missing    = [(x,y) for (x,y,a,c) in counts if (c == 0)]
len(complete) + len(incomplete) + len(missing)
# output: 2500
```
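The three comprehensions above silently drop any count above 10,000; a total of 2500 confirms that nothing fell through. Here is a single-pass sketch that keeps such odd cases visible. `classify_counts` is a hypothetical helper, and it assumes the same `(x, y, algorithm, count)` tuples that `count_results` returns.

```python
from collections import defaultdict

EXPECTED_PER_CHIP = 100 * 100  # one result per pixel in a 100 x 100 chip

def classify_counts(counts, expected=EXPECTED_PER_CHIP):
    """Group (x, y, algorithm, count) tuples by completeness in one pass."""
    groups = defaultdict(list)
    for x, y, _algorithm, c in counts:
        if c == expected:
            key = "complete"
        elif c == 0:
            key = "missing"
        elif 0 < c < expected:
            key = "incomplete"
        else:
            key = "unexpected"  # e.g. more rows than pixels; worth a closer look
        groups[key].append((x, y))
    return groups

# groups = classify_counts(counts); groups["missing"], groups["unexpected"], ...
```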

You can request re-execution of missing or incomplete chips like this:

```python
# Commented out to prevent a lot of unintentional work...

# If you're just evaluating cells as you come to them, this could
# end up queueing a substantial amount of work.

# [request_execution(algorithm, x, y, True) for (x,y) in missing + incomplete]
```
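If you do re-queue chips, sending them in modest batches makes it easier to stop partway and check progress. This is a minimal sketch, assuming batches of 50 suit the service (the size is arbitrary, and `batches` is a hypothetical helper):

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# for batch in batches(missing + incomplete, 50):
#     [request_execution(algorithm, x, y, True) for (x, y) in batch]
#     # ...re-run count_results on this batch before moving on...
```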

### Saving Results as JSON

JSON results are used to build information products.

```python
# These two functions are used to produce JSON files.

import json
from datetime import datetime

def json_serial(obj):
    """JSON serializer for objects not serializable by default json code"""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError("Type {} not serializable".format(type(obj).__name__))

entire_result = session.prepare("SELECT * FROM results WHERE chip_x=? AND chip_y=? AND algorithm=?")

def save_chip(path_template, x, y, algorithm):
    path = path_template.format(x=x, y=y)
    result = session.execute(entire_result, (x, y, algorithm))
    with open(path, 'w') as outfile:
        json.dump(list(result), outfile, default=json_serial)
    return path
```
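`json_serial` only handles `datetime`; `json.dump` raises `TypeError` for any other type it cannot encode. The schema of the `results` table is not shown here, so the broader sketch below simply assumes that rows might also contain plain Python `date`, `UUID`, `Decimal`, or set values; adjust it to the column types you actually see.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from uuid import UUID

def json_serial(obj):
    """Fallback for json.dump: convert a few common non-JSON types."""
    if isinstance(obj, (datetime, date)):  # datetime is a subclass of date
        return obj.isoformat()
    if isinstance(obj, UUID):
        return str(obj)
    if isinstance(obj, Decimal):
        return str(obj)                    # str keeps the exact decimal digits
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)                 # assumes the elements are mutually comparable
    raise TypeError("Type {} not serializable".format(type(obj).__name__))

print(json.dumps({"t": datetime(2017, 1, 2, 3, 4, 5), "s": {3, 1}}, default=json_serial))
# {"t": "2017-01-02T03:04:05", "s": [1, 3]}
```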

Once you are ready to save data, you can use this code to retrieve an entire chip of data and save it to a JSON file.

```python
save_pool  = ThreadPool(4)
path_h5_v3 = '/data2/jmorton/pyccd-results/H05V03/{x}_{y}.json'
save_these = [(path_h5_v3, x, y, algorithm) for (x,y) in complete]
# results    = save_pool.starmap(save_chip, save_these)
```
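Saving 2,500 chips can take a while, and an interrupted run may leave a truncated JSON file that looks like a finished one. Here is a sketch of two small safeguards, both hypothetical helpers rather than part of any LCMAP tooling. `write_json_atomic` writes to a temporary file and renames it into place. `chips_to_save` skips chips that already have a file. To use them, you would call `write_json_atomic(path, list(result), default=json_serial)` inside `save_chip` in place of the `open`/`json.dump` pair.

```python
import json
import os
import tempfile

def write_json_atomic(path, rows, default=None):
    """Write rows to path via a temp file in the same directory, then rename.

    On POSIX systems os.replace is atomic within one filesystem, so readers
    see either the old file or the complete new one, never a partial write.
    """
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as outfile:
            json.dump(rows, outfile, default=default)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)  # don't leave stray temp files behind
        raise
    return path

def chips_to_save(path_template, chips):
    """Return only the chips whose JSON file does not exist yet."""
    return [(x, y) for (x, y) in chips
            if not os.path.exists(path_template.format(x=x, y=y))]
```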