In [None]:
# We don't technically need this but it avoids a warning when importing pysis
import os
os.environ['ISISROOT'] = '/usgs/cpkgs/anaconda3_linux/envs/isis3.9.0'

# AutoCNet Intro
AutoCNet is a suite of functions that parallelize network generation and analyze the health of networks. You have already seen how AutoCNet can be used to analyze the health of networks, so now we will explore the parallelized network generation. 

The process of creating a network in AutoCNet is very similar to the general ISIS workflow. It consists of the following steps:
- [Load and apply configuration file for AutoCNet's associated services](#configuration)
- [Ingest images to process and calculate corresponding overlaps](#ingest)
- [Distribute points in overlaps](#distribute)
- [Subpixel register relative network](#registration)
- [Jigsaw - done in ISIS]

The largest deviation from ISIS is in *how* AutoCNet goes through these steps. AutoCNet is structured to take advantage of elementwise cluster processing (these elements can be images, points, measures, etc.) and PostgreSQL for data storage and quick relational querying. 

### Grab the Image Data
We are going to process the area of the Moon surrounding a lunar swirl named Reiner Gamma using Kaguya Terrain Camera (TC) images. Reiner Gamma is centered at (7.40, -58.80); an area from 4.9&deg; to 9.9&deg; N planetocentric latitude and 61.3&deg; to 56.3&deg; W longitude was selected. The data is located in ''. Please use the cells below to copy the data into a directory of your choosing.

In [None]:
output_directory = ________ # put output directory path as string here

# create the output directory with a subdirectory 'updated' where the images will be copied to
!mkdir -p $output_directory/updated/ 

In [None]:
# copy over the data to the 'updated' subdirectory
!cp -p /scratch/ladoramkershner/kaguya/workshop/original/*cub $output_directory/updated/

We need to create a list of the cubes to feed into AutoCNet. It is important that the cube list handed to AutoCNet contain **absolute** paths, as they will serve as an accessor for loading information from the cubes later. Note that the `ls` call below only prints absolute paths if `output_directory` is itself an absolute path.

In [None]:
!ls $output_directory/updated/*cub > $output_directory/cubes_updated.lis
!head $output_directory/cubes_updated.lis
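
If `output_directory` might be a relative path, the list can also be written from Python so that every entry is absolute regardless. The following is a minimal sketch and not part of AutoCNet; the helper name `write_cube_list` is hypothetical.

```python
import glob
import os


def write_cube_list(cube_dir, list_path, pattern='*cub'):
    """Write the absolute path of every cube in cube_dir to list_path, one per line."""
    # os.path.abspath resolves relative paths against the current working directory
    cubes = sorted(os.path.abspath(p) for p in glob.glob(os.path.join(cube_dir, pattern)))
    with open(list_path, 'w') as f:
        for cube in cubes:
            f.write(cube + '\n')
    return cubes
```

In this notebook it could be called as `write_cube_list(f'{output_directory}/updated', f'{output_directory}/cubes_updated.lis')`.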

<a id='configuration'></a>
# Configuration
AutoCNet leverages services for cluster processing and data persistence that require configuration parameters to set up properly. For the cluster processing, AutoCNet uses [redis](https://redis.io/) to create queues for the jobs waiting to be dispatched to the cluster, a cluster for the computations, and conda environments that are activated during the cluster jobs to give the cluster access to the appropriate packages. For the database persistence, AutoCNet needs to configure the database and the spatial information the database will use when generating the geometries.

In summary, AutoCNet needs configuration parameters for the following services:
- redis
- cluster
- env (short for conda environment)
- database
- spatial

### Parse the Configuration File
The configuration parameters are typically held in a configuration yaml file. A configuration file has been compiled for use internal to the USGS ASC facilities leveraging a shared cluster and database. Use AutoCNet's function 'parse_config' to read in the yaml file and output a dictionary variable.

In [None]:
from autocnet.config_parser import parse_config

config_path = ________ # put path to config file as string here
config = parse_config(config_path)

The config is a nested dictionary, meaning it has a larger dictionary structure defining sections for the services above and then each service section is a dictionary defining the particular configuration parameters.

In [None]:
import numpy as np 

print('configuration dictionary keys: ')
print(np.vstack(list(config.keys())), '\n')

print('cluster configuration dictionary keys: ')
print(np.vstack(list(config['cluster'].keys())))
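
To make the nested structure concrete, here is a small stand-in dictionary with the same shape. This is only a sketch: the section and key names are the ones mentioned in this notebook, the values are placeholders, the 'env' and 'spatial' sections are omitted, and the helper `describe` is hypothetical.

```python
# Hypothetical stand-in for the dictionary returned by parse_config;
# every value here is a placeholder, not a real setting.
example_config = {
    'cluster': {'cluster_log_dir': '/path/to/logs', 'tmp_scratch_dir': '/path/to/scratch'},
    'database': {'name': 'my_database'},
    'redis': {'basename': 'my_queue', 'completed_queue': 'my_queue:done',
              'processing_queue': 'my_queue:proc', 'working_queue': 'my_queue:work'},
}


def describe(config):
    """Return one 'section.key = value' line per configuration parameter."""
    lines = []
    for section, params in config.items():      # outer dictionary: one entry per service
        for key, value in params.items():       # inner dictionary: that service's parameters
            lines.append(f'{section}.{key} = {value}')
    return lines


for line in describe(example_config):
    print(line)
```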

Although the configuration file is set up for internal use, please alter the following fields to point to user specific areas or unique strings:
- cluster
    - cluster_log_dir
    - tmp_scratch_dir

In [None]:
config['cluster']['cluster_log_dir'] # show original value

In [None]:
config['cluster']['cluster_log_dir'] = ________ # edit cluster log directory

config['cluster']['cluster_log_dir'] # confirm updated value

- database
    - name

In [None]:
config['database']['name'] # show original value

In [None]:
config['database']['name'] = ________ # edit database name

config['database']['name'] # confirm updated value

- redis
    - basename
    - completed_queue
    - processing_queue
    - working_queue

In [None]:
print(config['redis']['basename'])
print(config['redis']['completed_queue'])
print(config['redis']['processing_queue'])
print(config['redis']['working_queue']) # show original values

In [None]:
# edit queue names


In [None]:
print(config['redis']['basename'])
print(config['redis']['completed_queue'])
print(config['redis']['processing_queue'])
print(config['redis']['working_queue']) # confirm updated values
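
One possible way to fill in the "edit queue names" cell is to append a user-specific suffix to each of the four redis fields. This is a sketch under the assumption that any unique string is acceptable for these fields; the helper `with_suffix` and the placeholder values are hypothetical.

```python
def with_suffix(redis_config, suffix):
    """Return a copy of the redis section with suffix appended to each queue name."""
    queue_keys = ('basename', 'completed_queue', 'processing_queue', 'working_queue')
    updated = dict(redis_config)  # shallow copy, leaves the input untouched
    for key in queue_keys:
        updated[key] = f'{redis_config[key]}_{suffix}'
    return updated


# Placeholder values; the real ones come from the parsed configuration file
redis_section = {'basename': 'base', 'completed_queue': 'base:done',
                 'processing_queue': 'base:proc', 'working_queue': 'base:work'}
print(with_suffix(redis_section, 'myname'))
```

Applied to the real configuration this would read `config['redis'] = with_suffix(config['redis'], 'your_username')`.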

<a id='ingest'></a>
# Ingest Image Data into AutoCNet
Networks within AutoCNet are represented with an object created from a NetworkCandidateGraph (NCG) class. The NCG class is structured like an [undirected graph](https://en.wikipedia.org/wiki/Graph_%28discrete_mathematics%29), containing nodes and edges. 

A node in our graph is synonymous with an image. The node (image) stores path information, which serves as an accessor to the on-disk data set (this is why the absolute path was important in our cube list), and correspondence information that references the image. 

An edge in our graph represents the overlap relationship between two nodes (images); it contains information such as the source and destination image ids (and associated node ids), the overlap dimensions, and points/measures shared between the two nodes (images).
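
The node/edge idea can be illustrated with a toy example that does not use AutoCNet at all. In this sketch the footprints are made-up bounding boxes and the helpers `boxes_overlap` and `build_graph` are hypothetical; the real NCG works with the actual image footprint geometries.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def build_graph(footprints):
    """Build an undirected graph: nodes are image names, edges are overlapping pairs."""
    nodes = set(footprints)
    edges = set()
    for u, v in combinations(sorted(footprints), 2):
        if boxes_overlap(footprints[u], footprints[v]):
            edges.add(frozenset((u, v)))  # frozenset: (u, v) and (v, u) are the same edge
    return nodes, edges


# Three made-up footprints: A overlaps B, B overlaps C, A and C do not touch
footprints = {
    'A': (-61.0, 5.0, -60.0, 10.0),
    'B': (-60.5, 5.0, -59.5, 10.0),
    'C': (-59.8, 5.0, -58.8, 10.0),
}
nodes, edges = build_graph(footprints)
print(sorted(tuple(sorted(e)) for e in edges))
```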

### Create the NetworkCandidateGraph
The NetworkCandidateGraph (NCG) class can be instantiated to an object without any arguments. However, this NCG object requires configuration before it can be used for any meaningful work, so we have to run 'config_from_dict'.

In [None]:
from autocnet.graph.network import NetworkCandidateGraph

ncg = NetworkCandidateGraph()
ncg.config_from_dict(config)

In [None]:
ncg.clear_db()

### Ingest Image Data and Calculate Overlaps
At this point our ncg variable is empty, so if we try to plot the contents we will get an empty plot. 

In [None]:
ncg.plot()

We need to load the images into the ncg using 'add_from_filelist', which loads the images from the passed in list and then calculates the overlaps.

In [None]:
filelist = f'{output_directory}/cubes_updated.lis' # this should contain absolute paths

# How long will this function take?
ncg.add_from_filelist(filelist) 

Now when we plot the ncg, we see the undirected graph, where the circles are the nodes/images and the lines are the edges/overlaps. The Kaguya TC data has a very regular overlap pattern in this area, seen by the large number of edges shared between nodes.

In [None]:
ncg.plot()

We have access to the image data through the ncg, but the ncg does not persist after the notebook is shut down. To persist the network, AutoCNet leverages a database for the storage of the network's images, points, and measures. The ncg has access to this database through the ncg's 'session_scope'. Through the session_scope you can interact with and execute queries on your database in pure SQL.

In [None]:
with ncg.session_scope() as session:
    img_count = session.execute("SELECT COUNT(*) FROM images").fetchall()
    print('Number of images in database: ', img_count)

This method of using session.execute() can be inconvenient if working with the actual data contained within the tables. For example, to access certain information you need to know the index where that information exists.

In [None]:
with ncg.session_scope() as session:
    img = session.execute("SELECT * FROM images LIMIT 1").fetchall()
    print('image index: ', img[0][0])
    print('product id: ', img[0][1])
    print('image path: ', img[0][2])
    print('image serial number: ', img[0][3])
    print('image ignore flag: ', img[0][4])
#     print('image geom: ', img[0][5]) # only uncomment after looking at other output
    print('image camera type: ', img[0][7])

However, if the structure of the database changes (the order of the columns changes or a column is added/removed) or you cannot remember the order of the columns, working with the database data in this way would be very inconvenient. So AutoCNet built models for each of the database tables to help interface with them.

In [None]:
from autocnet.io.db.model import Images, Measures, Overlay, Points
import matplotlib.pyplot as plt

with ncg.session_scope() as session:
    img = session.query(Images).first()
    print('image index: ', img.id)
    print('product id: ', img.name)
    print('image path: ', img.path)
    print('image serial number: ', img.serial)
    print('image ignore flag: ', img.ignore)
#     print('image geometry: ', img.geom) # only uncomment after looking at other output
    print('image camera type: ', img.cam_type)

Accessing the information off of the img object is more intuitive as it is field based instead of index based. Additionally, if you uncommented the geom prints (in the two previous cells) you saw that the database geometry is stored as a binary string while the Images.geom field is a shapely MultiPolygon, which has more directly accessible latitude and longitude information.
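
The fragility of index-based access can be reproduced in isolation. The sketch below deliberately swaps in Python's built-in `sqlite3` module and a made-up three-column table instead of AutoCNet's models and database, so it only illustrates the general idea.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # rows support both index and column-name access
conn.execute('CREATE TABLE images (id INTEGER, name TEXT, path TEXT)')
conn.execute("INSERT INTO images VALUES (1, 'img_a', '/placeholder/img_a.cub')")

row = conn.execute('SELECT * FROM images LIMIT 1').fetchone()
by_index = row[2]       # depends on 'path' being the third column
by_name = row['path']   # keeps working if columns are added or reordered

# Simulate a schema change by selecting the columns in a different order
reordered = conn.execute('SELECT path, id, name FROM images LIMIT 1').fetchone()
print(by_index == by_name, reordered[2] == by_name, reordered['path'] == by_name)
```

After the reordering, index 2 silently returns the image name instead of the path, while the name-based lookup still returns the path.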

In [None]:
n = 25
with ncg.session_scope() as session:
    imgs = session.query(Images).limit(n)
    
    fig, axs = plt.subplots(1, 1, figsize=(5,10))
    axs.set_title(f'Footprints of First {n} Images in Database')
    for img in imgs:
        x,y = img.geom.envelope.boundary.xy
        axs.plot(x,y)

<a id="distribute"></a>
# Place Points in Overlap
The next step in the network generation process is to lay down points in the image overlaps. We are going to use the 'place_points_in_overlap' function to lay the points down. This function first evenly distributes points spatially into a given overlap, then it back-projects the points into the 'top' image. Once in image space, the function searches the area surrounding the measures to find interesting features to shift the measures to (this increases the chance of subpixel registration passing). The shifted measures are projected back to the ground, and these updated longitudes and latitudes are used to propagate the points into all images associated with the overlap. So, this function requires:
- An overlap (to evenly distribute points into)
- Distribution kwargs (to decide how points are distributed into the overlap)
- Camera type (so it knows what to expect as inputs/output for the camera model)
- Size of the area around the measure (to search for the interesting feature)

For now we will use the default size and distribution arguments, but we need to change our camera type from the default 'csm' to 'isis'. 

Since this function operates independently on each overlap, it is ideal for parallelization with the cluster. Before dispatching the function to the cluster we need to make the log directory from our configuration file. If a SLURM job is submitted with a log directory argument that does not exist, the job will fail.

In [None]:
import os

log_dir = config['cluster']['cluster_log_dir']
print('creating directory: ', log_dir)

os.makedirs(log_dir, exist_ok=True)  # also creates missing parent directories

We also need to consider the arguments for applying the function to the cluster. The application of the function to the cluster is done with the ncg.apply function. Using the apply arguments we can determine how long we want to allow the job to run (walltime), how many jobs we want running at once (arraychunk), where we want our log outputs to be directed to (log_dir), etc. These inputs mirror some of the input options you would see with an SBATCH submission.

In [None]:
ncg.apply?
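
As a rough illustration of what `arraychunk` controls: SLURM array jobs accept a `%` suffix on the `--array` range that caps how many tasks run simultaneously. The sketch below builds such a range with a hypothetical helper, `array_spec`. How ncg.apply assembles its actual submission is not shown in this notebook and may differ.

```python
def array_spec(njobs, arraychunk=None):
    """Build the value for sbatch's --array option for njobs tasks."""
    if njobs < 1:
        raise ValueError('njobs must be at least 1')
    spec = f'1-{njobs}'
    if arraychunk is not None:
        spec += f'%{arraychunk}'  # at most arraychunk tasks run at the same time
    return spec


print(array_spec(1000, arraychunk=100))  # 1-1000%100
print(array_spec(1000))                  # 1-1000
```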

So all together the submission for the place_points_in_overlap jobs is...

In [None]:
from autocnet.spatial.overlap import place_points_in_overlap

njobs = ncg.apply('spatial.overlap.place_points_in_overlap', 
                  on='overlaps', # start of function kwargs
                  cam_type='isis',
                  size=71,
                  walltime='00:20:00', # start of apply kwargs
                  log_dir=log_dir,
                  arraychunk=100)
print(njobs)

Notice that we are not passing a single overlap to the apply call; instead we pass "on = 'overlaps'". The 'on' argument indicates which element (image, overlap, point, measure) to apply the function to; ncg.apply will now apply the function to all of the overlaps in the database. You can check on the progress of your jobs using the SLURM 'squeue' command with the -u (user) flag.

In [None]:
uid = ________ # put your cluster username (or user id) here
!squeue -u $uid | wc -l 
!squeue -u $uid | head

As jobs are put on the cluster, their corresponding log files are created. You can check how many jobs have been or are being processed on the cluster by looking in the log directory.

In [None]:
!ls $log_dir | wc -l

As more logs are placed in the log directory, you will have to specify which array job's logs you are checking on. The naming convention of the log files generated by AutoCNet is 'path.to.function.function_name-jobid.arrayid_taskid.out'.

In [None]:
!ls $log_dir | head

So, you can look at a specific array job's logs by doing...

In [None]:
jobid = ________ # put jobid int here
! ls $log_dir/*$jobid* | wc -l
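
The same bookkeeping can be done in Python by parsing the file names. This is a sketch under the assumption that jobid, arrayid, and taskid are all integers in the naming convention above; the regular expression, the helper `count_logs_by_job`, and the example file names are made up for illustration.

```python
import re
from collections import Counter

# Assumes names of the form path.to.function.function_name-jobid.arrayid_taskid.out
# with integer jobid, arrayid and taskid
LOG_NAME = re.compile(r'^(?P<function>.+)-(?P<jobid>\d+)\.(?P<arrayid>\d+)_(?P<taskid>\d+)\.out$')


def count_logs_by_job(filenames):
    """Count how many log files exist for each job id; non-matching names are skipped."""
    counts = Counter()
    for name in filenames:
        match = LOG_NAME.match(name)
        if match:
            counts[int(match.group('jobid'))] += 1
    return counts


# Made-up file names that follow the convention
names = [
    'spatial.overlap.place_points_in_overlap-1001.1001_1.out',
    'spatial.overlap.place_points_in_overlap-1001.1001_2.out',
    'matcher.subpixel.subpixel_register_point-1002.1002_1.out',
    'notes.txt',
]
counts = count_logs_by_job(names)
print(counts)
```

With the real log directory this could be called as `count_logs_by_job(os.listdir(log_dir))`.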

Sometimes jobs fail to submit to the cluster, so it is prudent to check the ncg queue before moving on to other cluster jobs.

In [None]:
redis_orphans = ncg.queue_length
print("jobs left on the queue: ", redis_orphans)

When reapplying a function to the cluster, you do not need to resubmit the function arguments, because those were already serialized into the queue message. However, the cluster submission arguments can be reformatted and the 'reapply' argument should be set to 'True'.

In [None]:
# njobs = ncg.apply('spatial.overlap.place_points_in_overlap', 
#                         chunksize=redis_orphans,
#                         arraychunk=None,
#                         walltime='00:20:00',
#                         log_dir=log_dir,
#                         reapply=True)
# print(njobs)

One advantage of using a database for data storage is that it can store geometries, which allows quick access to them and to how they relate to other elements' geometries.

In [None]:
from autocnet.io.db.model import Overlay, Points, Measures
from geoalchemy2 import functions
from geoalchemy2.shape import to_shape


with ncg.session_scope() as session:
    results = (
        session.query(
        Overlay.id, 
        Overlay.geom.label('ogeom'), 
        Points.geom.label('pgeom')
        )
        .join(Points, functions.ST_Contains(Overlay.geom, Points.geom)=='True')
        .all()
    )
    print('number of points: ', len(results))
    
    fig, axs = plt.subplots(1, 1, figsize=(10,10))
    axs.grid()
    
    oid = []
    for res in results:
        if res.id not in oid:
            oid.append(res.id)
            ogeom = to_shape(res.ogeom)
            ox, oy = ogeom.envelope.boundary.xy
            axs.plot(ox, oy, c='k')      
        pgeom = to_shape(res.pgeom)
        px, py = pgeom.xy
        axs.scatter(px, py, c='grey')
        

Notice that the points are not in straight lines. This is because of the shifting place_points_in_overlap does to find interesting measure locations. The distribution of points in the overlaps looks dense in the EW direction, so let's try rerunning place_points_in_overlap with altered distribution kwargs.

Before rerunning place_points_in_overlap, the points and measures tables need to be cleared using ncg's 'clear_db' method.

In [None]:
from autocnet.io.db.model import Measures
with ncg.session_scope() as session:
    npoints = session.query(Points).count()
    print('number of points: ', npoints)
    
    nmeas = session.query(Measures).count()
    print('number of measures: ', nmeas)

In [None]:
ncg.clear_db(tables=['points', 'measures'])

In [None]:
from autocnet.io.db.model import Measures
with ncg.session_scope() as session:
    npoints = session.query(Points).count()
    print('number of points: ', npoints)
    
    nmeas = session.query(Measures).count()
    print('number of measures: ', nmeas)

The distribution argument for place_points_in_overlap requires two function inputs. Since overlaps vary in shape and size, integers are not sufficient to determine proper gridding of all overlaps. Instead, the distribution of points along the N to S edge of the overlap and the E to W edge of the overlap are determined based on the edge's length, and a grid is built from these edge distributions. This way a shorter edge will receive fewer points and a longer edge will receive more points.

The default distribution functions are: <br />
nspts_func=lambda x: ceil(round(x,1)\*10) <br />
ewpts_func=lambda x: ceil(round(x,1)\*5) <br />

**NOTICE: THE NS FUNCTION ACTUALLY GETS USED ON THE LONGER SIDE OF THE OVERLAP, NOT NECESSARILY THE NS SIDE.**
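
To see how two such functions turn edge lengths into a grid, here is a simplified sketch for a rectangular overlap. It is not AutoCNet's implementation: the helper `grid_in_bbox` is hypothetical, the overlap is reduced to a bounding box, and the edge lengths are assumed to be in degrees. It follows the notice above by applying the NS function to the longer side.

```python
from math import ceil


def default_ns(x):
    return ceil(round(x, 1) * 10)


def default_ew(x):
    return ceil(round(x, 1) * 5)


def grid_in_bbox(min_x, min_y, max_x, max_y, nspts_func=default_ns, ewpts_func=default_ew):
    """Return evenly spaced (x, y) points strictly inside a bounding box."""
    ew_len = max_x - min_x
    ns_len = max_y - min_y
    # The 'ns' function is applied to whichever side is longer
    if ns_len >= ew_len:
        n_y, n_x = nspts_func(ns_len), ewpts_func(ew_len)
    else:
        n_y, n_x = ewpts_func(ns_len), nspts_func(ew_len)
    # Spread the points evenly, leaving a margin at both ends of each edge
    xs = [min_x + ew_len * (i + 1) / (n_x + 1) for i in range(n_x)]
    ys = [min_y + ns_len * (j + 1) / (n_y + 1) for j in range(n_y)]
    return [(x, y) for y in ys for x in xs]


# A made-up overlap 0.2 degrees wide and 0.5 degrees tall: 5 rows by 1 column
points = grid_in_bbox(-60.0, 5.0, -59.8, 5.5)
print(len(points))
```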

In [None]:
def ns(x):
    from math import ceil # this import has to be in the function definition for cluster processing
    return ceil(round(x,1)*7)

def ew(x):
    from math import ceil # this import has to be in the function definition for cluster processing
    return ceil(round(x,1)*5)

distribute_points_kwargs = {'nspts_func':ns, 'ewpts_func':ew, 'method':'classic'}

In [None]:
njobs = ncg.apply('spatial.overlap.place_points_in_overlap', 
                  on='overlaps', # start of function kwargs
                  distribute_points_kwargs=distribute_points_kwargs, # NEW LINE
                  cam_type='isis',
                  size=71,
                  walltime='00:20:00', # start of apply kwargs
                  log_dir=log_dir,
                  arraychunk=100)
print(njobs)

Check the progress of your jobs

In [None]:
!squeue -u $uid | wc -l
!squeue -u $uid | head

Count number of jobs started by looking for generated logs

In [None]:
jobid = ________ # put jobid int here
! ls $log_dir/*$jobid* | wc -l

Check to see if the ncg redis queue is clear

In [None]:
redis_orphans = ncg.queue_length
print("jobs left on the queue: ", redis_orphans)

Reapply cluster job if there are still jobs left on the queue

In [None]:
# njobs = ncg.apply('spatial.overlap.place_points_in_overlap', 
#                         chunksize=redis_orphans,
#                         arraychunk=None,
#                         walltime='00:20:00',
#                         log_dir=log_dir,
#                         reapply=True)
# print(njobs)

Visualize the new distribution

In [None]:
from autocnet.io.db.model import Overlay, Points, Measures
from geoalchemy2 import functions
from geoalchemy2.shape import to_shape


with ncg.session_scope() as session:
    results = (
        session.query(
        Overlay.id, 
        Overlay.geom.label('ogeom'), 
        Points.geom.label('pgeom')
        )
        .join(Points, functions.ST_Contains(Overlay.geom, Points.geom)=='True')
        .all()
    )
    print('number of points: ', len(results))
    
    fig, axs = plt.subplots(1, 1, figsize=(10,10))
    axs.grid()
    
    oid = []
    for res in results:
        if res.id not in oid:
            oid.append(res.id)
            ogeom = to_shape(res.ogeom)
            ox, oy = ogeom.envelope.boundary.xy
            axs.plot(ox, oy, c='k')      
        pgeom = to_shape(res.pgeom)
        px, py = pgeom.xy
        axs.scatter(px, py, c='grey')
        

<a id="registration"></a>
# Subpixel Registration
After laying down points, the next step is to subpixel register the measures. To do this we are going to use the 'subpixel_register_point' function. As the name suggests, 'subpixel_register_point' registers the measures on a single point, which makes it another great candidate for parallelization. 

This function chooses a reference measure, affinely transforms the other images to the reference image, and clips an image chip out of the reference image and a template chip out of the transformed images. The template chips are marched across the image chip and the maximum correlation value (method defined by the autocnet.matcher.naive_template.pattern_match 'metric' kwarg) and its location are saved. 
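
The "marching" step can be sketched in a few lines of plain Python. This is a toy version only: it uses normalized cross-correlation as a stand-in metric, works on small nested lists instead of image chips, and the helpers `ncc` and `match_template` are hypothetical rather than AutoCNet functions.

```python
from math import sqrt


def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat patches carry no signal


def match_template(image, template):
    """Slide template over image; return ((row, col), score) of the best match."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best_score, best_rc = -1.0, (0, 0)
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc, best_score


# Made-up data: a 3x3 feature embedded in a blank 6x6 image at row 2, column 1
template = [[1, 2, 3], [4, 9, 6], [7, 8, 5]]
image = [[0] * 6 for _ in range(6)]
for i in range(3):
    for j in range(3):
        image[2 + i][1 + j] = template[i][j]

location, score = match_template(image, template)
print(location, round(score, 3))
```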

The solution is then evaluated to see if the maximum correlation solution is acceptable. The evaluation is done using the 'cost_func' and 'threshold' arguments. The cost_func depends on two independent variables: the first is the distance that a point has shifted from the original, sensor-identified intersection, and the second is the correlation coefficient coming out of the template matcher. The __order__ in which these variables are passed in __matters__. If the cost_func solution is greater than the threshold value, the registration is successful and the point is updated. If not, the registration is unsuccessful, and the point is not updated and is set to ignore.

So, 'subpixel_register_point' requires the following arguments:
- pointid
- subpixel_template_kwargs
- cost_func 
- threshold
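
The acceptance test described above can be written out as a short sketch. The helper `evaluate_registration` and the distance-penalizing cost function are hypothetical and only illustrate the argument order (distance first, correlation second) and the comparison against the threshold.

```python
def evaluate_registration(distance, correlation, cost_func, threshold):
    """Return True if cost_func(distance, correlation) is greater than threshold."""
    return cost_func(distance, correlation) > threshold


# Ignores the distance entirely, so only the correlation is compared to the threshold
correlation_only = lambda x, y: x * 0 + y

# Hypothetical alternative: reduce the score by 0.01 per unit of shift
distance_penalized = lambda x, y: y - 0.01 * x

print(evaluate_registration(10, 0.65, correlation_only, 0.6))    # True
print(evaluate_registration(10, 0.65, distance_penalized, 0.6))  # False
```

With the second cost function the same correlation no longer passes, because the shift of 10 lowers the score to roughly 0.55.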


In [None]:
from autocnet.matcher.subpixel import subpixel_register_point

subpixel_register_point?

## First Run
We are not going to consider the distance the measures were moved in this workflow and just look at the maximum correlation value returned by the matcher. 

In [None]:
subpixel_template_kwargs = {'image_size':(121,121), 'template_size':(61,61)} 

njobs = ncg.apply('matcher.subpixel.subpixel_register_point', 
                  on='points', # start of function kwargs
                  subpixel_template_kwargs=subpixel_template_kwargs,
                  version='simple',
                  cost_func=lambda x,y:x*0+y,
                  threshold=0.6, 
                  walltime="00:30:00", # start of apply kwargs
                  log_dir=log_dir,
                  arraychunk=200,
                  chunksize=5000) # maximum chunksize = 20,000

print(njobs)

Check the progress of your jobs

In [None]:
! squeue -u $uid | wc -l
! squeue -u $uid | head

Count number of jobs started by looking for generated logs

In [None]:
jobid = ________ # put jobid int here
! ls $log_dir/*$jobid* | wc -l

Check to see if the ncg redis queue is clear

In [None]:
redis_orphans = ncg.queue_length
print("jobs left on the queue: ", redis_orphans)

Reapply cluster job if there are still jobs left on the queue

In [None]:
# job_array = ncg.apply('matcher.subpixel.subpixel_register_point', 
#                       reapply=True,
#                       chunksize=redis_orphans, 
#                       arraychunk=None,
#                       walltime="00:30:00",
#                       log_dir=log_dir)
# print(job_array)

### Visualize Point Registration

Pick a point id to visualize

In [None]:
pid = 265

Run visualization

In [None]:
from autocnet.io.db.model import Images
from plio.io.io_gdal import GeoDataset
from autocnet.transformation import roi
from autocnet.utils.utils import bytescale

with ncg.session_scope() as session:
    source = session.query(Measures, Images).join(Images, Measures.imageid==Images.id).filter(Measures.pointid==pid, Measures.template_metric==1).all()
    s_img = GeoDataset(source[0][1].path)
    sx = source[0][0].sample
    sy = source[0][0].line
    
    destination = session.query(Measures, Images).join(Images, Measures.imageid==Images.id).filter(Measures.pointid==pid, Measures.template_metric!=1).limit(1).all()
    d_img = GeoDataset(destination[0][1].path)
    dx = destination[0][0].sample
    dy = destination[0][0].line
    
    
image_size = (121,121)
template_size = (61,61)
s_roi = roi.Roi(s_img, sx, sy, size_x=image_size[0], size_y=image_size[1])
s_image = bytescale(s_roi.clip())

d_roi = roi.Roi(d_img, dx, dy, size_x=image_size[0], size_y=image_size[1])
d_template = bytescale(d_roi.clip())

fig, axs = plt.subplots(1, 2, figsize=(20,10));
axs[0].imshow(s_image, cmap='Greys');
axs[0].set_title('Reference');
axs[1].imshow(d_template, cmap='Greys');
axs[1].set_title('Template');

## Second run
We are going to rerun the subpixel registration with larger chips to attempt to register the measures that failed in the first run. 'subpixel_register_point' is set up so that subsequent runs can use filters, which run the function only on points with a certain property value (e.g., points where ignore=true). It can also be rerun on all points. In that case AutoCNet checks for a previous subpixel registration result: if the new result is better the point is updated, and if the previous result is better the point is left alone.

In [None]:
subpixel_template_kwargs = {'image_size':(221,221), 'template_size':(81,81)} 
# filters = {'ignore': 'true'}

njobs = ncg.apply('matcher.subpixel.subpixel_register_point', 
                  on='points', # start of function kwargs
#                   filters=filters,
                  subpixel_template_kwargs=subpixel_template_kwargs,
                  version='simple',
                  cost_func=lambda x,y:x*0+y,
                  threshold=0.6, 
                  walltime="00:30:00", # start of apply kwargs
                  log_dir=log_dir,
                  arraychunk=100,
                  chunksize=5000) # maximum chunksize = 20,000


print(njobs)

Check the progress of your jobs

In [None]:
! squeue -u $uid | wc -l
! squeue -u $uid | head

Count number of jobs started by looking for generated logs

In [None]:
jobid = ________ # put jobid int here
! ls $log_dir/*$jobid* | wc -l

Check to see if the ncg redis queue is clear

In [None]:
redis_orphans = ncg.queue_length
print("jobs left on the queue: ", redis_orphans)

Reapply cluster job if there are still jobs left on the queue

In [None]:
# njobs = ncg.apply('matcher.subpixel.subpixel_register_point', 
#                   reapply = True,
#                   walltime="00:30:00",
#                   log_dir=log_dir,
#                   arraychunk=50,
#                   chunksize=20000) # maximum chunksize = 20,000

# print(njobs)

### subpix2: Write out Network
At this point you can write out the network and begin bundling it!

In [None]:
cnet = 'reiner_gamma_morning_ns7_ew5_t121x61_t221x81.net'
ncg.to_isis(os.path.join(output_directory,cnet))

# Appendix
[Loading Data from a Populated Database](#from_database)

<a id="from_database"></a>
### Loading Data from a Populated Database
If the database is already created, you can access the information using 'from_database'. You do not need to run this cell now; it is included in case the notebook fails or your connection is lost and you need to reload the NCG.

In [None]:
from autocnet.graph.network import NetworkCandidateGraph
from autocnet.config_parser import parse_config

config_path = ________ # put path to config file as string here
config = parse_config(config_path)

ncg = NetworkCandidateGraph()
ncg.config_from_dict(config)


ncg.from_database()

In [None]:
ncg.plot()

When reloading the NCG it is useful to check the redis queue associated with the NCG and clean out any jobs that were left over during the notebook failure.

In [None]:
print('Cluster queue length:')
print('before -> ', ncg.queue_length)
print('Cleaning up cluster queue:')
ncg.queue_flushdb()
print('after  -> ', ncg.queue_length)