# Useful Modules for Synapse Pipelines
Running the service on AWS and integrating it with the NeuroData stack required a fair amount of boilerplate code to interface with the different tools. This notebook covers the modules that contain that code. See the example pipeline (nomads_unsupervised) for all of the modules we provide.

The following describes the various modules:
1. nd_boss.py - For pushing results to BOSS
2. NeuroDataResource.py - For pulling data from BOSS
3. driver.py - Integrating the pipeline with AWS
4. pymeda_driver.py - Integrating the pipeline with PyMeda
5. nomads.py - The actual algorithm (we won't go over this here)

## nd_boss

The main function in this module is boss_push which pushes a numpy array to the BOSS. To run this function, you need to provide:
1. token - BOSS API Token (needs permissions to push to col and exp)
2. col - the collection you are pushing to
3. exp - the experiment you are pushing to
4. z_range, y_range, x_range - location of the cutout you are pushing to (this should be the same size as the actual cutout)
5. data_dict - dictionary where each key is a channel name and each value is the numpy array to be pushed
6. results_key - identifier that contains metadata on the results you are pushing

Note that the channel name ultimately pushed to BOSS is results_key + "_" + the dictionary key. That way the metadata and the different results from the pipeline (e.g. the Gaba detections numpy array and the Glut detections numpy array) can all be pushed.
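
To make the naming rule concrete, here is a minimal sketch of how the channel names are derived from results_key and the keys of data_dict. The helper name channel_names and the sample keys are hypothetical and are not part of nd_boss.py.

```python
def channel_names(results_key, data_dict):
    """Return the BOSS channel name derived for each key in data_dict."""
    return {key: results_key + "_" + key for key in data_dict}


# Hypothetical keys; in the pipeline the values would be numpy arrays.
names = channel_names("run1", {"gaba": None, "glut": None})
print(names)  # {'gaba': 'run1_gaba', 'glut': 'run1_glut'}
```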

In [None]:
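import numpy as np
from intern.resource.boss.resource import ChannelResource

# create_boss_remote is assumed to be a helper defined elsewhere in nd_boss.py (not shown here).
def boss_push(token,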
              col,
              exp,
              z_range,
              y_range,
              x_range,
              data_dict,
              results_key):
    dtype = "uint8"
    config_dict = {"token": token, "host": "api.boss.neurodata.io" , "protocol": "https"}
    remote = create_boss_remote(config_dict)
    links_dict = {}

    for key, data in data_dict.items():
        # Binarize the predictions: every nonzero voxel becomes 255.
        data = data.astype(np.uint8)
        np.putmask(data, data>0, 255)
        channel = results_key + "_" + key
        print(data.shape)

        channel_resource = ChannelResource(channel, col, exp, 'image', '', 0, dtype, 0)
        print("Pushing to BOSS...")

        for z in range(z_range[0], z_range[1]):
            print(z)
            # Take the slice for this z and reshape it to (1, y, x), a single-plane cutout.
            plane = data[z - z_range[0]]
            plane = plane.reshape(-1, plane.shape[0], plane.shape[1])
            try:
                # Try to push to the channel if it already exists on BOSS.
                old_channel = remote.get_project(channel_resource)
                remote.create_cutout(old_channel, 0, (x_range[0], x_range[1]), (y_range[0], y_range[1]), (z, z + 1), plane)
            except Exception:
                # If the lookup or push fails (e.g. the channel does not exist yet), create the channel and push again.
                channel_resource = ChannelResource(channel, col, exp, 'image', '', 0, dtype, 0)#, sources = ["em_clahe"])
                new_channel = remote.create_project(channel_resource)
                remote.create_cutout(new_channel, 0, (x_range[0], x_range[1]), (y_range[0], y_range[1]), (z, z + 1), plane)


        links_dict["All Predictions"] = ("http://ndwt.neurodata.io/channel_detail/{}/{}/{}/").format(col, exp, channel)
        print("Pushed {} to Boss".format(channel))
    return links_dict
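
Since boss_push uploads one z slice at a time, it can help to check that the array being pushed actually fills the requested ranges. The sketch below is illustrative only: expected_shape and slice_plan are hypothetical helpers, and it assumes the ranges are half-open (start included, stop excluded), as the z loop in boss_push suggests.

```python
def expected_shape(z_range, y_range, x_range):
    """Shape (z, y, x) an array needs in order to fill the given half-open ranges."""
    return tuple(stop - start for start, stop in (z_range, y_range, x_range))


def slice_plan(z_range):
    """Yield (local_index, boss_z_range) for each slice that would be uploaded."""
    for z in range(z_range[0], z_range[1]):
        # The array is indexed from 0, while BOSS is addressed in absolute z.
        yield z - z_range[0], (z, z + 1)


shape = expected_shape((10, 13), (0, 100), (0, 200))
plan = list(slice_plan((10, 13)))
print(shape)  # (3, 100, 200)
print(plan)   # [(0, (10, 11)), (1, (11, 12)), (2, (12, 13))]
```

Comparing expected_shape(...) against data.shape before pushing would catch a mismatch between the cutout ranges and the results array early.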

## NeuroDataResource
This is just a lightweight wrapper around intern. We won't go over the code here since, for the most part, there shouldn't be a need to change it.

## driver
This is the main file that is used when AWS Batch launches the job. Most of the file is a series of functions that nomads_unsupervised needs in order to run. We will not go over these here since they are pipeline specific. The more generally useful ones are below:

In [None]:
# pull data from BOSS
def get_data(host, token, col, exp, z_range, y_range, x_range):
    print("Downloading {} from {} with ranges: z: {} y: {} x: {}".format(exp,
                                                                         col,
                                                                         str(z_range),
                                                                         str(y_range),
                                                                         str(x_range)))
    resource = NeuroDataResource(host, token, col, exp)
    data_dict = {}
    for chan in resource.channels:
        data_dict[chan] = resource.get_cutout(chan, z_range, y_range, x_range)
    return data_dict, resource.voxel_size

The get_data function wraps NeuroDataResource and returns a data dictionary, in which each key is a channel name and each value is a numpy array, along with the voxel size of the resource.
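
The shape of what get_data returns can be seen without BOSS access by swapping in a stand-in resource. Everything in this sketch is hypothetical: FakeResource, its channel names, and its voxel size are made up, and nested lists take the place of numpy arrays.

```python
class FakeResource:
    """Hypothetical stand-in for NeuroDataResource, for illustration only."""
    channels = ["synapsin", "psd95"]   # made-up channel names
    voxel_size = (0.1, 0.1, 0.07)      # made-up values

    def get_cutout(self, chan, z_range, y_range, x_range):
        # Return zeros shaped (z, y, x) instead of downloading real data.
        nz, ny, nx = (r[1] - r[0] for r in (z_range, y_range, x_range))
        return [[[0] * nx for _ in range(ny)] for _ in range(nz)]


def collect_channels(resource, z_range, y_range, x_range):
    """Same loop as get_data: one cutout per channel, keyed by channel name."""
    data_dict = {chan: resource.get_cutout(chan, z_range, y_range, x_range)
                 for chan in resource.channels}
    return data_dict, resource.voxel_size


data_dict, voxel_size = collect_channels(FakeResource(), (0, 2), (0, 3), (0, 4))
print(sorted(data_dict))  # ['psd95', 'synapsin']
```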

In [None]:
def upload_results(path, results_key):
    client = boto3.client('s3')
    s3 = boto3.resource('s3')
    s3_bucket_exists_waiter = client.get_waiter('bucket_exists')
    bucket = client.create_bucket(Bucket="nomads-unsupervised-results")
    s3_bucket_exists_waiter.wait(Bucket="nomads-unsupervised-results")

    bucket = s3.Bucket("nomads-unsupervised-results")
    bucket.Acl().put(ACL='public-read')
    files = glob.glob(path+"*")
    for file in files:
        key = results_key + "/" + file.split("/")[-1]
        client.upload_file(file, "nomads-unsupervised-results", key)
        response = client.put_object_acl(ACL='public-read', Bucket="nomads-unsupervised-results", \
        Key=key)
    return

This function pushes results to S3. Note how it does this: it expects all the pipeline outputs (prediction files, PyMeda HTML) to have been written to path. It then grabs every file in path and uploads each one to the proper S3 folder. Because the files are found with glob.glob(path + "*"), path should end with a slash.
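
As an illustration of the file-to-key mapping, the sketch below lists the S3 keys that files in a results folder would receive, without contacting AWS. The helper s3_keys_for and the file names are hypothetical, and it uses os.path.join and os.path.basename rather than the string handling in upload_results.

```python
import glob
import os
import tempfile


def s3_keys_for(path, results_key):
    """Map each file under path to the S3 key it would be uploaded under."""
    files = sorted(glob.glob(os.path.join(path, "*")))
    return {f: results_key + "/" + os.path.basename(f) for f in files}


# Build a throwaway results folder with two empty, made-up output files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("predictions.pkl", "plots.html"):
        open(os.path.join(tmp, name), "w").close()
    keys = s3_keys_for(tmp, "demo-key")

print(sorted(keys.values()))  # ['demo-key/plots.html', 'demo-key/predictions.pkl']
```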

The S3 folder is labeled with results_key which is used in both the service and in naming channels pushed to BOSS through nd_boss (the results_key parameter in boss_push). This key is:

In [None]:
results_key = "_".join(["nomads-unsupervised", col, exp, "z", str(z_range[0]), str(z_range[1]), "y", \
    str(y_range[0]), str(y_range[1]), "x", str(x_range[0]), str(x_range[1])])
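
For reference, the same key can be built by a small function, which makes the layout easy to try with sample values. The function name make_results_key and the sample collection and experiment names are hypothetical.

```python
def make_results_key(col, exp, z_range, y_range, x_range, prefix="nomads-unsupervised"):
    """Join the pipeline name, collection, experiment, and ranges with underscores."""
    parts = [prefix, col, exp]
    for axis, (start, stop) in zip("zyx", (z_range, y_range, x_range)):
        parts += [axis, str(start), str(stop)]
    return "_".join(parts)


key = make_results_key("demo_col", "demo_exp", [0, 10], [0, 100], [0, 200])
print(key)  # nomads-unsupervised_demo_col_demo_exp_z_0_10_y_0_100_x_0_200
```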

There is one really important thing to note about driver.py. For now, we run only one command in AWS Batch to make our lives easier, which means only one entry point can be called. We accomplish this by calling "python3 driver.py" in the actual job. This triggers the main block, which parses the arguments and calls driver():

In [None]:
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='NOMADS and PyMeda driver.')
    parser.add_argument('--host', required = True, type=str, help='BOSS Api host, do not include "https"')
    parser.add_argument('--token', required = True, type=str, help='BOSS API Token Key')
    parser.add_argument('--col', required = True, type=str, help='collection name')
    parser.add_argument('--exp', required = True, type=str, help='experiment name')
    parser.add_argument('--z-range', required = True, type=str, help='zstart,zstop   NO SPACES. zstart, zstop will be cast to ints')
    parser.add_argument('--y-range', required = True, type=str, help='ystart,ystop   NO SPACES. ystart, ystop will be cast to ints')
    parser.add_argument('--x-range', required = True, type=str, help='xstart,xstop   NO SPACES. xstart, xstop will be cast to ints')
    args = parser.parse_args()

    z_range = list(map(int, args.z_range.split(",")))
    y_range = list(map(int, args.y_range.split(",")))
    x_range = list(map(int, args.x_range.split(",")))

    driver(args.host, args.token, args.col, args.exp, z_range, y_range, x_range)
    

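# NOTE: in driver.py itself, driver() has to be defined above the __main__ block
# that calls it; it is shown second here only for exposition.
def driver(host, token, col, exp, z_range, y_range, x_range, path = "./results/"):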

    print("Starting Nomads Unsupervised...")
    info = locals()
    data_dict, voxel_size = get_data(host, token, col, exp, z_range, y_range, x_range)

    results = run_nomads(data_dict)
    results = results.astype(np.uint8)
    np.putmask(results, results, 255)

    results_key = "_".join(["nomads-unsupervised", col, exp, "z", str(z_range[0]), str(z_range[1]), "y", \
    str(y_range[0]), str(y_range[1]), "x", str(x_range[0]), str(x_range[1])])

    with open(path + "nomads-unsupervised-predictions" + ".pkl", "wb") as pkl_file:
        pickle.dump(results, pkl_file)
    print("Saved pickled results (np array) {} in {}".format("nomads-unsupervised-predictions.pkl", path))

    print("Generating PyMeda Plots...")

    norm_data = load_and_preproc(data_dict)
    try:
        pymeda_driver.pymeda_pipeline(results, norm_data, title = "PyMeda Plots on All Predicted Synapses", path = path)
    except Exception:
        print("Not generating plots for all synapses, no predictions classified as Gaba")
    print("Uploading results...")
    #results = pickle.load(open("./results/nomads-unsupervised-predictions.pkl", "rb"))

    boss_links = boss_push(token, "collman_nomads", "nomads_predictions", z_range, y_range, x_range, {results_key: results}, results_key)
    with open(path + "NDVIS_links.csv", "w") as csv_file:
        writer = csv.writer(csv_file)
        for key, value in boss_links.items():
            writer.writerow([key, value])
    upload_results(path, results_key)

    return info, results, boss_links

**Important**: There seems to be a lot going on, but all the driver() function does is tie together the steps that have to happen in the pipeline (pull data from BOSS, run the algorithm, run PyMeda, push to BOSS, push to S3). When adding new pipelines, follow a similar format and call "python3 driver.py" during job submission.
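
As a starting point for a new pipeline, the single-entry-point pattern might be sketched as follows. This is not the nomads_unsupervised driver: parse_range, build_parser, the steps list, and the sample arguments are all hypothetical. The sketch validates the ranges while the arguments are parsed and passes an explicit argument list to parse_args so that it can run in a notebook.

```python
import argparse


def parse_range(text):
    """Parse 'start,stop' into [start, stop], rejecting malformed input."""
    parts = text.split(",")
    if len(parts) != 2:
        raise argparse.ArgumentTypeError("expected 'start,stop', got %r" % text)
    try:
        start, stop = int(parts[0]), int(parts[1])
    except ValueError:
        raise argparse.ArgumentTypeError("range bounds must be integers: %r" % text)
    if start >= stop:
        raise argparse.ArgumentTypeError("start must be less than stop: %r" % text)
    return [start, stop]


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical pipeline driver.")
    parser.add_argument("--col", required=True, help="collection name")
    parser.add_argument("--exp", required=True, help="experiment name")
    for axis in "zyx":
        parser.add_argument("--%s-range" % axis, required=True, type=parse_range,
                            help="%sstart,%sstop with no spaces" % (axis, axis))
    return parser


def driver(col, exp, z_range, y_range, x_range, steps):
    """Run each (name, function) step in order, storing its output under its name."""
    state = {"col": col, "exp": exp,
             "z_range": z_range, "y_range": y_range, "x_range": x_range}
    for name, step in steps:
        print("Running step: {}".format(name))
        state[name] = step(state)
    return state


# In a real driver.py this would sit under `if __name__ == "__main__":`
# and call parse_args() with no arguments.
args = build_parser().parse_args(["--col", "demo_col", "--exp", "demo_exp",
                                  "--z-range", "0,10", "--y-range", "0,100",
                                  "--x-range", "0,200"])
# Placeholder steps standing in for pull, run, plot, and push.
steps = [("pull", lambda s: "data"), ("run", lambda s: s["pull"].upper())]
state = driver(args.col, args.exp, args.z_range, args.y_range, args.x_range, steps)
print(state["run"])  # DATA
```

Using type=parse_range means a malformed range is rejected when the job starts, before any data is pulled.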