## Tutorial: Using NSDF for End-to-End Analysis of Scientific Data

In this tutorial, you will learn how to:

* Build a composable workflow integrating your application module with NSDF services for data visualization and analysis.
* Load intermediate data from, and store it to, public storage such as Dataverse and private storage such as Seal Storage.
* Use the NSDF dashboard for data visualization, including zooming into data for detailed analysis, cropping subregions, and saving them locally in a Python-compatible format.

### Tutorial Overview

The workflow in Figure 1 uses a chain of tools and services to showcase the NSDF capabilities for studying a terrain parameter dataset generated with [GEOtiled] (URL) (a module of the [SOMOSPIE](https://github.com/TauferLab/SOMOSPIE) engine [cite]). The workflow comprises four steps.

**Step 1. Data Generation:** This initial step either acquires data from the United States Geological Survey (USGS) and processes it with GEOtiled (Option A), or downloads previously generated data from public or private storage (Option B).

**Step 2. Conversion to IDX Format:** In this step, the TIFF files containing the terrain data from Step 1 are converted into IDX format using OpenVisus. The conversion produces terrain parameter files that preserve the original accuracy but are more compact. These optimized files can be stored on either public or private storage.

**Step 3. Visualization and Analysis:** The third step uses the IDX files within OpenVisus to create visualizations and conduct analysis, with the results displayed on an advanced dashboard.

**Step 4. Use the Dashboard:** The last step allows users to work on large-scale data, extract and save subregions of the original dataset, and study and extract features. 

Throughout these steps, the workflow uses public storage services (Dataverse, CyVerse) and private storage providers (Seal Storage, expand). These services are connected to the conversion, visualization, and analysis steps, highlighting their role in the data processing and transformation steps of this tutorial.

</br>
</br>

<figure align="center">
    <img src="files/docs/Openvisus-somospie.png" width="800">
    <figcaption>Figure 1. Tutorial steps.</figcaption>
</figure>

### Preparing your Environment 

The following cell prepares the environment for processing and visualizing geospatial data by importing the libraries required to execute the workflow. Running this cell might take some time; a message is displayed when it has finished.

In [None]:
import geotiled as gt
from pathlib import Path
import glob
import os
import shutil
import multiprocessing
import OpenVisus as ov
import numpy as np
import requests
import json
from matplotlib import pyplot as plt
from tqdm import tqdm

# To silence a deprecation warning.
gt.gdal.UseExceptions()
# Notify the user that the environment is ready.
print("You have successfully prepared your environment.")

### Step 1. Data Generation

Step 1 provides two options for obtaining data. Option A allows users to download raw data from a public archive, such as elevation data from the USGS archive, and then process it with their own application, in this instance the GEOtiled module of the [SOMOSPIE](https://github.com/TauferLab/SOMOSPIE) earth science application. Option B allows users to access data directly from existing public or private storage, in this case from [Dataverse] (URL), without first downloading and processing raw data.


####  Option A: Generating Data Using the SOMOSPIE Application Module

We use the [SOMOSPIE](https://github.com/TauferLab/SOMOSPIE) application, specifically its GEOtiled module, to generate a high-resolution dataset of terrain parameters. The input consists of Digital Elevation Models (DEMs) from the USGS 3D Elevation Program. GEOtiled processes these DEMs to compute three terrain parameters, slope, aspect, and hillshading, at 1 arc-second (about 30 m) resolution for the state of Tennessee. The output consists of four TIFF image files, one each for elevation, slope, aspect, and hillshading.
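To make the slope and aspect parameters concrete, the sketch below computes them from a small elevation grid with NumPy. It is an illustration under stated assumptions, not GEOtiled's implementation: it uses central differences (`np.gradient`) instead of the 3x3 kernels common in GIS tools, assumes square cells measured in the same unit as elevation with rows running north to south, and reports aspect as the downhill direction in degrees clockwise from north (flat cells get 0). The helper name `slope_aspect` is hypothetical.

```python
import numpy as np


def slope_aspect(dem, cell_size):
    """Return slope and aspect (both in degrees) for a 2D elevation grid.

    Assumes rows run north to south, columns run west to east, and
    `cell_size` uses the same unit as the elevation values.
    """
    # Derivatives along axis 0 (rows, southward) and axis 1 (columns, eastward).
    dz_drow, dz_dcol = np.gradient(dem.astype(float), cell_size)
    dz_dx = dz_dcol   # elevation change per unit distance toward the east
    dz_dy = -dz_drow  # rows increase southward, so negate to get the northward change
    # Slope: angle between the surface and the horizontal plane.
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect: compass bearing of the downhill direction, clockwise from north.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Hypothetical plane with 10 m cells that rises 10 m per cell toward the east.
east_rising = np.tile(10.0 * np.arange(5), (5, 1))
slope, aspect = slope_aspect(east_rising, cell_size=10.0)
print(slope[2, 2], aspect[2, 2])  # a 45-degree slope facing west
```

A plane rising toward the east faces downhill to the west, so its aspect is 270 degrees.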

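The following cell sets the GEOtiled configuration: the file that will store the list of download links, the root output folder, the number of tiles used for parallel parameter computation, the names of intermediate folders and files, the Tennessee shapefile used for visualization, and the bounding box of the region of interest.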

In [None]:
download_list = "./download_urls.txt"  # Where the list of download links will be stored
root_output_folder = "./files/tif_files/"  # root folder where geotiled will store data
n_tiles = 4  # Number of tiles that are generated for parameter computation
dem_tiles_dir_name = "tiles"  # Folder where downloaded DEM tiles will be saved
param_tiles_dir_name = (
    "elevation_tiles"  # Folder where computation tiles will be saved.
)
gcs_name = "gcs.tif"  # Name for the mosaicked DEM
pcs_name = "pcs.tif"  # Name for the projected DEM
shapefile = ["./files/shape_files/STATEFP_47.shp"]  # Shapefile for Visualization
region_bounding_box = {
    "xmin": -90.4,
    "ymin": 34.8,
    "xmax": -81.55,
    "ymax": 36.8,
}  # For `fetch_dem`. X=Longitude Y=Latitude. Determine bounding coordinates by looking at a map.

print("You have successfully configured the GEOtiled parameters.")

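The following cell creates the output folders for the downloaded DEM tiles, the elevation tiles used for parameter computation, and the aspect, hillshading, and slope tiles. It also defines the paths of the mosaicked DEM (`gcs.tif`) and the projected DEM (`pcs.tif`).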

In [None]:
# Fetching Data
tiles_folder = os.path.join(root_output_folder, dem_tiles_dir_name)
Path(root_output_folder).mkdir(parents=True, exist_ok=True)
Path(tiles_folder).mkdir(parents=True, exist_ok=True)

# Setting up for parameter computation
gcs = os.path.join(root_output_folder, gcs_name)
pcs = os.path.join(root_output_folder, pcs_name)
elevation_tiles = os.path.join(root_output_folder, param_tiles_dir_name)
Path(elevation_tiles).mkdir(parents=True, exist_ok=True)

# Computing Parameters
aspect_tiles = os.path.join(root_output_folder, "aspect_tiles")
hillshading_tiles = os.path.join(root_output_folder, "hillshading_tiles")
slope_tiles = os.path.join(root_output_folder, "slope_tiles")
Path(aspect_tiles).mkdir(parents=True, exist_ok=True)
Path(hillshading_tiles).mkdir(parents=True, exist_ok=True)
Path(slope_tiles).mkdir(parents=True, exist_ok=True)

print("You have successfully created the output folders.")

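The following cell queries the USGS for the National Elevation Dataset (NED) 1 arc-second tiles that cover the bounding box, writes their download links to `download_urls.txt`, and downloads the tiles into the `tiles` folder.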

In [None]:
gt.fetch_dem(
    bbox=region_bounding_box,
    txtPath=download_list,
    dataset="National Elevation Dataset (NED) 1 arc-second Current",
)
gt.download_files(download_list, tiles_folder)

print("You have successfully downloaded the DEM tiles.")
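As a rough check on how many tiles the download involves, the sketch below counts the 1-degree by 1-degree cells that intersect the bounding box. It assumes the DEM is distributed on a 1-degree grid (the usual layout for USGS 1 arc-second DEM products); the helper `degree_tiles` is hypothetical and not part of GEOtiled, and the number of files `fetch_dem` actually lists depends on what the USGS service returns.

```python
import math


def degree_tiles(bbox):
    """Return the (south, west) corners of the 1-degree cells that intersect bbox.

    bbox uses the same keys as region_bounding_box: xmin/xmax are longitudes
    and ymin/ymax are latitudes, in decimal degrees.
    """
    # A cell [w, w + 1] intersects (xmin, xmax) when w < xmax and w + 1 > xmin.
    lons = range(math.floor(bbox["xmin"]), math.ceil(bbox["xmax"]))
    lats = range(math.floor(bbox["ymin"]), math.ceil(bbox["ymax"]))
    return [(lat, lon) for lat in lats for lon in lons]


tn_bbox = {"xmin": -90.4, "ymin": 34.8, "xmax": -81.55, "ymax": 36.8}
cells = degree_tiles(tn_bbox)
print(len(cells), "one-degree cells intersect the Tennessee bounding box")
```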

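The following cell merges the downloaded DEM tiles into a single mosaic (`gcs.tif`) and then deletes the individual tiles and the intermediate `merged.vrt` file.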

In [None]:
raster_list = glob.glob(tiles_folder + "/*")

gt.build_mosaic(raster_list, gcs)

shutil.rmtree(tiles_folder)
os.remove("./merged.vrt")

print("You have successfully built the DEM mosaic.")

The following cell reprojects the mosaicked DEM to the projection given by `EPSG:9822`, saves the result as `pcs.tif`, and deletes `gcs.tif`.

In [None]:
gt.reproject(gcs, pcs, "EPSG:9822")

os.remove(gcs)

print("You have successfully reprojected the DEM.")

The following cell splits the projected DEM into `n_tiles` tiles so that the terrain parameters can be computed in parallel.

In [None]:
gt.crop_into_tiles(pcs, elevation_tiles, n_tiles)

glob_of_tiles = glob.glob(elevation_tiles + "/*.tif")

print("You have successfully split the DEM into tiles.")
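Splitting the DEM into tiles lets each tile be processed independently. The sketch below shows one generic way to compute tile windows as `(xoff, yoff, xsize, ysize)` tuples, the same order used by GDAL's `srcWin` option, with a small overlap buffer so that neighborhood-based computations such as slope are not truncated at tile edges. It is an illustration, not GEOtiled's `crop_into_tiles` implementation; the helper name `tile_windows`, the square-grid layout, and the one-pixel buffer are all assumptions.

```python
import math


def tile_windows(width, height, n_tiles, buffer=1):
    """Return (xoff, yoff, xsize, ysize) windows for a square grid of tiles.

    Each tile is extended by `buffer` pixels on every side, clipped to the raster.
    """
    per_side = math.isqrt(n_tiles)
    if per_side * per_side != n_tiles:
        raise ValueError("this sketch assumes n_tiles is a perfect square")
    # Evenly spaced tile edges along each axis.
    col_edges = [round(i * width / per_side) for i in range(per_side + 1)]
    row_edges = [round(j * height / per_side) for j in range(per_side + 1)]
    windows = []
    for j in range(per_side):
        for i in range(per_side):
            x0 = max(col_edges[i] - buffer, 0)
            x1 = min(col_edges[i + 1] + buffer, width)
            y0 = max(row_edges[j] - buffer, 0)
            y1 = min(row_edges[j + 1] + buffer, height)
            windows.append((x0, y0, x1 - x0, y1 - y0))
    return windows


# Hypothetical 100 x 80 pixel raster split into 4 tiles, as with n_tiles = 4 above.
for window in tile_windows(100, 80, 4):
    print(window)
```

The overlap means neighboring tiles share a one-pixel strip; after computation, those strips must be trimmed or handled when the tiles are mosaicked back together.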

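The following cell computes the terrain parameters (slope, aspect, and hillshading) for each elevation tile in parallel, using one process per tile.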

In [None]:
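# Compute the terrain parameters of each elevation tile in parallel, one process per tile.
# The context manager shuts down the worker pool once all tiles have been processed.
with multiprocessing.Pool(processes=n_tiles) as pool:
    pool.map(gt.compute_geotiled, sorted(glob.glob(elevation_tiles + "/*.tif")))

print("You have successfully computed the terrain parameters for every tile.")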
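GEOtiled computes hillshading for each tile in the cell above. For intuition, the sketch below applies the common illumination formula, hillshade = 255 * (cos(zenith) * cos(slope) + sin(zenith) * sin(slope) * cos(azimuth - aspect)), to slope and aspect arrays given in degrees. It is not necessarily how GEOtiled computes the parameter; the default sun azimuth (315 degrees) and altitude (45 degrees) are assumptions that match the defaults of `gdaldem hillshade`, and `hillshade` is a hypothetical helper.

```python
import numpy as np


def hillshade(slope_deg, aspect_deg, azimuth_deg=315.0, altitude_deg=45.0):
    """Return an 8-bit hillshade (0 to 255) from slope and aspect in degrees."""
    zenith = np.radians(90.0 - altitude_deg)  # angle between the sun and the vertical
    azimuth = np.radians(azimuth_deg)
    slope = np.radians(slope_deg)
    aspect = np.radians(aspect_deg)
    shade = (np.cos(zenith) * np.cos(slope)
             + np.sin(zenith) * np.sin(slope) * np.cos(azimuth - aspect))
    # Surfaces facing away from the sun get 0; round before casting to uint8.
    return np.rint(np.clip(255.0 * shade, 0, 255)).astype(np.uint8)


flat = hillshade(np.array([0.0]), np.array([0.0]))
lit = hillshade(np.array([45.0]), np.array([315.0]))     # faces the sun
shadow = hillshade(np.array([45.0]), np.array([135.0]))  # faces away from the sun
print(flat, lit, shadow)
```

The 8-bit output range matches the `uint8` type later used for the `hillshading` field of the IDX file.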

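The following cell merges the per-tile aspect, hillshading, and slope results into `aspect.tif`, `hillshading.tif`, and `slope.tif`, and then deletes the intermediate tile folders.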

In [None]:
gt.build_mosaic_filtered(
    sorted(glob.glob(aspect_tiles + "/*.tif")),
    os.path.join(root_output_folder, "aspect.tif"),
)
gt.build_mosaic_filtered(
    sorted(glob.glob(hillshading_tiles + "/*.tif")),
    os.path.join(root_output_folder, "hillshading.tif"),
)
gt.build_mosaic_filtered(
    sorted(glob.glob(slope_tiles + "/*.tif")),
    os.path.join(root_output_folder, "slope.tif"),
)


shutil.rmtree(aspect_tiles)
shutil.rmtree(hillshading_tiles)
shutil.rmtree(slope_tiles)
shutil.rmtree(elevation_tiles)

print("You have successfully built the aspect, hillshading, and slope mosaics.")

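The following cell renders downsampled maps of the four terrain parameters, cropped to the Tennessee boundary, and keeps the corresponding arrays (`pcs_array`, `hill_array`, `aspect_array`, and `slope_array`) for the conversion in Step 2.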

In [None]:
hill = os.path.join(root_output_folder, "hillshading.tif")
aspect = os.path.join(root_output_folder, "aspect.tif")
slope = os.path.join(root_output_folder, "slope.tif")


pcs_array = gt.generate_img(
    pcs,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Elevation Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Meter",
    xyunit="Degree",
    ztype="Elevation",
    crop_shp=True,
)
hill_array = gt.generate_img(
    hill,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Hillshading Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Level",
    xyunit="Degree",
    ztype="Hillshading",
    crop_shp=True,
)
aspect_array = gt.generate_img(
    aspect,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Aspect Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Degree",
    xyunit="Degree",
    ztype="Aspect",
    crop_shp=True,
)
slope_array = gt.generate_img(
    slope,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Slope Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Degree",
    xyunit="Degree",
    ztype="Slope",
    crop_shp=True,
)

print("You have successfully visualized the four terrain parameters.")

#### Option B. Accessing Data from Dataverse Public Commons

High-resolution data for elevation, slope, aspect, and hillshading at 30 m resolution for Tennessee may already exist in public repositories. If so, you can download this data directly and proceed to Step 2 without generating it first.

The following cell reads the list of file URLs from `files/json/dataverse.json` and downloads the previously generated TIFF files from Dataverse.

In [None]:
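os.makedirs("files/tif_files", exist_ok=True)

# Each entry of dataverse.json provides a download "url" and a local "name_file".
with open("./files/json/dataverse.json", "r") as file:
    urls = json.load(file)


def get_data_from_dataverse(file_url, name_file):
    """Download one file from Dataverse and save it as name_file."""
    # Stream the response so that large TIFF files are not held in memory at once.
    with requests.get(file_url, stream=True) as resp:
        resp.raise_for_status()  # stop on HTTP errors instead of saving an error page
        with open(name_file, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)


for data in tqdm(urls):
    get_data_from_dataverse(data.get("url"), data.get("name_file"))

print("You have successfully downloaded the terrain parameter files from Dataverse.")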
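The cell above expects `dataverse.json` to be a list of objects with a `url` key and a `name_file` key. The sketch below shows one way such a list could be built with the Dataverse Data Access API endpoint `/api/access/datafile/{id}`. The helper `dataverse_entries`, the server URL, and the numeric file ID are hypothetical placeholders, not the identifiers of this tutorial's files.

```python
import json


def dataverse_entries(server, files):
    """Build entries in the {"url", "name_file"} shape read by the cell above.

    `files` is a list of (file_id, local_path) pairs.
    """
    return [
        {"url": f"{server}/api/access/datafile/{file_id}", "name_file": name}
        for file_id, name in files
    ]


entries = dataverse_entries(
    "https://dataverse.harvard.edu",  # assumption: the Dataverse installation hosting the files
    [(123456, "./files/tif_files/TN_30M_elevation.tif")],  # hypothetical file ID
)
print(json.dumps(entries, indent=2))
```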

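The following cell renders downsampled maps of the four downloaded terrain parameters, cropped to the Tennessee boundary, and keeps the corresponding arrays (`pcs_array`, `hill_array`, `aspect_array`, and `slope_array`) for the conversion in Step 2.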

In [None]:
shapefile = ["./files/shape_files/STATEFP_47.shp"]  # Shapefile for Visualization
hill = os.path.join("./files/tif_files/", "TN_30M_hillshading.tif")
aspect = os.path.join("./files/tif_files/", "TN_30M_aspect.tif")
pcs = os.path.join("./files/tif_files/", "TN_30M_elevation.tif")
slope = os.path.join("./files/tif_files/", "TN_30M_slope.tif")
pcs_array = gt.generate_img(
    pcs,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Elevation Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Meter",
    xyunit="Degree",
    ztype="Elevation",
    crop_shp=True,
)
hill_array = gt.generate_img(
    hill,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Hillshading Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Level",
    xyunit="Degree",
    ztype="Hillshading",
    crop_shp=True,
)
aspect_array = gt.generate_img(
    aspect,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Aspect Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Degree",
    xyunit="Degree",
    ztype="Aspect",
    crop_shp=True,
)
slope_array = gt.generate_img(
    slope,
    downsample=5,
    reproject_gcs=True,
    shp_files=shapefile,
    title="Slope Data for TN @ 1 Arc-Second/30m Resolution",
    zunit="Degree",
    xyunit="Degree",
    ztype="Slope",
    crop_shp=True,
)

print("You have successfully visualized the four terrain parameters.")

**Important:** The four images above are high-resolution, large, and static. They do not let users interact with the embedded information, for example by zooming into subregions, cropping them, and saving the cropped areas locally or on other platforms for further analysis. In addition, the underlying data are stored as TIFF files, which are large.

###  Step 2: Converting to IDX Format

After the data are generated, they must be converted into the IDX format to be compatible with OpenVisus. This step creates a single IDX file that includes all the terrain parameters as distinct fields.

**Important:** Completing one of the data generation options in Step 1 is a prerequisite for proceeding with this step.
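The next cell relies on GDAL's geotransform, six coefficients that map a pixel position (column, row) to georeferenced coordinates: x = GT[0] + col * GT[1] + row * GT[2] and y = GT[3] + col * GT[4] + row * GT[5]. The minimal sketch below evaluates this affine mapping in plain Python; the function name `pixel_to_geo` and the example geotransform values are hypothetical. For a north-up raster, GT[2] and GT[4] are zero, which is the assumption the next cell makes when it unpacks only the origin and the pixel sizes.

```python
def pixel_to_geo(geotransform, col, row):
    """Map a pixel-corner position (col, row) to georeferenced (x, y).

    Follows GDAL's affine geotransform convention:
    (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height).
    """
    gt0, gt1, gt2, gt3, gt4, gt5 = geotransform
    x = gt0 + col * gt1 + row * gt2
    y = gt3 + col * gt4 + row * gt5
    return x, y


# Hypothetical north-up raster: 30 m pixels, upper-left corner at (500000, 4000000).
gt = (500000.0, 30.0, 0.0, 4000000.0, 0.0, -30.0)
upper_left = pixel_to_geo(gt, 0, 0)
lower_right = pixel_to_geo(gt, 100, 200)  # corner after 100 columns and 200 rows
print(upper_left, lower_right)
```

The pixel height is negative for north-up rasters, so y decreases as the row index grows.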

The following cell computes the geographic extent of the terrain data (minimum and maximum longitude and latitude in EPSG:4326) from the hillshading TIFF file selected in Step 1. These bounds are used in the next cell as the physical box of the IDX dataset.

In [None]:
# Generate lat/lon min/max from the TIFF file
from osgeo import gdal, osr

# `hill` is the path to the hillshading TIFF defined in Step 1 (Option A or Option B).
dataset = gdal.Open(hill)

geotransform = dataset.GetGeoTransform()
spatial_ref = osr.SpatialReference(wkt=dataset.GetProjection())

target_spatial_ref = osr.SpatialReference()
target_spatial_ref.ImportFromEPSG(4326)
# With GDAL >= 3, EPSG:4326 uses latitude/longitude axis order by default.
# Force the traditional GIS order so that TransformPoint returns (lon, lat).
spatial_ref.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)
target_spatial_ref.SetAxisMappingStrategy(osr.OAMS_TRADITIONAL_GIS_ORDER)

coord_transform = osr.CoordinateTransformation(spatial_ref, target_spatial_ref)

# Raster corners in the file's own coordinate system (assumes a north-up raster).
ulx, xres, _, uly, _, yres = geotransform
lrx = ulx + (dataset.RasterXSize * xres)
lry = uly + (dataset.RasterYSize * yres)
lon_1, lat_1, _ = coord_transform.TransformPoint(ulx, uly)
lon_2, lat_2, _ = coord_transform.TransformPoint(lrx, lry)

lat_min, lat_max = min(lat_1, lat_2), max(lat_1, lat_2)
lon_min, lon_max = min(lon_1, lon_2), max(lon_1, lon_2)

print(f"Longitude range: {lon_min} to {lon_max}")
print(f"Latitude range: {lat_min} to {lat_max}")
print("You have successfully computed the geographic bounds of the terrain data.")

The following cell creates a single IDX dataset with one field per terrain parameter (elevation, hillshading, aspect, and slope) and uses the geographic bounds computed above as its physical box. Fields are given as a list of the form `[ov.Field(FIELD_NAME, DTYPE)]`. Each array is flipped vertically with `np.flipud`, consistent with the `origin="lower"` option used for plotting in Step 3, and written to its corresponding field. Finally, the IDX file is compressed with `zip` compression.

In [None]:
filename = "idx_data/Tennessee_terrain_parameters.idx"
all_fields = [
    ov.Field("elevation", "float32"),
    ov.Field("hillshading", "uint8"),
    ov.Field("aspect", "float32"),
    ov.Field("slope", "float32"),
]
# Arrays in the same order as all_fields.
input_data = [
    np.flipud(pcs_array).copy(),
    np.flipud(hill_array).copy(),
    np.flipud(aspect_array).copy(),
    np.flipud(slope_array).copy(),
]
height, width = input_data[0].shape
db = ov.CreateIdx(
    url=filename,
    dims=[width, height],
    fields=all_fields,
    arco="4mb",
    # Physical box as "x_min x_max y_min y_max": longitude along x, latitude along y.
    physic_box=ov.BoxNd.fromString(f"{lon_min} {lon_max} {lat_min} {lat_max}"),
    time=[0, 0, "%00000d/"],
)
# Write each array into its corresponding field.
for fld, data in zip(db.getFields(), input_data):
    db.write(data, field=fld)
db.compressDataset(["zip"])

print("You have successfully created and compressed the IDX file with the four terrain parameters.")
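The IDX format written by OpenVisus orders samples along a hierarchical Z-order (HZ-order) curve, which builds on the classic Z-order (Morton) curve obtained by interleaving the bits of the x and y indices. The sketch below computes plain 2D Morton codes only to illustrate that interleaving; it is not the HZ-order computation OpenVisus performs, and `morton2d` is a hypothetical helper.

```python
def morton2d(x, y, bits=16):
    """Interleave the bits of x and y: bit i of x goes to bit 2i, bit i of y to bit 2i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


# Visit a 4 x 4 grid in Z-order: nearby cells end up close together in the sequence.
order = sorted(((x, y) for y in range(4) for x in range(4)), key=lambda p: morton2d(*p))
print(order[:8])
```

Because spatially close samples receive nearby codes, a reader can fetch a subregion by touching relatively few contiguous blocks of the file.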

### Step 3: Creating a Dashboard for Visualization and Analysis

After the data has been generated, transformed into the correct format, and compressed, we proceed to create the NSDF dashboard. This dashboard enables the visualization and analysis of terrain parameters, either from a local file (Option A) or from remote storage, such as the private Seal Storage (Option B).

#### Option A: Loading the Dataset from Local Storage

This option involves importing the dataset into the dashboard directly from local storage contained within the Docker image.

The following cell loads the dataset from a local file.

In [None]:
filename = "idx_data/Tennessee_terrain_parameters.idx"
db = ov.LoadDataset(filename)

print("You have successfully loaded the IDX dataset from local storage.")

The following cell reads the four terrain parameters from the single IDX file.

In [None]:
read_elevation = db.read(field="elevation")
read_hillshading = db.read(field="hillshading")
read_aspect = db.read(field="aspect")
read_slope = db.read(field="slope")

print("You have successfully read the four terrain parameters.")

The following cell plots the four terrain parameters, one panel per parameter.

In [None]:
fig, axs = plt.subplots(4, 1, figsize=(10, 8))
axs[0].imshow(read_elevation, origin="lower", vmin=30, vmax=1999, cmap="BuPu_r")
axs[0].set_title("Elevation")
axs[1].imshow(read_hillshading, origin="lower", vmin=0, vmax=255, cmap="Oranges")
axs[1].set_title("Hillshading")

axs[2].imshow(read_aspect, vmin=0, origin="lower", vmax=360, cmap="Reds")
axs[2].set_title("Aspect")

axs[3].imshow(read_slope, vmin=0, origin="lower", vmax=65.9, cmap="Oranges")
axs[3].set_title("Slope")
plt.subplots_adjust(wspace=0.4, hspace=0.6)
plt.tight_layout()
plt.show()

print("You have successfully plotted the four terrain parameters.")

#### Option B: Loading the Dataset from Seal Storage

For this option, the dataset is loaded into the dashboard from a private archive that was previously generated and stored in Seal Storage.

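The following cell builds the URL of the IDX file in Seal Storage, using its S3-compatible endpoint, and loads the dataset from it.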

In [None]:
filename = "Tennessee_terrain_parameters.idx"
HOME_DIR = "s3://utah/nsdf/somospie/"  # Do not change this line
data_dir = "terrain_tennessee/"
upload_dir = HOME_DIR + data_dir
s3_path = upload_dir.split("://")[1]
s3_path += filename
remote_dir = (
    "https://maritime.sealstorage.io/api/v0/s3/"
    + s3_path
    + "?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"
)

db = ov.LoadDataset(remote_dir)

print("You have successfully loaded the IDX dataset from Seal Storage.")
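The cell above concatenates strings to form the Seal Storage URL. The sketch below builds the same URL with `urllib.parse.urlencode`, which keeps the query parameters together in one place. The helper name `seal_storage_url` is hypothetical, the `any` credentials are copied from the tutorial, and `safe=":/"` is used only so that the endpoint stays unescaped, exactly as in the original string.

```python
from urllib.parse import urlencode

SEAL_API = "https://maritime.sealstorage.io/api/v0/s3"


def seal_storage_url(object_path, access_key="any", secret_key="any", cached="arco"):
    """Return the OpenVisus-readable URL of an object stored in Seal Storage."""
    query = urlencode(
        {
            "access_key": access_key,
            "secret_key": secret_key,
            "endpoint_url": SEAL_API,
            "cached": cached,
        },
        safe=":/",  # keep "https://..." readable instead of percent-encoding it
    )
    return f"{SEAL_API}/{object_path}?{query}"


url = seal_storage_url("utah/nsdf/somospie/terrain_tennessee/Tennessee_terrain_parameters.idx")
print(url)
```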

The following cell reads the four terrain parameters from the single IDX file.

In [None]:
read_elevation = db.read(field="elevation")
read_hillshading = db.read(field="hillshading")
read_aspect = db.read(field="aspect")
read_slope = db.read(field="slope")

print("You have successfully read the four terrain parameters.")

The following cell plots the four terrain parameters, one panel per parameter.

In [None]:
fig, axs = plt.subplots(4, 1, figsize=(10, 8))
axs[0].imshow(read_elevation, origin="lower", vmin=30, vmax=1999, cmap="BuPu_r")
axs[0].set_title("Elevation")
axs[1].imshow(read_hillshading, origin="lower", vmin=0, vmax=255, cmap="Oranges")
axs[1].set_title("Hillshading")

axs[2].imshow(read_aspect, vmin=0, origin="lower", vmax=360, cmap="Reds")
axs[2].set_title("Aspect")

axs[3].imshow(read_slope, vmin=0, origin="lower", vmax=65.9, cmap="Oranges")
axs[3].set_title("Slope")
plt.subplots_adjust(wspace=0.4, hspace=0.6)
plt.tight_layout()
plt.show()

print("You have successfully plotted the four terrain parameters.")
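The plots above show the whole state. As a minimal sketch of cropping a subregion by geographic coordinates and saving it in a Python-compatible format, the helper below slices a 2D array using the bounds that served as the IDX physical box and writes the result with `np.save`. It assumes the array covers those bounds on a regular longitude/latitude grid with row 0 at the southern edge (as with `origin="lower"` above); the name `crop_by_lonlat` and the demo grid are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def crop_by_lonlat(array, bounds, lon_range, lat_range):
    """Crop a 2D array whose row 0 is the southern edge of `bounds`.

    bounds = (lon_min, lon_max, lat_min, lat_max) covered by the whole array.
    """
    lon_min, lon_max, lat_min, lat_max = bounds
    height, width = array.shape

    def col(lon):  # column index of a longitude
        return int(round((lon - lon_min) / (lon_max - lon_min) * width))

    def row(lat):  # row index of a latitude
        return int(round((lat - lat_min) / (lat_max - lat_min) * height))

    c0, c1 = sorted((col(lon_range[0]), col(lon_range[1])))
    r0, r1 = sorted((row(lat_range[0]), row(lat_range[1])))
    # Clip to the array so requests partly outside the bounds still work.
    c0, c1 = max(c0, 0), min(c1, width)
    r0, r1 = max(r0, 0), min(r1, height)
    return array[r0:r1, c0:c1]


# Demo on a synthetic 10 x 10 grid covering 0-10 degrees in both directions.
grid = np.arange(100).reshape(10, 10)
sub = crop_by_lonlat(grid, (0.0, 10.0, 0.0, 10.0), (2.0, 5.0), (3.0, 7.0))

# Save the subregion as a .npy file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "subregion.npy"
    np.save(out, sub)
    restored = np.load(out)
print(sub.shape)
```

In this notebook, a call such as `crop_by_lonlat(read_elevation, (lon_min, lon_max, lat_min, lat_max), (-84.0, -83.0), (35.4, 35.8))` would select a subregion of the elevation field, assuming the bounds from Step 2 are still defined and `read_elevation` holds the full-resolution field.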

### Step 4: Analyzing Tennessee's Geoterrain Using the NSDF Dashboard 

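In this step, we launch the interactive NSDF dashboard on the IDX dataset. Unlike the static plots in Step 3, the dashboard lets you work with the data at scale: zoom into subregions of Tennessee for detailed analysis, crop subregions, and save them locally in a Python-compatible format.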

The following cell sets the port and address on which the dashboard server listens.

In [None]:
PORT = "8989"  # Do not change this value: it is the forwarded port.
ADDRESS = "0.0.0.0"  # Address the server binds to: 0.0.0.0 (all interfaces) or 127.0.0.1 (local only)

print("You have successfully configured the dashboard port and address.")

#### Option A. Load dataset from local storage

The following cell sets the dashboard input to the IDX file in local storage.

In [None]:
URL = "idx_data/Tennessee_terrain_parameters.idx"

print("You have successfully selected the local IDX dataset.")

#### Option B. Load dataset from Seal Storage

The following cell sets the dashboard input to the IDX file stored in Seal Storage.

In [None]:
URL="https://maritime.sealstorage.io/api/v0/s3/utah/nsdf/somospie/terrain_tennessee/Tennessee_terrain_parameters.idx?access_key=any&secret_key=any&endpoint_url=https://maritime.sealstorage.io/api/v0/s3&cached=arco"

print("You have successfully selected the IDX dataset in Seal Storage.")

The following cell launches the dashboard, which lets you zoom into specific subregions of the state of Tennessee and examine the patterns of the terrain parameters.

In [None]:
!python -m panel serve openvisuspy/src/openvisuspy/dashboards --dev --allow-websocket-origin='*' --address="{ADDRESS}" --port "{PORT}" --args "{URL}"
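The `!` prefix in the cell above is IPython shell syntax and only works inside a notebook. Outside Jupyter, the same command can be assembled as an argument list and passed to `subprocess.run(cmd, check=True)`. The sketch below builds that list; the helper name `dashboard_command` is hypothetical, and its defaults simply mirror the address, port, and dashboard directory used in this tutorial.

```python
import sys


def dashboard_command(url, address="0.0.0.0", port="8989",
                      app_dir="openvisuspy/src/openvisuspy/dashboards"):
    """Build the argument list for the `panel serve` command used above."""
    return [
        sys.executable, "-m", "panel", "serve", app_dir,
        "--dev",
        "--allow-websocket-origin=*",  # no shell quoting needed in an argument list
        "--address", address,
        "--port", str(port),
        "--args", url,  # arguments after --args are forwarded to the dashboard app
    ]


cmd = dashboard_command("idx_data/Tennessee_terrain_parameters.idx")
print(" ".join(cmd))
```

Passing a list instead of a single shell string avoids quoting problems with the long Seal Storage URL, whose `&` characters would otherwise need escaping.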

### Visit 0.0.0.0:8989 to explore the dashboard

ADD HERE THE CONUS EXAMPLE

#### Acknowledgment
The authors of this tutorial would like to express their gratitude to:

* Dataverse
* Seal Storage
* Vargas Group
* NSF

#### Learn More About SOMOSPIE

#### Contact Us
