# Obtain NC Phase 3 2015 DEM and LiDAR Test Data

Data (DEM and LiDAR) for this exercise are available from the NOAA Digital Coast bulk download locations:

- DEM:  [https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/](https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/)
  
  - DEM Index:   [tileindex_NorthCarolina_DEM_2015P3.zip](https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/tileindex_NorthCarolina_DEM_2015P3.zip)
  - DEM URL List:  [urllist6205.txt](https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/urllist6205.txt)  
  
- LiDAR:  [https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/index.html](https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/index.html)

  - LiDAR Index:   [nc2015_phaseIII_index.gpkg](https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/nc2015_phaseIII_index.gpkg)
  - LiDAR URL List: [urllist6209.txt](https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/urllist6209.txt)



## Download the DEM and LiDAR URL lists.

Below, `wget` is used to download the URL lists. For compatibility with Windows, the `wget` commands are commented out and replaced with a Python function (`get_url_list`, built on `requests`) that emulates them.

In [None]:
import os
import csv
import time
import multiprocessing as mp
import requests  # used in place of !wget for Windows compatibility
import pandas as pd
from concurrent.futures import ProcessPoolExecutor
from itertools import repeat
from download_files import *  # local download_files.py; provides download_file(url, out_dir)

Check present working directory with bash syntax.

In [None]:
!pwd

Check current working directory with python syntax.

In [None]:
os.getcwd()

Check present working directory on Windows.

In [None]:
!echo %cd%

In [None]:
def get_url_list(url, output_csv_name):
    """Download a text file from url to output_csv_name (a stand-in for wget)."""
    # Create the output's parent directory (e.g. ./Reference) if it doesn't exist
    os.makedirs(os.path.dirname(output_csv_name) or '.', exist_ok=True)

    # Send a GET request; the timeout avoids hanging on a stalled server
    response = requests.get(url, timeout=60)

    # Check if the request was successful
    if response.status_code == 200:
        # Write the raw bytes to the output file
        with open(output_csv_name, 'wb') as file:
            file.write(response.content)
        print(f"File downloaded successfully: {output_csv_name}")
    else:
        print(f"Failed to download the file. Status code: {response.status_code}")


In [None]:
# !wget -P ./Reference -np -nd -r -nH -L "https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/urllist6205.txt"
get_url_list('https://chs.coast.noaa.gov/htdata/raster2/elevation/NorthCarolina_DEM_2015P3_6205/urllist6205.txt', './Reference/urllist6205.txt')

In [None]:
# !wget -P ./Reference -np -nd -r -nH -L "https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/urllist6209.txt"
get_url_list('https://noaa-nos-coastal-lidar-pds.s3.amazonaws.com/laz/geoid18/6209/urllist6209.txt','./Reference/urllist6209.txt')
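The commented `wget -P ./Reference -nd` commands save each file inside the `-P` folder under the last segment of the URL path. A small sketch of that naming rule (the helper `wget_style_path` is hypothetical, not part of this notebook or `download_files.py`) can help when choosing the `output_csv_name` argument for `get_url_list`:

```python
import os
from urllib.parse import urlparse, unquote


def wget_style_path(url, out_dir):
    """Return the local path wget would use with -P out_dir -nd (no directories)."""
    # Keep only the last segment of the URL path (any ?query string is ignored)
    name = os.path.basename(unquote(urlparse(url).path))
    if not name:
        # wget names the file index.html when the URL ends with a slash
        name = "index.html"
    return os.path.join(out_dir, name)


url = ("https://chs.coast.noaa.gov/htdata/raster2/elevation/"
       "NorthCarolina_DEM_2015P3_6205/urllist6205.txt")
print(wget_style_path(url, "./Reference"))
```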

For the Coleridge SE TEM exercises, **30** original product resolution (OPR) digital elevation model (DEM) tiles and **30** LiDAR point cloud (LPC) tiles are used. Text files listing the 30 DEM and 30 LPC tile names are in the "Reference" folder and are printed by the following cells.

In [None]:
!cat -n ./Reference/NC_P3_2015_TEM_Coleridge_SE_opr.csv | column

In [None]:
!cat -n ./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc.csv | column
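`cat` and `column` are not available in a default Windows shell. A minimal Python sketch that prints a listing with line numbers the way `cat -n` does (number right-aligned in six characters, then a tab) could stand in; `print_numbered` is a hypothetical helper and skips the `column` layout:

```python
from pathlib import Path


def print_numbered(path):
    """Print every line of a text file with its line number, like `cat -n`."""
    lines = Path(path).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        # cat -n style: number right-aligned in 6 characters, then a tab
        print(f"{number:6}\t{line}")
    return lines


# Print both tile listings if they are present
for name in ["opr", "lpc"]:
    listing = Path(f"./Reference/NC_P3_2015_TEM_Coleridge_SE_{name}.csv")
    if listing.exists():
        print(f"{listing}: {len(print_numbered(listing))} entries")
```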

The following cells define a helper that extracts the relevant URLs for both the DEM and LiDAR tiles from the respective NC P3 2015 URL list files, then apply it to each dataset.

In [None]:
def filter_urls(input_csv, url_list, output_csv):
    # Read the tile names as one column of strings (dtype=str keeps
    # numeric-looking tile names from being parsed as integers)
    search_terms_df = pd.read_csv(input_csv, header=None, names=['Term'], dtype=str)

    # Read the URL list (one URL per line)
    urls_df = pd.read_csv(url_list, header=None, names=['URL'], dtype=str)

    # Keep every URL that contains a tile name as a plain substring
    # (regex=False, so characters such as '.' are matched literally)
    matched_urls = []
    for term in search_terms_df['Term'].dropna().str.strip():
        if not term:
            continue  # an empty term would match every URL
        matched = urls_df[urls_df['URL'].str.contains(term, regex=False, na=False)]
        matched_urls.extend(matched['URL'].tolist())

    # Write matched URLs to output CSV
    result_df = pd.DataFrame(matched_urls)
    result_df.to_csv(output_csv, index=False, header=False)

    print(f"Filtered URLs have been written to {output_csv}")
    print(f"Number of matched URLs: {len(result_df)}")
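`str.contains` treats each tile name as a regular expression unless `regex=False` is passed, and a URL is kept twice if two tile names match it. A dependency-free sketch of the same filtering (the helper `match_urls` is hypothetical) that uses plain substring matching, preserves the tile-list order like the commented grep loop, and keeps each URL once:

```python
def match_urls(terms, urls):
    """Return URLs containing any search term as a plain substring.

    Results follow the order of `terms` (like the grep loop) and each URL appears once.
    """
    matched, seen = [], set()
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip blank lines in the tile list
        for url in urls:
            if term in url and url not in seen:
                seen.add(url)
                matched.append(url)
    return matched
```

Reading the two inputs with `open(...).read().splitlines()` and passing them to `match_urls` reproduces the pandas cell without the regex and duplicate caveats.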

In [None]:
# !for i in $(cat ./Reference/NC_P3_2015_TEM_Coleridge_SE_opr.csv) ; do cat ./Reference/urllist6205.txt | grep $i >> ./Reference/NC_P3_2015_TEM_Coleridge_SE_opr_links.csv ; done

filter_urls('./Reference/NC_P3_2015_TEM_Coleridge_SE_opr.csv', './Reference/urllist6205.txt', './Reference/NC_P3_2015_TEM_Coleridge_SE_opr_links.csv')

In [None]:
# !for i in $(cat ./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc.csv) ; do cat ./Reference/urllist6209.txt | grep $i >> ./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc_links.csv ; done

filter_urls('./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc.csv', './Reference/urllist6209.txt', './Reference/NC_P3_2015_TEM_Coleridge_SE_lpc_links.csv')

The next two cells print the download links (**30** in each file).

In [None]:
!cat -n ./Reference/NC_P3_2015_TEM_Coleridge_SE_opr_links.csv

In [None]:
!cat -n ./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc_links.csv

The next cells download the 30 files of each type, either serially with a bash `wget` loop or in parallel with Python multiprocessing.
  + The DEM data are downloaded to an Original Product Resolution (**OPR**) folder.
  + The LiDAR data are downloaded to a LiDAR Point Cloud (**LPC**) folder.

The following table shows download times observed for the OPR DEM and LPC data in serial mode (bash) and multiprocess mode (Python). Actual times depend on network bandwidth and the number of CPU cores.

| Data | Mode | Script | Download Time (seconds) | Download Time (minutes) | Size (GB) |
|---|---|---|---|---|---|
| OPR (.tif) | Serial | bash | 113 | 1.88 | 0.31 |
| LPC (.laz) | Serial | bash | 1078 | 17.97 | 3.35 |
| OPR (.tif) | Multiprocess | python | 10.72 | 0.18 | 0.31 |
| LPC (.laz) | Multiprocess | python | 94.60 | 1.58 | 3.35 |


The next (commented-out) cell downloads the DEM data serially with `wget`. For faster downloads, the Python multiprocess method is recommended.

In [None]:
# %%bash
# START=$(date +%s);
# sleep 1; 
# echo $START
# for map in $(cat ./Reference/NC_P3_2015_TEM_Coleridge_SE_opr_links.csv ) ; do wget -P ./OPR -np -nd -r -nH -L $map ; done
# END=$(date +%s);
# echo ----- $((END-START)) seconds -----
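The timings in the download-time table can be turned into a speedup and an effective throughput for the multiprocess mode. This sketch only repeats that arithmetic on the reported numbers; results on other connections and machines will differ:

```python
# Seconds and sizes copied from the download-time table (one set of observed runs)
timings = {
    "OPR (.tif)": {"serial": 113.0, "multiprocess": 10.72, "size_gb": 0.31},
    "LPC (.laz)": {"serial": 1078.0, "multiprocess": 94.60, "size_gb": 3.35},
}

speedups = {}
for name, t in timings.items():
    speedups[name] = t["serial"] / t["multiprocess"]
    # Effective multiprocess throughput in MB/s, taking 1 GB as 1000 MB
    mb_per_s = t["size_gb"] * 1000 / t["multiprocess"]
    print(f"{name}: {speedups[name]:.1f}x faster, ~{mb_per_s:.0f} MB/s in multiprocess mode")
```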

The cell below defines the input CSV of download links and the output folder used by the `download_file` function from `download_files.py`.

In [None]:
in_file = "./Reference/NC_P3_2015_TEM_Coleridge_SE_opr_links.csv"
out_fld = "./OPR"

In [None]:
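start_time = time.time()
os.makedirs(out_fld, exist_ok=True)

# Read one URL per row (first column), skipping blank rows
with open(in_file, 'r', newline='') as file:
    urls = [row[0] for row in csv.reader(file) if row]

# Download in parallel, one worker per CPU core; the with-block shuts the pool down
with ProcessPoolExecutor(max_workers=mp.cpu_count()) as pool:
    # list() waits for every download and re-raises any worker exception
    _ = list(pool.map(download_file, urls, repeat(out_fld)))

print(f"\n\n------ Done. {time.time() - start_time} seconds ------\n\n")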
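The `list(pool.map(...))` call re-raises the first download error it reaches, which hides how many other tiles failed. A sketch of an alternative, assuming `download_file(url, out_dir)` behaves as it is called above: it uses a thread pool (downloads mostly wait on the network, and threads avoid pickling functions into worker processes) and collects failures per URL. `read_urls` and `run_downloads` are hypothetical helpers, not part of `download_files.py`:

```python
import csv
from concurrent.futures import ThreadPoolExecutor, as_completed


def read_urls(csv_path):
    """Return the first column of a one-URL-per-row CSV, skipping blank rows."""
    with open(csv_path, newline="") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def run_downloads(worker, urls, out_dir, max_workers=8):
    """Call worker(url, out_dir) concurrently and return {url: exception} for failures."""
    failures = {}
    # The with-block waits for every submitted task before returning
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, url, out_dir): url for url in urls}
        for fut in as_completed(futures):
            exc = fut.exception()
            if exc is not None:
                failures[futures[fut]] = exc
    return failures


# Usage in this notebook (download_file is imported from download_files.py):
# failures = run_downloads(download_file, read_urls(in_file), out_fld)
# print(f"{len(failures)} of the listed files failed to download")
```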

In [None]:
# %%bash
# START=$(date +%s);
# sleep 1; 
# echo $START
# for map in $(cat ./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc_links.csv ) ; do wget -P ./LPC -np -nd -r -nH -L $map ; done
# END=$(date +%s);
# echo ----- $((END-START)) seconds -----

In [None]:
in_file = "./Reference/NC_P3_2015_TEM_Coleridge_SE_lpc_links.csv"
out_fld = "./LPC"

In [None]:
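start_time = time.time()
os.makedirs(out_fld, exist_ok=True)

# Read one URL per row (first column), skipping blank rows
with open(in_file, 'r', newline='') as file:
    urls = [row[0] for row in csv.reader(file) if row]

# Download in parallel, one worker per CPU core; the with-block shuts the pool down
with ProcessPoolExecutor(max_workers=mp.cpu_count()) as pool:
    # list() waits for every download and re-raises any worker exception
    _ = list(pool.map(download_file, urls, repeat(out_fld)))

print(f"------ Done. {time.time() - start_time} seconds ------")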
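Before building mosaics, it can help to confirm that every listed tile arrived. A sketch using the hypothetical helper `missing_downloads`, assuming `download_file` saves each file under the last segment of its URL (this section does not confirm how `download_files.py` names its outputs):

```python
import os
from urllib.parse import urlparse


def missing_downloads(urls, out_dir):
    """Return the URLs whose expected local file is absent or empty."""
    missing = []
    for url in urls:
        # Assumption: the file is saved under the URL's last path segment
        path = os.path.join(out_dir, os.path.basename(urlparse(url).path))
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(url)
    return missing


# Usage with the variables from the cells above:
# print(missing_downloads(urls, out_fld))
```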

## Create Reference Data and Mosaics.

The next cells use [gdal](https://gdal.org) to create raster mosaics for the original product resolution (OPR) DEM and National Agriculture Imagery Program (NAIP) data.

Create text listings of the downloaded OPR GeoTIFFs and of the NAIP 2014 JPEG 2000 (`.jp2`) tiles. The NAIP cell assumes those tiles are already in a `./NAIP_2014` folder; this section does not download them.

In [None]:
!find ./OPR -name '*.tif' > Reference/NC_P3_2015_TEM_Coleridge_SE_opr.txt

In [None]:
!find ./NAIP_2014 -name '*.jp2' > Reference/NC_P3_2015_TEM_Coleridge_SE_naip.txt
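`find` is not available in a default Windows shell. A Python sketch of the same listing step (the helper `write_file_list` is hypothetical) uses `glob` with `recursive=True`; unlike `find`, it sorts the paths and skips hidden files:

```python
import glob
import os


def write_file_list(src_dir, pattern, out_txt):
    """Write one matching path per line, like `find src_dir -name pattern > out_txt`."""
    # "**" with recursive=True matches src_dir itself and all of its subfolders
    paths = sorted(glob.glob(os.path.join(src_dir, "**", pattern), recursive=True))
    with open(out_txt, "w") as f:
        f.write("\n".join(paths) + ("\n" if paths else ""))
    return paths


# Usage mirroring the cells above:
# write_file_list("./OPR", "*.tif", "Reference/NC_P3_2015_TEM_Coleridge_SE_opr.txt")
# write_file_list("./NAIP_2014", "*.jp2", "Reference/NC_P3_2015_TEM_Coleridge_SE_naip.txt")
```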

The next cell creates a spatial index (a tile footprint shapefile) of the downloaded DEM with [gdaltindex](https://gdal.org/programs/gdaltindex.html#gdaltindex), reading the input file names from the OPR text listing via `--optfile`.

In [None]:
!gdaltindex Reference/NC_P3_2015_TEM_Coleridge_SE_opr.shp --optfile Reference/NC_P3_2015_TEM_Coleridge_SE_opr.txt 

The tile footprints can then be dissolved into a single polygon with [ogr2ogr](https://gdal.org/programs/ogr2ogr.html) and an SQLite-dialect `ST_Union` query.

In [None]:
!ogr2ogr Reference/NC_P3_2015_TEM_Coleridge_SE_opr_index_dissolve.shp Reference/NC_P3_2015_TEM_Coleridge_SE_opr.shp -dialect sqlite -sql "SELECT ST_Union(geometry) AS geometry FROM NC_P3_2015_TEM_Coleridge_SE_opr"

The input text listings can also be used as input to develop raster virtual datasets with [gdalbuildvrt](https://gdal.org/programs/gdalbuildvrt.html#gdalbuildvrt).

In [None]:
!gdalbuildvrt Reference/NC_P3_2015_TEM_Coleridge_SE_opr.vrt -input_file_list Reference/NC_P3_2015_TEM_Coleridge_SE_opr.txt

In [None]:
!gdalbuildvrt Reference/NC_P3_2015_TEM_Coleridge_SE_naip.vrt -input_file_list Reference/NC_P3_2015_TEM_Coleridge_SE_naip.txt
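By default, the extent of a VRT built with `gdalbuildvrt` is the bounding box enclosing all input rasters. The sketch below computes that combined extent from per-tile `(xmin, ymin, xmax, ymax)` tuples (for example, values read from `gdalinfo` output); `union_extent` is a hypothetical helper and the tiles in the usage line are made up. For gap-free rectangular tiles, this box is also the extent of the dissolved footprint, and it could be passed to `gdalwarp -te xmin ymin xmax ymax`:

```python
def union_extent(extents):
    """Return the (xmin, ymin, xmax, ymax) box enclosing all input extents."""
    xmins, ymins, xmaxs, ymaxs = zip(*extents)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


# Two hypothetical side-by-side tiles
print(union_extent([(0, 0, 10, 10), (10, 0, 20, 10)]))  # (0, 0, 20, 10)
```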

A physical GeoTIFF (Cloud Optimized - [COG](https://gdal.org/drivers/raster/cog.html#raster-cog)) can be generated for display with [gdalwarp](https://gdal.org/programs/gdalwarp.html). This takes about 60 seconds for the OPR DEM and about 3 minutes for the NAIP mosaic.

In [None]:
# %%bash
# START=$(date +%s);
# sleep 1; 
# echo $START
# gdalwarp Reference/NC_P3_2015_TEM_Coleridge_SE_opr.vrt Reference/NC_P3_2015_TEM_Coleridge_SE_opr.tif -of COG -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_PREDICTOR=YES -co OVERVIEW_COUNT=10 -co OVERVIEWS=IGNORE_EXISTING -co BIGTIFF=YES -co ADD_ALPHA=YES -overwrite
# END=$(date +%s);
# echo ----- $((END-START)) seconds -----

!gdalwarp Reference/NC_P3_2015_TEM_Coleridge_SE_opr.vrt Reference/NC_P3_2015_TEM_Coleridge_SE_opr.tif -of COG -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_PREDICTOR=YES -co OVERVIEW_COUNT=10 -co OVERVIEWS=IGNORE_EXISTING -co BIGTIFF=YES -co ADD_ALPHA=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -overwrite
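The `!gdalwarp` cells rely on IPython's shell escape. A sketch of running the same COG conversion from plain Python with `subprocess`, timed like the commented bash cells; `build_cog_warp_cmd` and `run_timed` are hypothetical helpers, and the creation options simply mirror the cell above. The command only runs when `gdalwarp` is on the PATH and the input VRT exists:

```python
import os
import shutil
import subprocess
import time


def build_cog_warp_cmd(src, dst, extra=()):
    """Return a gdalwarp argument list that writes src to dst as a COG."""
    cmd = ["gdalwarp", src, dst, "-of", "COG"]
    # Creation options copied from the notebook cell above
    for opt in ["COMPRESS=LZW", "PREDICTOR=YES", "BLOCKSIZE=128",
                "RESAMPLING=CUBIC", "OVERVIEW_PREDICTOR=YES", "OVERVIEW_COUNT=10",
                "OVERVIEWS=IGNORE_EXISTING", "BIGTIFF=YES", "ADD_ALPHA=YES"]:
        cmd += ["-co", opt]
    cmd += list(extra) + ["-overwrite"]
    return cmd


def run_timed(cmd):
    """Run cmd without a shell, raise if it fails, and return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


src = "Reference/NC_P3_2015_TEM_Coleridge_SE_opr.vrt"
dst = "Reference/NC_P3_2015_TEM_Coleridge_SE_opr.tif"
cmd = build_cog_warp_cmd(src, dst, extra=["--config", "GDAL_CACHEMAX", "75%",
                                          "-co", "NUM_THREADS=10"])
if shutil.which("gdalwarp") and os.path.exists(src):
    print(f"----- {run_timed(cmd):.0f} seconds -----")
```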

In [None]:
# %%bash
# START=$(date +%s);
# sleep 1; 
# echo $START
# gdalwarp Reference/NC_P3_2015_TEM_Coleridge_SE_naip.vrt Reference/NC_P3_2015_TEM_Coleridge_SE_naip.tif -t_srs "EPSG:6543" -dstalpha -of COG -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_PREDICTOR=YES -co OVERVIEW_COUNT=10 -co OVERVIEWS=IGNORE_EXISTING -co BIGTIFF=YES -overwrite
# END=$(date +%s);
# echo ----- $((END-START)) seconds -----

!gdalwarp Reference/NC_P3_2015_TEM_Coleridge_SE_naip.vrt Reference/NC_P3_2015_TEM_Coleridge_SE_naip.tif -t_srs "EPSG:6543" -dstalpha -of COG -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_PREDICTOR=YES -co OVERVIEW_COUNT=10 -co OVERVIEWS=IGNORE_EXISTING -co BIGTIFF=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -overwrite

Imagery such as that from the National Agriculture Imagery Program (NAIP) is also used later in this exercise to colorize the downloaded LPC data. Because the point cloud data are from 2015, [USGS EarthExplorer](https://earthexplorer.usgs.gov) is a good source, especially for legacy NAIP imagery, and may be the best option for matching the imagery date closely to the point cloud acquisition. EarthExplorer does, however, require an account with login credentials for data download. Its search accepts an area of interest as a KML bounding box or ESRI Shapefile, either of which can be generated with GDAL/OGR.

Alternatively, the [USGS National Map services](https://apps.nationalmap.gov/services/) include a composite NAIP WMS source from which RGB and CIR rasters can be extracted.
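For the EarthExplorer search, a KML area of interest can be produced with GDAL/OGR. As a dependency-free alternative (a swapped-in technique, not the notebook's method), this sketch writes a single-polygon KML from a longitude/latitude bounding box; `bbox_to_kml` is a hypothetical helper and the extent in the usage comment is illustrative only:

```python
from xml.sax.saxutils import escape

KML_NS = "http://www.opengis.net/kml/2.2"


def bbox_to_kml(west, south, east, north, name="AOI"):
    """Return a KML document containing one closed polygon for a lon/lat box."""
    # KML coordinates are "longitude,latitude"; the ring repeats its first vertex to close
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = " ".join(f"{lon},{lat}" for lon, lat in ring)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<kml xmlns="{KML_NS}"><Placemark><name>{escape(name)}</name>'
        "<Polygon><outerBoundaryIs><LinearRing>"
        f"<coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon></Placemark></kml>\n"
    )


# Hypothetical extent; in practice use the dissolved DEM footprint reprojected to EPSG:4326
# with open("Reference/aoi_bbox.kml", "w") as f:
#     f.write(bbox_to_kml(-79.8, 35.6, -79.6, 35.8))
```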

The USGS National Map Services also include additional layer information for useful features such as the USGS Map Indices: [https://carto.nationalmap.gov/arcgis/rest/services/map_indices/MapServer](https://carto.nationalmap.gov/arcgis/rest/services/map_indices/MapServer).

## Retrieve Reference Data and Generate Mosaics.

The following cell queries the USGSNAIPImagery WMS service for natural color (RGB) and color infrared (CIR) imagery with [gdalinfo](https://gdal.org/programs/gdalinfo.html).

In [None]:
# %%bash
# gdalinfo "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:None&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288"
# gdalinfo "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:FalseColorComposite&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288"

!gdalinfo "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:None&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288"
!gdalinfo "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:FalseColorComposite&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288"
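The two connection strings above differ only in the `LAYERS` value. A sketch that assembles them with `urllib.parse.urlencode` (the helper `gdal_wms_source` is hypothetical), keeping `:` and `,` unescaped so the result matches the strings typed in the cells above:

```python
from urllib.parse import urlencode

WMS_BASE = ("https://imagery.nationalmap.gov:443/arcgis/services/"
            "USGSNAIPImagery/ImageServer/WMSServer")
CONUS_BBOX = (-124.831355, 24.485899, -66.851641, 49.571288)


def gdal_wms_source(layer, bbox=CONUS_BBOX, srs="EPSG:4326"):
    """Build a GDAL 'WMS:' connection string for a WMS 1.1.1 GetMap request."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
    }
    # safe=":," leaves colons and commas readable instead of percent-encoding them
    return "WMS:" + WMS_BASE + "?" + urlencode(params, safe=":,")


rgb_source = gdal_wms_source("USGSNAIPImagery:None")
cir_source = gdal_wms_source("USGSNAIPImagery:FalseColorComposite")
```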

The WMS connection can be saved as a GDAL WMS service description (XML) file with [gdal_translate](https://gdal.org/programs/gdal_translate.html) (`-of WMS`), which subsequent GDAL commands can open like any other raster for data extraction.

In [None]:
# %%bash
# gdal_translate -of WMS "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:None&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288" ./Reference/USGSNAIPImagery_RGB.xml
# gdal_translate -of WMS "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:FalseColorComposite&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288" ./Reference/USGSNAIPImagery_CIR.xml

!gdal_translate -of WMS "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:None&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288" ./Reference/USGSNAIPImagery_RGB.xml
!gdal_translate -of WMS "WMS:https://imagery.nationalmap.gov:443/arcgis/services/USGSNAIPImagery/ImageServer/WMSServer?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&LAYERS=USGSNAIPImagery:FalseColorComposite&SRS=EPSG:4326&BBOX=-124.831355,24.485899,-66.851641,49.571288" ./Reference/USGSNAIPImagery_CIR.xml

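The next cell uses [gdalwarp](https://gdal.org/programs/gdalwarp.html) to download the USGS NAIP RGB imagery, clip it to the dissolved DEM footprint (`-cutline`, `-crop_to_cutline`), and reproject it to NAD83(2011) / North Carolina State Plane, US survey feet (EPSG:6543) at a 2 ft pixel size (`-tr 2.0 2.0`). This takes about 7 minutes.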

In [None]:
# %%bash 
# gdalwarp -overwrite -dstalpha -s_srs EPSG:4326 -t_srs EPSG:6543 -tr 2.0 2.0 -of COG -cutline Reference/NC_P3_2015_TEM_Coleridge_SE_opr_index_dissolve.shp -crop_to_cutline -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_COUNT=10 -co OVERVIEW_PREDICTOR=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=YES Reference/USGSNAIPImagery_RGB.xml Reference/Coleridge_SE_USGS_NAIP_RGB.tif 

!gdalwarp -overwrite -dstalpha -s_srs EPSG:4326 -t_srs EPSG:6543 -tr 2.0 2.0 -of COG -cutline Reference/NC_P3_2015_TEM_Coleridge_SE_opr_index_dissolve.shp -crop_to_cutline -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_COUNT=10 -co OVERVIEW_PREDICTOR=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=YES Reference/USGSNAIPImagery_RGB.xml Reference/Coleridge_SE_USGS_NAIP_RGB.tif 
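EPSG:6543 measures coordinates in US survey feet, so `-tr 2.0 2.0` requests 2 ft pixels. One US survey foot is exactly 1200/3937 m; the short conversion below gives the output pixel size in metres (about 0.61 m), which is handy when comparing it with the native resolution of the NAIP imagery. `survey_feet_to_m` is a hypothetical helper:

```python
# 1 US survey foot = 1200/3937 metre exactly
US_SURVEY_FOOT_M = 1200 / 3937


def survey_feet_to_m(ft):
    """Convert a length in US survey feet to metres."""
    return ft * US_SURVEY_FOOT_M


print(f"-tr 2.0 ft ~= {survey_feet_to_m(2.0):.4f} m")  # -tr 2.0 ft ~= 0.6096 m
```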

If desired, the same can be done for NAIP CIR data with the cell below.

In [None]:
# %%bash
# gdalwarp -overwrite -dstalpha -s_srs EPSG:4326 -t_srs EPSG:6543 -tr 2.0 2.0 -of COG -cutline Reference/NC_P3_2015_TEM_Coleridge_SE_opr_index_dissolve.shp -crop_to_cutline -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_COUNT=10 -co OVERVIEW_PREDICTOR=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=YES Reference/USGSNAIPImagery_CIR.xml Reference/Coleridge_SE_USGS_NAIP_CIR.tif 

!gdalwarp -overwrite -dstalpha -s_srs EPSG:4326 -t_srs EPSG:6543 -tr 2.0 2.0 -of COG -cutline Reference/NC_P3_2015_TEM_Coleridge_SE_opr_index_dissolve.shp -crop_to_cutline -co COMPRESS=LZW -co PREDICTOR=YES -co BLOCKSIZE=128 -co RESAMPLING=CUBIC -co OVERVIEW_COUNT=10 -co OVERVIEW_PREDICTOR=YES -co BIGTIFF=YES --config GDAL_CACHEMAX 75% -co NUM_THREADS=10 -co OVERVIEWS=IGNORE_EXISTING -co ADD_ALPHA=YES Reference/USGSNAIPImagery_CIR.xml Reference/Coleridge_SE_USGS_NAIP_CIR.tif 