# Create AOI Image Tiles 

This notebook creates image tiles at DHS survey locations for a given AOI. It uses resampled and spatially aligned GeoTiff raster files that were previously created for that AOI by `prep_geospatial_data.ipynb`. The following processing steps are performed:

1. Define configurations (required input files, AOI parameters, file naming, etc.) 
2. Extract survey data from the DHS shapefile for the specified country (specifically, cluster IDs and GPS latitude and longitude)
3. Load AOI Raster Data
4. Convert each survey GPS location to the same CRS as the AOI image stack
5. Loop through all GPS locations and identify which points are within the AOI image stack
6. Crop image tiles from the AOI image stack:

    - For each GPS location, find the nearest vertex in the image stack
    - Use that vertex point as the center of a bounding box for cropping an image tile from the stack for each data type (224 x 224)
    - Save each crop as a data sample with an associated file name that clearly identifies the data sample
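
The cropping step above can be sketched in plain Python. This is only an illustration, not the implementation used by `crop_raster_rasterio` in `gist_utils`. It assumes a north-up raster described by a GDAL-style geotransform, and the grid origin used here is hypothetical. The function `crop_window` is likewise a name introduced for this example.

```python
def crop_window(x, y, geotransform, tile_size=224):
    """Return (col_off, row_off, width, height) for a tile centered near (x, y).

    geotransform follows the GDAL convention for a north-up raster:
    (origin_x, pixel_width, 0, origin_y, 0, -pixel_height).
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform

    # Nearest grid vertex (pixel corner) to the point, in column/row units
    col = round((x - origin_x) / pixel_w)
    row = round((y - origin_y) / pixel_h)  # pixel_h is negative for north-up rasters

    # Center the tile on that vertex
    half = tile_size // 2
    return col - half, row - half, tile_size, tile_size


# Hypothetical 400 m grid with its upper-left corner at (-1,000,000, 1,200,000)
gt = (-1_000_000.0, 400.0, 0.0, 1_200_000.0, 0.0, -400.0)

# A projected survey location (in meters, same CRS as the grid)
window = crop_window(-172_694.42, -552_440.42, gt)
print(window)
```

Because the tile size is even, centering on a pixel corner (a vertex) rather than a pixel center gives exactly 112 pixels on each side of the survey location.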
    
## File System Structure

The top-level file structure is shown below. It includes six folders and three notebooks.

<pre style="font-family: monospace;">
<span style="color: blue;">./AOI       </span>  <span style="color: gray;"># AOI Image Stacks and Image Tiles</span>  
<span style="color: blue;">./DHS       </span>  <span style="color: gray;"># DHS survey data</span>
<span style="color: blue;">./gist_utils</span>  <span style="color: gray;"># Python package with convenience functions</span>
<span style="color: gray;">./Nightlights</span>
<span style="color: gray;">./Population</span>
<span style="color: gray;">./Rainfall</span>

<span style="color: blue;">./prep_aoi_image_tiles.ipynb (this notebook)</span>
<span style="color: gray;">./prep_geospatial_data.ipynb</span>
<span style="color: gray;">./prep_rainfall_chirps.ipynb</span>
</pre>


## Input

The input GeoTiff files for this notebook come from this file structure (previously generated by `prep_geospatial_data.ipynb`):

<pre style="font-family: monospace;">
./Nightlights/
    output/PK/
        VNL_v22_npp-j01_2022_global_vcmslcfg_median_PK_4_resampled_average.tif

./Population/
    output/PK/
        landscan-global-2022_PK_4_resampled_average.tif
            
./Rainfall/
    output/PK/
        chirps-v2.0.2023_PK_42N_avg_PK_4_resampled_average.tif      
</pre>

The first code cells in this notebook copy these resampled files into a new file structure to create an Image Stack for the specified country.


## Output

This notebook produces the following (parallel) file structure, which contains image tiles for each DHS survey location and each data type for the specified country. Additionally, a GDAL virtual raster (VRT) file is produced for each data type that references all of its image tiles. These VRT files provide a convenient way to load the raster image tiles in QGIS for visual inspection.

<pre style="font-family: monospace;">
./AOI/
    PK/
        Image_Tiles/
            Nightlights/
                # Cropped image tiles at each DHS cluster location.
                PK_001_C-002_Nightlights_2022_400m.tif
                PK_002_C-003_Nightlights_2022_400m.tif
                    :
                PK_265_C-415_Nightlights_2022_400m.tif
                
            Population/
                PK_001_C-002_Population_2022_400m.tif
                    :
                
            Rainfall/
                PK_001_C-002_Rainfall_2023_400m.tif
                    :
            
        # VRT files are written at the country level (one per data type).
        PK_Nightlights_2022_400m.vrt
        PK_Population_2022_400m.vrt
        PK_Rainfall_2023_400m.vrt
</pre>
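
The tile names above appear to follow the pattern `<country>_<sequence>_C-<cluster ID>_<suffix>.tif`. The sketch below builds and parses names of that form. The pattern is inferred from the listing, and the helper names `tile_file_name` and `parse_tile_file_name` are hypothetical, so treat this as an assumption rather than the naming code in `gist_utils`.

```python
import re

# Pattern inferred from the example file names in the listing above
TILE_PATTERN = re.compile(
    r'^(?P<country>[A-Z]{2})_(?P<seq>\d+)_C-(?P<cluster>\d+)_(?P<suffix>.+)\.tif$'
)


def tile_file_name(country_code, sequence, cluster_id, suffix):
    """Build a tile file name such as PK_001_C-002_Nightlights_2022_400m.tif."""
    return f"{country_code}_{sequence:03d}_C-{int(cluster_id):03d}_{suffix}.tif"


def parse_tile_file_name(file_name):
    """Recover the country code, sequence number, cluster ID, and suffix."""
    match = TILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"Unexpected tile file name: {file_name}")
    return {
        'country': match['country'],
        'sequence': int(match['seq']),
        'cluster_id': int(match['cluster']),
        'suffix': match['suffix'],
    }


name = tile_file_name('PK', 1, 2, 'Nightlights_2022_400m')
parsed = parse_tile_file_name('PK_265_C-415_Nightlights_2022_400m.tif')
print(name)
print(parsed)
```

Parsing the cluster ID back out of the file name is useful when joining image tiles to DHS survey records later on.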

## File Prep [One Time Copy]

Each of the resampled GeoTiff files generated by `prep_geospatial_data.ipynb` for each data type for the specified country should be copied to the corresponding `Image_Stack` folder as shown below. This file structure constitutes the "Image Stack" for the specified country.

The following code cell automates this process. Once the Image Stack in the file structure below has been populated, the remainder of this notebook can be executed to create Image Tiles for the specified country.

<pre style="font-family: monospace;">
./AOI/
    PK/
        Image_Stack/
            # Resampled, spatially aligned image stack.
            chirps-v2.0.2022_avg_2022_bilinear_4_resampled.tif
            landscan-global-2022_2022_average_4_resampled.tif
            VNL_v22_npp-j01_2022_global_vcmslcfg_median_2022_median_average_4_resampled.tif
            
</pre>

## Required Configurations

Specify the desired AOI by its two-letter country code. The notebook can then be executed to produce image tiles for each of the three data types.

<pre style="font-family: monospace;">
<span style="color: blue;">country_code = 'PK'</span>  # Set the country code
</pre>


In [1]:
import os
import shutil
import glob
from osgeo import gdal, osr, ogr 
from dataclasses import dataclass

# Import module that contains several convenience functions (e.g., gdal wrappers)
from gist_utils import *

# Adding path to gdal commands for local system
os.environ['PATH'] += ':/Users/billk/miniforge3/envs/py39-pt/bin/' 

#-------------------------------------------------
# REQUIRED CONFIGURATIONS HERE
#-------------------------------------------------
country_code = 'TD'   # Set the country code
#-------------------------------------------------

# Set to True to copy re-sampled data to create the Image Stack for the specified country.
#-----------------------------------------------------------------------------------------
make_image_stack = True
#-----------------------------------------------------------------------------------------

In [2]:
import os
import shutil
import glob

if make_image_stack:

    # Data types to copy, plus the source and destination base folders
    data_types = ['Rainfall', 'Nightlights', 'Population']
    source_base = './'
    destination_base = './AOI/'

    # Function to create directory if it doesn't exist
    def ensure_dir(directory):
        if not os.path.exists(directory):
            os.makedirs(directory)

    # Scan each data type in the source directory
    for data_type in data_types:
        source_path = os.path.join(source_base, data_type, 'output', country_code)

        # Look for resampled TIFF files directly in the source path (subfolders are not searched)
        file_search_path = os.path.join(source_path, '*resampled*.tif')
        
        # Use glob to find files that match the pattern
        for file_path in glob.glob(file_search_path):
            file_name = os.path.basename(file_path)
            dir_name = os.path.basename(os.path.dirname(file_path))

            # Build the destination path
            destination_path = os.path.join(destination_base, country_code, 'Image_Stack')
            # Ensure the destination directory exists
            ensure_dir(destination_path)
            # Copy the file to the destination
            shutil.copy(file_path, destination_path)
            print(f"Copied {file_path} to {destination_path}")

    print("File copying completed.")


Copied ./Rainfall/output/TD/R_chirps-v2.0.2023_TD_avg_TD_4_resampled_bilinear.tif to ./AOI/TD/Image_Stack
Copied ./Nightlights/output/TD/N_VNL_v22_npp-j01_2022_global_vcmslcfg_median_TD_4_resampled_bilinear.tif to ./AOI/TD/Image_Stack
Copied ./Population/output/TD/P_landscan-global-2022_TD_4_resampled_nearest.tif to ./AOI/TD/Image_Stack
File copying completed.


In [17]:
shapefile_path = aoi_configurations[country_code]['shapefile']

crs_lat = aoi_configurations[country_code]['crs_lat']
crs_lon = aoi_configurations[country_code]['crs_lon']

#expected_crs = proj_string = '+proj=moll +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs'
expected_crs  = f'+proj=laea +lat_0={crs_lat} +lon_0={crs_lon} +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs'

case = country_code
print(shapefile_path)
print(expected_crs)

DHS/TD_2014-15_DHS_05072024_1511_211908/TDGE71FL/TDGE71FL.shp
+proj=laea +lat_0=14.0 +lon_0=18.0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
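
For intuition about the `+proj=laea` string printed above, the sketch below implements the forward Lambert azimuthal equal-area projection on a sphere. This is an approximation for illustration only: the PROJ string specifies the WGS84 ellipsoid, so values from GDAL/PROJ will differ somewhat from this spherical version. The function name `laea_forward` and the mean Earth radius are assumptions of this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def laea_forward(lat, lon, lat_0, lon_0, radius=EARTH_RADIUS_M):
    """Project geographic degrees to spherical LAEA meters centered on (lat_0, lon_0)."""
    phi, lam = math.radians(lat), math.radians(lon)
    phi0, lam0 = math.radians(lat_0), math.radians(lon_0)
    dlam = lam - lam0

    # Scale factor of the spherical LAEA projection
    k = math.sqrt(2.0 / (1.0
                         + math.sin(phi0) * math.sin(phi)
                         + math.cos(phi0) * math.cos(phi) * math.cos(dlam)))

    x = radius * k * math.cos(phi) * math.sin(dlam)
    y = radius * k * (math.cos(phi0) * math.sin(phi)
                      - math.sin(phi0) * math.cos(phi) * math.cos(dlam))
    return x, y


# The projection center maps to the origin
center = laea_forward(14.0, 18.0, lat_0=14.0, lon_0=18.0)

# One degree north of the center, on the central meridian
north = laea_forward(15.0, 18.0, lat_0=14.0, lon_0=18.0)
print(center, north)
```

Centering the projection on the AOI (`lat_0`, `lon_0`) keeps distortion small near the survey locations, which is why the center coordinates are configured per country.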


In [18]:
# Shape file fields
cluster_field  = 'DHSCLUST'
lat_field      = 'LATNUM'
lon_field      = 'LONGNUM'

expected_pixel_size  = (400, 400)    # This should match the pixel size in the input rasters

# Set the resolution for programmatic file naming below
res = expected_pixel_size[0]

# Build a list of the rasters in the image stack (produced by: prep_geospatial_data.ipynb)
aoi_image_stack_folder = f'./AOI/{country_code}/Image_Stack/'

# List all files in the image stack
aoi_image_stack_paths = sorted([os.path.join(aoi_image_stack_folder, file) for file in os.listdir(aoi_image_stack_folder) if file.endswith('.tif')])

# Image tile outputs (folders where image tiles are stored for each data type)
image_tile_folders = []

image_tile_folders.append(f'./AOI/{country_code}/Image_Tiles/Nightlights/')
image_tile_folders.append(f'./AOI/{country_code}/Image_Tiles/Population/')
image_tile_folders.append(f'./AOI/{country_code}/Image_Tiles/Rainfall/')

image_tile_suffixes = []
image_tile_suffixes.append(f'Nightlights_2022_{res}m')
image_tile_suffixes.append(f'Population_2022_{res}m')
image_tile_suffixes.append(f'Rainfall_2023_{res}m')

# VRT filename suffix
vrt_file_suffixes = []
vrt_file_suffixes.append(f'Nightlights_2022_{res}m')
vrt_file_suffixes.append(f'Population_2022_{res}m')
vrt_file_suffixes.append(f'Rainfall_2023_{res}m')

In [19]:
print(aoi_image_stack_paths[0])
print(aoi_image_stack_paths[1])
print(aoi_image_stack_paths[2])
print('\n')
print(image_tile_folders)

./AOI/TD/Image_Stack/N_VNL_v22_npp-j01_2022_global_vcmslcfg_median_TD_4_resampled_bilinear.tif
./AOI/TD/Image_Stack/P_landscan-global-2022_TD_4_resampled_nearest.tif
./AOI/TD/Image_Stack/R_chirps-v2.0.2023_TD_avg_TD_4_resampled_bilinear.tif


['./AOI/TD/Image_Tiles/Nightlights/', './AOI/TD/Image_Tiles/Population/', './AOI/TD/Image_Tiles/Rainfall/']
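
Note that the sorted image stack paths line up with the `Nightlights`, `Population`, `Rainfall` folder order only because of the `N_`, `P_`, `R_` file name prefixes. A more explicit pairing could look like the sketch below. The prefix mapping is inferred from the file names printed above, and `group_stack_by_data_type` is a hypothetical helper, not part of `gist_utils`.

```python
import os

# Prefixes inferred from the image stack file names shown in the output above
PREFIX_TO_DATA_TYPE = {'N_': 'Nightlights', 'P_': 'Population', 'R_': 'Rainfall'}


def group_stack_by_data_type(paths):
    """Map each data type to its image stack file based on the file name prefix."""
    grouped = {}
    for path in paths:
        prefix = os.path.basename(path)[:2]
        data_type = PREFIX_TO_DATA_TYPE.get(prefix)
        if data_type is None:
            raise ValueError(f"Unrecognized image stack file: {path}")
        grouped[data_type] = path
    return grouped


stack = group_stack_by_data_type([
    './AOI/TD/Image_Stack/R_chirps-v2.0.2023_TD_avg_TD_4_resampled_bilinear.tif',
    './AOI/TD/Image_Stack/N_VNL_v22_npp-j01_2022_global_vcmslcfg_median_TD_4_resampled_bilinear.tif',
    './AOI/TD/Image_Stack/P_landscan-global-2022_TD_4_resampled_nearest.tif',
])
print(stack['Population'])
```

With a mapping like this, the tile folder and file suffix can be looked up by data type instead of by list position.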


## 2 Extract Cluster IDs and Associated GPS Coordinates

In [20]:
if os.path.exists(shapefile_path):
    print("File exists.")
else:
    print("File does not exist.")

File exists.


## 3 Load AOI Raster Data

In [21]:
# Initialize a list to store the results
results = []

# Loop through each raster path and call the function
for path in aoi_image_stack_paths:
    
    raster, crs_match, pixel_size_match = load_raster(path, expected_crs, expected_pixel_size)
    result = {
        'path': path,
        'raster': raster, 
        'crs_match': crs_match,
        'pixel_size_match': pixel_size_match
    }
    results.append(result)

### Check Image Stack Metadata 

In [22]:
# Check 
# !gdalinfo {aoi_image_stack_paths[0]}
# !gdalinfo {aoi_image_stack_paths[1]}
# !gdalinfo {aoi_image_stack_paths[2]}

## 4 Project the DHS GPS Locations to the Image Stack CRS 

In [23]:
cluster_data = extract_cluster_info(shapefile_path, cluster_field, lat_field, lon_field)

### Optional Data Inspection

In [24]:
# Print a few records
# for idx in range(0,9):
#     print(cluster_data[idx])   
    
# Print data for a specific record (selected by list index, not by cluster ID)
record_index = 1
indices = [record_index]
for index in indices:
    # Unpacking the tuple assuming it has a structure like (id, float1, float2)
    cluster_id, x, y = cluster_data[index]
        
    print(f"({cluster_id}, {x:.2f}, {y:.2f})")

(2.0, 13.47, 22.20)


In [25]:
all_crs_match = all(result['crs_match'] for result in results)

if all_crs_match:
    crs_coordinates = convert_cluster_coordinates(cluster_data, src_crs='EPSG:4326', dst_crs=expected_crs)
    print(len(crs_coordinates))
    
#     for record in crs_coordinates[0:10]:
#         print(record)

    indices = [192]
    for index in indices:
        # Unpacking the tuple assuming it has a structure like (id, float1, float2)
        cluster_id, x, y = crs_coordinates[index]
        
        print(f"({cluster_id}, {x:.2f}, {y:.2f})")
        
else:
    print("*** Error: CRS does not match.")

624
(193, -172694.42, -552440.42)


## 5 Find DHS Points that Fall within the Input Raster Files
The input raster files were previously cropped, projected to the same CRS, and resampled to the same pixel size. The cropping was extended beyond the AOI so that image tiles can be created for DHS locations near the AOI border.
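
As a rough illustration of the bounds test, the sketch below derives a raster's extent from a GDAL-style geotransform and checks whether a projected point falls inside it. It is not the implementation of `find_points_within_raster` in `gist_utils`. The grid origin and raster size are hypothetical, and a north-up raster is assumed.

```python
def raster_bounds(geotransform, n_cols, n_rows):
    """Return (x_min, y_min, x_max, y_max) for a north-up raster."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    x_min = origin_x
    x_max = origin_x + n_cols * pixel_w
    y_max = origin_y
    y_min = origin_y + n_rows * pixel_h  # pixel_h is negative for north-up rasters
    return x_min, y_min, x_max, y_max


def point_in_bounds(x, y, bounds):
    """Check whether a projected point lies within the raster extent."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max


# Hypothetical 400 m grid: 5000 columns by 6000 rows
gt = (-1_000_000.0, 400.0, 0.0, 1_200_000.0, 0.0, -400.0)
bounds = raster_bounds(gt, n_cols=5000, n_rows=6000)

inside = point_in_bounds(-172_694.42, -552_440.42, bounds)
outside = point_in_bounds(1_500_000.0, 0.0, bounds)
print(bounds, inside, outside)
```

A stricter variant would also require that the full 224 x 224 tile around the point fits inside the raster, which is the reason the rasters were cropped with extra margin.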

In [26]:
all_pixel_match = all(result['pixel_size_match'] for result in results)

if all_pixel_match:
    # Collect number of points in each dataset for comparison
    number_of_points_per_dataset = []

    for result in results:
        raster = result['raster']
        
        points_within_raster = find_points_within_raster(raster, crs_coordinates, expected_crs)
        
        # Store the points within raster into the results dictionary for each raster
        result['points_within_raster'] = points_within_raster
        number_of_points_per_dataset.append(len(points_within_raster))
        print(f"Points w/in bounds: {result['path']}: {len(points_within_raster)}\n")

    # Check if all datasets have the same number of points
    if len(set(number_of_points_per_dataset)) == 1:
        print("All datasets have the same number of points.")
    else:
        print("Warning: Datasets have varying numbers of points. Here are the counts per dataset:", number_of_points_per_dataset)

else:
    print("Pixel size does not match for one or more rasters.")


Points w/in bounds: ./AOI/TD/Image_Stack/N_VNL_v22_npp-j01_2022_global_vcmslcfg_median_TD_4_resampled_bilinear.tif: 624

Points w/in bounds: ./AOI/TD/Image_Stack/P_landscan-global-2022_TD_4_resampled_nearest.tif: 624

Points w/in bounds: ./AOI/TD/Image_Stack/R_chirps-v2.0.2023_TD_avg_TD_4_resampled_bilinear.tif: 624

All datasets have the same number of points.


## Debug:  `points_within_raster`

In [27]:
#----------------
# TEST CASE
#----------------
# specific_cluster_id = 101  # The cluster ID you are interested in

# # Initialize an empty list to store specific points
# specific_points = []

# # Loop through each point and check the cluster ID
# for item in points_within_raster:
#     if item['cluster_id'] == specific_cluster_id:
#         specific_points.append(item)
#         print(f"Point: {item['point']} with Cluster ID: {item['cluster_id']}")

In [28]:
# Use a point that was determined above to be within the AOI.
# print(points_within_raster[0])

# first_point = points_within_raster[0]  # Get the first point from the list

# # Now access its x and y coordinates for use with find_nearest_vertex() below.
# x = first_point['point'].x
# y = first_point['point'].y

## 6 Crop Image Tiles from AOI Image Stack
Loop over each data type, which is stored in memory as a raster, and crop an image tile of the specified size for each of the survey points in the `points_within_raster` list. Additionally, build a VRT file that references the image tiles for each data type. The VRT facilitates loading a large number of image tiles in QGIS for visualization purposes.

In [29]:
def build_vrt(image_tile_folder, vrt_file):
   
    # Get a list of all .tif files in the directory
    tif_files = glob.glob(os.path.join(image_tile_folder, "*.tif"))

    # Create a new VRT dataset
    vrt_options = gdal.BuildVRTOptions(VRTNodata=-999)
    vrt = gdal.BuildVRT(vrt_file, tif_files, options=vrt_options)

    # Check if the VRT dataset was created successfully
    if vrt is None:
        print("Failed to build VRT")
    else:
        vrt.FlushCache()  # Write to disk
        print(f"VRT built successfully at {vrt_file}")

In [30]:
# Loop over each result together with its image tile folder, tile suffix, and VRT suffix
for result, image_tile_folder, image_tile_suffix, vrt_file_suffix in zip(
        results, image_tile_folders, image_tile_suffixes, vrt_file_suffixes):
    
    raster = result['raster']
    
    aoi_name = f"{country_code}"
    
    # Use the points found for this specific raster (stored in step 5),
    # rather than the variable left over from the last loop iteration
    crop_raster_rasterio(raster, result['points_within_raster'], aoi_name, image_tile_suffix, image_tile_folder, tile_size=224, debug=False)
    
    # Construct the VRT filename
    vrt_file = f"./AOI/{country_code}/{country_code}_{vrt_file_suffix}.vrt"
    
    build_vrt(image_tile_folder, vrt_file)

Crops are saved in ./AOI/TD/Image_Tiles/Nightlights/
VRT built successfully at ./AOI/TD/TD_Nightlights_2022_400m.vrt
Crops are saved in ./AOI/TD/Image_Tiles/Population/
VRT built successfully at ./AOI/TD/TD_Population_2022_400m.vrt
Crops are saved in ./AOI/TD/Image_Tiles/Rainfall/
VRT built successfully at ./AOI/TD/TD_Rainfall_2023_400m.vrt
