# Clipping Las Files with Polygon Features
***Davies Lab Lidar Script***<br>
Peter Boucher <br>
2022/09/23 <br>

<p>This is the first step in a two-part process: clip las files with a set of polygons (1-ClipLasWithPolygons.ipynb), then compute vegetation structure metrics from the las files for each polygon (2-ComputeMetricsByPolygon.ipynb). </p>

#### Inputs: 
- a shapefile of polygon features with a unique integer ID attribute for each polygon feature
- a folder of las files (i.e. tiled point cloud data)
    - If computing metrics (2-ComputeMetricsByPolygon.ipynb), the input las files need to have a "Height" attribute for each point (height above ground)

#### Outputs:
- a folder of clipped las files, with one file per feature, named by the unique id from the input shapefile

## Define User Inputs Below:

In [2]:
# Import Dependencies
from pathlib import Path
import sys
sys.path.append('../bin/')
from LabLidar_Functions import lasClip_IndivFeature
import geopandas as gpd
import pandas as pd
import numpy as np
import concurrent.futures
import laspy
import time

# # #
# # # USER INPUTS

# Path to a shapefile (.shp) of polygon features to clip the point cloud with.
shpf = Path('../data/in/test/shapefile/MpalaForestGEOCanopies_LabLidarTest_EPSG32637.shp')

# Input directory of las files (usually in square tiles).
ld = Path('../data/in/test/lasfileinputs/')

# Output directory for clipped las files
od = Path('../data/out/test/clippedlasfiles/')

# EPSG code of the shapefile and the las files, as a string
# Note: Shapefiles and las files must have the same EPSG code (same CRS)
# Kruger is 32736 (WGS84 UTM 36S)
# Mpala is 32637 (WGS84 UTM 37N)
epsg='32637'

# feature id column - name of attribute column in shapefile which defines each polygon feature with a unique ID
featureIDcol = 'treeID'

# # # End User Inputs
# # # 
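
Before loading anything, it can help to confirm that the user inputs point at real files. The sketch below is one possible way to do that. `check_user_inputs` is a hypothetical helper, not part of `LabLidar_Functions`, and the checks it makes (the shapefile exists, the input folder holds at least one `.las` tile, the EPSG string is all digits) are assumptions about what counts as valid here.

```python
from pathlib import Path


def check_user_inputs(shpf, ld, epsg):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    shpf, ld = Path(shpf), Path(ld)

    # The shapefile must exist and carry a .shp extension
    if shpf.suffix.lower() != '.shp' or not shpf.is_file():
        problems.append(f'Shapefile not found: {shpf}')

    # The input directory must exist and contain at least one las tile
    if not ld.is_dir():
        problems.append(f'Input las directory not found: {ld}')
    elif not any(ld.glob('*.las')):
        problems.append(f'No .las files in: {ld}')

    # The notebook expects the EPSG code as a string of digits, e.g. '32637'
    if not str(epsg).isdigit():
        problems.append(f'EPSG code should be digits only, got: {epsg!r}')

    return problems
```

With the variables defined above, this could be called as `check_user_inputs(shpf, ld, epsg)` and each returned message printed before moving on.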

#### 1) Load shapefile inputs, and perform quality checks

In [3]:
# Read the shapefile as a geodataframe
# Note: Expects a file with polygon features only
shpdf = gpd.read_file(str(shpf))

In [4]:
# Quality Check for duplicate feature IDs

# Checks if there is a unique ID for each polygon feature.
# Otherwise, you can get multiple polygons, overwriting issues, etc.

# Check for duplicate ids by filtering
shpdf_nodupes = shpdf[featureIDcol].drop_duplicates()

# If there are duplicates
if shpdf_nodupes.shape[0] < shpdf.shape[0]:
    
    numberofdupes = shpdf.shape[0] - shpdf_nodupes.shape[0]
    
    q = input(f"{numberofdupes} duplicate IDs found. Make new feature id and continue? y/n \n")
    
    if q == "y":
    
        # Sort the rows by the original feature ID column
        shpdf.sort_values(by=featureIDcol, inplace=True)

        # Make a new column with a unique index (row number) to identify each feature with
        shpdf['FeatureID'] = shpdf.index

        # Set the featureIDcol value to be this new column:
        featureIDcol = 'FeatureID'

        # Save new shapefile for future reference
        # Make the new shapefile name from the input name without its extension
        oshp_name = shpf.stem + '_NewFeatureID.shp'
        
        # Put the output in the same folder as the input file
        newshpf = shpf.parent / oshp_name
        
        # Save it
        shpdf.to_file(str(newshpf))
        
        print(f'New shapefile {newshpf.name} with FeatureID saved in {newshpf.parent}/ \n')
        
    if q == "n":
        
        print('Operation cancelled. Provide a new shapefile with unique feature IDs.')
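
The check above reports only how many duplicates exist. To see which IDs repeat before answering the prompt, a small helper along these lines may be useful. `find_duplicate_ids` is a hypothetical name, and it takes a plain list so the sketch runs without a shapefile.

```python
from collections import Counter


def find_duplicate_ids(ids):
    """Return {id: count} for every ID that appears more than once."""
    counts = Counter(ids)
    return {i: n for i, n in counts.items() if n > 1}


# Example with made-up IDs; in the notebook this could be
# find_duplicate_ids(shpdf[featureIDcol].tolist())
dupes = find_duplicate_ids([101, 102, 102, 103, 101, 101])
print(dupes)  # {101: 3, 102: 2}
```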

In [5]:
# Quality Check for Polygon Features Only

# Make a copy for testing 
shpdf_test = shpdf.copy(deep=True)
shpdf_test.head()

# Label all rows whose geometry is not a simple Polygon (e.g. MultiPolygon)
shpdf_test['NotPoly'] = shpdf_test.geometry.geom_type != 'Polygon'

# Count the non-polygon rows
numnonpolys = int(shpdf_test['NotPoly'].sum())

if numnonpolys > 0:
    
    print(f'{numnonpolys} non-polygon features found. \n')
    
    q = input('Discard non-polygon features and continue? y/n \n')
    
    if q == 'y':
        
        # Filter it to only include Polygon features
        # (the mask comes from shpdf_test, which shares its index with shpdf)
        shpdf = shpdf.loc[~shpdf_test['NotPoly']]
        
    else: 
        
        print('Provide a new shapefile with only polygons and restart process.\n')

In [6]:
# Last quality check

# Make sure there aren't any las outputs already
# If there are, points will be appended to each file
lfs = list(od.glob('*.las'))

if len(lfs) > 0:
    
    print(f'WARNING: Output las files already found in directory: \n \t{od} \n')
    print('To avoid overwrite issues, delete all files in output directory before proceeding.\n')
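
Instead of deleting earlier outputs by hand, one option is to move them into a timestamped subfolder so the output directory is clean but nothing is lost. This is only a sketch: `set_aside_existing_las` is a hypothetical helper, and the `previous_<timestamp>` folder name is an arbitrary choice.

```python
import shutil
import time
from pathlib import Path


def set_aside_existing_las(outdir):
    """Move any *.las files in outdir into a timestamped subfolder.

    Returns the backup folder, or None if there was nothing to move.
    """
    outdir = Path(outdir)
    existing = sorted(outdir.glob('*.las'))
    if not existing:
        return None

    # e.g. previous_20220923_141500
    backup = outdir / time.strftime('previous_%Y%m%d_%H%M%S')
    backup.mkdir(parents=True, exist_ok=True)

    for f in existing:
        shutil.move(str(f), str(backup / f.name))

    return backup
```

Because `od.glob('*.las')` does not search subfolders, files moved this way no longer trigger the warning above.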


#### 2) Clip Las Files 

In [7]:
# Define a function for running in parallel
def lasClip_IndivFeature_Parallel(feat, IDcol=featureIDcol):
    
    try:
        
        lasClip_IndivFeature(feature=feat,
                             lasdir=ld,
                             outdir=od,
                             featureIDcol=IDcol,
                             epsg=epsg,
                             verb=False)
    except Exception as e:
        
        # Report the failing feature and the error, then continue with the rest
        print(f'Issue with {IDcol}: {feat.get(IDcol)} ({e}) \n')
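
Messages printed inside worker processes are easy to miss, so a variant of this wrapper could return the failure instead of printing it, letting the failed IDs be collected once the run finishes. The sketch below assumes that approach. `clip_safely` and `fake_clip` are hypothetical names, and `fake_clip` stands in for `lasClip_IndivFeature` so the example runs on its own.

```python
from functools import partial


def clip_safely(feat, clip_func, IDcol):
    """Run clip_func on one feature; return (feature id, error message or None)."""
    try:
        clip_func(feat)
        return feat.get(IDcol), None
    except Exception as e:
        return feat.get(IDcol), f'{type(e).__name__}: {e}'


# Stand-in for lasClip_IndivFeature so the sketch runs without lidar data
def fake_clip(feat):
    if feat['treeID'] < 0:
        raise ValueError('negative id')


# Bind the fixed arguments once; the result takes a single feature
worker = partial(clip_safely, clip_func=fake_clip, IDcol='treeID')

# Plain dicts are used here; rows from shpdf.iterrows() also support .get()
results = [worker(f) for f in [{'treeID': 1}, {'treeID': -5}]]
failed = [(fid, err) for fid, err in results if err is not None]
print(failed)  # [(-5, 'ValueError: negative id')]
```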

In [8]:
# Make a list of all features in shapefile to iterate through
features = [f for i, f in shpdf.iterrows()]

# Run tree clipping function
start = time.time()

print(f'Starting to clip {len(features)} polygon features... \n')

with concurrent.futures.ProcessPoolExecutor(max_workers=None) as executor:
    # Consume the iterator so every feature is processed before timing stops
    for _ in executor.map(lasClip_IndivFeature_Parallel, features):
        pass
    
end = time.time()

print(f'{len(features)} features clipped in {end-start} s.')

Starting to clip 1698 polygon features... 

1698 features clipped in 2727.744031190872 s.
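
A run of this size takes a while (about 45 minutes in the output above), so periodic progress messages may be worth having. One way to get them is `concurrent.futures.as_completed`, sketched below. `run_with_progress` is a hypothetical helper, and the demo uses a thread pool and the built-in `abs` purely so it runs anywhere.

```python
import concurrent.futures


def run_with_progress(func, items, executor, report_every=100):
    """Submit func(item) for each item and print a count as tasks finish."""
    futures = [executor.submit(func, item) for item in items]
    done = 0
    for fut in concurrent.futures.as_completed(futures):
        fut.result()  # re-raises any exception from the task
        done += 1
        if done % report_every == 0 or done == len(futures):
            print(f'{done}/{len(futures)} done')
    return done


# Demo with threads and a trivial function; the notebook would pass a
# ProcessPoolExecutor and lasClip_IndivFeature_Parallel instead
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
    n = run_with_progress(abs, range(-250, 0), ex, report_every=100)
```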


# DONE!