# Future Land Use RO

## Introduction

This notebook demonstrates the process of preparing a Research Object Crate (RO-Crate) for the Future Land Use dataset. The dataset contains spatial and attribute information. The spatial data will be stored in a Shapefile containing just the geographies and a unique identifier, and the attribute data will be stored in a CSV file containing the unique identifier and the remaining fields. The CSV file will be structured according to a Frictionless table schema to ensure consistency and quality. Additionally, the notebook will export the resulting data as a GeoPackage for further use. The entire process includes data extraction, transformation, validation, and packaging into a research object crate.

## Process Outline

The process carried out by this workflow can be described as follows:
- Set Up: Import the required packages and define parameters for input and output file paths.
- Extract and Split Data:
  - Read the Future Land Use Shapefile from the input ZIP file, keep only the geometry and the unique identifier (`OBJECTID`), and save the result as a new zipped Shapefile.
  - Read the CSV file, drop the geometry-derived fields, and save the remaining attributes as a separate CSV file.
  - Split the land use type descriptions workbook into one lookup CSV per sheet.
- Prepare RO-Crate:
  - Create a Frictionless table schema for the attributes CSV file.
  - Generate RO-Crate metadata and save it as a JSON file.
  - Package the data files, schema, resource files, and metadata into a ZIP file to create the RO-Crate.
- Export GeoPackage:
  - Read the Shapefile and convert its coordinate reference system to Ohio State Plane South (EPSG:3735).
  - Join the attributes CSV file with the Shapefile and export the resulting data as a GeoPackage.

## Set Up

### Import packages

In [1]:
import os
import pandas as pd
import geopandas as gpd
import zipfile
import json
import frictionless
import sys
sys.path.append(os.path.normpath("../../morpc-common"))
import morpc

### Parameters

#### Static Parameters

In [2]:
# Define directories and file paths
OUTPUT_DIR = os.path.normpath("./output_data")
INPUT_DIR = os.path.normpath("./input_data")

# Define INPUT file name and path for zip, csv, and xlsx
ZIP_NAME = 'Future_Land_use__MTP2024_parcels_Symbology.zip'
ZIP_INPUT_PATH = os.path.join(INPUT_DIR, ZIP_NAME)
FUTURE_LAND_USE_INPUT_NAME = "Future_Land_use__MTP2024_parcels_Symbology.csv"
FUTURE_LAND_USE_INPUT_PATH = os.path.join(INPUT_DIR, FUTURE_LAND_USE_INPUT_NAME)
TYPE_DESCIP_NAME = "LU_Standardized LandUse Type Descriptions.xlsx"
TYPE_DESCIP_PATH = os.path.join(INPUT_DIR, TYPE_DESCIP_NAME)

# Define OUTPUT shapefile name and path for data, schema, and resource files
output_shapefile_dir = os.path.join(OUTPUT_DIR, 'filtered_shapefile')
output_shapefile_path = os.path.join(output_shapefile_dir, 'filtered_data.shp')
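# NOTE: ZIP_NAME is deliberately reassigned here. ZIP_INPUT_PATH above already captured the input name,
# and from this point on ZIP_NAME refers to the filtered output archive.
ZIP_NAME = 'Future_Land_use__MTP2024_filtered.zip'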
ZIP_OUTPUT_PATH = os.path.join(OUTPUT_DIR, ZIP_NAME)
SHAPEFILE_RESOURCE_FILE_PATH = os.path.join(OUTPUT_DIR, 'Future_Land_use__MTP2024_filtered_resource.yaml')

# Define SPLIT DEFINITIONS file name and path for data, schema, and resource files
TYPE_DESCIP_LUT_NAME = 'Land_Use_Types_descriptions.csv'
TYPE_DESCIP_LUT_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_LUT_NAME)
TYPE_DESCIP_MIXU_NAME = 'Mixed_Use_descriptions.csv'
TYPE_DESCIP_MIXU_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_MIXU_NAME)
TYPE_DESCIP_LUT_RESOURCE_NAME = 'Land_Use_Types_descriptions_resource.yaml'
TYPE_DESCIP_LUT_RESOURCE_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_LUT_RESOURCE_NAME)
TYPE_DESCIP_MIXU_RESOURCE_NAME = 'Mixed_Use_descriptions_resource.yaml'
TYPE_DESCIP_MIXU_RESOURCE_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_MIXU_RESOURCE_NAME)

# Define non-geographic attributes file name and path for data, schema, and resource files
FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME = "Future_Land_use_attributes.csv"
FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH = os.path.join(OUTPUT_DIR, FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME)
ATTRIBUTES_SCHEMA_NAME = 'Future_Land_use_attributes_schema.yaml'
ATTRIBUTES_SCHEMA_PATH = os.path.join(OUTPUT_DIR, ATTRIBUTES_SCHEMA_NAME)
ATTRIBUTES_RESOURCE_NAME = 'Future_Land_use_attributes_resource.yaml'
ATTRIBUTES_RESOURCE_PATH = os.path.join(OUTPUT_DIR, ATTRIBUTES_RESOURCE_NAME)

# Define file name and path for zipped RO-Crate and metadata
RO_CRATE_METADATA_NAME = 'ro-crate-metadata.json'
RO_CRATE_METADATA_PATH = os.path.join(OUTPUT_DIR, RO_CRATE_METADATA_NAME)
RO_CRATE_NAME = 'future-land-use-crated.zip'
RO_CRATE_PATH = os.path.join(OUTPUT_DIR, RO_CRATE_NAME)

# Define file name and path for GeoPackage
OUTPUT_GEOPACKAGE_NAME = 'Future_Land_use.gpkg'
OUTPUT_GEOPACKAGE_PATH = os.path.join(OUTPUT_DIR, OUTPUT_GEOPACKAGE_NAME)

### Define Inputs

In [3]:
print("Zipped Future Land Use shapefile stored as: {}".format(ZIP_INPUT_PATH))
print("Future Land Use '.csv' stored as: {}".format(FUTURE_LAND_USE_INPUT_PATH))
print("Land Use Type Descriptions stored as: {}".format(TYPE_DESCIP_PATH))

Zipped Future Land Use shapefile stored as: input_data\Future_Land_use__MTP2024_parcels_Symbology.zip
Future Land Use '.csv' stored as: input_data\Future_Land_use__MTP2024_parcels_Symbology.csv
Land Use Type Descriptions stored as: input_data\LU_Standardized LandUse Type Descriptions.xlsx


### Define Outputs

In [4]:
print("Filtered RO-Crate saved as: {}".format(RO_CRATE_PATH))
print("Exported GeoPackage saved as: {}".format(OUTPUT_GEOPACKAGE_PATH))

Filtered RO-Crate saved as: output_data\future-land-use-crated.zip
Exported GeoPackage saved as: output_data\Future_Land_use.gpkg


## Code!

### Step 1: Read shapefile input, filter for geometry and OBJECTID, and save as new shapefile .zip

In [5]:
# Load the data
gdf = gpd.read_file(f'zip://{ZIP_INPUT_PATH}')

# Filter to retain only the unique ID and 'geometry' fields
required_fields = ['OBJECTID', 'geometry']
filtered_gdf = gdf[required_fields]

# Export the filtered GeoDataFrame to a Shapefile (create the output folder first if it is missing)
os.makedirs(output_shapefile_dir, exist_ok=True)
filtered_gdf.to_file(output_shapefile_path, driver='ESRI Shapefile')

# Create a new zip file
with zipfile.ZipFile(ZIP_OUTPUT_PATH, 'w') as zipf:
    for root, _, files in os.walk(output_shapefile_dir):
        for file in files:
            file_path = os.path.join(root, file)
            zipf.write(file_path, os.path.relpath(file_path, output_shapefile_dir))
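
A Shapefile is a set of sidecar files rather than a single file, so the archive written above is only usable if it carries at least the `.shp`, `.shx`, and `.dbf` parts. The following is a minimal sketch of a completeness check using only the standard library. The helper name `missing_shapefile_parts` and the placeholder archive are assumptions made for illustration; in the notebook the check would be pointed at `ZIP_OUTPUT_PATH` with the stem `filtered_data`.

```python
import os
import tempfile
import zipfile

# Sidecar extensions a Shapefile needs to be readable (geometry, index, attributes).
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(zip_path, stem):
    """Return the required extensions for `stem` that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        present = {
            os.path.splitext(name)[1].lower()
            for name in archive.namelist()
            if os.path.splitext(os.path.basename(name))[0] == stem
        }
    return sorted(REQUIRED_EXTENSIONS - present)


# Demonstration with placeholder files (hypothetical content, not real geodata).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for ext in (".shp", ".dbf", ".prj"):
            archive.writestr("filtered_data" + ext, b"placeholder")
    missing = missing_shapefile_parts(demo_zip, "filtered_data")

print(missing)  # the demo archive lacks the .shx index
```

An empty list means every required part is present; anything else is worth resolving before the archive goes into the crate.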

### Step 2: Split the input CSV and Excel file, save non-geographic CSV fields

In [6]:
# Define data types for the CSV file
dtype = {
    'OBJECTID': 'int',
    'ExLU21': 'str',
    'FutLU21': 'str',
    'MIXcode': 'str',
    'JoinAll': 'str',
    'Place': 'str',
    'last_edited_date': 'str',
    'County': 'str',
    'FUTsimple': 'str'
}

# Load the CSV file with specified data types
data_df = pd.read_csv(FUTURE_LAND_USE_INPUT_PATH, dtype=dtype)

# Drop the geometry-derived fields; OBJECTID is not dropped, so it remains available as the join key
attributes_df = data_df.drop(columns=['Shape__Area', 'Shape__Length'])

# Save the CSV
attributes_df.to_csv(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, index=False)
attributes_df.head()

# Load each sheet into a DataFrame
land_use_df = pd.read_excel(TYPE_DESCIP_PATH, sheet_name='Land Use Types')
mixed_use_df = pd.read_excel(TYPE_DESCIP_PATH, sheet_name='Mixed Use')

mixed_use_df.dropna(inplace=True)
mixed_use_df = mixed_use_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
mixed_use_df = mixed_use_df[(mixed_use_df != '').all(axis=1)]

land_use_df.dropna(inplace=True)
land_use_df = land_use_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
land_use_df = land_use_df[(land_use_df != '').all(axis=1)]

land_use_df.to_csv(TYPE_DESCIP_LUT_PATH, index=False)
mixed_use_df.to_csv(TYPE_DESCIP_MIXU_PATH, index=False)
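
Because `OBJECTID` is the only link between the attributes CSV and the filtered Shapefile, it can be worth confirming that the column is fully populated and unique before packaging. The sketch below does this with the standard library `csv` module. The function name `check_key_column` and the sample rows are made up for illustration; in the notebook the function would be given an open handle on `FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH`.

```python
import csv
import io
from collections import Counter


def check_key_column(csv_file, key="OBJECTID"):
    """Return (row_count, duplicated_keys, blank_count) for the key column."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    blank = 0
    rows = 0
    for row in reader:
        rows += 1
        value = (row.get(key) or "").strip()
        if value:
            counts[value] += 1
        else:
            blank += 1
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return rows, duplicates, blank


# Illustrative rows only: one repeated key and one blank key.
sample = io.StringIO(
    "OBJECTID,FutLU21\n"
    "1,a\n"
    "2,b\n"
    "2,c\n"
    ",d\n"
)
rows, duplicates, blank = check_key_column(sample)
print(rows, duplicates, blank)
```

A clean file reports no duplicated keys and no blanks. Duplicates would multiply rows in the later join, and blanks would fail to match any geometry.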

### Step 3: Create and validate resource files
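
The attributes resource in this step points at a schema file (`ATTRIBUTES_SCHEMA_NAME`) that the notebook does not show being written. As a rough sketch, a Table Schema descriptor for it could be assembled as below. It assumes the attributes CSV contains exactly the columns listed in the `dtype` mapping from Step 2, and it prints JSON to stay within the standard library, whereas the notebook's constant names a YAML file. The helper `build_table_schema` is hypothetical.

```python
import json

# Column names mirror the `dtype` mapping used when the CSV is read.
# Assumption: the attributes CSV has exactly these columns.
COLUMN_TYPES = {
    "OBJECTID": "integer",
    "ExLU21": "string",
    "FutLU21": "string",
    "MIXcode": "string",
    "JoinAll": "string",
    "Place": "string",
    "last_edited_date": "string",
    "County": "string",
    "FUTsimple": "string",
}


def build_table_schema(column_types, primary_key):
    """Build a Table Schema descriptor with a required, unique primary key."""
    fields = []
    for name, field_type in column_types.items():
        field = {"name": name, "type": field_type}
        if name == primary_key:
            field["constraints"] = {"required": True, "unique": True}
        fields.append(field)
    return {"fields": fields, "primaryKey": [primary_key]}


attributes_schema = build_table_schema(COLUMN_TYPES, "OBJECTID")
print(json.dumps(attributes_schema, indent=2))
```

Keeping every non-key field as `string` matches how the notebook reads the file; tighter types or constraints would need knowledge of the data that is not stated here.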

#### Non-Geographic attribute csv resource

In [7]:
if os.path.exists(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH) and os.path.getsize(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH) > 0:

    df_temp = pd.read_csv(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, low_memory = False)

    # Build the resource descriptor for the attributes CSV
    if not df_temp.empty:
        acsResource = {
            "name": "future_land_use__attribute",
            "title": "future_land_use__attribute",
            "description": "future_land_use__attribute",
            "path": FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME,
            "format": "csv",
            "mediatype": "text/csv",
            "encoding": "utf-8",
            "bytes": os.path.getsize(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH),
            "hash": morpc.md5(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH),
            "schema": ATTRIBUTES_SCHEMA_NAME,
            "profile":'tabular-data-resource'
        }
    
        # Create the resource object
        resource = frictionless.Resource(acsResource)

        print("Writing resource file to {}".format(ATTRIBUTES_RESOURCE_PATH))
        cwd = os.getcwd()
        os.chdir(os.path.dirname(ATTRIBUTES_RESOURCE_PATH))
        dummy = resource.to_yaml(os.path.basename(ATTRIBUTES_RESOURCE_PATH))
        os.chdir(cwd)
    
        print("Validating resource on disk (including data and schema). This may take some time.")
        resourceOnDisk = frictionless.Resource(ATTRIBUTES_RESOURCE_PATH)
        results = resourceOnDisk.validate()
        if(results.valid):
            print("Resource is valid\n")
        else:
            print("ERROR: Resource is NOT valid. Errors follow.\n")
            print(results)
            raise RuntimeError

else:
    print(f"{FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH} does not exist or is empty\n")

Writing resource file to output_data\Future_Land_use_attributes_resource.yaml
Validating resource on disk (including data and schema). This may take some time.
Resource is valid
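
The cell above changes into the output folder before writing the resource file and changes back afterwards. If the write raises, the notebook is left in the wrong working directory. One way to guard against that is a small context manager, sketched here with the standard library; the name `working_directory` is an assumption for illustration and is not part of `morpc` or `frictionless`.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even when the body raises, so the original directory is restored.
        os.chdir(previous)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with working_directory(tmp):
            raise ValueError("simulated failure while writing the resource file")
    except ValueError:
        pass
    restored = os.getcwd() == start

print(restored)
```

In the notebook, the `to_yaml` call would sit inside `with working_directory(os.path.dirname(ATTRIBUTES_RESOURCE_PATH)):` in place of the paired `os.chdir` calls.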



#### Land Use Types descriptions resource

In [8]:
if os.path.exists(TYPE_DESCIP_LUT_PATH) and os.path.getsize(TYPE_DESCIP_LUT_PATH) > 0:

    df_temp = pd.read_csv(TYPE_DESCIP_LUT_PATH)

    # Define the schema
    schema = {
        "fields": [
            {"name": "Code", "type": "string", "constraints": {"required": True}},
            {"name": "Land Use", "type": "string", "constraints": {"required": True}},
            {"name": "Land Use Description", "type": "string", "constraints": {"required": True}}
        ],
        "primaryKey": ["Code"]
    }


    # Build the resource descriptor for the Land Use Types lookup CSV
    if not df_temp.empty:
        acsResource = {
            "name": "land_use_types_descriptions",
            "title": "land_use_types_descriptions",
            "description": "land_use_types_descriptions",
            "path": TYPE_DESCIP_LUT_NAME,
            "format": "csv",
            "mediatype": "text/csv",
            "encoding": "utf-8",
            "bytes": os.path.getsize(TYPE_DESCIP_LUT_PATH),
            "hash": morpc.md5(TYPE_DESCIP_LUT_PATH),
            "schema": schema,
            "profile":'tabular-data-resource'
        }
    
        # Create the resource object
        resource = frictionless.Resource(acsResource)

        print("Writing resource file to {}".format(TYPE_DESCIP_LUT_RESOURCE_PATH))
        cwd = os.getcwd()
        os.chdir(os.path.dirname(TYPE_DESCIP_LUT_RESOURCE_PATH))
        dummy = resource.to_yaml(os.path.basename(TYPE_DESCIP_LUT_RESOURCE_PATH))
        os.chdir(cwd)
    
        print("Validating resource on disk (including data and schema). This may take some time.")
        resourceOnDisk = frictionless.Resource(TYPE_DESCIP_LUT_RESOURCE_PATH)
        results = resourceOnDisk.validate()
        if(results.valid):
            print("Resource is valid\n")
        else:
            print("ERROR: Resource is NOT valid. Errors follow.\n")
            print(results)
            raise RuntimeError

else:
    print(f"{TYPE_DESCIP_LUT_PATH} does not exist or is empty\n")

Writing resource file to output_data\Land_Use_Types_descriptions_resource.yaml
Validating resource on disk (including data and schema). This may take some time.
Resource is valid



#### Mixed use descriptions resource

In [9]:
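if os.path.exists(TYPE_DESCIP_MIXU_PATH) and os.path.getsize(TYPE_DESCIP_MIXU_PATH) > 0:

    # Read the Mixed Use CSV so the emptiness check below inspects this file
    # rather than the df_temp left over from the previous cell
    df_temp = pd.read_csv(TYPE_DESCIP_MIXU_PATH)

    # Define the schema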
    schema = {
        "fields": [
            {"name": "Commercial sqft/acre", "type": "string"},
            {"name": "Plan DU/Acre", "type": "string"},
            {"name": "MORPC Res du/acre", "type": "string"},
            {"name": "Industrial sqft/acre", "type": "string"},
            {"name": "Office sqft/acre", "type": "string"},
            {"name": "Proportion of Site by Use (Include only relevant uses in the following order C/R/I/O)", "type": "string"},
            {"name": "MixType", "type": "string", "constraints": {"required": True}},
            {"name": "Description(based on various local plans)", "type": "string", "constraints": {"required": True}}
        ]
    }

    # Build the resource descriptor for the Mixed Use lookup CSV
    if not df_temp.empty:
        acsResource = {
            "name": "mixed_use_descriptions",
            "title": "mixed_use_descriptions",
            "description": "mixed_use_descriptions",
            "path": TYPE_DESCIP_MIXU_NAME,
            "format": "csv",
            "mediatype": "text/csv",
            "encoding": "utf-8",
            "bytes": os.path.getsize(TYPE_DESCIP_MIXU_PATH),
            "hash": morpc.md5(TYPE_DESCIP_MIXU_PATH),
            "schema": schema,
            "profile":'tabular-data-resource'
        }
    
        # Create the resource object
        resource = frictionless.Resource(acsResource)

        print("Writing resource file to {}".format(TYPE_DESCIP_MIXU_RESOURCE_PATH))
        cwd = os.getcwd()
        os.chdir(os.path.dirname(TYPE_DESCIP_MIXU_RESOURCE_PATH))
        dummy = resource.to_yaml(os.path.basename(TYPE_DESCIP_MIXU_RESOURCE_PATH))
        os.chdir(cwd)
    
        print("Validating resource on disk (including data and schema). This may take some time.")
        resourceOnDisk = frictionless.Resource(TYPE_DESCIP_MIXU_RESOURCE_PATH)
        results = resourceOnDisk.validate()
        if(results.valid):
            print("Resource is valid\n")
        else:
            print("ERROR: Resource is NOT valid. Errors follow.\n")
            print(results)
            raise RuntimeError

else:
    print(f"{TYPE_DESCIP_MIXU_PATH} does not exist or is empty\n")

Writing resource file to output_data\Mixed_Use_descriptions_resource.yaml
Validating resource on disk (including data and schema). This may take some time.
Resource is valid



#### Filtered shapefile resource

In [10]:
import hashlib

# Define the schema
schema = {
    "fields": [
        {"name": "OBJECTID", "type": "integer", "constraints": {"required": True}},
        {"name": "geometry", "type": "string", "constraints": {"required": True}}
    ]
}


# Hash the archive, closing the file handle once it has been read
with open(ZIP_OUTPUT_PATH, 'rb') as zip_file:
    zip_md5 = hashlib.md5(zip_file.read()).hexdigest()

acsResource = {
    "name": "future_land_use__mtp2024_filtered",
    "title": "Future Land Use MTP 2024 Filtered Parcels Symbology",
    "description": "Filtered shapefile containing the OBJECTID and geometry fields.",
    "path": ZIP_NAME,
    "format": "zip",
    "mediatype": "application/zip",
    "encoding": "utf-8",
    "bytes": os.path.getsize(ZIP_OUTPUT_PATH),
    "hash": zip_md5,
    "schema": schema,
    "profile": 'data-resource'
}

# Create the resource object
resource = frictionless.Resource(acsResource)

print("Writing resource file to {}".format(SHAPEFILE_RESOURCE_FILE_PATH))
cwd = os.getcwd()
os.chdir(os.path.dirname(SHAPEFILE_RESOURCE_FILE_PATH))
resource.to_yaml(os.path.basename(SHAPEFILE_RESOURCE_FILE_PATH))
os.chdir(cwd)

print("Validating resource on disk (including data and schema). This may take some time.")
resourceOnDisk = frictionless.Resource(SHAPEFILE_RESOURCE_FILE_PATH)
results = resourceOnDisk.validate()
if results.valid:
    print("Resource is valid\n")
else:
    print("ERROR: Resource is NOT valid. Errors follow.\n")
    print(results)
    raise RuntimeError

Writing resource file to output_data\Future_Land_use__MTP2024_filtered_resource.yaml
Validating resource on disk (including data and schema). This may take some time.
Resource is valid
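
The cell above hashes the archive by reading it into memory in one go, which is fine for small files but can be costly for a large parcel dataset. A chunked MD5 gives the same digest with bounded memory. This is only a sketch: `md5_of_file` is a hypothetical helper, and whether `morpc.md5` works the same way is not something the notebook states.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on throwaway bytes: chunked and whole-file digests should agree.
payload = b"future land use" * 1000
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as handle:
        handle.write(payload)
    chunked = md5_of_file(demo_path, chunk_size=4096)

whole = hashlib.md5(payload).hexdigest()
print(chunked == whole)
```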



### Step 4: Preparing RO-Crate

In [11]:
# Package the RO-Crate into a ZIP file
with zipfile.ZipFile(RO_CRATE_PATH, 'w') as ro_zip:
    ro_zip.write(RO_CRATE_METADATA_PATH, RO_CRATE_METADATA_NAME)

    ro_zip.write(ZIP_OUTPUT_PATH, ZIP_NAME)
    ro_zip.write(SHAPEFILE_RESOURCE_FILE_PATH, os.path.basename(SHAPEFILE_RESOURCE_FILE_PATH))
    
    ro_zip.write(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME)
    ro_zip.write(ATTRIBUTES_SCHEMA_PATH, ATTRIBUTES_SCHEMA_NAME)
    ro_zip.write(ATTRIBUTES_RESOURCE_PATH, ATTRIBUTES_RESOURCE_NAME)
    
    ro_zip.write(TYPE_DESCIP_LUT_PATH, TYPE_DESCIP_LUT_NAME)
    ro_zip.write(TYPE_DESCIP_LUT_RESOURCE_PATH, TYPE_DESCIP_LUT_RESOURCE_NAME)
    
    ro_zip.write(TYPE_DESCIP_MIXU_PATH, TYPE_DESCIP_MIXU_NAME)
    ro_zip.write(TYPE_DESCIP_MIXU_RESOURCE_PATH, TYPE_DESCIP_MIXU_RESOURCE_NAME)

print(f"RO-Crate has been saved to {RO_CRATE_PATH}")

RO-Crate has been saved to output_data\future-land-use-crated.zip
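
The packaging cell expects `ro-crate-metadata.json` to exist already, but the notebook does not show how it is produced. The sketch below builds a minimal RO-Crate 1.1 style document with the standard library: a metadata descriptor, a root dataset, and one `File` entity per packaged file. The `name` and `description` strings and the helper `build_ro_crate_metadata` are placeholders, and the RO-Crate 1.1 specification also asks for `datePublished` and `license` on the root dataset, which are left out because the source does not state them.

```python
import json

# File names as they appear inside the crate (taken from the notebook's constants).
CRATE_FILES = [
    "Future_Land_use__MTP2024_filtered.zip",
    "Future_Land_use__MTP2024_filtered_resource.yaml",
    "Future_Land_use_attributes.csv",
    "Future_Land_use_attributes_schema.yaml",
    "Future_Land_use_attributes_resource.yaml",
    "Land_Use_Types_descriptions.csv",
    "Land_Use_Types_descriptions_resource.yaml",
    "Mixed_Use_descriptions.csv",
    "Mixed_Use_descriptions_resource.yaml",
]


def build_ro_crate_metadata(name, description, file_names):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "hasPart": [{"@id": file_name} for file_name in file_names],
    }
    files = [{"@id": file_name, "@type": "File"} for file_name in file_names]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root] + files,
    }


# Placeholder name and description; replace with the dataset's real metadata.
metadata = build_ro_crate_metadata(
    "Future Land Use",
    "Placeholder description of the Future Land Use crate.",
    CRATE_FILES,
)
print(json.dumps(metadata, indent=2))
```

Writing the result to `RO_CRATE_METADATA_PATH` with `json.dump` before the packaging cell runs would give that cell the file it expects.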


### Step 5: Exporting standard GeoPackage from Shapefile geodataframe and CSV dataframe

In [12]:
# Read the CSV file as a dataframe
attributes_df = pd.read_csv(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, dtype=dtype)

# Read the Shapefile as a geodataframe
shapefile_gdf = gpd.read_file(output_shapefile_path)

# Convert the Shapefile to Ohio State Plane South coordinate reference system
shapefile_gdf = shapefile_gdf.to_crs(epsg=3735)

# Join the CSV dataframe to the Shapefile geodataframe using the unique identifier field
merged_gdf = shapefile_gdf.merge(attributes_df, on='OBJECTID')

# Export the resulting geodataframe as a GeoPackage
merged_gdf.to_file(OUTPUT_GEOPACKAGE_PATH, driver='GPKG')

print(f"GeoPackage has been saved to {OUTPUT_GEOPACKAGE_PATH}")

GeoPackage has been saved to output_data\Future_Land_use.gpkg
