# Future Land Use RO

## Introduction

This notebook demonstrates how to prepare a Research Object Crate (RO-Crate) for the Future Land Use dataset. The dataset contains spatial and attribute information. The notebook splits the source CSV into two CSV files that share a unique identifier (`OBJECTID`): one holding the geometry measures (`Shape__Area`, `Shape__Length`) and one holding the remaining fields. The attributes CSV is described by a Frictionless table schema to support consistency and quality checks. The notebook also joins the attributes to the Shapefile geometries and exports the result as a GeoPackage for further use.

## Process Outline

The process carried out by this workflow can be described as follows:
  - Set Up: Import the necessary packages and define parameters for file paths and output locations.
  - Extract and Split Data:
    - Extract the ZIP file containing the Future Land Use Shapefile.
    - Read the CSV file and split it into two separate files: one for geographies and one for attributes.
    - Export each sheet of the land use type descriptions workbook to its own CSV file.
  - Prepare RO-Crate:
    - Create a Frictionless table schema for the attributes CSV file.
    - Generate RO-Crate metadata and save it as a JSON file.
    - Package the CSV files, schema, and metadata into a ZIP file to create the RO-Crate.
  - Export GeoPackage:
    - Read the Shapefile and convert its coordinate reference system to Ohio State Plane South (EPSG:3735).
    - Join the attributes CSV file with the Shapefile and export the resulting data as a GeoPackage.

## Set Up

### Import packages

In [9]:
import os
import pandas as pd
import geopandas as gpd
import zipfile
import json

### Parameters

#### Static Parameters

In [10]:
# Define directories and file paths
OUTPUT_DIR = os.path.normpath("./output_data")
INPUT_DIR = os.path.normpath("./input_data")

# Define file name and path for input zip, csv, and xlsx
ZIP_NAME = 'Future_Land_use__MTP2024_parcels_Symbology.zip'
ZIP_PATH = os.path.join(INPUT_DIR, ZIP_NAME)
FUTURE_LAND_USE_INPUT_NAME = "Future_Land_use__MTP2024_parcels_Symbology.csv"
FUTURE_LAND_USE_INPUT_PATH = os.path.join(INPUT_DIR, FUTURE_LAND_USE_INPUT_NAME)
TYPE_DESCIP_NAME = "LU_Standardized LandUse Type Descriptions.xlsx"
TYPE_DESCIP_PATH = os.path.join(INPUT_DIR, TYPE_DESCIP_NAME)

# Define file name and path for split xlsx land use definitions
TYPE_DESCIP_LUT_NAME = 'Land_Use_Types_descriptions.csv'
TYPE_DESCIP_LUT_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_LUT_NAME)
TYPE_DESCIP_MIXU_NAME = 'Mixed_Use_descriptions.csv'
TYPE_DESCIP_MIXU_PATH = os.path.join(OUTPUT_DIR, TYPE_DESCIP_MIXU_NAME)

# Define folder name and path for all extracted files from zip
EXTRACT_NAME = 'extracted_files'
EXTRACT_PATH = os.path.join(INPUT_DIR, EXTRACT_NAME)

# Define file name and path for extracted shapefile
SHAPEFILE_NAME = 'Future_Land_use__MTP2024_parcels_Symbology.shp'
SHAPEFILE_PATH = os.path.join(EXTRACT_PATH, SHAPEFILE_NAME)

# Define file name and path for csv split by geography and other attributes
FUTURE_LAND_USE_GEO_OUTPUT_NAME = "Future_Land_use__geography.csv"
FUTURE_LAND_USE_GEO_OUTPUT_PATH = os.path.join(OUTPUT_DIR, FUTURE_LAND_USE_GEO_OUTPUT_NAME)
FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME = "Future_Land_use__attributes.csv"
FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH = os.path.join(OUTPUT_DIR, FUTURE_LAND_USE_ATTRIB_OUTPUT_NAME)

# Define file name and path for zipped RO-Crate
RO_CRATE_NAME = 'future-land-use-crated.zip'
RO_CRATE_PATH = os.path.join(OUTPUT_DIR, RO_CRATE_NAME)

# Define file name and path for RO-Crate metadata
RO_CRATE_METADATA_NAME = 'ro-crate-metadata.json'
RO_CRATE_METADATA_PATH = os.path.join(OUTPUT_DIR, RO_CRATE_METADATA_NAME)

# RO-Crate metadata definition
RO_CRATE_METADATA = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"}
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": "Future Land Use Data",
            "description": "Dataset containing future land use data with spatial and attribute information.",
            "hasPart": [
                {"@id": "Future_Land_use__geography.csv"},
                {"@id": "Future_Land_use__attributes.csv"},
                {"@id": "attributes_schema.yaml"}
            ]
        },
        {
            "@id": "Future_Land_use__geography.csv",
            "@type": "File",
            "name": "Geography Data CSV",
            "encodingFormat": "text/csv"
        },
        {
            "@id": "Future_Land_use__attributes.csv",
            "@type": "File",
            "name": "Attributes Data CSV",
            "encodingFormat": "text/csv"
        },
        {
            "@id": "attributes_schema.yaml",
            "@type": "File",
            "name": "Attributes Schema",
            "encodingFormat": "application/yaml"
        }
    ]
}

# Define file name and path for GeoPackage
OUTPUT_GEOPACKAGE_NAME = 'Future_Land_use.gpkg'
OUTPUT_GEOPACKAGE_PATH = os.path.join(OUTPUT_DIR, OUTPUT_GEOPACKAGE_NAME)

# Define file name and path for attributes schema
ATTRIBUTES_SCHEMA_NAME = 'attributes_schema.yaml'
ATTRIBUTES_SCHEMA_PATH = os.path.join(OUTPUT_DIR, ATTRIBUTES_SCHEMA_NAME)

# Create output directory if it doesn't exist
os.makedirs(OUTPUT_DIR, exist_ok=True)

### Define Inputs

In [11]:
print("Zipped Future Land Use shapefile stored as: {}".format(ZIP_PATH))
print("Unzipped Future Land Use shapefiles will be stored in: {}".format(EXTRACT_PATH))
print("Future Land Use '.csv' stored as: {}".format(FUTURE_LAND_USE_INPUT_PATH))
print("Land Use Type Descriptions stored as: {}".format(TYPE_DESCIP_PATH))

Zipped Future Land Use shapefile stored as: input_data\Future_Land_use__MTP2024_parcels_Symbology.zip
Unzipped Future Land Use shapefiles will be stored in: input_data\extracted_files
Future Land Use '.csv' stored as: input_data\Future_Land_use__MTP2024_parcels_Symbology.csv
Land Use Type Descriptions stored as: input_data\LU_Standardized LandUse Type Descriptions.xlsx


### Define Outputs

In [12]:
print("Land Use Type Descriptions stored as: {}".format(TYPE_DESCIP_LUT_PATH))
print("Mixed Use Type Descriptions stored as: {}".format(TYPE_DESCIP_MIXU_PATH))
print("Geographic split '.csv' stored as: {}".format(FUTURE_LAND_USE_GEO_OUTPUT_PATH))
print("Non-geographic split '.csv' stored as: {}".format(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH))
print("Attribute schema stored as: {}".format(ATTRIBUTES_SCHEMA_PATH))
print("RO-Crate metadata stored as: {}".format(RO_CRATE_METADATA_PATH))
print("Zipped RO-Crate stored as: {}".format(RO_CRATE_PATH))
print("GeoPackage stored as: {}".format(OUTPUT_GEOPACKAGE_PATH))

Land Use Type Descriptions stored as: output_data\Land_Use_Types_descriptions.csv
Mixed Use Type Descriptions stored as: output_data\Mixed_Use_descriptions.csv
Geographic split '.csv' stored as: output_data\Future_Land_use__geography.csv
Non-geographic split '.csv' stored as: output_data\Future_Land_use__attributes.csv
Attribute schema stored as: output_data\attributes_schema.yaml
RO-Crate metadata stored as: output_data\ro-crate-metadata.json
Zipped RO-Crate stored as: output_data\future-land-use-crated.zip
GeoPackage stored as: output_data\Future_Land_use.gpkg


## Code!

### Step 1: Extract the ZIP file, split the CSV and Excel file

In [13]:
# Extract ZIP file
with zipfile.ZipFile(ZIP_PATH, 'r') as zip_ref:
    zip_ref.extractall(EXTRACT_PATH)

# Define data types for the CSV file
dtype = {
    'OBJECTID': 'int',
    'ExLU21': 'str',
    'FutLU21': 'str',
    'MIXcode': 'str',
    'JoinAll': 'str',
    'Place': 'str',
    'last_edited_date': 'str',
    'County': 'str',
    'FUTsimple': 'str'
}

# Load the CSV file with specified data types
data_df = pd.read_csv(FUTURE_LAND_USE_INPUT_PATH, dtype=dtype)

# Split the dataset into two tables that share the OBJECTID key
shapefile_df = data_df[['OBJECTID', 'Shape__Area', 'Shape__Length']]
# drop() leaves OBJECTID in place, so the attributes table keeps the unique identifier
attributes_df = data_df.drop(columns=['Shape__Area', 'Shape__Length'])

# Save the CSVs
shapefile_df.to_csv(FUTURE_LAND_USE_GEO_OUTPUT_PATH, index=False)
attributes_df.to_csv(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, index=False)

attributes_df.head()
shapefile_df.head()

# Load each sheet into a DataFrame
land_use_df = pd.read_excel(TYPE_DESCIP_PATH, sheet_name='Land Use Types')
mixed_use_df = pd.read_excel(TYPE_DESCIP_PATH, sheet_name='Mixed Use')

land_use_df.to_csv(TYPE_DESCIP_LUT_PATH, index=False)
mixed_use_df.to_csv(TYPE_DESCIP_MIXU_PATH, index=False)
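
The split in this step is only reversible if both tables keep `OBJECTID` and the key is unique. The following is a minimal sketch of that check, using a small made-up frame in place of the real CSV; the sample values and the `split_by_columns` helper are hypothetical and not part of the original notebook.

```python
import pandas as pd


def split_by_columns(df, key, geo_columns):
    """Return (geo, attrs) frames that both keep the key column."""
    geo = df[[key, *geo_columns]].copy()
    attrs = df.drop(columns=list(geo_columns))
    return geo, attrs


# Made-up rows standing in for the Future Land Use CSV
sample = pd.DataFrame({
    "OBJECTID": [1, 2, 3],
    "FutLU21": ["RES", "COM", "IND"],
    "Shape__Area": [10.5, 20.0, 7.25],
    "Shape__Length": [13.0, 18.0, 11.0],
})

geo, attrs = split_by_columns(sample, "OBJECTID", ["Shape__Area", "Shape__Length"])

# The key must be unique for the split to be reversible
assert geo["OBJECTID"].is_unique

# Rejoining on the key should reproduce the original columns and values
rejoined = attrs.merge(geo, on="OBJECTID", validate="one_to_one")
roundtrip_ok = rejoined[sample.columns].equals(sample)
print(roundtrip_ok)
```

Running the same uniqueness check on `data_df['OBJECTID']` before writing the two CSVs would surface duplicate identifiers early.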

### Step 2: Preparing the RO-Crate
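
The outline lists creating a Frictionless table schema for the attributes CSV, but no cell in this notebook writes the file at `ATTRIBUTES_SCHEMA_PATH`, so it presumably came from an earlier run or another tool. The following is a sketch of one way to build a Table Schema descriptor from the `dtype` mapping used in Step 1. The `build_table_schema` helper and the type mapping are assumptions for illustration, not the author's method.

```python
import json

# Assumed mapping from the dtype names used in this notebook to
# Table Schema field types (only the two kinds that appear in `dtype`)
PANDAS_TO_TABLE_SCHEMA = {"int": "integer", "str": "string"}


def build_table_schema(dtype_map, primary_key):
    """Build a Table Schema descriptor (a plain dict) from a dtype mapping."""
    fields = [
        {"name": name, "type": PANDAS_TO_TABLE_SCHEMA[kind]}
        for name, kind in dtype_map.items()
    ]
    return {"fields": fields, "primaryKey": primary_key}


# Same columns as the `dtype` dictionary in Step 1
dtype = {
    "OBJECTID": "int",
    "ExLU21": "str",
    "FutLU21": "str",
    "MIXcode": "str",
    "JoinAll": "str",
    "Place": "str",
    "last_edited_date": "str",
    "County": "str",
    "FUTsimple": "str",
}

schema = build_table_schema(dtype, primary_key="OBJECTID")

# YAML 1.2 accepts JSON syntax, so this text can be stored in a .yaml file
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Writing `schema_text` to `ATTRIBUTES_SCHEMA_PATH` before the packaging cell would make sure the schema file exists when the ZIP is built.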

In [14]:
# Save the RO-Crate metadata to a JSON file
with open(RO_CRATE_METADATA_PATH, 'w') as f:
    json.dump(RO_CRATE_METADATA, f, indent=4)

# Package the RO-Crate into a ZIP file
# (the schema file at ATTRIBUTES_SCHEMA_PATH must already exist at this point)
with zipfile.ZipFile(RO_CRATE_PATH, 'w') as ro_zip:
    ro_zip.write(FUTURE_LAND_USE_GEO_OUTPUT_PATH, 'Future_Land_use__geography.csv')
    ro_zip.write(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, 'Future_Land_use__attributes.csv')
    ro_zip.write(ATTRIBUTES_SCHEMA_PATH, 'attributes_schema.yaml')
    ro_zip.write(RO_CRATE_METADATA_PATH, 'ro-crate-metadata.json')
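
The metadata lists part names separately from the `write` calls, so the two can drift apart. The following is a sketch of a consistency check that compares the `hasPart` entries in `ro-crate-metadata.json` with the archive contents. The `missing_crate_parts` helper and the in-memory sample crate are hypothetical.

```python
import io
import json
import zipfile


def missing_crate_parts(zip_file):
    """Return hasPart ids declared in the metadata but absent from the archive."""
    names = set(zip_file.namelist())
    metadata = json.loads(zip_file.read("ro-crate-metadata.json"))
    root = next(e for e in metadata["@graph"] if e.get("@id") == "./")
    declared = [part["@id"] for part in root.get("hasPart", [])]
    return [part_id for part_id in declared if part_id not in names]


# Build a tiny in-memory crate to exercise the check (contents are made up)
metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [{"@id": "data.csv"}, {"@id": "schema.json"}],
        },
    ],
}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as crate:
    crate.writestr("ro-crate-metadata.json", json.dumps(metadata))
    crate.writestr("data.csv", "OBJECTID\n1\n")
    crate.writestr("schema.yaml", "fields: []\n")  # name differs from the metadata

with zipfile.ZipFile(buffer) as crate:
    missing = missing_crate_parts(crate)
print(missing)
```

Opening `RO_CRATE_PATH` with `zipfile.ZipFile` and passing it to the same helper would apply the check to the crate produced by this step.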

### Step 3: Exporting standard GeoPackage from Shapefile geodataframe and CSV dataframe

In [15]:
# Read the CSV file as a dataframe
attributes_df = pd.read_csv(FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH, dtype=dtype)

# Read the Shapefile as a geodataframe
shapefile_gdf = gpd.read_file(SHAPEFILE_PATH)

# Convert the Shapefile to Ohio State Plane South coordinate reference system
shapefile_gdf = shapefile_gdf.to_crs(epsg=3735)

# Join the CSV dataframe to the Shapefile geodataframe using the unique identifier field
merged_gdf = shapefile_gdf.merge(attributes_df, on='OBJECTID')

# Export the resulting geodataframe as a GeoPackage
merged_gdf.to_file(OUTPUT_GEOPACKAGE_PATH, driver='GPKG')

print(f"GeoPackage has been saved to {OUTPUT_GEOPACKAGE_PATH}")

GeoPackage has been saved to output_data\Future_Land_use.gpkg
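
An inner merge silently drops rows whose key is missing on either side, and columns present in both tables come back with `_x` and `_y` suffixes. The sketch below shows one way to guard against both, using plain pandas frames in place of the GeoDataFrame. The sample data is made up, and whether the real Shapefile repeats the attribute columns is not known here. `GeoDataFrame.merge` is expected to accept the same keyword arguments, but treat that as an assumption to verify.

```python
import pandas as pd

# Made-up stand-ins for the Shapefile table and the attributes CSV
shapes = pd.DataFrame({"OBJECTID": [1, 2, 3], "County": ["A", "B", "C"]})
attrs = pd.DataFrame({
    "OBJECTID": [1, 2, 4],
    "County": ["A", "B", "D"],
    "FutLU21": ["RES", "COM", "IND"],
})

# Keep the key plus only those attribute columns the left table lacks,
# so no column is duplicated with _x/_y suffixes
new_columns = [
    c for c in attrs.columns if c == "OBJECTID" or c not in shapes.columns
]

merged = shapes.merge(
    attrs[new_columns],
    on="OBJECTID",
    how="left",             # keep every geometry row
    validate="one_to_one",  # raise if the key repeats on either side
    indicator=True,         # adds a _merge column describing each row's match
)

# Geometry rows that found no matching attributes
unmatched = merged.loc[merged["_merge"] == "left_only", "OBJECTID"].tolist()
print(list(merged.columns))
print(unmatched)
```

Dropping the `_merge` column before calling `to_file` keeps the helper column out of the exported GeoPackage.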


### Step 4: Preview Outputs

In [16]:
# Check for the presence of the output files
output_files = [
    ATTRIBUTES_SCHEMA_PATH,
    FUTURE_LAND_USE_ATTRIB_OUTPUT_PATH,
    FUTURE_LAND_USE_GEO_OUTPUT_PATH,
    RO_CRATE_PATH,
    OUTPUT_GEOPACKAGE_PATH
]

missing_files = [file for file in output_files if not os.path.exists(file)]
if missing_files:
    print("The following expected output files are missing:")
    for file in missing_files:
        print(file)
else:
    print("All expected output files are present.")

All expected output files are present.
