# Title

## Introduction

This process depends on upstream processes. See the "Prerequisites and usage notes" section below.

The workflow defined herein is identified as workflow ID #TBD in the [Data Team Master Document List](https://morpc1.sharepoint.com/:x:/s/GISteam/EfC4j3HhohZCrSZzxJdyt5cBFEqVD7zHick8ZW0INqgCYA?e=0WhrAI). References to document list identifiers are denoted by a number in brackets, e.g. [TBD].

## Process outline

## Prerequisites and usage notes

  - Outputs of one or more upstream workflows must be available at the indicated paths. Make sure that those outputs are up to date prior to running this script. 
  - This script includes several intentional RuntimeError instances that may be triggered to alert the user to conditions that may require their attention. If the script triggers one of these errors, review the error, verify that the condition is acceptable or resolve any issues, then proceed.
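
The error-or-warning behavior described above can be sketched as follows. This is a minimal illustration, not code from this workflow: `check_freshness` and its `max_age_days` threshold are hypothetical names, and file modification time is assumed to be an acceptable proxy for staleness.

```python
import os
import time
import warnings

def check_freshness(path, max_age_days, interrupt=True):
    """Hypothetical helper: flag a file whose modification time is older than max_age_days.

    Returns the file age in days. If the file is too old, raises RuntimeError when
    interrupt is True; otherwise issues a warning and returns normally.
    """
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    if age_days <= max_age_days:
        return age_days
    message = "{} was last modified {:.1f} days ago (limit: {} days)".format(path, age_days, max_age_days)
    if interrupt:
        raise RuntimeError(message)
    warnings.warn(message)
    return age_days
```

Passing `interrupt=False` downgrades the error to a warning so that execution continues, which matches the intent of the STALE_DATA_INTERRUPT parameter defined under "User-specified parameters".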

## Setup

### Import required packages

In [None]:
import os
import morpc

### User-specified parameters

In [None]:
# When STALE_DATA_INTERRUPT == True, the script will produce a RuntimeError in certain situations where the input 
# data may be stale and updates might be required prior to running the script.  Otherwise, a warning will be generated 
# but script execution will continue.  Regardless of whether an error or warning occurs, be sure to verify the readiness 
# of all input data.
STALE_DATA_INTERRUPT = True

# This script may pull data from outputs of upstream workflows.  The locations of these outputs are specified by their path relative to
# a GitHub root directory. This is a single directory which is presumed to contain local working copies of MORPC GitHub repositories.
# Specify the path to the directory on your system where the local working copies are stored. By default, the GitHub root directory is
# assumed to be one level up from this script.
GITHUB_ROOT = "../"

# Specify the path to the directory where the input data is stored. Sometimes the data may be sourced from this location and sometimes 
# it may be sourced from elsewhere and archived here.
INPUT_DIR = "./input_data"

# Specify the path to the directory where the output data is stored. Typically it is not necessary to change this, and changing it for 
# established scripts may break other scripts that depend on outputs from this one.
OUTPUT_DIR = "./output_data"

# Specify the path to the directory where temporary outputs are stored.  Typically this is used to capture data or artifacts that are useful
# for understanding the internal workings of a script but which are not considered to be official outputs of the script and may not be 
# acceptable for use in downstream workflows.
TEMP_DIR = "./temp_data"
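
Because upstream outputs are located relative to GITHUB_ROOT, it can help to confirm that a path resolves before any processing starts. The sketch below is one possible way to do that. `resolve_upstream` is a hypothetical helper, not part of `morpc`. Note that a relative root such as "../" is resolved against the current working directory, which for a notebook is usually the notebook's own folder.

```python
import os

def resolve_upstream(github_root, repo_name, relative_path):
    """Hypothetical helper: build the path to an upstream output and fail early if it is missing."""
    path = os.path.normpath(os.path.join(github_root, repo_name, relative_path))
    if not os.path.exists(path):
        # Report the absolute path so the user can see where the script actually looked.
        raise FileNotFoundError("Upstream output not found: {}".format(os.path.abspath(path)))
    return path
```

For example, `resolve_upstream(GITHUB_ROOT, "some-github-repo", "output_data/foo.csv")` would either return the normalized path or stop with a message naming the missing file.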

### Static parameters

### Define inputs

The following datasets are required by this notebook. They will be retrieved from the specified location and temporarily stored in INPUT_DIR.

#### Create input data directory

Create input data directory if it doesn't exist.

In [None]:
inputDir = os.path.normpath(INPUT_DIR)
# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(inputDir, exist_ok=True)

#### MORPC counties reference data [81]

Reference data for counties in the MORPC region will be loaded automatically as a morpc.countyLookup() object (see below).

#### Example input dataset [TBD]

In [None]:
# NOTE: As a best practice, the input schema should not be written by the script.  Rather, it should be created separately 
# (optionally with help from frictionless.Schema.describe) and only read by the script.  This will ensure that an error is produced during
# validation if the schema of the output data is inadvertently changed.
INPUT_FILENAME = "foo.csv"
INPUT_PATH = os.path.join(GITHUB_ROOT, "some-github-repo/output_data/{}".format(INPUT_FILENAME))   # Assumes that input is pulled from output of an upstream workflow
INPUT_SCHEMA_PATH = INPUT_PATH.replace(".csv",".schema.yaml")
INPUT_RESOURCE_PATH = INPUT_PATH.replace(".csv",".resource.yaml")
print("Data: {}".format(INPUT_PATH))
print("Schema: {}".format(INPUT_SCHEMA_PATH))
print("Resource: {}".format(INPUT_RESOURCE_PATH))
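
The cell above derives the schema and resource paths with `str.replace`, which substitutes every occurrence of ".csv" in the path, including any that appear in a directory name. A slightly more defensive variant, sketched here with a hypothetical `sidecar_paths` helper, swaps only the final extension using `os.path.splitext`.

```python
import os

def sidecar_paths(data_path):
    """Hypothetical helper: derive schema and resource paths from a data file path.

    Only the final extension is replaced, so ".csv" elsewhere in the path is left alone.
    """
    stem, _extension = os.path.splitext(data_path)
    return stem + ".schema.yaml", stem + ".resource.yaml"

schema_path, resource_path = sidecar_paths("some-github-repo/output_data/foo.csv")
print("Schema: {}".format(schema_path))
print("Resource: {}".format(resource_path))
```

For typical paths the two approaches give the same result; they differ only when ".csv" occurs more than once in the path.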

### Define outputs

#### Create output data directory

Create output data directory if it doesn't exist.

In [None]:
outputDir = os.path.normpath(OUTPUT_DIR)
# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(outputDir, exist_ok=True)

#### Create temporary data directory

Create temporary data directory if it doesn't exist.

In [None]:
tempDir = os.path.normpath(TEMP_DIR)
# exist_ok=True makes this a no-op when the directory already exists
os.makedirs(tempDir, exist_ok=True)

#### Example output dataset [TBD]

In [None]:
OUTPUT_FILENAME = "foo.csv"
OUTPUT_PATH = os.path.join(outputDir, OUTPUT_FILENAME)
OUTPUT_SCHEMA_PATH = OUTPUT_PATH.replace(".csv",".schema.yaml")
OUTPUT_RESOURCE_PATH = OUTPUT_PATH.replace(".csv",".resource.yaml")
print("Data: {}".format(OUTPUT_PATH))
print("Schema: {}".format(OUTPUT_SCHEMA_PATH))
print("Resource: {}".format(OUTPUT_RESOURCE_PATH))

## Prepare input data

### Load county reference data

In [None]:
countyLookup = morpc.countyLookup(scope="15-County Region")

### Load example input data

In [None]:
(inputData, inputResource, inputSchema) = morpc.frictionless.load_data(INPUT_RESOURCE_PATH, archiveDir=inputDir, validate=True)
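
The `validate=True` argument above suggests that the loaded data is checked against its schema. How `morpc.frictionless.load_data` performs that check is not shown in this notebook, so the following is only a simplified, standard-library stand-in for the idea. It compares CSV column names against the field names of a Table Schema style descriptor. The `check_header` helper, the field names, and the sample rows are all hypothetical.

```python
import csv
import io

def check_header(csv_text, schema):
    """Hypothetical helper: compare CSV column names against the schema's field names, in order."""
    expected = [field["name"] for field in schema["fields"]]
    actual = next(csv.reader(io.StringIO(csv_text)))
    if actual != expected:
        raise ValueError("Header {} does not match schema fields {}".format(actual, expected))
    return actual

# Illustrative schema and data only; real schemas are read from the .schema.yaml file.
schema = {"fields": [{"name": "GEOID", "type": "string"}, {"name": "POP", "type": "integer"}]}
check_header("GEOID,POP\n39049,100\n", schema)
```

Because the schema is maintained separately from the script, a renamed or reordered column in the upstream data raises an error here instead of passing through unnoticed.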

## Transform data

## Export data

In [None]:
# outputData is expected to be created in the "Transform data" section above.
outputData.to_csv(OUTPUT_PATH, index=False)

In [None]:
outputResource = morpc.frictionless.create_resource(OUTPUT_FILENAME, 
    resourcePath=OUTPUT_RESOURCE_PATH,
    title="Enter a meaningful title for the output dataset", 
    description="Enter a more detailed description of the output dataset.",
    writeResource=True,
    validate=True
)
outputResource
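
For readers unfamiliar with resource files, the sketch below shows the kind of metadata such a file might record about an exported dataset. It does not reproduce what `morpc.frictionless.create_resource` writes. `describe_file` is a hypothetical helper and its key names are assumptions chosen for illustration.

```python
import hashlib
import os

def describe_file(path, title, description):
    """Hypothetical helper: collect basic descriptive metadata for a data file."""
    with open(path, "rb") as f:
        content = f.read()
    filename = os.path.basename(path)
    stem, extension = os.path.splitext(filename)
    return {
        "name": stem,
        "path": filename,
        "title": title,
        "description": description,
        "format": extension.lstrip("."),
        "bytes": len(content),                           # file size in bytes
        "sha256": hashlib.sha256(content).hexdigest(),   # checksum for detecting later changes
    }
```

Recording the size and checksum alongside the title and description lets a downstream workflow detect whether the file changed after the metadata was written.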