## Description of Program
- program:    ICD_06d_UploadINCORE
- task:       Upload file to IN-CORE Data
- See GitHub commits for description of program updates
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import os # For saving output to path
import scooby # Report Python Environment
import urllib.parse # urljoin is used to build the IN-CORE search URL
import sys

In [2]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas','geopandas','pyincore','urllib']))


--------------------------------------------------------------------------------
  Date: Fri Apr 22 16:47:02 2022 Central Daylight Time

                OS : Windows
            CPU(s) : 12
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.6 GiB
       Environment : Jupyter

  Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:45)
  [MSC v.1929 64 bit (AMD64)]

            pandas : 1.4.2
         geopandas : 0.10.2
          pyincore : 1.4.0
            urllib : Version unknown
             numpy : 1.22.3
             scipy : 1.8.0
           IPython : 8.2.0
        matplotlib : 3.5.1
            scooby : 0.5.12
--------------------------------------------------------------------------------


### Set up notebook environment to access cloned GitHub package
This notebook uses functions that are in development. The current version of the package is available at:

https://github.com/npr99/intersect-community-data

Nathanael Rosenheim. (2022). npr99/intersect-community-data. Zenodo. https://doi.org/10.5281/zenodo.6476122

A permanent copy of the package and example datasets are available in the DesignSafe-CI repository:

Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

In [3]:
# Turn on autoreload so that edits to submodules are reloaded automatically
%load_ext autoreload
%autoreload 2
# Import the reusable directory setup function from the cloned package
from ICD_00b_directory_design import directory_design


## Set up access to IN-CORE
For instructions on how to set up and install pyincore, see:

https://incore.ncsa.illinois.edu/

In [4]:
from pyincore import IncoreClient, DataService, SpaceService, Dataset

In [5]:
client = IncoreClient()

Connection successful to IN-CORE services. pyIncore version detected: 1.4.0


In [6]:
data_services = DataService(client)

## Read In Census data for a County

In [12]:
version = '2.0.0'
version_text = 'v2-0-0'

# Output folder: a short folder name is used because the full project path is long.
# Files produced by this program are saved with the program name,
# which makes the overall workflow easier to follow.
outputfolder = "OutputData"

# Set random seed for reproducibility
seed = 1000
basevintage = 2010

communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}},                   
                }

# List of all communities available in IN-CORE
# (this replaces the single-community dictionary defined above)
communities = {'Lumberton_NC' : {
                    'community_name' : 'Lumberton, NC',
                    'counties' : { 
                        1 : {'FIPS Code' : '37155', 'Name' : 'Robeson County, NC'}}},                   
                'Shelby_TN' : {
                    'community_name' : 'Memphis, TN',
                    'counties' : { 
                        1 : {'FIPS Code' : '47157', 'Name' : 'Shelby County, TN'}}},
                'Joplin_MO' : {
                    'community_name' : 'Joplin, MO',
                    'counties' : { 
                        1 : {'FIPS Code' : '29097', 'Name' : 'Jasper County, MO'},
                        2 : {'FIPS Code' : '29145', 'Name' : 'Newton County, MO'}}},
                'Seaside_OR' : {
                    'community_name' : 'Seaside, OR',
                    'counties' : { 
                        1 : {'FIPS Code' : '41007', 'Name' : 'Clatsop County, OR'}}},                   
                'Galveston_TX' : {
                    'community_name' : 'Galveston, TX',
                    'counties' : { 
                        1 : {'FIPS Code' : '48167', 'Name' : 'Galveston County, TX'}}},
                'Mobile_AL' : {
                    'community_name' : 'Mobile, AL',
                    'counties' : { 
                        1 : {'FIPS Code' : '01097', 'Name' : 'Mobile County, AL'}}}                    
                }
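
The nested `communities` dictionary can be flattened into one row per county, which makes it easy to check the FIPS codes before running the upload loop. The following is a minimal sketch; the helper name `iter_counties` and the trimmed `example_communities` dictionary are illustrative and not part of the package. FIPS codes are kept as strings so that leading zeros (for example `'01097'`) are preserved.

```python
def iter_counties(communities):
    """Yield (community_key, fips_code, county_name) for every county."""
    for community_key, community in communities.items():
        for county in community['counties'].values():
            yield community_key, county['FIPS Code'], county['Name']

# Trimmed copy of the dictionary above, for illustration only
example_communities = {
    'Joplin_MO': {
        'community_name': 'Joplin, MO',
        'counties': {
            1: {'FIPS Code': '29097', 'Name': 'Jasper County, MO'},
            2: {'FIPS Code': '29145', 'Name': 'Newton County, MO'}}},
}

county_rows = list(iter_counties(example_communities))

# County FIPS codes are five-digit strings (2-digit state + 3-digit county)
for community_key, fips_code, county_name in county_rows:
    assert len(fips_code) == 5 and fips_code.isdigit(), fips_code
    print(community_key, fips_code, county_name)
```

Passing the full `communities` dictionary instead of `example_communities` would run the same check on every community.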

In [8]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data\\pyhui'

In [9]:
# Move up one directory to intersect-community-data directory
os.chdir('..')

In [10]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\IN-CORE\\Tasks\\PublishHUIv2\\HousingUnitInventories_2022-03-03\\ReplicationCode\\intersect-community-data'
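
Because `os.chdir('..')` is relative, running that cell a second time moves up one more level. One way to make the step repeatable is to look for the repository folder by name. The following is a minimal sketch, assuming the cloned folder keeps the name `intersect-community-data` shown in the path above; `find_repo_root` is a hypothetical helper, not part of the package.

```python
from pathlib import Path

def find_repo_root(start, repo_name='intersect-community-data'):
    """Return `start` or its nearest ancestor whose folder name is `repo_name`."""
    start = Path(start)
    for candidate in [start, *start.parents]:
        if candidate.name == repo_name:
            return candidate
    raise FileNotFoundError(f'{repo_name} not found at or above {start}')

# Illustrative relative path; no folders need to exist for the lookup itself
example_start = Path('projects') / 'intersect-community-data' / 'pyhui'
repo_root = find_repo_root(example_start)
print(repo_root)
```

In the notebook this could be used as `os.chdir(find_repo_root(os.getcwd()))`, which leaves the working directory unchanged if it is already the repository root.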

## Loop through Communities and Upload Files to IN-CORE
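
The cell below searches IN-CORE for each dataset title and uploads the file only when no existing dataset matches. The matching step can be sketched as a small function that is testable without a connection to IN-CORE. This is a sketch under assumptions: it takes the search response to be a list of dictionaries with `id`, `title`, and `fileDescriptors` keys, as the cell below accesses them. `find_existing_dataset_id` is a hypothetical helper, and the search results and ids are made up for illustration. Unlike the cell below, it checks every file descriptor rather than only the first.

```python
def find_existing_dataset_id(search_results, title, filename):
    """Return the id of the first result matching title and filename, else None."""
    for dataset in search_results:
        descriptors = dataset.get('fileDescriptors') or []
        filenames = [descriptor.get('filename') for descriptor in descriptors]
        if dataset.get('title') == title and filename in filenames:
            return dataset.get('id')
    return None

# Made-up search results; the ids are placeholders, not real IN-CORE ids
fake_results = [
    {'id': 'placeholder-id-1',
     'title': 'Housing Unit Inventory v2.0.0 data for Lumberton, NC (draft)',
     'fileDescriptors': [{'filename': 'draft.csv'}]},
    {'id': 'placeholder-id-2',
     'title': 'Housing Unit Inventory v2.0.0 data for Lumberton, NC',
     'fileDescriptors': [{'filename': 'hui_v2-0-0_Lumberton_NC_2010_rs1000.csv'}]},
]

found_id = find_existing_dataset_id(
    fake_results,
    'Housing Unit Inventory v2.0.0 data for Lumberton, NC',
    'hui_v2-0-0_Lumberton_NC_2010_rs1000.csv')
missing_id = find_existing_dataset_id(
    fake_results,
    'Housing Unit Inventory v2.0.0 data for Memphis, TN',
    'hui_v2-0-0_Shelby_TN_2010_rs1000.csv')
print(found_id, missing_id)  # prints: placeholder-id-2 None
```

Returning `None` when nothing matches lets the caller decide once, after all results have been checked, whether to create and upload a new dataset.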


In [13]:
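# For each community: create the directory structure, build the metadata,
# and upload the file to IN-CORE if it is not already there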
for community in communities.keys():
    print("Setting up Housing Unit and Person Record Inventories for",communities[community]['community_name'])
    title = "Housing Unit Inventory v2.0.0 data for "+communities[community]['community_name']
    county_list = ''
    for county in communities[community]['counties'].keys():
        state_county = communities[community]['counties'][county]['FIPS Code']
        state_county_name  = communities[community]['counties'][county]['Name']
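        # Separate counties with '; ' so multi-county communities stay readable
        if county_list:
            county_list = county_list + '; '
        county_list = county_list + state_county_name+': county FIPS Code '+state_county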

        print(state_county_name,': county FIPS Code',state_county)
    
        outputfolders = directory_design(state_county_name = state_county_name,
                                            outputfolder = outputfolder)

    print("Community includes the following counties:",county_list)

    # Read in file to upload
    output_filename = f'hui_{version_text}_{community}_{basevintage}_rs{seed}'
    csv_filepath = outputfolders['top']+"/"+output_filename+'.csv'
    upload_to_incore_df = pd.read_csv(csv_filepath)

    # Set up metadata
    ## Metadata is a string describing the dataset.
    ## dataType needs to align with the analyses in pyincore.
    ## format is the file format of the dataset.
    ## Supported formats currently include “shapefile”, “table”, “Network”,
    ## “textFiles”, “raster”, and “geotiff”, among others.
    ## Please consult the development team if you intend to post a new format.
    hui_description =  '\n'.join(["2010 Housing Unit Inventory v2.0.0 with required IN-CORE columns. " 
                   "Compatible with pyincore v1.4. " 
                   "Unit of observation is housing unit. " 
                   "Detailed characteristics include number of persons, race, ethnicity, "
                   "vacancy type, group quarters type, and household income. " 
                   "For more details on this data file refer to " 
                   "Rosenheim, Nathanael (2021) 'Detailed Household and " 
                   "Housing Unit Characteristics: Data and Replication Code.' "
                   "DesignSafe-CI. https://doi.org/10.17603/ds2-jwf6-s535. "
                   "For more details on the replication code, refer to " 
                   "Rosenheim, Nathanael. (2022). npr99/intersect-community-data. Zenodo. " 
                   "https://doi.org/10.5281/zenodo.6476122. "
                   "File includes data for "+county_list])

    # Note: both dataType and format must be set correctly
    dataset_metadata = {
        "title":title,
        "description": hui_description,
        "dataType": "incore:housingUnitInventory",
        "format": "table"
        }
    ## Upload files to IN-CORE
    # Check if dataset already exists in IN-CORE
    # if it does, skip upload
    # if it doesn't, upload

    # Search Data Services for dataset

    url = urllib.parse.urljoin(data_services.base_url, "search")
    search_title = {"text": title}
    matched_datasets = data_services.client.get(url, params=search_title)

    match_count = len(matched_datasets.json())
    print(f'Number of datasets matching {title}: {match_count}')

    # The search can return datasets that do not match exactly, so look for
    # an exact match on both title and filename across all results.
    dataset_id = None
    for dataset in matched_datasets.json():
        file_descriptors = dataset.get('fileDescriptors') or []
        incore_filename = file_descriptors[0]['filename'] if file_descriptors else None
        if (dataset['title'] == title) and (incore_filename == output_filename+'.csv'):
            print(f'Dataset {title} already exists in IN-CORE')
            print(f'Dataset already exists in IN-CORE with filename {incore_filename}')
            dataset_id = dataset['id']
            print("Use dataset_id:",dataset_id)
            break

    # Upload at most once, and only if no exact match was found among the results
    if dataset_id is None:
        print(f'Dataset {title} does not exist in IN-CORE')
        created_dataset = data_services.create_dataset(properties = dataset_metadata)
        dataset_id = created_dataset['id']
        print('dataset is created with id ' + dataset_id)

        ## Attach files to the dataset created
        files = [csv_filepath]
        full_dataset = data_services.add_files_to_dataset(dataset_id, files)

        print('The file(s): '+ output_filename +" have been uploaded to IN-CORE")

Setting up Housing Unit and Person Record Inventories for Lumberton, NC
Robeson County, NC : county FIPS Code 37155
Community includes the following counties: Robeson County, NC: county FIPS Code 37155
Number of datasets matching Housing Unit Inventory v2.0.0 data for Lumberton, NC: 1
Dataset Housing Unit Inventory v2.0.0 data for Lumberton, NC already exists in IN-CORE
Dataset already exists in IN-CORE with filename hui_v2-0-0_Lumberton_NC_2010_rs1000.csv
Use dataset_id: 6262ef3204ce841cbeb30993
Setting up Housing Unit and Person Record Inventories for Memphis, TN
Shelby County, TN : county FIPS Code 47157
Community includes the following counties: Shelby County, TN: county FIPS Code 47157
Number of datasets matching Housing Unit Inventory v2.0.0 data for Memphis, TN: 0
Dataset Housing Unit Inventory v2.0.0 data for Memphis, TN does not exist in IN-CORE
dataset is created with id 6263229f04ce841cbeb309ee
The file(s): hui_v2-0-0_Shelby_TN_2010_rs1000 have been uploaded to IN-CORE
Setti