# Upload Block Data with School Attendance Zones to IN-CORE
    

## Description of Program
- program:    IN-CORE_1gv1_UploadBlockSABS_2021-07-07
- task:       Upload Block data with School Attendance Zones to IN-CORE
- Version:    2021-07-07
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, N. (2021) “Obtain, Clean, and Explore Labor Market Allocation Methods”.
Archived on Github and ICPSR.

Instructions on how to add a dataset to IN-CORE:
https://incore.ncsa.illinois.edu/doc/incore/notebooks/create_dataset/create_dataset.html

In [None]:
import pandas as pd
from pyincore import IncoreClient, DataService, SpaceService, Dataset

import os # For saving output to path

In [None]:
# Display versions being used - important information for replication
import sys
print("Python Version     ", sys.version)
print("pandas version:    ", pd.__version__)

Python Version      3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:37:01) [MSC v.1916 64 bit (AMD64)]
pandas version:     1.2.5


In [None]:
client = IncoreClient()

Enter username: natrose
Enter password: ········
Connection successful to IN-CORE services. pyIncore version detected: 0.9.4


In [None]:
data_service = DataService(client)
space_service = SpaceService(client)

## Check if File has already been Uploaded
If this notebook has already been run, the file will be accessible on IN-CORE.

In [None]:
# Data file id
dataset_id = "60e5edd3d3c92a78c8940d06"
# Load the block-level dataset (the service caches it locally after the first download)
dataset = Dataset.from_data_service(dataset_id, data_service)
filename = dataset.get_file_path('csv')
metadata = data_service.get_dataset_metadata(dataset_id=dataset_id)
print("The IN-CORE Dataservice has saved the dataset titled: "+metadata['title']+" on your local machine: "+filename)
print("\nDataset Description: \n"+metadata['description'])

Dataset already exists locally. Reading from local cached zip.
Unzipped folder found in the local cache. Reading from it...
The IN-CORE Dataservice has saved the dataset titled: Robeson County, NC Census Blocks and School Attendance Boundaries on your local machine: C:\Users\nathanael99\.incore\cache_data\60e5edd3d3c92a78c8940d06\IN-CORE_1hv1_AddSABS_BlockData_2021-07-01\IN-CORE_1hv1_AddSABS_BlockData_2021-07-01.csv

Dataset Description: 
2010 Census Blocks matched to 2015-2016 NCES School Attendance Boundaries. 2010 Census Block ID: BLOCKID10Primary schools (ncessch_1: NCESid, primary_schnm: Primary School Name) Middle schools (ncessch_2: NCESid, mid_schnm: Middle School Name) High schools (ncessch_3: NCESid, high_schnm: High School Name) Unique ids are primary and foreign keys to link to Housing Unit Inventory and School Characteristics.


## Read In Data to Upload

In [None]:
sourceprogram = "IN-CORE_1hv1_AddSABS_BlockData_2021-07-01"
filename = sourceprogram+".csv"
block_df = pd.read_csv(os.path.join(sourceprogram, filename))
block_df.head(2)

| Unnamed: 0 | BLOCKID10 | ncessch_3 | high_schnm | ncessch_2 | mid_schnm | ncessch_1 | primary_schnm |
|---|---|---|---|---|---|---|---|
| 0 | 371559619002028 | 370393002184 | South Robeson High | 370393000000.0 | Fairgrove Middle | 370393001571 | Green Grove Elementary |
| 1 | 371559619002054 | 370393002232 | Fairmont High | 370393000000.0 | Fairmont Middle | 370393002241 | Rosenwald Elementary |
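The `ncessch_2` column displays as `370393000000.0`, which suggests pandas parsed it as a float (possibly because some blocks have no middle-school id), and `Unnamed: 0` looks like an index that was written out with the CSV. A minimal sketch, assuming the CSV keeps that leading index column, of reading the ID columns as exact strings: the two rows reuse values from the output above but omit `ncessch_2` and `mid_schnm`, because the exact middle-school ids cannot be recovered from the float display.

```python
import io

import pandas as pd

# Two-row excerpt of the table above, standing in for the real CSV file.
# The empty first header field is the index written out with the data.
csv_text = (
    ",BLOCKID10,ncessch_3,high_schnm,ncessch_1,primary_schnm\n"
    "0,371559619002028,370393002184,South Robeson High,370393001571,Green Grove Elementary\n"
    "1,371559619002054,370393002232,Fairmont High,370393002241,Rosenwald Elementary\n"
)

# Census block and NCES school ids are identifiers, not numbers.
id_columns = ["BLOCKID10", "ncessch_3", "ncessch_1"]

# index_col=0 restores the saved index instead of creating an "Unnamed: 0"
# column; dtype=str keeps every id digit-for-digit as text.
block_df = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    dtype={col: str for col in id_columns},
)
print(block_df.dtypes)
print(block_df["BLOCKID10"].tolist())
```

In the notebook the same `index_col` and `dtype` arguments would go on the existing `pd.read_csv` call. This only affects local exploration: the file uploaded to IN-CORE is the CSV on disk, not `block_df`.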


## Write Metadata
- Metadata is a dictionary of fields (title, description, dataType, format) describing the dataset.
- dataType needs to align with the analyses in pyincore.
- format is the file format of the dataset. Currently supported formats include “shapefile”, “table”, “Network”, “textFiles”, “raster”, and “geotiff”, among others. Please consult the development team if you intend to post a new format.
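Before uploading, it can help to check the metadata against the rules above. The helper below is hypothetical (not part of pyincore): it flags missing or empty fields and any format outside the ones listed here. Because the list ends with "and etc.", an unlisted format is reported for review rather than rejected. Note that it would flag the empty `dataType` used later in this notebook.

```python
# Formats named in this notebook; the list is not exhaustive, so an unlisted
# format is reported for review rather than treated as invalid.
LISTED_FORMATS = {"shapefile", "table", "Network", "textFiles", "raster", "geotiff"}
REQUIRED_KEYS = ("title", "description", "dataType", "format")


def check_metadata(metadata: dict) -> list:
    """Return human-readable problems found in a dataset metadata dict (hypothetical helper)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"missing key: {key}")
        elif not str(metadata[key]).strip():
            problems.append(f"empty value: {key}")
    fmt = metadata.get("format")
    if fmt and fmt not in LISTED_FORMATS:
        problems.append(f"format {fmt!r} is not one of the listed formats; consult the development team")
    return problems


example = {"title": "Example", "description": "Example table", "dataType": "", "format": "table"}
print(check_metadata(example))  # ['empty value: dataType']
```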

In [None]:
# Note: both dataType and format must be set correctly
dataset_metadata = {
    "title":"Robeson County, NC Census Blocks and School Attendance Boundaries",
    "description": "2010 Census Blocks matched to 2015-2016 NCES School Attendance Boundaries. "+
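                   # No "+" or trailing space on the next line: Python joins adjacent literals, producing
                   # "BLOCKID10Primary". Kept as-is to match the description already stored on IN-CORE.
                   "2010 Census Block ID: BLOCKID10"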
                   "Primary schools (ncessch_1: NCESid, primary_schnm: Primary School Name) "+
                   "Middle schools (ncessch_2: NCESid, mid_schnm: Middle School Name) "+
                   "High schools (ncessch_3: NCESid, high_schnm: High School Name) " +
                   "Unique ids are primary and foreign keys to link to Housing Unit Inventory and School Characteristics.",
    "dataType": "",
    "format": "table"
}
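The stored description reads `BLOCKID10Primary` because Python concatenates adjacent string literals with no separator, exactly as `+` does. A small sketch of the behavior and of one alternative, `" ".join(...)` over a list of sentences, which makes separators explicit. Changing the real description this way would make the title and description comparison in the next cells fail and trigger a new upload, so this is only illustrative.

```python
# Adjacent string literals are joined with no separator, just like "+".
implicit = ("2010 Census Block ID: BLOCKID10"
            "Primary schools (ncessch_1: NCESid, primary_schnm: Primary School Name)")
print(implicit)

# Joining a list of sentences makes the separator explicit.
parts = [
    "2010 Census Block ID: BLOCKID10.",
    "Primary schools (ncessch_1: NCESid, primary_schnm: Primary School Name)",
]
explicit = " ".join(parts)
print(explicit)
```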

In [None]:
dataset_metadata

{'title': 'Robeson County, NC Census Blocks and School Attendance Boundaries',
 'description': '2010 Census Blocks matched to 2015-2016 NCES School Attendance Boundaries. 2010 Census Block ID: BLOCKID10Primary schools (ncessch_1: NCESid, primary_schnm: Primary School Name) Middle schools (ncessch_2: NCESid, mid_schnm: Middle School Name) High schools (ncessch_3: NCESid, high_schnm: High School Name) Unique ids are primary and foreign keys to link to Housing Unit Inventory and School Characteristics.',
 'dataType': '',
 'format': 'table'}

In [None]:
metadata['title'] == dataset_metadata['title'] 

True

In [None]:
metadata['description'] == dataset_metadata['description'] 

True
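The two cells above each print a single `True`/`False`. A hypothetical helper like the one below reports *which* fields differ and shows both values, which makes a mismatch (for example, a missing space in the description) easier to spot. The sample dictionaries are shortened stand-ins, not the real metadata.

```python
def metadata_differences(remote: dict, local: dict, keys=("title", "description")) -> dict:
    """Map each differing key to its (remote, local) pair (hypothetical helper)."""
    return {
        key: (remote.get(key), local.get(key))
        for key in keys
        if remote.get(key) != local.get(key)
    }


remote = {"title": "Robeson County, NC Census Blocks and School Attendance Boundaries",
          "description": "2010 Census Block ID: BLOCKID10Primary schools"}
local = {"title": "Robeson County, NC Census Blocks and School Attendance Boundaries",
         "description": "2010 Census Block ID: BLOCKID10 Primary schools"}
print(metadata_differences(remote, local))
```

In the notebook this would be called as `metadata_differences(metadata, dataset_metadata)`; an empty dictionary means title and description match.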

## Check if file exists on IN-CORE and then upload metadata and file
After the metadata is uploaded, a “placeholder” dataset object is created on the IN-CORE service with a new id, but no files are attached to it yet. The empty dataset can nonetheless already be found on the service by searching for that id.
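To illustrate the two-step flow without contacting IN-CORE, here is an in-memory stand-in. The class `InMemoryDataService` is hypothetical: it only borrows the method names the notebook calls (`create_dataset`, `add_files_to_dataset`, `get_dataset_metadata`) and the `fileDescriptors` field it reads. The generated ids and the assumption that a placeholder has an empty `fileDescriptors` list are illustrative, not the real service's behavior.

```python
import itertools
import os


class InMemoryDataService:
    """Hypothetical stand-in that mimics the create-then-attach flow."""

    def __init__(self):
        self._datasets = {}
        self._ids = itertools.count(1)

    def create_dataset(self, properties: dict) -> dict:
        # Step 1: store the metadata only; the placeholder has no files yet.
        dataset_id = f"placeholder-{next(self._ids)}"
        record = dict(properties, id=dataset_id, fileDescriptors=[])
        self._datasets[dataset_id] = record
        return record

    def add_files_to_dataset(self, dataset_id: str, filepath_list: list) -> dict:
        # Step 2: attach files to the existing placeholder by id.
        record = self._datasets[dataset_id]
        record["fileDescriptors"].extend(
            {"filename": os.path.basename(path)} for path in filepath_list
        )
        return record

    def get_dataset_metadata(self, dataset_id: str) -> dict:
        return self._datasets[dataset_id]


service = InMemoryDataService()
created = service.create_dataset({"title": "Example", "format": "table"})
print(service.get_dataset_metadata(created["id"])["fileDescriptors"])  # []
service.add_files_to_dataset(created["id"], ["example_dir/example.csv"])
print(service.get_dataset_metadata(created["id"])["fileDescriptors"])  # [{'filename': 'example.csv'}]
```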

In [None]:
# If the dataset has already been uploaded, skip this step
if (metadata['title'] == dataset_metadata['title']) and \
   (metadata['description'] == dataset_metadata['description']) and \
   (metadata['fileDescriptors'][0]['filename'] == filename):
    print("dataset with id "+dataset_id+" has the same filename, title and description.")
    print('The file(s): '+ filename+ " can be accessed using IN-CORE dataservice.")
else:
    # Create the placeholder dataset from the metadata (no files attached yet)
    created_dataset = data_service.create_dataset(dataset_metadata)
    dataset_id = created_dataset['id']
    print('dataset is created with id ' + dataset_id)
    
    ## Attach files to the dataset created
    files = [os.path.join(sourceprogram, filename)]
    full_dataset = data_service.add_files_to_dataset(dataset_id, files)
    
    print('The file(s): '+ filename +" have been uploaded to IN-CORE")

dataset with id 60e5edd3d3c92a78c8940d06 has the same filename, title and description.
The file(s): IN-CORE_1hv1_AddSABS_BlockData_2021-07-01.csv can be accessed using IN-CORE dataservice.
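The skip check above indexes `metadata['fileDescriptors'][0]`, which would raise an `IndexError` if the dataset exists only as a placeholder with no files attached. A sketch of the same decision as a hypothetical function, assuming such a placeholder reports an empty (or missing) `fileDescriptors` list:

```python
def already_uploaded(remote: dict, local: dict, filename: str) -> bool:
    """True when the remote dataset matches title, description and first file name."""
    # .get(..., []) guards against a placeholder whose files were never attached,
    # where indexing fileDescriptors[0] directly would raise an IndexError.
    descriptors = remote.get("fileDescriptors", [])
    return (
        remote.get("title") == local.get("title")
        and remote.get("description") == local.get("description")
        and bool(descriptors)
        and descriptors[0].get("filename") == filename
    )


local = {"title": "T", "description": "D"}
placeholder = {"title": "T", "description": "D", "fileDescriptors": []}
complete = {"title": "T", "description": "D", "fileDescriptors": [{"filename": "data.csv"}]}
print(already_uploaded(placeholder, local, "data.csv"))  # False
print(already_uploaded(complete, local, "data.csv"))  # True
```

In the notebook the condition would become `already_uploaded(metadata, dataset_metadata, filename)`. One trade-off: a placeholder without files would then fall through to the `else` branch and create a new dataset rather than reuse the existing id.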
