# Student Record File Workflow

## Overview
This notebook uses Python functions to obtain and clean the data required for the Student Record File, drawing on the Census API and National Center for Education Statistics (NCES) files.

The workflow produces a Student Person Record (SREC) file that can be linked to the Person Record File. The file includes student-level records with sex, grade level, race, and ethnicity.

The records are based on NCES data.

The output of this workflow is a CSV file containing the student record file.

The output CSV is designed to be used in the Interdependent Networked Community Resilience Modeling Environment (IN-CORE) for the housing unit allocation model.

IN-CORE is an open-source Python package that can be used to model the resilience of a community. To download IN-CORE, see:

https://incore.ncsa.illinois.edu/


## Instructions
Users can run the workflow by executing each block of code in the notebook.

Users can modify the code to select one county or multiple counties.
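
As a minimal sketch of selecting one or several counties: the notebook's `county_list` holds county FIPS codes as 5-digit strings (the notebook itself uses `'37155'` for Robeson County, NC). The helper `check_county_fips` below is hypothetical and not part of pyncoda, and the second code, `'37051'`, is assumed here to be Cumberland County, NC, added only for illustration.

```python
# Hypothetical helper (not part of pyncoda): confirm each entry is a
# 5-digit county FIPS code stored as a string, so leading zeros survive.
def check_county_fips(county_list):
    for fips in county_list:
        if not (isinstance(fips, str) and len(fips) == 5 and fips.isdigit()):
            raise ValueError(f"Not a 5-digit county FIPS string: {fips!r}")
    return county_list


# One county: the value used in this notebook (Robeson County, NC)
county_list = check_county_fips(['37155'])

# Multiple counties: add more 5-digit codes to the list
# ('37051' is an illustrative assumption, taken to be Cumberland County, NC)
county_list_multi = check_county_fips(['37155', '37051'])
```

When the county selection changes, `communityname` would presumably be updated as well so that the output folder name still describes the selected area.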

## Description of Program
- program:    ncoda_07ev1_run_SREC_workflow
- task:       Obtain School Location and Attendance Boundaries
- version:    2023-02-10
- project:    Interdependent Networked Community Resilience Modeling Environment (IN-CORE) Subtask 5.2 - Social Institutions
- funding:    NIST Financial Assistance Award Numbers: 70NANB15H044 and 70NANB20H008
- author:     Nathanael Rosenheim

- Suggested Citation:
Rosenheim, Nathanael (2021) “Detailed Household and Housing Unit Characteristics: Data and Replication Code.” DesignSafe-CI. 
https://doi.org/10.17603/ds2-jwf6-s535.

## Setup Python Environment

In [110]:
# Import Python Packages Required for program
import pandas as pd       # Pandas for reading in data 
import geopandas as gpd   # Geopandas for reading Shapefiles
import numpy as np        # Numpy for working with arrays
import os                 # Operating System (os) For folders and finding working directory
import sys                # Sys for modifying the module search path
import zipfile            # Zipfile for working with compressed Zipped files
import wget               # Wget for downloading files from the web
import scooby             # Scooby for reporting the Python environment

In [111]:
# Generate report of Python environment
print(scooby.Report(additional=['pandas']))


--------------------------------------------------------------------------------
  Date: Mon Feb 13 15:07:35 2023 Central Standard Time

                OS : Windows
            CPU(s) : 12
           Machine : AMD64
      Architecture : 64bit
               RAM : 31.6 GiB
       Environment : Jupyter

  Python 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:14:58) [MSC
  v.1929 64 bit (AMD64)]

            pandas : 1.3.5
             numpy : 1.24.2
             scipy : 1.10.0
           IPython : 8.10.0
        matplotlib : 3.6.3
            scooby : 0.5.12
--------------------------------------------------------------------------------


In [112]:
# To replicate this notebook, clone the GitHub package to a folder that is a sibling of this notebook.
# To access the sibling package, append the parent directory ('..') to the system path list,
# i.e., append the path of the directory that includes the GitHub repository.
# This step is not required when the package is in a folder below the notebook file.
github_code_path  = ""
sys.path.append(github_code_path)
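
The path step described in the comments above could be sketched as follows. This is an illustration under one assumption: that the cloned repository sits in the parent directory of the notebook's working directory. The notebook itself leaves `github_code_path` empty.

```python
import os
import sys

# Assumption: the cloned repository lives in the parent directory ('..')
# of the notebook's current working directory.
github_code_path = os.path.abspath(os.path.join(os.getcwd(), ".."))

# Append only once so that re-running the cell does not add duplicates.
if github_code_path not in sys.path:
    sys.path.append(github_code_path)
```

Using an absolute path keeps the entry valid even if the working directory changes later in the session.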

In [113]:
os.getcwd()

'c:\\Users\\nathanael99\\MyProjects\\github\\intersect-community-data'

In [114]:
# Turn on autoreload with these magic commands so that edited submodules are reloaded
%load_ext autoreload
%autoreload 2
# Import the reusable functions from the pyncoda NCES modules
from pyncoda.CommunitySourceData.nces_ed_gov.nces_01a_downloadfiles \
    import *
from pyncoda.CommunitySourceData.nces_ed_gov.nces_00c_cleanutils \
    import *
from pyncoda.CommunitySourceData.nces_ed_gov.nces_02c_SRECcleanCCD \
    import *
from pyncoda.CommunitySourceData.nces_ed_gov.nces_02d_SRECtidy \
    import *

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Obtain NCES Files
This section of code provides details on the web addresses for obtaining the NCES data. These data files are quite large, so it is recommended that they be downloaded only once. To facilitate downloading, a Comma Separated Values (CSV) file was created using Microsoft Excel (CSV files are easier to read into the notebook). The CSV file includes the descriptions and important file names of the files to be obtained. This input file can be modified for different school years.
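
One way to follow the download-once recommendation is to skip any file that already exists locally. The sketch below is illustrative only: `download_once` is a hypothetical helper, not a pyncoda function, and it uses the standard library's `urlretrieve` in place of the `wget` package imported above. How `download_nces_files` handles existing files is not shown in this notebook.

```python
import os
from urllib.request import urlretrieve


def download_once(url, dest_path, fetch=urlretrieve):
    """Download url to dest_path unless the file is already on disk.

    Hypothetical helper for illustration; returns True if a download
    was performed and False if the existing file was reused.
    """
    if os.path.exists(dest_path):
        return False  # file already downloaded, so skip the request

    # Create the destination folder if it does not exist yet
    folder = os.path.dirname(dest_path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    fetch(url, dest_path)
    return True
```

The `fetch` parameter is an assumption added so the download function can be swapped out, for example when testing without network access.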

In [115]:
folder_path = os.path.join('pyncoda', 'CommunitySourceData', 'nces_ed_gov')
filename = 'nces_00b_ObtainSchoolData_2023-02-10.csv'
downloadlistcsv = os.path.join(folder_path, filename)  # CSV listing the NCES files to obtain
county_list = ['37155']  # 5-digit county FIPS code(s)
communityname = "RobesonCounty_NC"
outputfolder = os.path.join("OutputData", communityname, "01_CommunitySourceData")
year = '2015-2016'  # school year

schooldata_community, SAB_community = \
    download_nces_files(downloadlistcsv,
                        county_list,
                        communityname,
                        outputfolder)

SyntaxError: incomplete input (4166948777.py, line 13)

In [None]:
pd.crosstab(schooldata_community['schtype'], schooldata_community['level'])

In [None]:
sab_boundaries = split_SAB_gradelevel(SAB_community, outputfolder, year)