<a href="https://colab.research.google.com/github/npr99/URSC645/blob/main/Admin/TemplateFiles/URSC645_00_templateDSnotebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Template Jupyter Notebook File
This is a template notebook file that can be used to make new notebooks with a structured format.

Guide to writing in notebook markdown: https://medium.com/analytics-vidhya/the-ultimate-markdown-guide-for-jupyter-notebook-d5e5abf728fd

## Description of Program
- program:    FAIS_00_templateDSnotebook_2021-01-23.ipynb
- task:       Template Jupyter Notebook File for DesignSafe
- Version:    first Version
- project:    Food Access Impact Study
- funding:    NSF CRISP 1638273 & 1760726
- author:     Nathanael Rosenheim \ Jan 23, 2021

- Suggested Citation:
Rosenheim, N.; Peacock, W. G.; Williams, A.; Lane, G.; Watson, M.; Sullivan, E.; 
Katare, A.; Kastor, H. (2021) “Report of Applied Methods”, 
in Food Access Impact Survey for Harris County and Southeast Texas after 
Hurricane Harvey in 2017. 
Archived on DesignSafe-CI https://www.designsafe-ci.org/  

## Control Python
Install and import packages. Check versions for future replication.

### Section notes
This section is where packages can be installed and imported.

### How to install a new package
DesignSafe has many packages preinstalled. However, it is possible to install a new package if needed.

Here is an example using the `!pip` command to install the `contextily` package:
```
# The contextily package is required for adding basemaps to maps
!pip install contextily --user --quiet
```
Note - the `--user` option is required in DesignSafe.
The `--quiet` option hides the install output, which keeps the notebook more presentable and less cluttered.
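
Before installing, it can help to check whether a package is already available. The sketch below is one possible way to do that with the standard library. The helper name `is_installed` is hypothetical and is not part of DesignSafe or the original notebook.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

# Only suggest an install when the package is missing
if not is_installed("contextily"):
    print("contextily not found - run: !pip install contextily --user --quiet")
```

The check only looks for the package. It does not install anything, so the `!pip` line still needs to be run in its own cell when the package is missing.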

### Import packages
Here is a list of common packages to import:

```
import pandas as pd     # For obtaining and cleaning tabular data
import geopandas as gpd # For obtaining and cleaning spatial data
import os # For saving output to path

import folium as fm # folium has more dynamic maps - but requires internet connection

# Packages for making quick maps of data
import matplotlib as matlib
import matplotlib.pyplot as plt  # for plotting points
import matplotlib.image as mpimg
import contextily as ctx # package for adding basemaps to maps
```

In [None]:
import pandas as pd     # For obtaining and cleaning tabular data
import geopandas as gpd # For obtaining and cleaning spatial data
import os # For saving output to path

## Check Versions
Python relies on a collection of open-source packages, which often change over time. Unless you use the same versions, a program may not replicate if a newer package version changes the way it works.

Therefore, it is important to check the versions of python and all packages.

```
# Display versions being used - important information for replication
import sys
print("Python Version     ", sys.version)
print("geopandas version: ", gpd.__version__)
print("pandas version:    ", pd.__version__)
print("folium version:    ", fm.__version__)
print("matplotlib version: ", matlib.__version__)
print("contextily version: ", ctx.__version__)
```
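
The example above only works once every listed package has been imported. As an alternative, the minimal sketch below uses the standard-library `importlib.metadata` module (Python 3.8 or later) to report installed versions without importing the packages. The helper name `package_versions` and the "not installed" label are assumptions made for illustration.

```python
import sys
from importlib import metadata

def package_versions(package_names):
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Display versions being used - important information for replication
print("Python version:", sys.version)
packages = ["pandas", "geopandas", "folium", "matplotlib", "contextily"]
for name, version in package_versions(packages).items():
    print(f"{name} version: {version}")
```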

In [None]:
import sys
print("Python Version     ", sys.version)
print("geopandas version: ", gpd.__version__)

## Set Provenance 
Where will output files be saved?
What is the program name? 

Provenance refers to the place of origin, the history of something. This section helps keep track of where a program was located at the time it was run and where to expect to find outputs.

This program is designed to use relative paths for reading source data and writing output data. 

The output files are saved in a folder named after the program - an excellent tool for keeping track of provenance. The output files are also named after the source program, which helps identify the provenance of a file.

### Note on filename scheme:
```
File Naming:
All files must start with the project or subproject mnemonic
All files should end with the date in the format YYYY-MM-DD

File Name Structure
                   s     #
                    \   /     description         extension
                     - -     /                    /
                PRJ_tsv#_xxxxxxxxxxxx_yyyy-mm-dd.ext
                   -  -                -   -  -
                  /  /                 |   |  |
                 t  v                  y   m  d


            name    length          contents
            -----------------------------------------------------------
            PRJ         3-5         Project Mnemonic (fixed string)
            _            1          padding underscore
            t            1          data science workflow task number (0-6)
            s            1          letter step within task (a,b,c..)
            v            1          v = version
            #            1          version number (1,2,3,4...)
            _            1          padding underscore
            x           5-10*       description of step
            _            1          padding underscore
            y            4          year (2017,2018,2019,2020...)
            -            1          padding dash
            m            2          month (01,02...12)
            -            1          padding dash
            d            2          day (01,02,...31)
            .            1          decimal
            ext          3          file type extension
            -----------------------------------------------------------
            t: Report Sections

The data science steps include:

0 = Research Log or Project Admin
1 = Obtain Data
2 = Clean Data
3 = Explore Data
4 = Model Data
5 = Interpret Data
6 = Publish Data
```

For more information on file naming scheme see:

https://github.com/npr99/URSC645/blob/main/Admin/URSC645_FileNaming.md 
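
The naming scheme above can be checked programmatically. The sketch below is one possible way to split a file name into its parts with a regular expression. The helper name `parse_filename` and the example file names are hypothetical, and the pattern is deliberately lenient where the scheme is open-ended: any number of letters and digits is accepted for the description, the version number, and the extension.

```python
import re
from datetime import datetime

# Pattern for PRJ_tsv#_description_yyyy-mm-dd.ext
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]{3,5})_"                       # project mnemonic
    r"(?P<task>[0-6])(?P<step>[a-z])v(?P<version>[0-9]+)_"   # task, step, version
    r"(?P<description>[A-Za-z0-9]+)_"                        # description of step
    r"(?P<date>[0-9]{4}-[0-9]{2}-[0-9]{2})"                  # date as YYYY-MM-DD
    r"\.(?P<ext>[A-Za-z0-9]+)$"                              # file type extension
)

def parse_filename(filename):
    """Return the parts of a file name as a dict, or None if it does not fit."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    parts = match.groupdict()
    try:
        # Reject dates that look right but are not real calendar dates
        datetime.strptime(parts["date"], "%Y-%m-%d")
    except ValueError:
        return None
    return parts

print(parse_filename("FAIS_2av1_cleandata_2021-01-23.csv"))
print(parse_filename("cleandata.csv"))
```

The first call returns a dictionary with the task number, step letter, version, description, date, and extension. The second returns `None` because the name lacks the project mnemonic and date.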


### Note on folder organization:
This program depends on relative paths for folder locations. The Source Data is located in a SourceData folder that is located two directories above the program file. All required source files are saved in directories that provide information about the provenance of the source data. Files created by this program are saved in a folder with the same name as the program.
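
The folder layout described here can be sketched with `pathlib`. This is a minimal illustration, not the layout used by any particular project: it assumes the working directory is the folder that contains the notebook and that `SourceData` sits two levels above that folder. The helper name `project_paths` is hypothetical.

```python
from pathlib import Path

def project_paths(program_dir, programname):
    """Return the assumed source and output folders for a program."""
    program_dir = Path(program_dir)
    return {
        # SourceData is assumed to sit two directories above the program folder
        "source": program_dir.parent.parent / "SourceData",
        # Output folder is named after the program
        "output": program_dir / programname,
    }

paths = project_paths(Path.cwd(), "FAIS_00_templateDSnotebook_2021-01-23")
print("Source data folder:", paths["source"])
print("Output folder:     ", paths["output"])
```

The function only builds the paths and does not create or check any folders, so it is safe to run before the output folder exists.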

In [None]:
# Get information on current working directory (getcwd)
os.getcwd()

In [None]:
# Store Program Name for output files to have the same name
programname = "FAIS_00_templateDSnotebook_2021-01-23"
# Make directory to save output
os.makedirs(programname, exist_ok=True)

# Step 1: Obtain Data

# Step 2: Clean Data

# Step 3: Explore Data

# Iterative Step 2: Clean Data
It is often necessary to clean data after the data has been explored.

# Iterative Step 3: Explore Data

# Step 4: Model Data

# Output files

In [None]:
# Save work at this point as CSV
# points_gdf is a placeholder for the GeoDataFrame created in the steps above
savefile = os.path.join(programname, programname + ".csv")
points_gdf.to_csv(savefile, index=False)