# NK's Tutorial for NEA Arts Data

To begin, install Miniconda2 from the command line (on macOS). I used the following steps:

1. Install Homebrew through the terminal.
2. Determine which Python version you have by running: python --version
3. Run: brew install wget
4. Run: wget https://repo.continuum.io/miniconda/Miniconda2-latest-MacOSX-x86_64.sh
5. Run: bash Miniconda2-latest-MacOSX-x86_64.sh
    * (Note: Miniconda2 is the Python 2 installer. Install the Miniconda version that corresponds to your Python version.)
6. Complete the installer's steps.
7. Make sure the path to conda has been added to your .bash_profile by opening the file with: nano ~/.bash_profile
    * (If it has not been added, run sudo nano ~/.bash_profile, add the line export PATH="$HOME/miniconda2/bin:$PATH" to the file, and save it.)
8. If you had to edit this file, run: source ~/.bash_profile
9. Install the r-irkernel package to run a local version of R. 
    * Instructions: https://irkernel.github.io/installation/
10. (Optional: Start a Jupyter Notebook): jupyter notebook
    
You are now ready to begin the tutorial!

# Greer's Tutorial

## Why R?
1. Free/open source geocode capabilities.
2. Merging/matching to congressional districts/MSAs/counties/etc. 
3. Add geographic indicators to geocoded data.

## Basic Information about R & RStudio

### Screens in RStudio
1. **Script window:** *(top left)* the commands and functions that you would like to keep for later use
2. **Console:** *(bottom left)* the window where the commands run and produce outputs. Commands typed here do not get saved in your script. 
3. **Environment/Help window:** shows details about the current dataset loaded and any help documentation for R
4. **Plots/packages window:** shows details about the current package selected (for example, details about the "neaR" package)

### How to Run a Script
1. Highlight the command you would like to run
2. Press the key combination CTRL+ENTER

## Step 1

### Step 1.a.
Install the devtools package by writing the following command in the script window. Highlight the command and press CTRL+ENTER to run it.

The command is finished executing when the ">" arrow appears in the console. 

In [1]:
install.packages("devtools")


The downloaded binary packages are in
	/var/folders/kp/2dzcn39934x4700rb9kpvtx00000gn/T//RtmpUbOjsi/downloaded_packages
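
Running install.packages() again on every run downloads the package each time. A minimal sketch of a guard built on base R's requireNamespace(); the helper name missing_packages is hypothetical and is not part of devtools or neaR.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R, so it should never be reported as missing.
to_install <- missing_packages(c("devtools", "stats"))
to_install
```

If to_install is not empty, passing it to install.packages(to_install) installs only what is missing.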


### Step 1.b.
Load the devtools package by writing the following command in the script window. Then highlight "library(devtools)" and press CTRL+ENTER. If it ran correctly, the next line in the console shows the ">" prompt.

In [2]:
library(devtools)

“package ‘devtools’ was built under R version 3.4.1”

## Step 2

### Step 2.a. 
Install Greer Mellon's "neaR" package. 

In [3]:
install_github("gmellon/neaR")

Skipping install of 'neaR' from a github remote, the SHA1 (c969f1b9) has not changed since last install.
  Use `force = TRUE` to force installation


### Step 2.b.

Enable the neaR package.

In [4]:
library(neaR)

Loading required package: sf
“package ‘sf’ was built under R version 3.4.1”
Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3
Loading required package: stringr
Loading required package: tigris
As of version 0.5.1, tigris does not cache downloaded data by default. To enable caching of data, set `options(tigris_use_cache = TRUE)` in your R script or .Rprofile.

Attaching package: ‘tigris’

The following object is masked from ‘package:graphics’:

    plot

Loading required package: RJSONIO
Loading required package: RDSTK
Loading required package: plyr
Loading required package: rjson

Attaching package: ‘rjson’

The following objects are masked from ‘package:RJSONIO’:

    fromJSON, toJSON

Loading required package: RCurl
Loading required package: bitops
Loading required package: progress
Loading required package: sp
“package ‘sp’ was built under R version 3.4.1”
Loading required package: httr
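
The tigris startup message above mentions an opt-in cache for downloaded data. A short sketch of turning it on, using the option name exactly as printed in that message; whether it helps depends on how often the same files are requested.

```r
# Ask tigris to cache downloaded data (option name taken from its startup message).
options(tigris_use_cache = TRUE)

# Confirm the option is set for this session.
getOption("tigris_use_cache")
```

As the message notes, the same line can go in your .Rprofile to apply it to every session.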


## Step 3

### Step 3.a. 

Set your working directory. This is where your files will be located after the scripts run. 

In the RStudio application window, at the top select **Session --> Set working directory --> Choose Directory --> (select folder location you would like to work in)**

In [7]:
setwd("/Users/michelle/Desktop/Tableau Workshop Materials - 081617/Data Files/")
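
A minimal sketch of the same step with a safety check. The folder below is a temporary, hypothetical stand-in for your own data folder, and the dir.exists() check is an added safeguard rather than part of the workshop script.

```r
# Hypothetical folder for illustration; replace it with your own data folder.
data_dir <- file.path(tempdir(), "nea_workshop")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() stops with an error if the folder does not exist, so check first.
if (dir.exists(data_dir)) {
  previous_dir <- setwd(data_dir)  # setwd() invisibly returns the old directory
}

# Show the working directory now in effect.
getwd()
```

Keeping previous_dir makes it possible to switch back later with setwd(previous_dir).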

### Step 3.b.

Use the following command to list the files and directories in the working directory you just set.

In [8]:
list.files()
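
When the folder holds many files, list.files() can filter by name. A small sketch using its pattern argument, which takes a regular expression; the scratch folder and the notes.txt file are made up so the example runs anywhere.

```r
# Create a scratch folder with two hypothetical files.
scratch <- file.path(tempdir(), "nea_list_files_demo")
dir.create(scratch, showWarnings = FALSE)
invisible(file.create(file.path(scratch, c("NEA_workshop_file.csv", "notes.txt"))))

# List only the files whose names end in ".csv".
csv_files <- list.files(scratch, pattern = "\\.csv$")
csv_files
```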

## Step 4
### Step 4.a.
Read a data file into an object by typing the following command into your script window and pressing CTRL+ENTER.

In [12]:
NEA <- read.csv("NEA_workshop_file.csv", stringsAsFactors = FALSE)

### Step 4.b.
Use the following command to view the CSV/dataset that you just assigned to the "NEA" object. 

In [13]:
View(NEA)

ERROR: Error in View(NEA): ‘View()’ not yet supported in the Jupyter R kernel
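
View() opens RStudio's data viewer, which is why it fails in the Jupyter R kernel. A sketch of base R alternatives that work in both environments; the small data frame is made up for illustration and only borrows column names from the workshop file.

```r
# Made-up rows for illustration; not taken from the workshop data.
NEA_demo <- data.frame(
  CoName = c("Example Arts Org", "Sample Theater"),
  CoState = c("DC", "NY"),
  FY = c(2011, 2012),
  stringsAsFactors = FALSE
)

dim(NEA_demo)   # number of rows and columns
str(NEA_demo)   # column names, types, and first values
head(NEA_demo)  # first rows (up to six)
```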


You have now successfully loaded data into RStudio and can use this data with the "neaR" package commands.

# Working with the neaR Package

In [14]:
NEA$CoZip

In [15]:
NEA$CoZip <- get_padded_zip(NEA$CoZip)
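
The tutorial does not show what get_padded_zip does internally. Assuming it restores leading zeros that are lost when a ZIP code is stored as a number, the idea can be sketched in base R as follows; pad_zip is a hypothetical stand-in, not the neaR implementation.

```r
# Hypothetical stand-in: left-pad the five-digit part of a ZIP code with zeros.
pad_zip <- function(zip) {
  zip <- as.character(zip)
  base <- sub("-.*$", "", zip)      # part before any hyphen, e.g. "20005"
  suffix <- sub("^[^-]*", "", zip)  # "-1747" for ZIP+4 codes, "" otherwise
  zeros <- strrep("0", pmax(0, 5 - nchar(base)))
  out <- paste0(zeros, base, suffix)
  out[is.na(zip)] <- NA_character_  # keep missing values missing
  out
}

pad_zip(c("2139", "20005-1747", "501"))
# "02139" "20005-1747" "00501"
```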

In [16]:
help("create_full_address")

**Example:** 

NEA$full_address <- create_full_address(NEA, c("CoAddress1", "CoAddress2",
"CoCity", "CoState", "CoZip"))

In [18]:
NEA$address <- create_full_address(NEA, c("CoAddress1", "CoAddress2", "CoCity", "CoState", "CoZip"))
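
To make the result concrete, here is a base R sketch that joins address columns with ", ", which is the format the address column has in the output later in this tutorial. join_address is a hypothetical stand-in: the real create_full_address may treat missing or empty parts differently. The first demo row reuses an address from this tutorial; the second is made up.

```r
# Hypothetical stand-in; not the neaR implementation.
join_address <- function(df, cols, sep = ", ") {
  parts <- as.matrix(df[, cols, drop = FALSE])
  unname(apply(parts, 1, function(row) {
    row <- trimws(row)
    # Drop missing or empty pieces so no stray separators appear.
    paste(row[!is.na(row) & row != ""], collapse = sep)
  }))
}

demo <- data.frame(
  CoAddress1 = c("1156 15th Street, NW", "10 Main Street"),
  CoAddress2 = c("Suite 310", NA),
  CoCity = c("Washington", "Springfield"),
  CoState = c("DC", "IL"),
  CoZip = c("20005-1747", "62701"),
  stringsAsFactors = FALSE
)

join_address(demo, c("CoAddress1", "CoAddress2", "CoCity", "CoState", "CoZip"))
```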

Run the "get_geocode_data" command with the previously created "NEA$address" column as its input. This geocodes each address in the "NEA" object and assigns the resulting (latitude, longitude) pairs to the "coords" object.

In [20]:
coords <- get_geocode_data(NEA$address)

Assign the latitude values held within the "coords" object to the "NEA" object in its own field called "NEA$Latitude". This creates a new column with only Latitude values. 

In [21]:
NEA$Latitude <- coords$lat

Assign the longitude values within the "coords" object to the "NEA" object in its own field called "NEA$Longitude". This creates a new column with only Longitude values. 

In [22]:
NEA$Longitude <- coords$lon
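
Geocoding can fail for some addresses. A sketch of a quick check, assuming failed lookups come back as NA in the lat and lon columns (the tutorial does not say how get_geocode_data reports failures). The coords_demo values are made up, apart from the first pair, which matches the output in this tutorial.

```r
# Made-up geocoding result; the second row simulates a failed lookup.
coords_demo <- data.frame(
  lat = c(38.90483, NA, 40.71),
  lon = c(-77.03465, NA, -74.01)
)

# Flag rows where either coordinate is missing.
failed <- is.na(coords_demo$lat) | is.na(coords_demo$lon)

sum(failed)    # how many addresses were not geocoded
which(failed)  # their row positions
```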

You can view these newly added columns briefly by using the "head(NEA)" command. The Latitude and Longitude columns are all the way to the right in this view.

In [23]:
head(NEA)

X,ApplicationNumber,Discipline,CoName,CoAddress1,CoAddress2,CoCity,CoState,CoZip,TOTAL_AUDIENCE,⋯,Committed,FY,FdrFlag,Disposition,Individual,CT_NAMELSAD,CT_GEOID,address,Latitude,Longitude
1,10-922191,Music,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,0.0,⋯,100000.0,2011,A,Awarded,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465
2,11-932113,Music,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,,⋯,0.0,2012,,Rejected,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465
3,13-948528,Music,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,0.0,⋯,95000.0,2014,A,Awarded,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465
4,17-980612,Music,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,0.0,⋯,,2018,,Pending,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465
5,17-980082,Research,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,0.0,⋯,20000.0,2017,,Awarded,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465
6,16-969075,Music,Chorus America Association,"1156 15th Street, NW",Suite 310,Washington,DC,20005-1747,0.0,⋯,90000.0,2017,,Awarded,,Census Tract 101,11001010100,"1156 15th Street, NW, Suite 310, Washington, DC, 20005-1747",38.90483,-77.03465


## Advanced commands

The "get_msa_data" command, provided in the "neaR" package, looks up MSA data for each latitude/longitude pair. The commands below show its help page and then assign its result to the "msa" object.

In [24]:
help("get_msa_data")

**Usage:** get_msa_data(Latitude, Longitude, year = 2016)

In [25]:
msa <- get_msa_data(NEA$Latitude, NEA$Longitude)

Use the following command to view the first six rows of the "msa" object and the column names.

In [26]:
head(msa)

cbsa_GEOID,cbsa_NAMELSAD,cbsa_LSAD,source
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area",M1,MSA
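
The tutorial leaves "msa" as a separate object. A sketch of attaching it to the main data, assuming get_msa_data returns one row per input coordinate in the same order (the six matching rows above are consistent with that, but the tutorial does not state it). Both demo data frames are small stand-ins built from values shown in this tutorial.

```r
# Stand-in objects using values that appear in this tutorial.
NEA_demo <- data.frame(
  ApplicationNumber = c("10-922191", "11-932113"),
  stringsAsFactors = FALSE
)
msa_demo <- data.frame(
  cbsa_GEOID = c("47900", "47900"),
  cbsa_NAMELSAD = rep("Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area", 2),
  stringsAsFactors = FALSE
)

# cbind() pairs rows by position, so stop if the row counts differ.
stopifnot(nrow(NEA_demo) == nrow(msa_demo))
NEA_with_msa <- cbind(NEA_demo, msa_demo)
names(NEA_with_msa)
```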


Create a new column called "poverty_rate" in the "NEA" object, filled with NA as a placeholder. It will appear all the way to the right if you run the "View(NEA)" command.

In [27]:
NEA$poverty_rate <- NA 

***NOTE:*** If your object does not already have a "CT_GEOID" column, you first need to create a NEA$CT_GEOID column using the "get_ct_data" command in the "neaR" package.

In [28]:
NEA$poverty_rate <- append_poverty_data(NEA$CT_GEOID, NEA$poverty_rate)

Save the "NEA" object to a CSV using the following command. The resulting file will be saved in the working directory you set at the beginning of the tutorial.

In [29]:
write.csv(NEA, "workshop_sample.csv")
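
By default, write.csv() also writes R's row names as an unnamed first column, which read.csv() later loads as a column named "X". That is a likely origin of the "X" column in the head(NEA) output above, though the tutorial does not confirm it. A sketch of suppressing it with row.names = FALSE, using a made-up data frame and a temporary file:

```r
# Made-up data for illustration.
demo <- data.frame(
  CoName = c("Example Arts Org", "Sample Theater"),
  FY = c(2011L, 2012L),
  stringsAsFactors = FALSE
)

# Write without the row-name column, then read the file back.
out_file <- file.path(tempdir(), "workshop_sample_demo.csv")
write.csv(demo, out_file, row.names = FALSE)
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)

# Only the original columns come back; there is no extra "X" column.
names(round_trip)
```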