# Hog Farms and School Attendance
Do students who live in closer proximity to hog farms miss more school than students elsewhere because of health issues?

Some preliminary EDA using data from the North Carolina Department of Environmental Quality and North Carolina Department of Education. I'll be looking at the data by two measures. The first is school-to-farm proximity. The second is concentration of hog farms in school districts.

### Attendance and school-farm proximity

In [1]:
### Load packages, datasets

library(dplyr)
library(ggplot2)

setwd('./data')
hogs = read.csv('animal-facilities.csv')
ada = read.csv('nc-ada-15-17.csv', header = TRUE)
loc = read.csv('sch-geo-dat.csv', header = TRUE)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union



First step: Merge Average Daily Attendance (ADA) data with latitude and longitude data for NC public schools.

In [2]:
# Filter national location data to just North Carolina schools
nc.loc = loc %>% filter(STATE == 'NC')

# Merge location and ADA data by school name
atd = left_join(ada, loc, by = c('school.name' = 'NAME')) # Use left_join here since we only care about schools in ADA dataset

# See how many return no match
atd %>% summarise(no.match = sum(is.na(CSA)), matched = sum(!is.na(CSA)))

“Column `school.name`/`NAME` joining factors with different levels, coercing to character vector”

no.match,matched
11,4738


There are 11 public schools that weren't matched between the geographic data and the ADA data. Since we have 4738 matched schools, these 11 don't pose a huge issue. Let's continue with the EDA.

In [3]:
# Lets filter for only columns we need. This is a bit unruly.
colnames(atd)

In [8]:
# Selecting district, school names, grades, attendance and membership by years, as well as latitude and lognitude
atd.trimmed = atd %>% select(lea.name, school.name, grade.span,
                             ada.1415, adm.1415,
                             ada.1516, adm.1516,
                             ada.1617, adm.1617,
                             LAT, LON)

Now that we have ADA by school for the 2014-15, 2015-16 and 2016-17 academic years, we can measure the distance from schools to hog farms. Let's calculate a distance for each hog farm. But first, we need to cut down our farm database a bit.

In [9]:
head(hogs)

Permit.Number,Facility.Name,Combined.Owner,Regulated.Operation,Permit.Type,Regulated.Activity,Allowable.Count,Number.Of.Lagoons,Issued.Date,Effective.Date,Expiration.Date,Admin.Region,County.Name,Location.Lat.Num,Location.Long.Num,Address.1,Address.2,City,State,Zip
AWC010002,Piedmont L/S Co Farm,Joe Jones,Cattle,Cattle State COC,Cattle - Beef Feeder,300,0,10-01-2014,10-01-2014,09-30-2019,Winston-Salem,Alamance,36.1742,-79.4994,,,,,
AWC010002,Piedmont L/S Co Farm,Joe Jones,Swine,Cattle State COC,Swine - Farrow to Wean,300,0,10-01-2014,10-01-2014,09-30-2019,Winston-Salem,Alamance,36.1742,-79.4994,,,,,
AWC010006,Covington Dairy Farm Inc,William Covington,Cattle,Cattle State COC,Cattle - Milk Cow,300,1,10-01-2014,10-01-2014,09-30-2019,Winston-Salem,Alamance,36.0442,-79.3261,3008 S Nc119,,Mebane,NC,27302.0
AWC010010,Triple W Farms,Harold Woody,Cattle,Cattle State COC,Cattle - Dairy Heifer,200,1,03-11-2016,03-11-2016,09-30-2019,Winston-Salem,Alamance,35.9042,-79.3,3545 E Greensboro-Chapel Hill Hwy,,Snow Camp,NC,27349.0
AWC010010,Triple W Farms,Harold Woody,Cattle,Cattle State COC,Cattle - Milk Cow,400,1,03-11-2016,03-11-2016,09-30-2019,Winston-Salem,Alamance,35.9042,-79.3,3545 E Greensboro-Chapel Hill Hwy,,Snow Camp,NC,27349.0
AWC010012,Lindley Dairy Inc. Farm,W Lindley,Cattle,Cattle State COC,Cattle - Milk Cow,225,1,10-01-2014,10-01-2014,09-30-2019,Winston-Salem,Alamance,35.8997,-79.3306,3159 E Greensboro Chapel Hill Rd,,Snow Camp,NC,27349.0


In [12]:
# Filter for only hog farms
hogs = hogs %>% filter(Regulated.Operation == 'Swine')

# Now let's see what data we're working with
colnames(hogs)

In [None]:
# Need to make a matrix of schools and hog farms
# Calculate distance between each
# Add up values
# But also control for ridiculously long distances, need a min values 