# Attribute geospatial data with seal abundance data
This notebook uses the estimated number of seals per map to attribute the base grid of geospatial data with seal abundance. It then tabulates the result so that each column is one of the geospatial datasets, plus one column for seal abundance. The resulting table is saved and becomes the basic input for modeling the factors determining the presence and abundance of seals throughout the entire continent.  
  
Run this file every time a new geospatial covariate is added to the collection.

### Loading the required libraries and other dependencies

In [1]:
## Clear memory
rm(list=ls())
gc()

Unnamed: 0,used,(Mb),gc trigger,(Mb).1,max used,(Mb).2
Ncells,511987,27.4,940480,50.3,940480,50.3
Vcells,931954,7.2,1650153,12.6,1128279,8.7


In [2]:
libs<-c("ggplot2","plyr","dplyr","sp","rgeos","raster","rgdal")
lapply(libs, require, character.only = TRUE)
pathToLocalGit<-"/home/ubuntu/Workspace/ContinentalWESEestimates/"

## load the WESE map data
load(file=paste0(pathToLocalGit,"data/FinalWESEcounts.RData"))

## Load the current table of data
load(file=paste0(pathToLocalGit,"data/studyarea_points_wNearLandPenguins.RData"))

## Source the functions file
source(paste0(pathToLocalGit,"scripts/countSealsFromTags_functions.R"))
 
cdf<-as.data.frame(studyarea_pointswLandPenguins,xy=TRUE)
cdf$pointid<-as.integer(as.character(cdf$pointid))
NROW(unique(cdf$pointid))==nrow(cdf)

Loading required package: ggplot2

Loading required package: plyr

Loading required package: dplyr


Attaching package: ‘dplyr’


The following objects are masked from ‘package:plyr’:

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: sp

Loading required package: rgeos

rgeos version: 0.5-2, (SVN revision 621)
 GEOS runtime version: 3.6.2-CAPI-1.10.2 
 Linking to sp version: 1.4-1 
 Polygon checking: TRUE 


Loading required package: raster


Attaching package: ‘raster’


The following object is masked from ‘package:dplyr’:

    select


Loading required package: rgdal

rgdal: version: 1.2-20, (SVN revision 725)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
 Path to GDAL shared f

In [3]:
write.csv(countdf,file=paste0(pathToLocalGit,"data/WESElocations_counts.csv"))

The current table of data is a spatial data.frame of a 5-km grid of cells, with geospatial attributes for each cell. We want to add one more attribute to each cell: WESE abundance. We use the WESE data frame for this purpose. First, we attribute the WESE data, which are summarized by 500-m maps, with the cell ID (`pointid`) of the 5-km grid. We then aggregate (sum) the map counts by 5-km cell ID. Finally, we merge the WESE data with the geospatial data and save the result as a data.frame.  
We use a function that does this, using UTM coordinates.
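
The function itself lives in the sourced functions file and is not shown in this notebook. As a rough illustration of the idea only, here is a minimal sketch that assigns each map to its nearest grid point and then sums the counts per cell. It assumes both tables are in the same projected coordinate system (metres); `assignNearestGridPoint`, the `count` column, and the toy data are hypothetical and are not the project's implementation.

```r
## Hypothetical helper (not the project's function): for each map, find the id
## of the nearest grid point by Euclidean distance in projected coordinates.
assignNearestGridPoint <- function(grid, maps) {
  maps$pointid <- vapply(seq_len(nrow(maps)), function(i) {
    d2 <- (grid$coords.x1 - maps$mapcoords.x1[i])^2 +
          (grid$coords.x2 - maps$mapcoords.x2[i])^2
    grid$pointid[which.min(d2)]
  }, FUN.VALUE = integer(1))
  maps
}

## Toy data: two grid points 5 km apart and three maps
grid <- data.frame(pointid = c(1L, 2L),
                   coords.x1 = c(0, 5000), coords.x2 = c(0, 0))
maps <- data.frame(regionMapId = c("A", "B", "C"),
                   mapcoords.x1 = c(100, 400, 4900),
                   mapcoords.x2 = c(50, -200, 10),
                   count = c(3, 2, 7))

maps5k <- assignNearestGridPoint(grid, maps)

## Add up the map counts within each 5-km cell
counts5k <- aggregate(count ~ pointid, data = maps5k, FUN = sum)
```

The brute-force distance search is fine for a toy example but would likely be slow on a continental grid; a spatial index or a raster lookup would be the usual alternative.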

In [8]:
#wese5k<-getWESEcountsBy5km(gdf=studyarea_pointswLandPenguins,wesedf=unique(countdf[,c("regionMapId","mapcoords.x1","mapcoords.x2")]))

In [18]:
## Checking
load(file=paste0(pathToLocalGit,"data/wese5k.RData"))
sum(is.na(wese5k$pointid))  #must be 0
nrow(wese5k)==nrow(unique(countdf[,c("regionMapId","mapcoords.x1","mapcoords.x2")]))  #TRUE
nrow(wese5k)*2==nrow(countdf)   #TRUE
sum(is.na(wese5k$mapcoords.x1))   #must be 0

In [5]:
## IGNORE THIS unless you can fix the few haul-outs > 5km away from grid point
## Now we merge with countdf
## But before we do that, here's a check of coordinates, because some maps are > 5km away from nearest grid cell
odf<-cdf[,c("pointid","coords.x1","coords.x2")]
tdf<-merge(wese5k,odf, by="pointid",all.x=T)
rdf<-unique(countdf[,c("regionMapId","mapcoords.x1","mapcoords.x2")])
names(rdf)<-c("regionMapId","mcx1","mcx2")
tdf<-merge(tdf,rdf,by="regionMapId",all.x=T)
tdf$dist<-sqrt(((tdf$mapcoords.x1-tdf$coords.x1)^2)+((tdf$mapcoords.x2-tdf$coords.x2)^2))
head(tdf)

Unnamed: 0_level_0,regionMapId,pointid,mapcoords.x1,mapcoords.x2,coords.x1,coords.x2,mcx1,mcx2,dist
Unnamed: 0_level_1,<chr>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,AMU101663,541761,-1816006,143453.4,-1816000,146000,-85.48336,-73.34752,2546.6202
2,AMU102072,541763,-1808177,144061.5,-1806000,146000,-85.44474,-73.41747,2914.6879
3,AMU102099,542013,-1836031,135076.1,-1836000,136000,-85.79235,-73.17324,924.4466
4,AMU103615,542013,-1835618,135045.7,-1836000,136000,-85.79235,-73.17696,1027.7203
5,AMU103695,542013,-1836443,135106.4,-1836000,136000,-85.79235,-73.16951,997.4273
6,AMU103791,542013,-1836076,134457.2,-1836000,136000,-85.81166,-73.17324,1544.6675
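
To see which maps the warning above is about, the `dist` column can be compared against a threshold. The sketch below uses a toy stand-in for `tdf` and assumes the grid points are cell centres, in which case a map that truly falls inside its 5-km cell can be at most half the cell diagonal away. The `outsideCell` column and the `toy` table are illustrative names only.

```r
## Toy stand-in for tdf: grid-point and map coordinates in metres
toy <- data.frame(regionMapId = c("A", "B", "C"),
                  coords.x1 = c(0, 0, 5000), coords.x2 = c(0, 0, 0),
                  mapcoords.x1 = c(300, 0, 11000),
                  mapcoords.x2 = c(400, 2500, 0))
toy$dist <- sqrt((toy$mapcoords.x1 - toy$coords.x1)^2 +
                 (toy$mapcoords.x2 - toy$coords.x2)^2)

## Assumption: grid points are cell centres, so a map inside its cell is at
## most half the cell diagonal from the grid point (about 3536 m).
maxInCell <- 5000 * sqrt(2) / 2
toy$outsideCell <- toy$dist > maxInCell

## Maps more than 5 km from their assigned grid point
farMaps <- subset(toy, dist > 5000)
```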


In [19]:
## It looks like this is the best we can do for now, so...
## Let's use 2011 estimates
nrow(wese5k)
y2011df<-subset(countdf, Year=="2011",select=c("mdlColEstimate","mdlIslEstimate","regionMapId"))
wesedata<-merge(y2011df,wese5k[,c("regionMapId","pointid")],by="regionMapId",all.x=TRUE)
nrow(wesedata);nrow(y2011df)
head(wesedata)

Unnamed: 0_level_0,regionMapId,mdlColEstimate,mdlIslEstimate,pointid
Unnamed: 0_level_1,<chr>,<dbl>,<dbl>,<int>
1,AMU101663,0,0,541761
2,AMU102072,0,0,541763
3,AMU102099,8,9,542013
4,AMU103615,0,0,542013
5,AMU103695,0,0,542013
6,AMU103791,30,37,542013


In [21]:
## Merging with the geospatial data.frame now...
wdf<-merge(cdf,wesedata[,c("mdlColEstimate","mdlIslEstimate","pointid")],by="pointid",all.x=TRUE)
## But we will end up with several maps per grid cell
nrow(wdf);nrow(cdf)
## So we must aggregate... (dplyr kills the kernel)
wesedf<-unique(wdf[,c("pointid","meanslope","meanbathy","glacierdist","distToShore","cont300dist","cont800dist",
                      "DecemberIcePresence","Persistence2Years","PredictabilityDec5Years","distNearestIceEdge",
                      "fastIceWidth","fastIcePresent","ADPEname","ADPEdist","ADPEabund","EMPEname","EMPEdist",
                      "EMPEabund","coords.x1","coords.x2")])
mdlColsum<-aggregate(mdlColEstimate~pointid,wdf,sum);names(mdlColsum)<-c("pointid","mdlCol")
mdlIslsum<-aggregate(mdlIslEstimate~pointid,wdf,sum);names(mdlIslsum)<-c("pointid","mdlIsl")
wesedf<-merge(wesedf,mdlColsum,by="pointid",all.x=TRUE)
wesedf<-merge(wesedf,mdlIslsum,by="pointid",all.x=TRUE)
#wesedf<-as.data.frame(wdf %>% 
#            group_by(pointid,meanslope,meanbathy,slope,bathy,shoredist,glacierdist,nearLineId,near_x,near_y,distToShore,
#                     adpedist,adpecol,empedist,empecol,cont300dist,cont800dist,DecemberIcePresence,Persistence2Years,
#                     PredictabilityDec5Years,coords.x1,coords.x2) %>% 
#                dplyr::summarize(mdlCol=sum(mdlColEstimate),mdlIsl=sum(mdlIslEstimate)))

nrow(cdf)==nrow(wesedf)   #must be TRUE
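
A possible variation on the aggregation above, shown on toy data: both estimates can be summed in a single `aggregate` call using `cbind` on the left-hand side of the formula, and the sums can be computed on the map table before merging with the grid. Note that the formula interface drops rows with `NA` in any of the listed columns, so this only matches two separate calls when the two estimates are missing in the same rows. The `cells` and `mapsdf` tables are made up for illustration.

```r
## Toy stand-ins: three grid cells, maps in only two of them
cells <- data.frame(pointid = 1:3, meanslope = c(2.6, 4.5, 1.4))
mapsdf <- data.frame(pointid = c(1L, 1L, 3L),
                     mdlColEstimate = c(8, 30, 0),
                     mdlIslEstimate = c(9, 37, 0))

## Sum both estimates per grid cell in one call
sums <- aggregate(cbind(mdlColEstimate, mdlIslEstimate) ~ pointid,
                  data = mapsdf, FUN = sum)
names(sums) <- c("pointid", "mdlCol", "mdlIsl")

## Left join keeps every grid cell; cells without maps get NA
out <- merge(cells, sums, by = "pointid", all.x = TRUE)
```

Cells with no maps come out as `NA` rather than 0, which is what makes it possible to tell "no maps" apart from "maps with an estimate of 0" afterwards.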

In [22]:
## There are some grid cells with maps with possible seals, except that the mean expected value is 0, while max > 0
## So, here we distinguish these from grid cells with no maps with seals (i.e., true 0 seals)
wesedf$hasMaps<-ifelse(!is.na(wesedf$mdlIsl),1,0)

#So, now we can make the NA's be 0
wesedf$mdlCol<-ifelse(is.na(wesedf$mdlCol),0,wesedf$mdlCol)
wesedf$mdlIsl<-ifelse(is.na(wesedf$mdlIsl),0,wesedf$mdlIsl)

head(wesedf)

Unnamed: 0_level_0,pointid,meanslope,meanbathy,glacierdist,distToShore,cont300dist,cont800dist,DecemberIcePresence,Persistence2Years,PredictabilityDec5Years,⋯,ADPEdist,ADPEabund,EMPEname,EMPEdist,EMPEabund,coords.x1,coords.x2,mdlCol,mdlIsl,hasMaps
Unnamed: 0_level_1,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
1,68791,2.621992,-2846.908,247849.6,201526.4,142791.6,588996.1,0,0,0,⋯,,,,,,-2191000,2511000,0,0,0
2,68792,4.591249,-2587.33,250334.0,203621.5,142791.6,588996.1,0,0,0,⋯,,,,,,-2186000,2511000,0,0,0
3,68819,1.411659,-3014.08,241050.2,194935.5,138016.9,584654.8,0,0,0,⋯,,,,,,-2196000,2506000,0,0,0
4,68820,1.413779,-2920.63,243501.3,196973.9,138016.9,584654.8,0,0,0,⋯,,,,,,-2191000,2506000,0,0,0
5,68821,4.561732,-2686.22,246029.6,199116.9,138016.9,584654.8,0,0,0,⋯,,,,,,-2186000,2506000,0,0,0
6,68822,4.902608,-2264.93,248632.8,201361.3,138016.9,584654.8,0,0,0,⋯,,,,,,-2181000,2506000,0,0,0


In [89]:
q<-subset(wesedf,mdlCol>0 & fastIcePresent==FALSE)
sum(q$mdlCol)
nrow(subset(wesedf,mdlCol>0)) 
load(paste0(pathToLocalGit,"data/FastIceGridPoints_weseNoIce_111114.RData"))
FIGPdf<-FIGPdf[,which(!names(FIGPdf) %in% c('near.x1','near.x2','iceedge.x1','iceedge.x2','fastIceAbund'))]
FIGPdf$ADPEname<-as.character(FIGPdf$ADPEname)
FIGPdf$EMPEname<-as.character(FIGPdf$EMPEname)
names(wesedf)
names(FIGPdf)

In [90]:
#checking
FIGPdf[FIGPdf$pointid==122914,"meanslope"]
FIGPdf[FIGPdf$pointid==123037,"glacierdist"]
FIGPdf[FIGPdf$pointid==123042,"EMPEname"]

In [91]:
#Then, matching by pointid, replace the values of all fields in wesedf with those from FIGPdf
#Note: wesedf is still keyed by "pointid" here; it is renamed to gridCellId only just before saving
ednames<-names(FIGPdf)[which(!names(FIGPdf) %in% c("pointid","coords.x1","coords.x2"))]
for(rr in seq_len(nrow(FIGPdf))){
    pid<-as.numeric(FIGPdf[rr,"pointid"])
    for(nn in ednames){
        wesedf[wesedf$pointid==pid,nn]<-FIGPdf[rr,nn]
    }
}
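
The row-by-row loop can be slow when there are many edited grid points. A vectorized alternative, sketched here on toy data, uses `match` to find the target row for each edited id and assigns all columns at once. It assumes the id column is unique in both tables; `target`, `edits`, and `editCols` are illustrative names standing in for the notebook's tables.

```r
## Toy stand-ins for the table to update and the table of corrected values
target <- data.frame(pointid = 1:4, meanslope = c(1, 2, 3, 4),
                     EMPEname = c("a", "b", "c", "d"),
                     stringsAsFactors = FALSE)
edits <- data.frame(pointid = c(3L, 1L), meanslope = c(30, 10),
                    EMPEname = c("C", "A"), stringsAsFactors = FALSE)

editCols <- setdiff(names(edits), "pointid")
idx <- match(edits$pointid, target$pointid)   # row in target for each edit
ok <- !is.na(idx)                             # skip ids absent from target
target[idx[ok], editCols] <- edits[ok, editCols]
```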


In [92]:
#match? (should return the same values as the FIGPdf check above; wesedf is still keyed by pointid here)
wesedf[wesedf$pointid==122914,"meanslope"]
wesedf[wesedf$pointid==123037,"glacierdist"]
wesedf[wesedf$pointid==123042,"EMPEname"]

In [95]:
#Checking how many seals we lose in cells with no fast-ice attribution
nrow(subset(wesedf,fastIcePresent==FALSE & mdlCol>0))
sum(subset(wesedf,fastIcePresent==FALSE)$mdlCol)
nrow(wesedf)

In [96]:
## Now we save and setup the linear model analyses...
names(wesedf)<-gsub("pointid","gridCellId",names(wesedf))
save(wesedf,file=paste0(pathToLocalGit,"data/continentalWESE.RData"))
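
Before moving on to the models, it may be worth confirming that the saved file reloads as expected. The sketch below does a round trip with a toy table and a temporary file, loading into a separate environment so that nothing in the workspace is overwritten. The `toy` table and the temporary path are placeholders, not the project's data.

```r
## Toy table standing in for the saved data.frame
toy <- data.frame(gridCellId = 1:3, mdlCol = c(0, 8, 30))
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)

## Load into a separate environment; load() returns the names it restored
chk <- new.env()
loaded <- load(tmp, envir = chk)

stopifnot(identical(loaded, "toy"),
          identical(chk$toy, toy),
          !anyDuplicated(chk$toy$gridCellId))
```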