<font size="8"> **Using crabeater observations to create masks using ACCESS-OM2-01 grids** </font>  
In this notebook, we will load the clean crabeater observations to create a mask using the ACCESS-OM2-01 grids (ocean and velocity grids). We will use the crabeater observations mask to extract relevant environmental variables from the ACCESS-OM2-01 model.  
  
The clean crabeater observations dataset includes two types of records: `HUMAN_OBSERVATION` and `MACHINE_OBSERVATION`. The first type involves one or more people searching for crabeater seals and recording their presence. The second type comes from instruments, such as GPS tags. For this project, we will only use `HUMAN_OBSERVATION` records.  
  
Given that crabeater seal data came from different sources, and not all sources provide enough information to calculate abundance, we will transform crabeater records to presence only. Further, we will reduce crabeater sighting records to one record per month per cell in the ACCESS-OM2-01 grid. This means that we will assign a value of `1` to a grid cell where crabeater seals have been reported, regardless of the number of individuals or sightings reported at that grid cell within a particular month.
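
As a minimal sketch of the presence-only reduction described above, the toy example below collapses several sightings in the same grid cell and month into a single record with a value of `1`. The data frame, its values, and the `individuals` and `n_records` columns are hypothetical and are not part of the crabeater dataset.

```python
import pandas as pd

# Hypothetical sightings: the first two fall in the same grid cell and month
sightings = pd.DataFrame({
    'date': ['1998-11', '1998-11', '1998-12'],
    'xt_ocean': [100.35, 100.35, 100.35],
    'yt_ocean': [-61.937, -61.937, -61.937],
    'individuals': [3, 12, 1],
})

# One row per month and grid cell, counting how many records were merged
presence = (sightings
            .groupby(['date', 'xt_ocean', 'yt_ocean'], as_index=False)
            .agg(n_records=('individuals', 'size')))

# Presence is 1 regardless of the number of individuals or sightings
presence['presence'] = 1
print(presence)
```

The number of individuals plays no role in the result: each month and grid cell combination is kept once and flagged as present.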

# Setting working directory
To ensure these notebooks work correctly, we will set the working directory. We assume that you have saved a copy of this repository in your home directory (represented by `~` in the code chunk below). If you saved this repository elsewhere on your machine, update this line with the filepath where you saved these notebooks.

In [1]:
import os
os.chdir(os.path.expanduser('~/Chapter2_Crabeaters/Scripts'))

# Loading other relevant libraries

In [2]:
from dask.distributed import Client
#Accessing model data
import cosima_cookbook as cc
#Useful functions
import UsefulFunctions as uf
#Dealing with data
import xarray as xr
import pandas as pd
import numpy as np
#Data visualisation
import matplotlib.pyplot as plt

# Parallelising work

In [3]:
client = Client()
client

Client connected to a `distributed.LocalCluster` (status: running, using processes: True)
Dashboard: /proxy/34561/status
Workers: 7, Total threads: 14, Total memory: 63.00 GiB
Each worker: 2 threads, 9.00 GiB memory


# Loading crabeater seal observations
This is the dataset that includes the MEASO sector category. See the [03_Adding_MEASO_bio_data](03_Adding_MEASO_bio_data.ipynb) notebook for more details. We will only keep crabeater seal observations that are categorised as `HUMAN_OBSERVATION` in the `basis_record` column.  
  
We will add a `date` column, built from the `year` and `month` columns, to identify unique monthly crabeater observations at a grid cell.

In [13]:
#Loading dataset as pandas data frame
crabeaters = pd.read_csv('../Biological_Data/BG_points/Background_20xPoints_Indian-Sectors_weaning.csv')

#Creating new date column, which will only include the year and month the observation occurred
crabeaters['date'] = crabeaters.apply(lambda x: f'{x.year}-{str(x.month).zfill(2)}', axis = 1)

#Checking results
crabeaters.head()

Unnamed: 0,date,year,sector,zone,month,season_year,life_stage,decade,presence,longitude,latitude
0,1998-11,1998,Central Indian,Antarctic,11,autumn,weaning,1990,0,100.35,-61.95
1,1987-11,1987,Central Indian,Antarctic,11,autumn,weaning,1980,0,79.35,-64.85
2,1987-11,1987,Central Indian,Antarctic,11,autumn,weaning,1980,0,105.15,-60.75
3,1998-11,1998,Central Indian,Antarctic,11,autumn,weaning,1990,0,84.95,-60.85
4,2001-11,2001,Central Indian,Antarctic,11,autumn,weaning,2000,0,110.45,-62.95
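
The `date` column above is built row by row with `apply`. For larger tables, the same `YYYY-MM` string can be assembled with vectorised string operations. The sketch below uses a small hypothetical data frame (`obs`) rather than the crabeater dataset, and it assumes `year` and `month` are stored as integers.

```python
import pandas as pd

# Hypothetical observations with integer year and month columns
obs = pd.DataFrame({'year': [1998, 1987, 2001], 'month': [11, 3, 12]})

# Build a 'YYYY-MM' label, padding single-digit months with a leading zero
obs['date'] = (obs['year'].astype(str) + '-' +
               obs['month'].astype(str).str.zfill(2))
print(obs['date'].tolist())
```

Both approaches should give the same labels; the vectorised form simply avoids calling a Python function once per row.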


## Rearranging columns in crabeater data
We will move the `longitude` and `latitude` columns next to the `sector` column. This keeps the date and location of each record together, which makes the data easier to inspect.

In [14]:
#Getting the names of columns in crabeater dataset
cols = crabeaters.columns.tolist()

#Re-arranging column names so longitude and latitude appear after sector
cols = cols[0:3] + cols[-2:] + cols[3:-2]

#Applying to crabeater dataset
crabeaters = crabeaters[cols]

#Checking results
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,zone,month,season_year,life_stage,decade,presence
0,1998-11,1998,Central Indian,100.35,-61.95,Antarctic,11,autumn,weaning,1990,0
1,1987-11,1987,Central Indian,79.35,-64.85,Antarctic,11,autumn,weaning,1980,0
2,1987-11,1987,Central Indian,105.15,-60.75,Antarctic,11,autumn,weaning,1980,0
3,1998-11,1998,Central Indian,84.95,-60.85,Antarctic,11,autumn,weaning,1990,0
4,2001-11,2001,Central Indian,110.45,-62.95,Antarctic,11,autumn,weaning,2000,0
...,...,...,...,...,...,...,...,...,...,...,...
36815,1999-12,1999,East Indian,80.45,-65.65,Antarctic,12,summer,weaning,1990,0
36816,1999-12,1999,East Indian,73.05,-67.35,Antarctic,12,summer,weaning,1990,0
36817,1999-12,1999,East Indian,125.75,-64.45,Antarctic,12,summer,weaning,1990,0
36818,1999-12,1999,East Indian,121.85,-64.85,Antarctic,12,summer,weaning,1990,0
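
Reordering by position works for this file, but it depends on the exact column order in the input. As a hedged alternative, the sketch below reorders columns by name on a hypothetical empty data frame (`df`), so the result does not change if extra columns are added later.

```python
import pandas as pd

# Hypothetical data frame with a subset of the columns used in this notebook
df = pd.DataFrame(columns=['date', 'year', 'sector', 'zone', 'month',
                           'longitude', 'latitude'])

# Columns that should appear first, in this order
front = ['date', 'year', 'sector', 'longitude', 'latitude']

# Remaining columns keep their original relative order
ordered = front + [c for c in df.columns if c not in front]
df = df[ordered]
print(df.columns.tolist())
```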


# Loading ACCESS-OM2-01 grids
In this step, we will load the area of each grid cell (`area_t`) from the ACCESS-OM2-01 model and keep only cells in the Southern Ocean (between 80°S and 45°S). We will also correct longitude values so they fall between -180° and +180°.

In [7]:
#Creating new COSIMA cookbook session
session = cc.database.create_session()

#Accessing the area of grid and keeping data for the Southern Ocean only
grid_all = cc.querying.getvar('01deg_jra55v140_iaf_cycle4', 'area_t', session, n = 1).sel(yt_ocean = slice(-80, -45))
#Correcting longitude values to keep them between +/- 180
grid_all = uf.corrlong(grid_all)
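
The implementation of `uf.corrlong` is not shown in this notebook. As an illustration only, the sketch below shows one common way to wrap longitudes into the ±180 range with `xarray`, using a toy grid whose longitudes run from -280 to 20. The grid values and the variable names `da` and `wrapped` are assumptions made for this example, and the actual helper may work differently.

```python
import numpy as np
import xarray as xr

# Toy grid with longitudes outside the +/-180 range (hypothetical values)
lon = np.arange(-280, 80, 60)  # -280, -220, -160, -100, -40, 20
da = xr.DataArray(np.arange(lon.size),
                  coords={'xt_ocean': lon}, dims='xt_ocean')

# Wrap longitudes into [-180, 180) and sort so they increase monotonically
wrapped = (da
           .assign_coords(xt_ocean=((da.xt_ocean + 180) % 360) - 180)
           .sortby('xt_ocean'))
print(wrapped.xt_ocean.values)
```

Sorting after wrapping matters because label-based selections such as `slice` and `method = 'nearest'` expect a monotonic coordinate.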

# Identifying unique crabeater observations per month per grid cell
In this step, we will identify the model grid cell within which a crabeater seal was reported. We will then add the grid cell coordinates to the crabeater seal dataframe (`xt_ocean` and `yt_ocean`).  
  
This step may take a couple of minutes to run.

In [15]:
#Coordinates from crabeater data
lat = xr.DataArray(crabeaters.latitude.values)
lon = xr.DataArray(crabeaters.longitude.values)
#Extracting closest grid cell from ACCESS-OM2-01 model to crabeater sighting
coords = grid_all.sel(xt_ocean = lon, yt_ocean = lat, method = 'nearest')
#Transform into data frame - Remove area values (not needed)
coords = coords.to_dataframe().round(3).drop(columns = 'area_t')
#Add to crabeater data
crabeaters = pd.concat([crabeaters.reset_index(drop = True), coords], axis = 1)
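
To show how the nearest grid cell lookup above behaves, the sketch below applies the same `sel(..., method = 'nearest')` call to a toy 1° grid. The grid and the two sighting locations are hypothetical. Because both indexers are `DataArray` objects sharing the same dimension, `xarray` returns one grid cell per sighting rather than every combination of latitude and longitude.

```python
import numpy as np
import xarray as xr

# Toy 1-degree grid standing in for the model grid (hypothetical values)
grid = xr.DataArray(
    np.ones((3, 4)),
    coords={'yt_ocean': [-66.0, -65.0, -64.0],
            'xt_ocean': [78.0, 79.0, 80.0, 81.0]},
    dims=('yt_ocean', 'xt_ocean'),
    name='area_t',
)

# Two hypothetical sightings; both indexers share the default dimension
lat = xr.DataArray([-64.85, -65.6])
lon = xr.DataArray([79.35, 80.45])

# Pointwise selection: one nearest grid cell per sighting
nearest = grid.sel(xt_ocean=lon, yt_ocean=lat, method='nearest')

# Keep only the grid coordinates, as done for the crabeater data
coords = nearest.to_dataframe().drop(columns='area_t')
print(coords)
```

The resulting data frame has one row per sighting, in the same order as the input, which is why it can be joined to the observations with `pd.concat(..., axis = 1)` once the index has been reset.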

We will reorganise the columns to keep all spatial information together.

In [16]:
#Reorganising columns
#Getting the names of columns in crabeater dataset
cols = crabeaters.columns.tolist()
#Re-arranging column names so grid coordinates appear next to longitude and latitude
cols = cols[0:5] + cols[-2:] + cols[5:-2]

#Applying to crabeater dataset
crabeaters = crabeaters[cols]
#Checking results
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,life_stage,decade,presence
0,1998-11,1998,Central Indian,100.35,-61.95,100.35,-61.937,Antarctic,11,autumn,weaning,1990,0
1,1987-11,1987,Central Indian,79.35,-64.85,79.35,-64.846,Antarctic,11,autumn,weaning,1980,0
2,1987-11,1987,Central Indian,105.15,-60.75,105.15,-60.739,Antarctic,11,autumn,weaning,1980,0
3,1998-11,1998,Central Indian,84.95,-60.85,84.95,-60.836,Antarctic,11,autumn,weaning,1990,0
4,2001-11,2001,Central Indian,110.45,-62.95,110.45,-62.955,Antarctic,11,autumn,weaning,2000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
36815,1999-12,1999,East Indian,80.45,-65.65,80.45,-65.649,Antarctic,12,summer,weaning,1990,0
36816,1999-12,1999,East Indian,73.05,-67.35,73.05,-67.339,Antarctic,12,summer,weaning,1990,0
36817,1999-12,1999,East Indian,125.75,-64.45,125.75,-64.461,Antarctic,12,summer,weaning,1990,0
36818,1999-12,1999,East Indian,121.85,-64.85,121.85,-64.846,Antarctic,12,summer,weaning,1990,0


We will remove duplicate points based on the model grid coordinates and the date.

In [17]:
#Getting all column names for crabeaters observation dataset
cols = crabeaters.columns.to_list()
#Keeping column names to be used when identifying duplicates
cols = cols[0:3] + cols[5:]

#Removing duplicates
crabeaters.drop_duplicates(subset = cols, inplace = True)
#Checking results
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,life_stage,decade,presence
0,1998-11,1998,Central Indian,100.35,-61.95,100.35,-61.937,Antarctic,11,autumn,weaning,1990,0
1,1987-11,1987,Central Indian,79.35,-64.85,79.35,-64.846,Antarctic,11,autumn,weaning,1980,0
2,1987-11,1987,Central Indian,105.15,-60.75,105.15,-60.739,Antarctic,11,autumn,weaning,1980,0
3,1998-11,1998,Central Indian,84.95,-60.85,84.95,-60.836,Antarctic,11,autumn,weaning,1990,0
4,2001-11,2001,Central Indian,110.45,-62.95,110.45,-62.955,Antarctic,11,autumn,weaning,2000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
36812,1985-12,1985,Central Indian,88.25,-63.75,88.25,-63.762,Antarctic,12,summer,weaning,1980,0
36813,1988-12,1988,Central Indian,86.55,-63.75,86.55,-63.762,Antarctic,12,summer,weaning,1980,0
36814,2000-12,2000,Central Indian,90.55,-62.85,90.55,-62.864,Antarctic,12,summer,weaning,2000,0
36816,1999-12,1999,East Indian,73.05,-67.35,73.05,-67.339,Antarctic,12,summer,weaning,1990,0
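
The duplicate removal above ignores the original `longitude` and `latitude` columns, so two points with slightly different coordinates are treated as duplicates when they fall in the same grid cell and month. The sketch below reproduces this on a small hypothetical data frame (`points`), selecting the comparison columns by name instead of by position.

```python
import pandas as pd

# Hypothetical points: rows 0 and 1 share a grid cell and month
points = pd.DataFrame({
    'date': ['1999-12', '1999-12', '2000-12'],
    'longitude': [80.45, 80.47, 80.45],
    'latitude': [-65.65, -65.66, -65.65],
    'xt_ocean': [80.45, 80.45, 80.45],
    'yt_ocean': [-65.649, -65.649, -65.649],
})

# Compare rows on every column except the original coordinates
subset = [c for c in points.columns if c not in ('longitude', 'latitude')]

# By default, the first row of each group of duplicates is kept
unique_points = points.drop_duplicates(subset=subset)
print(unique_points.index.tolist())
```

Row 1 is dropped because it matches row 0 on date and grid cell, while row 2 is kept because it belongs to a different month.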


## Saving unique crabeater observations per month and grid cell

In [18]:
crabeaters.to_csv('../Biological_Data/BG_points/unique_background_20x_obs_grid.csv', index = False)