<font size="8"> **Adding environmental data from ACCESS-OM2-01 to unique background points** </font>  
In this notebook, we will extract environmental data from the ACCESS-OM2-01 model outputs and add it to our data frame containing unique background points matching the spatial bias of crabeater data (see `01_Bio_data/04_Generating_background_samples.R` for more information).

# Setting working directory
In order to ensure these notebooks work correctly, we will set the working directory. We assume that you have saved a copy of this repository in your home directory (represented by `~` in the code chunk below). If you have saved this repository elsewhere on your machine, update this line with the correct file path to these notebooks.

In [1]:
import os
os.chdir(os.path.expanduser('~/Chapter2_Crabeaters/Scripts'))

# Loading other relevant libraries

In [2]:
from dask.distributed import Client
from glob import glob
#Accessing model data
import cosima_cookbook as cc
#Useful functions
import UsefulFunctions as uf
#Dealing with data
import xarray as xr
import pandas as pd
import numpy as np
#Data visualisation
import matplotlib.pyplot as plt
import cartopy.crs as ccrs

# Parallelising work

In [3]:
client = Client()
client

Client
Connection method: Cluster object,Cluster type: distributed.LocalCluster
Dashboard: /proxy/42943/status
Workers: 7,Total threads: 14,Total memory: 63.00 GiB
Status: running,Using processes: True
Scheduler comm: tcp://127.0.0.1:45945
Each worker: 2 threads,9.00 GiB memory


# Loading unique background points data frame

In [4]:
#Loading dataset as pandas data frame
bg_path = '../Biological_Data/BG_points/unique_background_20x_obs_grid.csv'
crabeaters = pd.read_csv(bg_path)

#Ensuring date column is formatted correctly (year-month)
crabeaters['date'] = crabeaters.apply(lambda x: f'{x.year}-{str(x.month).zfill(2)}', axis = 1)

#Checking results
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,life_stage,decade,presence,bottom_slope_deg,dist_shelf_km,dist_coast_km,depth_m
0,1987-11,1987,Central Indian,71.45,-69.65,71.45,-69.662,Antarctic,11,autumn,weaning,1980,0,,555.143,208.751,
1,1987-11,1987,Central Indian,73.05,-69.65,73.05,-69.662,Antarctic,11,autumn,weaning,1980,0,,554.141,189.321,
2,1996-11,1996,Central Indian,74.45,-69.65,74.45,-69.662,Antarctic,11,autumn,weaning,1990,0,89.985,550.881,187.224,518.817
3,1998-11,1998,Central Indian,76.55,-69.55,76.55,-69.535,Subantarctic,11,autumn,weaning,1990,0,,542.235,183.364,
4,1996-11,1996,Central Indian,73.75,-69.45,73.75,-69.451,Antarctic,11,autumn,weaning,1990,0,,528.259,163.740,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30666,1989-11,1989,Central Indian,74.05,-59.45,74.05,-59.442,Antarctic,11,autumn,weaning,1980,0,89.919,543.100,725.785,1618.292
30667,1998-11,1998,Central Indian,71.75,-59.25,71.75,-59.238,Antarctic,11,autumn,weaning,1990,0,89.976,559.524,720.119,4481.736
30668,1996-11,1996,Central Indian,76.35,-59.25,76.35,-59.238,Antarctic,11,autumn,weaning,1990,0,89.961,506.485,705.324,1267.889
30669,1989-11,1989,Central Indian,73.95,-58.85,73.95,-58.827,Subantarctic,11,autumn,weaning,1980,0,89.971,610.227,790.155,2223.069
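
The row-wise `apply` used above to build the `date` column works, but it can be slow on large data frames. As a minimal sketch, the same year-month strings can be built with vectorised string operations. The toy data frame `df` below is made up for illustration and is not part of the original workflow.

```python
import pandas as pd

# Toy data frame standing in for the background points (values are made up)
df = pd.DataFrame({'year': [1987, 1996, 2013], 'month': [11, 3, 12]})

# Vectorised equivalent of the row-wise apply:
# zero-pad the month to two digits and join it to the year with a hyphen
df['date'] = df['year'].astype(str) + '-' + df['month'].astype(str).str.zfill(2)

print(df['date'].tolist())
```

Both approaches should give the same strings; the vectorised version simply avoids calling a Python function once per row.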


# Adding values for static variables only
Static variables refer to any physical variables that do not change over time (at least not during our period of interest). Examples include depth of the water column and distance to the coastline. Given that we only have one value per grid cell for these variables, extracting the data is relatively simple: we do not need to take into account the date on which observations were collected.

## Defining dictionary with information about static variables
This dictionary maps the column label of each static variable to the name of the file that contains it. We will also define a variable containing the full path to the folder where all static variables are stored.

In [11]:
#Full path to static variables
base_dir_static = '/g/data/v45/la6889/Chapter2_Crabeaters/Static_Variables/'

#List of static variables
varDict = {'bottom_slope_deg': 'bathy_slope_GEBCO_2D.nc',
           'dist_shelf_km': 'distance_shelf.nc',
           'dist_coast_km': 'distance_coastline.nc',
           'depth_m': 'bathy_GEBCO_2D.nc'}

## Extracting data for each observation and adding it to a new column in crabeater data

In [12]:
#Getting coordinates from crabeater dataset
lat = xr.DataArray(crabeaters.latitude.values)
lon = xr.DataArray(crabeaters.longitude.values)

#Looping through dictionary keys
for var in varDict:
    #Creating full path to file of interest
    file_path = os.path.join(base_dir_static, varDict[var])
    #Load as raster
    ras = xr.open_dataarray(file_path)
    rename_var = ras.name
    #Extracting values
    ras_sub = ras.sel(xt_ocean = lon, yt_ocean = lat, method = 'nearest')
    #Turning into data frame and rounding all columns to 3 decimal places
    ras_df = ras_sub.to_dataframe().round(3).rename(columns = {rename_var: var})
    #Adding to crabeater observations data frame
    crabeaters = pd.merge(crabeaters, ras_df, on = ['yt_ocean', 'xt_ocean'], how = 'left', sort = True).drop_duplicates()
    
#Checking results
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,life_stage,decade,presence,bottom_slope_deg,dist_shelf_km,dist_coast_km,depth_m
0,1987-11,1987,Central Indian,71.45,-69.65,71.45,-69.662,Antarctic,11,autumn,weaning,1980,0,,,,
1,1987-11,1987,Central Indian,73.05,-69.65,73.05,-69.662,Antarctic,11,autumn,weaning,1980,0,,,,
2,1996-11,1996,Central Indian,74.45,-69.65,74.45,-69.662,Antarctic,11,autumn,weaning,1990,0,89.985,-550.881,187.224,518.817017
3,1998-11,1998,Central Indian,76.55,-69.55,76.55,-69.535,Subantarctic,11,autumn,weaning,1990,0,,,,
4,1996-11,1996,Central Indian,73.75,-69.45,73.75,-69.451,Antarctic,11,autumn,weaning,1990,0,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
67462,1989-11,1989,Central Indian,74.05,-59.45,74.05,-59.442,Antarctic,11,autumn,weaning,1980,0,89.919,543.100,725.785,1618.291992
67463,1998-11,1998,Central Indian,71.75,-59.25,71.75,-59.238,Antarctic,11,autumn,weaning,1990,0,89.976,559.524,720.119,4481.735840
67464,1996-11,1996,Central Indian,76.35,-59.25,76.35,-59.238,Antarctic,11,autumn,weaning,1990,0,89.961,506.485,705.324,1267.889038
67465,1989-11,1989,Central Indian,73.95,-58.85,73.95,-58.827,Subantarctic,11,autumn,weaning,1980,0,89.971,610.227,790.155,2223.069092
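
When `sel` receives `xr.DataArray` indexers that share a dimension, as in the cell above, xarray performs a pointwise lookup and returns one value per point rather than a latitude by longitude block. The sketch below mimics that nearest-neighbour lookup with plain NumPy. The grid axes, field values and point coordinates are all hypothetical.

```python
import numpy as np

# Hypothetical 1D grid axes and a 2D field (rows = latitude, columns = longitude)
grid_lat = np.array([-70.0, -69.5, -69.0, -68.5])
grid_lon = np.array([71.0, 72.0, 73.0])
field = np.arange(12, dtype=float).reshape(4, 3)

# Hypothetical point coordinates (one latitude/longitude pair per point)
pt_lat = np.array([-69.65, -68.6])
pt_lon = np.array([71.45, 72.9])

# Index of the closest grid value along each axis, computed for all points at once
lat_idx = np.abs(grid_lat[None, :] - pt_lat[:, None]).argmin(axis=1)
lon_idx = np.abs(grid_lon[None, :] - pt_lon[:, None]).argmin(axis=1)

# Pointwise lookup: one value per point, not a 2 x 2 block
values = field[lat_idx, lon_idx]

# Coordinates of the grid cells that were matched
matched_lat = grid_lat[lat_idx]
matched_lon = grid_lon[lon_idx]
print(values, matched_lat, matched_lon)
```

Note that the matched grid coordinates can differ from the point coordinates, which is why the merge above uses the grid cell columns (`xt_ocean`, `yt_ocean`) rather than `longitude` and `latitude`.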


## Saving data frame with static variables
Given that the dynamic variables take some time to extract, we will save intermediate results to avoid having to extract them again.

In [13]:
crabeaters.to_csv(bg_path, index = False)

# Adding values for dynamic variables
Given the number of background points and the time period covered by this dataset, extracting these values may take some time. We recommend saving the data frame each time a new variable is extracted to avoid losing data.

In [8]:
crabeaters = pd.read_csv('../Biological_Data/BG_points/unique_background_20x_obs_grid.csv')
#Ensuring date column is formatted correctly (year-month)
crabeaters['date'] = crabeaters.apply(lambda x: f'{x.year}-{str(x.month).zfill(2)}', axis = 1)
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,life_stage,decade,presence,bottom_slope_deg,dist_shelf_km,dist_coast_km,depth_m
0,1987-11,1987,Central Indian,71.45,-69.65,71.45,-69.662,Antarctic,11,autumn,weaning,1980,0,,555.143,208.751,
1,1987-11,1987,Central Indian,73.05,-69.65,73.05,-69.662,Antarctic,11,autumn,weaning,1980,0,,554.141,189.321,
2,1996-11,1996,Central Indian,74.45,-69.65,74.45,-69.662,Antarctic,11,autumn,weaning,1990,0,89.985,550.881,187.224,518.817
3,1998-11,1998,Central Indian,76.55,-69.55,76.55,-69.535,Subantarctic,11,autumn,weaning,1990,0,,542.235,183.364,
4,1996-11,1996,Central Indian,73.75,-69.45,73.75,-69.451,Antarctic,11,autumn,weaning,1990,0,,528.259,163.740,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30666,1989-11,1989,Central Indian,74.05,-59.45,74.05,-59.442,Antarctic,11,autumn,weaning,1980,0,89.919,543.100,725.785,1618.292
30667,1998-11,1998,Central Indian,71.75,-59.25,71.75,-59.238,Antarctic,11,autumn,weaning,1990,0,89.976,559.524,720.119,4481.736
30668,1996-11,1996,Central Indian,76.35,-59.25,76.35,-59.238,Antarctic,11,autumn,weaning,1990,0,89.961,506.485,705.324,1267.889
30669,1989-11,1989,Central Indian,73.95,-58.85,73.95,-58.827,Subantarctic,11,autumn,weaning,1980,0,89.971,610.227,790.155,2223.069


## Accessing ACCESS-OM2-01 model outputs
We will create a new `cosima cookbook` session to load the model outputs of interest, and we will also create a dictionary that contains useful information related to data extraction.

In [4]:
#Creating new COSIMA cookbook session
session = cc.database.create_session()

#Creating dictionary with useful information
varDict = {'model': 'ACCESS-OM2-01',
           #ACCESS-OM2-01 cycle 4 (1958-2018)
           'exp': '01deg_jra55v140_iaf_cycle4',
           #ACCESS-OM2-01 cycle 4 extension (2018-2022)
           'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
           #Temporal resolution
           'freq': '1 monthly',
           #Output folder
           'base_out': '../Environmental_Data/ACCESS-OM2-01'}

## Loading data frame with ACCESS-OM2-01 outputs
We can use this data frame to find the variable names for the environmental factors that we know are influential for the distribution of crabeater seals.

In [102]:
#Loading data frame with model outputs
var_acc = cc.querying.get_variables(session, experiment = varDict['exp_ext'], frequency = '1 monthly')

#Searching data frame for variables of interest
var_acc[var_acc.name.str.contains('salt')]

Unnamed: 0,name,long_name,units,frequency,ncfile,cell_methods,# ncfiles,time_start,time_end
58,fsalt_ai_m,salt flux ice to ocean,kg/m^2/s,1 monthly,output1008/ice/OUTPUT/iceh.2023-03.nc,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
59,fsalt_m,salt flux ice to ocn (cpl),kg/m^2/s,1 monthly,output1008/ice/OUTPUT/iceh.2023-03.nc,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
96,salt,Practical Salinity,psu,1 monthly,output1008/ocean/ocean-3d-salt-1-monthly-mean-...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
97,salt_xflux_adv,rho*dzt*dyt*u*tracer,kg/sec,1 monthly,output1008/ocean/ocean-3d-salt_xflux_adv-1-mon...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
98,salt_yflux_adv,rho*dzt*dxt*v*tracer,kg/sec,1 monthly,output1008/ocean/ocean-3d-salt_yflux_adv-1-mon...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
105,sfc_salt_flux_coupler,sfc_salt_flux_coupler: flux from the coupler,kg/(m^2*sec),1 monthly,output1008/ocean/ocean-2d-sfc_salt_flux_couple...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
106,sfc_salt_flux_ice,sfc_salt_flux_ice,kg/(m^2*sec),1 monthly,output1008/ocean/ocean-2d-sfc_salt_flux_ice-1-...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
107,sfc_salt_flux_restore,sfc_salt_flux_restore: flux from restoring term,kg/(m^2*sec),1 monthly,output1008/ocean/ocean-2d-sfc_salt_flux_restor...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00
120,surface_salt,Practical Salinity,psu,1 monthly,output1008/ocean/ocean-2d-surface_salt-1-month...,time: mean,51,2019-01-01 00:00:00,2023-04-01 00:00:00


## Completing dictionary with useful variables
Now that we identified the correct name for the variable of our interest, we can complete our dictionary.

In [5]:
#Variable name in the model
varDict['var_mod'] = 'krill_ggp'
#Name of column where we will store the extracted data
varDict['var_short_name'] = 'krill_ggp'
#Defining if this variable is related to sea ice or not
varDict['ice_data'] = False
#Checking final dictionary
varDict

{'model': 'ACCESS-OM2-01',
 'exp': '01deg_jra55v140_iaf_cycle4',
 'exp_ext': '01deg_jra55v140_iaf_cycle4_jra55v150_extension',
 'freq': '1 monthly',
 'base_out': '../Environmental_Data/ACCESS-OM2-01',
 'var_mod': 'krill_ggp',
 'var_short_name': 'krill_ggp',
 'ice_data': False}

## Loading data from ACCESS-OM2-01
ACCESS-OM2-01 has four cycles available, each covering 60 years (1958-2018), but only cycle 4 includes biogeochemical (BGC) outputs. Since we are interested in examining the effect of some BGC variables on crabeater seals, we will use the fourth cycle in this project. Cycle 4 was also extended until December 2022. Its outputs are available through two experiments: `01deg_jra55v140_iaf_cycle4`, which covers 1958 to 2018, and `01deg_jra55v140_iaf_cycle4_jra55v150_extension`, which includes outputs from 2019 to 2022. The crabeater dataset has a temporal range between 1978 and 2022, which is why we use both experiments of cycle 4.  
  
In the chunk below, we load the ACCESS-OM2-01 data, correct longitudes so they range between -180 and +180, and convert temperature outputs (`temp`) from Kelvin to $^{\circ}C$. This conversion is applied to temperature only.

In [49]:
#Loading data from fourth cycle (temporal range 1958 to 2018)
var_df = uf.getACCESSdata_SO(varDict['var_mod'], '1981-11', '2014-01', 
                              freq = varDict['freq'], ses = session, minlat = -80, maxlat = -40,
                              exp = varDict['exp'], ice_data = varDict['ice_data'])

#Transforming longitudes so their range is +/-180 degrees
var_df = uf.corrlong(var_df)

#Selecting Indian sector
var_df = var_df.sel(xt_ocean = slice(30, 170))

#If temperature data, transform from Kelvin to degrees C
if var_df.name == 'temp':
    var_df = var_df-273.15

#Check results
var_df

Unnamed: 0,Array,Chunk
Bytes,118.23 GiB,1.76 MiB
Shape,"(387, 75, 781, 1400)","(1, 19, 135, 180)"
Dask graph,74304 chunks in 778 graph layers,74304 chunks in 778 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray
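
The longitude correction mentioned above can be sketched as follows. This is not the code of `uf.corrlong`, whose implementation is in `UsefulFunctions.py`; it only illustrates one common way to wrap longitudes into the -180 to +180 range. The input values assume a grid running from -280 to +80 and are chosen for illustration.

```python
import numpy as np

# Hypothetical longitudes on a grid running from -280 to +80 (an assumption)
lon = np.array([-280.0, -200.0, -180.0, -90.0, 0.0, 79.5])

# Wrap values into the [-180, 180) interval
lon_wrapped = ((lon + 180) % 360) - 180

# Sort so the coordinate increases monotonically, which label-based slicing
# such as slice(30, 170) relies on
order = np.argsort(lon_wrapped)
lon_sorted = lon_wrapped[order]

print(lon_sorted)
```

In a gridded dataset, the same `order` would also be used to reorder the data along the longitude axis so values stay attached to their coordinates.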


## *Optional: Subsetting surface layer data*
For some ocean variables, we need to subset data to extract surface values or bottom values. Subsetting data for the surface layer is straightforward: we simply select the first depth bin available. The `st_ocean` dimension contains the depth bins.

In [32]:
#Selecting the first depth available in the model (i.e. surface layer)
var_df = var_df.isel(st_ocean = 0)
#Removing depth coordinate (drop_vars replaces the deprecated drop method)
var_df = var_df.squeeze().drop_vars('st_ocean')
#Checking results - dataset has three dimensions instead of the original four
var_df

Unnamed: 0,Array,Chunk
Bytes,1.58 GiB,94.92 kiB
Shape,"(387, 781, 1400)","(1, 135, 180)"
Dask graph,18576 chunks in 780 graph layers,18576 chunks in 780 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


## *Optional: Subsetting bottom data*
Subsetting data for the bottom layer is not as straightforward as for the surface. This is because the bathymetry is not the same across the Southern Ocean, so we need to identify, for each grid cell, the deepest depth bin that contains data.  
  
We have included a function called `extract_bottom_layer` in the `UsefulFunctions.py` script that extracts data for the bottom layer of any variables with a depth dimension (`st_ocean`). For more details refer to the script.  

In [50]:
var_df = uf.extract_bottom_layer(var_df)
#Checking results
var_df

Unnamed: 0,Array,Chunk
Bytes,3.15 GiB,189.84 kiB
Shape,"(387, 781, 1400)","(1, 135, 180)"
Dask graph,18576 chunks in 800 graph layers,18576 chunks in 800 graph layers
Data type,float64 numpy.ndarray,float64 numpy.ndarray
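
To illustrate the idea behind bottom layer extraction, the sketch below finds the deepest valid value in each water column of a small NumPy array. It is not the implementation of `extract_bottom_layer`; it assumes that cells below the sea floor are stored as `NaN` and that valid depth bins are contiguous from the surface down. All values are made up.

```python
import numpy as np

# Hypothetical field with dimensions (depth, lat, lon);
# NaN marks cells below the sea floor
data = np.array([
    [[1.0, 2.0], [3.0, 4.0]],          # shallowest depth bin
    [[5.0, np.nan], [7.0, 8.0]],
    [[9.0, np.nan], [np.nan, 12.0]],   # deepest depth bin
])

# Number of valid depth bins in each water column
count = (~np.isnan(data)).sum(axis=0)

# Assuming valid bins are contiguous from the surface,
# the deepest valid bin sits at index count - 1
bottom_idx = np.clip(count - 1, 0, None)

# Pick the value at the deepest valid bin for each grid cell
bottom = np.take_along_axis(data, bottom_idx[None, ...], axis=0)[0]

# Water columns with no valid data at all (e.g. land) remain NaN
bottom = np.where(count > 0, bottom, np.nan)

print(bottom)
```

The result has no depth dimension, matching the three-dimensional (time, latitude, longitude) shape shown in the output above once time is included.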


## Loading other dynamic variables derived from calculations or regridded
These variables are not directly available in ACCESS-OM2-01, but they have been calculated from model outputs. Refer to folder `02_Environmental_Data` to see the full details of each calculation.

In [6]:
file_path = '/g/data/v45/la6889/Chapter2_Crabeaters/Krill_habitat/*.nc'
var_name = varDict['var_mod']

In [7]:
#Load data
var_df = xr.open_mfdataset(sorted(glob(file_path)))[var_name]
#Selecting dates between 1981 and 2013 and for the Indian sectors
var_df = var_df.sel(time = slice('1981-11', '2013-12'), xt_ocean = slice(30, 170))
#If mask is present as a coordinate, drop it
if 'mask' in var_df.coords:
    var_df = var_df.squeeze().drop_vars('mask')
#Rechunking dataset
var_df = var_df.chunk((1, 135, 180))
#Check results
var_df

Unnamed: 0,Array,Chunk
Bytes,196.33 MiB,94.92 kiB
Shape,"(66, 557, 1400)","(1, 135, 180)"
Dask graph,2640 chunks in 68 graph layers,2640 chunks in 68 graph layers
Data type,float32 numpy.ndarray,float32 numpy.ndarray


## Extracting environmental data
We will use the `latitude` and `longitude` columns together with the `date` column from the background points data frame to find the corresponding grid cell in the model outputs and extract the value of the environmental factor of interest.

In [11]:
#Getting coordinates from the crabeater data
lat = xr.DataArray(crabeaters.latitude)
lon = xr.DataArray(crabeaters.longitude)
#Getting date of observation from the crabeater data (day 16 approximates mid-month)
time = xr.DataArray(crabeaters.apply(lambda x: pd.to_datetime(f'{x.date}-16'), axis = 1))
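
As a sketch of how the date matching works, the example below converts year-month strings to timestamps in a single vectorised call and then finds the nearest time step on a monthly time axis. The time axis `model_time` is hypothetical, and we assume monthly outputs are stamped near the middle of each month, which is why day 16 is appended.

```python
import pandas as pd

# Hypothetical monthly model time axis, stamped mid-month (an assumption)
model_time = pd.DatetimeIndex(['1987-10-16', '1987-11-16', '1987-12-16'])

# Year-month strings as stored in the `date` column (made-up values)
dates = pd.Series(['1987-11', '1987-12'])

# Vectorised conversion: append a day and parse the whole column at once
obs_time = pd.DatetimeIndex(pd.to_datetime(dates + '-16'))

# Position of the nearest model time step for each point
idx = model_time.get_indexer(obs_time, method='nearest')

print(idx, model_time[idx])
```

Parsing the whole column at once avoids the row-wise `apply`, and nearest matching tolerates small offsets between the constructed dates and the model time stamps.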

## Extracting data

In [12]:
#Extracting data
var_sub = var_df.sel(time = time, yt_ocean = lat, xt_ocean = lon, method = 'nearest')

#Transforming to data frame
var_pd = var_sub.to_dataframe().sort_values(['time', 'xt_ocean', 'yt_ocean'])
#Adding year and month
var_pd['year'] = var_pd.time.dt.year
var_pd['month'] = var_pd.time.dt.month
#Removing time column that is no longer needed
var_pd.drop(columns = 'time', inplace = True)
#Finding names of coordinate columns to round
round_cols = [i for i in var_pd.columns if 'ocean' in i]
#Rounding coordinate values prior to merging
var_pd = var_pd.round({round_cols[0]: 3, round_cols[1]: 3})
#Renaming variable to be added to crabeater data
var_pd.rename(columns = {varDict['var_mod']: varDict['var_short_name']}, inplace = True)
#Getting column names for merging
cols = var_pd.drop(columns = varDict['var_short_name']).columns.tolist()

#Checking results
print(cols); var_pd

['yt_ocean', 'xt_ocean', 'year', 'month']


dim_0,yt_ocean,xt_ocean,krill_ggp,year,month
303,-67.465,75.15,0.975826,1981,12
16748,-64.331,75.35,1.073900,1981,12
6374,-65.269,77.15,1.043685,1981,12
2452,-66.240,77.65,1.024766,1981,12
8358,-65.058,77.65,1.055323,1981,12
...,...,...,...,...,...
14147,-64.633,146.95,1.081529,2013,12
9285,-65.058,149.05,1.045662,2013,12
16620,-64.461,149.95,1.080060,2013,12
27250,-63.136,150.05,1.118926,2013,12
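
Coordinates are rounded before merging because a merge on floating point keys only matches values that are exactly equal. The sketch below shows one situation where this matters: keys stored as 32-bit floats in one table and 64-bit floats in the other. Whether the model coordinates are stored as 32-bit floats is an assumption here, and all values are made up.

```python
import numpy as np
import pandas as pd

# Background points with coordinates stored as float64 (made-up values)
points = pd.DataFrame({'yt_ocean': [-69.662, -59.442],
                       'xt_ocean': [71.45, 74.05]})

# Extracted values whose coordinates are stored as float32 (an assumption)
extracted = pd.DataFrame({
    'yt_ocean': np.array([-69.662, -59.442], dtype='float32'),
    'xt_ocean': np.array([71.45, 74.05], dtype='float32'),
    'value': [0.93, 1.05],
})

# Without any treatment, float32 keys do not equal their float64 counterparts
raw_match = (extracted['yt_ocean'].astype('float64') == points['yt_ocean']).all()

# Cast to float64 first, then round, so both tables hold identical key values
keys = {'yt_ocean': 'float64', 'xt_ocean': 'float64'}
extracted = extracted.astype(keys).round({'yt_ocean': 3, 'xt_ocean': 3})

merged = points.merge(extracted, on=list(keys), how='left')
print(raw_match, merged)
```

If keys are left unmatched, a left merge silently fills the new column with `NaN`, so checking the number of missing values after each merge is a cheap safeguard.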


## Joining environmental data frame with background points
We will use the grid cell coordinates and dates to perform this join.

In [13]:
crabeaters = crabeaters.merge(var_pd, on = cols, how = 'left')
crabeaters = crabeaters.drop_duplicates()
crabeaters

Unnamed: 0,date,year,sector,longitude,latitude,xt_ocean,yt_ocean,zone,month,season_year,...,bottom_temp_degC,SSS_psu,bottom_sal_psu,vel_lat_surf_msec,vel_lat_bottom_msec,vel_lon_surf_msec,vel_lon_bottom_msec,lt_pack_ice,dist_ice_edge_km,krill_ggp
0,1987-11,1987,Central Indian,71.45,-69.65,71.45,-69.662,Antarctic,11,autumn,...,0.000000,,0.000000,,,,,0.000000,,
1,1987-11,1987,Central Indian,73.05,-69.65,73.05,-69.662,Antarctic,11,autumn,...,0.000000,,0.000000,,,,,0.000000,,
2,1996-11,1996,Central Indian,74.45,-69.65,74.45,-69.662,Antarctic,11,autumn,...,-1.791901,33.340260,34.478104,0.004291,-0.012704,0.006942,0.035388,0.666667,-1011.244645,0.925517
3,1998-11,1998,Central Indian,76.55,-69.55,76.55,-69.535,Subantarctic,11,autumn,...,0.000000,,0.000000,,,,,0.000000,,
4,1996-11,1996,Central Indian,73.75,-69.45,73.75,-69.451,Antarctic,11,autumn,...,0.000000,,0.000000,,,,,0.000000,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37880,1989-11,1989,Central Indian,74.05,-59.45,74.05,-59.442,Antarctic,11,autumn,...,0.815704,33.729412,34.719143,0.063438,0.001540,0.004965,-0.005750,0.000000,-83.181802,1.050051
37881,1998-11,1998,Central Indian,71.75,-59.25,71.75,-59.238,Antarctic,11,autumn,...,-0.522980,33.869095,34.632740,-0.000976,0.000412,0.076354,-0.005092,0.000000,76.604135,1.089497
37882,1996-11,1996,Central Indian,76.35,-59.25,76.35,-59.238,Antarctic,11,autumn,...,1.297180,33.836685,34.741093,0.015163,0.000069,0.007057,0.003959,0.011905,52.081125,1.098716
37883,1989-11,1989,Central Indian,73.95,-58.85,73.95,-58.827,Subantarctic,11,autumn,...,0.132050,33.750946,34.673641,0.062015,-0.000036,0.164571,-0.000467,0.000000,-18.242142,1.062114
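
The index in the output above runs past the last index of the input data frame (30669), which suggests the left merge produced extra rows before duplicates were dropped. A left merge can only add rows when the right-hand table holds more than one row per merge key. The sketch below, using made-up values and a hypothetical `id` column, shows how dropping duplicated keys first and passing `validate` to `merge` keeps one row per point.

```python
import pandas as pd

# Background points: two points share the same grid cell and month (made-up values)
points = pd.DataFrame({'id': [1, 2, 3],
                       'yt_ocean': [-69.662, -69.662, -59.442],
                       'xt_ocean': [71.45, 71.45, 74.05],
                       'year': [1987, 1987, 1989],
                       'month': [11, 11, 11]})

# Extracted values: one row per point, so shared cells appear more than once
extracted = points.drop(columns='id').assign(krill_ggp=[0.93, 0.93, 1.05])

keys = ['yt_ocean', 'xt_ocean', 'year', 'month']

# Merging as-is multiplies the rows that share a key (2 x 2 + 1 = 5 rows)
naive = points.merge(extracted, on=keys, how='left')

# Keep one row per key in the lookup table before merging;
# validate raises a MergeError if the lookup still has repeated keys
lookup = extracted.drop_duplicates(subset=keys)
checked = points.merge(lookup, on=keys, how='left', validate='many_to_one')

print(len(points), len(naive), len(checked))
```

With this approach the merged data frame keeps the same number of rows as the input, so no `drop_duplicates` call is needed afterwards.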


## Saving data frame to disk

In [14]:
#Ensure output folder exists
os.makedirs(varDict['base_out'], exist_ok = True)

#Create file path where data will be saved
file_out = os.path.join(varDict['base_out'], 'unique_background_20x_obs_all_env.csv')

#Saving as csv file
crabeaters.to_csv(file_out, index = False)