<font size="8"> **Identifying areas with long-term pack ice presence** </font>  
Long-term presence of pack ice has been found to have a high correlation with crabeater seal (*Lobodon carcinophagus*) distribution.  
  
We will calculate the proportion of time that a grid cell has a sea ice concentration (SIC) of 85\% or more. This is similar to the definition given by [Oosthuizen et al. 2021](https://doi.org/10.3354/meps13787), but we use a different reference period: the 7 years prior to each observation, rather than the fixed time frame they used (Jan 2003 to Dec 2010).

We will use monthly sea ice concentration from [NASA Goddard-merged Near Real Time NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration](https://climatedataguide.ucar.edu/climate-data/sea-ice-concentration-data-nasa-goddard-and-nsidc-based-nasa-team-algorithm) (version 3) dataset available in Gadi. The SIC data used here is regridded to match the ACCESS-OM2-01 model outputs.

# Loading modules

In [4]:
#Accessing model data
import cosima_cookbook as cc
#Dealing with data
import xarray as xr
import numpy as np
import pandas as pd
#Data visualisation
import matplotlib.pyplot as plt
#Useful package to deal with file paths
from glob import glob
import os

# Defining dictionary of useful variables
In this dictionary we will define variables that are used multiple times throughout this notebook to avoid repetition. It mostly contains paths to folders where intermediate or final outputs will be stored.

In [5]:
varDict = {'base_folder': '/g/data/v45/la6889/Chapter2_Crabeaters/SeaIceObs/regridded_monthly/*.nc',
           'out_folder': '/g/data/v45/la6889/Chapter2_Crabeaters/SeaIceObs/LongTerm_PackIce/'}

# Loading sea ice concentration (SIC) observational dataset
SIC data comes from [NASA Goddard-merged Near Real Time NOAA/NSIDC Climate Data Record of Passive Microwave Sea Ice Concentration](https://climatedataguide.ucar.edu/climate-data/sea-ice-concentration-data-nasa-goddard-and-nsidc-based-nasa-team-algorithm) (version 3) dataset and it is available in Gadi.  

The data used here has been regridded to match the ACCESS-OM2-01 model outputs. See the `Obs_SIC-NASA.ipynb` notebook for instructions on how to access and regrid the SIC observations.

In [6]:
#Loading data
var_ice = xr.open_mfdataset(glob(varDict['base_folder']))
#Renaming variable
var_ice = var_ice.rename_vars({'__xarray_dataarray_variable__': 'sic'})
#Loading as data array
var_ice = var_ice.sic
#Checking result
var_ice

| | Array | Chunk |
|---|---|---|
| Bytes | 9.81 GiB | 243.90 MiB |
| Shape | (494, 740, 3600) | (12, 740, 3600) |
| Dask graph | 42 chunks in 85 graph layers | |
| Data type | float64 numpy.ndarray | |


# Long-term pack ice presence calculation
This calculation will require the following steps:
1. Identify grid cells where sea ice concentration (SIC) was 85\% or higher: We will assign a value of `1` to any grid cells that meet our condition, otherwise a value of `0` will be assigned.
2. For each timestep (month) within our period of interest (1978 to 2022), calculate the proportion of time a grid cell meets our SIC condition: We add all timesteps within a 7-year period and divide by the total number of months in 7 years.
3. Create a new data array with proportion calculations.
4. Save results to local disk: Yearly files are saved due to limitations with saving very large files.
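
Steps 1 and 2 can be illustrated on a toy array. This is a minimal sketch and not the notebook's data: the `sic` values, the three grid cells (the last one standing in for land with no data) and the 4-timestep window are all made up for illustration, and plain NumPy is used instead of xarray.

```python
import numpy as np

# Hypothetical toy data: 6 monthly timesteps (rows) for 3 grid cells (columns).
# The last cell stands in for land and never has data.
sic = np.array([
    [0.90, 0.10, np.nan],
    [0.95, 0.20, np.nan],
    [0.80, 0.90, np.nan],
    [0.85, 0.90, np.nan],
    [0.99, 0.30, np.nan],
    [0.50, 0.88, np.nan],
])

# Step 1: 1 where SIC >= 0.85, 0 otherwise, keeping NaN where there is no data
pack_ice = np.where(np.isnan(sic), np.nan, (sic >= 0.85).astype(float))

# Step 2: proportion of timesteps meeting the condition in a window that ends
# at (and includes) each timestep. A window of 4 is assumed for illustration;
# the notebook uses 84 months.
window = 4
proportions = np.stack([
    pack_ice[end - window + 1 : end + 1].sum(axis=0) / window
    for end in range(window - 1, sic.shape[0])
])

print(proportions)
```

The first `window - 1` timesteps have no complete window, so the result has fewer rows than the input. Plain NumPy `sum` propagates `NaN`, so the land cell stays `NaN` here; xarray's `sum` skips `NaN` by default, which returns `0` for an all-`NaN` cell unless `min_count` is set.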

In [7]:
#Assigning a value of 1 when SIC condition is met
pack_ice = xr.where(var_ice >= 0.85, 1, 0).where(~np.isnan(var_ice))
#Checking results
pack_ice

| | Array | Chunk |
|---|---|---|
| Bytes | 9.81 GiB | 243.90 MiB |
| Shape | (494, 740, 3600) | (12, 740, 3600) |
| Dask graph | 42 chunks in 91 graph layers | |
| Data type | float64 numpy.ndarray | |


## Subsetting data every 7 years

In [8]:
#Defining months in 7 years
months_in_7_yrs = 7*12
#Creating a list of timesteps within our study period
times_interest = pd.period_range('1984-11', '2019-12', freq = 'M')
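#Identifying the first month of each 7 year window. Label-based slices include both end points,
#so we go back 83 months to select exactly 84 months ending at (and including) each timestep
times_begin = [(t-pd.offsets.MonthEnd(months_in_7_yrs-1)).to_timestamp() for t in times_interest]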
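
Label-based `slice` selection in xarray (via `.sel`) includes both end points, so the number of months in each window depends on how far back the start date sits. The sketch below uses a single hypothetical end month `t` and plain pandas periods rather than the SIC data, and counts the months in an inclusive window for two choices of offset.

```python
import pandas as pd

months_in_7_yrs = 7 * 12
# Hypothetical end month, used for illustration only
t = pd.Period('1984-11', freq='M')

# Going back 84 months and keeping both end points spans 85 months
window_84_back = pd.period_range(t - months_in_7_yrs, t, freq='M')
# Going back 83 months and keeping both end points spans exactly 84 months
window_83_back = pd.period_range(t - (months_in_7_yrs - 1), t, freq='M')

print(len(window_84_back), window_84_back[0])
print(len(window_83_back), window_83_back[0])
```

Because the sum over each window is divided by `months_in_7_yrs`, the window needs to hold exactly 84 months for the result to stay between 0 and 1.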

In [9]:
#Creating empty list to save results
long_term_pack_ice = []

#Loop through each timestep of our interest
for i, t in enumerate(times_interest):
    #Select 7-year periods and calculate proportion of time a grid cell was covered by at least 85% SIC
    da = pack_ice.sel(time = slice(times_begin[i], t.to_timestamp())).sum('time')/months_in_7_yrs
    #Assign a date to each timestep - Here we assign the end date of the 7 year period
    da['time'] = t.to_timestamp()
    #Add results to list
    long_term_pack_ice.append(da)

In [10]:
#Concatenate results into a single file
long_term_pack_ice = xr.concat(long_term_pack_ice, dim = 'time')
#Checking results - Note there are fewer time steps than in the original data, as we do not need the initial seven years.
long_term_pack_ice

| | Array | Chunk |
|---|---|---|
| Bytes | 8.38 GiB | 20.32 MiB |
| Shape | (422, 740, 3600) | (1, 740, 3600) |
| Dask graph | 422 chunks in 3492 graph layers | |
| Data type | float64 numpy.ndarray | |


# Saving outputs to local machine
Data saved as yearly outputs due to limitations in storing a single large file.

In [12]:
#Ensuring output directory exists
os.makedirs(varDict['out_folder'], exist_ok = True)

In [13]:
#Grouping data by year
for yr, da in long_term_pack_ice.groupby('time.year'):
    #Creating name for yearly output file
    file_out = os.path.join(varDict['out_folder'], f'LongTerm_PackIce_Obs_Monthly_Jan-Dec_{yr}.nc')
    #Saving yearly output file
    da.to_netcdf(file_out)
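
As a quick check on what the yearly grouping produces, the sketch below rebuilds the monthly time axis (Nov 1984 to Dec 2019) with pandas and counts the months that fall in each year. It does not touch the SIC data, and the folder name `example_out` is a placeholder rather than the path used in this notebook.

```python
import os
import pandas as pd

# Placeholder output folder, for illustration only
out_folder = 'example_out'

# Monthly time axis covering the same period as times_interest
times = pd.date_range('1984-11', '2019-12', freq='MS')

# Number of monthly timesteps that end up in each yearly file
months_per_year = pd.Series(times.year).value_counts().sort_index()

# File names following the pattern used when saving
files = [os.path.join(out_folder, f'LongTerm_PackIce_Obs_Monthly_Jan-Dec_{yr}.nc')
         for yr in months_per_year.index]

print(len(times), len(files))
print(months_per_year.head())
```

The first file (1984) holds only two months, November and December, even though the file name pattern says `Jan-Dec`; every later year holds twelve.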