# Plotting sea ice in the Southern Ocean
This script produces plots of sea ice area around the Southern Ocean using outputs from ACCESS-OM2-01.  
**Requirements:** It is suggested that you use the `conda/analysis3-20.01` kernel (or a later version). This can be selected from the drop-down list in the left-hand corner, or by typing `!module load conda/analysis3` in a Python cell.

## Loading relevant modules
These modules are used to access relevant outputs and to manipulate data. 

In [1]:
#This first line will show plots produced by matplotlib inside this Jupyter notebook
%matplotlib inline

import cosima_cookbook as cc
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import netCDF4 as nc
import xarray as xr
import numpy as np
import pandas as pd
import datetime as dt
import os
import calendar
from dask.distributed import Client
from tqdm import tqdm_notebook #progress bars (used when accessing observations)
import gc #to free up memory
#Used to make copies of data arrays
import copy

The following modules are used in map creation.

In [2]:
import cmocean as cm                              # Nice colormaps
from collections import OrderedDict               # We often use this to organise our experiments
import cftime                                     # In case you need to work with time axes
from glob import glob                             # If you need to search file systems
import cartopy.crs as ccrs                        # For making maps with different projections
import cartopy.feature as cft                     # For adding features to maps

## Accessing model outputs
Start a cluster with multiple cores to work with. Remember that the number of cores cannot exceed the number of CPUs requested when accessing Gadi.  
If the line below fails, skip it: the job will not be parallelised, but the rest of the script will still run.

In [3]:
client = Client(n_workers = 12)

Access the default database of experiments, from which data will be loaded.

In [4]:
session = cc.database.create_session()

This notebook uses outputs from the v140 run of ACCESS-OM2-01, which includes wind forcing. This run includes the experiments `01deg_jra55v140_iaf` and `01deg_jra55v140_iaf_cycle2`. A list of experiments can be obtained with `cc.querying.get_experiments(session)`; for a more detailed list, add the `all = True` argument.

## Accessing ACCESS-OM2 model outputs
Once the correct experiment variables have been identified, data can be loaded into the notebook for further processing. All variables needed to do this are included below.

In [5]:
#Save the name of the experiment of interest in a variable that can easily be referred to
exp = "01deg_jra55v140_iaf_cycle2"
#Name (short name) of variable of interest
varInt = "aice_m" #sea ice concentration

Optional variables; activate them if needed. Note that because times need correcting, **the start time is actually one month after the month we are interested in**. The reason is explained in the description of the time corrections below.

In [6]:
#Give the start and end dates for the analyses. The input must be given as a list, even if it is one item only.
#Dates can be given as full date (e.g., 2010-01-01), just year and month, or just year. If multiple years are to be analysed, ensure both variables have the same length
#Start date
stime = [str(i)+'-02' for i in range(2000, 2001, 1)]
#End date
etime = [str(i)+'-01' for i in range(2001, 2002, 1)]
#Define frequency. Remember to check frequency and variable of interest are related to each other, for example, aice_m has a monthly frequency, while aice has a daily frequency.
freq = '1 monthly'
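When several years are analysed, the start and end lists must have the same length. One way to guarantee this is to build them in pairs. The sketch below follows the one-month shift used above (each window runs from February of one year to January of the next); `make_time_windows` is a hypothetical helper, not part of the COSIMA Cookbook.

```python
def make_time_windows(first_year, last_year):
    """Return matching start/end date strings for each year from first_year to last_year.

    Each window is shifted forward by one month (February to the following January),
    following the note above that the start time is one month after the month of interest.
    """
    years = range(first_year, last_year + 1)
    stime = [f"{y}-02" for y in years]
    etime = [f"{y + 1}-01" for y in years]
    return stime, etime

# Three one-year windows, starting in 2000, 2001 and 2002
stime, etime = make_time_windows(2000, 2002)
print(stime)
print(etime)
```

Calling `make_time_windows(2000, 2000)` reproduces the single-year lists defined in the cell above.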

## Defining functions

**Accessing ACCESS-OM2-01 outputs**  
Define a function that loads data with `cc.querying.getvar()` so that it can be called repeatedly (for example, in a loop). Its inputs are similar to those of `cc.querying.getvar()`, with additional inputs to define an area of interest.  
The `getACCESSdata` function does the following:  
- Accesses data for the experiment and variable of interest at the requested frequency and within the specified time frame.  
- Applies **time corrections**, because midnight (00:00:00) is interpreted differently by the CICE model and the xarray package.
    - CICE time-stamps each averaged record at the end of its averaging period, so a record labelled *2010-01-01 00:00:00* holds data for the period ending at that midnight (i.e. up to the end of 2009-12-31), while xarray interprets the label as the start of 2010-01-01. To fix this, 12 hours are subtracted from the time dimension (also known as the *time coordinate*), which moves each label back inside the period it represents.  
- Replaces the CICE grid indices (`ni`, `nj`) with the nominal longitude and latitude coordinates of the ocean grid (`xt_ocean`, `yt_ocean`), taken from the `geolon_t` dataset, and renames the dimensions to these more intuitive names.  
- Subsets the data by latitude if required. Note that the latitude bounds are passed through the arguments named `minlon` and `maxlon`; `minlat` and `maxlat` are not currently used. The **Southern Ocean** is defined here as ocean waters south of 45°S.
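The 12-hour shift can be illustrated with NumPy alone. This is a sketch assuming CICE stamps each monthly mean at midnight at the end of the month it covers (consistent with the note above that the start time is one month after the month of interest); the time stamps below are made up for illustration.

```python
import numpy as np

# Made-up CICE-style monthly time stamps, assuming end-of-period stamping:
# the January 2000 mean is labelled 2000-02-01 00:00 and the February mean 2000-03-01 00:00
cice_times = np.array(["2000-02-01T00:00", "2000-03-01T00:00"], dtype="datetime64[m]")

# Subtract 12 hours, as getACCESSdata does with dt.timedelta(hours = 12)
corrected = cice_times - np.timedelta64(12, "h")

# Each label now falls inside the month its mean represents (2000-01-31T12:00 and 2000-02-29T12:00)
months = corrected.astype("datetime64[M]")
print(months)
```

Without the shift, grouping these records by calendar month would assign the January mean to February.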

In [7]:
#Loading geolon_t, whose xt_ocean and yt_ocean coordinates are used to update the geographical coordinates of the array of interest
geolon_t = cc.querying.getvar(exp, 'geolon_t', session, n = -1)

#Frequency, experiment and session do not need to be specified if they were defined in the previous step
def getACCESSdata(var, start, end, exp = exp, freq = freq, ses = session, minlon = geolon_t.yt_ocean.values.min(), maxlon = geolon_t.yt_ocean.values.max(),\
                  minlat = geolon_t.xt_ocean.values.min(), maxlat = geolon_t.xt_ocean.values.max()):
    #NOTE: despite their names, minlon and maxlon are LATITUDE bounds: they default to the yt_ocean range
    #and are used to subset yt_ocean below. minlat and maxlat are not currently used
    #Accessing data
    vararray = cc.querying.getvar(exp, var, ses, frequency = freq, start_time = start, end_time = end)
    #Applying time correction: shift CICE time stamps back by 12 hours
    vararray['time'] = vararray.time - dt.timedelta(hours = 12)
    #Assign the ocean grid coordinates to the sea ice dataset
    #.values extracts the coordinate values from geolon_t
    vararray.coords['ni'] = geolon_t['xt_ocean'].values
    vararray.coords['nj'] = geolon_t['yt_ocean'].values
    #The xarray rename function uses a dictionary to change names: keys are the current names and values are the desired names
    vararray = vararray.rename({'ni':'xt_ocean', 'nj':'yt_ocean'})
    #Subset the sea ice array to the latitude range of interest
    vararray = vararray.sel(yt_ocean = slice(minlon, maxlon))
    return vararray

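**Run-Length Encoding (RLE)**  
Define the `rle_encode` function, which identifies runs of repeated values in a sequence and the number of times each value is repeated in a row. It returns a two-row array of floats: the first row holds the value of each run and the second row holds the length of that run. Because values are converted to `float`, the input must be a list or array of numbers (for example, the 0/1 values produced by thresholding below); a string such as `"AAABBCCCC"` will raise an error. For example:  
***Input*** `rle_encode([0, 0, 0, 1, 1, 0, 0, 0, 0])`  
***Output*** `[[0., 1., 0.], [3., 2., 4.]]`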
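For long time series, the same encoding can be computed without an explicit Python loop. The sketch below is a vectorised NumPy alternative (`rle_numpy` is a hypothetical name, not part of the original notebook); it assumes numeric input without NaN values, as is the case after NaNs are filled and the data are thresholded below, because NaN never compares equal to itself and would split runs.

```python
import numpy as np

def rle_numpy(values):
    """Run-length encode a 1-D numeric sequence.

    Returns a 2 x n float array: row 0 holds the value of each run and
    row 1 holds the length of each run (the same layout as rle_encode).
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        return np.empty((2, 0))
    # A new run starts at position 0 and wherever a value differs from the previous one
    starts = np.concatenate(([0], np.flatnonzero(values[1:] != values[:-1]) + 1))
    # Run lengths are the gaps between consecutive run starts (the last run ends at the array end)
    lengths = np.diff(np.append(starts, values.size))
    return np.array([values[starts], lengths.astype(float)])

print(rle_numpy([0, 0, 1, 1, 1, 0]))
```

For 0/1 inputs such as the thresholded series used later, this returns the same array layout as `rle_encode`.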

In [8]:
def rle_encode(data):
    #Variable that will contain the values identified in the input list
    encoding = []
    #Variable that will contain the number of times in a row a value is identified in the input list
    numb = []
    #String containing the value of the previous character
    prev_char = ''
    #Counter
    count = 1
     
    #Looping through every item in the input
    for char in data:
        #If item is not the same as previous character
        if str(char) != prev_char:
            #If previous character does exist, then append it to encoding list 
            if prev_char:
                encoding.append(prev_char)
                #Appending counter to number variable
                numb.append(count)
            #Reset counter to one before updating previous character
            count = 1
            #Pass current character as previous character
            prev_char = str(char)
        #If char and prev_char are the same
        else:
            #Increase counter by one
            count += 1
    else:
        #Finish off the encoding by appending the last run
        encoding.append(prev_char)
        numb.append(count)
    #Create array with the run values in the first row and the number of repeats in the second row
    z = np.array([[float(i) for i in encoding], [float(j) for j in numb]])
    return z

## Applying functions
A variety of plots showing changes in sea ice area in the Southern Ocean will be created below and saved in a folder, using the years of data shown in each graph as a unique identifier.

The `aice_m` variable gives the monthly mean sea ice concentration in each grid cell, that is, the monthly mean fraction of the cell covered by ice. To calculate the area covered by ice, the area of each cell is also needed.  
The cell area is saved as `area_t` in the ocean model output. Below, we load it into the notebook: multiplying it by `aice_m` gives the ice area in each cell, and summing over all cells gives the total ice area.
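The area calculation is a cell-by-cell product followed by a sum. Here is a minimal NumPy sketch on a made-up grid (not model output); with the notebook's data, the equivalent would be multiplying `SO` by `IceArea` and summing over `xt_ocean` and `yt_ocean`, assuming both arrays share the same grid coordinates.

```python
import numpy as np

# Made-up 2 x 3 grid for illustration only
# Monthly mean ice concentration as a fraction (0-1); NaN marks land
aice = np.array([[1.0, 0.5, 0.0],
                 [0.2, np.nan, 0.8]])
# Grid cell areas in m^2
area = np.array([[1.0e8, 1.0e8, 1.0e8],
                 [2.0e8, 2.0e8, 2.0e8]])

# Ice area in each cell: concentration times cell area (m^2)
cell_ice_area = aice * area

# Total ice area over the grid, ignoring land (NaN) cells, converted from m^2 to km^2
total_ice_area_km2 = np.nansum(cell_ice_area) / 1e6
print(total_ice_area_km2)
```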

In [9]:
#Loading grid cell area (in m2). Only one file is needed as cell areas do not change with time
IceArea = cc.querying.getvar(exp, 'area_t', session, n = 1) #A specific file can be selected instead with, e.g., ncfile = 'iceh.2010-01.nc'
#Accessing ACCESS-OM2-01 outputs for the entire time range of interest, south of 45S
#(in getACCESSdata, the minlon and maxlon arguments set the latitude bounds)
SO = getACCESSdata(varInt, stime[0], etime[-1], minlon = -90, maxlon = -45)

## Sea ice seasonality calculations
The code below has been translated into Python from the `calc_ice_season` function, which is part of the `aceecostats` package developed by Michael Sumner at the Australian Antarctic Division (AAD). This section calculates annual sea ice advance and retreat as defined by Massom et al. (2013) [DOI:10.1371/journal.pone.0064756]. Sea ice is considered to be advancing at a pixel once its concentration is at least 15% for five consecutive time steps (days in the original definition; note that with the monthly `aice_m` variable used here, each time step is one month). The day of retreat is the time after which concentration remains below 15% until the end of the year.
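The definition can be applied to a single pixel's time series with a short sketch. This is an illustrative re-implementation (the function name `advance_retreat` is hypothetical), not the `calc_ice_season` code: it reports the advance as the 1-based index of the first time step of the first run of at least `ndays` steps at or above the threshold, and the retreat as the 1-based index of the last time step at or above the threshold. The loop further down uses its own index conventions, so the two may not agree exactly.

```python
import numpy as np

def advance_retreat(conc, thres=0.15, ndays=5):
    """Sketch of the advance/retreat definition for one pixel's time series.

    Advance: 1-based index of the first step of the first run of at least
    `ndays` consecutive steps with concentration >= thres (nan if none).
    Retreat: 1-based index of the last step with concentration >= thres,
    after which concentration stays below thres (nan if never reached).
    """
    # Treat missing values as open water, then flag steps at or above the threshold
    above = np.nan_to_num(np.asarray(conc, dtype=float), nan=-999.0) >= thres
    if not above.any():
        return np.nan, np.nan
    advance = np.nan
    run = 0
    for t, is_ice in enumerate(above, start=1):
        # Length of the current run of ice ending at step t
        run = run + 1 if is_ice else 0
        if run == ndays:
            # The qualifying run started ndays - 1 steps before step t
            advance = t - ndays + 1
            break
    retreat = int(np.flatnonzero(above)[-1]) + 1
    return advance, retreat

print(advance_retreat([0.0, 0.2, 0.3, 0.4, 0.5, 0.6, 0.1, 0.0]))
```

In this example the ice run starts at step 2 and the last step at or above 15% is step 6.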

In [10]:
#Threshold refers to 15% sea ice concentration
thres = 0.15
#Number of consecutive time steps that ice concentration needs to be at or above 15% for sea ice to be considered as advancing
ndays = 5
#Number of time steps being evaluated
year_n = len(SO.time.values)
#Create a copy of one time step of the sea ice concentration data and fill it with zeroes (used as a template for the output grid)
template = copy.deepcopy(SO[1])*0

#Create a copy of all SO data
IceMat = copy.deepcopy(SO)
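#Reshape the 3D data array into a 2D matrix: each row holds the time series of one grid cell and each column holds one time step.
#Cells are stacked with latitude varying slowest so that the results can be reshaped back to (yt_ocean, xt_ocean) at the end
IceMat = IceMat.stack(z = ('yt_ocean', 'xt_ocean')).transpose()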

#Create 1D arrays, with one entry per grid cell, to hold the sea ice advance and retreat results
adv = np.zeros(IceMat.shape[0])
ret = np.zeros(IceMat.shape[0])

#Set all nan values (e.g. land) to -999 so that they fall below the threshold
threshold = IceMat.fillna(-999)
#Cells at or above the threshold are set to one, cells below it are set to zero
threshold = (threshold >= thres).astype(float)

#Count, for each grid cell, the number of time steps at or above the threshold
rsum = threshold.sum('time')

#Boolean array identifying cells that never reach the threshold
alllt = np.array(rsum) == 0

#Boolean array identifying cells at or above the threshold at every time step
allgt = np.array(rsum) == threshold.shape[1]

#Indices of the cells that need to be visited: those that are neither always below nor always above the threshold
visit = np.where(~alllt & ~allgt)[0]

In [None]:
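#Extract the thresholded values once, so the loop does not trigger a separate dask computation for every cell
thr = threshold.values
for i in visit:
    #Run-length encode the time series of cell i: rl[0] holds run values (0 or 1), rl[1] holds run lengths
    rl = rle_encode(thr[i, :])
    #Advance: first run of ice (value 1) lasting at least ndays time steps
    for j in range(len(rl[0])):
        if rl[0][j] == True and rl[1][j] >= ndays:
            if j == 0:
                adv[i] = 1
            else:
                #Number of time steps before the advancing run starts
                adv[i] = sum(rl[1][0:j])
            #Stop at the first qualifying run
            break
    if adv[i] == 0:
        adv[i] = np.nan
    #Retreat: search the runs backwards from the end of the series for the last run of ice
    revlengths = rl[1][::-1]
    revvals = rl[0][::-1]
    for k in range(len(revlengths)):
        if revvals[k] == True:
            if k == 0:
                ret[i] = 1
            else:
                #Index of the last time step of that ice run
                ret[i] = sum(revlengths[k:len(revlengths)])
            #Stop at the last ice run
            break
    if ret[i] == 0:
        ret[i] = np.nan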

In [None]:
#Cells that never reach the threshold have no advance or retreat; cells always at or above it are set to one
adv[alllt] = np.nan
adv[allgt] = 1
ret[alllt] = np.nan
ret[allgt] = 1

#Reshape the results back to the (yt_ocean, xt_ocean) grid and save them (change the output paths to a directory you can write to)
adv = xr.DataArray(np.array(adv).reshape(template.shape[0], template.shape[1]), coords = [('yt_ocean', template['yt_ocean'].values), ('xt_ocean', template['xt_ocean'].values)])
adv.to_netcdf('/g/data/v45/la6889/Calculations/SeaIceAdvance.nc')
ret = xr.DataArray(np.array(ret).reshape(template.shape[0], template.shape[1]), coords = [('yt_ocean', template['yt_ocean'].values), ('xt_ocean', template['xt_ocean'].values)])
ret.to_netcdf('/g/data/v45/la6889/Calculations/SeaIceRetreat.nc')
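Reshaping the one-dimensional `adv` and `ret` arrays back into a map only works if the cell order used when stacking the grid matches the order that `reshape` expects. The toy NumPy example below (made-up values, not model data) shows that flattening with latitude varying slowest, which corresponds to stacking with `z = ('yt_ocean', 'xt_ocean')`, round-trips correctly through `reshape(ny, nx)`, while flattening in the opposite order scrambles the map.

```python
import numpy as np

ny, nx, nt = 2, 3, 4
# Made-up (time, y, x) field in which each cell's value encodes its position as 10*y + x
base = 10 * np.arange(ny)[:, None] + np.arange(nx)[None, :]
field = np.stack([base] * nt)

# Correct order: flatten with y varying slowest, giving a (cells, time) matrix,
# then reshape one value per cell back to (y, x)
cells_by_time = field.reshape(nt, ny * nx).T
grid = cells_by_time[:, 0].reshape(ny, nx)

# Wrong order: flatten with x varying slowest, then reshape as if y were slowest
wrong_cells = field.transpose(0, 2, 1).reshape(nt, nx * ny).T
wrong_grid = wrong_cells[:, 0].reshape(ny, nx)

print(np.array_equal(grid, base))        # True: the map is recovered
print(np.array_equal(wrong_grid, base))  # False: the map is scrambled
```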