# Step 2: Fit 7-9 Class GMMs and save class properties

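This notebook applies the 7-, 8- and 9-class GMMs trained in Step1_trainmodels.ipynb to UK-ESM historical Southern Ocean data from January 2001 to December 2017, following Jones et al. 2019 (https://doi.org/10.1029/2018JC014629). For each ensemble member it classifies the full dataset and saves the class labels, the time-mean class probabilities and the average profile of each class.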
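As a reminder of what classifying with a trained GMM involves, the sketch below computes posterior class probabilities and a hard label for a single sample. It is a one-dimensional, pure-Python illustration with made-up mixture parameters; the notebook itself works in multi-dimensional PCA space through `flt.gmm_classify` and `flt.gmm_prob`, whose actual implementation is in cluster_utils.py and may differ.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_probs(x, weights, means, stds):
    """Posterior probability of each mixture component for one sample x."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical 3-class mixture (illustrative values, not taken from the paper)
weights = [0.5, 0.3, 0.2]
means = [-2.0, 0.0, 3.0]
stds = [1.0, 0.5, 1.5]

probs = posterior_probs(0.1, weights, means, stds)
# Hard classification: the component with the largest posterior probability
label = max(range(len(probs)), key=probs.__getitem__)
print(label, [round(p, 3) for p in probs])
```

The posterior probabilities sum to one for every sample, which is why a time mean of them (as saved by this notebook) can be read as a measure of how confidently a location belongs to each class.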

The output files are needed to reproduce Figures YY from *Heuristic Methods for Determining the Number of Classes in Unsupervised Classification of Climate Models*, E. Boland et al. 2022 (doi to follow). The notebook requires cluster_utils.py and input data files from the googleapi CMIP6 store (see cluster_utils.py for more information).

Outputs are stored in \[model\]/\[ensemble\]/\[nclasses\].

Please attribute any plots or code from this notebook using the DOI from Zenodo: to come

Updated Feb 2023
E Atkinson & E Boland [emmomp@bas.ac.uk](mailto:emmomp@bas.ac.uk)

In [2]:
from dask.distributed import Client

client = Client("tcp://127.0.0.1:32988")
client

Client
Connection method: Direct
Dashboard: http://127.0.0.1:8787/status

Scheduler
Comm: tcp://127.0.0.1:32988
Dashboard: http://127.0.0.1:8787/status
Workers: 8
Total threads: 8
Total memory: 64.00 GiB
Started: Just now

Workers
8 workers, each with 1 thread and 8.00 GiB memory; no tasks executing, in memory, ready or in flight

In [3]:
import numpy as np
import xarray as xr

import os
import pickle

import cluster_utils as flt

### User options
Leave these as they are to reproduce the results in the paper.

In [4]:
# Model name (top-level directory for inputs and outputs)
model='model'
# Number of classes
classes = [7,8,9]
# Time range
tslice=slice('2001-01', '2017-12')
# Depth range
levSel=slice(5, 2000)
# Ensemble member IDs
ids = ['r1i1p1f2', 'r2i1p1f2', 'r3i1p1f2', 'r4i1p1f2', 'r5i1p1f3', 'r6i1p1f3', 'r7i1p1f3', 'r8i1p1f2', 'r9i1p1f2', 'r10i1p1f2']
# Mask applied when retrieving profiles
mask = np.load('data/mask.npy', allow_pickle=True)
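Because the next cell loads `pca.obj` and `gmm.obj` files written in Step 1, it can be worth checking that they exist before starting a long run. The helper below is a hypothetical convenience, not part of cluster_utils.py; it assumes the file layout used by the loading code in this notebook (`[model]/[ensemble]/pca.obj` and `[model]/[ensemble]/[nclasses]/gmm.obj`).

```python
import os


def missing_step1_files(model, ids, classes):
    """Return the Step 1 files this notebook expects but cannot find."""
    expected = []
    for m_id in ids:
        # One PCA object per ensemble member
        expected.append(os.path.join(model, m_id, 'pca.obj'))
        for n_classes in classes:
            # One GMM per ensemble member and number of classes
            expected.append(os.path.join(model, m_id, str(n_classes), 'gmm.obj'))
    return [path for path in expected if not os.path.isfile(path)]


# Example call with a single ensemble member
missing = missing_step1_files('model', ['r1i1p1f2'], [7, 8, 9])
print('{} file(s) missing'.format(len(missing)))
```

An empty list means every expected input is present; otherwise the returned paths show which members or class counts still need Step 1 to be run.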

### Classify data and generate average profiles for chosen ensemble members and classes
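The average profile of a class is, in essence, the level-by-level mean of all profiles assigned to that class. The sketch below shows this with plain Python lists and made-up values; it is only an illustration of the idea, and `flt.avg_profiles` in cluster_utils.py operates on xarray data and may compute additional statistics.

```python
from collections import defaultdict


def class_mean_profiles(profiles, labels):
    """Average profiles level by level within each class."""
    sums = {}
    counts = defaultdict(int)
    for profile, label in zip(profiles, labels):
        if label not in sums:
            sums[label] = [0.0] * len(profile)
        # Accumulate the running sum at every depth level
        sums[label] = [s + v for s, v in zip(sums[label], profile)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


# Four hypothetical profiles on three depth levels, already classified
example_profiles = [[10.0, 8.0, 4.0], [12.0, 9.0, 5.0],
                    [2.0, 1.5, 1.0], [3.0, 2.5, 1.0]]
example_labels = [0, 0, 1, 1]

mean_profiles = class_mean_profiles(example_profiles, example_labels)
print(mean_profiles)
```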

In [6]:
avg_profiles = {}
for m_id in ids:
    print('Starting {}'.format(m_id))
    path_id = '{}/{}'.format(model, m_id)

    # Load the PCA fitted in Step 1
    with open('{}/pca.obj'.format(path_id), 'rb') as file:
        pca = pickle.load(file)

    # Retrieve ALL Southern Ocean data (done once per ensemble member)
    options = {'memberId': m_id}
    data = flt.retrieve_profiles(timeRange=tslice, mask=mask, options=options, levSel=levSel)
    data = data.chunk({'time': data.sizes['time'], 'n': 1024})
    # Normalise the samples
    data_norm = flt.normalise_data(data, ('n', 'time'))
    # Transform to PCA space
    data_trans = flt.pca_transform(data_norm, pca)
    print('Finished setup for {}'.format(m_id))

    for n_classes in classes:
        path_n = '{}/{}/{}'.format(model, m_id, n_classes)
        # Load the GMM trained in Step 1
        with open('{}/gmm.obj'.format(path_n), 'rb') as file:
            gmm = pickle.load(file)

        # Classify full dataset
        print('Classifying full dataset into {} classes'.format(n_classes))
        data_classes = flt.gmm_classify(data_trans, gmm)
        data_probs = flt.gmm_prob(data_trans, gmm)
        print('Classification complete, writing to file')
        flt.write_tonc(data_classes.reset_index('n'), n_classes, m_id, 'class', path_n)
        # Probabilities are saved as a time mean
        flt.write_tonc(data_probs.reset_index('n').mean('time'), n_classes, m_id, 'probs', path_n)
        # Calculate average profiles for each class
        avg_prof = flt.avg_profiles(data, data_classes, n_classes)
        print('Average profiles calculated, writing to file')
        with open('{}/avg_prof.obj'.format(path_n), 'wb') as file:
            pickle.dump(avg_prof, file)
        print('Done with {} classes'.format(n_classes))

print('Done!')

Found classifications for r1i1p1f2, skipping
Found classifications for r2i1p1f2, skipping
Found classifications for r3i1p1f2, skipping
Found classifications for r4i1p1f2, skipping
Found classifications for r5i1p1f3, skipping
Found classifications for r6i1p1f3, skipping
Found classifications for r7i1p1f3, skipping
Found classifications for r8i1p1f2, skipping
Starting r9i1p1f2
No models found, generating training set
Finished setup for r9i1p1f2
Classifying full dataset into 7 classes
Classification complete, writing to file
class written to model_20012017/r9i1p1f2/7/class.nc
probs written to model_20012017/r9i1p1f2/7/probs.nc
Average profiles calculated, writing to file
Done with 7 classes
Training 8 class model
Classifying full dataset into 8 classes
Classification complete, writing to file
class written to model_20012017/r9i1p1f2/8/class.nc
probs written to model_20012017/r9i1p1f2/8/probs.nc
Average profiles calculated, writing to file
Done with 8 classes
Training 9 class model
Classif