# Running Pre-Trained CGNet Models

Purpose:
--------
This notebook runs pre-trained CGNet models to detect atmospheric rivers (ARs) and tropical cyclones (TCs) with machine learning.\
See ClimateNet repo here: https://github.com/andregraubner/ClimateNet

Authors/Contributors:
---------------------
* Teagan King
* John Truesdale
* Katie Dagon

## Import libraries

In [1]:
import os
import sys
import numpy as np

sys.path.append("/glade/work/kdagon/ClimateNet") # append path to ClimateNet repo
from climatenet.utils.data import ClimateDatasetLabeled, ClimateDataset
from climatenet.models import CGNet
from climatenet.utils.utils import Config
from climatenet.track_events import track_events
from climatenet.analyze_events import analyze_events
from climatenet.visualize_events import visualize_events

from os import path

## Confirm GPU resources
GPUs can be requested through the JupyterHub launch page.\
Current resource request (2/15/23): 1 node, 4 CPUs, 64 GB memory, 2 V100 GPUs

In [2]:
# requires loading pytorch into environment
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())

True
2


## Load pre-trained model
There is no need to pass a config separately; pointing `model_path` at the folder that holds the config and weights is enough.

In [7]:
cgnet = CGNet(model_path='/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523')

In [8]:
cgnet

<climatenet.models.CGNet at 0x2b6f842d8470>

In [11]:
cgnet.config

<climatenet.utils.utils.Config at 0x2b6f842d82e8>

## Test inference on a subset of historical 2000 data

In [12]:
inference = ClimateDataset('/glade/scratch/kdagon/cgnet/test_2000', cgnet.config)

In [14]:
inference.fields

{'TMQ': {'mean': 24.927238169017997, 'std': 15.817276954650879},
 'U850': {'mean': 1.0356735863118816, 'std': 8.29762077331543},
 'V850': {'mean': 0.20847854977498861, 'std': 6.231630802154541},
 'PSL': {'mean': 101095.03520124489, 'std': 1461.225830078125}}
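
The `mean` and `std` entries above are per-variable statistics stored in the model config. A minimal sketch of how such statistics are typically applied, assuming a plain z-score standardization (whether `ClimateDataset` does exactly this internally is an assumption here, and `standardize` is a hypothetical helper, not part of ClimateNet):

```python
# Per-variable statistics copied from the inference.fields output above
fields = {
    "TMQ": {"mean": 24.927238169017997, "std": 15.817276954650879},
    "U850": {"mean": 1.0356735863118816, "std": 8.29762077331543},
    "V850": {"mean": 0.20847854977498861, "std": 6.231630802154541},
    "PSL": {"mean": 101095.03520124489, "std": 1461.225830078125},
}


def standardize(value, variable, fields=fields):
    """Hypothetical helper: z-score a raw value using the config statistics."""
    stats = fields[variable]
    return (value - stats["mean"]) / stats["std"]


# A TMQ value one standard deviation above the mean maps to about 1.0
tmq_example = fields["TMQ"]["mean"] + fields["TMQ"]["std"]
print(standardize(tmq_example, "TMQ"))

# A value equal to the mean maps to exactly 0.0
print(standardize(fields["PSL"]["mean"], "PSL"))
```

Because the statistics live in the config rather than in the weights, a config with different means and standard deviations changes the inputs the model sees without any retraining.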

In [19]:
inference.length

921

In [15]:
%%time
class_masks = cgnet.predict(inference) # masks with 1==TC, 2==AR

100%|██████████| 58/58 [01:44<00:00,  1.81s/it]


CPU times: user 46.7 s, sys: 18.1 s, total: 1min 4s
Wall time: 1min 47s
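
The progress bar counts batches, not time steps. The counts in this notebook (58 batches for 921 samples here, 183 batches for 2920 samples in the full-year runs) are consistent with a batch size of 16. That value is inferred from these numbers, not read from the config, so treat it as an assumption:

```python
import math


def n_batches(n_samples, batch_size):
    """Number of batches needed to cover n_samples (the last batch may be partial)."""
    return math.ceil(n_samples / batch_size)


batch_size = 16  # assumption inferred from the progress bars, not read from the config

print(n_batches(921, batch_size))   # 58, matches the subset runs
print(n_batches(2920, batch_size))  # 183, matches the full-year runs

# A full non-leap year of 3-hourly output has 365 days * 8 steps per day
print(365 * 8)  # 2920
```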


In [17]:
class_masks

In [26]:
# change the dataarray name
class_masks.name = 'masks'

In [27]:
class_masks

In [28]:
%%time
class_masks.to_netcdf("/glade/scratch/kdagon/cgnet/test_2000/class_masks.nc")

CPU times: user 28.5 ms, sys: 1.88 s, total: 1.91 s
Wall time: 2.13 s
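
Each grid cell in the saved masks holds an integer class. A small sketch of tallying class frequencies, using a toy nested list in place of the real mask values. The background value 0 is an assumption (only 1 == TC and 2 == AR are stated above), and `class_fractions` is a hypothetical helper:

```python
from collections import Counter

# 1 and 2 come from the notebook; 0 as background is an assumption
LABELS = {0: "background (assumed)", 1: "TC", 2: "AR"}


def class_fractions(mask_rows):
    """Return the fraction of grid cells assigned to each class label."""
    counts = Counter(value for row in mask_rows for value in row)
    total = sum(counts.values())
    return {name: counts.get(code, 0) / total for code, name in LABELS.items()}


# Toy 3 x 4 mask standing in for one time step of class_masks
toy_mask = [
    [0, 0, 2, 2],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]
print(class_fractions(toy_mask))
```

For the real data, the same tally could be done on `class_masks.values` with `np.unique(..., return_counts=True)`.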


## Test inference on RCP2.6 2006 data

In [29]:
inference_2006 = ClimateDataset('/glade/campaign/cgd/amp/jet/ClimateNet/data_processing/extend/2006-2015-RCP26-3hr/2006', cgnet.config) 

In [30]:
inference_2006.length

2920

In [31]:
%%time
class_masks_2006 = cgnet.predict(inference_2006) # masks with 1==TC, 2==AR

100%|██████████| 183/183 [08:21<00:00,  2.74s/it]


CPU times: user 2min 38s, sys: 1min, total: 3min 39s
Wall time: 8min 29s


In [32]:
class_masks_2006.name = 'masks'
class_masks_2006.to_netcdf("/glade/scratch/kdagon/cgnet/test_2006/class_masks.nc")

## Test inference on subset of RCP8.5 2086 data

In [33]:
inference_2086 = ClimateDataset('/glade/scratch/kdagon/cgnet/test_2086', cgnet.config)

In [34]:
inference_2086.length

921

In [35]:
%%time
class_masks_2086 = cgnet.predict(inference_2086)

100%|██████████| 58/58 [01:47<00:00,  1.85s/it]


CPU times: user 47 s, sys: 17.5 s, total: 1min 4s
Wall time: 1min 49s


In [36]:
class_masks_2086.name = 'masks'
class_masks_2086.to_netcdf("/glade/scratch/kdagon/cgnet/test_2086/class_masks.nc")

## Function for creating TC/AR masks using pre-trained model

In [3]:
def cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False):
    """Use a pre-trained model to create masks of tropical cyclones (mask value = 1)
    and atmospheric rivers (mask value = 2).
    
    The function creates a NetCDF mask file and saves it in save_dir.

    Parameters:
    -----------
    model_path: str
        path to the folder holding the pre-trained CGNet config and weights
    inference_path: str
        path to the folder holding the inference data
    save_dir: str
        path to the folder where the masks will be saved as .nc files
    analyze: bool (optional)
        default is False; if True, will track events and save plots for analyzing them using climatenet.analyze_events().
        Note that this can significantly increase the time to run.
    visualize: bool (optional)
        default is False; if True, will track events and save plots for visualizing them using climatenet.visualize_events().
        Note that this can significantly increase the time to run.
    
    """
    # instantiate CGNet with pre-trained model
    cgnet = CGNet(model_path=model_path)
    
    # inference using the pre-trained config file
    inference = ClimateDataset(inference_path, cgnet.config)
    class_masks = cgnet.predict(inference) # masks with 1==TC, 2==AR

    # create save dir, if needed
    if not os.path.isdir(save_dir):
        os.makedirs(save_dir)
    else:
        print("Warning: might overwrite {}".format(save_dir))
    
    # save out class masks
    class_masks.name = 'masks'
    class_masks.to_netcdf(os.path.join(save_dir, "class_masks.nc"))
    print("Saved class masks to {}".format(save_dir))
    
    # note: event tracking is resource intensive, so only run it when
    # the analyze or visualize steps need the event masks
    if analyze or visualize:
        event_masks = track_events(class_masks) # masks with event IDs
        event_masks.to_netcdf(os.path.join(save_dir, "event_masks.nc"))
        print("Saved event masks to {}".format(save_dir))
    
    if analyze:
        analyze_events(event_masks, class_masks, save_dir+"/")
        print("Analyze events done")
    if visualize:
        visualize_events(event_masks, inference, save_dir+"/")
        print("Visualize events done")

    return

## Pre-trained model which uses TMQ, V850, U850, and PSL

### CESM Historical output, 2000-2005
2/23/23\
Memory use peaking at ~38GB during inference on each year\
Each year takes ~5 min to run

In [9]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/'

for year in range(2000, 2005):
    inference_path = inference_dir+str(year)
    save_dir = inference_path+'/masks'
    cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:06<00:00,  1.67s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2000/masks


100%|██████████| 183/183 [05:02<00:00,  1.65s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2001/masks


100%|██████████| 183/183 [05:10<00:00,  1.70s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2002/masks


100%|██████████| 183/183 [05:18<00:00,  1.74s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2003/masks


100%|██████████| 183/183 [05:13<00:00,  1.72s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2004/masks
CPU times: user 12min 13s, sys: 4min 32s, total: 16min 45s
Wall time: 27min 11s


In [13]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/'

# range() excludes its stop value, so the loop above ended at 2004; run 2005 separately
year = 2005
inference_path = inference_dir+str(year)
save_dir = inference_path+'/masks'
cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:10<00:00,  1.70s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/2005/masks
CPU times: user 2min 26s, sys: 52.3 s, total: 3min 19s
Wall time: 5min 23s
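
Python's `range(start, stop)` stops before `stop`, which is why 2005 had to be run separately. A small sketch of an inclusive helper for year loops like the one above; `years_inclusive` is a hypothetical name, not part of ClimateNet:

```python
def years_inclusive(first_year, last_year):
    """Return a range covering first_year through last_year, both ends included."""
    return range(first_year, last_year + 1)


inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/B20TRC5CN/'

# Build the same per-year paths as the cells above, now including 2005
for year in years_inclusive(2000, 2005):
    inference_path = inference_dir + str(year)
    save_dir = inference_path + '/masks'
    print(year, save_dir)
```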


### CESM RCP2.6 output, 2006-2015
2/27/23: Each year takes ~9 min to run

In [7]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/amp/jet/ClimateNet/data_processing/extend/2006-2015-RCP26-3hr/'
save_top_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/'

for year in range(2006, 2016):
    inference_path = inference_dir+str(year)
    save_dir = save_top_dir+str(year)+'/masks'
    cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [08:02<00:00,  2.64s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2006/masks


100%|██████████| 183/183 [08:56<00:00,  2.93s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2007/masks


100%|██████████| 183/183 [09:00<00:00,  2.95s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2008/masks


100%|██████████| 183/183 [08:57<00:00,  2.94s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2009/masks


100%|██████████| 183/183 [09:05<00:00,  2.98s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2010/masks


100%|██████████| 183/183 [09:12<00:00,  3.02s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2011/masks


100%|██████████| 183/183 [09:00<00:00,  2.95s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2012/masks


100%|██████████| 183/183 [09:06<00:00,  2.98s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2013/masks


100%|██████████| 183/183 [08:55<00:00,  2.93s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2014/masks


100%|██████████| 183/183 [09:00<00:00,  2.95s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP26C5CN/2015/masks
CPU times: user 26min 11s, sys: 10min 8s, total: 36min 19s
Wall time: 1h 31min 29s


### CESM RCP8.5 output, 2086-2100
2/27/23 new GPU job: Each year takes ~5 min to run

In [4]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/'

for year in range(2086, 2101):
    inference_path = inference_dir+str(year)
    save_dir = inference_path+'/masks'
    cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:12<00:00,  1.71s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2086/masks


100%|██████████| 183/183 [05:09<00:00,  1.69s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2087/masks


100%|██████████| 183/183 [05:20<00:00,  1.75s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2088/masks


100%|██████████| 183/183 [05:29<00:00,  1.80s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2089/masks





KeyError: 'U850'

The 2090 input files are missing U850/V850, so the loop stopped there; come back to that year once the files are fixed.
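
One way to keep a long loop going when a single year has bad input is to catch the error, record the year, and move on. This is a sketch under the assumption that a missing variable surfaces as a `KeyError`, as in the traceback above. `run_years` and `fake_run` are hypothetical stand-ins; in practice `run_one` would wrap `cgnet_load_create_masks`:

```python
def run_years(years, run_one):
    """Call run_one(year) for each year; return a dict of years that raised KeyError."""
    failed = {}
    for year in years:
        try:
            run_one(year)
        except KeyError as err:
            # record what was missing and continue with the next year
            failed[year] = str(err)
            print("Skipping {}: missing {}".format(year, err))
    return failed


def fake_run(year):
    """Stand-in for the real inference call; pretends 2090 lacks U850."""
    if year == 2090:
        raise KeyError('U850')


failed = run_years(range(2086, 2101), fake_run)
print(sorted(failed))
```

Only `KeyError` is caught on purpose, so unrelated failures such as out-of-memory errors still stop the run.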

In [5]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/'

for year in range(2091, 2101):
    inference_path = inference_dir+str(year)
    save_dir = inference_path+'/masks'
    cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:15<00:00,  1.72s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2091/masks


100%|██████████| 183/183 [05:16<00:00,  1.73s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2092/masks


100%|██████████| 183/183 [05:09<00:00,  1.69s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2093/masks


100%|██████████| 183/183 [05:13<00:00,  1.71s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2094/masks


100%|██████████| 183/183 [05:07<00:00,  1.68s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2095/masks


100%|██████████| 183/183 [05:09<00:00,  1.69s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2096/masks


100%|██████████| 183/183 [05:21<00:00,  1.75s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2097/masks


100%|██████████| 183/183 [05:32<00:00,  1.82s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2098/masks


100%|██████████| 183/183 [05:26<00:00,  1.78s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2099/masks


100%|██████████| 183/183 [05:26<00:00,  1.78s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2100/masks
CPU times: user 24min 34s, sys: 9min 25s, total: 33min 59s
Wall time: 55min 4s


3/6/23: Rerun of 2090 with fixed input files

In [4]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/'

year = 2090
inference_path = inference_dir+str(year)
save_dir = inference_path+'/masks'
cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:22<00:00,  1.76s/it]


Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2090/masks
CPU times: user 2min 34s, sys: 1min, total: 3min 34s
Wall time: 6min 47s


### CESM RCP8.5 output, 2086-2100, with modified config file
The mean/std values in this config were computed from the RCP8.5 data\
3/17/23: Each year takes ~5 min to run\
The job timed out, but it completed 2086-2089 with the new config file

In [None]:
%%time
model_path = '/glade/work/kdagon/ML-extremes/trained_models/trained_cgnet.021523_rcp85'
inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/'

for year in range(2086, 2101):
    inference_path = inference_dir+str(year)
    save_dir = inference_path+'/masks_rcp85config'
    cgnet_load_create_masks(model_path, inference_path, save_dir, analyze=False, visualize=False)

100%|██████████| 183/183 [05:14<00:00,  1.72s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2086/masks_rcp85config


100%|██████████| 183/183 [05:05<00:00,  1.67s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2087/masks_rcp85config


100%|██████████| 183/183 [05:07<00:00,  1.68s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2088/masks_rcp85config


100%|██████████| 183/183 [05:04<00:00,  1.66s/it]
  0%|          | 0/183 [00:00<?, ?it/s]

Saved class masks to /glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/2089/masks_rcp85config


 58%|█████▊    | 107/183 [03:03<02:22,  1.87s/it]
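
Since the job timed out partway through, a rerun can skip years whose mask file already exists. A minimal sketch, assuming a finished year can be identified by the presence of `class_masks.nc` in its save directory; this is only a heuristic, since it does not verify that the file is complete. `years_to_run` is a hypothetical helper:

```python
import os


def years_to_run(years, top_dir, subdir='masks_rcp85config', filename='class_masks.nc'):
    """Return the years whose output file is not yet present under top_dir."""
    pending = []
    for year in years:
        out_file = os.path.join(top_dir, str(year), subdir, filename)
        if not os.path.exists(out_file):
            pending.append(year)
    return pending


inference_dir = '/glade/campaign/cgd/ccr/kdagon/cgnet/BRCP85C5CN/'

# On the system used in this notebook, this would list the years still to be processed
print(years_to_run(range(2086, 2101), inference_dir))
```

The returned list can then drive the same loop as above in place of `range(2086, 2101)`.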