# Reprocess Eddy & Ring Data

This notebook reprocesses the Northwest Atlantic eddy tracks derived from the Chelton tracks.

The tracks are stored as an xarray Dataset, which is a collection of DataArrays.

This notebook makes it easy to reprocess the DataFrames of eddy tracks, ring tracks, and ring counts used for data analysis in rings_working_notebook. The data processing functions are saved in utils.eddy_data_utils, but they must be run step by step and in order, because the eddy tracks depend on the Chelton tracks, the ring tracks depend on the eddy tracks, and the ring counts depend on the ring tracks. When the criteria for a Northwest Atlantic eddy change (for example, if the region bounds change), or the criteria for a ring change, re-run this notebook to reprocess all the DataFrames and save them to the data folder.

## Import Functions:

In [1]:
%%time
# adds the upper-level directory to the module search path; this is where the utils folder is saved
import sys
sys.path.append("..")

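# import the util functions
from utils.eddy_plot_utils import *
from utils.eddy_data_utils import *

# explicit imports for names used directly in this notebook
# (otherwise they are only available through the star imports above)
import pandas as pd
from scipy.io import loadmat
from datetime import date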

CPU times: user 1.49 s, sys: 232 ms, total: 1.72 s
Wall time: 2.39 s


## Purpose:
* These cells reprocess all the DataFrames for eddies, warm core rings (WCRs), and cold core rings (CCRs).
* Useful if the criteria for what constitutes an eddy, WCR, or CCR change and you need to reprocess the DataFrames based on the new criteria.

In [2]:
%%time
# OPEN nwa_eddies so we can process & filter out eddies
nwa_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/nwa_eddies.pkl')

gs = loadmat('/Users/elenaperez/Desktop/chatts/data/gs/GS_daily_CMEMS_047_50cm_contours_1993_to_nrt.mat')
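# convert Gulf Stream times from days since 1950-01-01 to date ordinals
# (loop over every time step; range(len(...)-1) would leave the last one unconverted)
for d in range(len(gs['time'][0])):
    gs['time'][0][d] = gs['time'][0][d]+date.toordinal(date(1950,1,1))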

CPU times: user 282 ms, sys: 38.5 ms, total: 320 ms
Wall time: 329 ms
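
The time conversion in the cell above shifts values from days since 1950-01-01 (the offset used in the loop) to Python date ordinals. A minimal vectorized sketch of the same idea follows. The array `days_since_1950` is a made-up stand-in for `gs['time'][0]`, not data from the .mat file.

```python
from datetime import date

import numpy as np

# Hypothetical stand-in for gs['time'][0]: days elapsed since 1950-01-01
days_since_1950 = np.array([15706.0, 15707.0, 15708.0])

# Ordinal of the reference epoch (date ordinals count days from 0001-01-01 = 1)
epoch_ordinal = date(1950, 1, 1).toordinal()

# Add the offset to every element at once, including the last one
ordinals = days_since_1950 + epoch_ordinal

# Convert back to calendar dates to sanity-check the result
dates = [date.fromordinal(int(o)) for o in ordinals]
print(dates[0], dates[-1])
```

Converting a few values back with `date.fromordinal` is a quick way to confirm that the whole time axis was shifted.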


In [27]:
# %%time
# # first, convert nwa_eddies into nwa_wcr_df and nwa_ccr_df which is all of the wcrs/ccrs in nwa 
# eddy_df_to_ring_df(nwa_eddies)

# # When called, eddy_df_to_ring_df takes the DataFrame of all eddies in the Northwest Atlantic, determines
# # which eddies are WCRs and CCRs, and creates a DataFrame for each (nwa_wcr_df and nwa_ccr_df, respectively).
# # The new DataFrames are saved as pickles in the pd_dataframes folder:
# # path_to_data_folder = '/Users/elenaperez/Desktop/chatts/data/pd_dataframes/'

# # Also, reprocess the Cape Hatteras DataFrame
# make_eddy_ch_df(nwa_eddies, False, 'ch_eddies')

In [4]:
%%time
# OPEN nwa_ccr_df and nwa_wcr_df so they can be processed
# use nwa_wcr_df, nwa_ccr_df for all west/east features, nwa_wcr_day_df, nwa_ccr_day_df for westward features,
# and nwa_east_wcr_day_df, nwa_east_ccr_day_df for eastward features
nwa_wcr_day_df = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/nwa_wcr_day_df.pkl')
nwa_ccr_day_df = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/nwa_ccr_day_df.pkl')

# Next, we need to open the WCR and CCR DataFrames in order to call them in the next cell.

CPU times: user 2.91 ms, sys: 6.75 ms, total: 9.66 ms
Wall time: 11.2 ms


In [5]:
%%time

# second, split eddies, wcrs, and ccrs into zone-specific dataframes 
zone_df_names = {'zone_eddies':0, 'zone1_eddies':1, 'zone2_eddies':2, 'zone3_eddies':3, 'zone4_eddies':4,
                'zone_wcrs':0, 'zone1_wcrs':1, 'zone2_wcrs':2, 'zone3_wcrs':3, 'zone4_wcrs':4,
                'zone_ccrs':0, 'zone1_ccrs':1, 'zone2_ccrs':2, 'zone3_ccrs':3, 'zone4_ccrs':4}

for name in zone_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        make_eddy_zone_df(nwa_ccr_day_df, False, zone_df_names[name], name)
    # WCR
    elif name is not None and 'wcr' in name:
        make_eddy_zone_df(nwa_wcr_day_df, False, zone_df_names[name], name)
    # eddies
    elif name is not None and 'eddies' in name:
        make_eddy_zone_df(nwa_eddies, False, zone_df_names[name], name)
        
# When called, make_eddy_zone_df takes a ring DataFrame (e.g. nwa_wcr_df or nwa_ccr_df) and splits the rings
# into the respective zones defined in Gangopadhyay et al., 2019. The latitude of all zones spans 30N to 45N.
# The zones are defined longitudinally with Zone 1 bounded from 75W to 70W, Zone 2 is 70W to 65W, 
# Zone 3 is 65W to 60W, and Zone 4 is 60W to 55W. Here I've added Zone "0" to span 75W to 55W. This is important
# because not all rings in the Northwest Atlantic are contained in Zones 1-4, but we only care about Zones 1-4
# since we are validating the ring census work of Gangopadhyay et al., 2019 and Silver et al., 2021.
# The zone-specific DataFrames are saved in the pd_dataframes folder:
# path_to_data_folder = '/Users/elenaperez/Desktop/chatts/data/pd_dataframes/'

CPU times: user 5min 2s, sys: 23.7 s, total: 5min 26s
Wall time: 2min 5s
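
The zone bounds described in the comments above can be sketched as a longitude binning step. This is not the code of make_eddy_zone_df, whose implementation is not shown in this notebook. The helper `assign_zone`, the column names `lon` and `lat`, the use of longitudes in degrees east on a -180 to 180 scale, and the choice of which bin edge is inclusive are all assumptions for illustration.

```python
import pandas as pd

# Zone edges in degrees east (negative = west), from the description above:
# Zone 1: 75W-70W, Zone 2: 70W-65W, Zone 3: 65W-60W, Zone 4: 60W-55W
ZONE_LON_EDGES = [-75, -70, -65, -60, -55]
LAT_MIN, LAT_MAX = 30, 45  # all zones span 30N to 45N


def assign_zone(df, lon_col="lon", lat_col="lat"):
    """Return a copy of df with a 'zone' column (1-4, NaN outside all zones)."""
    out = df.copy()
    # labels=False yields bin codes 0-3 (NaN outside the edges); +1 maps to zones 1-4
    zone = pd.cut(out[lon_col], bins=ZONE_LON_EDGES, labels=False, right=False) + 1
    # Blank out rows that fall outside the shared latitude band
    in_lat = out[lat_col].between(LAT_MIN, LAT_MAX)
    out["zone"] = zone.where(in_lat)
    return out


# Made-up positions: four inside Zones 1-4, one too far east, one too far north
example = pd.DataFrame({
    "lon": [-72.5, -67.0, -61.0, -56.0, -50.0, -72.0],
    "lat": [38.0, 40.0, 35.0, 42.0, 40.0, 50.0],
})
zoned = assign_zone(example)

# "Zone 0" (75W to 55W) is then every row that landed in any of Zones 1-4
zone0 = zoned[zoned["zone"].notna()]
print(zoned)
```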


In [28]:
%%time
## EASTWARD features ###
nwa_east_wcr_day_df = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/nwa_east_wcr_day_df.pkl')
nwa_east_ccr_day_df = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/nwa_east_ccr_day_df.pkl')

# second, split eddies, wcrs, and ccrs into zone-specific dataframes 
zone_df_names = {'zone_east_eddies':0, 'zone1_east_eddies':1, 'zone2_east_eddies':2, 'zone3_east_eddies':3, 'zone4_east_eddies':4,
                'zone_east_wcrs':0, 'zone1_east_wcrs':1, 'zone2_east_wcrs':2, 'zone3_east_wcrs':3, 'zone4_east_wcrs':4,
                'zone_east_ccrs':0, 'zone1_east_ccrs':1, 'zone2_east_ccrs':2, 'zone3_east_ccrs':3, 'zone4_east_ccrs':4}

for name in zone_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        make_eddy_zone_df(nwa_east_ccr_day_df, False, zone_df_names[name], name)
    # WCR
    elif name is not None and 'wcr' in name:
        make_eddy_zone_df(nwa_east_wcr_day_df, False, zone_df_names[name], name)
    # eddies
    elif name is not None and 'eddies' in name:
        make_eddy_zone_df(nwa_eddies, False, zone_df_names[name], name)
        
# When called, make_eddy_zone_df takes a ring DataFrame (e.g. nwa_wcr_df or nwa_ccr_df) and splits the rings
# into the respective zones defined in Gangopadhyay et al., 2019. The latitude of all zones spans 30N to 45N.
# The zones are defined longitudinally with Zone 1 bounded from 75W to 70W, Zone 2 is 70W to 65W, 
# Zone 3 is 65W to 60W, and Zone 4 is 60W to 55W. Here I've added Zone "0" to span 75W to 55W. This is important
# because not all rings in the Northwest Atlantic are contained in Zones 1-4, but we only care about Zones 1-4
# since we are validating the ring census work of Gangopadhyay et al., 2019 and Silver et al., 2021.
# The zone-specific DataFrames are saved in the pd_dataframes folder:
# path_to_data_folder = '/Users/elenaperez/Desktop/chatts/data/pd_dataframes/'

CPU times: user 4min 57s, sys: 24.1 s, total: 5min 21s
Wall time: 2min 2s


In [18]:
%%time

# OPEN zone-specific dataframes so they can be counted up and made into a formations count dataframe
# open *eddy* dfs for Gangopadhyay et al., 2019 zones
zone_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone_eddies.pkl') # all zones
zone1_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone1_eddies.pkl') 
zone2_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone2_eddies.pkl') 
zone3_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone3_eddies.pkl')
zone4_eddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone4_eddies.pkl') 

# open *CYCLONIC* eddy df for Gangopadhyay et al., 2019 zones
zone_ceddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone_ceddies.pkl') # all zones
zone1_ceddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone1_ceddies.pkl') 
zone2_ceddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone2_ceddies.pkl') 
zone3_ceddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone3_ceddies.pkl')
zone4_ceddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone4_ceddies.pkl') 

# open *ANTI-CYCLONIC* eddy df for Gangopadhyay et al., 2019 zones
zone_aeddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone_aeddies.pkl') # all zones
zone1_aeddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone1_aeddies.pkl') 
zone2_aeddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone2_aeddies.pkl') 
zone3_aeddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone3_aeddies.pkl')
zone4_aeddies = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone4_aeddies.pkl') 

# open *CCR* dfs for Gangopadhyay et al., 2019 zones
zone_ccrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone_ccrs.pkl') # all zones
zone1_ccrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone1_ccrs.pkl') 
zone2_ccrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone2_ccrs.pkl') 
zone3_ccrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone3_ccrs.pkl') 
zone4_ccrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone4_ccrs.pkl') 

# open *WCR* dfs for Gangopadhyay et al., 2019 zones
zone_wcrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone_wcrs.pkl') # all zones
zone1_wcrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone1_wcrs.pkl') 
zone2_wcrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone2_wcrs.pkl') 
zone3_wcrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone3_wcrs.pkl') 
zone4_wcrs = pd.read_pickle('/Users/elenaperez/Desktop/chatts/data/pd_dataframes/zone4_wcrs.pkl') 

# Now, we must open the new zone-specific eddy and ring DataFrames in order to create the new 
# eddy and ring count DataFrames in the next cell.

CPU times: user 10.8 ms, sys: 43.9 ms, total: 54.7 ms
Wall time: 92.4 ms
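
The 25 `pd.read_pickle` calls above follow one naming pattern, so they could also be loaded in a loop. The sketch below assumes a hypothetical helper `load_zone_frames` that returns a dict instead of separate variables. The demo writes tiny placeholder pickles to a temporary folder so it runs without the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd

ZONE_PREFIXES = ("zone", "zone1", "zone2", "zone3", "zone4")
KINDS = ("eddies", "ceddies", "aeddies", "ccrs", "wcrs")


def load_zone_frames(data_dir):
    """Load every <prefix>_<kind>.pkl into a dict keyed by name."""
    data_dir = Path(data_dir)
    frames = {}
    for kind in KINDS:
        for prefix in ZONE_PREFIXES:
            name = f"{prefix}_{kind}"
            frames[name] = pd.read_pickle(data_dir / f"{name}.pkl")
    return frames


# Demo with placeholder files; point load_zone_frames at pd_dataframes in practice
with tempfile.TemporaryDirectory() as tmp:
    for kind in KINDS:
        for prefix in ZONE_PREFIXES:
            placeholder = pd.DataFrame({"track": [1]})
            placeholder.to_pickle(Path(tmp) / f"{prefix}_{kind}.pkl")
    frames = load_zone_frames(tmp)

print(sorted(frames)[:3])
```

With this layout, `frames['zone1_wcrs']` plays the role of the `zone1_wcrs` variable used in the following cells.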


In [19]:
%%time

# third, recreate dataframes of the annual formation counts for zone-specific wcrs/ccrs
counts_df_names = {'zone_wcr_annual_formations':zone_wcrs, 'zone1_wcr_annual_formations':zone1_wcrs, 'zone2_wcr_annual_formations':zone2_wcrs, 'zone3_wcr_annual_formations':zone3_wcrs, 'zone4_wcr_annual_formations':zone4_wcrs,
                    'zone_ccr_annual_formations':zone_ccrs, 'zone1_ccr_annual_formations':zone1_ccrs, 'zone2_ccr_annual_formations':zone2_ccrs, 'zone3_ccr_annual_formations':zone3_ccrs, 'zone4_ccr_annual_formations':zone4_ccrs}

for name in counts_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        count_annual_ring_formations(counts_df_names[name],'ccr',name)
    # WCR
    elif name is not None and 'wcr' in name:
        count_annual_ring_formations(counts_df_names[name],'wcr',name)
        
# When called, count_annual_ring_formations takes the zone-specific ring DataFrame, tallies the number of
# formations by year, and saves the annual formation DataFrame to the pd_dataframes folder:
# path_to_data_folder = '/Users/elenaperez/Desktop/chatts/data/pd_dataframes/'

CPU times: user 185 ms, sys: 17.3 ms, total: 202 ms
Wall time: 189 ms
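
The tally by year described above can be illustrated with a small pandas sketch. It is not the implementation of count_annual_ring_formations. It assumes one row per ring per day, hypothetical column names `track` (ring id) and `time` (timestamp), and that a ring's first observation counts as its formation.

```python
import pandas as pd


def annual_formations_sketch(ring_df, id_col="track", time_col="time"):
    """Count ring formations per year; years with no formations are omitted."""
    # Treat the earliest observation of each ring as its formation date
    formation_dates = ring_df.groupby(id_col)[time_col].min()
    # Tally formation dates by calendar year, in chronological order
    counts = formation_dates.dt.year.value_counts().sort_index()
    return counts.rename_axis("year").reset_index(name="formations")


# Made-up daily observations for three rings
rings = pd.DataFrame({
    "track": [1, 1, 2, 2, 3],
    "time": pd.to_datetime(
        ["2000-03-01", "2000-03-02", "2000-12-30", "2001-01-02", "2001-06-01"]
    ),
})
annual = annual_formations_sketch(rings)
print(annual)
```

Ring 2 is observed in both 2000 and 2001 but is counted once, in the year of its first observation.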


In [20]:
%%time

# fourth, create monthly formations and average monthly formation counts dataframes for zone rings
counts_df_names = {'zone_wcr_monthly_formations':zone_wcrs, 'zone1_wcr_monthly_formations':zone1_wcrs, 'zone2_wcr_monthly_formations':zone2_wcrs, 'zone3_wcr_monthly_formations':zone3_wcrs, 'zone4_wcr_monthly_formations':zone4_wcrs,
                    'zone_ccr_monthly_formations':zone_ccrs, 'zone1_ccr_monthly_formations':zone1_ccrs, 'zone2_ccr_monthly_formations':zone2_ccrs, 'zone3_ccr_monthly_formations':zone3_ccrs, 'zone4_ccr_monthly_formations':zone4_ccrs}

for name in counts_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        count_monthly_ring_formations(counts_df_names[name],'ccr',name)
    # WCR
    elif name is not None and 'wcr' in name:
        count_monthly_ring_formations(counts_df_names[name],'wcr',name)


CPU times: user 174 ms, sys: 13.2 ms, total: 187 ms
Wall time: 176 ms
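
Monthly formation counts and their per-year average can be sketched in the same spirit. This is an illustration only, not the code of count_monthly_ring_formations. The column names `track` and `time`, the first-observation definition of formation, and averaging over the span of years between the first and last formation are assumptions.

```python
import pandas as pd


def monthly_formations_sketch(ring_df, id_col="track", time_col="time"):
    """Total and mean-per-year formations for each calendar month (1-12)."""
    # Earliest observation of each ring is taken as its formation date
    formation_dates = ring_df.groupby(id_col)[time_col].min()
    years = formation_dates.dt.year
    n_years = int(years.max() - years.min() + 1)
    # Count formations per calendar month, keeping months with zero formations
    totals = formation_dates.dt.month.value_counts().reindex(range(1, 13), fill_value=0)
    out = totals.rename_axis("month").reset_index(name="formations")
    out["mean_per_year"] = out["formations"] / n_years
    return out


# Made-up daily observations for three rings spanning 2000-2001
rings_monthly = pd.DataFrame({
    "track": [1, 1, 2, 2, 3],
    "time": pd.to_datetime(
        ["2000-03-01", "2000-03-02", "2000-12-30", "2001-01-02", "2001-06-01"]
    ),
})
monthly = monthly_formations_sketch(rings_monthly)
print(monthly)
```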


In [21]:
%%time

# repeat, but for monthly EDDY formations
counts_df_names = {'zone_aeddy_monthly_formations':zone_aeddies, 'zone1_aeddy_monthly_formations':zone1_aeddies, 'zone2_aeddy_monthly_formations':zone2_aeddies, 'zone3_aeddy_monthly_formations':zone3_aeddies, 'zone4_aeddy_monthly_formations':zone4_aeddies,
                    'zone_ceddy_monthly_formations':zone_ceddies, 'zone1_ceddy_monthly_formations':zone1_ceddies, 'zone2_ceddy_monthly_formations':zone2_ceddies, 'zone3_ceddy_monthly_formations':zone3_ceddies, 'zone4_ceddy_monthly_formations':zone4_ceddies}

for name in counts_df_names:
    # CYCLONIC
    if name is not None and 'ceddy' in name:
        count_monthly_eddy_formations(counts_df_names[name],'cyclonic',name)
    # ANTICYCLONIC
    elif name is not None and 'aeddy' in name:
        count_monthly_eddy_formations(counts_df_names[name],'anticyclonic',name)


CPU times: user 518 ms, sys: 71.8 ms, total: 590 ms
Wall time: 508 ms


In [22]:
%%time

# fifth, create annual formation count dataframes for zone rings (same call as the third step)
counts_df_names = {'zone_wcr_annual_formations':zone_wcrs, 'zone1_wcr_annual_formations':zone1_wcrs, 'zone2_wcr_annual_formations':zone2_wcrs, 'zone3_wcr_annual_formations':zone3_wcrs, 'zone4_wcr_annual_formations':zone4_wcrs,
                    'zone_ccr_annual_formations':zone_ccrs, 'zone1_ccr_annual_formations':zone1_ccrs, 'zone2_ccr_annual_formations':zone2_ccrs, 'zone3_ccr_annual_formations':zone3_ccrs, 'zone4_ccr_annual_formations':zone4_ccrs}

for name in counts_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        count_annual_ring_formations(counts_df_names[name],'ccr',name)
    # WCR
    elif name is not None and 'wcr' in name:
        count_annual_ring_formations(counts_df_names[name],'wcr',name)


CPU times: user 181 ms, sys: 14.7 ms, total: 196 ms
Wall time: 183 ms


In [23]:
%%time

# repeat, but for EDDY formations
counts_df_names = {'zone_aeddy_annual_formations':zone_aeddies, 'zone1_aeddy_annual_formations':zone1_aeddies, 'zone2_aeddy_annual_formations':zone2_aeddies, 'zone3_aeddy_annual_formations':zone3_aeddies, 'zone4_aeddy_annual_formations':zone4_aeddies,
                    'zone_ceddy_annual_formations':zone_ceddies, 'zone1_ceddy_annual_formations':zone1_ceddies, 'zone2_ceddy_annual_formations':zone2_ceddies, 'zone3_ceddy_annual_formations':zone3_ceddies, 'zone4_ceddy_annual_formations':zone4_ceddies}

for name in counts_df_names:
    # CYCLONIC
    if name is not None and 'ceddy' in name:
        count_annual_eddy_formations(counts_df_names[name],'cyclonic',name)
    # ANTICYCLONIC
    elif name is not None and 'aeddy' in name:
        count_annual_eddy_formations(counts_df_names[name],'anticyclonic',name)

CPU times: user 591 ms, sys: 83.4 ms, total: 675 ms
Wall time: 579 ms


In [24]:
%%time

# sixth, recreate dataframes of formation counts across all months and years for zone-specific wcrs/ccrs
counts_df_names = {'zone_wcr_all_formations':zone_wcrs, 'zone1_wcr_all_formations':zone1_wcrs, 'zone2_wcr_all_formations':zone2_wcrs, 'zone3_wcr_all_formations':zone3_wcrs, 'zone4_wcr_all_formations':zone4_wcrs,
                    'zone_ccr_all_formations':zone_ccrs, 'zone1_ccr_all_formations':zone1_ccrs, 'zone2_ccr_all_formations':zone2_ccrs, 'zone3_ccr_all_formations':zone3_ccrs, 'zone4_ccr_all_formations':zone4_ccrs}

for name in counts_df_names:
    # CCR
    if name is not None and 'ccr' in name:
        count_all_ring_formations(counts_df_names[name],'ccr',name)
    # WCR
    elif name is not None and 'wcr' in name:
        count_all_ring_formations(counts_df_names[name],'wcr',name)

CPU times: user 3.02 s, sys: 183 ms, total: 3.2 s
Wall time: 3.06 s


In [25]:
%%time

# repeat, but for EDDY counts
counts_df_names = {'zone_aeddy_all_formations':zone_aeddies, 'zone1_aeddy_all_formations':zone1_aeddies, 'zone2_aeddy_all_formations':zone2_aeddies, 'zone3_aeddy_all_formations':zone3_aeddies, 'zone4_aeddy_all_formations':zone4_aeddies,
                    'zone_ceddy_all_formations':zone_ceddies, 'zone1_ceddy_all_formations':zone1_ceddies, 'zone2_ceddy_all_formations':zone2_ceddies, 'zone3_ceddy_all_formations':zone3_ceddies, 'zone4_ceddy_all_formations':zone4_ceddies}

for name in counts_df_names:
    # CYCLONIC
    if name is not None and 'ceddy' in name:
        count_all_eddy_formations(counts_df_names[name],'cyclonic',name)
    # ANTICYCLONIC
    elif name is not None and 'aeddy' in name:
        count_all_eddy_formations(counts_df_names[name],'anticyclonic',name)

CPU times: user 10.1 s, sys: 1.3 s, total: 11.4 s
Wall time: 9.96 s


In [26]:
%%time

# finally, merge the zone-specific counts into one DataFrame for easy plotting
merge_monthly_ring_counts()
merge_annual_ring_counts()
merge_all_ring_counts()

# repeat, but for EDDY counts
merge_monthly_eddy_counts()
merge_annual_eddy_counts()
merge_all_eddy_counts()

# When called, these functions take the zone-specific formation counts and merge them into one DataFrame.
# The new formation count DataFrames are saved as zone_wcr_formations, zone_ccr_formations, and zone_eddy_formations

CPU times: user 27 ms, sys: 3.19 ms, total: 30.2 ms
Wall time: 32.8 ms
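
Merging zone-specific counts into one table for plotting can be sketched with `pd.concat`. The merge_* functions above live in utils.eddy_data_utils and are not shown here, so this is only an assumed shape for the result: one row per year and one column per zone. The helper name, the `year` and `formations` column names, and the choice to fill missing years with zero are all hypothetical.

```python
import pandas as pd


def merge_zone_counts_sketch(counts_by_zone, on="year", value_col="formations"):
    """Combine per-zone count tables into one wide table, one column per zone."""
    columns = [
        df.set_index(on)[value_col].rename(zone_name)
        for zone_name, df in counts_by_zone.items()
    ]
    # Align on year; a year missing from a zone is treated as zero formations
    merged = pd.concat(columns, axis=1).sort_index().fillna(0).astype(int)
    return merged.rename_axis(on).reset_index()


# Made-up annual counts for two zones with partly different years
counts_by_zone = {
    "zone1": pd.DataFrame({"year": [2000, 2001], "formations": [2, 1]}),
    "zone2": pd.DataFrame({"year": [2001, 2002], "formations": [3, 4]}),
}
merged_counts = merge_zone_counts_sketch(counts_by_zone)
print(merged_counts)
```

A wide table like this can be passed straight to `DataFrame.plot(x="year")` to draw one line per zone.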
