This notebook imports and reformats environmental data for time-series statistical analysis.
Meredith L. McPherson, University of California - Santa Cruz
Updated 1/25/2021
***********************************************************************************************************************************

PDO INDEX

If the columns of the table appear without formatting on your browser, use https://oceanview.pfeg.noaa.gov/erddap/tabledap/cciea_OC_PDO.htmlTable?time,PDO

Updated standardized values for the PDO index, derived as the 
leading PC of monthly SST anomalies in the North Pacific Ocean, 
poleward of 20N. The monthly mean global average SST anomalies
are removed to separate this pattern of variability from any 
"global warming" signal that may be present in the data. 


For more details, see:

 Zhang, Y., J.M. Wallace, D.S. Battisti, 1997: 
     ENSO-like interdecadal variability: 1900-93. J. Climate, 10, 1004-1020. 

 Mantua, N.J., S.R. Hare, Y. Zhang, J.M. Wallace, and R.C. Francis, 1997: 
     A Pacific interdecadal climate oscillation with impacts on salmon 
     production. Bulletin of the American Meteorological Society, 78, 
     pp. 1069-1079.


Data sources for this index are: 
 UKMO Historical SST data set for 1900-81; 
 Reynold's Optimally Interpolated SST (V1) for January 1982-Dec 2001;
*** OI SST Version 2 (V2) beginning January 2002 -  

** 2002-2018 Derived from OI.v2 SST fields
A graphic comparing monthly PDO values for 1982-2002 derived from the v1 and v2 
sst products is available at 
http://jisao.washington.edu/pdo/img/v1v2PDOComp.png

If you have any questions about this time series, contact
Nathan Mantua at: nate.mantua@noaa.gov

This file is /home/disk/margaret/jisao/pdo/PDO.latest.txt 

***********************************************************************************************************************************

NPGO index

WARNING: Values after Dec-2004 are updated  
using Satellite SSHa from AVISO Delayed Time product.   
http://www.o3d.org/npgo/npgo.php

The update is performed by taking the NPGO spatial pattern of Di Lorenzo et al. 2008 
computed over the period 1950-2004, and projecting the AVISO Satellite SSHa. 
During the pre-processing of the AVISO data, we remove the seasonal cycle based on 
the 1993-2004 seasonal means. 
 
AVISO PRODUCT UPDATE Summer 2014: AVISO has released a re-processed dataset for the sea level. 
Starting from November 2014, the NPGO index is computed with this updated dataset. NPGO 
values from 2004 onward have been recomputed with very minor differences from previous releases. 

Ref: 
Di Lorenzo et al., 2008: North Pacific Gyre Oscillation  
links ocean climate and ecosystem change, GRL. 

***********************************************************************************************************************************

MEI Index
https://psl.noaa.gov/enso/mei/data/meiv2.data

A new version of the MEI (MEI.v2) has been created that uses 5 variables (sea level pressure (SLP), sea surface temperature (SST), surface zonal winds (U), surface meridional winds (V), and Outgoing Longwave Radiation (OLR)) to produce a time series of ENSO conditions from 1979 to present. The MEI.v2 expands upon the original MEI developed by Wolter and Timlin (1993) which was calculated using 6 variables as proxies for ENSO relevant atmosphere and ocean conditions.
In MEI.v2, the fields of SST, SLP, and surface zonal and meridional winds are obtained from the high-quality JRA-55 global reanalysis (Kobayashi et al. 2015). In contrast, the original MEI (Wolter and Timlin, 1993) used marine ship observations based on the International Comprehensive Ocean Atmosphere Data Set (ICOADS) and used near-surface air temperature as well as SST. The MEI.v2 also uses observations of OLR from NOAA Climate Data Record (CDR) of Monthly Outgoing Longwave Radiation (OLR), Version 2.2-1 (available from NOAA National Centers for Environmental Information (NCEI)); whereas the original MEI used ICOADS cloud cover fraction data. To produce the MEI.v2, all variables are interpolated to a common 2.5° latitude-longitude grid and standardized anomalies are computed with respect to the reference period of 1980-2018. As with the original version of the MEI (Wolter and Timlin, 2011), the MEI.v2 is calculated as the leading principal component (PC) time series of the Empirical Orthogonal Function (EOF) of the standardized anomalies of the above 5 combined variables over the tropical Pacific during 1980-2018. The EOF analysis is based on the covariance matrix and the analysis domain is the same as for the original MEI (30°S-30°N and 100°E-70°W, excluding the Atlantic Ocean and the land regions). A latitudinal weighting prior to the EOF analysis is applied.
The EOF analysis for MEI.v2 is conducted for 12 partially overlapping 2-month "seasons" (e.g., Wolter and Timlin, 1993). To obtain MEI.v2 values before 1980 and after 2018, standardized anomaly maps relative to the 1980-2018 reference period are projected onto the leading EOF pattern. Because the OLR record starts in January 1979, the DJ 1979 MEI.v2 value is based on January 1979 OLR data only.
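
The description above (standardized anomalies, leading EOF, projection onto that EOF) can be sketched with a singular value decomposition. This is only an illustration on a synthetic field, not NOAA's processing code: the grid size, the spatial pattern, the noise level, and all variable names are assumptions made for the example, and no latitudinal weighting is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_space = 120, 6

# hypothetical spatial pattern and time signal used to build a synthetic field
pattern = np.array([1.0, 0.8, 0.5, -0.2, -0.6, -1.0])
signal = np.sin(np.linspace(0, 8 * np.pi, n_time))
field = np.outer(signal, pattern) + 0.1 * rng.standard_normal((n_time, n_space))

# standardized anomalies: remove the time mean and divide by the
# standard deviation at every grid point
anom = (field - field.mean(axis=0)) / field.std(axis=0)

# the EOFs are the right singular vectors of the (time x space) anomaly matrix
u, s, vt = np.linalg.svd(anom, full_matrices=False)
eof1 = vt[0]

# projecting the anomaly maps onto the leading EOF gives the leading PC
pc1 = anom @ eof1
pc1_std = (pc1 - pc1.mean()) / pc1.std()

# fraction of total variance carried by the leading mode
explained = s[0] ** 2 / np.sum(s ** 2)
```

The sign of an EOF is arbitrary, so `pc1` may come out inverted relative to `signal`; only the magnitude of their correlation is meaningful.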

************************************************************************************************************************************
AVHRR SST
https://www.ncei.noaa.gov/erddap/griddap/ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon.html
Dataset Title: OISST-V2-AVHRR Daily 1/4 degree By time, depth, latitude, longitude
Institution: NOAA/NCEI (Dataset ID: ncdc_oisst_v2_avhrr_by_time_zlev_lat_lon)

37.875,41.125
302,305.875

netcdf file format

In [1]:
# ----- import python modules
import numpy as np
import pandas as pd
import os
import glob
import matplotlib.pyplot as plt
import scipy as sp
import scipy.stats as stats
import scipy.io as sio
#from sklearn.linear_model import LinearRegression
from scipy.signal import find_peaks
import datetime
import xlrd
from netCDF4 import Dataset
#from mpl_toolkits.basemap import Basemap

# current date (YYYYMMDD), used to stamp the output file headers
now = datetime.datetime.now().strftime('%Y%m%d')


In [9]:
# ----- PDO 

# text file format columnwise: YEAR, JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC
var = 'PDO'
filename = 'PDO_INDEX.txt'
PDO = np.loadtxt(filename, skiprows = 41)
PDO_max = np.zeros((len(PDO),2))
PDO_clim = np.zeros((PDO.shape[1]-1,1))
months = np.arange(1,13,1).reshape(12,1)

# calculate climatology of PDO 

for w in range(PDO.shape[1]-1):
    PDO_clim[w,0] = np.nanmean(PDO[:,w+1])

# find each year's largest-magnitude anomaly (sign preserved)

for i,val in enumerate(PDO):
    PDO_max[i,0] = int(val[0])
    PDO_remclim = val[1:].reshape(12,1) - PDO_clim
    # avoid the names max/min, which would shadow the Python builtins
    anom_max = np.nanmax(PDO_remclim)
    anom_min = np.nanmin(PDO_remclim)
    if np.absolute(anom_min) > anom_max:    
        PDO_max[i,1] = anom_min
    else:
        PDO_max[i,1] = anom_max

# keep 1985 onward and standardize
start = np.where(PDO_max[:,0]==1985.)[0][0]
PDO_LS = PDO_max[start:,:]
PDO_stand = np.array((PDO_LS[:,0],sp.stats.zscore(PDO_LS[:,1]))).T

save_filename = ''
np.savetxt(save_filename,
          PDO_stand,fmt = '%i %0.5f',
           header = f'{var} - climatology removed and index standardized {now} \n Year; index value')
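
The cell above removes the monthly climatology, keeps each year's largest-magnitude anomaly with its sign, and standardizes the result. A minimal vectorized sketch of the same three steps follows. The function names and the synthetic table are hypothetical and are not part of the notebook; the table layout (year followed by 12 monthly values) is assumed to match the PDO and MEI text files.

```python
import numpy as np

def signed_annual_extreme(table):
    """table: rows of [year, 12 monthly values]; returns (years, extremes)."""
    years = table[:, 0].astype(int)
    monthly = table[:, 1:]
    # monthly climatology: mean of each calendar month across all years
    anomalies = monthly - np.nanmean(monthly, axis=0)
    high = np.nanmax(anomalies, axis=1)
    low = np.nanmin(anomalies, axis=1)
    # keep the minimum only when its magnitude exceeds the maximum
    return years, np.where(np.abs(low) > high, low, high)

def zscore(values):
    """Population z-score (ddof=0), matching the scipy.stats.zscore default."""
    return (values - np.mean(values)) / np.std(values)

# tiny synthetic table (hypothetical values, 3 years x 12 months)
demo = np.zeros((3, 13))
demo[:, 0] = [1985, 1986, 1987]
demo[0, 3] = 3.0    # warm March in 1985
demo[1, 7] = -6.0   # cold July in 1986

years, extremes = signed_annual_extreme(demo)
index = zscore(extremes)
```

Ties between the positive and negative extremes resolve to the positive value, as in the loop above.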


In [38]:
# ----- NPGO

# text file format columnwise: YEAR, MONTH, NPGO index 
var = 'NPGO'
filename = 'NPGO_index.txt'
NPGO = np.loadtxt(filename, skiprows = 26)
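# one row per calendar year in the file (avoids a trailing all-zero row when the record ends in December)
NPGO_max = np.zeros((int(np.max(NPGO[:,0]) - np.min(NPGO[:,0])) + 1,2))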
NPGO_clim = np.zeros((12,1))
NPGO_remclim = np.zeros((int(len(NPGO)),1))
months = np.arange(1,13,1).reshape(12,1)

# calculate climatology of NPGO and remove it from each monthly value

for m in range(12):
    mindex = NPGO[:,1] == m+1
    NPGO_clim[m,0] = np.nanmean(NPGO[mindex,2])
    NPGO_remclim[mindex,0] = NPGO[mindex,2] - NPGO_clim[m,0]

# find each year's largest-magnitude anomaly (sign preserved)

for k, year in enumerate(np.arange(np.min(NPGO[:,0]),np.max(NPGO[:,0])+1,1)):
    #print(year)
    NPGO_max[k,0] = year
    yindex = NPGO[:,0] == year
    # avoid the names max/min, which would shadow the Python builtins
    anom_max = np.max(NPGO_remclim[yindex,0])
    anom_min = np.min(NPGO_remclim[yindex,0])
    if np.absolute(anom_min) > anom_max:    
        NPGO_max[k,1] = anom_min
    else:
        NPGO_max[k,1] = anom_max

# keep 1985 onward and standardize
start = np.where(NPGO_max[:,0]==1985.)[0][0]
NPGO_LS = NPGO_max[start:,:]
NPGO_stand = np.array((NPGO_LS[:,0],sp.stats.zscore(NPGO_LS[:,1]))).T

save_filename = ''
np.savetxt(save_filename.format(var),
           NPGO_stand,fmt = '%i %0.5f',
           header = f'{var} - climatology removed and index standardized {now} \n Year; index value')
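
Because the NPGO file is in long format (year, month, value), the climatology removal and annual extreme can also be written with pandas group operations. The sketch below uses a hypothetical two-year table; the column names `year`, `month`, `value`, and `anom` and the helper `signed_extreme` are assumptions for illustration only.

```python
import pandas as pd

# hypothetical long-format table: two years of monthly values
long = pd.DataFrame({
    'year':  [2000] * 12 + [2001] * 12,
    'month': list(range(1, 13)) * 2,
    'value': [float(m) for m in range(1, 13)]
             + [float(m) + 2.0 for m in range(1, 13)],
})

# monthly climatology broadcast back onto every row, then removed
long['anom'] = long['value'] - long.groupby('month')['value'].transform('mean')

def signed_extreme(series):
    """Return the minimum if its magnitude exceeds the maximum, else the maximum."""
    low, high = series.min(), series.max()
    return low if abs(low) > high else high

# one signed extreme per year
annual = long.groupby('year')['anom'].agg(signed_extreme)
```

`transform('mean')` returns a result aligned with the original rows, so no explicit month loop or index bookkeeping is needed.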

In [11]:
# ----- MEI

# text file format columnwise: YEAR, DECJAN, JANFEB, FEBMAR, MARAPR, APRMAY, MAYJUN, JUNJUL, JULAUG, AUGSEP, SEPOCT, OCTNOV, NOVDEC
var = 'MEI'
filename = 'MEI_Index_v2'
MEI = np.loadtxt(filename, skiprows = 9)
MEI_max = np.zeros((len(MEI),2))
MEI_clim = np.zeros((MEI.shape[1]-1,1))
months = np.arange(1,13,1).reshape(12,1)

# calculate climatology of MEI

for w in range(MEI.shape[1]-1):
    MEI_clim[w,0] = np.nanmean(MEI[:,w+1])

# find each year's largest-magnitude anomaly (sign preserved)

for i,val in enumerate(MEI):
    MEI_max[i,0] = int(val[0])
    MEI_remclim = val[1:].reshape(12,1) - MEI_clim
    # avoid the names max/min, which would shadow the Python builtins
    anom_max = np.nanmax(MEI_remclim)
    anom_min = np.nanmin(MEI_remclim)
    if np.absolute(anom_min) > anom_max:    
        MEI_max[i,1] = anom_min
    else:
        MEI_max[i,1] = anom_max
    
# keep 1985 onward and standardize
start = np.where(MEI_max[:,0]==1985.)[0][0]
MEI_LS = MEI_max[start:,:]
MEI_stand = np.array((MEI_LS[:,0],sp.stats.zscore(MEI_LS[:,1]))).T

np.savetxt('/Volumes/Mere/Python_MESMA/Statistics/Environmental_data/Standardized_files/{0}_stand.txt'.
           format(var),
          MEI_stand,fmt = '%i %0.5f',
           header = '{0} - climatology removed and index standardized {1} \n Year; index value'.
           format(var,now))

In [None]:
# ----- Process AVHRR SST data: daily timeseries remove climatology
var = 'SST_AVHRR'

# ----- import each netcdf file
filelist = glob.glob('AVHRR_SST/*.nc')
filelist.sort()

# ----- create empty array to fill with mean daily sst from region

sst_dailymean = pd.DataFrame(columns = ['Timestamp',
                                        'Date',
                                        'Year',
                                        'Month',
                                        'Day',
                                        'DOY',
                                        'SST',
                                       'SST-clim',
                                        'SST-clim standardized',
                                        'NO3'])

rows = []
for file in filelist:
    #print(file)
    # open each file read-only; the context manager closes it once the arrays are loaded
    with Dataset(file, 'r') as nc_fid:
        lons = nc_fid.variables['longitude'][:]
        lats = nc_fid.variables['latitude'][:]
        time = nc_fid.variables['time'][:]
        sst = nc_fid.variables['sst'][:]
    
    #print('lon range:', np.min(lons),'-',np.max(lons))
    #print('lat range:', np.min(lats),'-',np.max(lats))

    # nans are the landmask
    for d in range(sst.shape[0]): 
        rows.append({'SST': np.nanmean(sst[d,:,:,:]),'Timestamp': time[d]})

# DataFrame.append was removed in pandas 2.0, so build the table once from the collected rows
sst_dailymean = pd.DataFrame(rows).reindex(columns = sst_dailymean.columns)
        
# start date = 1985-01-01
# end date = 2019-11-09
sst_dailymean = sst_dailymean.replace({0:np.nan})
sst_dailymean['Timestamp'] = sst_dailymean['Timestamp'].astype(int)
sst_dailymean['Date'] = pd.to_datetime(sst_dailymean['Timestamp'], unit = 's')
sst_dailymean['Year'] = pd.DatetimeIndex(sst_dailymean['Date']).year
sst_dailymean['Month'] = pd.DatetimeIndex(sst_dailymean['Date']).month
sst_dailymean['Day'] = pd.DatetimeIndex(sst_dailymean['Date']).day
sst_dailymean['DOY'] = pd.DatetimeIndex(sst_dailymean['Date']).dayofyear

# ----- calculate NO3 concentration based on daily sst
sst_dailymean.loc[sst_dailymean.SST > 13.1,'NO3'] = 0
sst_dailymean.loc[sst_dailymean.SST <= 13.1,'NO3'] = 86.2 - (sst_dailymean['SST']* 6.6)

sst_clim = pd.DataFrame()

sst_clim['SST-DOY'] = sst_dailymean.groupby(sst_dailymean['DOY'])['SST'].mean()
#sst_clim['SST-Month'] = sst_dailymean.groupby(sst_dailymean['Month'])['SST'].mean()
sst_clim['NO3-DOY'] = sst_dailymean.groupby(sst_dailymean['DOY'])['NO3'].mean()
#sst_clim['NO3-Month'] = sst_dailymean.groupby(sst_dailymean['Month'])['NO3'].mean()   

# ----- climatology removed in daily T data to get anomalies
# map each row's DOY onto the DOY climatology and subtract (vectorized, no row loop)

sst_dailymean['SST-clim'] = sst_dailymean['SST'] - sst_dailymean['DOY'].map(sst_clim['SST-DOY'])
sst_dailymean['NO3-clim'] = sst_dailymean['NO3'] - sst_dailymean['DOY'].map(sst_clim['NO3-DOY'])
            
            
# ----- calculate standardized daily anomalies
sst_dailymean['SST-clim standardized'] = sp.stats.zscore(sst_dailymean['SST-clim'])
sst_dailymean['NO3-clim standardized'] = sp.stats.zscore(sst_dailymean['NO3-clim'])

# save daily data
sst_dailymean.to_csv('/Volumes/Mere/Python_MESMA/Statistics/Environmental_data/AVHRR_SST/AVHRR_daily_summary.csv')
        
# ----- marine heatwave calculation
# definition of a marine heatwave: 
# temperature greater than 90th percentile based on a 30 year record/baseline (1985 - 2015) = 14.1 degC

sst_anom_MHW = sst_dailymean['SST-clim'].quantile(.9)
sst_MHW = sst_dailymean['SST'].quantile(.9)

# .diff(periods=5) subtracts the DOY found five rows earlier in the above-threshold subset; a difference of
# exactly 5 means those six rows fall on consecutive days of the year (the year itself is not checked)
MHW_find = pd.DataFrame(sst_dailymean[sst_dailymean['SST'] >= sst_MHW])
MHW_find['sequential MHW Days'] = MHW_find.DOY.diff(periods=5)
MHW_days = MHW_find[MHW_find['sequential MHW Days'] == 5.0]
 
# ----- create new pandas array grouped by year, with a column for mean, sd. for SST and NO3
stats = pd.DataFrame()

# SST stuff
stats['SST-clim stand Mean'] = sst_dailymean.groupby('Year')['SST-clim standardized'].mean()
stats['SST-clim stand Median'] = sst_dailymean.groupby('Year')['SST-clim standardized'].median()
stats['SST-clim stand SD'] = sst_dailymean.groupby('Year')['SST-clim standardized'].std()
stats['SST-clim stand min'] = sst_dailymean.groupby('Year')['SST-clim standardized'].min()
stats['SST-clim stand max'] = sst_dailymean.groupby('Year')['SST-clim standardized'].max()
stats['SST-clim stand summer'] = sst_dailymean.groupby(sst_dailymean['Year'][(sst_dailymean['DOY']>=172) & (sst_dailymean['DOY']<=265)])['SST-clim standardized'].mean()
stats['SST-clim stand spring'] = sst_dailymean.groupby(sst_dailymean['Year'][(sst_dailymean['DOY']>=78) & (sst_dailymean['DOY']<172)])['SST-clim standardized'].mean()
stats['Degree Days'] = sst_dailymean[sst_dailymean['SST'] >= 14.].groupby('Year')['SST'].sum()
stats['MHW Days'] = MHW_days['Year'].value_counts()
stats = stats.fillna(0) #fill in the nan values with zeros
stats['Degree Days-clim'] = stats['Degree Days'] - stats['Degree Days'].mean()
stats['Degree Days-clim stand'] = sp.stats.zscore(stats['Degree Days-clim'])
stats['MHW Days-clim'] = stats['MHW Days'] - stats['MHW Days'].mean()
stats['MHW Days-clim stand'] =  sp.stats.zscore(stats['MHW Days-clim'])

# NO3 stuff
stats['NO3-clim stand Mean'] = sst_dailymean.groupby('Year')['NO3-clim standardized'].mean()
stats['NO3-clim stand Median'] = sst_dailymean.groupby('Year')['NO3-clim standardized'].median()
stats['NO3-clim stand SD'] = sst_dailymean.groupby('Year')['NO3-clim standardized'].std()
stats['NO3-clim stand min'] = sst_dailymean.groupby('Year')['NO3-clim standardized'].min()
stats['NO3-clim stand max'] = sst_dailymean.groupby('Year')['NO3-clim standardized'].max()
stats['NO3-clim stand summer'] = sst_dailymean.groupby(sst_dailymean['Year'][(sst_dailymean['DOY']>=172) & (sst_dailymean['DOY']<=265)])['NO3-clim standardized'].mean()
stats['NO3-clim stand spring'] = sst_dailymean.groupby(sst_dailymean['Year'][(sst_dailymean['DOY']>=78) & (sst_dailymean['DOY']<172)])['NO3-clim standardized'].mean()

stats['Year'] = stats.index

# ----- plot SST indices
fig,(ax1,ax2,ax3,ax4,ax5,ax6) = plt.subplots(6,1,figsize=(10,15))
ax1.bar(stats.index,stats['Degree Days-clim stand'])
ax1.set(ylabel = 'Degree Days Index',
       xlabel = 'Year',
       ylim = [-3.5,3.5])

ax2.bar(stats.index,stats['MHW Days-clim stand'])
ax2.set(ylabel = 'MHW Days Index',
       xlabel = 'Year',
       ylim = [-3.5,3.5])

ax3.bar(stats.index,stats['SST-clim stand Mean'])
ax3.set(ylabel = 'Mean SST Index',
       xlabel = 'Year',
       ylim = [-3,3])

ax4.bar(stats.index,stats['SST-clim stand Median'])
ax4.set(ylabel = 'Median SST Index',
       xlabel = 'Year',
       ylim = [-3,3])

ax5.bar(stats.index,stats['SST-clim stand summer'])
ax5.set(ylabel = 'Summer SST Index',
       xlabel = 'Year',
       ylim = [-3,3])


ax6.bar(stats.index,stats['SST-clim stand spring'])
ax6.set(ylabel = 'Spring SST Index',
       xlabel = 'Year',
       ylim = [-3,3])

fig.tight_layout()
plt.show()


# ----- plot NO3 indices
fig,(ax1,ax2,ax3,ax4) = plt.subplots(4,1,figsize=(10,12))

ax1.bar(stats.index,stats['NO3-clim stand Mean'])
ax1.set(ylabel = 'Mean NO3 Index',
       xlabel = 'Year',
       ylim = [-2,2])

ax2.bar(stats.index,stats['NO3-clim stand Median'])
ax2.set(ylabel = 'Median NO3 Index',
       xlabel = 'Year',
       ylim = [-2,2])

ax3.bar(stats.index,stats['NO3-clim stand summer'])
ax3.set(ylabel = 'Summer NO3 Index',
       xlabel = 'Year',
       ylim = [-2,2])


ax4.bar(stats.index,stats['NO3-clim stand spring'])
ax4.set(ylabel = 'Spring NO3 Index',
       xlabel = 'Year',
       ylim = [-2,2])

fig.tight_layout()
plt.show()




# save SST data
save_name = ['DegDays','MHWDays','Mean','Median','Summer','Spring']
col_name = ['Degree Days-clim stand','MHW Days-clim stand','SST-clim stand Mean','SST-clim stand Median','SST-clim stand summer','SST-clim stand spring']
for n,name in enumerate(save_name):
    np.savetxt(f'T_{name}_{var}_stand.txt',
              stats[['Year',col_name[n]]],
              fmt = '%i %0.3f',
              header = f'{var}\n{col_name[n]} \nrun on:{now}\nYear; index value')

# save NO3 data
save_name = ['Mean','Median','Summer','Spring']
col_name = ['NO3-clim stand Mean','NO3-clim stand Median','NO3-clim stand summer','NO3-clim stand spring']
for n,name in enumerate(save_name):
    np.savetxt(f'NO3_{name}_{var}_stand.txt',
              stats[['Year',col_name[n]]],
              fmt = '%i %0.3f',
              header = f'{var}\n{col_name[n]} \nrun on:{now}\nYear; index value')
    
    
# ----- visualize SST data with a CDF
sst_dailymean['SST-clim'].hist(cumulative=True, density=1,bins=200)
sst_dailymean['SST'].hist(cumulative=True, density=1,bins=200)


# ----- plot histograms of T distribution per year
sst_dailymean['SST-clim standardized'].hist(figsize = (20,20),by = sst_dailymean['Year'], sharey = True,sharex = True, bins=50)
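
The marine heatwave step in the cell above looks for runs of consecutive days at or above a threshold. An alternative way to find such runs is to locate the edges of each above-threshold block, sketched below. This is not the notebook's method: the function `run_lengths`, the five-day minimum, the 14 degC threshold, and the short SST series are all assumptions chosen for illustration.

```python
import numpy as np

def run_lengths(above):
    """Return (start, length) for every run of consecutive True values."""
    above = np.asarray(above, dtype=bool)
    # pad with False so runs touching either end still produce two edges
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]
    return list(zip(starts.tolist(), (stops - starts).tolist()))

# hypothetical daily SST series (degC) and threshold
sst_demo = np.array([13.0, 15.0, 15.0, 15.0, 15.0, 15.0, 13.0, 15.0, 15.0, 13.0])
threshold = 14.0
min_days = 5   # assumed minimum event duration

runs = run_lengths(sst_demo >= threshold)
events = [(start, length) for start, length in runs if length >= min_days]
event_days = sum(length for _, length in events)
```

Here the first run (five days starting at index 1) qualifies as an event and the two-day run does not. Applied to a daily series ordered by date, this counts every day inside a qualifying run.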


In [2]:
# ----- Process NOAA buoy data: remove climatology NOAA NDBC - Bodega Bay Station 46013
filename = 'NBDC_BodegaBay_all.csv'
all_data = pd.read_csv(filename)
# 999 marks missing values (np.NaN was removed in NumPy 2.0; np.nan works in all versions)
all_data = all_data.replace(999,np.nan)

# get daily data

daily_data = all_data.groupby(['DATE_ID']).mean()
daily_data.index = pd.to_datetime(daily_data.index,format='%Y%m%d')

daily_data.insert(1,'DOY', 0)
daily_data['DOY'] = daily_data.index.dayofyear
# the positional axis argument to drop() was removed in pandas 2.0
daily_data = daily_data.drop(columns=['hh','mm'])
#daily_data = daily_data.drop('MM',1).drop('DD',1).drop('hh',1).drop('mm',1)


# add climatology and anomaly columns to daily data

anom_data = pd.DataFrame(index=daily_data.index,columns=daily_data.keys(),dtype=float)
anom_data['DOY'] = daily_data['DOY']
anom_data['YY'] = daily_data['YY']
anom_data['MM'] = daily_data['MM']
anom_data['DD'] = daily_data['DD']



# daily climatology across the timeseries

NOAA_climatology = daily_data.groupby(['DOY']).mean()


# remove daily climatology from daily data

# subtract each day's DOY climatology from every measurement column at once;
# grouping by the DOY label avoids the off-by-one of positional (.iloc) lookups
clim_cols = list(daily_data.loc[:,'WD':'PRES'].columns)
anom_data[clim_cols] = daily_data[clim_cols] - daily_data.groupby('DOY')[clim_cols].transform('mean')

all_data.to_csv('NBDC_summary.csv')
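
A day-of-year climatology is indexed by DOY labels that start at 1, so label-based lookup (`.loc`) and positional lookup (`.iloc`) refer to different rows. The sketch below shows the difference on a small synthetic series and computes the anomalies without a loop. The dates and temperatures are hypothetical; only the column names `WTMP` and `DOY` follow the notebook.

```python
import pandas as pd

# hypothetical record: the first three days of two different years
dates = pd.date_range('2001-01-01', periods=3).append(
    pd.date_range('2002-01-01', periods=3))
daily = pd.DataFrame({'WTMP': [10.0, 11.0, 12.0, 12.0, 13.0, 14.0]}, index=dates)
daily['DOY'] = daily.index.dayofyear

# climatology indexed by DOY label (1, 2, 3)
clim = daily.groupby('DOY')['WTMP'].mean()

by_label = clim.loc[1]      # climatology for DOY 1
by_position = clim.iloc[1]  # second row, i.e. the climatology for DOY 2

# anomalies: each value minus the climatology of its own DOY
anom = daily['WTMP'] - daily.groupby('DOY')['WTMP'].transform('mean')
```

With these values `by_label` is 11.0 and `by_position` is 12.0, which is why the climatology should be looked up by label when the key is a DOY.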

In [None]:
# ----- Process NOAA buoy data: condense relevant Hs and SST from NOAA NDBC - Bodega Bay Station 46013

var = 'NBDC'

# ----- create timeseries with just T, Hs

T_Hs_anom_data = pd.DataFrame(index=anom_data.index,columns=['YY','MM','DOY','WVHT','WTMP'])
T_Hs_anom_data['YY'] = anom_data['YY']
T_Hs_anom_data['MM'] = anom_data['MM']
T_Hs_anom_data['DOY'] = anom_data['DOY']
T_Hs_anom_data['WVHT'] = anom_data['WVHT']
T_Hs_anom_data['WTMP'] = anom_data['WTMP']
     
# ----- creating annual anomaly indices for all variables
# max annual wave height and temp anomalies, max wave height and temp anomalies timing

annual_anom = pd.DataFrame(index=np.arange(T_Hs_anom_data['YY'].min(),T_Hs_anom_data['YY'].max()+2,1),columns=['WVHT','WTMP'])
monthly_anom = anom_data.groupby(['MM']).mean()


for year in annual_anom.index:
    # only the anomaly columns; slicing keys()[2:] would also pick up 'DOY'
    for key in ['WVHT','WTMP']:
        yearly = T_Hs_anom_data.loc[T_Hs_anom_data['YY'] == year, key]
        annual_anom_max = yearly.max()
        annual_anom_min = yearly.min()

        # keep whichever extreme has the larger magnitude, preserving its sign;
        # years without data give NaN for both and stay NaN
        # (the DOY on which the extreme occurs is not stored)
        if abs(annual_anom_min)>annual_anom_max:
            annual_anom.loc[year,key] = annual_anom_min
        else:
            annual_anom.loc[year,key] = annual_anom_max


# rename and add columns to annual_anom

annual_anom = annual_anom.rename(columns = {'WVHT':'WVHT_max','WTMP':'WTMP_max'})
add_cols = ['WVHT_mean','WTMP_phys','WTMP_mean','WTMP_spring','WTMP_summer','WVHT_winter']

# add every derived-index column, initialized to zero
for col in add_cols:
    annual_anom[col] = 0

# number of days SST >17 degC

SST_phys_calc = daily_data.loc[daily_data['WTMP']>17.]
SST_phys_counts = SST_phys_calc['YY'].value_counts()
for year in annual_anom.index:
    try:
        # single .loc call; chained indexing may assign to a temporary copy
        annual_anom.loc[year,'WTMP_phys'] = SST_phys_counts.loc[year]
    except KeyError:
        continue
        
# Avg SST anomaly during sporophyte growth season (spring; Mar 20 - June 21)
# Avg SST anomaly during max photosyn season ( summer; June 22 - Sept 23)
# physiological T limits of nereo - 3 to 17 degrees C
# min T across this timeseries is ~8 deg C

spring_all = pd.DataFrame(anom_data.loc[(anom_data['DOY']<=172) & (anom_data['DOY']>=80)],dtype=float)
summer_all = pd.DataFrame(anom_data.loc[(anom_data['DOY']>=173) & (anom_data['DOY']<=266)],dtype=float)
winter_all = pd.DataFrame(anom_data.loc[anom_data['DOY']<=79],dtype=float)

annual_anom['WTMP_spring'] = spring_all.groupby(['YY'])['WTMP'].mean()
annual_anom['WTMP_summer'] = summer_all.groupby(['YY'])['WTMP'].mean()
annual_anom['WTMP_mean'] = anom_data.groupby(['YY'])['WTMP'].mean()

annual_anom['WVHT_winter'] = winter_all.groupby(['YY'])['WVHT'].mean()
annual_anom['WVHT_mean'] = anom_data.groupby(['YY'])['WVHT'].mean()


# ----- Index standardization 
# because there are NaNs in the timeseries I have to calculate zscore by hand rather than with the scipy function

T_max_LS =  pd.DataFrame(annual_anom.loc[annual_anom.index >= 1985,'WTMP_max'])
T_max_LS['cubic interp'] = T_max_LS['WTMP_max'].interpolate(method='cubic',limit=None)
T_max_stand = np.array((T_max_LS.index,(T_max_LS['WTMP_max'] - T_max_LS['WTMP_max'].mean())/T_max_LS['WTMP_max'].std(ddof=0))).T
T_max_stand_interp = np.array((T_max_LS.index,(T_max_LS['cubic interp'] - T_max_LS['cubic interp'].mean())/T_max_LS['cubic interp'].std(ddof=0))).T

T_phys_LS = pd.DataFrame(annual_anom.loc[annual_anom.index >= 1985,'WTMP_phys'])
T_phys_stand = np.array((T_phys_LS.index,(T_phys_LS['WTMP_phys'] - T_phys_LS['WTMP_phys'].mean())/T_phys_LS['WTMP_phys'].std(ddof=0))).T

T_mean_LS = pd.DataFrame(annual_anom.loc[annual_anom.index >= 1985,'WTMP_mean'])
T_mean_stand = np.array((T_mean_LS.index,(T_mean_LS['WTMP_mean'] - T_mean_LS['WTMP_mean'].mean())/T_mean_LS['WTMP_mean'].std(ddof=0))).T

T_spring_LS = pd.DataFrame(annual_anom.loc[annual_anom.index>=1985,'WTMP_spring'])
T_spring_LS['cubic interp'] = T_spring_LS['WTMP_spring'].interpolate(method='cubic',limit=None)
T_spring_LS = pd.DataFrame(T_spring_LS.loc[T_spring_LS.index>=1985])
T_spring_stand = np.array((T_spring_LS.index,(T_spring_LS['WTMP_spring'] - T_spring_LS['WTMP_spring'].mean())/T_spring_LS['WTMP_spring'].std(ddof=0))).T
T_spring_stand_interp = np.array((T_spring_LS.index,(T_spring_LS['cubic interp'] - T_spring_LS['cubic interp'].mean())/T_spring_LS['cubic interp'].std(ddof=0))).T

T_summer_LS = pd.DataFrame(annual_anom.loc[annual_anom.index>=1985,'WTMP_summer'])
T_summer_LS['cubic interp'] = T_summer_LS['WTMP_summer'].interpolate(method='cubic',limit=None)
T_summer_stand = np.array((T_summer_LS.index,(T_summer_LS['WTMP_summer'] - T_summer_LS['WTMP_summer'].mean())/T_summer_LS['WTMP_summer'].std(ddof=0))).T
T_summer_stand_interp = np.array((T_summer_LS.index,(T_summer_LS['cubic interp'] - T_summer_LS['cubic interp'].mean())/T_summer_LS['cubic interp'].std(ddof=0))).T

HS_max_LS = pd.DataFrame(annual_anom.loc[annual_anom.index>=1985,'WVHT_max'])
HS_max_stand = np.array((HS_max_LS.index,(HS_max_LS['WVHT_max'] - HS_max_LS['WVHT_max'].mean())/HS_max_LS['WVHT_max'].std(ddof=0))).T

HS_mean_LS = pd.DataFrame(annual_anom.loc[annual_anom.index>=1985,'WVHT_mean'])
HS_mean_stand = np.array((HS_mean_LS.index,(HS_mean_LS['WVHT_mean'] - HS_mean_LS['WVHT_mean'].mean())/HS_mean_LS['WVHT_mean'].std(ddof=0))).T

HS_winter_LS = pd.DataFrame(annual_anom.loc[annual_anom.index>=1985,'WVHT_winter'])
HS_winter_LS['cubic interp'] = HS_winter_LS['WVHT_winter'].interpolate(method='cubic',limit=None)
HS_winter_stand = np.array((HS_winter_LS.index,(HS_winter_LS['WVHT_winter'] - HS_winter_LS['WVHT_winter'].mean())/HS_winter_LS['WVHT_winter'].std(ddof=0))).T
HS_winter_stand_interp = np.array((HS_winter_LS.index,(HS_winter_LS['cubic interp'] - HS_winter_LS['cubic interp'].mean())/HS_winter_LS['cubic interp'].std(ddof=0))).T


# ----- Output of buoy data to text files

NBDC_buoy = 'Bodega Bay'

var_names = [HS_mean_stand, HS_max_stand,HS_winter_stand,T_max_stand,T_phys_stand,T_mean_stand,
             T_spring_stand,T_summer_stand]

var_str = ['HS_mean_stand','HS_max_stand','HS_winter_stand','T_max_stand','T_phys_stand','T_mean_stand',
           'T_spring_stand','T_summer_stand']

var_descript = ['annual mean Hs anomaly - climatology removed and index standardized',
                'annual max Hs anomaly - climatology removed and index standardized',
                'annual winter mean Hs anomaly - climatology removed and index standardized',
                'annual max T anomaly - climatology removed and index standardized',
                'annual number of days greater than 17 degC - climatology not removed and index standardized',
                'annual mean of temperature anomalies - climatology removed and index standardized',
                'annual spring mean T anomaly - climatology removed and index standardized',
                'annual summer mean T anomaly - climatology removed and index standardized']
                

for n,name in enumerate(var_names):

    np.savetxt(f'{var}_{var_str[n]}.txt',
              name,fmt = '%i %0.5f',
               header = f'{var} {NBDC_buoy} \n{var_descript[n]} \nrun on:{now}\nYear; index value')
    

# ----- Output of interpolated buoy data to text files
var_names_interp = [HS_winter_stand_interp,T_max_stand_interp,T_spring_stand_interp,T_summer_stand_interp]

var_str_interp = ['HS_winter_stand_interp','T_max_stand_interp','T_spring_stand_interp','T_summer_stand_interp']

var_descript_interp = ['annual winter mean Hs anomaly - climatology removed, interpolated over data gaps, and index standardized',
               'annual max T anomaly - climatology removed, interpolated over data gaps, and index standardized',
               'annual spring mean T anomaly - climatology removed, interpolated over data gaps, and index standardized',
               'annual summer mean T anomaly - climatology removed, interpolated over data gaps, and index standardized']

for n,name in enumerate(var_names_interp):

    np.savetxt(f'{var}_{var_str_interp[n]}.txt',
              name,fmt = '%i %0.5f',
               header = f'{var} {NBDC_buoy} \n{var_descript_interp[n]} \nrun on:{now}\nYear; index value')
         
anom_data.to_csv('NBDC_BodegaBay_anom.csv',
             na_rep=999,index=False)
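
The standardization lines in the cell above repeat the same expression, (x - mean) / std with `ddof=0`, while skipping NaNs. A small helper can express that once. The sketch below is a hypothetical refactoring, not the notebook's code; the function name `standardize` and the sample values are assumptions.

```python
import numpy as np

def standardize(values):
    """NaN-aware population z-score: NaNs are ignored in the mean and std and stay NaN."""
    values = np.asarray(values, dtype=float)
    return (values - np.nanmean(values)) / np.nanstd(values)

# hypothetical annual index with one missing year
years = np.array([1985, 1986, 1987])
raw = np.array([1.0, np.nan, 3.0])

standardized = standardize(raw)

# two-column (year, index) layout like the arrays written with np.savetxt above
table = np.column_stack((years, standardized))
```

`np.nanstd` uses `ddof=0` by default, so the result matches `scipy.stats.zscore` on the non-missing values.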


In [None]:
# Biological data
# Reef Check Purple urchin
# Reef Check Pycnopodia


var = 'bio'

filelist = glob.glob('CDFW*_means.xlsx')
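frames = []

for file in filelist:
    try:
        frames.append(pd.read_excel(file, sheet_name='means'))
    except Exception:
        # skip workbooks that cannot be read or that lack a 'means' sheet
        continue

# DataFrame.append was removed in pandas 2.0; concatenate the collected sheets instead
biology = pd.concat(frames) if frames else pd.DataFrame()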

#purps_mean = biology['CDFW+RC Purps Annual Mean'].mean()
#pycno_mean = biology['CDFW+RC Pycno Annual Mean'].mean()
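# baseline means from rows whose index label is below 11; numeric_only skips any text columns
means = biology.loc[biology.index<11].mean(numeric_only=True)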

# calculate anomalies - no climatology removed (need to look up how the data is collected and processed)
bioanoms = pd.DataFrame(columns = ['Year','Purple Urchin Anom','Purple Urchin Stand Index','Pycnopodia Anom','Pycnopodia Stand Index'])
bioanoms['Year'] = biology['Year']
#bioanoms['Purple Urchin Anom'] = biology['CDFW+RC Purps Annual Mean'] - purps_mean
bioanoms['Purple Urchin Anom'] = biology['CDFW+RC Purps Annual Mean'] - means['CDFW+RC Purps Annual Mean']
#bioanoms['Pycnopodia Anom'] = biology['CDFW+RC Pycno Annual Mean'] - pycno_mean
bioanoms['Pycnopodia Anom'] = biology['CDFW+RC Pycno Annual Mean'] - means['CDFW+RC Pycno Annual Mean']
bioanoms['Purple Urchin Stand Index'] = sp.stats.zscore(bioanoms['Purple Urchin Anom'],nan_policy='omit')
bioanoms['Pycnopodia Stand Index'] = sp.stats.zscore(bioanoms['Pycnopodia Anom'],nan_policy='omit')

save_name = ['bio_purps','bio_pycno']
col_name = ['Purple Urchin Stand Index','Pycnopodia Stand Index']
for n,name in enumerate(save_name):
    np.savetxt(f'{var}_{name}_index_stand.txt',
              bioanoms[['Year',col_name[n]]],
              fmt = '%i %0.3f',
              header = f'{var} - {col_name[n]}: Indices are in density (units of # per 60 m2) \nrun on:{now}\nYear; index value')

fig, ax = plt.subplots(1,1)
#ax.bar(bioanoms['Year'],bioanoms['Purple Urchin Anom'],label='Purple Urchins',color='purple')
ax.bar(bioanoms['Year'],bioanoms['Purple Urchin Stand Index'],label='Purple Urchins',color='purple')
ax.bar(bioanoms['Year'],bioanoms['Pycnopodia Stand Index'],label='Pycnopodia',color='grey')
#ax.bar(bioanoms['Year'],bioanoms['Pycnopodia Anom'],label='Pycnopodia',color='grey')
ax.set(ylabel = 'Biological Index',
      xlabel = 'Year',
      ylim = [-2,3])
fig.legend()
fig.tight_layout()
plt.show()

In [None]:
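# Timeseries plot of large-scale indices
# note: kelpanom, ONI_stand, near_max_stand and off_max_stand are not created in the cells above
# and must already exist in the session before this cell is run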

fig1,(ax1,ax2,ax3,ax4,ax5,ax6,ax7) = plt.subplots(7,1,figsize = (9,15),sharex = True)
ax1.bar(kelpanom.index,kelpanom['Standardized Kelp anom'],color='lightsteelblue')
ax1.set(ylim = [-3,3])
ax1.set_title('Sonoma County Bull Kelp Index and Relevant Environmental Variables', fontsize = 18)
ax1.set_ylabel('Bull kelp\n(summed frac)', fontsize=12)

ax2.bar(PDO_stand[:,0],PDO_stand[:,1],color='lightsteelblue')
ax2.set(ylim = [-2,2])
ax2.set_ylabel('PDO Index',fontsize = 12)

ax3.bar(NPGO_stand[:,0],NPGO_stand[:,1],color='lightsteelblue')
ax3.set(ylim = [-2,2])
ax3.set_ylabel('NPGO Index',fontsize = 12)

ax4.bar(MEI_stand[:,0],MEI_stand[:,1],color='lightsteelblue')
ax4.set(ylim = [-2,2])
ax4.set_ylabel('MEI Index',fontsize = 12)

ax5.bar(ONI_stand[:,0],ONI_stand[:,1],color='lightsteelblue')
ax5.set(ylim = [-2,2])
ax5.set_ylabel('ONI Index',fontsize = 12)

ax6.bar(near_max_stand[:,0],near_max_stand[:,1],color='lightsteelblue')
ax6.set(ylim = [-3,3])
ax6.set_ylabel('Nearshore ssha \n(mm)',fontsize = 12)

ax7.bar(off_max_stand[:,0],off_max_stand[:,1],color='lightsteelblue')
ax7.set(xlim = [1985,2019],
       ylim = [-3,3])
ax7.set_xlabel(xlabel = 'Year',fontsize = 14)
ax7.set_ylabel('Offshore ssha \n(mm)',fontsize = 12)

fig1.tight_layout()