# Introduction
This project uses a statistical model to predict wind stress anomalies from sea surface temperature anomalies (SSTA). The relationship can be written as $\boldsymbol{\tau_s} = \boldsymbol{C} \boldsymbol{T}$, where $\boldsymbol{\tau_s}$ and $\boldsymbol{T}$ are the state vectors of wind stress anomalies and SSTA, respectively, and $\boldsymbol{C}$ is a constant coefficient matrix. Our goal is to derive $\boldsymbol{C}$ using singular value decomposition (SVD) analysis of a training dataset, and then apply the relationship to a testing dataset.

This project will:

1. Formulate a statistical atmosphere model that predicts surface wind stress anomalies from given SST anomalies;
2. Validate the simulated surface wind stress anomalies against observations;
3. Perform sensitivity tests to see how the results depend on the number of SVD modes used to compute the anomalies;
4. Discuss whether the results make physical sense and why.
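Goals 1 to 3 can be illustrated end to end on synthetic data. The sketch below is not the project's actual procedure: the text does not say how $\boldsymbol{C}$ is assembled from the SVD modes, so the per-mode regression used here (regressing each wind stress expansion coefficient on the matching SST expansion coefficient) is an assumption, and `fit_modes`, `build_C`, and the rank-3 "true" coupling are hypothetical names and data introduced only for illustration.

```python
import numpy as np

def fit_modes(tau, T):
    """SVD of the tau-T cross-covariance plus one regression slope per mode.

    tau: (M, n_time) wind stress state vectors
    T:   (N, n_time) SSTA state vectors
    """
    n_time = T.shape[1]
    U, s, Vt = np.linalg.svd(tau @ T.T / n_time, full_matrices=False)
    a = Vt @ T       # SST expansion coefficients, one row per mode
    b = U.T @ tau    # wind stress expansion coefficients, one row per mode
    # Assumption: each mode is linked by a least-squares slope of b on a.
    slope = (a * b).sum(axis=1) / (a * a).sum(axis=1)
    return U, slope, Vt

def build_C(U, slope, Vt, k):
    """Hypothetical coefficient matrix truncated to the leading k modes."""
    return (U[:, :k] * slope[:k]) @ Vt[:k]

# Synthetic data: a rank-3 "true" coupling plus small noise.
rng = np.random.default_rng(0)
M, N, n_train, n_test = 20, 10, 624, 216
P, _ = np.linalg.qr(rng.standard_normal((M, 3)))
Q, _ = np.linalg.qr(rng.standard_normal((N, 3)))
C_true = P @ np.diag([3.0, 2.0, 1.0]) @ Q.T

def simulate(n):
    T = rng.standard_normal((N, n))
    tau = C_true @ T + 0.1 * rng.standard_normal((M, n))
    return tau, T

tau_train, T_train = simulate(n_train)
tau_test, T_test = simulate(n_test)

# Fit on the training period only.
U, slope, Vt = fit_modes(tau_train, T_train)

# Sensitivity test: validation correlation as a function of retained modes.
skill = {}
for k in (1, 2, 3, 5, 10):
    tau_pred = build_C(U, slope, Vt, k) @ T_test
    skill[k] = np.corrcoef(tau_pred.ravel(), tau_test.ravel())[0, 1]
    print(f"modes={k:2d}  correlation={skill[k]:.3f}")
```

A single pattern correlation over all grid points and times is only one possible validation metric; point-wise correlation maps or RMSE would fit the same loop.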

# Dataset and Method
Surface temperature ($T_s$), zonal wind stress ($u$), meridional wind stress ($v$), and a land mask are provided. $T_s$, $u$, and $v$ are three-dimensional, with two spatial dimensions and one time dimension. The data are split into two parts: 1948-1999 is used as the training dataset, and 2000-2017 is used as the testing dataset.
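The split can be sketched with NumPy on a synthetic monthly field. How the anomalies are computed is not stated in the text; the sketch assumes they are departures from the monthly climatology of the training period, which keeps information from the testing period out of the training step. All array names are illustrative.

```python
import numpy as np

# Synthetic stand-in for one monthly field on a (time, lat, lon) grid.
# 1948-2017 spans 70 years, i.e. 840 monthly means.
rng = np.random.default_rng(0)
n_years, n_lat, n_lon = 70, 4, 6
years = np.repeat(np.arange(1948, 1948 + n_years), 12)  # year of each sample
months = np.tile(np.arange(12), n_years)                # calendar month 0..11
seasonal = 2.0 * np.cos(2 * np.pi * months / 12)        # fake annual cycle
field = seasonal[:, None, None] + rng.standard_normal((n_years * 12, n_lat, n_lon))

# Split by period, as described in the text.
train = years <= 1999
test = ~train

# Assumption: anomalies are departures from the training-period climatology.
clim = np.stack([field[train & (months == m)].mean(axis=0) for m in range(12)])
anom = field - clim[months]

anom_train, anom_test = anom[train], anom[test]
print(anom_train.shape, anom_test.shape)
```

The training slice has 624 time steps (52 years of monthly means) and the testing slice has 216 (18 years).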

The SVD analysis proceeds as follows:

1. Normalize all the anomaly fields by dividing each variable by its own standard deviation;
2. Form a state vector for SSTA ($\boldsymbol{T}$) and another for the zonal and meridional wind stress anomalies ($\boldsymbol{\tau_s}$). Note that $\boldsymbol{\tau_s}$ contains both wind stress components and therefore has a larger dimension than $\boldsymbol{T}$;
3. Form the normalized covariance matrix $\boldsymbol{A=\tau_s T'}$, whose dimension is $M\times N$, where $M$ is the length of $\boldsymbol{\tau_s}$ and $N$ is the length of $\boldsymbol{T}$;
4. Apply SVD to $\boldsymbol{A}$ by calling MATLAB's `svd` function, which gives $\boldsymbol{A=USV'}$. Here $\boldsymbol{S}$ is a diagonal matrix whose diagonal elements are the singular values (their squares measure the squared covariance explained by each mode), $\boldsymbol{U}$ contains the wind stress singular vectors, and $\boldsymbol{V}$ contains the SST singular vectors. The singular vectors are orthonormal, i.e., $\boldsymbol{U'U=I}$ and $\boldsymbol{V'V=I}$.
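The four steps above can be sketched in NumPy on synthetic anomalies. Two points are assumptions rather than statements from the text: each variable is scaled by a single standard deviation taken over all grid points and times, and the covariance is divided by the number of time samples. `np.linalg.svd` stands in for MATLAB's `svd`.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_sst = 200, 12   # time samples and SST grid points (N)

# Synthetic anomaly fields, shape (time, space), with the time mean removed.
sst = rng.standard_normal((n_time, n_sst))
taux = 0.5 * sst + 0.1 * rng.standard_normal((n_time, n_sst))
tauy = -0.3 * sst + 0.1 * rng.standard_normal((n_time, n_sst))
sst, taux, tauy = (x - x.mean(axis=0) for x in (sst, taux, tauy))

# Step 1: normalize each variable by its own standard deviation.
# Assumption: one scalar standard deviation per variable.
sst_n, taux_n, tauy_n = sst / sst.std(), taux / taux.std(), tauy / tauy.std()

# Step 2: state vectors as columns (space x time); tau stacks both components.
T = sst_n.T                                          # (N, n_time)
tau = np.concatenate([taux_n, tauy_n], axis=1).T     # (M, n_time), M = 2N

# Step 3: normalized covariance matrix A = tau T', shape (M, N).
A = tau @ T.T / n_time

# Step 4: SVD, A = U S V'.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Squared covariance fraction explained by each mode.
scf = s**2 / np.sum(s**2)
print(A.shape, U.shape, Vt.shape)
print("leading-mode squared covariance fraction:", round(float(scf[0]), 3))
```

With `full_matrices=False`, `U` has shape $M\times N$, so only $\boldsymbol{U'U=I}$ holds for it; $\boldsymbol{UU'=I}$ requires the full $M\times M$ matrix.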

# Results
## SVD analysis
Although MATLAB scripts for the SVD analysis are provided, I rewrote them in Python to better understand the SVD method. The Python script is below:

```python
import numpy.linalg as la
import xarray as xr

# Read in the data
data_dir = "./NECP_monthly_mean_data/"
taux_ds = xr.open_dataset(data_dir + "uflx.sfc.mon.mean.tropics.nc")
tauy_ds = xr.open_dataset(data_dir + "vflx.sfc.mon.mean.tropics.nc")
sst_ds = xr.open_dataset(data_dir + "skt.sfc.mon.mean.tropics.nc")
grid_ds = xr.open_dataset(data_dir + "lsmask.tropics.nc")

# Extract the variables
taux, tauy, sst, grid = taux_ds.uflx, tauy_ds.vflx, sst_ds.skt, grid_ds.lsmask

# Select the training period (1948-1999) in the tropics (30S-30N), longitudes 100-300.
# A label slice only matches when it runs in the same direction as the coordinate,
# so sort latitude into ascending order first. Without this, slice(-30, 30) returns
# an empty selection if the files store latitude from north to south.
train_period = slice("1948-01-01", "1999-12-31")
region = dict(lat=slice(-30, 30), lon=slice(100, 300))
taux = taux.sortby("lat").sel(time=train_period, **region)
tauy = tauy.sortby("lat").sel(time=train_period, **region)
sst = sst.sortby("lat").sel(time=train_period, **region)

# Longitude and latitude of the selected region (latitude now runs south to north)
lon = grid_ds.lon.sel(lon=slice(100, 300))
lat = grid_ds.sortby("lat").lat.sel(lat=slice(-30, 30))
```

Displaying `taux` after selecting with `lat=slice(-30, 30)` on the raw coordinate shows an empty latitude axis (`lat: 0`), which suggests that the `lat` coordinate is stored in descending order (north to south):

```text
<xarray.DataArray 'uflx' (time: 624, lat: 0, lon: 107)>
array([], shape=(624, 0, 107), dtype=float32)
Coordinates:
  * lat      (lat) float32 
  * lon      (lon) float32 101.2 103.1 105.0 106.9 ... 294.4 296.2 298.1 300.0
  * time     (time) datetime64[ns] 1948-01-01 1948-02-01 ... 1999-12-01
Attributes:
    long_name:     Monthly Mean of Momentum Flux, U-Component
    valid_range:   [-4.  5.]
    units:         N/m^2
    precision:     3
    GRIB_id:       124
    GRIB_name:     U FLX
    var_desc:      Momentum Flux, u-component
    level_desc:    Surface
    statistic:     Mean
    parent_stat:   Individual Obs
    dataset:       NCEP Reanalysis Derived Products
    actual_range:  [-1.1202924  1.0410981]
```
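Once the fields are loaded, they still have to be turned into the state vectors $\boldsymbol{T}$ and $\boldsymbol{\tau_s}$. A minimal sketch of that step follows, using synthetic arrays in place of the NCEP fields. It assumes that the same land mask applies to all three variables and that ocean points can be expressed as a boolean array; the actual encoding in `lsmask.tropics.nc` should be checked before reuse. `to_grid` is a hypothetical helper for mapping a state vector back onto the grid.

```python
import numpy as np

rng = np.random.default_rng(3)
n_time, n_lat, n_lon = 24, 5, 8

# Synthetic stand-ins for the (time, lat, lon) anomaly fields.
sst = rng.standard_normal((n_time, n_lat, n_lon))
taux = rng.standard_normal((n_time, n_lat, n_lon))
tauy = rng.standard_normal((n_time, n_lat, n_lon))

# Assumption: True marks ocean points, False marks land points.
ocean = rng.random((n_lat, n_lon)) > 0.3

# Keep ocean points only and put space first: one column per time step.
T = sst[:, ocean].T                                               # (N, n_time)
tau = np.concatenate([taux[:, ocean], tauy[:, ocean]], axis=1).T  # (2N, n_time)

def to_grid(vec, ocean):
    """Scatter a length-N state vector back onto the (lat, lon) grid, NaN over land."""
    out = np.full(ocean.shape, np.nan)
    out[ocean] = vec
    return out

first_map = to_grid(T[:, 0], ocean)
print(T.shape, tau.shape, first_map.shape)
```

Dropping land points before the SVD keeps the covariance matrix smaller and avoids masked or missing values in the decomposition.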


