In [None]:
plt.rcParams['axes.prop_cycle'] = plt.cycler(color=plt.cm.viridis(np.linspace(0, 1, 6)))

By reducing the batch size, I was able to train a version of the neural network that did not produce long-term bias in a single-column simulation. Now I just have to get this network working with SAM. This is challenging because I refactored all my code.

I ran a SAM simulation, but it is quickly blowing up, and I cannot figure out why.

In [None]:
path = "../data/runs/2018-10-23-NN/OUT_3D/*.nc"


data_3d = xr.open_mfdataset(path)

In [None]:
data_3d.FQTNN[1,11].plot()

The neural network's predicted moistening seems quite reasonable the first time it is computed.

But here it is after just one more time step:

In [None]:
data_3d.FQTNN[2,11].plot()

There is 

This shows that something is going terribly wrong after a single time step. I bet this problem is caused by an incompatibility between the FQT and FSLI used to train the neural network and the current configuration of the model. I changed the SGS scheme in the boundary layer of the model following some advice from Peter. Here is the description:

    commit c44881b1fc6f55637aaf46a752a7e549903957b4
    Author: Noah D. Brenowitz <nbren12@gmail.com>
    Date:   Mon Sep 10 18:13:24 2018 -0700

        Changing damping and SGS schemes

        I increased the time-scales of the sponge layer, and modified the surface scheme
        to only compute the momentum flux. The temperature and humidity fluxes are fixed
        at 0. This improves the schemes performance in the boundary layer, but not so
        much elsewhere.
        
Actually, I don't think this is true since the newest version of the data was generated after this commit. 
        
This change would alter how things are looking at the bottom of the atmosphere. Let's look at the humidity there at time step 1 and time step 2:

In [None]:
data_3d.QT[0,0].plot()

In [None]:
data_3d.QT[1,0].plot()

There is a huge difference in the moisture after only 10 seconds. Also, this should not be affected by the neural network because it is only one time step later. Let's check that this is true by running the simulation w/o the net.

In [None]:
data_3d_no_nn = xr.open_mfdataset("../data/runs/2018-10-23-noNN/OUT_3D/*.nc")

In [None]:
data_3d_no_nn.QT[1,0].plot()

Actually, there is almost no change at all without the neural net. Therefore, there must be something wrong with the size of the time step that the network is taking. Otherwise it would not be able to change the state that much! Maybe there is some unit conversion error.

In [None]:
qt_pred = data_3d.QT[0] + data_3d.FQTNN[1] * 10.0 * 1000.0 

In [None]:
qt_pred[0].plot()

This is the problem exactly. Why is the moistening predicted to be so strong in the boundary layer? Here is the domain mean of FQTNN near the boundary layer:

In [None]:
(86400 * data_3d.FQTNN[1].mean(['x', 'y'])).plot(y='z')
plt.ylim([0,2000])
plt.xlabel('FQTNN (g/kg/day)')

There must be some factor-of-1000 error in these rates. I am not sure why this is happening, though.
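
One way to narrow down an error like this is to estimate the time step implied by the output: divide the change in QT between two saved steps by the diagnosed rate and compare the result with the nominal step. The sketch below uses synthetic arrays only. `effective_time_step` is a hypothetical helper, and it assumes the rate is per second, in the same humidity units as the state, and that nothing else changes the state over the step.

```python
import numpy as np


def effective_time_step(qt_before, qt_after, rate, eps=1e-12):
    """Median of (change in state) / (diagnosed rate), in seconds.

    Hypothetical helper: assumes `rate` is per second, in the same
    units as the state, and that nothing else changes the state.
    """
    qt_before, qt_after, rate = map(np.asarray, (qt_before, qt_after, rate))
    mask = np.abs(rate) > eps  # skip points where the rate is ~0
    return float(np.median((qt_after - qt_before)[mask] / rate[mask]))


# Synthetic example: the state was advanced with a made-up 10800 s step
# even though the nominal model step is 10 s.
rng = np.random.default_rng(0)
rate = rng.normal(scale=1e-5, size=(4, 8))   # made-up rates
qt0 = rng.uniform(5, 15, size=(4, 8))        # made-up humidity
qt1 = qt0 + rate * 10800.0

dt_eff = effective_time_step(qt0, qt1, rate)
print(dt_eff / 10.0)  # ratio of the implied step to the nominal 10 s step
```

A ratio far from 1 points at the time step or a unit conversion rather than at the network itself.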

I found the error. Near line 316 in `model.py`:
```python
            apparent_src = nn_output.pop(key) / 86400
            prog[key] = prog[key] + apparent_src * self.time_step
            # store neural network diagnostics for layer
            nn_output['F' + key + 'NN'] = apparent_src
```

Instead of using the new `dt`, I am using the stored `time_step`.
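
A minimal sketch of the intended behaviour, with the step passed in explicitly rather than read from the stored attribute. `apply_nn_sources` and its arguments are hypothetical names, not the actual `model.py` code. The division by 86400 assumes, as in the snippet above, that the network outputs per-day rates.

```python
def apply_nn_sources(prog, nn_output, dt):
    """Advance prognostic variables by the network's apparent sources.

    Hypothetical stand-in for the update in `model.py`: `dt` is the
    step (in seconds) that the host model is actually taking.
    """
    new_prog = dict(prog)
    diagnostics = {}
    for key, per_day in nn_output.items():
        apparent_src = per_day / 86400  # per day -> per second
        new_prog[key] = prog[key] + apparent_src * dt
        # store neural network diagnostics under the same naming scheme
        diagnostics['F' + key + 'NN'] = apparent_src
    return new_prog, diagnostics


# Made-up numbers: 8.64 units/day is 1e-4 units/s.
prog, diag = apply_nn_sources({'QT': 10.0}, {'QT': 8.64}, dt=30.0)
print(prog['QT'], diag['FQTNN'])
```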

# Fixing this blow-up

I fixed this bug and started another experiment, which also quits after 1 day. Maybe I can fix this by decreasing the time step.

In [None]:
ds_fixed = xr.open_mfdataset("../data/runs/2018-10-23-fix-dt-bug/OUT_3D/*.nc")
ds_fixed_2d = xr.open_mfdataset("../data/runs/2018-10-23-fix-dt-bug/OUT_2D/*.nc")

In [None]:
ds_fixed.W[:,10].plot(col='time')

In [None]:
ds_fixed.FSLINN[:,15].plot(col='time')

In [None]:
ds_fixed.FSLINN.sel(y=5e6, method='nearest').plot(y='z', col='time')

In [None]:
ds_fixed.FQTNN[:,3:].sel(y=5e6, method='nearest').plot(y='z', col='time')

In [None]:
ds_fixed.QT[:,:20].sel(y=5e6, method='nearest').plot(y='z', col='time', vmin=0)

The moistening is most pronounced in the boundary layer.

In [None]:
ds_fixed.FQTNN[:,0].sel(y=5e6, method='nearest').plot(hue='time')

In [None]:
ds_fixed.W[:,1].sel(y=5e6, method='nearest').plot(hue='time')
plt.xlim([1.0e7, 1.5e7])

In [None]:
ds_fixed.QT[:,1].sel(y=5e6, method='nearest').plot(hue='time')
plt.xlim([0.0e7, 1.5e7])

In [None]:
ds_fixed.U[:,0].sel(y=5e6, method='nearest').plot(hue='time')
plt.xlim([0.0e7, 1.5e7])

This looks like some kind of CISK feedback. It is interesting that this is occurring over a region which had initial wind convergence.

In [None]:
ds_fixed.FQTNN[-1].mean('x').plot(vmax=1e-4)

In [None]:
ds_fixed.FSLINN[-1].mean('x').plot()

# Run: 2018-10-23-fix-dt-bug-dt30

I decreased the time step to 30 seconds. Hopefully this will solve the numerical instability. If this doesn't work, I think I should work on the forcing estimates again. The magnitude of the moistening is far too large in the boundary layer, and there are also some strange vertical structures in FQTNN.

In [None]:
fix_2d = xr.open_mfdataset("../data/runs/2018-10-23-fix-dt-bug-dt30/OUT_2D/CASE__1.2Dbin_1.nc")

In [None]:
fix_2d.PW[::8].plot(col='time')

# Why do these runs fail?

Let's compare to the "good run" we showed in our GMU slides. 

In [None]:
good_run = xr.open_mfdataset("../models/17/test/OUT_3D/*.nc")

In [None]:
data_3d.FQTNN[1,0].plot()

In [None]:
good_run.FQTNN[1,0].plot()

We can see that the good run's predicted BL moistening is much more zonally uniform and looks much more like the LHF map. Several things could be happening:

1. By mass-weighting the loss, I am de-emphasizing the importance of the narrow boundary layer levels.
2. The trapezoid integration rule hurts (I don't see why this would matter).
3. The network does not have enough layers. I also changed this.
4. Changes in things like batch_size and skip.
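
To make the first hypothesis concrete, here is a minimal sketch of a mass-weighted mean-squared error. The layer masses are made up, and `weighted_mse` is a hypothetical helper, not the loss used in `uwnet.train`. With weights proportional to layer mass, the same error counts for much less in a thin boundary-layer level than in a thick level aloft.

```python
import numpy as np


def weighted_mse(pred, truth, weights):
    """Mean-squared error over levels, with weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    err2 = (np.asarray(pred) - np.asarray(truth)) ** 2
    return float((w * err2).sum())


layer_mass = np.array([50.0, 100.0, 400.0, 900.0])  # kg/m^2, made up; thinnest level first
truth = np.zeros(4)

err_low = np.array([1.0, 0.0, 0.0, 0.0])   # unit error in the thinnest level
err_high = np.array([0.0, 0.0, 0.0, 1.0])  # unit error in the thickest level

uniform = np.ones(4)
loss_low_uniform = weighted_mse(err_low, truth, uniform)
loss_high_uniform = weighted_mse(err_high, truth, uniform)
loss_low_mass = weighted_mse(err_low, truth, layer_mass)
loss_high_mass = weighted_mse(err_high, truth, layer_mass)

print(loss_low_uniform, loss_high_uniform)  # equal under uniform weights
print(loss_low_mass, loss_high_mass)        # thin level is heavily down-weighted
```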

# Newly trained models

## #95: trained with all x locations
```
python -m uwnet.train with data=data/processed/training.nc examples/SAM.yaml batch_size=32  skip=10 seq_length=10  lr=.001 n_epochs=10 -m uwnet
```

## #97: no mass weighting in loss

In [None]:
import torch

from src.sacred import get_last_model, get_run
from uwnet.model import call_with_xr, model_factory

train_data_path = "../data/processed/training.nc"
ds = xr.open_dataset(train_data_path)


def get_output(id, ds=ds.isel(time=slice(0,1))):
    model = model_factory().from_dict(torch.load(get_last_model(id))['dict'])
    model.disable_forcing()
    return call_with_xr(model, ds.isel(z=model.heights), n=1, drop_times=0)

def plot_first_level(id, ds=ds.isel(time=slice(0, 1))):
    output = get_output(id, ds)
    output.FQTNN[0,0].plot()
    plt.title(f"ID=  {id}")

In [None]:
plot_first_level(id=97)

This looks pretty similar, which indicates that mass weighting is not the problem.

In [None]:
plot_first_level(id=95)

This is just what the boundary layer looks like in this setup, so I am probably doing things correctly.

"Models/17" is actually run #10 in the mongo database.

## #99: no trapezoid rule

In [None]:
plot_first_level(99)

Wow, who would have thought using the trapezoid rule would make such a big difference?
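
For reference, the two update rules differ only in how a time-varying term is averaged over the step. The sketch below compares a left-endpoint (forward Euler) update with a trapezoid update for a prescribed forcing. It is an assumption that this is where the trapezoid rule enters the training code, and `step_left` and `step_trapezoid` are hypothetical names.

```python
def step_left(x, g_now, g_next, dt):
    """Forward Euler: use only the forcing at the start of the step."""
    return x + dt * g_now


def step_trapezoid(x, g_now, g_next, dt):
    """Trapezoid rule: average the forcing at both ends of the step."""
    return x + dt * 0.5 * (g_now + g_next)


# Made-up forcing that ramps from 0 to 2 over a step of length 1.
x_left = step_left(0.0, 0.0, 2.0, dt=1.0)
x_trap = step_trapezoid(0.0, 0.0, 2.0, dt=1.0)
print(x_left, x_trap)
```

If the network is trained to explain whatever is left over after the forcing is applied, the two choices hand it different targets whenever the forcing changes quickly between samples.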

## #100: no-trapezoid rule for full dataset

I am training the no-trapezoid-rule version of the code on the full dataset. Wait... with more training, this solution goes haywire as well.

In [None]:
plot_first_level(id=100)

In [None]:
good_run.FQTNN[1,4:,32,0].plot()
get_output(100).FQTNN[0,4:,32,0].plot()

I think there is some error in how the time stepper in my training works. Perhaps it is comparing against the two-step-ahead predictions. This might explain why FQTNN and FSLINN on the raw data look strange.
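
A quick way to test this hypothesis is to compare the tendencies obtained by differencing a series against its one-step and two-step neighbours. In the synthetic sketch below (made-up linear trend, hypothetical helper `tendency`), a target taken two steps ahead but divided by a single step gives twice the true rate.

```python
import numpy as np


def tendency(series, lag, dt):
    """Finite-difference rate when the target is `lag` samples ahead
    but the difference is divided by a single step `dt`."""
    series = np.asarray(series, dtype=float)
    return (series[lag:] - series[:-lag]) / dt


dt = 0.125                    # days between samples (made up for this example)
t = np.arange(8) * dt
qt = 10.0 + 3.0 * t           # made-up series with a true rate of 3 per day

one_step = tendency(qt, lag=1, dt=dt)
two_step = tendency(qt, lag=2, dt=dt)
print(one_step.mean(), two_step.mean())
```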

## #107: old version w/ smaller batch size

I reverted the code to the version used for the GMU talk, changing only the experiment-logging and file-saving code.

In [None]:
ds = xr.open_mfdataset("../data/runs/2018-10-24-old-master/OUT_3D/*.nc")

In [None]:
ds.FQTNN[2,0].plot()

Actually, this model was trained only on the equatorial region.

## #108: Train on full domain.

In [None]:
ds = xr.open_mfdataset("../data/runs/2018-10-24-NN108-epoch0/OUT_3D/*.nc")

In [None]:
ds.FQTNN[2,0].plot()

This looks more like the original answer we had. I wonder what the problem with the new code is. Why does it predict double the observed tendency?

In [None]:
ds.FQTNN[2,:, 1,:].plot()

In [None]:
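train = xr.open_dataset(train_data_path)  # assumption: `train` is the training dataset at `train_data_path`
stor = (train.QT[1] - train.QT[0]) / .125 / 86400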

In [None]:
stor[0].plot()

## Longer run

This run completed successfully.

In [None]:
d2d = xr.open_dataset("../data/runs/2018-10-24-NN108-epoch1-long/OUT_2D/CASE__1.2Dbin_1.nc")
d3d = xr.open_mfdataset("../data/runs/2018-10-24-NN108-epoch1-long/OUT_3D/*.nc")

In [None]:
d2d.PW[::12].plot(col='time', col_wrap=4)

In [None]:
cfqtnn = (d3d.FQTNN * train.layer_mass).sum('z')/1000*86400
cfslinn = (d3d.FSLINN * train.layer_mass).sum('z')*1004

In [None]:
cfqtnn[1].mean('x').plot()

In [None]:
cfslinn[1].mean('x').plot()
plt.ylabel('cFSLINN (W/m2)')

In [None]:
d2d.W500.mean('x')[::24].plot(hue='time')

These changes are accompanied by a strong heating near the equator.
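
The column integrals `cfqtnn` and `cfslinn` computed above combine a mass-weighted vertical sum with a unit conversion. Here is a worked example with made-up numbers, assuming FQTNN is in g/kg/s, FSLINN is in K/s, layer mass is in kg/m^2, and the heat capacity is 1004 J/kg/K. Under those assumptions the results are in mm/day and W/m^2. `column_integrals` is a hypothetical helper.

```python
import numpy as np

CP = 1004.0                # J/kg/K, same constant as in the cell above
SECONDS_PER_DAY = 86400.0


def column_integrals(fqt, fsli, layer_mass):
    """Return (net moistening in mm/day, net heating in W/m^2)."""
    fqt, fsli, layer_mass = map(np.asarray, (fqt, fsli, layer_mass))
    # g/kg/s * kg/m^2 -> g/m^2/s; /1000 -> kg/m^2/s; *86400 -> kg/m^2/day = mm/day
    moistening = (fqt * layer_mass).sum() / 1000 * SECONDS_PER_DAY
    # K/s * kg/m^2 * J/kg/K -> W/m^2
    heating = (fsli * layer_mass).sum() * CP
    return float(moistening), float(heating)


layer_mass = np.array([100.0, 400.0, 500.0])   # kg/m^2, made up
fqt = np.full(3, 1.0 / SECONDS_PER_DAY)        # 1 g/kg/day at every level
fsli = np.full(3, 1.0 / SECONDS_PER_DAY)       # 1 K/day at every level

moistening, heating = column_integrals(fqt, fsli, layer_mass)
print(moistening, heating)
```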

In [None]:
d2d.PW.mean('x')[:,32].plot()

In [None]:
d2d.W500.mean('x')[:,32].plot()

From this, it looks like the increase in PW along the equator is driven by changes in the circulation there that appear relatively quickly.

OK, so I have recreated the results from the GMU talk.

1. Where is my new code wrong?
2. Does the peak in net precip appear in the initial time step? 