## CNN_LSTM Base
- Combines a CNN with an LSTM to capture spatial (CNN) and temporal (LSTM) information.
- Often used for next-frame video prediction in computer vision.

### Colab Link
- Run it with GPU: https://colab.research.google.com/drive/1mNi_gSTWDSto7EWnyh4ZVuQA3VfN56Hm#scrollTo=zVGtIJiqnJLB


### Sources

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9063513

https://keras.io/examples/vision/conv_lstm/

In [2]:
import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import xarray as xr
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Make the project helpers in ../src importable.
sys.path.insert(0, '../src')
from utils import df_to_xarray, read_xarray, plot_image, preprocess_image, create_shifted_frames

#!module load cuda11.0/toolkit cuda11.0/blas cudnn8.0-cuda11.0
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))


# Reading Data

dir_name = "../data/data1"
val_dir_name = "../data/data2"

chl, mld, sss, sst, u10, fg_co2, xco2, icefrac, patm, pco2 = read_xarray(dir_name)

2021-09-15 16:28:43.198802: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-15 16:28:43.883885: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-09-15 16:28:44.333790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:d8:00.0 name: Quadro RTX 8000 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 72 deviceMemorySize: 44.49GiB deviceMemoryBandwidth: 581.23GiB/s
2021-09-15 16:28:44.343456: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-15 16:28:48.163021: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-09-15 16:28:48.173743: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-0

Num GPUs Available:  1


Cannot find the ecCodes library


In [4]:
# Preprocessing the Data
chl_images = preprocess_image(chl.Chl.data)
mld_images = preprocess_image(mld.MLD.data)
sss_images = preprocess_image(sss.SSS.data)
sst_images = preprocess_image(sst.SST.data)
xco2_images = preprocess_image(xco2.XCO2.data, xco2=True)
pco2_images = preprocess_image(pco2.pCO2.data, pco2=True)

# Stack the six variables along axis 1. The shape printed below suggests that
# create_shifted_frames shifts along this axis, so the ConvLSTM "time"
# dimension here appears to index variables rather than time steps.
train_data = np.stack((chl_images, mld_images, sss_images, xco2_images, sst_images, pco2_images), axis=1)
x_train, y_train = create_shifted_frames(train_data)

print(f"Training Dataset Shapes: {x_train.shape}, {y_train.shape}")

Training Dataset Shapes: (421, 5, 180, 360, 1), (421, 5, 180, 360, 1)
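
The `create_shifted_frames` helper is imported from `../src/utils` and is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming it follows the function of the same name in the Keras ConvLSTM example listed under Sources. The name `shifted_frames_sketch` and the toy array are hypothetical, used only for illustration:

```python
import numpy as np


def shifted_frames_sketch(data):
    """Split each sequence into inputs (frames 0..n-2) and targets (frames 1..n-1).

    Assumes axis 1 of `data` is the sequence axis.
    """
    x = data[:, :-1]
    y = data[:, 1:]
    return x, y


# Toy stand-in: 3 samples, 6 frames, a 4x5 grid, 1 channel.
toy = np.arange(3 * 6 * 4 * 5, dtype=np.float32).reshape(3, 6, 4, 5, 1)
x_toy, y_toy = shifted_frames_sketch(toy)
print(x_toy.shape, y_toy.shape)
```

Under this assumption, six entries along axis 1 yield five input frames and five target frames per sample, which would be consistent with the second dimension of the shapes printed above.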


In [7]:
from tensorflow import keras
from tensorflow.keras import layers


inp = layers.Input(shape=(None, *x_train.shape[2:]))

# We will construct 3 `ConvLSTM2D` layers (with batch normalization after the
# first two), followed by a `Conv3D` layer for the spatiotemporal outputs.
x = layers.ConvLSTM2D(
    filters=64,
    kernel_size=(5, 5),
    padding="same",
    return_sequences=True,
    activation="relu",
)(inp)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(
    filters=64,
    kernel_size=(3, 3),
    padding="same",
    return_sequences=True,
    activation="relu",
)(x)
x = layers.BatchNormalization()(x)
x = layers.ConvLSTM2D(
    filters=64,
    kernel_size=(1, 1),
    padding="same",
    return_sequences=True,
    activation="relu",
)(x)
x = layers.Conv3D(
    filters=1, kernel_size=(3, 3, 3), activation="relu", padding="same"
)(x)

# Next, we will build the complete model and compile it.
model = keras.models.Model(inp, x)
model.compile(
    loss="mean_squared_error", optimizer=keras.optimizers.Adam(),
)

2021-09-15 16:31:24.783338: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-15 16:31:24.840032: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-15 16:31:24.852221: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties: 
pciBusID: 0000:d8:00.0 name: Quadro RTX 8000 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 72 deviceMemorySize: 44.49GiB deviceMemoryBandwidth: 581.23GiB/s
2021-09-15 16:31:24.994130: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-09-15 16:31:24.994191: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Succ

In [8]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, None, 180, 360, 1 0         
_________________________________________________________________
conv_lst_m2d (ConvLSTM2D)    (None, None, 180, 360, 64 416256    
_________________________________________________________________
batch_normalization (BatchNo (None, None, 180, 360, 64 256       
_________________________________________________________________
conv_lst_m2d_1 (ConvLSTM2D)  (None, None, 180, 360, 64 295168    
_________________________________________________________________
batch_normalization_1 (Batch (None, None, 180, 360, 64 256       
_________________________________________________________________
conv_lst_m2d_2 (ConvLSTM2D)  (None, None, 180, 360, 64 33024     
_________________________________________________________________
conv3d (Conv3D)              (None, None, 180, 360, 1) 1729  
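
The parameter counts in the summary can be checked by hand. A `ConvLSTM2D` layer with bias has four gates, each convolving over the concatenated input and hidden channels, so its count is 4 · filters · (k_h · k_w · (c_in + filters) + 1). The sketch below recomputes the numbers for the layers defined above; the helper names are hypothetical and not part of Keras:

```python
def convlstm2d_params(in_channels, filters, kernel):
    """Four gates, each a conv over input + hidden channels, plus one bias per filter."""
    kh, kw = kernel
    return 4 * filters * (kh * kw * (in_channels + filters) + 1)


def conv3d_params(in_channels, filters, kernel):
    """Standard convolution: kernel volume times input channels, plus bias."""
    kd, kh, kw = kernel
    return filters * (kd * kh * kw * in_channels + 1)


def batchnorm_params(channels):
    """gamma, beta, moving mean, and moving variance per channel."""
    return 4 * channels


counts = {
    "conv_lst_m2d": convlstm2d_params(1, 64, (5, 5)),
    "batch_normalization": batchnorm_params(64),
    "conv_lst_m2d_1": convlstm2d_params(64, 64, (3, 3)),
    "conv_lst_m2d_2": convlstm2d_params(64, 64, (1, 1)),
    "conv3d": conv3d_params(64, 1, (3, 3, 3)),
}
for name, n in counts.items():
    print(f"{name}: {n}")
```

These values (416256, 256, 295168, 33024, and 1729) agree with the `Param #` column in the summary.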

In [11]:
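model_path = "../models/base_CNN_LSTM.h5"

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=4, verbose=1, mode='min')
checkpoint = tf.keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss', save_best_only=True, mode='min', verbose=0)
callbacks = [early_stopping, checkpoint]
# Define modifiable training hyperparameters.
epochs = 20
batch_size = 4

# Hold out the last 20% of samples for validation so that val_loss (used by
# early stopping and checkpointing) is not measured on the training data.
# The unshuffled 80/20 split is an assumption; adjust as needed.
x_fit, x_val, y_fit, y_val = train_test_split(x_train, y_train, test_size=0.2, shuffle=False)

# Fit the model to the training data.
model.fit(
    x_fit,
    y_fit,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_val, y_val),
    callbacks=callbacks,
)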

Epoch 1/20


2021-09-15 16:38:53.642718: W tensorflow/core/common_runtime/bfc_allocator.cc:433] Allocator (GPU_0_bfc) ran out of memory trying to allocate 63.28MiB (rounded to 66355200)requested by op model/conv_lst_m2d/while/body/_1/model/conv_lst_m2d/while/convolution_4
Current allocation summary follows.
2021-09-15 16:38:53.642776: I tensorflow/core/common_runtime/bfc_allocator.cc:972] BFCAllocator dump for GPU_0_bfc
2021-09-15 16:38:53.642784: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (256): 	Total Chunks: 119, Chunks in use: 119. 29.8KiB allocated for chunks. 29.8KiB in use in bin. 5.8KiB client-requested in use in bin.
2021-09-15 16:38:53.642788: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (512): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-15 16:38:53.642793: I tensorflow/core/common_runtime/bfc_allocator.cc:979] Bin (1024): 	Total Chunks: 13, Chunks in use: 13. 13.2KiB allocated for 

ResourceExhaustedError:  OOM when allocating tensor with shape[4,64,180,360] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node model/conv_lst_m2d/while/body/_1/model/conv_lst_m2d/while/convolution_4}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
 [Op:__inference_train_function_10287]

Function call stack:
train_function
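
The failing allocation in the log can be reproduced with simple arithmetic: a float32 tensor of shape [4, 64, 180, 360] takes batch_size × filters × height × width × 4 bytes. The sketch below (the helper name is hypothetical) shows how the size of one such tensor scales with batch size. It does not estimate total training memory, which also depends on how many of these tensors are kept for backpropagation across gates, time steps, and layers.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; float32 uses 4 bytes per element."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


# Shape from the error message: [batch_size, filters, height, width].
for batch_size in (4, 2, 1):
    size = tensor_bytes((batch_size, 64, 180, 360))
    print(f"batch_size={batch_size}: {size} bytes ({size / 2**20:.2f} MiB)")
```

For `batch_size=4` this gives 66355200 bytes (63.28 MiB), matching the allocator message. Possible mitigations, none of them tested here, include lowering `batch_size`, using fewer filters in the `ConvLSTM2D` layers, or downsampling the 180 × 360 grid before training.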
