# Using Additional Arctic Data

< Reference paper >
### Prediction of monthly Arctic sea ice concentrations using satellite and reanalysis data based on convolutional neural networks      
[NSIDC](https://nsidc.org/data/search/#keywords=CLIMATE/sortKeys=score,,desc/facetFilters=%257B%257D/pageNumber=1/itemsPerPage=25)

1. [Daily sea ice concentration data](https://nsidc.org/data/G02135/versions/3)   
The first dataset is the daily sea ice concentration observation dataset, obtained from the National Snow and Ice Data Center (NSIDC), which is derived from the Nimbus-7 Scanning Multichannel Microwave Radiometer (SMMR) and the Defense Meteorological Satellite Program (DMSP) Special Sensor Microwave Imager (SSM/I and SSMIS).

2. [Daily sea surface temperature data](https://psl.noaa.gov/data/gridded/tables/arctic.html)   
The second dataset is the daily sea surface temperature dataset, obtained from the National Oceanic and Atmospheric Administration (NOAA) Optimal Interpolation Sea Surface Temperature (OISST) version 2, which is constructed from Advanced Very High Resolution Radiometer (AVHRR) observation data with 0.25° resolution from 1988 to 2017. 

3. [European Centre for Medium-Range Weather Forecasts (ECMWF)](https://www.ecmwf.int/en/forecasts/datasets/browse-reanalysis-datasets)     
The third dataset is the monthly ECMWF reanalysis (ERA-Interim) dataset, which is used to construct predictors for 1-month sea ice concentration (SIC) prediction, including the surface air temperature, albedo, and v-wind vector with 0.125° resolution.



* Sea surface datasets
https://climatedataguide.ucar.edu/climate-data?page=3

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras

import time
import datetime
import os, glob
from tqdm import tqdm
from pathlib import Path
```


```python
def sorted_list(path):
    # Return the paths matching a glob pattern, sorted by name
    return sorted(glob.glob(path))
```

```python
# Load every .npy file in data/train in sorted filename order
list_train = sorted_list(os.path.join('data/train', '*'))

data = []
for filename in tqdm(list_train):
    data.append(np.load(filename))

# Stack into a single array: (files, rows, cols, channels)
data = np.array(data)
```

100%|██████████| 482/482 [00:02<00:00, 206.65it/s]


```python
data.shape  # (time steps, rows, cols, channels)
```

(482, 448, 304, 5)

```python
data[..., 0].shape  # channel 0: sea ice concentration
```

(482, 448, 304)
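Indexing with `..., 0` drops the channel axis, while `..., [0]` (used when the training and test arrays are built) keeps it as a length-1 axis, which Keras convolutional layers expect. A quick NumPy check on a dummy array with the same axis layout; the array is a stand-in, not the real data:

```python
import numpy as np

# Dummy array with the same axis layout as `data`: (time, rows, cols, channels)
dummy = np.zeros((3, 4, 5, 5), dtype=np.uint8)

sic_2d = dummy[..., 0]    # integer index: the channel axis is dropped
sic_3d = dummy[..., [0]]  # list index: the channel axis is kept with length 1

print(sic_2d.shape)  # (3, 4, 5)
print(sic_3d.shape)  # (3, 4, 5, 1)
```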

```python
# np.save appends the .npy extension, so this writes data/total_data.npy
np.save('data/total_data', data)
```

# OOM Error 
Because the images in `data` are large, a `ResourceExhaustedError` is raised even when the notebook runs on the default Colab runtime.  


OOM stands for "out of memory". Your GPU is running out of memory, so it can't allocate memory for this tensor. There are a few things you can do:

* Decrease the number of units in your Dense layers and the number of filters in your Conv2D layers
* Use a smaller batch_size (or increase steps_per_epoch and validation_steps)
* Use grayscale images (you can use tf.image.rgb_to_grayscale)
* Reduce the number of layers
* Use MaxPooling2D layers after convolutional layers
* Reduce the size of your images (you can use tf.image.resize for that)
* Use smaller float precision for your input, namely np.float32
* If you're using a pre-trained model, freeze the first layers
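A back-of-the-envelope estimate shows why the GPU runs out of memory here. The sketch below computes the size of a single float32 feature map produced by a 256-filter ConvLSTM2D layer (the filter count used by the `FeedBack` model in this notebook) for a batch of 8 full-resolution 448×304 frames, and the same map after 2×2 downsampling. It counts only one output tensor; the layer's hidden and cell states, gate activations, and the per-timestep values kept for backpropagation add considerably more.

```python
import numpy as np

def tensor_bytes(shape, dtype=np.float32):
    """Bytes needed to hold one dense tensor of the given shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize

batch, rows, cols, filters = 8, 448, 304, 256

full = tensor_bytes((batch, rows, cols, filters))
half = tensor_bytes((batch, rows // 2, cols // 2, filters))  # after 2x2 downsampling

print(f"full resolution: {full / 2**30:.2f} GiB")   # full resolution: 1.04 GiB
print(f"2x2 downsampled: {half / 2**30:.2f} GiB")   # 2x2 downsampled: 0.26 GiB
```

Halving both spatial dimensions cuts every feature map to a quarter of its size, which is why resizing and pooling appear in the list above.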



# ConvLSTM2D 

CNN + LSTM : ConvLSTM2D    
https://www.tensorflow.org/tutorials/structured_data/time_series?hl=ko
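ConvLSTM2D takes 5-D input (batch, time, rows, cols, channels) and uses convolutions in place of the LSTM's matrix multiplications, so its hidden state keeps a spatial layout. A minimal shape check on small random tensors (the sizes are arbitrary, chosen only to keep it fast):

```python
import tensorflow as tf

# (batch, time, rows, cols, channels): 2 sequences of 4 frames, 16x16, 1 channel
x = tf.random.normal((2, 4, 16, 16, 1))

# Default: only the last output, shape (batch, rows, cols, filters)
last = tf.keras.layers.ConvLSTM2D(8, 3, padding='same')(x)

# return_sequences=True: one output per time step, (batch, time, rows, cols, filters)
seq = tf.keras.layers.ConvLSTM2D(8, 3, padding='same', return_sequences=True)(x)

# return_state=True: also returns the hidden state h and cell state c
out, h, c = tf.keras.layers.ConvLSTM2D(8, 3, padding='same', return_state=True)(x)

print(last.shape, seq.shape, h.shape, c.shape)
```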

<Advanced: autoregressive model>
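In an autoregressive setup each prediction is fed back in as the next input. The toy sketch below shows only that loop: `autoregressive_rollout` and `step_fn` are hypothetical names, `step_fn` stands in for a trained one-step model (here just the mean of the window), and unlike the `FeedBack` model it re-reads a sliding window of frames instead of carrying an LSTM state forward.

```python
import numpy as np

def autoregressive_rollout(history, step_fn, out_steps):
    """Predict out_steps frames, feeding each prediction back as the newest input."""
    window = list(history)
    preds = []
    for _ in range(out_steps):
        nxt = step_fn(np.stack(window))  # predict the next frame from the current window
        preds.append(nxt)
        window = window[1:] + [nxt]      # slide: drop the oldest frame, append the prediction
    return np.stack(preds)

# Four constant 2x2 "frames" with values 1, 2, 3, 4
frames = [np.full((2, 2), v, dtype=np.float32) for v in (1.0, 2.0, 3.0, 4.0)]

# Toy one-step model: next frame = mean of the window
preds = autoregressive_rollout(frames, lambda w: w.mean(axis=0), out_steps=2)
# first step: mean(1, 2, 3, 4) = 2.5; second step: mean(2, 3, 4, 2.5) = 2.875
```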

# Resizing images
## Pooling 
https://keras.io/ko/layers/pooling/
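Pooling can shrink the frames before they reach the memory-hungry ConvLSTM2D layer. Because the model inputs are 5-D (batch, time, rows, cols, channels), one option, sketched here as an assumption rather than something this notebook's model already does, is `AveragePooling3D` with a pool size of 1 on the time axis: it halves rows and columns while leaving the time steps untouched. A prediction made at reduced size could then be brought back to full resolution, for example with `tf.image.resize`.

```python
import tensorflow as tf

# One batch shaped like the training inputs: (batch, time, rows, cols, channels)
frames = tf.random.uniform((1, 4, 448, 304, 1))

# pool_size=(1, 2, 2): keep every time step, average each 2x2 pixel block
pool = tf.keras.layers.AveragePooling3D(pool_size=(1, 2, 2))
smaller = pool(frames)
print(smaller.shape)  # (1, 4, 224, 152, 1)

# Upsample the last (reduced) frame back to the original grid
restored = tf.image.resize(smaller[:, -1], (448, 304))
print(restored.shape)  # (1, 448, 304, 1)
```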

## Color compression within images
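The concentration maps have a single channel, so there is no RGB color to drop; one reading of "compression" here is reducing the bit depth. Assuming the raw values are integers from 0 to 250 (the scale implied by the `/ 250.` rescaling used later), they fit losslessly in `uint8`, which needs a quarter of the memory of `float32`, and the conversion to float can be deferred to each batch. A sketch with a random stand-in array:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one channel of `data`: (time, rows, cols), integer values 0..250
raw = rng.integers(0, 251, size=(12, 448, 304)).astype(np.float32)

# Lossless here because every value is an integer in 0..255
compact = raw.astype(np.uint8)
print(raw.nbytes // compact.nbytes)  # 4

# Convert back to float only for the batch in use, scaled to [0, 1]
batch = compact[:8].astype(np.float32) / 250.0
```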



```python
class FeedBack(tf.keras.Model):
    """Autoregressive ConvLSTM: warm up on the input frames, then feed each
    predicted frame back in as the next input (same structure as the FeedBack
    model in the TensorFlow time-series tutorial linked above)."""

    def __init__(self, out_steps):
        super().__init__()

        self.out_steps = out_steps
        # return_state=True also returns the hidden and cell states
        self.convlstm = tf.keras.layers.ConvLSTM2D(256, 3, padding='same',
                                                   return_state=True)

        # Batch normalization
        self.bn = tf.keras.layers.BatchNormalization()

        # Activation function: ReLU
        self.relu = tf.keras.layers.ReLU()

        # Output layer: 1x1 Conv2D giving one concentration channel in [0, 1]
        self.conv2d = tf.keras.layers.Conv2D(1, 1, activation='sigmoid')

    # Predict a single time step and return the ConvLSTM's internal state
    def warmup(self, inputs, training=None):
        # inputs.shape => (batch, time, rows, cols, channels)
        # x.shape => (batch, rows, cols, filters)
        x, *state = self.convlstm(inputs, training=training)

        # prediction.shape => (batch, rows, cols, 1)
        x = self.bn(x, training=training)
        x = self.relu(x)
        prediction = self.conv2d(x)

        return prediction, state

    # The simplest way to collect the output predictions is a Python list,
    # stacked with tf.stack after the loop.
    def call(self, inputs, training=None):
        predictions = []

        # Initialize the ConvLSTM state from the input frames
        prediction, state = self.warmup(inputs, training=training)

        # Insert the first prediction
        predictions.append(prediction)

        # Run the rest of the prediction steps
        for _ in range(1, self.out_steps):
            # Feed the last prediction back in as a one-frame sequence
            x = tf.expand_dims(prediction, axis=1)
            # One ConvLSTM step, continuing from the previous state
            x, *state = self.convlstm(x, initial_state=state, training=training)

            # Compute the next prediction
            x = self.bn(x, training=training)
            x = self.relu(x)
            prediction = self.conv2d(x)

            # Add the output
            predictions.append(prediction)

        # predictions.shape => (time, batch, rows, cols, 1)
        predictions = tf.stack(predictions)
        # predictions.shape => (batch, time, rows, cols, 1), the layout of the targets
        predictions = tf.transpose(predictions, [1, 0, 2, 3, 4])

        return predictions
```
    

```python
# Hold out the last 48 time steps for testing; [0] keeps the channel axis
ice_tr_npy = data[:-48, ..., [0]]
ice_ts_npy = data[-48:, ..., [0]]
print(ice_tr_npy.shape, ice_ts_npy.shape)
```

(434, 448, 304, 1) (48, 448, 304, 1)


```python
split_len = int(ice_tr_npy.shape[0] * 0.25)  # validation set: last 25% of the training period
ice_vl_npy = ice_tr_npy[-split_len:]
ice_tr_npy = ice_tr_npy[:-split_len]
print(ice_tr_npy.shape, ice_vl_npy.shape, ice_ts_npy.shape)
```

(326, 448, 304, 1) (108, 448, 304, 1) (48, 448, 304, 1)
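The split is chronological: the last 48 time steps are held out for testing and the last 25% of the remaining ones for validation, so every validation and test frame comes after every training frame. The same index arithmetic on a stand-in array of 482 time-step indices (assuming, as the shapes suggest, one file per time step):

```python
import numpy as np

steps = np.arange(482)          # stand-in for the 482 frames in `data`
test = steps[-48:]
train_val = steps[:-48]

split_len = int(train_val.shape[0] * 0.25)  # int(434 * 0.25) = int(108.5) = 108
val = train_val[-split_len:]
train = train_val[:-split_len]

print(len(train), len(val), len(test))  # 326 108 48
```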


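```python
@tf.function
def rescaling(images):
    # Cast to float32 and scale to [0, 1] (assumes raw concentrations on a 0 to 250 scale)
    return tf.cast(images, dtype=tf.float32) / 250.


@tf.function
def split_window(images):
    # (batch, 5, rows, cols, 1) -> inputs (batch, 4, ...) and target (batch, 1, ...)
    inputs, target = tf.split(images, [4, 1], axis=1)
    return inputs, target
```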

```python
i_tr_ten = tf.constant(ice_tr_npy)
i_vl_ten = tf.constant(ice_vl_npy)
i_ts_ten = tf.constant(ice_ts_npy)

# Training windows: 4 input frames + 1 target frame, frames 24 steps apart
# (stride=24), one window per starting step (shift=1), shuffled
i_tr_ds = (tf.data.Dataset.from_tensor_slices(i_tr_ten)
           .window(4 + 1, shift=1, stride=24, drop_remainder=True)
           .flat_map(lambda x: x.batch(4 + 1))
           .shuffle(buffer_size=1000)
           .batch(8)
           .map(rescaling)
           .map(split_window)
           .prefetch(tf.data.experimental.AUTOTUNE))

# Validation windows: same layout, not shuffled
i_vl_ds = (tf.data.Dataset.from_tensor_slices(i_vl_ten)
           .window(4 + 1, shift=1, stride=24, drop_remainder=True)
           .flat_map(lambda x: x.batch(4 + 1))
           .batch(8)
           .map(rescaling)
           .map(split_window)
           .prefetch(tf.data.experimental.AUTOTUNE))

# Test windows: 4 input frames only, built from the test tensor.
# With stride=24 a 4-frame window spans 73 steps, more than the 48 test
# steps, so this dataset yields no windows unless the stride is reduced.
i_ts_ds = (tf.data.Dataset.from_tensor_slices(i_ts_ten)
           .window(4, shift=1, stride=24, drop_remainder=True)
           .flat_map(lambda x: x.batch(4))
           .batch(8)
           .map(rescaling)
           .prefetch(tf.data.experimental.AUTOTUNE))
```
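`window(size, shift, stride)` collects `size` elements spaced `stride` apart and starts a new window every `shift` elements; with `drop_remainder=True`, incomplete windows at the end are discarded. With `shift=1`, a series of n elements therefore yields n − (size − 1) · stride windows when that number is positive, and none otherwise. A small check on `range(10)`, plus the same count for the split sizes printed above (`window_starts` is a hypothetical helper):

```python
import tensorflow as tf

def window_starts(n, size, stride):
    """Number of complete windows for shift=1 (0 if the series is too short)."""
    return max(0, n - (size - 1) * stride)

ds = (tf.data.Dataset.range(10)
      .window(3, shift=1, stride=2, drop_remainder=True)
      .flat_map(lambda w: w.batch(3)))
windows = [w.tolist() for w in ds.as_numpy_iterator()]

print(windows)
# [[0, 2, 4], [1, 3, 5], [2, 4, 6], [3, 5, 7], [4, 6, 8], [5, 7, 9]]

# Train (326, size 5), validation (108, size 5), test (48, size 4), all with stride 24
print(window_starts(326, 5, 24), window_starts(108, 5, 24), window_starts(48, 4, 24))
# 230 12 0
```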

```python
print(f'{i_tr_ds.element_spec} \n',
      f'{i_vl_ds.element_spec} \n',
      f'{i_ts_ds.element_spec}')
```

(TensorSpec(shape=(None, 4, 448, 304, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None, 1, 448, 304, 1), dtype=tf.float32, name=None)) 
 (TensorSpec(shape=(None, 4, 448, 304, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None, 1, 448, 304, 1), dtype=tf.float32, name=None)) 
 TensorSpec(shape=(None, None, 448, 304, 1), dtype=tf.float32, name=None)


```python
for inp, tag in i_tr_ds.take(1):
    print(f'input shape : {inp.shape}')
    print(f'label shape : {tag.shape}')
```

input shape : (8, 4, 448, 304, 1)
label shape : (8, 1, 448, 304, 1)


```python
@tf.function
def mae_score(y_true, y_pred):
    # Mean absolute error over all pixels
    return tf.math.reduce_mean(tf.math.abs(y_true - y_pred))


@tf.function
def f1_score(y_true, y_pred, lower_bound=0.05, upper_bound=0.5,
             threshold=0.15, epsilon=1e-8):
    # Zero out pixels outside [lower_bound, upper_bound]
    y_true = tf.where(y_true > upper_bound, 0., y_true)
    y_true = tf.where(y_true < lower_bound, 0., y_true)
    y_pred = tf.where(y_pred > upper_bound, 0., y_pred)
    y_pred = tf.where(y_pred < lower_bound, 0., y_pred)

    # Binarize: a concentration >= threshold counts as ice
    y_true = tf.where(y_true < threshold, 0., 1.)
    y_pred = tf.where(y_pred < threshold, 0., 1.)

    tp = tf.math.reduce_sum(y_true * y_pred)  # true positives
    precision = tp / (tf.math.reduce_sum(y_pred) + epsilon)
    recall = tp / (tf.math.reduce_sum(y_true) + epsilon)

    return 2 * precision * recall / (precision + recall + epsilon)


@tf.function
def mae_over_f1(y_true, y_pred, epsilon=1e-8):
    # F1 uses hard thresholds and has no gradient, so training is driven
    # by the MAE term (scaled by 1 / F1)
    return tf.math.divide_no_nan(mae_score(y_true, y_pred),
                                 f1_score(y_true, y_pred) + epsilon)
```
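A NumPy version of the same scoring steps is handy for checking the TensorFlow metrics by hand. `f1_reference` is a hypothetical helper that mirrors the steps of `f1_score` (zero out pixels outside [0.05, 0.5], binarize at 0.15, then F1 = 2PR / (P + R)). On the 4-pixel toy example below, the clipped and binarized maps are true = [0, 1, 1, 0] and pred = [1, 0, 1, 1], so TP = 1, precision = 1/3, recall = 1/2 and F1 = 0.4.

```python
import numpy as np

def f1_reference(y_true, y_pred, lower=0.05, upper=0.5, threshold=0.15, eps=1e-8):
    """Hypothetical NumPy mirror of f1_score, for hand-checking."""
    y_true = np.where((y_true > upper) | (y_true < lower), 0.0, y_true)
    y_pred = np.where((y_pred > upper) | (y_pred < lower), 0.0, y_pred)
    t = (y_true >= threshold).astype(float)  # ground-truth "ice" pixels
    p = (y_pred >= threshold).astype(float)  # predicted "ice" pixels
    tp = np.sum(t * p)
    precision = tp / (p.sum() + eps)
    recall = tp / (t.sum() + eps)
    return 2 * precision * recall / (precision + recall + eps)

y_true = np.array([0.1, 0.2, 0.3, 0.6])
y_pred = np.array([0.2, 0.1, 0.4, 0.3])

mae = np.mean(np.abs(y_true - y_pred))  # (0.1 + 0.1 + 0.1 + 0.3) / 4 = 0.15
f1 = f1_reference(y_true, y_pred)       # 0.4
print(round(mae, 4), round(f1, 4), round(mae / f1, 4))  # 0.15 0.4 0.375
```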



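```python
# out_steps should match the number of target frames produced by split_window (1 here)
model = FeedBack(out_steps=1)
```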

```python
model.compile(optimizer='adam',
              loss=mae_over_f1,
              metrics=[mae_score, f1_score])
```

```python
history = model.fit(i_tr_ds, validation_data=i_vl_ds,
                    epochs=10, verbose=2)
```

Epoch 1/10
