
# Imports and setup functions

In [2]:
import os
from matplotlib import pyplot as plt
import numpy as np
import netCDF4 as nc
import ftplib
import math
import pandas as pd
import xarray as xr
import random

In [3]:
def dataDir(x):
    """Return the path to `x` inside the ./data folder of the current working directory."""
    return os.path.join(os.getcwd(), "data", x)

# Get data from server

In [3]:
## Create connection to FTP

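def access_to_server(USERNAME, PASSWORD, PRODUCT_ID, DATASET_ID):
    """Log in to the CMEMS FTP server and move to the dataset directory.

    Tries the near-real-time (NRT) server first and falls back to the
    multi-year (MY) server if login or the directory change fails there.
    """
    HOSTNAME = ['nrt.cmems-du.eu', 'my.cmems-du.eu']
    last_error = None
    for host in HOSTNAME:
        ftp = None
        try:
            ftp = ftplib.FTP(host, USERNAME, PASSWORD)
            ftp.encoding = "utf-8"
            # Move to dataset directory
            ftp.cwd(f'Core/{PRODUCT_ID}/{DATASET_ID}')
            return ftp
        except ftplib.all_errors as err:
            # Remember the error, close any half-open connection, try the next host
            last_error = err
            if ftp is not None:
                ftp.close()
    raise ConnectionError(f'Could not open Core/{PRODUCT_ID}/{DATASET_ID} on {HOSTNAME}') from last_error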


## Browse and download

def download_ftp_tree(ftp, OUTDIR):
    """Recursively mirror the current remote FTP directory into OUTDIR."""
    # Create the local directory if it doesn't exist
    os.makedirs(OUTDIR, exist_ok=True)
    # Save the initial remote directory so we can return to it
    original_cwd = ftp.pwd()
    # List the content of the current remote directory
    ftp_content = ftp.nlst()
    for fc in ftp_content:
        local_path = os.path.join(OUTDIR, fc)
        try:
            # If cwd succeeds, fc is a directory
            ftp.cwd(fc)
        except ftplib.error_perm:
            # fc is not a directory but a file, so download it into OUTDIR
            with open(local_path, 'wb') as file:
                ftp.retrbinary('RETR ' + fc, file.write)
            continue
        # fc is a directory: create a local subfolder with the same name and recurse
        print(fc)
        if not os.path.exists(local_path):
            os.makedirs(local_path)
            print(f'{local_path} is now created')
        download_ftp_tree(ftp, local_path)
        ftp.cwd(original_cwd)
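`download_ftp_tree` decides whether a name is a directory by attempting `ftp.cwd` on it and treating a failed change of directory as a file. If the server supports the `MLSD` command (RFC 3659), which is an assumption not checked against the CMEMS servers, the listing itself reports each entry's type. A minimal sketch; `mirror_with_mlsd` is a hypothetical name:

```python
import os


def mirror_with_mlsd(ftp, outdir):
    """Recursively mirror the current remote directory using MLSD listings.

    Assumes the server supports MLSD and reports the "type" fact.
    """
    os.makedirs(outdir, exist_ok=True)
    # Materialise the listing before issuing further commands on the connection
    entries = list(ftp.mlsd(facts=["type"]))
    for name, facts in entries:
        kind = facts.get("type")
        local_path = os.path.join(outdir, name)
        if kind == "dir":
            ftp.cwd(name)
            try:
                mirror_with_mlsd(ftp, local_path)
            finally:
                # Always return to the parent directory, even on error
                ftp.cwd("..")
        elif kind == "file":
            with open(local_path, "wb") as fh:
                ftp.retrbinary("RETR " + name, fh.write)
        # "cdir"/"pdir" entries (current/parent directory) are skipped
```

Usage would mirror the existing cell, e.g. `mirror_with_mlsd(FTP, OUT_DIR)`. If the server rejects `MLSD`, `ftplib` raises `ftplib.error_perm` and the `cwd`-based approach remains the fallback.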

In [4]:
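import getpass

HOST = "my.cmems-du.eu"  # not used below: access_to_server tries the NRT server first, then MY

USERNAME = "lvilallonga"
# Prompt for the password instead of hard-coding it in the notebook
PASSWORD = getpass.getpass("CMEMS password: ")

PRODUCT_ID = "IBI_ANALYSISFORECAST_PHY_005_001"
DATASET_ID = "cmems_mod_ibi_phy_anfc_0.027deg-3D_P1D-m"
OUT_DIR = dataDir("IBI_ANALYSISFORECAST_PHY_005_001")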

In [5]:
FTP = access_to_server(USERNAME, PASSWORD, PRODUCT_ID, DATASET_ID)

try:
    # Download the content of the dataset
    download_ftp_tree(FTP, OUT_DIR)
    print('Download complete!')
finally:
    # Close the connection even if the download fails
    FTP.close()

2020
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2020 is now created
11
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2020/11 is now created
12
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2020/12 is now created
2021
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021 is now created
01
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021/01 is now created
02
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021/02 is now created
03
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021/03 is now created
04
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021/04 is now created
05
/home/lucia/projects/FORMES/rainfall-pde-ml/data/IBI_ANALYSISFORECAST_PHY_005_001/2021/05 is now created
06
/home/lucia/projects/FORMES

# Process files through command line

The [Climate Data Operators (CDO)](https://code.mpimet.mpg.de/projects/cdo/wiki/Cdo%7Brbpy%7D) software from the Max Planck Institute for Meteorology greatly speeds up the processing steps. CDO runs from the command line and can chain several operators in a single call (e.g., select a variable, then select a lat/lon area). See the script `processCDO.py` for details of how this was done in our case.
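As a rough illustration of operator chaining, the sketch below builds and runs a single CDO call from Python with `subprocess`. The helper names `build_cdo_command` and `run_cdo`, the variable name `thetao` and the coordinate box are assumptions for illustration, not taken from `processCDO.py`. The `cdo` binary only needs to be on the PATH when `run_cdo` is actually called.

```python
import shlex
import subprocess


def build_cdo_command(infile, outfile, variable, lon_min, lon_max, lat_min, lat_max):
    """Build a chained CDO call: select one variable, then crop to a lon/lat box.

    CDO applies chained operators from right to left: -selname runs first
    on infile, its output feeds sellonlatbox, and the result goes to outfile.
    """
    return [
        "cdo",
        f"sellonlatbox,{lon_min},{lon_max},{lat_min},{lat_max}",
        f"-selname,{variable}",
        infile,
        outfile,
    ]


def run_cdo(cmd):
    """Run the command; raises subprocess.CalledProcessError if cdo fails."""
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `build_cdo_command("in.nc", "out.nc", "thetao", -10, 0, 35, 45)` corresponds to the shell command `cdo sellonlatbox,-10,0,35,45 -selname,thetao in.nc out.nc`. For a surface field from the 3-D product, a level selection such as `-sellevidx,1` would presumably also be chained in.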

# Load processed data

## Methods

In [3]:
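def create_dataset(year, month_path):
    """Open every daily netCDF file in `month_path` and key it by date.

    Assumes `month_path` ends in the two-digit month (no trailing slash) and
    that characters [-20:-18] of each file path hold the two-digit day.
    The returned netCDF4.Dataset objects stay open; close them when done.
    """
    dataset = {}
    month = month_path[-2:]

    d_paths = sorted(os.path.join(month_path, d) for d in os.listdir(month_path))

    print("Creating dataset from month: " + month + "...")

    for d in d_paths:
        # Create ID of the form sst_YYYY-MM-DD
        d_id = "sst_" + str(year) + "-" + month + "-" + d[-20:-18]

        # Open the file as a netCDF4 Dataset and store it under its ID
        dataset[d_id] = nc.Dataset(d)

    return dataset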
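`create_dataset` takes the month and day from fixed character positions in the path (`month_path[-2:]` and `d[-20:-18]`), which breaks silently if the directory or file naming changes. A slightly more defensive sketch, assuming file names carry the date as an underscore-delimited `YYYYMMDD` stamp (not verified against the files produced by `processCDO.py`), reads the whole date with a regular expression; `parse_day_id` is a hypothetical helper:

```python
import os
import re


def parse_day_id(path, prefix="sst"):
    """Build an ID like 'sst_2020-11-01' from the first _YYYYMMDD_ stamp in a file name.

    Assumption: the date appears as 8 digits delimited by underscores.
    """
    name = os.path.basename(path)
    match = re.search(r"_(\d{4})(\d{2})(\d{2})_", name)
    if match is None:
        raise ValueError(f"No _YYYYMMDD_ date stamp found in {name!r}")
    year, month, day = match.groups()
    return f"{prefix}_{year}-{month}-{day}"
```

With a hypothetical name such as `IBI_sst_20201101_daily.nc` this returns `sst_2020-11-01`, and it raises an error instead of producing a wrong key when no date is found.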

In [8]:
def slice_img(img, end_shape):
    """Split a 2-D image into non-overlapping tiles of shape `end_shape`.

    Parameters
    ----------
    img : numpy.ndarray or numpy.ma.core.MaskedArray
        2-D image (rows x columns).
    end_shape : tuple
        (M, N) size of each tile. Rows and columns at the bottom and right
        edges that do not fill a whole tile are discarded.

    Returns
    -------
    dict
        Maps "region_<i>_<j>" to the tile in tile-row i, tile-column j.
        The underscore between i and j keeps keys unambiguous when there
        are 10 or more tiles along an axis.
    """
    slice_dataset = {}
    M, N = end_shape[0], end_shape[1]
    m, n = img.shape[0], img.shape[1]

    assert m >= M and n >= N, "Image too small for desired slice."

    n_rows, n_cols = m // M, n // N

    print("Slicing image into " + str(n_rows * n_cols) + " regions...")

    for i in range(n_rows):
        for j in range(n_cols):
            # Column bounds are computed from j, so they restart at 0 on every row
            slice_key = f"region_{i}_{j}"
            slice_dataset[slice_key] = img[i * M:(i + 1) * M, j * N:(j + 1) * N]

    return slice_dataset
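`slice_img` builds each tile with explicit index arithmetic inside nested loops. The same non-overlapping tiling can be done in one vectorised step with a NumPy reshape, in the spirit of the sliding-window article linked below. A minimal sketch written for plain `numpy.ndarray` inputs; `tile_image` is a hypothetical name:

```python
import numpy as np


def tile_image(img, tile_shape):
    """Split a 2-D array into non-overlapping tiles without a Python loop.

    Returns an array of shape (m // M, n // N, M, N), where entry [i, j]
    is the tile in tile-row i, tile-column j. Leftover rows and columns at
    the bottom and right edges are dropped.
    """
    M, N = tile_shape
    m, n = img.shape
    n_rows, n_cols = m // M, n // N
    if n_rows == 0 or n_cols == 0:
        raise ValueError("Image too small for desired tile shape.")
    # Crop to a whole number of tiles, split each axis into
    # (tile index, position within tile), then bring the tile indices to the front
    cropped = img[: n_rows * M, : n_cols * N]
    return cropped.reshape(n_rows, M, n_cols, N).swapaxes(1, 2)
```

Here `tile_image(img, (M, N))[i, j]` equals `img[i*M:(i+1)*M, j*N:(j+1)*N]`. The result is a view where possible, so no data is copied. For overlapping windows, `np.lib.stride_tricks.sliding_window_view(img, (M, N))` is the built-in alternative; stepping it with `[::M, ::N]` gives the same non-overlapping tiles.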

# Exploratory Analysis

In [None]:
# TODO next (6/30): 

#  1. Modify cdo script to output daily files
#  2. Modify cdo script to accept command-line arguments:
#      + lat/lon, variables, output type (time series vs daily files), paths
#      + print options so user can see what to ask for
#  3. Test 1 & 2 on existing SST data, then
#  4. Adapt for ERA-5 data for Ghana
#  5. Delete time series code & files
#  6. Create image showing regions used
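Item 2 of the TODO list (command-line arguments for the CDO script) could be sketched with the standard-library `argparse` module. The option names below are hypothetical; the actual interface of `processCDO.py` is not shown here. `argparse` also generates a `--help` listing automatically, which would cover the "print options" point.

```python
import argparse


def build_parser():
    """Hypothetical command-line interface for processCDO.py (sketch only)."""
    parser = argparse.ArgumentParser(
        description="Subset netCDF files with CDO by variable and lon/lat box."
    )
    parser.add_argument("--lon", nargs=2, type=float, metavar=("MIN", "MAX"),
                        required=True, help="longitude range in degrees east")
    parser.add_argument("--lat", nargs=2, type=float, metavar=("MIN", "MAX"),
                        required=True, help="latitude range in degrees north")
    parser.add_argument("--variables", nargs="+", required=True,
                        help="variable names to keep, e.g. thetao")
    parser.add_argument("--output-type", choices=["daily", "timeseries"],
                        default="daily",
                        help="one file per day, or one merged time series")
    parser.add_argument("--in-dir", required=True, help="folder with raw .nc files")
    parser.add_argument("--out-dir", required=True, help="folder for processed files")
    return parser
```

In a script, `args = build_parser().parse_args()` would then expose `args.lon`, `args.variables`, `args.output_type` and so on. Negative longitudes such as `--lon -10 0` are accepted because the parser defines no options that look like negative numbers.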

https://www.earthinversion.com/utilities/reading-NetCDF4-data-in-python/

https://towardsdatascience.com/fast-and-robust-sliding-window-vectorization-with-numpy-3ad950ed62f5

https://github.com/emited/flow/blob/master/flow/modules/estimators.py#L64

https://github.com/emited/flow/blob/master/flow/datasets/nc.py

Data sources:
1. https://data.marine.copernicus.eu/product/IBI_ANALYSISFORECAST_PHY_005_001/description
2. https://help.marine.copernicus.eu/en/articles/6444313-how-to-fetch-marine-data-from-copernicus-marine-ftp-server-in-python