# Make new zipped input data file with modified surface data and COSMOREA

Started 2023-02-28
Script by Eva Lieungh, Hui Tang, Elin C.R. Aas, ChatGPT

This script takes the modified surface data (created with the dataprep_surfacedata notebook) and combines it with COSMOREA forcing data in a new zipped folder for each site, which can serve as input to CTSM or the LSP with some local modifications. The cosmorea_readme.md file explains the local changes.

As explained in [Hui's readme file](https://github.com/huitang-earth/scripts_ctsm_region/tree/main/atm_forcing/cosmo_rea_6km), COSMOREA data are available from https://opendata.dwd.de/climate_environment/REA/COSMO_REA6/ and single-site forcing subsets of the full data set have already been prepared by Hui on Fram and Saga. Elin downloaded a local copy of the forcing data for the VCG sites, which is used here.

In [20]:
# import libraries
import xarray as xr  # NetCDF data handling
import netCDF4  # NetCDF file access (also usable by xarray as a backend)
import matplotlib.pyplot as plt  # Plotting
import time  # Keeping track of runtime
import json  # For reading data dictionaries stored in json format
import pandas as pd  # Tabular data analysis
import datetime as dt  # For workaround with long simulations (beyond year 2262)
import statistics as stats # For mean and other calculations
from pathlib import Path  # For easy path handling
import zipfile # for unzipping
import shutil # easiest whole-directory zipping
import glob # for wildcard * searching in file names
import os  # file and directory operations

In [27]:
# set path to the modified input data made with dataprep_surfacedata.ipynb
modified_surfdat_path = Path("C:/Users/evaler/OneDrive - Universitetet i Oslo/Eva/PHD/FATES_INCLINE/data")

# set path to the local copy of the COSMOREA files provided by Elin
cosmorea_path = str(Path("C:/Users/evaler/OneDrive - Universitetet i Oslo/Eva/PHD/FATES_INCLINE/data/VCG/COSMOREA_VCG"))

# set path for where to store the finished COSMOREA + modified surface data zip files
new_inputdata_path = str(Path("C:/Users/evaler/OneDrive - Universitetet i Oslo/Eva/PHD/FATES_INCLINE/data"))

In [8]:
# define LSP site identities and corresponding names 
siteID = ["ALP1","ALP2","ALP3","ALP4","SUB1","SUB2","SUB3","SUB4","BOR1","BOR2","BOR3","BOR4"]
siteID2 = ["Ulvehaugen","Lavisdalen","Gudmedalen","Skjellingahaugen",
           "Alrust","Hogsete","Rambera","Veskre",
           "Fauske","Vikesland","Arhelleren","Ovstedalen"]

### Extract surfacedata file from the zipped folder

In [26]:
import fnmatch  # wildcard matching of file names inside the zip archives

for site, site_name in zip(siteID, siteID2):
    print("-------------------------------------")
    print("site: ", site, site_name)

    # Specify the zipped site folder, where to extract the file to, and which file name pattern to look for
    zipped_folder_name = str(modified_surfdat_path / f"{site}.zip")
    extracted_surfacedata_path = modified_surfdat_path / "surfacedata" / site
    wildcard_filename = "surfdata*.nc"

    # Open the zip file
    with zipfile.ZipFile(zipped_folder_name, "r") as zip_file:

        # get a list of all the file names in the zip
        file_list = zip_file.namelist()

        # find the FIRST file that matches the wildcard (further matches are ignored, see the NB below)
        matched_file = next((f for f in file_list if fnmatch.fnmatch(f, wildcard_filename)), None)

        # if a matching file was found, extract it to the output folder
        if matched_file:
            zip_file.extract(matched_file, extracted_surfacedata_path)
            print(f"{matched_file} extracted to {extracted_surfacedata_path}")
        else:
            print(f"No matching file found in {zipped_folder_name}")



-------------------------------------
site:  ALP1 Ulvehaugen
surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_ALP1_c221026.nc extracted to C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\surfacedata\ALP1
-------------------------------------
site:  ALP2 Lavisdalen
surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_ALP2_c221026.nc extracted to C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\surfacedata\ALP2
-------------------------------------
site:  ALP3 Gudmedalen
surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_ALP3_c221027.nc extracted to C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\surfacedata\ALP3
-------------------------------------
site:  ALP4 Skjellingahaugen
surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_ALP4_c221027.nc extracted to C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\surfacedata\ALP4
-------------------------------------
site:  SUB1 Alrust

***NB! Right now this code extracts only the first file matching `surfdata*.nc`. ALP1 contains two surface data files, so the wrong one might be extracted for that site. Check and fix the code!***
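One way to address the NB above is to collect all matching members instead of only the first, and to refuse to guess when there is more than one. This is a minimal sketch, assuming the surface data file sits in the root of each `<site>.zip` (as the extraction output suggests); the helper names and the `prefer` argument (a substring such as the creation stamp `c221027` from the file name) are hypothetical selection aids, not part of the LSP.

```python
import fnmatch
import zipfile
from pathlib import Path


def find_surfdata_members(zip_path, pattern="surfdata*.nc"):
    """Return ALL member names in the zip archive that match the wildcard pattern."""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        return [name for name in zip_file.namelist() if fnmatch.fnmatch(name, pattern)]


def extract_single_surfdata(zip_path, out_dir, pattern="surfdata*.nc", prefer=None):
    """Extract exactly one surface data file from zip_path into out_dir.

    prefer: optional substring (hypothetical selection rule) used to narrow
    several matches down to one. Raises ValueError instead of silently using
    the first match when the choice is ambiguous or nothing matches.
    """
    matches = find_surfdata_members(zip_path, pattern)
    if prefer is not None:
        matches = [name for name in matches if prefer in name]
    if len(matches) != 1:
        raise ValueError(f"{zip_path}: expected 1 matching file, found {len(matches)}: {matches}")
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        # ZipFile.extract returns the path of the extracted file
        return Path(zip_file.extract(matches[0], out_dir))
```

Used in the extraction loop, `extract_single_surfdata(modified_surfdat_path / f"{site}.zip", modified_surfdat_path / "surfacedata" / site)` would stop at ALP1 and list both candidate file names, so the correct one can then be chosen explicitly with `prefer`.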

### Combine COSMOREA forcing and surfacedata files in new zipped archive

In [28]:
print("surfacedata is here: ", modified_surfdat_path / "surfacedata")
print("COSMOREA data is here: ", cosmorea_path)

surfacedata is here:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\surfacedata
COSMOREA data is here:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\VCG\COSMOREA_VCG


The COSMOREA folders have a different file structure from what the LSP expects. In the LSP input files, e.g. ALP2.zip, the root folder contains a `datmdata` folder with the GSWP3 forcing .nc files, a `user_mods` folder with the files `shell_commands`, `user_nl_clm` and `user_nl_datm_streams`, as well as a domain file and a surface data file.
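The target layout described above can also be checked programmatically for each site folder. This is a minimal sketch, assuming the domain file name starts with `domain` and ends with `.nc` (the surface data files start with `surfdata`, as the extraction output shows); `missing_lsp_items` is a hypothetical helper name.

```python
from pathlib import Path

# Layout of an LSP input folder as described above.
# ASSUMPTION: the domain file name starts with "domain" and ends with ".nc".
USER_MODS_FILES = ["shell_commands", "user_nl_clm", "user_nl_datm_streams"]
ROOT_FILE_PREFIXES = ["domain", "surfdata"]


def missing_lsp_items(site_dir):
    """Return the items in site_dir that are missing compared with the LSP layout."""
    site_dir = Path(site_dir)
    missing = []

    # datmdata/ must exist and contain at least one forcing .nc file
    datm = site_dir / "datmdata"
    if not datm.is_dir():
        missing.append("datmdata/")
    elif not any(datm.glob("*.nc")):
        missing.append("datmdata/*.nc")

    # user_mods/ must contain the three namelist/shell files
    for name in USER_MODS_FILES:
        if not (site_dir / "user_mods" / name).is_file():
            missing.append(f"user_mods/{name}")

    # the domain and surface data files sit directly in the root folder
    root_files = [p.name for p in site_dir.iterdir() if p.is_file()]
    for prefix in ROOT_FILE_PREFIXES:
        if not any(name.startswith(prefix) and name.endswith(".nc") for name in root_files):
            missing.append(f"{prefix}*.nc")
    return missing
```

For example, `missing_lsp_items(os.path.join(cosmorea_path, "ALP1"))` returns an empty list once the folder matches the layout, and otherwise lists what is still missing.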

Make these folders and move files to get the same structure:

In [3]:
os.getcwd()

'c:\\Users\\evaler\\OneDrive - Universitetet i Oslo\\Eva\\PHD\\FATES_INCLINE\\src'

In [11]:
os.path.exists(cosmorea_path + "/" + "ALP1/datmdata")

False

In [16]:
for site in siteID:
    print("-----", site, "-----")
    site_dir = os.path.join(cosmorea_path, site)
    datm_dir = os.path.join(site_dir, "datmdata")

    # create the datmdata and user_mods directories (nothing happens if they already exist)
    os.makedirs(datm_dir, exist_ok=True)
    os.makedirs(os.path.join(site_dir, "user_mods"), exist_ok=True)

    # move the COSMOREA forcing files (clm1pt_*) into datmdata
    for filename in os.listdir(site_dir):
        if filename.startswith("clm1pt_"):
            shutil.move(os.path.join(site_dir, filename), os.path.join(datm_dir, filename))
    print("forcing data moved to datmdata")


----- ALP1 -----
forcing data moved to datmdata
----- ALP2 -----
forcing data moved to datmdata
----- ALP3 -----
forcing data moved to datmdata
----- ALP4 -----
forcing data moved to datmdata
----- SUB1 -----
forcing data moved to datmdata
----- SUB2 -----
forcing data moved to datmdata
----- SUB3 -----
forcing data moved to datmdata
----- SUB4 -----
forcing data moved to datmdata
----- BOR1 -----
forcing data moved to datmdata
----- BOR2 -----
forcing data moved to datmdata
----- BOR3 -----
forcing data moved to datmdata
----- BOR4 -----
forcing data moved to datmdata


The `user_mods` directories are still empty. Copy over the files from the default GSWP3 input data:

In [22]:
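# the default GSWP3 input data are the same <site>.zip archives the surface data files were extracted from
GSWP3_default_path = modified_surfdat_path

for i in range(0,12):
    print("-----", siteID[i], "-----")
    with zipfile.ZipFile(GSWP3_default_path / f"{siteID[i]}.zip", "r") as zip_ref:
        # extract (i.e. copy) only the user_mods folder and its files into the COSMOREA site folder
        for file in zip_ref.namelist():
            if file.startswith("user_mods/"):
                zip_ref.extract(file, os.path.join(cosmorea_path, siteID[i]))
    print("user_mods files successfully copied")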

----- ALP1 -----
----- ALP2 -----
----- ALP3 -----
----- ALP4 -----
----- SUB1 -----
----- SUB2 -----
----- SUB3 -----
----- SUB4 -----
----- BOR1 -----
----- BOR2 -----
----- BOR3 -----
----- BOR4 -----


Open a few of the folders and compare with the default GSWP3 data to check that it worked and there are no duplicates.
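Part of this check can be scripted. A minimal sketch, assuming the default `<site>.zip` archives have no enclosing top-level folder (as the `user_mods/` prefix used above suggests). It compares only the top-level entries, since the forcing files inside `datmdata` are expected to differ, and flags file names that occur more than once anywhere in the COSMOREA site folder. Both function names are hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import Path


def top_level(member_name):
    """Map a zip member name to its top-level entry ('datmdata/x.nc' -> 'datmdata/')."""
    head, sep, _ = member_name.partition("/")
    return head + sep


def compare_with_default(default_zip, site_dir):
    """Compare a default GSWP3 <site>.zip with a COSMOREA site folder.

    Returns (only_in_default, only_in_cosmorea, duplicates): the first two are sets
    of top-level entries (folders end with '/'); duplicates lists file names that
    occur more than once anywhere below site_dir.
    """
    with zipfile.ZipFile(default_zip, "r") as zip_file:
        default_top = {top_level(name) for name in zip_file.namelist()}

    site_dir = Path(site_dir)
    cosmo_top = {p.name + "/" if p.is_dir() else p.name for p in site_dir.iterdir()}

    # count file names in all subfolders to catch e.g. a surfdata file copied twice
    counts = Counter(p.name for p in site_dir.rglob("*") if p.is_file())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return default_top - cosmo_top, cosmo_top - default_top, duplicates
```

Calling it as `compare_with_default(GSWP3_default_path / "ALP1.zip", os.path.join(cosmorea_path, "ALP1"))` would, for instance, reveal a domain or surface data file that is present in the default archive but not yet in the COSMOREA folder.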

Then, manually:

1. check that there are no surface data files in the COSMOREA folders already
2. copy or move the modified surface data files into the COSMOREA folders
3. make the zip archives:
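Steps 1 and 2 could also be scripted. A minimal sketch with a hypothetical helper `add_surfdata`: it copies rather than moves, so the extracted surface data file stays where it is, and it stops instead of overwriting when a `surfdata*.nc` file is already present in the folder root.

```python
import shutil
from pathlib import Path


def add_surfdata(site_dir, surfdata_file):
    """Copy a modified surface data file into the root of a COSMOREA site folder.

    Step 1: refuse if the folder root already holds a surfdata*.nc file.
    Step 2: otherwise copy the file (keeping its metadata) and return the new path.
    """
    site_dir = Path(site_dir)
    existing = sorted(p.name for p in site_dir.glob("surfdata*.nc"))
    if existing:
        raise FileExistsError(f"{site_dir} already contains surface data: {existing}")
    # shutil.copy2 returns the path of the new copy
    return Path(shutil.copy2(surfdata_file, site_dir))
```

For a given site, `surfdata_file` would be the single `surfdata*.nc` file extracted to `modified_surfdat_path / "surfacedata" / site` earlier, and `site_dir` would be `os.path.join(cosmorea_path, site)`.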

In [26]:
print("path where new files are stored: ", new_inputdata_path)
print("path to files that will be zipped: ", cosmorea_path)
print("-----------------------------------")
for i in range(0,1):  # test run on the first site (ALP1) only
    print("making zipped folder: ", siteID[i] + "_cosmorea.zip")
    folder_to_zip = str(cosmorea_path + "/" + siteID[i])
    save_zipped_data_here = str(new_inputdata_path + "/" + siteID[i] + "_cosmorea")
    shutil.make_archive(save_zipped_data_here,
                        'zip', 
                        folder_to_zip)
    print("done")

path where new files are stored:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data_processed
path to files that will be zipped:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\VCG\COSMOREA_VCG
-----------------------------------
making zipped folder:  ALP1_cosmorea.zip
done


In [29]:
print("path where new files are stored: ", new_inputdata_path)
print("path to files that will be zipped: ", cosmorea_path)
print("-----------------------------------")
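for i in range(0,12):
    print("making zipped folder: ", siteID[i] + "_cosmorea.zip")
    folder_to_zip = os.path.join(cosmorea_path, siteID[i])
    # make_archive adds the ".zip" extension itself, so only the base name is given here
    save_zipped_data_here = os.path.join(new_inputdata_path, siteID[i] + "_cosmorea")
    # root_dir: the contents of the site folder end up directly in the archive root
    shutil.make_archive(save_zipped_data_here, "zip", root_dir=folder_to_zip)
    print("done")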

path where new files are stored:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data
path to files that will be zipped:  C:\Users\evaler\OneDrive - Universitetet i Oslo\Eva\PHD\FATES_INCLINE\data\VCG\COSMOREA_VCG
-----------------------------------
making zipped folder:  ALP1_cosmorea.zip
done
making zipped folder:  ALP2_cosmorea.zip
done
making zipped folder:  ALP3_cosmorea.zip
done
making zipped folder:  ALP4_cosmorea.zip
done
making zipped folder:  SUB1_cosmorea.zip
done
making zipped folder:  SUB2_cosmorea.zip
done
making zipped folder:  SUB3_cosmorea.zip
done
making zipped folder:  SUB4_cosmorea.zip
done
making zipped folder:  BOR1_cosmorea.zip
done
making zipped folder:  BOR2_cosmorea.zip
done
making zipped folder:  BOR3_cosmorea.zip
done
making zipped folder:  BOR4_cosmorea.zip
done
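Before uploading, each new archive can be checked for corrupt members and for the expected layout. This is a sketch under the same assumptions about the LSP layout as described earlier (forcing `.nc` files in `datmdata/`, three files in `user_mods/`, one surface data file in the root); the domain file is not checked here, and `check_archive` is a hypothetical name.

```python
import zipfile

USER_MODS_FILES = ("shell_commands", "user_nl_clm", "user_nl_datm_streams")


def check_archive(zip_path):
    """Return a list of problems found in a finished <site>_cosmorea.zip (empty list = OK)."""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        first_bad = zip_file.testzip()  # name of the first corrupt member, or None
        names = zip_file.namelist()

    problems = []
    if first_bad is not None:
        problems.append(f"corrupt member: {first_bad}")
    if not any(n.startswith("datmdata/") and n.endswith(".nc") for n in names):
        problems.append("no forcing .nc files in datmdata/")
    for f in USER_MODS_FILES:
        if f"user_mods/{f}" not in names:
            problems.append(f"missing user_mods/{f}")
    # exactly one surface data file directly in the archive root
    n_surfdata = sum(1 for n in names if "/" not in n and n.startswith("surfdata") and n.endswith(".nc"))
    if n_surfdata != 1:
        problems.append(f"expected 1 surfdata*.nc in the root, found {n_surfdata}")
    return problems
```

Looping over the sites with `check_archive(os.path.join(new_inputdata_path, site + "_cosmorea.zip"))` and printing any non-empty result is one way to catch a missing or duplicated file before the archives are shared.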


Finally, push the new zip files to GitHub or some other storage location accessible by URL.