# Saving the data in a TXT file with the correct structure for GAMCR

To use the GAMCR package, you need one folder per site with the following structure:

- the folder is named after the site, i.e. `{site}` (for example `base`)
- the data file `data_{site}.txt` is saved in this folder
- GAMCR saves in this folder the models that you train for that site
- GAMCR creates and uses two subfolders in this folder:
    * `data`, where the preprocessed data are saved when a `save_batch`-type method is called
    * `results`, where statistics on the results of a trained model are saved when the `compute_statistics` method is called
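The layout above can be summarised as a handful of paths per site. The sketch below is a minimal illustration using `pathlib`, assuming all site folders live under a common root directory; `site_layout` and `make_site_folder` are hypothetical helpers, not part of GAMCR. Only the site folder itself is created, since the `data` and `results` subfolders are created by GAMCR when needed.

```python
from pathlib import Path


def site_layout(root, site):
    """Return the paths that make up the expected folder layout for one site.

    Hypothetical helper (not part of GAMCR): it only computes paths.
    """
    site_dir = Path(root) / site
    return {
        "site_dir": site_dir,                        # one folder per site, named after it
        "data_file": site_dir / f"data_{site}.txt",  # input time series read by GAMCR
        "data_dir": site_dir / "data",               # preprocessed data (save_batch-type methods)
        "results_dir": site_dir / "results",         # statistics (compute_statistics)
    }


def make_site_folder(root, site):
    """Create the site folder if needed and return its layout.

    The `data` and `results` subfolders are deliberately not created here.
    """
    layout = site_layout(root, site)
    layout["site_dir"].mkdir(parents=True, exist_ok=True)
    return layout
```

For instance, `make_site_folder('.', 'base')["data_file"]` points to `base/data_base.txt`, which is where the notebook below saves the data for the `base` site.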
    
This notebook creates the folder `{site}` and saves the file `data_{site}.txt` in it. This text file must contain the following columns:
- `q`: streamflow time series
- `p`: precipitation time series
- `timeyear`: fractional year (e.g. 2022.5 for 2 July 2022)
- `date`: timestamp of each time step (Python `datetime` object)
- `pet`: potential evapotranspiration time series
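The column requirements and the `timeyear` convention can be checked with a short, dependency-free sketch. `REQUIRED_COLUMNS`, `missing_columns` and `fractional_year_to_datetime_sketch` are hypothetical names used only for illustration; the notebook itself relies on `GAMCR.fractional_year_to_datetime`. The conversion assumes that the fractional part is the elapsed share of the actual calendar-year length (365 or 366 days). This is consistent with the hourly data shown below (1981.000114 is about one hour into 1981), but GAMCR's own handling of leap years may differ.

```python
from datetime import datetime

# Columns listed above as required in data_{site}.txt
REQUIRED_COLUMNS = ("q", "p", "timeyear", "date", "pet")


def missing_columns(columns):
    """Return the required columns that are absent from `columns` (e.g. df.columns)."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]


def fractional_year_to_datetime_sketch(t):
    """Convert a fractional year such as 2022.5 to a datetime.

    Assumed convention (illustrative re-implementation, not GAMCR's code):
    the fractional part is the elapsed share of that calendar year's length.
    """
    year = int(t)
    start = datetime(year, 1, 1)
    year_length = datetime(year + 1, 1, 1) - start  # timedelta of 365 or 366 days
    return start + (t - year) * year_length
```

With this convention, `fractional_year_to_datetime_sketch(2022.5)` falls on 2 July 2022 at noon, and `missing_columns(df.columns)` returns an empty list when a DataFrame `df` has every required column.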

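```python
import os
import sys

import numpy as np
import pandas as pd

# Add the directory two levels up to the import path so that the local GAMCR package can be imported
sys.path.append('../../')
import GAMCR

all_sites = ['damped', 'base', 'flashy']

# Load the raw simulated data of the `base` site to inspect its columns
df = pd.read_csv('../data/simulated_data/hourly_data_withPET/base/data_base.txt', sep=',')
df
```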

| Unnamed: 0 | timeyear | p | pet | et | R | q | qOF | qSS | qGW | date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1981.000114 | 0.0 | 0.040533 | 0.024928 | 0.031154 | 0.105961 | 0.0 | 0.060852 | 0.031902 | 1981-01-01 01:00:00 |
| 1 | 1981.000228 | 0.0 | 0.040533 | 0.024899 | 0.029737 | 0.105504 | 0.0 | 0.058132 | 0.031902 | 1981-01-01 02:00:00 |
| 2 | 1981.000342 | 0.0 | 0.040533 | 0.024871 | 0.028430 | 0.104980 | 0.0 | 0.055619 | 0.031901 | 1981-01-01 03:00:00 |
| 3 | 1981.000457 | 0.0 | 0.040533 | 0.024844 | 0.027220 | 0.104398 | 0.0 | 0.053291 | 0.031900 | 1981-01-01 04:00:00 |
| 4 | 1981.000571 | 0.0 | 0.040533 | 0.024818 | 0.026098 | 0.103765 | 0.0 | 0.051127 | 0.031899 | 1981-01-01 05:00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 376891 | 2023.996804 | 0.0 | 0.024443 | 0.013286 | 0.000219 | 0.038099 | 0.0 | 0.000447 | 0.037143 | 2023-12-30 20:00:00 |
| 376892 | 2023.996918 | 0.0 | 0.024443 | 0.013284 | 0.000218 | 0.038087 | 0.0 | 0.000444 | 0.037135 | 2023-12-30 21:00:00 |
| 376893 | 2023.997032 | 0.0 | 0.024443 | 0.013282 | 0.000217 | 0.038076 | 0.0 | 0.000441 | 0.037127 | 2023-12-30 22:00:00 |
| 376894 | 2023.997146 | 0.0 | 0.024443 | 0.013279 | 0.000215 | 0.038064 | 0.0 | 0.000439 | 0.037119 | 2023-12-30 23:00:00 |


```python
import os
import shutil

src_root = '../data/simulated_data/hourly_data_withPET'

for site in all_sites:
    src_dir = os.path.join(src_root, site)
    df = pd.read_csv(os.path.join(src_dir, 'data_{0}.txt'.format(site)), sep=',')
    # GAMCR expects the streamflow column to be named `q`
    df = df.rename(columns={"discharge": "q"})

    # Compute the `date` column from the fractional year (overwrites any existing `date` column)
    df["date"] = [GAMCR.fractional_year_to_datetime(el) for el in df['timeyear']]
    # Set precipitation values <= 0.1 to zero
    df.loc[df['p'] <= 0.1, 'p'] = 0
    # Replace missing values with zero
    df = df.fillna(0)
    df.reset_index(inplace=True, drop=True)

    # Create the site folder and save data_{site}.txt in it
    directory = './{0}/'.format(site)
    os.makedirs(directory, exist_ok=True)
    df.to_csv(os.path.join(directory, 'data_{0}.txt'.format(site)), index=False)

    # Copy transfer.npy and lst_transfer.npy into the `data` subfolder when they exist
    data_directory = os.path.join(directory, 'data')
    os.makedirs(data_directory, exist_ok=True)
    for fname in ('transfer.npy', 'lst_transfer.npy'):
        src = os.path.join(src_dir, fname)
        if os.path.exists(src):
            shutil.copyfile(src, os.path.join(data_directory, fname))
```

```python
# Processed data of the last site in the loop ('flashy')
df
```

| Unnamed: 0 | timeyear | p | pet | et | R | q | date |
|---|---|---|---|---|---|---|---|
| 0 | 1981.000114 | 0.0 | 0.040533 | 0.029746 | 0.033681 | 0.105081 | 1981-01-01 01:00:00 |
| 1 | 1981.000228 | 0.0 | 0.040533 | 0.029734 | 0.033001 | 0.103832 | 1981-01-01 02:00:00 |
| 2 | 1981.000342 | 0.0 | 0.040533 | 0.029722 | 0.032344 | 0.102598 | 1981-01-01 03:00:00 |
| 3 | 1981.000457 | 0.0 | 0.040533 | 0.029709 | 0.031711 | 0.101380 | 1981-01-01 04:00:00 |
| 4 | 1981.000571 | 0.0 | 0.040533 | 0.029697 | 0.031099 | 0.100180 | 1981-01-01 05:00:00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 376891 | 2023.996804 | 0.0 | 0.024443 | 0.016794 | 0.001676 | 0.017026 | 2023-12-30 20:00:00 |
| 376892 | 2023.996918 | 0.0 | 0.024443 | 0.016792 | 0.001670 | 0.016992 | 2023-12-30 21:00:00 |
| 376893 | 2023.997032 | 0.0 | 0.024443 | 0.016791 | 0.001665 | 0.016959 | 2023-12-30 22:00:00 |
| 376894 | 2023.997146 | 0.0 | 0.024443 | 0.016790 | 0.001659 | 0.016925 | 2023-12-30 23:00:00 |
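Because `to_csv` writes the `date` column as text, a quick way to check a saved site file is to read back its header and first data row. The sketch below uses only the standard library; `check_site_rows` and `check_site_file` are hypothetical helpers, and the ISO-like timestamp format (`1981-01-01 01:00:00`) is assumed from the output above. With pandas, `pd.read_csv(path, parse_dates=["date"])` turns the column back into datetimes.

```python
import csv
from datetime import datetime

# Columns required by GAMCR in data_{site}.txt
REQUIRED_COLUMNS = ("q", "p", "timeyear", "date", "pet")


def check_site_rows(lines):
    """Check the CSV lines of a data_{site}.txt file.

    Hypothetical helper: returns (missing_columns, first_date), where first_date
    is the `date` value of the first data row parsed back into a datetime, or
    None if the `date` column or the data rows are missing.
    """
    reader = csv.DictReader(lines)
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    first_row = next(reader, None)
    first_date = None
    if "date" in fieldnames and first_row is not None:
        # Assumed text format written by to_csv, e.g. '1981-01-01 01:00:00'
        first_date = datetime.fromisoformat(first_row["date"])
    return missing, first_date


def check_site_file(path):
    """Apply check_site_rows to a saved file, e.g. './base/data_base.txt'."""
    with open(path, newline="") as f:
        return check_site_rows(f)
```

For a correctly saved file, `check_site_file('./base/data_base.txt')` should report no missing columns and a first timestamp in January 1981.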
