# Preparing to Interpolate through Model Outputs on s3
This notebook shows how to prepare model outputs for interpolation when the data will be stored in s3 buckets. The preparation must be performed *before* moving the data to s3 because Kamodo does not support writing files in s3 buckets. The example below uses a sample output from the OpenGGCM model (GM outputs).

Enter the information for the file locations and datasets into the first block below. Select the 'PyHCs3' environment at the top right, then run the entire notebook. __THE FIRST BLOCK IS THE ONLY BLOCK THAT SHOULD BE CHANGED!__ Finally, check that the number of files copied to the s3 bucket matches the expected number of files (see output of block 4 below). 

## Data preparation on efs
The second block prepares the data files for interpolation with Kamodo-ccmc. This can take a while depending on the model output.

In [1]:
# User- and dataset-specific information
efs_dir = '/home/jovyan/efs/raringuette/'  # current location of data
s3_dir = 's3://helio-dh-data/raringuette/'  # desired location of data
model = 'OpenGGCM_GM'  # model string found with MW.Choose_Model()
# sample datasets are 'OpenGGCM_GM' and 'SWMF_GM'
run_name = '/Yihua_Zheng_040122_1/'
# sample datasets are '/Yihua_Zheng_040122_1/' and '/James_Webster_051716_1/'

In [2]:
import kamodo_ccmc.flythrough.model_wrapper as MW

model_dir = model + run_name  # e.g. 'OpenGGCM_GM/Yihua_Zheng_040122_1/'
# This command performs all file conversions and other preparations needed. This sometimes takes a while.
MW.File_Times(model, efs_dir+model_dir)

This unreleased version of SpacePy is not supported by the SpacePy team.
UTC time ranges
------------------------------------------
Start Date: 2015-10-16  Time: 11:30:00
End Date: 2015-10-16  Time: 16:59:00


(datetime.datetime(2015, 10, 16, 11, 30, tzinfo=datetime.timezone.utc),
 datetime.datetime(2015, 10, 16, 16, 59, 0, 916, tzinfo=datetime.timezone.utc))

## Model_list.txt File Adjustment
The model_list.txt file generated by the block above records each file with its local (efs) directory path. Each of these paths needs to be replaced with the corresponding s3 bucket path. The block below performs this substitution and writes the result to a separate model_lists3.txt file.

In [6]:
import os

list_in = efs_dir+model_dir+model+'_list.txt'
list_out = efs_dir+model_dir+model+'_lists3.txt'
with open(list_in) as file_in:
    data_in = file_in.readlines()
with open(list_out, 'w') as file_out:
    file_out.write(data_in[0][:-2])  # copy title line to new file
    for line in data_in[1:]:
        items = line.split()
        # keep the file name, swap the efs directory for the s3 path
        new_file = s3_dir + model_dir + os.path.basename(items[0])
        items_out = ' '.join(items[1:])
        file_out.write('\n'+new_file+' '+items_out)
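
To make the substitution concrete, here is a minimal sketch that applies the same transformation to a single entry. The helper name `to_s3_line` and the sample efs entry are hypothetical: the entry is modeled on the s3 listing printed later in this notebook, assuming the efs version differs only in its directory prefix.

```python
import os

def to_s3_line(line, s3_prefix):
    """Swap the directory of the first field for s3_prefix; keep the rest."""
    items = line.split()
    new_file = s3_prefix + os.path.basename(items[0])
    return ' '.join([new_file] + items[1:])

# Hypothetical efs entry (assumed format: path followed by date/time fields).
efs_line = ('/home/jovyan/efs/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/'
            'Yihua_Zheng_040122_1.3df_2015-10-16_11_30.nc '
            'Date: 2015-10-16 Time: 11:30:00 Date: 2015-10-16 Time: 11:30:00\n')
s3_prefix = 's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/'

s3_line = to_s3_line(efs_line, s3_prefix)
print(s3_line)
```

Only the directory portion of the first field changes; the date and time fields are carried over untouched.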

## Move Files to s3 Bucket
The first block below copies the files to the desired s3 bucket. The model_lists3.txt file is moved first and renamed to model_list.txt in the process. Then the times file and the remaining data files are copied to the s3 bucket. The block after that checks that the list file arrived correctly.

In [3]:
import os
from glob import glob

time_file = efs_dir+model_dir+model+'_times.txt'
s3list_file = efs_dir+model_dir+model+'_lists3.txt'
list_file = efs_dir+model_dir+model+'_list.txt'
data_files = [f for f in sorted(glob(efs_dir+model_dir+'*')) if f not in [time_file, s3list_file, list_file]]
print(f'Copying {len(data_files)+2} files to the s3 bucket. This may take a while....')
print(os.popen('aws s3 mv '+s3list_file+' '+s3_dir+model_dir+model+'_list.txt').read())  # move and rename in the process
print(os.popen('aws s3 cp '+time_file+' '+s3_dir+model_dir).read())  # copy over as is
for i, f in enumerate(data_files):  # copy over the data files. This may take a while.
    print(os.popen('aws s3 cp '+f+' '+s3_dir+model_dir+' --quiet').read())
    print(f'\rFile {i+1} of {len(data_files)} copied to {s3_dir+model_dir+os.path.basename(f)}.', end="")
print('\n')
print(os.popen('aws s3 ls '+s3_dir+model_dir+' --summarize').read())
print('Done. Please compare the number of files in the bucket printed at the bottom with the number printed at the top.')

Copying 331 files to the s3 bucket. This may take a while....
Completed 55.6 KiB/55.6 KiB (437.1 KiB/s) with 1 file(s) remaining
move: OpenGGCM_GM/Yihua_Zheng_040122_1/OpenGGCM_GM_lists3.txt to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/OpenGGCM_GM_list.txt

Completed 3.0 KiB/3.0 KiB (31.7 KiB/s) with 1 file(s) remaining
upload: OpenGGCM_GM/Yihua_Zheng_040122_1/OpenGGCM_GM_times.txt to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/OpenGGCM_GM_times.txt


File 1 of 329 copied to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_30.nc.
File 2 of 329 copied to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_31.nc.
File 3 of 329 copied to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_32.nc.
File 4 of 329 copied to s3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_
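
The comparison of file counts can also be done in code rather than by eye. The sketch below is one possible approach, not part of the original workflow: it assumes the text returned by `aws s3 ls ... --summarize` contains a `Total Objects: N` line, and the helper name `total_objects` and the sample text are made up for illustration.

```python
import re

def total_objects(ls_output):
    """Return the object count from `aws s3 ls --summarize` text, or None."""
    match = re.search(r'Total Objects:\s*(\d+)', ls_output)
    return int(match.group(1)) if match else None

# Made-up sample text; real output lists every object before the summary.
sample_output = """
Total Objects: 331
   Total Size: 0
"""

expected = 329 + 2  # data files plus the list and times files
found = total_objects(sample_output)
print('match' if found == expected else f'expected {expected}, found {found}')
```

In the notebook, the string passed to `total_objects` would be the result of `os.popen('aws s3 ls '+s3_dir+model_dir+' --summarize').read()`, and `expected` would be `len(data_files)+2`.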

In [4]:
# Checking that the list file was moved correctly.
# You should see a list of files with dates and times printed below.
import s3fs
s3 = s3fs.S3FileSystem(anon=False)
file_obj = s3.open(s3_dir+model_dir+model+'_list.txt')
data = file_obj.readlines()
file_obj.close()
data[:10]

[b'OpenGGCM_GM file list start and end dates and times\n',
 b'\n',
 b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_30.nc Date: 2015-10-16 Time: 11:30:00 Date: 2015-10-16 Time: 11:30:00\n',
 b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_31.nc Date: 2015-10-16 Time: 11:31:00 Date: 2015-10-16 Time: 11:31:00\n',
 b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_32.nc Date: 2015-10-16 Time: 11:32:00 Date: 2015-10-16 Time: 11:32:00\n',
 b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_33.nc Date: 2015-10-16 Time: 11:33:00 Date: 2015-10-16 Time: 11:33:00\n',
 b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/Yihua_Zheng_040122_1.3df_2015-10-16_11_34.nc Date: 2015-10-16 Time: 11:34:00 Date: 2015-10-16 Time: 11:34:00\n',
 b's3://helio-dh-data/raringuette
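
The entries come back as bytes because the file was opened in binary mode. As a minimal sketch of how one entry could be turned into usable values, the hypothetical helper below decodes a line and splits it into the file path plus start and end times. It assumes the `path Date: ... Time: ... Date: ... Time: ...` layout seen above and that the times are in UTC, as the `File_Times` output suggests.

```python
from datetime import datetime, timezone

def parse_list_entry(raw_line):
    """Split one list-file entry into (path, start, end)."""
    text = raw_line.decode('utf-8') if isinstance(raw_line, bytes) else raw_line
    items = text.split()
    # Assumed layout: path Date: YYYY-MM-DD Time: HH:MM:SS Date: ... Time: ...
    fmt = '%Y-%m-%d %H:%M:%S'
    start = datetime.strptime(items[2] + ' ' + items[4], fmt)
    end = datetime.strptime(items[6] + ' ' + items[8], fmt)
    # Assumption: times in the list file are UTC.
    return (items[0],
            start.replace(tzinfo=timezone.utc),
            end.replace(tzinfo=timezone.utc))

# First data entry from the output above.
entry = (b's3://helio-dh-data/raringuette/OpenGGCM_GM/Yihua_Zheng_040122_1/'
         b'Yihua_Zheng_040122_1.3df_2015-10-16_11_30.nc '
         b'Date: 2015-10-16 Time: 11:30:00 Date: 2015-10-16 Time: 11:30:00\n')
path, start, end = parse_list_entry(entry)
print(path)
print(start.isoformat(), end.isoformat())
```

The title line and the blank line at the top of the file do not follow this layout, so they would need to be skipped before calling the helper.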

In [5]:
# To remove files copied to the wrong directory, uncomment and run:
# print(os.popen('aws s3 rm '+s3_dir+model_dir+' --recursive').read())