# Rsync GCS glider regions files

The code for processing shadowgraph images (Cutter's code, adapted from Ohman et al. methods, in us-amlr/amlr-shadowgraph) wrote out a regions folder under each directory folder. The purpose of this notebook is to copy these files to the new proc imagery bucket, with a new folder structure, so that they can be imported into VIAME-Web-AMLR.

Rsync files in GCS from bucket to bucket. Specifically, rsync the regions directories to the new amlr-gliders-imagery-proc-dev bucket.

NOTE: the output directories are copied using a different strategy, over in 'file_copy_...'

In [None]:
from filemgmt.utils import rsync_imagery_proc_regions, create_pre
from filemgmt.gcs import list_blobs_with_prefix
from google.cloud import storage

## Set variable names
User tasks: update variable names as necessary in this block.

In [None]:
bucket_source_name      = "amlr-imagery-proc-dev"
bucket_destination_name = "amlr-gliders-imagery-proc-dev"

glider_deployment = "amlr08-20220513"
pre_source, pre_destination = create_pre(glider_deployment)

file_prefix    = f"{pre_source}/{glider_deployment}/shadowgraph/images/Dir000"
file_substr    = "/regions/"

## Generate source list

Generate the list of blobs under the regions folders. The rsync source paths are built from this list.

NOTE: It would likely be more efficient, and just as robust, to generate the source strings directly with a for loop.
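
The for-loop alternative mentioned in the note could look roughly like the sketch below. It is not the notebook's method: the helper `build_region_sources`, the four-digit `Dir0000`-style naming, and the directory count are assumptions made for illustration and should be checked against the actual bucket contents.

```python
def build_region_sources(bucket_name, images_prefix, n_dirs, dir_template="Dir{:04d}"):
    """Build gs:// paths to regions folders without listing the bucket.

    Hypothetical helper: assumes the directories are numbered consecutively
    from 0 and named per dir_template, which the notebook does not confirm.
    """
    sources = []
    for idx in range(n_dirs):
        dir_name = dir_template.format(idx)
        sources.append(f"gs://{bucket_name}/{images_prefix}/{dir_name}/regions")
    return sources


# Example with placeholder values (not real deployment paths)
example_sources = build_region_sources(
    "example-bucket", "some/prefix/shadowgraph/images", n_dirs=3
)
for src in example_sources:
    print(src)
```

Unlike listing the bucket, this approach would not detect missing or extra directories, so it trades a sanity check for speed.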

In [None]:
file_list_orig = list_blobs_with_prefix(
    bucket_source_name, file_prefix, file_substr=file_substr)

In [None]:
print(f"There are {len(file_list_orig)} files with substring '{file_substr}' "
      f"with the prefix '{bucket_source_name}/{file_prefix}'")
for i in file_list_orig[0:5]:
    print(i)

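Create a new file list by filtering for the directory paths only. The cell below keeps blob names of at most 70 characters, a length cutoff intended to select the `regions/` directory entries rather than the files inside them; the cutoff depends on the path length, so it may need updating if the prefix changes. Then create the list of rsync source strings by dropping the final character (the trailing slash) and prepending `gs://` and the source bucket name.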

In [None]:
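# Keep only the short names (the regions directory entries themselves) and
# drop the last character, presumably the trailing "/", before adding gs://.
# NOTE: the 70-character cutoff depends on the path length for this deployment.
rsync_list_source = [f"gs://{bucket_source_name}/{i[:-1]}" for i in file_list_orig if len(i) <= 70]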
print(f"There are {len(rsync_list_source)} directory paths")
print("The rsync source strings are as follows:")
for i in rsync_list_source:
    print(i)
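
As an alternative to the length cutoff, the directory paths could be derived from the position of the `/regions/` substring in each blob name. This is a minimal sketch under that assumption, not the notebook's method; `regions_dirs_from_blobs` and the example blob names are hypothetical.

```python
def regions_dirs_from_blobs(blob_names, marker="/regions/"):
    """Return the unique regions directory paths (no trailing slash), sorted."""
    dirs = set()
    for name in blob_names:
        idx = name.find(marker)
        if idx != -1:
            # Keep everything up to and including 'regions'; drop the final slash
            dirs.add(name[: idx + len(marker) - 1])
    return sorted(dirs)


# Placeholder blob names for illustration only
example_blobs = [
    "a/b/Dir0000/regions/",
    "a/b/Dir0000/regions/img-001.jpg",
    "a/b/Dir0001/regions/img-002.jpg",
    "a/b/Dir0001/output/img-003.jpg",
]
example_dirs = regions_dirs_from_blobs(example_blobs)
print(example_dirs)
```

This version does not depend on path length, and it still finds a regions directory when the bucket holds no separate directory entry for it.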

## Run rsync commands

In [None]:
for src in rsync_list_source:
    print("-------------------------------------------------")
    print(src)
    rsync_out = rsync_imagery_proc_regions(
        src,
        f"{bucket_source_name}/{pre_source}",
        f"{bucket_destination_name}/{pre_destination}",
        text_only=True
    )
    print(rsync_out)
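
The internals of `rsync_imagery_proc_regions` are not shown in this notebook. Purely to illustrate the preview-before-run pattern that `text_only=True` suggests, the sketch below builds a `gsutil -m rsync -r` command without executing it. The helper `build_rsync_command` is hypothetical, and it assumes the destination keeps the same relative path as the source, which does not reproduce the new folder structure that the real function applies.

```python
import shlex


def build_rsync_command(src, source_base, destination_base):
    """Return a gsutil rsync command as a list of arguments (hypothetical helper)."""
    src_root = f"gs://{source_base}/"
    if not src.startswith(src_root):
        raise ValueError(f"{src} is not under {src_root}")
    # Assumption: the destination mirrors the source's relative path
    relative_path = src[len(src_root):]
    dst = f"gs://{destination_base}/{relative_path}"
    return ["gsutil", "-m", "rsync", "-r", src, dst]


# Placeholder bucket names and paths for illustration only
example_cmd = build_rsync_command(
    "gs://src-bucket/pre/dep/Dir0000/regions",
    "src-bucket/pre",
    "dst-bucket/newpre",
)
print(shlex.join(example_cmd))
```

Printing the command first makes it possible to review every source and destination pair before any files are copied.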