#  Part 1 - Cleaning Up Landsat Folders

We want to reorganize the chaotic Landsat folders into a cleaner layout.  This notebook checks a staging area and moves its data into a collection folder.

The issue is that a single Collection ID may have four or more product types: the raw input files, Surface Temperature, Surface Reflectance, top-of-atmosphere (TOA) products, and Quality.
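
To make the grouping problem concrete, here is a minimal sketch that parses folder names following the public USGS Landsat Collection naming pattern (`LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX`) and groups them under a shared scene key. The `scene_key` helper, the key layout, and the sample names are illustrative assumptions; `dug_api.CollectID` may parse and name things differently.

```python
import re
from collections import defaultdict

# USGS Landsat Collection pattern: LXSS_LLLL_PPPRRR_YYYYMMDD_yyyymmdd_CC_TX
PRODUCT_ID = re.compile(
    r'^(?P<sensor>L[A-Z]\d{2})_'      # sensor + satellite, e.g. LC08
    r'(?P<level>L\d[A-Z]{2})_'        # processing level, e.g. L1TP, L2SP
    r'(?P<path>\d{3})(?P<row>\d{3})_' # WRS path and row
    r'(?P<acquired>\d{8})_'           # acquisition date
    r'(?P<processed>\d{8})_'          # processing date
    r'(?P<collection>\d{2})_'         # collection number
    r'(?P<category>RT|T1|T2)'         # collection category
)

def scene_key(name):
    """Hypothetical key: sensor, path/row, and acquisition date only."""
    m = PRODUCT_ID.match(name)
    if m is None:
        return None
    return f"{m['sensor']}_{m['path']}{m['row']}_{m['acquired']}"

# Illustrative folder names, not taken from the real staging area
names = [
    'LC08_L1TP_026027_20200828_20200906_02_T1',
    'LC08_L2SP_026027_20200828_20200906_02_T1',
    'not_a_landsat_folder',
]

groups = defaultdict(list)
unparsed = []
for name in names:
    key = scene_key(name)
    if key is None:
        unparsed.append(name)
    else:
        groups[key].append(name)
```

Both Landsat names land under the same key even though their processing levels differ, which is the situation the reorganization has to handle.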

In [1]:
## Step 1 - Import Required Libraries

In [2]:
import configparser, datetime, glob, logging, os, sys, time

import pandas as pd

from tqdm import notebook as tqdm

In [3]:
#  DUG API
sys.path.insert(0,'..')
from dug_api.CollectID import CollectID, FileType
from dug_api.config import Configuration

Configure the logger.  

In [4]:
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('Notebook')
logger.info( f'This Notebook run at {datetime.datetime.now().strftime("%Y:%m:%d %H:%M:%S")}' )

INFO:Notebook:This Notebook run at 2024:01:05 15:01:34


## Step 2 - Loading Configuration Data

In [5]:
config = Configuration( '../data/options.cfg' )
logger.info( f'Is Configuration Valid:  {config.is_valid()}' )

DEBUG:root:No Landsat Collection List at /Volumes/data/imagery/Landsat/collections.xlsx
INFO:Notebook:Is Configuration Valid:  True


In [6]:
staging_path = config.config['general']['image_staging_path']
collect_path = config.config['general']['image_collection_path']
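
The two lookups above suggest that `Configuration` exposes a `configparser`-style object, although that is an assumption about `dug_api`. A minimal sketch of the expected `[general]` section, using placeholder paths and a hypothetical pre-flight check for the two keys:

```python
import configparser

# Placeholder content; the real options.cfg may hold more sections and keys
SAMPLE_CFG = """
[general]
image_staging_path = /tmp/landsat/staging
image_collection_path = /tmp/landsat/collections
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# Fail early if either path is missing rather than partway through a move
required = ('image_staging_path', 'image_collection_path')
missing_keys = [k for k in required if not parser.has_option('general', k)]
if missing_keys:
    raise KeyError(f'Missing [general] options: {missing_keys}')

sample_staging = parser['general']['image_staging_path']
sample_collect = parser['general']['image_collection_path']
```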

## Step 3 - Look at Staging Folders

In [7]:
invalid_folders = []
valid_folders = []

paths = glob.glob( f'{staging_path}/*' )

for path in paths:
    
    #  Check if directory has a Landsat CID
    c = CollectID.from_pathname( path )

    #  Capture invalid folders for double-checking
    if c is None:
        invalid_folders.append( path )

    else:
        # Record the source path and its destination folder name
        valid_folders.append( { 'src': path, 'dest': c.to_cid_folder() } )

Every valid CID folder needs a matching folder in the destination, named with the collection category stripped out.

In [8]:
already_exists = []
didnt_exist = []

for entry in valid_folders:

    pname = os.path.join( collect_path, entry['dest'] )
    
    if os.path.exists( pname ):
        already_exists.append( pname )
    else:
        didnt_exist.append( pname )
        os.makedirs( pname )
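
If it helps to review the split before anything is written to disk, the same classification can be run as a dry run first. This is only a sketch: `classify_folders` and its `create` flag are hypothetical, and the demo runs in a temporary directory rather than in `collect_path`.

```python
import os
import tempfile

def classify_folders(root, names, create=False):
    """Split names into existing and missing folders under root.

    Missing folders are created only when create=True.
    """
    existing, missing = [], []
    for name in names:
        path = os.path.join(root, name)
        if os.path.isdir(path):
            existing.append(path)
        else:
            missing.append(path)
            if create:
                os.makedirs(path, exist_ok=True)
    return existing, missing

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'scene_a'))

    # Dry run: report only, nothing is created
    existing, missing = classify_folders(root, ['scene_a', 'scene_b'])
    dry_run_created = os.path.isdir(os.path.join(root, 'scene_b'))

    # Second pass actually creates the missing folder
    classify_folders(root, ['scene_a', 'scene_b'], create=True)
    created = os.path.isdir(os.path.join(root, 'scene_b'))

    existing_names = [os.path.basename(p) for p in existing]
    missing_names = [os.path.basename(p) for p in missing]
```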

Here are the destination folders that did not exist yet and were just created:

In [9]:
#for f in didnt_exist:
#    print(f)

Here are the destination folders that already existed:

In [10]:
#for f in already_exists:
#    print(f)

Now we move everything from staging into the collection folders.

In [11]:
import shutil

pbar_dirs = tqdm.tqdm( total = len(valid_folders) )

for entry in valid_folders:

    valid_path = f'{entry["src"]}/*'
    paths = glob.glob( valid_path )

    cid = os.path.basename( entry['dest'] )

    #  For each object, check if its destination path already exists
    for p in paths:

        dst_path = os.path.join( collect_path, cid, os.path.basename( p ) )

        if os.path.exists( dst_path ):
            logger.error( f'destination path exists ({dst_path}) for input ({p}).' )
        else:
            #  shutil.move avoids the shell, so paths with spaces or quotes are safe
            logger.debug( f'mv {p} {dst_path}' )
            shutil.move( p, dst_path )

    pbar_dirs.update(1)

pbar_dirs.close()
    

0it [00:00, ?it/s]
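
Because a move is hard to undo, it can be useful to separate planning from execution. The sketch below is one way to do that, not the notebook's own method: `plan_moves` is a hypothetical helper, the file names are made up, and the demo runs in a temporary directory. It uses `shutil.move`, which does not go through a shell.

```python
import os
import shutil
import tempfile

def plan_moves(src_dir, dst_dir):
    """Return (moves, conflicts) as lists of (src, dst); touches nothing."""
    moves, conflicts = [], []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # An existing destination is reported instead of overwritten
        (conflicts if os.path.exists(dst) else moves).append((src, dst))
    return moves, conflicts

with tempfile.TemporaryDirectory() as root:
    src_dir = os.path.join(root, 'staging', 'scene')
    dst_dir = os.path.join(root, 'collections', 'scene')
    os.makedirs(src_dir)
    os.makedirs(dst_dir)

    # Names contain spaces on purpose
    for name in ('band 1.TIF', 'band 2.TIF'):
        open(os.path.join(src_dir, name), 'w').close()
    open(os.path.join(dst_dir, 'band 2.TIF'), 'w').close()

    moves, conflicts = plan_moves(src_dir, dst_dir)

    # Review moves and conflicts here, then execute the plan
    for src, dst in moves:
        shutil.move(src, dst)

    moved_names = sorted(os.listdir(dst_dir))
    left_behind = sorted(os.listdir(src_dir))
```

Conflicting files stay in staging, so they can be inspected by hand afterwards.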