# CSS (Catalina Sky Survey) Data Access Demo

This notebook demonstrates how to **find and list CSS data files** under the PDS SBN volume `pds.sbn/catalina`.

**What we'll do:**
1. List the files and directories in the top-level directory `pds.sbn/catalina`.
2. List files by constructing a directory path from the variables `processing_level`, `instrument`, and `date` (e.g. `pds.sbn/catalina/processing_level/instrument/YYYY/YYMonDD`).
3. Refine the file listing to include only files with the extension `.fz`.

## 1) List files and directories in `pds.sbn/catalina`

List the top-level contents of the Catalina data directory.

In [1]:
from pathlib import Path

DATA_DIR = Path('pds.sbn/catalina')

if not DATA_DIR.is_dir():
    print(f"Directory {DATA_DIR} does not exist.")
else:
    entries = sorted(DATA_DIR.iterdir())
    print(f"Top-level contents of {DATA_DIR}:")
    for entry in entries:
        kind = "dir " if entry.is_dir() else "file"
        print(f"  [{kind}] {entry.name}")

Top-level contents of pds.sbn/catalina:
  [file] .DS_Store
  [file] bundle_gbo.ast.catalina.survey_v1.0.xml
  [dir ] calibration
  [file] current.txt
  [dir ] data_calibrated
  [dir ] data_derived
  [dir ] data_partially_processed
  [dir ] data_raw
  [dir ] document
  [dir ] miscellaneous
  [file] old.txt


## 2) List files using path variables: `processing_level`, `instrument`, and `date`

Construct a directory path of the form `pds.sbn/catalina/processing_level/instrument/YYYY/YYMonDD` using variables, then list the files in that directory.

**Demo:** `processing_level` = `data_calibrated`, `instrument` = `G96`, `date` = April 21, 2020 → `target_dir` ends in `2020/20Apr21`.

In [None]:
from datetime import datetime

# Path components (variables)
processing_level = "data_calibrated"
instrument = "G96"
date = datetime(2020, 4, 21)  # April 21, 2020

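# Build the YYYY and YYMonDD path components (e.g. 2020 and 20Apr21).
# Note: %b is locale-dependent; it yields English abbreviations such as "Apr"
# under Python's default C locale.
year_str = date.strftime("%Y")          # "2020"
date_folder = date.strftime("%y%b%d")   # "20Apr21"

# Construct the full directory path, reusing DATA_DIR from the first cell
target_dir = DATA_DIR / processing_level / instrument / year_str / date_folder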
print(f"Constructed path: {target_dir}")

MAX_DISPLAY = 50

if not target_dir.is_dir():
    print(f"Directory {target_dir} does not exist.")
else:
    entries = sorted(target_dir.iterdir())
    total = len(entries)
    print(f"\nContents ({total} items):")
    to_show = entries[:MAX_DISPLAY]
    for entry in to_show:
        kind = "dir " if entry.is_dir() else "file"
        print(f"  [{kind}] {entry.name}")
    if total > MAX_DISPLAY:
        print(f"  ... and {total - MAX_DISPLAY} more")
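
If the same path construction is needed for several nights or instruments, it can be wrapped in a small function. The sketch below is an illustration rather than part of the original cells: `night_dir` and `MONTH_ABBR` are hypothetical names, and the month abbreviations are spelled out explicitly so the result does not depend on the locale that `strftime("%b")` would use.

```python
import datetime as dt
from pathlib import Path

# English month abbreviations, listed explicitly to stay locale-independent
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def night_dir(root, processing_level, instrument, night):
    """Return root/processing_level/instrument/YYYY/YYMonDD for a given date.

    Hypothetical helper; `night` is a datetime.date or datetime.datetime.
    """
    folder = f"{night.year % 100:02d}{MONTH_ABBR[night.month - 1]}{night.day:02d}"
    return Path(root) / processing_level / instrument / f"{night.year:04d}" / folder


example_dir = night_dir("pds.sbn/catalina", "data_calibrated", "G96", dt.date(2020, 4, 21))
print(example_dir.as_posix())  # pds.sbn/catalina/data_calibrated/G96/2020/20Apr21
```

The function only builds the path; whether the directory exists still has to be checked with `is_dir()`, as in the cell above.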

## 3) Refine file listing to only `.fz` files

Within the same directory constructed in the previous step, list only files with the extension `.fz`.

In [None]:
# Same directory as in the previous step, rebuilt from DATA_DIR and the path variables
target_dir = DATA_DIR / processing_level / instrument / year_str / date_folder
FILE_EXT = '.fz'

if not target_dir.is_dir():
    print(f"Directory {target_dir} does not exist.")
else:
    fits_fz_files = sorted(p for p in target_dir.iterdir() if p.is_file() and p.suffix == FILE_EXT)
    total = len(fits_fz_files)
    print(f"--- Files with extension {FILE_EXT} in {target_dir} ---")
    print(f"  Total number of files: {total}")
    if fits_fz_files:
        print(f"\n  First {min(5, total)} files:")
        for p in fits_fz_files[:5]:
            print(f"    {p}")
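
`p.suffix` looks only at the final extension, so files with different compound suffixes all match `.fz`. The variable name `fits_fz_files` hints at names ending in `.fits.fz`, but the listing above does not confirm that. To see which compound suffixes are present, the matches can be grouped by `Path.suffixes`. This is a minimal sketch with a hypothetical helper name, `group_fz_by_suffixes`; it assumes the only dots in a file name are the ones that separate extensions.

```python
from collections import defaultdict
from pathlib import Path


def group_fz_by_suffixes(directory, final_ext=".fz"):
    """Group files ending in final_ext by their full compound suffix.

    Hypothetical helper; returns {compound_suffix: [file names, sorted]}.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(f"*{final_ext}")):
        if path.is_file():  # skip directories that happen to match the pattern
            groups["".join(path.suffixes)].append(path.name)
    return dict(groups)


example_dir = Path("pds.sbn/catalina/data_calibrated/G96/2020/20Apr21")
if example_dir.is_dir():
    for suffix, names in group_fz_by_suffixes(example_dir).items():
        print(f"{suffix}: {len(names)} files")
else:
    print(f"Directory {example_dir} does not exist.")
```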