# Pre-Cruise Data Aggregator

**Goal:** Aggregate oceanographic datasets (Biology, Geology, Chemistry, Bathymetry) for a specific geographic area of interest (AOI).
**Output:** A single GeoPackage (`.gpkg`) importable into QGIS/ArcGIS.

**Datasets:**
1. **GBIF** (Biological Occurrences)
2. **OBIS** (eDNA & Occurrences)
3. **IMLGS** (Geological Samples)
4. **GLODAP** (Water Chemistry)
5. **WCSD** (Water Column Sonar Footprints)
6. **NCEI Bathymetry** (Trackline Coverage)

In [None]:
import os
import sys

# Detect if running in Google Colab
try:
    import google.colab
    IN_COLAB = True
    print("Running in Google Colab. Installing dependencies...")
    # Install required packages
    !pip install -q geopandas pygbif pyobis
except ImportError:
    IN_COLAB = False
    print("Running locally.")

import requests
import pandas as pd
import geopandas as gpd
from shapely.geometry import box, Polygon
from pygbif import occurrences as gbif_occ
from pyobis import occurrences as obis_occ
import json
import io
import zipfile

# Ensure data directory exists
if IN_COLAB:
    # In Colab, save to the current working directory (/content)
    DATA_DIR = "."
else:
    # Locally, we save to a data folder
    DATA_DIR = "../data"
    os.makedirs(DATA_DIR, exist_ok=True)

## Define Area of Interest (AOI)
Enter the bounding-box coordinates for your cruise region in decimal degrees; southern latitudes and western longitudes are negative.

In [None]:
# EXAMPLE: BLAKE PLATEAU / SOUTHEAST US
MIN_LAT = 28.0
MAX_LAT = 32.0
MIN_LON = -80.0
MAX_LON = -76.0

# Create a Shapely Polygon for the AOI
aoi_polygon = box(MIN_LON, MIN_LAT, MAX_LON, MAX_LAT)
aoi_wkt = aoi_polygon.wkt

print(f"Area of Interest defined: {aoi_wkt}")

# Output Filename
OUTPUT_FILENAME = os.path.join(DATA_DIR, "PreCruise_Data_Package.gpkg")
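
Because every later query depends on the bounding box, it can help to sanity-check the coordinates before running the downloads. The following is a minimal sketch, not part of the original notebook: `validate_bbox` is a hypothetical helper name, and it assumes decimal-degree coordinates and an AOI that does not cross the antimeridian.

```python
def validate_bbox(min_lon, min_lat, max_lon, max_lat):
    """Return a list of problems found in a bounding box (empty if none).

    Hypothetical helper: assumes decimal degrees and a box that does not
    cross the antimeridian (180 degrees longitude).
    """
    problems = []
    if not (-90.0 <= min_lat <= 90.0 and -90.0 <= max_lat <= 90.0):
        problems.append("latitude must be within [-90, 90]")
    if not (-180.0 <= min_lon <= 180.0 and -180.0 <= max_lon <= 180.0):
        problems.append("longitude must be within [-180, 180]")
    if min_lat >= max_lat:
        problems.append("MIN_LAT must be less than MAX_LAT")
    if min_lon >= max_lon:
        problems.append("MIN_LON must be less than MAX_LON")
    return problems


# Check the example AOI (Blake Plateau / Southeast US); argument order
# matches shapely's box(minx, miny, maxx, maxy).
print(validate_bbox(-80.0, 28.0, -76.0, 32.0))  # prints []
```

In the notebook this could be called as `validate_bbox(MIN_LON, MIN_LAT, MAX_LON, MAX_LAT)` right after the AOI cell, raising an error if the returned list is not empty.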

## Get Bathymetry
Retrieve the available high-resolution bathymetry (masked and unmasked) from the GMRT synthesis.