## Introduction to Python for Physical Oceanography

There are many useful Python tutorials out there, but few that focus on the tools and workflows commonly used for data analysis in physical oceanography. Here we'll go over some of the basic tools and how you might use them to load data, make figures, and do scientific analysis. More resources for learning Python in general can be found [here] and [here].

### Scientific Computing

In [1]:
import numpy as np

The haversine formula calculates the great-circle distance between two points on a sphere:

$$
a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1)\cos(\phi_2)\sin^2\left(\frac{\Delta\lambda}{2}\right)
$$

$$
c = 2 \arcsin\left( \sqrt{a} \right)
$$

$$
d = R \cdot c
$$

where:

- $\phi_1, \phi_2$ are the latitudes in radians  
- $\Delta\phi = \phi_2 - \phi_1$ is the difference in latitude  
- $\Delta\lambda = \lambda_2 - \lambda_1$ is the difference in longitude  
- $R$ is Earth’s radius (≈ 6367 km)
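
As a quick sanity check on the formula, consider a case with a known answer: two points on the equator separated by 90° of longitude should be a quarter of a great circle apart, $\pi R / 2$. The sketch below evaluates $a$, $c$, and $d$ step by step for that case. The two points are chosen purely for illustration.

```python
import numpy as np

R = 6367.0  # Earth's radius in km, as quoted above

# Two illustrative points on the equator, 90 degrees of longitude apart
phi1, lam1 = np.radians(0.0), np.radians(0.0)
phi2, lam2 = np.radians(0.0), np.radians(90.0)

dphi = phi2 - phi1  # difference in latitude (radians)
dlam = lam2 - lam1  # difference in longitude (radians)

# The three steps of the haversine formula
a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
c = 2 * np.arcsin(np.sqrt(a))
d = R * c

print(f"a = {a:.3f}, c = {c:.4f} rad, d = {d:.1f} km")
print(f"quarter great circle: {np.pi * R / 2:.1f} km")
```

Here $a = 0.5$ and $c = \pi/2$, so the two printed distances should agree.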


In [2]:
def haversine_np(coord1, coord2):
    """
    Calculate the great-circle distance between two points 
    on the earth (specified in decimal degrees), given as (lat, lon) tuples.

    Args:
        coord1: tuple (lat1, lon1)
        coord2: tuple (lat2, lon2)

    Returns:
        Distance in kilometers.
    """
    lat1, lon1 = np.radians(coord1)
    lat2, lon2 = np.radians(coord2)

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    
    km = 6367 * c  # Earth's radius in km
    return km

In [3]:
# live code this
# Coordinates: (lat, lon)
accra = (5.6037, -0.1870)          # Accra, Ghana
woods_hole = (41.5265, -70.6737)   # Woods Hole, MA, USA

distance_km = haversine_np(accra, woods_hole)
print(f"Great-circle distance: {distance_km:.2f} km")

Great-circle distance: 7970.24 km
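
Because the function is built from NumPy operations, the same formula can be applied to whole arrays of coordinates at once. A common use is the along-track distance between consecutive stations on a section. The sketch below assumes a slightly different signature from the function above (separate latitude and longitude arrays); the name `haversine_vec` and the station positions are hypothetical.

```python
import numpy as np

def haversine_vec(lat1, lon1, lat2, lon2, radius_km=6367.0):
    """Great-circle distance in km; inputs in decimal degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# Hypothetical station positions along a meridional section (decimal degrees)
lats = np.array([40.0, 40.5, 41.0, 41.5])
lons = np.array([-70.0, -70.0, -70.0, -70.0])

# Distance between each consecutive pair of stations
leg_km = haversine_vec(lats[:-1], lons[:-1], lats[1:], lons[1:])

# Cumulative along-track distance, starting at 0 for the first station
along_track_km = np.concatenate(([0.0], np.cumsum(leg_km)))
print(along_track_km)
```

The resulting `along_track_km` array has one value per station and could serve as the horizontal axis of a section plot.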


### Loading Data

In physical oceanography, we often work with large *geo-referenced* datasets: arrays of physical variables such as temperature and salinity in which each measurement has a latitude and longitude coordinate. Data structured this way is often packaged in NetCDF format (file extension ".nc") and can be read by several Python packages. One of the most commonly used is **xarray** [link to xarray github and include some of their info about it]
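
To make "geo-referenced" concrete, here is a minimal sketch of a labeled dataset built in memory with xarray and then queried by coordinate value rather than by array index. The variable name, coordinate names, and all values are made up for illustration; real files will have their own names and conventions.

```python
import numpy as np
import xarray as xr

# Synthetic coordinates (illustrative values, not real observations)
lat = np.array([40.0, 40.5, 41.0])
lon = np.array([-71.0, -70.5, -70.0])
depth = np.array([0.0, 10.0, 20.0, 50.0])

# Made-up temperature field that cools with depth, shape (depth, lat, lon)
temp = 15.0 - 0.1 * depth[:, None, None] + np.zeros((depth.size, lat.size, lon.size))

ds = xr.Dataset(
    data_vars={"temperature": (("depth", "lat", "lon"), temp, {"units": "degC"})},
    coords={"depth": depth, "lat": lat, "lon": lon},
)

# Select the profile nearest to a point of interest by coordinate value
profile = ds["temperature"].sel(lat=41.5, lon=-70.7, method="nearest")
print(profile.values)
```

Selecting with `.sel(..., method="nearest")` returns the grid point closest to the requested position, so the coordinates travel with the data instead of being tracked separately.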

In [6]:
import xarray as xr
import fsspec
import json

In [12]:
with open("./data/data_manifest_January.json", "r") as f:
    manifest = json.load(f)
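
The loading code that follows looks up URLs with `manifest.get(data_type, [])`, which suggests the manifest maps data-type names to lists of file URLs. The sketch below assumes that layout; the keys and URLs are hypothetical placeholders, not the contents of the actual manifest file.

```python
import json

# Hypothetical manifest contents; the keys and URLs are placeholders
manifest_text = """
{
  "ctd": [
    "https://example.org/data/ctd_cast_001.nc",
    "https://example.org/data/ctd_cast_002.nc"
  ],
  "adcp": [
    "https://example.org/data/adcp_001.nc"
  ]
}
"""

manifest = json.loads(manifest_text)

# .get() with a default avoids a KeyError when a data type is absent
for data_type in ("ctd", "adcp", "glider"):
    urls = manifest.get(data_type, [])
    print(f"{data_type}: {len(urls)} file(s)")
```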

In [None]:
# choose data type and optional density contour flag
data_type = 'ctd'
density_contour = True

# load main datasets
datasets = []
for url in manifest.get(data_type, []):
    with fsspec.open(url, mode='rb') as f:
        # decode_timedelta=True suppresses a warning; .load() reads the data
        # into memory so it stays available after the file handle closes
        ds = xr.open_dataset(f, decode_timedelta=True).load()
        datasets.append(ds)

# load optional CTD datasets for density contour
datasets_ctd = []
if density_contour and data_type != 'ctd':
    for url in manifest.get('ctd', []):
        with fsspec.open(url, mode='rb') as f:
            # .load() reads the data into memory before the file handle closes
            ds = xr.open_dataset(f, decode_timedelta=True).load()
            datasets_ctd.append(ds)


Loaded 42 CTD files
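
Once the files are in a list, it is often convenient to stack them into a single dataset. The sketch below uses synthetic stand-ins for the loaded files and assumes each one holds a single cast on a shared pressure grid; the helper `make_cast` and the variable and dimension names are hypothetical. Real CTD casts often have differing pressure grids and may need interpolation before they can be combined this way.

```python
import numpy as np
import xarray as xr

def make_cast(cast_id, surface_temp):
    """Build a synthetic single-cast dataset (illustrative only)."""
    pressure = np.array([0.0, 10.0, 20.0, 50.0])
    temperature = surface_temp - 0.05 * pressure
    return xr.Dataset(
        {"temperature": ("pressure", temperature)},
        coords={"pressure": pressure, "cast": cast_id},
    )

# Stand-in for the list built by the loading loop
datasets = [make_cast(i, 12.0 + i) for i in range(3)]

# Stack the casts along a new "cast" dimension; data_vars="all" asks xarray
# to concatenate every data variable along that dimension
combined = xr.concat(datasets, dim="cast", data_vars="all")
print(combined["temperature"].shape)
```

With everything in one object, operations such as averaging over casts or plotting a section reduce to a single call on `combined`.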


