# Introduction to xarray: Why and How

## Why use xarray?

- **N-dimensional labeled arrays:** Unlike plain NumPy arrays, xarray attaches names to dimensions, coordinate labels to axes, and metadata (attributes) to the data.  
- **Easy handling of multi-dimensional scientific data:** Perfect for datasets like climate model outputs, satellite data, and geospatial grids.  
- **Powerful indexing and slicing:** Access data by coordinate labels instead of integer indices, making code more readable and less error-prone.  
- **Integration with other libraries:** Works well with pandas, NumPy, matplotlib, and Dask for parallel computing.  
- **Built-in support for NetCDF:** A common format for climate and oceanographic data.
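
As a small illustration of the label-based indexing point above, the sketch below compares positional NumPy indexing with `.sel()` on a tiny array. The grid values and coordinates are made up for the example and are not taken from any dataset.

```python
import numpy as np
import xarray as xr

# Hypothetical 3x4 grid; values and coordinates are invented for illustration.
values = np.arange(12.0).reshape(3, 4)
lats = [10.0, 20.0, 30.0]
lons = [200.0, 210.0, 220.0, 230.0]

da = xr.DataArray(values, dims=("lat", "lon"), coords={"lat": lats, "lon": lons})

# Positional (NumPy-style): you must remember that axis 0 is lat and axis 1 is lon.
by_position = values[1, 2]

# Label-based (xarray): dimension names and coordinate values are explicit.
by_label = da.sel(lat=20.0, lon=220.0).item()

print(by_position, by_label)
```

Both lines pick the same element, but the second one still reads correctly if the array is later transposed.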

---

## How does xarray work?

- The core data structure is the **`xarray.DataArray`**, which holds multi-dimensional data with dimension names and coordinates.  
- Collections of variables that share dimensions and coordinates are held in an **`xarray.Dataset`**, a dict-like container of DataArrays.  
- Coordinates provide meaningful labels for axes (e.g., time, latitude, longitude).  
- You can perform arithmetic, group operations, resampling, and more with labeled data.
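
The structures described above can be sketched with a toy example. The variable names (`temp`, `toy`, `air_celsius`) and the constant 280 K field are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Coordinates: four daily time steps and three latitudes (made-up values).
time = pd.date_range("2014-06-01", periods=4, freq="D")
lat = [15.0, 45.0, 75.0]

# A DataArray: values + dimension names + coordinates + metadata.
temp = xr.DataArray(
    np.full((4, 3), 280.0),
    dims=("time", "lat"),
    coords={"time": time, "lat": lat},
    name="air",
    attrs={"units": "K"},
)

# A Dataset: a dict-like collection of DataArrays on shared coordinates.
# Arithmetic on labeled data keeps the dimensions and coordinates.
toy = xr.Dataset({"air": temp, "air_celsius": temp - 273.15})

print(toy)
print(toy["air"].dims)
print(list(toy.data_vars))
```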

---


In [None]:
# Install the packages needed for this session
!pip install numpy xarray matplotlib pooch

## Setup: use a tutorial Dataset



In [None]:
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt

In [None]:
ds = xr.tutorial.load_dataset("air_temperature")  # returns an xarray.Dataset
# To get a single DataArray instead, select the variable by name:
# ds = xr.tutorial.load_dataset("air_temperature")["air"]
print(ds)

### Latitude Profile of Mean Temperature for June 2014

1. **Select Time Range**  
   - `ds.air.sel(time="2014/06")` selects **every time step in June 2014** (the data are 6-hourly) along the `time` dimension.  

2. **Compute Mean**  
   - `.mean(["time","lon"])` calculates the **average over the time and longitude dimensions**:  
     - **Time:** combines all time steps in June 2014.  
     - **Longitude:** averages across the full east-west extent.  
   - The result is a **1D array along latitude**, showing how temperature varies from south to north.

3. **Plotting**  
   - `.plot()` visualizes the latitude profile:  
     - X-axis: Latitude (degrees north).  
     - Y-axis: Temperature (K for this dataset).  
     - Pass `y="lat"` to `.plot()` to put latitude on the vertical axis instead.  
   - This provides a **meridional (north-south) temperature profile** for that month.

4. **Interpretation**  
   - The plot highlights **how temperature changes with latitude** during June 2014.  
   - Peaks and valleys indicate warmer or cooler zones along the north-south axis.  
   - This is a useful way to summarize **monthly climatology along a single spatial dimension**.


In [None]:
ds.air.sel(time="2014/06").mean(["time","lon"]).plot()
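
To check what the select-then-average chain returns without downloading anything, here is a minimal sketch on synthetic data. The field below is invented (temperature depends only on latitude), and the names `air`, `june`, and `profile` are assumptions for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily data covering May to July 2014 on a tiny grid.
time = pd.date_range("2014-05-01", "2014-07-31", freq="D")
lat = np.array([75.0, 45.0, 15.0])
lon = np.array([200.0, 250.0, 300.0, 330.0])

# Made-up field: temperature depends only on latitude (colder toward the pole).
profile_true = 300.0 - 0.5 * lat
data = np.broadcast_to(
    profile_true[None, :, None], (time.size, lat.size, lon.size)
).copy()

air = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": lat, "lon": lon},
    name="air",
)

# Partial-string selection keeps only the time steps that fall in June 2014.
june = air.sel(time="2014-06")

# Averaging over time and lon leaves a 1D array along lat.
profile = june.mean(["time", "lon"])

print(june.sizes["time"])
print(profile.dims)
print(profile.values)
```

Calling `profile.plot(y="lat")` on the result would draw the profile with latitude on the vertical axis.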

### Monthly Mean Temperature for January

1. **Group by Month**  
   - `ds.air.groupby("time.month")` groups the air temperature data along the `time` dimension by calendar month (1–12).  

2. **Compute Mean**  
   - `.mean()` averages over the `time` steps within each group, pooling all years, so every calendar month gets one mean field; `lat` and `lon` are kept.  
   - This produces a new DataArray with a `"month"` dimension in place of `"time"`.  

3. **Select January**  
   - `.sel(month=1)` extracts the data corresponding to **January**.  

4. **Plotting**  
   - `.plot(cmap='seismic', robust=True)` displays the January temperature spatially:  
     - `cmap='seismic'` uses a blue-to-red diverging colormap (blue for low values, red for high values).  
     - `robust=True` sets the color limits from the 2nd and 98th percentiles of the data, so extreme outliers do not skew the color scale.  

5. **Interpretation**  
   - This map shows the **spatial distribution of mean January temperatures** across the dataset’s latitude-longitude grid.  
   - Useful for quickly identifying cold and warm regions in a specific month.  
   - You can repeat this for other months by changing the selection: `.sel(month=2)` for February, `.sel(month=3)` for March, etc.


In [None]:
ds.air.groupby("time.month").mean().sel(month=1).plot(cmap='seismic', robust=True)
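
The grouping step can be sketched on synthetic data to show how `time` is replaced by `month`. The series below is made up: each value simply equals its calendar month number, so the expected monthly means are easy to verify.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of daily time steps (made-up data).
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")

# Each value equals its calendar month, so the mean for month m should be m.
values = time.month.values.astype(float)
series = xr.DataArray(values, dims="time", coords={"time": time}, name="air")

# Group by the month component of the time coordinate, then average each group.
monthly = series.groupby("time.month").mean()

print(monthly.dims)
print(monthly["month"].values)
print(monthly.sel(month=1).item())
```

Both Januaries (2013 and 2014) fall into the same group, which is why the result is a climatological mean rather than a mean for one specific year.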


### Filtering and Visualizing Temperature with Multiple Conditions

1. **Define Conditions**  
   - `cond1 = ds.air > 275`: A boolean mask that is `True` at every grid point and time step where the air temperature exceeds 275 K.  
   - `cond2 = ds.lat < 50`: Limits the selection to latitudes south of 50°N.  
   - `cond3 = ds.lon < 280`: Further restricts the selection to longitudes west of 280°E (80°W).  

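2. **Apply Combined Conditions**  
   - Combine the conditions using the logical AND operator `&`:
   ```python
   total_conditions = cond1 & cond2 & cond3
   ```
   - Pass the combined mask to `.where()` to keep values where it is `True` and set the rest to NaN.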


In [None]:

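# Boolean masks: True where the condition holds
cond1 = ds.air > 275  # warmer than 275 K
cond2 = ds.lat < 50   # south of 50°N

# Keep values where both masks are True (NaN elsewhere) and plot the last time step
ds.air.where(cond1 & cond2).isel(time=-1).plot()
plt.plot([190, 330], [50, 50], ls=':', color='red')  # dotted line at lat = 50

# Add a third condition: longitudes west of 280°E
cond3 = ds.lon < 280
total_conditions = cond1 & cond2 & cond3

plt.figure()  # new figure so the second map is drawn separately
ds.air.where(total_conditions).isel(time=-1).plot()
plt.plot([280, 280], [15, 80], ls=':', color='red')  # dotted line at lon = 280
plt.plot([190, 330], [50, 50], ls=':', color='red')  # dotted line at lat = 50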
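
To see how masks of different shapes combine, the sketch below repeats the same three conditions on a tiny made-up grid. The values are invented; only the thresholds (275 K, 50°N, 280°E) mirror the cell above.

```python
import numpy as np
import xarray as xr

# Tiny synthetic grid (values in K are made up for illustration).
lat = np.array([75.0, 45.0, 15.0])
lon = np.array([200.0, 280.0, 330.0])
air = xr.DataArray(
    np.array(
        [
            [260.0, 270.0, 280.0],
            [280.0, 285.0, 290.0],
            [290.0, 295.0, 300.0],
        ]
    ),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
    name="air",
)

cond1 = air > 275      # 2D mask over (lat, lon)
cond2 = air.lat < 50   # 1D mask along lat
cond3 = air.lon < 280  # 1D mask along lon

# xarray broadcasts the 1D masks against the 2D one by dimension name.
total_conditions = cond1 & cond2 & cond3

# Values where the mask is False become NaN.
masked = air.where(total_conditions)

print(total_conditions.dims)
print(masked.values)
print(int(masked.count()))
```

Only the two cells at `lon = 200` with `lat = 45` and `lat = 15` satisfy all three conditions; every other cell becomes NaN.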
