# 2D interpolation
## Bivariate
Perform a bivariate interpolation of gridded data points.

The distribution contains a 2D field `mss.nc` that will be used in this help. This file is located in the `tests/dataset` directory at the root of the project.

> Warning: This file is an old version of the sub-sampled quarter step MSS CNES/CLS. Do not use it for
> scientific purposes; download the latest high-resolution version instead.

The first step is to load the data into memory:
    

In [None]:
import netCDF4
import pyinterp.grid
import pyinterp.bivariate

ds = netCDF4.Dataset("tests/dataset/mss.nc")

Afterwards, build the axes associated with the grid:

In [None]:
import pyinterp.core

x_axis = pyinterp.core.Axis(ds.variables["lon"][:], is_circle=True)
y_axis = pyinterp.core.Axis(ds.variables["lat"][:])

Finally, we can build the object defining the grid to interpolate:

In [None]:
# The shape of the bivariate values must be (len(x_axis), len(y_axis))
mss = ds.variables["mss"][:].T
# The undefined values must be set to nan.
mss[mss.mask] = float("nan")
grid = pyinterp.grid.Grid2D(x_axis, y_axis, mss.data)

We will then build the coordinates on which we want to interpolate our grid:

In [None]:
import numpy as np

# The coordinates used for interpolation are shifted to avoid using the
# points of the bivariate function.
mx, my = np.meshgrid(np.arange(-180, 180, 1) + 1 / 3.0,
                     np.arange(-89, 89, 1) + 1 / 3.0,
                     indexing='ij')

The grid is interpolated to the desired coordinates:

In [None]:
mss = pyinterp.bivariate.bivariate(grid, mx.flatten(), my.flatten()).reshape(mx.shape)

Values can be interpolated with several methods: *bilinear*, *nearest*, and *inverse distance weighting*. Where distances are needed, they are computed using the [Haversine formula](https://en.wikipedia.org/wiki/Haversine_formula).
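
To make the distance-based method more concrete, here is a minimal pure-Python sketch of the Haversine distance and of an inverse distance weighting built on top of it. It is not the library's implementation: the names `haversine` and `idw`, the spherical Earth radius, and the `power` parameter are assumptions chosen for illustration.

```python
import math

# Assumption: spherical Earth with the mean radius, in meters.
EARTH_RADIUS_M = 6_371_000.0


def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def idw(lon, lat, samples, power=2):
    """Inverse distance weighting of (lon, lat, value) samples."""
    numerator = denominator = 0.0
    for s_lon, s_lat, value in samples:
        distance = haversine(lon, lat, s_lon, s_lat)
        if distance == 0.0:
            # The target coincides with a sample: return its value as is.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Two samples on the equator, one degree either side of the target.
samples = [(-1.0, 0.0, 10.0), (1.0, 0.0, 20.0)]
midpoint_value = idw(0.0, 0.0, samples)
```

Because both samples are at the same distance from the target, they receive the same weight and `midpoint_value` is their plain average.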

An experimental module of the library simplifies its use by relying on xarray and the CF information contained in the dataset. This module also implements all the other regular-grid interpolators presented below.

In [None]:
import pyinterp.backends.xarray
import xarray as xr

ds = xr.open_dataset("tests/dataset/mss.nc")
interpolator = pyinterp.backends.xarray.Grid2D(ds.data_vars["mss"])
mss = interpolator.bivariate(dict(lon=mx.flatten(), lat=my.flatten()))

## Bicubic

Bicubic interpolation of data points on a two-dimensional regular grid. The interpolated surface is smoother than the corresponding surface obtained by bilinear interpolation. Bicubic interpolation is achieved by spline functions provided by GSL (GNU Scientific Library).
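
As a rough illustration of why a cubic scheme needs more than the four surrounding points, the sketch below interpolates inside a 4x4 patch of grid values with Catmull-Rom cubics applied along each axis in turn. This is a swapped-in technique chosen for brevity, not the GSL spline the library relies on; `catmull_rom` and `bicubic_patch` are hypothetical names.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Cubic through p1 (t=0) and p2 (t=1), using p0 and p3 for the slopes."""
    return 0.5 * (2 * p1 + (p2 - p0) * t +
                  (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2 +
                  (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)


def bicubic_patch(patch, tx, ty):
    """Interpolate in the central cell of a 4x4 patch.

    patch[i][j] is the value at x index i and y index j; tx and ty are the
    fractional positions (0..1) inside the cell spanned by indices 1 and 2.
    """
    # First interpolate along y for each of the four x positions...
    along_y = [catmull_rom(*patch[i], ty) for i in range(4)]
    # ...then interpolate the four intermediate values along x.
    return catmull_rom(*along_y, tx)


# Patch sampled from the plane f(x, y) = 2x + 3y, with x, y in {-1, 0, 1, 2}.
patch = [[2 * x + 3 * y for y in (-1, 0, 1, 2)] for x in (-1, 0, 1, 2)]
value = bicubic_patch(patch, 0.25, 0.5)
```

A plane is reproduced exactly by this scheme, so `value` equals `2 * 0.25 + 3 * 0.5`.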

In [None]:
import pyinterp.bicubic

mss = pyinterp.bicubic.bicubic(
    grid, mx.flatten(), my.flatten(), nx=3, ny=3).reshape(mx.shape)

It is also possible to simplify the interpolation of the dataset by using xarray:

In [None]:
mss = interpolator.bicubic(dict(lon=mx.flatten(), lat=my.flatten()))

# 3D interpolation
## Trivariate

The **trivariate** interpolation makes it possible to obtain values at arbitrary points in 3D space for a function defined on a grid.

The distribution contains a 3D field `tcw.nc` that will be used in this help. This file is located in the `tests/dataset` directory at the root of the project.

This method performs a bilinear interpolation in 2D space by considering the axes of longitude and latitude of the grid, then performs a linear interpolation in the third dimension. Its interface is similar to the *bivariate* interpolation except for a third axis which is handled by this object.
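
The two-stage scheme described above can be sketched for a single grid cell as follows. This is a simplified illustration, not the library's code: the helper names and the `cube[i][j][k]` corner layout are assumptions, and the fractional positions `tx`, `ty`, `tz` are assumed to have been computed from the axes beforehand.

```python
def lerp(a, b, t):
    """Linear interpolation between a (t=0) and b (t=1)."""
    return a + (b - a) * t


def bilinear(q00, q10, q01, q11, tx, ty):
    """Bilinear interpolation from the four corners of a 2D cell."""
    return lerp(lerp(q00, q10, tx), lerp(q01, q11, tx), ty)


def trivariate_cell(cube, tx, ty, tz):
    """Bilinear in (x, y) on both z levels, then linear along z.

    cube[i][j][k] is the corner value at x index i, y index j, z index k,
    each index being 0 or 1.
    """
    lower = bilinear(cube[0][0][0], cube[1][0][0],
                     cube[0][1][0], cube[1][1][0], tx, ty)
    upper = bilinear(cube[0][0][1], cube[1][0][1],
                     cube[0][1][1], cube[1][1][1], tx, ty)
    return lerp(lower, upper, tz)


# Corners sampled from f(x, y, z) = x + 2y + 4z on the unit cube.
cube = [[[x + 2 * y + 4 * z for z in (0, 1)] for y in (0, 1)] for x in (0, 1)]
value = trivariate_cell(cube, 0.5, 0.25, 0.75)
```

Since the sampled function is linear in each variable, the interpolated `value` matches `0.5 + 2 * 0.25 + 4 * 0.75`.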

In [None]:
import pyinterp.trivariate

ds = netCDF4.Dataset("tests/dataset/tcw.nc")
x_axis = pyinterp.core.Axis(ds.variables["longitude"][:], is_circle=True)
y_axis = pyinterp.core.Axis(ds.variables["latitude"][:])
z_axis = pyinterp.core.Axis(ds.variables["time"][:])
# The shape of the trivariate values must be
# (len(x_axis), len(y_axis), len(z_axis))
tcw = ds.variables['tcw'][:].T
# The undefined values must be set to nan.
tcw[tcw.mask] = float("nan")
grid = pyinterp.grid.Grid3D(
    x_axis, y_axis, z_axis, tcw.data)
# The coordinates used for interpolation are shifted to avoid using the
# points of the trivariate function.
mx, my, mz = np.meshgrid(np.arange(-180, 180, 1) + 1 / 3.0,
                         np.arange(-89, 89, 1) + 1 / 3.0,
                         898500 + 3,
                         indexing='ij')
tcw = pyinterp.trivariate.trivariate(
    grid, mx.flatten(), my.flatten(), mz.flatten()).reshape(mx.shape)

It is also possible to simplify the interpolation of the dataset by using xarray:

In [None]:
ds = xr.open_dataset("tests/dataset/tcw.nc")
interpolator = pyinterp.backends.xarray.Grid3D(ds.data_vars["tcw"])
tcw = interpolator.trivariate(
    dict(longitude=mx.flatten(), latitude=my.flatten(), time=mz.flatten()))

# Unstructured grid

Interpolation of unstructured data is based on an **R\*Tree** structure. We start by building this object. By default, it uses the WGS-84 geodetic coordinate system, but you can define another one using the **System** class.

In [None]:
import pyinterp.rtree

mesh = pyinterp.rtree.RTree()

Then, we insert points into the tree. The class offers two ways to do so. The first one, called **packing**, inserts all the values into the tree at once. This mechanism is the recommended way to create an optimized in-memory structure, both in terms of construction time and query speed. When this is not possible, you can insert new information into the tree as you go along using the **insert** method.

In [None]:
ds = netCDF4.Dataset("tests/dataset/mss.nc")
# The shape of the bivariate values must be (len(longitude), len(latitude))
mss = ds.variables['mss'][:].T
mss[mss.mask] = float("nan")
# Be careful not to enter undefined values in the tree.
x_axis, y_axis = np.meshgrid(
    ds.variables['lon'][:], ds.variables['lat'][:], indexing='ij')
mesh.packing(
    np.vstack((x_axis.flatten(), y_axis.flatten())).T,
    mss.data.flatten())

When the tree is created, you can interpolate the data or make various queries on the tree.

In [None]:
mx, my = np.meshgrid(
    np.arange(-180, 180, 1) + 1 / 3.0,
    np.arange(-90, 90, 1) + 1 / 3.0,
    indexing="ij")
mss, neighbors = mesh.inverse_distance_weighting(
    np.vstack((mx.flatten(), my.flatten())).T,
    within=False,
    radius=35434,
    k=8,
    num_threads=0)

# Fill NaN values

The undefined values in the grids do not allow interpolation of values located
in the neighborhood. This behavior is a concern when you need to interpolate
values near the land/sea mask of some maps. The library provides two functions
to fill the undefined values.
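
The short sketch below shows why a single undefined value is enough to block interpolation around it: in a bilinear scheme every corner of the cell contributes to the result, so a NaN corner propagates to the output. The `bilinear` helper is a hypothetical stand-in written for this illustration, not the library routine.

```python
import math


def bilinear(q00, q10, q01, q11, tx, ty):
    """Bilinear interpolation from the four corners of a grid cell."""
    bottom = q00 + (q10 - q00) * tx
    top = q01 + (q11 - q01) * tx
    return bottom + (top - bottom) * ty


# A cell whose four corners are all defined (open ocean).
ocean = bilinear(1.0, 2.0, 3.0, 4.0, 0.5, 0.5)

# The same cell with one corner undefined (for example, over land).
coast = bilinear(1.0, 2.0, float("nan"), 4.0, 0.5, 0.5)

print(ocean, math.isnan(coast))
```

Filling the undefined corners beforehand is what lets points such as `coast` receive a finite value.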

## LOESS

The first method applies a weighted local regression to extrapolate the boundary
between defined and undefined values. The user must indicate the number of pixels
on the X and Y axes to be considered in the calculation. For example:

In [None]:
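# Surface-type mask stored on the author's machine; adjust the path to
# your own copy of the file.
ds = xr.open_dataset("/home/fbriol/Data/SWOT_GEO/surface_type.nc")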
grid = pyinterp.backends.xarray.Grid2D(ds.data_vars["mask"])

In [None]:
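# lons and lats are the coordinates of the MSS grid; they are built with
# np.meshgrid in the plotting cell further down.
mask = grid.bivariate(
    dict(lon=lons.flatten(), lat=lats.flatten()),
    interpolator='nearest').reshape(lons.shape)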

In [None]:
mask = mask == 0

In [None]:
import pyinterp.fill

ds = xr.open_dataset("tests/dataset/mss.nc")
grid = pyinterp.backends.xarray.Grid2D(ds.data_vars["mss"])
filled = pyinterp.fill.loess(grid, nx=3, ny=3)

In [None]:
grid.array[~mask] = np.nan

The image below illustrates the result:

In [None]:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
%matplotlib inline

fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(121, projection=ccrs.PlateCarree())
lons, lats = np.meshgrid(grid.x, grid.y, indexing='ij')
ax.contourf(lons, lats, grid.array,
            transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title("Original MSS")
ax.set_extent([80, 170, -45, 30], crs=ccrs.PlateCarree())

ax = fig.add_subplot(122, projection=ccrs.PlateCarree())
ax.contourf(lons, lats, filled,
            transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title("MSS modified using the LOESS filter")
ax.set_extent([80, 170, -45, 30], crs=ccrs.PlateCarree())
plt.show()

## Gauss-Seidel

The second method replaces all undefined values (NaN) in a grid
by relaxation, using the Gauss-Seidel method.
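
As a minimal sketch of the idea, assuming a plain list-of-lists grid, the function below repeatedly replaces each undefined cell by the average of its direct neighbors, reusing freshly updated values within the same sweep (which is what distinguishes Gauss-Seidel from Jacobi iteration). The name `gauss_seidel_fill`, its parameters, the first guess, and the edge handling are assumptions made for illustration and do not describe the library's implementation.

```python
import math


def gauss_seidel_fill(grid, max_iterations=500, epsilon=1e-6):
    """Fill NaN cells by relaxation; returns (has_converged, filled)."""
    nrows, ncols = len(grid), len(grid[0])
    filled = [row[:] for row in grid]
    holes = [(i, j) for i in range(nrows) for j in range(ncols)
             if math.isnan(grid[i][j])]

    # Assumption: start from the mean of the defined values.
    known = [v for row in grid for v in row if not math.isnan(v)]
    first_guess = sum(known) / len(known) if known else 0.0
    for i, j in holes:
        filled[i][j] = first_guess

    for _ in range(max_iterations):
        max_change = 0.0
        for i, j in holes:
            # Assumption: neighbors outside the grid are simply ignored.
            neighbors = [filled[a][b]
                         for a, b in ((i - 1, j), (i + 1, j),
                                      (i, j - 1), (i, j + 1))
                         if 0 <= a < nrows and 0 <= b < ncols]
            if not neighbors:
                continue
            new_value = sum(neighbors) / len(neighbors)
            max_change = max(max_change, abs(new_value - filled[i][j]))
            filled[i][j] = new_value  # updated in place, used immediately
        if max_change < epsilon:
            return True, filled
    return False, filled


nan = float("nan")
converged, result = gauss_seidel_fill([[0.0, nan, nan, 3.0]])
```

In this one-row example the two undefined cells relax towards a linear ramp between the defined end points; only the originally undefined cells are ever modified.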

In [None]:
has_converged, filled = pyinterp.fill.gauss_seidel(grid)

The image below illustrates the result:

In [None]:
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(121, projection=ccrs.PlateCarree())
lons, lats = np.meshgrid(grid.x, grid.y, indexing='ij')
ax.contourf(lons, lats, grid.array,
            transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title("Original MSS")
ax.set_extent([80, 170, -45, 30], crs=ccrs.PlateCarree())

ax = fig.add_subplot(122, projection=ccrs.PlateCarree())
ax.contourf(lons, lats, filled,
            transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title("MSS modified using Gauss-Seidel")
ax.set_extent([80, 170, -45, 30], crs=ccrs.PlateCarree())
plt.show()