# Computing the centroid
I want to be able to compute the centroid of a raster while taking into account the values of the raster. In other words, I want the centre of mass of the raster.

I found a way of doing this [here](https://stackoverflow.com/a/71459644) and below I explore why this works.

Make an example DataArray:

In [1]:
import xarray as xr
import numpy as np


In [2]:
A = xr.DataArray(np.array([[0,0,1,1,0,0],
                           [0,1,1,1,0,0],
                           [0,1,1,1,0,0],
                           [0,1,1,0,0,0],
                           [0,0,0,0,1,0]]),
                 dims=['y','x'],
                 coords={'x': [10,20,30,40,50,60],
                         'y': [50,40,30,20,10]})

In [3]:
A

One way of computing the center of mass of this array is to use a scipy function:

In [4]:
from scipy import ndimage
CM = ndimage.center_of_mass(A.values)
x_center = np.interp(CM[1], np.arange(0,len(A.x)), A.x) 
x_center

31.818181818181817

This has the disadvantage that it takes us out of xarray, risking getting mixed up about the orientation of the array.
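To make the orientation issue concrete, here is a minimal sketch of getting both coordinates out of the scipy result. It assumes the same example array as above; the names `row_idx`, `col_idx`, `x_center_scipy` and `y_center_scipy` are just for illustration. Note that mapping a fractional index to a coordinate with `np.interp` only gives the true centre of mass when the coordinates are evenly spaced, as they are here.

```python
import numpy as np
import xarray as xr
from scipy import ndimage

# same example array as above
A = xr.DataArray(
    np.array([[0, 0, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0]]),
    dims=["y", "x"],
    coords={"x": [10, 20, 30, 40, 50, 60], "y": [50, 40, 30, 20, 10]},
)

# force a known axis order before leaving xarray, so axis 0 is y and axis 1 is x
values = A.transpose("y", "x").values

# center_of_mass returns fractional indices in axis order: (row, column) = (y, x)
row_idx, col_idx = ndimage.center_of_mass(values)

# convert fractional indices to coordinate values (valid for evenly spaced coords)
y_center_scipy = np.interp(row_idx, np.arange(A.sizes["y"]), A.y.values)
x_center_scipy = np.interp(col_idx, np.arange(A.sizes["x"]), A.x.values)
```

The explicit `transpose` is the part that guards against mixing up the axes: without it, the meaning of `CM[0]` and `CM[1]` depends on however the dimensions happen to be ordered.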

This is the method that I found in the Stack Overflow post.

In [5]:
A.x.weighted(A).mean()

It gives the same answer as the scipy method, but initially I wasn't sure what it was doing, because `A.x` is 1D while the array it is being weighted by (`A`) is 2D. The answer is that `A.x` is broadcast to the shape of `A` before the weighted mean is computed.

It makes sense that this would give us the x-coordinate of the centroid. To make sure this is what is happening, we can do the broadcasting manually and check that we get the same answer:

In [6]:
# extrude x and y to 2D meshgrids
x, y = np.meshgrid(A.x.values, A.y.values)
# put the x-coordinate meshgrid into an xarray
x_mesh_xr = xr.DataArray(x, dims=['y','x'], coords={'x': A.x, 'y': A.y})
# and use A to weight the x meshgrid in the mean calculation. 
x_mesh_xr.weighted(A).mean()

This is the same answer, so I am confident this is what is happening with 
```
A.x.weighted(A).mean()
```
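Another way to see it is to write the weighted mean out by hand as sum(weights × coordinate) / sum(weights). The sketch below assumes the same example array; `x_center_manual` and `y_center_manual` are names made up for illustration, and the y-coordinate is included only to show that the same pattern applies to the other axis.

```python
import numpy as np
import xarray as xr

# same example array as above
A = xr.DataArray(
    np.array([[0, 0, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0]]),
    dims=["y", "x"],
    coords={"x": [10, 20, 30, 40, 50, 60], "y": [50, 40, 30, 20, 10]},
)

# A.x (1D along 'x') broadcasts against A (2D) in the product,
# so every cell's value is multiplied by that cell's x-coordinate
x_center_manual = (A * A.x).sum() / A.sum()

# same pattern for the y-coordinate of the centre of mass
y_center_manual = (A * A.y).sum() / A.sum()

# compare with the weighted-mean shortcut
x_center_weighted = A.x.weighted(A).mean()
y_center_weighted = A.y.weighted(A).mean()
```

For this array the weights sum to 11 and the weighted x-coordinates sum to 350, which gives 350 / 11 ≈ 31.82, in line with the scipy result.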

Next is to test how this approach works with a 3D array.

In [34]:
temp = xr.tutorial.load_dataset('air_temperature')
A3d = temp.air.isel(time=slice(0,30))
A3d

We want to do the center of mass calculation at every time step. It would be convenient if the following did this:

In [35]:
lon_center_xr = A3d.lon.weighted(A3d).mean(dim=['lat','lon'])
lon_center_xr

It seems to work, but to check, we can loop over the time dimension and compute the x-coordinate of the center of mass for each time step:

In [51]:
lon_centers = []
times = A3d.time.values
for time in times:
    lon_centers.append(A3d.lon.weighted(A3d.sel(time=time)).mean(dim = ['lon', 'lat']).values)
# convert this to a numpy array
lon_centers = np.array(lon_centers)


This is the same as simply calling ```A3d.lon.weighted(A3d).mean(dim=['lat','lon'])```, so this appears to be working. 

In [53]:
all(lon_centers == lon_center_xr.values)

True
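Exact equality between floats happened to hold here, but a tolerance-based comparison is a safer check in general. Below is a minimal sketch of the same loop-versus-vectorised comparison using `np.allclose`. To keep it self-contained it uses a small synthetic array `W` (random positive weights, made-up coordinates) rather than the tutorial dataset; all variable names are hypothetical.

```python
import numpy as np
import xarray as xr

# synthetic stand-in for a (time, lat, lon) field with strictly positive values
rng = np.random.default_rng(0)
W = xr.DataArray(
    rng.random((4, 3, 5)) + 0.1,
    dims=["time", "lat", "lon"],
    coords={
        "time": np.arange(4),
        "lat": [10.0, 20.0, 30.0],
        "lon": [200.0, 210.0, 220.0, 230.0, 240.0],
    },
)

# vectorised: one centre-of-mass longitude per time step
lon_center_vec = W.lon.weighted(W).mean(dim=["lat", "lon"])

# explicit loop over time steps, as in the check above
lon_center_loop = np.array([
    float(W.lon.weighted(W.isel(time=i)).mean(dim=["lat", "lon"]))
    for i in range(W.sizes["time"])
])

# compare with a tolerance instead of requiring bit-identical floats
matches = np.allclose(lon_center_loop, lon_center_vec.values)
```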