# Finding nearest points and mapping values from those points

Written by Simon M. Mudd, last update 09/11/2021

This short tutorial covers the case where you have two sets of point data and want to map values from one dataset onto the nearest points in the other.

The example uses two datasets: a point dataset that represents a channel and includes elevation, drainage area, and other values, and a second dataset of channel width measurements.

## Import the necessary packages

In [1]:
import geopandas as gpd
import numpy as np
import pandas as pd

from scipy.spatial import cKDTree
from shapely.geometry import Point

## Load the necessary datasets

We have two datasets. One is the channel data and the other is data about channel width. This second dataset could be any set of points. 

In a later step we will merge these datasets by finding, for each point in one dataset, its nearest neighbour in the other (here, attaching the nearest channel data to each channel width point).

For this to work, **the two datasets must be in the same coordinate reference system**.

In the example below, we set `.crs` to define the coordinate reference system of each dataset. We know that one dataset is in `EPSG:4326` because it has latitude and longitude data. The other is in `EPSG:27700`, the British National Grid, because it is meant to mimic data collected by students in the field using GPS units that default to the British National Grid.

We then convert the width data from British National Grid to `EPSG:4326` using the `.to_crs` method.

In [8]:
# Load the channel data
dfA = pd.read_csv("el_study_chi_data_map.csv")
# Convert to a geopandas dataframe
gdfA = gpd.GeoDataFrame(
    dfA, geometry=gpd.points_from_xy(dfA.longitude, dfA.latitude))
# Tell geopandas which coordinate reference system the data are in, using an EPSG code.
# All major geographic projections and coordinate systems have one of these codes.
gdfA.crs = "EPSG:4326" 


# Load the width data
dfB = pd.read_csv("channel_width_test.csv")
gdfB = gpd.GeoDataFrame(
    dfB, geometry=gpd.points_from_xy(dfB.easting, dfB.northing))
# Set the coordinate reference system with its EPSG code, here the British National Grid.
gdfB.crs = "EPSG:27700" 

# IMPORTANT: we convert one of the datasets to the coordinate reference system of the other
gdfC = gdfB.to_crs(4326)
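As a quick sanity check on the conversion, the sketch below builds two one-row GeoDataFrames, reprojects the British National Grid one, and confirms that both then share a CRS. The coordinates are the first rows of the two example datasets; the variable names (`bng_points`, `channel_points`, `lonlat_points`) are illustrative and not part of the notebook.

```python
import geopandas as gpd

# One width measurement in British National Grid coordinates (easting, northing).
bng_points = gpd.GeoDataFrame(
    {"width": [3.0]},
    geometry=gpd.points_from_xy([366424], [662908]),
    crs="EPSG:27700",
)

# One channel point in longitude/latitude.
channel_points = gpd.GeoDataFrame(
    {"elevation": [389.98]},
    geometry=gpd.points_from_xy([-2.549256], [55.877436]),
    crs="EPSG:4326",
)

# Before conversion the two tables are in different systems.
print(bng_points.crs == channel_points.crs)  # False

# Reproject the width point so both tables use EPSG:4326.
lonlat_points = bng_points.to_crs(4326)
print(lonlat_points.crs == channel_points.crs)  # True

# The reprojected coordinates are now longitude and latitude in degrees.
print(lonlat_points.geometry.x.iloc[0], lonlat_points.geometry.y.iloc[0])
```

Comparing `.crs` on the two tables like this is a cheap guard to run before any nearest-neighbour matching.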

The next three cells show the first few rows of each dataset.

In [9]:
gdfA.head()

Unnamed: 0,latitude,longitude,chi,elevation,flow_distance,drainage_area,source_key,basin_key,geometry
0,55.877436,-2.549256,9.1343,389.98,4230.0,46852.0,0,0,POINT (-2.54926 55.87744)
1,55.877454,-2.549224,9.1119,389.98,4227.2,46864.0,0,0,POINT (-2.54922 55.87745)
2,55.877472,-2.549224,9.0961,389.95,4225.2,46912.0,0,0,POINT (-2.54922 55.87747)
3,55.87749,-2.549224,9.0803,389.95,4223.2,48104.0,0,0,POINT (-2.54922 55.87749)
4,55.877508,-2.549192,9.0581,389.91,4220.3,48160.0,0,0,POINT (-2.54919 55.87751)


In [10]:
gdfB.head()

Unnamed: 0,easting,northing,width,geometry
0,366424,662908,3.0,POINT (366424.000 662908.000)
1,366393,662855,2.0,POINT (366393.000 662855.000)
2,366365,662798,2.2,POINT (366365.000 662798.000)
3,366346,662732,1.8,POINT (366346.000 662732.000)
4,366302,662688,1.5,POINT (366302.000 662688.000)


In [11]:
gdfC.head()

Unnamed: 0,easting,northing,width,geometry
0,366424,662908,3.0,POINT (-2.53795 55.85818)
1,366393,662855,2.0,POINT (-2.53844 55.85771)
2,366365,662798,2.2,POINT (-2.53888 55.85719)
3,366346,662732,1.8,POINT (-2.53918 55.85660)
4,366302,662688,1.5,POINT (-2.53987 55.85620)


## Add the function for combining datasets

The function below merges two datasets using nearest neighbours. 
**You don't need to change anything in this function.**
Each point in the first dataframe keeps its own data and gains the attributes of its nearest neighbour in the second dataframe, plus the distance to that neighbour in a `dist` column.

In [12]:
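def ckdnearest(gdA, gdB):
    """For each point in gdA, attach the attributes of the nearest point in gdB.

    Returns gdA with the columns of gdB (except geometry) and a `dist` column
    giving the distance to the nearest point, in the units of the coordinates.
    """
    # Extract the (x, y) coordinates of every point as an (n, 2) array
    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    # Build a k-d tree on gdB and find the nearest gdB point for each gdA point
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    # Pull the matching rows from gdB, dropping their geometry so gdA's is kept
    gdB_nearest = gdB.iloc[idx].drop(columns="geometry").reset_index(drop=True)
    # Join side by side: gdA's columns, the nearest gdB columns, and the distance
    gdf = pd.concat(
        [
            gdA.reset_index(drop=True),
            gdB_nearest,
            pd.Series(dist, name='dist')
        ],
        axis=1)

    return gdf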
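
To see what `btree.query` returns inside the function, here is a minimal sketch on plain NumPy arrays. The coordinates and the names `reference_xy` and `query_xy` are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical coordinates: three "reference" points and two "query" points.
reference_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
query_xy = np.array([[1.0, 0.0], [0.0, 7.0]])

# Build the tree on the reference points, then ask for the single
# nearest reference point (k=1) to each query point.
tree = cKDTree(reference_xy)
dist, idx = tree.query(query_xy, k=1)

print(idx)   # [0 2]   row index of the nearest reference point
print(dist)  # [1. 3.] Euclidean distance to that point
```

`idx` is what the function passes to `gdB.iloc[...]` to pull out the matching rows, and `dist` becomes the `dist` column.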


## Merge the two files!

The first few rows of the merged data are shown below. The column order (width columns first, then channel columns) is consistent with passing the width points `gdfC` as the first dataset and the channel points `gdfA` as the second. The `dist` column is the distance to the nearest channel point, in degrees because both datasets are in `EPSG:4326`.

Unnamed: 0,easting,northing,width,geometry,latitude,longitude,chi,elevation,flow_distance,drainage_area,source_key,basin_key,dist
0,366424,662908,3.0,POINT (-2.53795 55.85818),55.858165,-2.537945,1.5072,231.75,1565.8,3597400.0,35,4,2e-05
1,366393,662855,2.0,POINT (-2.53844 55.85771),55.8577,-2.538398,1.6152,234.84,1661.4,3479300.0,35,4,4.2e-05
2,366365,662798,2.2,POINT (-2.53888 55.85719),55.857181,-2.538883,1.7068,237.54,1741.7,3432600.0,35,4,1.1e-05
3,366346,662732,1.8,POINT (-2.53918 55.85660),55.856589,-2.53921,1.8081,240.98,1830.1,3410200.0,35,4,3.6e-05
4,366302,662688,1.5,POINT (-2.53987 55.85620),55.856214,-2.539885,1.8845,242.72,1896.6,3393700.0,35,4,2e-05

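
A minimal sketch of the merge step with made-up data, assuming the function is called with the width points first and the channel points second, which matches the column order of the merged table above. The function is repeated so the block runs on its own; the coordinates, attribute values, and the names `channel`, `widths`, and `merged` are hypothetical.

```python
import geopandas as gpd
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def ckdnearest(gdA, gdB):
    # Same logic as the function defined earlier, repeated so this block is self-contained.
    nA = np.array(list(gdA.geometry.apply(lambda x: (x.x, x.y))))
    nB = np.array(list(gdB.geometry.apply(lambda x: (x.x, x.y))))
    btree = cKDTree(nB)
    dist, idx = btree.query(nA, k=1)
    gdB_nearest = gdB.iloc[idx].drop(columns="geometry").reset_index(drop=True)
    return pd.concat(
        [gdA.reset_index(drop=True), gdB_nearest, pd.Series(dist, name="dist")],
        axis=1,
    )


# Hypothetical channel points (longitude/latitude) with an elevation attribute.
channel = gpd.GeoDataFrame(
    {"elevation": [240.0, 235.0, 230.0]},
    geometry=gpd.points_from_xy(
        [-2.5400, -2.5390, -2.5380], [55.8560, 55.8570, 55.8580]
    ),
    crs="EPSG:4326",
)

# Hypothetical width measurements taken close to the first and last channel points.
widths = gpd.GeoDataFrame(
    {"width": [1.5, 3.0]},
    geometry=gpd.points_from_xy([-2.5401, -2.5381], [55.8560, 55.8580]),
    crs="EPSG:4326",
)

# Each width point picks up the attributes of its nearest channel point.
merged = ckdnearest(widths, channel)
print(merged[["width", "elevation", "dist"]])
```

The result has one row per width point. To get distances in metres rather than degrees, both datasets could be reprojected to a projected CRS such as `EPSG:27700` with `.to_crs` before the call.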