# GEO877: Spatial Algorithms

## Practical 1: Points and Spatial Distance Measures
v1: class and sample data are embedded; the dataset will load if it is in the same folder as this file

### Problem
A dataset containing 10,000 records with Cartesian (x, y) and WGS84 coordinates (longitude, latitude) and a 'region' classifier is provided.

For all points, and separately for the points in each region, calculate the mean and standard deviation of the differences between the Cartesian and spherical distance measures over all pairs of points. Also calculate the maximum absolute error and the maximum relative error in distance.

A sample `Point` class is provided (embedded below) to help get you started...
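
As a starting point, the four statistics named in the problem can be sketched for two equal-length lists of distances. This is only an illustration under stated assumptions: the function name `error_summary` is hypothetical, the spherical distance is taken as the reference for the relative error, and the population standard deviation is used. Pairs with a zero reference distance are skipped to avoid division by zero.

```python
from statistics import fmean, pstdev


def error_summary(d_cartesian, d_spherical):
    """Summarise the differences between two lists of pairwise distances.

    Assumptions (not fixed by the practical): the spherical distance is the
    reference for the relative error, and the standard deviation is the
    population form.
    """
    diffs = [c - s for c, s in zip(d_cartesian, d_spherical)]
    abs_errors = [abs(d) for d in diffs]
    # skip pairs whose reference distance is zero (coincident points)
    rel_errors = [abs(c - s) / s
                  for c, s in zip(d_cartesian, d_spherical) if s > 0]
    return {
        "mean": fmean(diffs),
        "std": pstdev(diffs),
        "max_abs": max(abs_errors),
        "max_rel": max(rel_errors) if rel_errors else None,
    }


# toy distances, chosen only to exercise the function
summary = error_summary([10.0, 20.0], [8.0, 25.0])
```

If the sample standard deviation is preferred, `statistics.stdev` can be swapped in for `pstdev`.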


### Distance functions

#### Euclidean

$$d_{Euclidean}(a, b) = \sqrt{(a_{x} - b_{x})^{2} + (a_{y} - b_{y})^{2} }$$ 

#### Spherical

...
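
The spherical measure is left open above. One common choice is the haversine formula, sketched here as an assumption rather than as the required method. The function name `dist_haversine` and the mean Earth radius of 6,371 km are illustrative choices; the two cities are the sample data from the Data section.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in metres


def dist_haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * radius * asin(sqrt(a))


# sample cities from the Data section: x, y, lat, lng
bern = (600500, 206750, 47.01180, 7.44521)
zurich = (682000, 252750, 47.42045, 8.52532)

d_sphere = dist_haversine(bern[2], bern[3], zurich[2], zurich[3])
```

Because a sphere only approximates the Earth, the result depends on the radius chosen, which is worth keeping in mind when interpreting the errors.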

### Pseudocode/Flow diagram
...

### Classes and methods

In [1]:
# class and methods for a geometric point
# =======================================
from numpy import sqrt

class Point():
    # initialise
    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y
    
    # representation
    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'
        
    # calculate Euclidean distance between two points
    def distEuclidean(self, other):
        return sqrt((self.x-other.x)**2 + (self.y-other.y)**2)
    
    # calculate Manhattan distance between two points
    def distManhattan(self, other):
        return abs(self.x-other.x) + abs(self.y-other.y)

    # test for equality between Points
    def __eq__(self, other): 
        if not isinstance(other, Point):
            # don't attempt to compare against unrelated types
            return NotImplemented

        return self.x == other.x and self.y == other.y

    # we need this method so that the class behaves sensibly in sets and dictionaries
    def __hash__(self):
        return hash((self.x, self.y))


### Data

In [2]:
import pandas as pd

# sample data for testing
# -----------------------
# 2 cities with x, y, lat and lng
bern = (600500, 206750, 47.01180, 7.44521)
zurich = (682000, 252750, 47.42045, 8.52532)

# Flickr set of 10K records
# -------------------------
data_folder = "../../raw_data/"   # folder containing the dataset (here a relative path); change it if the file is elsewhere
data_file = "flickr_10000_uk_adm.csv"
input_string = data_folder + data_file
df = pd.read_csv(input_string, sep = ",")

print(df.info())  # show the data types of the entries in the frame
df[:3]  # output the first three rows of the frame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   photo_id   10000 non-null  int64  
 1   longitude  10000 non-null  float64
 2   latitude   10000 non-null  float64
 3   X          10000 non-null  float64
 4   Y          10000 non-null  float64
 5   adm1_code  10000 non-null  object 
 6   name       10000 non-null  object 
 7   region     10000 non-null  object 
dtypes: float64(4), int64(1), object(3)
memory usage: 625.1+ KB
None


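|   | photo_id | longitude | latitude | X | Y | adm1_code | name | region |
|---|----------|-----------|----------|---|---|-----------|------|--------|
| 0 | 1559 | -2.982906 | 56.456295 | 339520.5448 | 729782.1624 | GBR-2020 | Dundee | Eastern |
| 1 | 2534 | -4.287759 | 55.867930 | 256939.5208 | 666228.5591 | GBR-2004 | Glasgow | South Western |
| 2 | 5426 | -0.384564 | 51.828854 | 511419.5498 | 215704.3552 | GBR-2752 | Luton | East |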
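
One way to organise the "all points" and "per region" comparisons is to group coordinates by region and enumerate unordered pairs with `itertools.combinations`. The sketch below is a hedged illustration, not the intended solution: the function names and the `"all"` key are hypothetical, and the rows are the three records shown above, each of which falls in a different region, so the per-region lists come out empty here.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot

# (X, Y, region) for the three records shown above
rows = [
    (339520.5448, 729782.1624, "Eastern"),
    (256939.5208, 666228.5591, "South Western"),
    (511419.5498, 215704.3552, "East"),
]


def pairwise_euclidean(points):
    """Euclidean distance for every unordered pair of (x, y) points."""
    return [hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in combinations(points, 2)]


def distances_by_region(rows):
    """Pairwise distances for all rows and, separately, within each region."""
    groups = defaultdict(list)
    for x, y, region in rows:
        groups[region].append((x, y))
    # "all" is a hypothetical key; it assumes no region carries that name
    result = {"all": pairwise_euclidean([(x, y) for x, y, _ in rows])}
    for region, points in groups.items():
        result[region] = pairwise_euclidean(points)
    return result


result = distances_by_region(rows)
```

With the full frame, the same tuples could be supplied by `df[["X", "Y", "region"]].itertuples(index=False, name=None)`. Note that n points give n(n-1)/2 pairs, so 10,000 records produce roughly 50 million distances for the "all points" case.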


### Solution

In [None]:
# ... goes here
