Remove erroneous points from a dataset of coordinates

ljoly/betterpath

betterpath

The goal of this project is to develop a program that, given a series of points (latitude, longitude, timestamp) for a courier journey from A to B, disregards potentially erroneous points.

Usage

go build cmd/betterpath.go
./betterpath assets/points.csv

Issues

Understanding the problem

The first thing to do was to spot the erroneous points from the dataset given with the subject.
I plotted the coordinates on a map and noticed that some points were far from the main path:

[Image: the coordinates plotted on a map]

Heuristics

The second thing to do was to identify which aspects of these points made them erroneous.
I chose to work on the distances between the points rather than on the courier's speed, because computing the speed would have required trigonometric computations and added overhead.
Since the "wrong" distances seemed significantly higher than the "normal" ones, I decided to work on the standard deviation of the distances, inspired by my earlier work on linear and logistic regression and by an interesting article about the normal distribution.
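
As an illustration of this heuristic, here is a minimal sketch of how the distances between consecutive points and their standard deviation could be computed. It is not taken from the repository: the helper names (distance, distances, meanAndStdDev) are hypothetical, and the distance is assumed to be a plain Euclidean distance on the raw coordinates, which avoids trigonometry but only approximates real-world distance.

```go
package main

import "math"

// Point mirrors the structure described in this README:
// x and y hold the coordinates, t holds the timestamp.
type Point struct {
	x float64
	y float64
	t int64
}

// distance returns the straight-line distance between two points, treating
// the coordinates as plane coordinates (an assumption made for this sketch).
func distance(a, b Point) float64 {
	return math.Hypot(b.x-a.x, b.y-a.y)
}

// distances returns the distance between each pair of consecutive points.
// The result has len(points)-1 entries, or none for fewer than two points.
func distances(points []Point) []float64 {
	if len(points) < 2 {
		return nil
	}
	out := make([]float64, 0, len(points)-1)
	for i := 1; i < len(points); i++ {
		out = append(out, distance(points[i-1], points[i]))
	}
	return out
}

// meanAndStdDev returns the mean and the population standard deviation
// of values. Both are zero for an empty slice.
func meanAndStdDev(values []float64) (mean, stdDev float64) {
	if len(values) == 0 {
		return 0, 0
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	mean = sum / float64(len(values))

	var sumSq float64
	for _, v := range values {
		sumSq += (v - mean) * (v - mean)
	}
	return mean, math.Sqrt(sumSq / float64(len(values)))
}
```

With such helpers, a distance that is much larger than the others stands out against the standard deviation of the whole series.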

Position of the points

Another matter was the position of the wrong points once the points were sorted by timestamp. Whether they were contiguous, at the beginning of the path, or at the end, the stored data had to be updated carefully.
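
The sorting step mentioned above could look like the following sketch. The function name sortByTimestamp is hypothetical, and the choice of a stable sort (so that points sharing a timestamp keep their input order) is an assumption, not something the repository is known to do.

```go
package main

import "sort"

// Point mirrors the structure described in this README.
type Point struct {
	x float64
	y float64
	t int64
}

// sortByTimestamp orders the points chronologically, in place.
// The sort is stable: points with equal timestamps keep their input order.
func sortByTimestamp(points []Point) {
	sort.SliceStable(points, func(i, j int) bool {
		return points[i].t < points[j].t
	})
}
```

Once the points are in chronological order, an erroneous point can only sit at the start, at the end, or between two neighbours, which are the cases listed above.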

Design decisions

The program uses two slices:

  • one for storing the coordinates and the timestamp of each point, using a structure Point
type Point struct {
	x float64
	y float64
	t int64
}
  • one for storing the distances between consecutive points, used to calculate the standard deviation

The two slices are traversed only once, simultaneously. When a distance is considered too high (i.e. greater than the standard deviation), the erroneous point of the pair is identified and deleted, and the distances are updated and checked again, because two erroneous points can be contiguous.
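
The single pass described above can be sketched as follows. This is a simplified illustration, not the repository's code. Instead of maintaining a separate slice of distances, it recomputes the distance from the last kept point, which has the same effect as updating the distance after a deletion. The names stdDevOfSteps and removeOutliers are hypothetical, the distance is assumed to be Euclidean on the raw coordinates, and the rule used to decide whether the very first point is erroneous is an assumption of this sketch. It does not handle several contiguous erroneous points at the very beginning of the path.

```go
package main

import "math"

// Point mirrors the structure described in this README.
type Point struct {
	x float64
	y float64
	t int64
}

// distance treats the coordinates as plane coordinates (an assumption).
func distance(a, b Point) float64 {
	return math.Hypot(b.x-a.x, b.y-a.y)
}

// stdDevOfSteps returns the population standard deviation of the distances
// between consecutive points, or zero for fewer than two points.
func stdDevOfSteps(points []Point) float64 {
	n := len(points) - 1
	if n < 1 {
		return 0
	}
	var sum, sumSq float64
	for i := 1; i < len(points); i++ {
		d := distance(points[i-1], points[i])
		sum += d
		sumSq += d * d
	}
	mean := sum / float64(n)
	// Guard against a tiny negative value caused by rounding.
	return math.Sqrt(math.Max(0, sumSq/float64(n)-mean*mean))
}

// removeOutliers walks the time-ordered points once and keeps a point only
// if its distance from the last kept point does not exceed threshold.
// The input slice is left unmodified.
func removeOutliers(points []Point, threshold float64) []Point {
	if len(points) < 3 {
		return append([]Point(nil), points...)
	}
	start := 0
	// Beginning of the path: if the first step is too long but the second
	// one is not, the first point is taken to be the erroneous one.
	if distance(points[0], points[1]) > threshold &&
		distance(points[1], points[2]) <= threshold {
		start = 1
	}
	kept := []Point{points[start]}
	for _, p := range points[start+1:] {
		last := kept[len(kept)-1]
		if distance(last, p) > threshold {
			// p is dropped. The next point is compared with the same
			// kept point, which covers contiguous erroneous points.
			continue
		}
		kept = append(kept, p)
	}
	return kept
}
```

A caller would sort the points by timestamp first, then call removeOutliers(points, stdDevOfSteps(points)) to use the standard deviation as the threshold.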
