problem with the rsd values compared with the aspace or phonTools R packages #3

Open
mlaloux opened this Issue May 25, 2016 · 10 comments

Projects

None yet

2 participants

@mlaloux
mlaloux commented May 25, 2016

I have a problem with your majorsd and minorrsd values: they are smaller than those of the aspace R package (Standard Distance Deviation). Therefore the resulting ellipse is also smaller.

If I compare the results with the R phonTools package (Standard deviation Ellipse ), it is possible to see that the standard deviations values are also different.

@havatv
Owner
havatv commented May 26, 2016

Thank you for your feedback @mlaloux ! I tried to implement the method that I found in the reference (“The Standard Deviational Ellipse; An Updated Tool for Spatial Description”. Robert S. Yuill. Geografiska Annaler. Series B, Human Geography. Vol. 53, No. 1 (1971), pp. 28-39. URL: https://www.jstor.org/stable/490885). I may have missed something. If you have the time, it would be great if you could have a look at the code and compare to Yuill's formulae. I will also check the R packages try to find out what causes the discrepancies. If you could you provide some details about your tests (size of data set, differences in the results, ...), that would be great.

@mlaloux
mlaloux commented May 26, 2016 edited

You can use the processing R scripts (.rsx) to compare.

  • For aspace look at the script in my answer in Standard Deviational Ellipse
  • For phonTools
    ##Layer=vector
    ##stdv=number 2
    ##Output = output vector
    library(phonTools)
    coords<-coordinates(Layer)
    fin_coord = sdellipse (coords,col = 2,stdev = stdv ,show = FALSE)[ ,c(1,2)]
    sp = SpatialPolygons( list(Polygons(list(Polygon(fin_coord )), 1)))
    sp.df <- data.frame( ID=1:length(sp))
    res = SpatialPolygonsDataFrame(sp,sp.df)
    Output<-res

(stdev = the number of standard deviations to be enclosed by the ellipse. You can run the script with values 0.5, 1, 1.5 and 2 for example)

For my test data with 20 points, the discrepancies are in the minor rsd, 724.94 for your plugin and 1771.17 for aspace (major rsd respectively 1132.85 and 1133.43). The resulting ellipse of your plugin correspond to a Standard Deviation ellipse ( with phonTools) of 1 and the aspace ellispse, 1.5)

Mathematically, you can also use the covariance matrix and its eigenvalues

  cov = np.cov(x, y)
  eigenvalues, eigenvectors  = np.linalg.eig(cov)
  lambda_ = np.sqrt(eigenvalues)

And now, you can plot the ellipses ( How to draw a covariance error ellipse? , Creating a Confidence Ellipses in a sccatterplot using matplotlib or Plot an ellipse) and save the result as shapefiles (very easy, same results as phonTools)

@havatv
Owner
havatv commented May 27, 2016

Thanks again @mlaloux . I have tested your R script (with an sdv of 1.0). The ellipses produced by the SDEllipse plugin and your script are very close to identical when run on a couple of point data sets (UTM 32N/33N, WGS1984). Could it have something to do with the coordinate reference system?

@mlaloux
mlaloux commented May 27, 2016 edited

When you work with numerical data, it is better use Numpy, which greatly simplifies your script (less for loops) or Lists comprehension. The solution here is based on the calc_sde.R of aspace and on Plot an ellipse for the ellipse. The results are the same as aspace and it is an adaption of one of my scripts in pure Python with Fiona and Shapely (much easier to do)

  1. Introduction
import numpy as np
weight = 1 #here
x = np.array(x) 
y = np.array(y) 
meanx = np.mean(x) # or  np.sum(x)/n if weight 
meany = np.mean(y)  #  np.sum(y)/n weight 
xd = x-meanx
yd = y-meany
xyw = sum(xd*yd) * weight
x2w = sum(xd**2) * weight
y2w = sum(yd**2) * weight
  1. The angles (compute weighted angles)
  top1 = weight*x2w - weight*y2w
  top2 = np.sqrt( (weight*x2w - weight*y2w)**2 + 4 * (weight*xyw)**2)
  bottom = (2 * weight *xyw)
  tantheta1 = (top1 + top2 ) / bottom
  tantheta2 = (top1 - top2 ) / bottom
  theta1 = np.degrees(np.arctan(tantheta1))
  theta2 = np.degrees(np.arctan(tantheta2))
  print theta1,theta2 # in degrees
  31.6074905402 -58.3925094598    # 58.3925094598 -31.6074905402 with your plugin
  1. compute the SDE
 costheta = np.cos(np.radians(theta1))
 sintheta = np.sin(np.radians(theta1))
 sin2theta = sintheta**2
 cos2theta = costheta**2
 sinthetacostheta =  sintheta * costheta
 SDEx = np.sqrt(2) * np.sqrt( ((x2w)*(cos2theta) - 2*(xyw)*(sinthetacostheta) + (y2w)*(sin2theta))  / (n - 2)) 
 SDEy = np.sqrt(2) * np.sqrt( ((x2w)*(sin2theta) + 2*(xyw)*(sinthetacostheta) + (y2w)*(cos2theta)) / (n - 2))
 print SDEx, SDEy
 1133.42556259 1771.17313503  #1132.85 and 724.94  with your plugin

You can also calculate the eccentricity of the ellipse

  eccentricity = np.sqrt(1 - ((min(SDEx, SDEy)**2)/(max(SDEx, SDEyy)**2)))
  1. plot the ellipse (with 100 points in your case -> int(360/3.6, 360 only gives 360 points)
angle = theta1
pts = np.zeros((int(360/3.6)+1, 2))
beta = -np.radians(theta1)
sin_beta = np.sin(beta)
cos_beta = np.cos(beta)
alpha = np.radians(np.r_[0.:360.:1j*(int(360/3.6)+1)])
sin_alpha = np.sin(alpha)
cos_alpha = np.cos(alpha)  
pts[:, 0] = meanx + (SDEx * cos_alpha * cos_beta - SDEy * sin_alpha * sin_beta)
pts[:, 1] = meany + (sSDEx * cos_alpha * sin_beta + SDEy * sin_alpha * cos_beta)

and it is easy to create a Polygon from the points (you can simplify this part)

geom = QgsGeometry.fromPolygon([[QgsPoint(xy[0],xy[1]) for xy in pts]]) 
 ....
@havatv
Owner
havatv commented May 27, 2016

@mlaloux , thank you for your help in improving the code. Many good points. I will go through it.

But I am still puzzled that I can not reproduce the large deviations / errors that you get. The deviations I get are very small (and look like they could be due to the sample / population variance - dividing by N-1 or N). I am interested in finding the source of the errors. It would therefore be interesting to see your dataset.

Could you test with the points in the attached zip-compressed CSV-file?
points.csv.zip

@mlaloux
mlaloux commented May 28, 2016 edited

I test your points and I have the same problem with the minor rsd.

in R

You can test the complete aspace rscript

##Ellipse=group
##Layer=vector
##Field=Field Layer
##SDE_shape = output vector
##weighted=boolean True.
library(aspace)
coords<-coordinates(Layer)
if(weighted) {
calc_sde(id=1, filename="/Users/me/SDE_Output.txt", centre.xy=NULL, calccentre=TRUE, weighted=TRUE, weights=Layer[[Field]], points=coords, verbose=FALSE)
}else {
calc_sde(id=1, filename="/Users/nouvportable/SDE_Output.txt", centre.xy=NULL, calccentre=TRUE, weighted=FALSE, weights=NULL, points=coords, verbose=FALSE)
}
fin_coords = c(sdeloc[2],sdeloc[3])
sp = SpatialPolygons( list(  Polygons(list(Polygon(fin_coords)), 1)))
res = SpatialPolygonsDataFrame(sp, sdeatt)
proj4string(res) <- Layer@proj4string
SDE_shape = res

Results

weighted (in red the aspace solution, in blue the plugin )


aspace
380,308365072648, 221,164903371421, 72,0691130562294, 144,56539302743, 85,0285710508412
plugin
380,308365072648, 221,164903371421, 49,9513386519936, 100,198748093897, -1,48402852310311, 0,086767803691782

non weighted


aspace
320,889208633094,215,661390887291, 67,0310153611684, 165,633050304211, 77,1191513516296
plugin
320,889208633094, 215,661390887291, 44,1252162466994, 109,033021846596 -1,34598310742977, 0,224813219365126

Comparison with the classical standard deviations values (Bivariate normal ellipse, covariance ellipse)

You can compute that with the phonTools, car, Ellipse R packages and many others (I have rscripts for that) . There also Python modules.

In ArcGIS: How Directional Distribution (Standard Deviational Ellipse) works

  • a one standard deviation ellipse polygon will cover approximately 68 percent of the features
  • two standard deviations will contain approximately 95 percent of the features
  • three standard deviations will cover approximately 99 percent of the feature

Therefore, you solution covers less than the ashape solution (area)

In Python

I try to adapt my solution to your script (but without for loops)

x = np.array(x) 
y = np.array(y) 
meanx = sum(x*weights )/sum(weights) 
meany = sum(y*weights )/sum(weights) 
xd = x-meanx
yd = y-meany
sumweight = sum(weights)
xyw= sum(xd*yd* weights)
x2w = sum(np.power(xd,2)* weights)
y2w = sum(np.power(yd,2)* weights)
print meanx,meany
380.308365073 221.164903371

Now, the angles and the SDEs with the algorithm of calc_sde.R (aspace)

top1 = x2w - y2w
top2 = np.sqrt((x2w - y2w)**2 +4 * (xyw)**2)
bottom = (2 * xyw)
tantheta1 = (top1 + top2 ) / bottom
tantheta2 = (top1 - top2 ) / bottom
theta1 = np.degrees(np.arctan(tantheta1))
theta2 = np.degrees(np.arctan(tantheta2))
print theta1,theta2 
85.0285710508 -4.97142894916

The result of your plugin are

majorangle =np.degrees(0.086767803691782)
print majorangle
4.971428949158752
minorangle = np.degrees(-1.48402852310311)
 -85.028571050840995

Now, the SDE

costheta = np.cos(np.radians(theta1))
sintheta = np.sin(np.radians(theta1))
sigmax = np.sqrt(2) * np.sqrt( ( (sum(xd**2*weights))*(cos2theta) - 2*(sum(xd*yd*weights))*  (sinthetacostheta) + (sum(yd**2*weights))*(sin2theta) ) / (sum(weights) - 2) )
sigmay = np.sqrt(2) * np.sqrt( ( (sum(xd**2*weights))*(sin2theta) + 2*(sum(xd*yd*weights))*(sinthetacostheta) + (sum(yd**2*weights))*(cos2theta) ) / (sum(weights) - 2) )
print sigmax, sigmay
72.069113056229341, 144.56539302742954

The result of your plugin is 49.951338652, 100.198748094

My conclusions

Therefore, the angles are the same (except the sign) and the discrepancies occurs in the computation of the SDEs and not on the coordinate reference system (purely geometric)

@havatv
Owner
havatv commented May 30, 2016 edited

Thanks a lot!
The lack of correction for degrees of freedom should explain a small part of this (difference from the phonTools ellipse), while the lack of the sqrt(2) correction as used by Crimestat should explain the rest (difference from the aspace ellipse with an sdv of 1.0). The reported angles from the plugin are counter-clockwise relative to the first axis, while the aspace angle seems to be clockwise relative to the second axis.

I have now tried to explain the behaviour of the plugin as follows:

A QGIS plugin to create a standard deviational ellipse, according to the method presented by Robert Yuill (1971).
This method does not give a radius equal to the standard distance deviation for a random point distribution. To achieve this, the SDs should be multiplied by sqrt(2), as explained in the CrimeStat
documentation.
The method does not correct for degrees of freedom.
The plugin works on point vector layers.
The user can choose to use only selected features, and can choose an attribute for weighting.
A polygon vector layer with the standard deviational ellipse is produced, containing the attributes: meanx, meany, majorangle, minorangle, majorsd and minorsd.
Angles are counter-clockwise relative to the first axis.

What do you think?

@mlaloux
mlaloux commented May 30, 2016 edited

The principal problem is that some people use the areas of the SDE (ecology for example) and the results differ.
The SDE problem is more complicated than the Robert S. Yuill solution, read Clarifying the Standard Deviational Ellipse or Confidence Analysis of Standard Deviational Ellipse and Its Extension into Higher Dimensional Euclidean Space for example.

Here is the "synthesis" of the problem

  • dotted line : elliptical contours of the normal probability or confidence levels with car
  • in magenta, a simple solution with eigenvectors/eigenvalues of the variance matrix. You can do the same processing with the covariance matrix
  • in blue, your plugin
  • in red, the ashape solution

Now, what is the best solution ?

From "Clarifying the Standard Deviational Ellipse"

More than seventy years ago there were two interesting articles published by Lefever (1926) and Furfey (1927) in The American Journal of Sociology. Both discussed what Lefever called “standard deviational ellipse” and its application in spatial analysis for a set of geographical units regarded as point set in two-dimensional space. “Standard deviational ellipse,”or SDE for short, is not at all an ellipse as its name implies.This fact was clarified by Furfey in 1927.Despite this,SDE has been widely introduced and applied as an ellipse in later studies. To the present, it has remained unclear what the actual curve of SDE is.

From "Confidence Analysis of Standard Deviational Ellipse and Its Extension into Higher Dimensional Euclidean Space"

Although SDE has extensive applications in various fields ever since 1926, it still has not been correctly clarified sometimes. For instance, from the latest resources in ArcGIS Help 10.1 describing how standard deviational ellipse works, it is stated that one, two and three standard deviation(s) can encompass approximately 68%, 95% and 99% of all input feature centroids respectively, supposing the features concerned follow a spatially normal distribution. However, this content corresponds to the well-known 3-sigma rule with respect to univariate normal distribution, rather than bivariate case. Worse still, there is even an attached illustration therein depicting several bivariate geographical features located within a planar map. Obviously, such confusing interpretation may mislead the GIS users to believe the univariate 3-sigma rule remains valid in two-dimensional Euclidean space, or even higher dimensions.

and

The upcoming discussed method presented by Yuill was actually an amelioration of Lefever’s original model despite of suffering from certain criticisms

@havatv
Owner
havatv commented May 31, 2016 edited

It is probably useful to offer a choice of method (Yuill, aspace/CrimeStat, ...) through the plugin.
I have uploaded a revised version (1.1) of the plugin with options for Yuill and CrimeStat/aspace. More could be added.

@havatv havatv self-assigned this Jun 1, 2016
@mlaloux
mlaloux commented Jun 1, 2016 edited

Very good, I can give you others implementations in Python if you want (as pure covariance/variance error ellipses with Chi-Square confidence intervals or bivariate normal ellipses witch are real ellispes )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment