problem with the rsd values compared with the aspace or phonTools R packages #3

Open
opened this Issue May 25, 2016 · 10 comments

Projects
None yet
2 participants

mlaloux commented May 25, 2016

 I have a problem with your majorsd and minorrsd values: they are smaller than those of the aspace R package (Standard Distance Deviation). Therefore the resulting ellipse is also smaller. If I compare the results with the R phonTools package (Standard deviation Ellipse ), it is possible to see that the standard deviations values are also different.
Owner

havatv commented May 26, 2016 • edited Edited 1 time havatv edited May 15, 2017 (most recent)

 Thank you for your feedback @mlaloux ! I tried to implement the method that I found in the reference (“The Standard Deviational Ellipse; An Updated Tool for Spatial Description”. Robert S. Yuill. Geografiska Annaler. Series B, Human Geography. Vol. 53, No. 1 (1971), pp. 28-39. URL: https://www.jstor.org/stable/490885). I may have missed something. If you have the time, it would be great if you could have a look at the code and compare to Yuill's formulae. I will also check the R packages and try to find out what causes the discrepancies. If you could you provide some details about your tests (size of data set, differences in the results, ...), that would be great.

mlaloux commented May 26, 2016 • edited Edited 1 time mlaloux edited May 26, 2016 (most recent)

 You can use the processing R scripts (.rsx) to compare. For aspace look at the script in my answer in Standard Deviational Ellipse For phonTools ##Layer=vector ##stdv=number 2 ##Output = output vector library(phonTools) coords<-coordinates(Layer) fin_coord = sdellipse (coords,col = 2,stdev = stdv ,show = FALSE)[ ,c(1,2)] sp = SpatialPolygons( list(Polygons(list(Polygon(fin_coord )), 1))) sp.df <- data.frame( ID=1:length(sp)) res = SpatialPolygonsDataFrame(sp,sp.df) Output<-res (stdev = the number of standard deviations to be enclosed by the ellipse. You can run the script with values 0.5, 1, 1.5 and 2 for example) For my test data with 20 points, the discrepancies are in the minor rsd, 724.94 for your plugin and 1771.17 for aspace (major rsd respectively 1132.85 and 1133.43). The resulting ellipse of your plugin correspond to a Standard Deviation ellipse ( with phonTools) of 1 and the aspace ellispse, 1.5) Mathematically, you can also use the covariance matrix and its eigenvalues `````` cov = np.cov(x, y) eigenvalues, eigenvectors = np.linalg.eig(cov) lambda_ = np.sqrt(eigenvalues) `````` And now, you can plot the ellipses ( How to draw a covariance error ellipse? , Creating a Confidence Ellipses in a sccatterplot using matplotlib or Plot an ellipse) and save the result as shapefiles (very easy, same results as phonTools)
Owner

havatv commented May 27, 2016

 Thanks again @mlaloux . I have tested your R script (with an sdv of 1.0). The ellipses produced by the SDEllipse plugin and your script are very close to identical when run on a couple of point data sets (UTM 32N/33N, WGS1984). Could it have something to do with the coordinate reference system?

mlaloux commented May 27, 2016 • edited Edited 1 time mlaloux edited May 27, 2016 (most recent)

 When you work with numerical data, it is better use Numpy, which greatly simplifies your script (less for loops) or Lists comprehension. The solution here is based on the calc_sde.R of aspace and on Plot an ellipse for the ellipse. The results are the same as aspace and it is an adaption of one of my scripts in pure Python with Fiona and Shapely (much easier to do) Introduction ``````import numpy as np weight = 1 #here x = np.array(x) y = np.array(y) meanx = np.mean(x) # or np.sum(x)/n if weight meany = np.mean(y) # np.sum(y)/n weight xd = x-meanx yd = y-meany xyw = sum(xd*yd) * weight x2w = sum(xd**2) * weight y2w = sum(yd**2) * weight `````` The angles (compute weighted angles) `````` top1 = weight*x2w - weight*y2w top2 = np.sqrt( (weight*x2w - weight*y2w)**2 + 4 * (weight*xyw)**2) bottom = (2 * weight *xyw) tantheta1 = (top1 + top2 ) / bottom tantheta2 = (top1 - top2 ) / bottom theta1 = np.degrees(np.arctan(tantheta1)) theta2 = np.degrees(np.arctan(tantheta2)) print theta1,theta2 # in degrees 31.6074905402 -58.3925094598 # 58.3925094598 -31.6074905402 with your plugin `````` compute the SDE `````` costheta = np.cos(np.radians(theta1)) sintheta = np.sin(np.radians(theta1)) sin2theta = sintheta**2 cos2theta = costheta**2 sinthetacostheta = sintheta * costheta SDEx = np.sqrt(2) * np.sqrt( ((x2w)*(cos2theta) - 2*(xyw)*(sinthetacostheta) + (y2w)*(sin2theta)) / (n - 2)) SDEy = np.sqrt(2) * np.sqrt( ((x2w)*(sin2theta) + 2*(xyw)*(sinthetacostheta) + (y2w)*(cos2theta)) / (n - 2)) print SDEx, SDEy 1133.42556259 1771.17313503 #1132.85 and 724.94 with your plugin `````` You can also calculate the eccentricity of the ellipse `````` eccentricity = np.sqrt(1 - ((min(SDEx, SDEy)**2)/(max(SDEx, SDEyy)**2))) `````` plot the ellipse (with 100 points in your case -> int(360/3.6, 360 only gives 360 points) ``````angle = theta1 pts = np.zeros((int(360/3.6)+1, 2)) beta = -np.radians(theta1) sin_beta = np.sin(beta) cos_beta = np.cos(beta) alpha = np.radians(np.r_[0.:360.:1j*(int(360/3.6)+1)]) sin_alpha = np.sin(alpha) cos_alpha = np.cos(alpha) pts[:, 0] = meanx + (SDEx * cos_alpha * cos_beta - SDEy * sin_alpha * sin_beta) pts[:, 1] = meany + (sSDEx * cos_alpha * sin_beta + SDEy * sin_alpha * cos_beta) `````` and it is easy to create a Polygon from the points (you can simplify this part) ``````geom = QgsGeometry.fromPolygon([[QgsPoint(xy[0],xy[1]) for xy in pts]]) .... ``````
Owner

havatv commented May 27, 2016

 @mlaloux , thank you for your help in improving the code. Many good points. I will go through it. But I am still puzzled that I can not reproduce the large deviations / errors that you get. The deviations I get are very small (and look like they could be due to the sample / population variance - dividing by N-1 or N). I am interested in finding the source of the errors. It would therefore be interesting to see your dataset. Could you test with the points in the attached zip-compressed CSV-file? points.csv.zip

mlaloux commented May 28, 2016 • edited Edited 1 time mlaloux edited May 30, 2016 (most recent)

I test your points and I have the same problem with the minor rsd.

in R

You can test the complete aspace rscript

``````##Ellipse=group
##Layer=vector
##Field=Field Layer
##SDE_shape = output vector
##weighted=boolean True.
library(aspace)
coords<-coordinates(Layer)
if(weighted) {
calc_sde(id=1, filename="/Users/me/SDE_Output.txt", centre.xy=NULL, calccentre=TRUE, weighted=TRUE, weights=Layer[[Field]], points=coords, verbose=FALSE)
}else {
calc_sde(id=1, filename="/Users/nouvportable/SDE_Output.txt", centre.xy=NULL, calccentre=TRUE, weighted=FALSE, weights=NULL, points=coords, verbose=FALSE)
}
fin_coords = c(sdeloc[2],sdeloc[3])
sp = SpatialPolygons( list(  Polygons(list(Polygon(fin_coords)), 1)))
res = SpatialPolygonsDataFrame(sp, sdeatt)
proj4string(res) <- Layer@proj4string
SDE_shape = res
``````

Results

weighted (in red the aspace solution, in blue the plugin )

aspace
380,308365072648, 221,164903371421, 72,0691130562294, 144,56539302743, 85,0285710508412
plugin
380,308365072648, 221,164903371421, 49,9513386519936, 100,198748093897, -1,48402852310311, 0,086767803691782

non weighted

aspace
320,889208633094,215,661390887291, 67,0310153611684, 165,633050304211, 77,1191513516296
plugin
320,889208633094, 215,661390887291, 44,1252162466994, 109,033021846596 -1,34598310742977, 0,224813219365126

Comparison with the classical standard deviations values (Bivariate normal ellipse, covariance ellipse)

You can compute that with the phonTools, car, Ellipse R packages and many others (I have rscripts for that) . There also Python modules.

• a one standard deviation ellipse polygon will cover approximately 68 percent of the features
• two standard deviations will contain approximately 95 percent of the features
• three standard deviations will cover approximately 99 percent of the feature

Therefore, you solution covers less than the ashape solution (area)

In Python

I try to adapt my solution to your script (but without for loops)

``````x = np.array(x)
y = np.array(y)
meanx = sum(x*weights )/sum(weights)
meany = sum(y*weights )/sum(weights)
xd = x-meanx
yd = y-meany
sumweight = sum(weights)
xyw= sum(xd*yd* weights)
x2w = sum(np.power(xd,2)* weights)
y2w = sum(np.power(yd,2)* weights)
print meanx,meany
380.308365073 221.164903371
``````

Now, the angles and the SDEs with the algorithm of calc_sde.R (aspace)

``````top1 = x2w - y2w
top2 = np.sqrt((x2w - y2w)**2 +4 * (xyw)**2)
bottom = (2 * xyw)
tantheta1 = (top1 + top2 ) / bottom
tantheta2 = (top1 - top2 ) / bottom
theta1 = np.degrees(np.arctan(tantheta1))
theta2 = np.degrees(np.arctan(tantheta2))
print theta1,theta2
85.0285710508 -4.97142894916
``````

The result of your plugin are

``````majorangle =np.degrees(0.086767803691782)
print majorangle
4.971428949158752
minorangle = np.degrees(-1.48402852310311)
-85.028571050840995
``````

Now, the SDE

``````costheta = np.cos(np.radians(theta1))
sigmax = np.sqrt(2) * np.sqrt( ( (sum(xd**2*weights))*(cos2theta) - 2*(sum(xd*yd*weights))*  (sinthetacostheta) + (sum(yd**2*weights))*(sin2theta) ) / (sum(weights) - 2) )
sigmay = np.sqrt(2) * np.sqrt( ( (sum(xd**2*weights))*(sin2theta) + 2*(sum(xd*yd*weights))*(sinthetacostheta) + (sum(yd**2*weights))*(cos2theta) ) / (sum(weights) - 2) )
print sigmax, sigmay
72.069113056229341, 144.56539302742954
``````

The result of your plugin is 49.951338652, 100.198748094

My conclusions

Therefore, the angles are the same (except the sign) and the discrepancies occurs in the computation of the SDEs and not on the coordinate reference system (purely geometric)

Owner

havatv commented May 30, 2016 • edited Edited 1 time havatv edited May 30, 2016 (most recent)

 Thanks a lot! The lack of correction for degrees of freedom should explain a small part of this (difference from the phonTools ellipse), while the lack of the sqrt(2) correction as used by Crimestat should explain the rest (difference from the aspace ellipse with an sdv of 1.0). The reported angles from the plugin are counter-clockwise relative to the first axis, while the aspace angle seems to be clockwise relative to the second axis. I have now tried to explain the behaviour of the plugin as follows: A QGIS plugin to create a standard deviational ellipse, according to the method presented by Robert Yuill (1971). This method does not give a radius equal to the standard distance deviation for a random point distribution. To achieve this, the SDs should be multiplied by sqrt(2), as explained in the CrimeStat documentation. The method does not correct for degrees of freedom. The plugin works on point vector layers. The user can choose to use only selected features, and can choose an attribute for weighting. A polygon vector layer with the standard deviational ellipse is produced, containing the attributes: meanx, meany, majorangle, minorangle, majorsd and minorsd. Angles are counter-clockwise relative to the first axis. What do you think?

mlaloux commented May 30, 2016 • edited Edited 1 time mlaloux edited May 30, 2016 (most recent)

 The principal problem is that some people use the areas of the SDE (ecology for example) and the results differ. The SDE problem is more complicated than the Robert S. Yuill solution, read Clarifying the Standard Deviational Ellipse or Confidence Analysis of Standard Deviational Ellipse and Its Extension into Higher Dimensional Euclidean Space for example. Here is the "synthesis" of the problem dotted line : elliptical contours of the normal probability or confidence levels with car in magenta, a simple solution with eigenvectors/eigenvalues of the variance matrix. You can do the same processing with the covariance matrix in blue, your plugin in red, the ashape solution Now, what is the best solution ? From "Clarifying the Standard Deviational Ellipse" More than seventy years ago there were two interesting articles published by Lefever (1926) and Furfey (1927) in The American Journal of Sociology. Both discussed what Lefever called “standard deviational ellipse” and its application in spatial analysis for a set of geographical units regarded as point set in two-dimensional space. “Standard deviational ellipse,”or SDE for short, is not at all an ellipse as its name implies.This fact was clarified by Furfey in 1927.Despite this,SDE has been widely introduced and applied as an ellipse in later studies. To the present, it has remained unclear what the actual curve of SDE is. From "Confidence Analysis of Standard Deviational Ellipse and Its Extension into Higher Dimensional Euclidean Space" Although SDE has extensive applications in various fields ever since 1926, it still has not been correctly clarified sometimes. For instance, from the latest resources in ArcGIS Help 10.1 describing how standard deviational ellipse works, it is stated that one, two and three standard deviation(s) can encompass approximately 68%, 95% and 99% of all input feature centroids respectively, supposing the features concerned follow a spatially normal distribution. However, this content corresponds to the well-known 3-sigma rule with respect to univariate normal distribution, rather than bivariate case. Worse still, there is even an attached illustration therein depicting several bivariate geographical features located within a planar map. Obviously, such confusing interpretation may mislead the GIS users to believe the univariate 3-sigma rule remains valid in two-dimensional Euclidean space, or even higher dimensions. and The upcoming discussed method presented by Yuill was actually an amelioration of Lefever’s original model despite of suffering from certain criticisms
Owner

havatv commented May 31, 2016 • edited Edited 1 time havatv edited May 31, 2016 (most recent)

 It is probably useful to offer a choice of method (Yuill, aspace/CrimeStat, ...) through the plugin. I have uploaded a revised version (1.1) of the plugin with options for Yuill and CrimeStat/aspace. More could be added.

havatv added a commit that referenced this issue Jun 1, 2016

``` Updates to follow up issue #3 - version 1.1 of the plugin ```
``` 6865471 ```

mlaloux commented Jun 1, 2016 • edited Edited 1 time mlaloux edited Jun 1, 2016 (most recent)

 Very good, I can give you others implementations in Python if you want (as pure covariance/variance error ellipses with Chi-Square confidence intervals or bivariate normal ellipses witch are real ellispes )