# 1. Introduction to rgeoda

## Install rgeoda for R

rgeoda is an R package that wraps all core functions of spatial data analysis in GeoDa and libgeoda. Unlike the GeoDa desktop software, libgeoda is a feature-focused C++ library without a user interface, designed so that programmers can do spatial data analysis in their favorite programming languages (R, Python, Java, etc.). It also aims to be easy to integrate with other libraries, software, and systems on different platforms.

During the testing stage, the easiest way to install the development version of rgeoda is from its source package:
```R
install.packages("https://github.com/lixun910/rgeoda/archive/0.0.1.tar.gz")
```

Windows R users should install the following `.zip` package instead:
```R
install.packages("https://github.com/lixun910/rgeoda/releases/download/0.0.1/rgeoda_0.0.1.zip")
```

If everything installed without error, you should be able to load rgeoda:

```R
library(rgeoda)
```

## Load Geospatial Data

In this note, we will use `natregimes.shp`, an ESRI Shapefile that comes with the package:

```R
nat_path <- system.file("extdata", "natregimes.shp", package = "rgeoda")
nat_path
```

Using the path above (or the path to your own dataset), we can create a GeoDa instance, which is the main entry point of rgeoda.

```R
gda <- GeoDa(nat_path)
```
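
`system.file()` returns an empty string instead of raising an error when a file or package cannot be found, so it can be worth checking the path before handing it to `GeoDa()`. The following is a minimal base R sketch; the helper names `find_extdata` and `shapefile_parts` are hypothetical and not part of rgeoda.

```R
# Hypothetical helper (not part of rgeoda): look up a file shipped in a
# package's extdata folder and fail early if it is missing.
find_extdata <- function(file, package) {
  path <- system.file("extdata", file, package = package)
  if (!nzchar(path)) {
    stop("'", file, "' not found in package '", package, "'")
  }
  path
}

# A Shapefile is a set of files: the .shx and .dbf parts sit next to the .shp.
shapefile_parts <- function(shp_path) {
  base <- tools::file_path_sans_ext(shp_path)
  paste0(base, c(".shp", ".shx", ".dbf"))
}

parts <- shapefile_parts("data/natregimes.shp")
parts
```

With rgeoda installed, `find_extdata("natregimes.shp", "rgeoda")` should return the same path as `nat_path`, and `file.exists(shapefile_parts(nat_path))` shows whether all three parts are present.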

rgeoda provides functions to inspect the metadata of the loaded dataset:


```R
num_obs <- gda$GetNumObs()
num_cols <- gda$GetNumCols()
field_types <- gda$GetFieldTypes()
field_names <- gda$GetFieldNames()

num_obs
num_cols
```

## Access Table Data

One can call the function `GetNumericCol(string col_name)` to get numeric data (as a vector) from the GeoDa instance. For example, to get the data of column “HR60”:

```R
hr60 <- gda$GetNumericCol("HR60")
hr60[1:20]
```

## Spatial Weights

One can call the function `CreateContiguityWeights(string poly_id="", bool is_queen=true, int order=1, bool include_lower_order=false)` to create Queen or Rook contiguity weights:

```R
queen_w <- gda$CreateContiguityWeights(is_queen=TRUE)
```

To inspect the properties of the weights, access the attributes of the returned object `queen_w`:

```R
cat("weight_type: ", queen_w$weight_type,
    "\nis_symmetric: ", queen_w$is_symmetric,
    "\nsparsity:",queen_w$sparsity,
    "\ndensity:",queen_w$density,
    "\nmin_nbrs:",queen_w$min_nbrs,
    "\nmax_nbrs:",queen_w$max_nbrs,
    "\nmean_nbrs:",queen_w$mean_nbrs,
    "\nmedian_nbrs:",queen_w$median_nbrs, "\n")
```

```
weight_type:  gal_type 
is_symmetric:  TRUE 
sparsity: 0 
density: 0.190896 
min_nbrs: 1 
max_nbrs: 14 
mean_nbrs: 5.889141 
median_nbrs: 6 
```
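
To make these summary properties concrete, the sketch below builds Queen contiguity weights by hand for a toy 3 x 3 grid in base R and derives the same kind of statistics. It does not use rgeoda. The variable names are illustrative, and expressing `density_pct` as a percentage is an assumption about how rgeoda scales `density`.

```R
# Toy example in base R (not rgeoda): Queen contiguity on a 3 x 3 grid of cells.
n_side <- 3
cells <- expand.grid(row = 1:n_side, col = 1:n_side)
n <- nrow(cells)

# Two cells are Queen neighbors when they touch at an edge or a corner, i.e.
# their row and column indices each differ by at most 1 (and they are distinct).
row_diff <- abs(outer(cells$row, cells$row, "-"))
col_diff <- abs(outer(cells$col, cells$col, "-"))
W <- (row_diff <= 1 & col_diff <= 1) * 1
diag(W) <- 0                      # a cell is not its own neighbor

nbrs <- rowSums(W)                # number of neighbors per cell
is_symmetric <- isSymmetric(W)
min_nbrs <- min(nbrs)             # corner cells
max_nbrs <- max(nbrs)             # the center cell
mean_nbrs <- mean(nbrs)
median_nbrs <- median(nbrs)
density_pct <- 100 * sum(W) / n^2 # share of non-zero weights, in percent (assumed scaling)
```

On this grid the corner cells have 3 neighbors, the edge cells 5, and the center cell 8.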


## Spatial Data Analysis

### Local Spatial Autocorrelation

Using the created Queen weights queen_w and the data hr60, we can call function `LISA(GeoDaWeight w, vector data)` to compute the local spatial autocorrelation of variable “HR60”.

```R
lisa <- gda$LISA(queen_w, hr60)
```

We can access the LISA results by calling the “getter” methods from the returned LISA object:

```R
lags <- lisa$GetLagValues()
lags[1:20]
```

Get the local Moran values:

```R
lms <- lisa$GetLocalMoranValues()
lms[1:20]
```
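
For readers who want to see what a spatial lag and a local Moran value are, here is a minimal base R sketch on toy data. It uses the textbook form of the statistic with row-standardized weights. The population-variance standardization is an assumption, so the numbers rgeoda returns may be scaled differently.

```R
# Toy data in base R (not rgeoda): 5 locations on a line, adjacent ones are neighbors.
x <- c(2, 4, 6, 20, 22)
n <- length(x)
W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W_std <- W / rowSums(W)          # row-standardize: each row sums to 1

lag_x <- as.vector(W_std %*% x)  # spatial lag: average of the neighbors' values

# Local Moran in its textbook form: z_i times the spatial lag of z.
# Assumption: z is standardized with the population (divide by n) variance.
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
local_moran <- z * as.vector(W_std %*% z)

lag_x
local_moran
```

A positive value marks a location whose value and neighborhood average lie on the same side of the mean. A negative value marks a location that differs from its neighbors.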

Get the pseudo p-values:

```R
pvals <- lisa$GetLocalSignificanceValues()
pvals[1:20]
```
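
Pseudo p-values for local statistics are typically obtained by conditional permutation: the value at location i is held fixed while the values of its neighbors are redrawn at random from the remaining observations. The base R sketch below illustrates the idea on toy data. The function name `pseudo_p`, the one-sided counting rule, and the formula (R + 1) / (M + 1) are assumptions for illustration, not a description of rgeoda's internals.

```R
# Toy data in base R (not rgeoda): 8 locations on a line, adjacent ones are neighbors.
set.seed(123)
x <- c(2, 4, 3, 5, 20, 22, 21, 23)
n <- length(x)
W <- matrix(0, n, n)
W[cbind(1:(n - 1), 2:n)] <- 1
W <- W + t(W)
W_std <- W / rowSums(W)
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))

# Hypothetical helper: conditional permutation pseudo p-values for local Moran.
pseudo_p <- function(z, W_std, n_perm = 999) {
  observed <- z * as.vector(W_std %*% z)
  sapply(seq_along(z), function(i) {
    nb <- which(W_std[i, ] > 0)
    others <- z[-i]                      # the value at i stays fixed
    perm <- replicate(n_perm, {
      z[i] * sum(W_std[i, nb] * sample(others, length(nb)))
    })
    # Count permuted statistics at least as extreme as the observed one,
    # on the same side as the observed value.
    extreme <- if (observed[i] >= 0) {
      sum(perm >= observed[i])
    } else {
      sum(perm <= observed[i])
    }
    (extreme + 1) / (n_perm + 1)
  })
}

pvals_toy <- pseudo_p(z, W_std, n_perm = 999)
pvals_toy
```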

Get LISA category values:

```R
cats <- lisa$GetClusterIndicators()
cats[1:20]
```
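
LISA categories are conventionally derived from the sign of the standardized value, the sign of its spatial lag, and a significance cutoff. The sketch below shows that logic in base R. The function `classify_lisa`, the text labels, and the 0.05 cutoff are assumptions for illustration; this note does not document the codes that `GetClusterIndicators()` returns.

```R
# Hypothetical helper: label each location by the quadrant of (z, lag of z),
# then mask locations whose pseudo p-value exceeds the cutoff.
classify_lisa <- function(z, lag_z, p, alpha = 0.05) {
  label <- ifelse(z >= 0 & lag_z >= 0, "High-High",
           ifelse(z < 0 & lag_z < 0, "Low-Low",
           ifelse(z < 0 & lag_z >= 0, "Low-High", "High-Low")))
  label[p > alpha] <- "Not significant"
  label
}

labels_toy <- classify_lisa(
  z     = c(1.2, -0.8, -1.1, 0.9),
  lag_z = c(0.7, -0.5, 0.6, -0.4),
  p     = c(0.01, 0.02, 0.20, 0.04)
)
labels_toy
```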

You can easily re-run the LISA computation by calling its `Run()` function. For example, re-run the above LISA example using 9999 permutations:

```R
lisa$SetNumPermutations(9999)
lisa$Run()
```

```
NULL
NULL
```

Display the p-values after 9999 permutations:

```R
pvals <- lisa$GetLocalSignificanceValues()
pvals[1:20]
```
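
The number of permutations limits how small a pseudo p-value can get. Assuming the p-value is computed as (R + 1) / (M + 1), where M is the number of permutations and R counts the permuted statistics at least as extreme as the observed one (an assumption, not confirmed by this note), the smallest attainable value is 1 / (M + 1):

```R
# Smallest attainable pseudo p-value for several permutation counts,
# under the assumed formula (R + 1) / (M + 1) with R = 0.
n_perm <- c(99, 999, 9999)
min_p <- 1 / (n_perm + 1)
data.frame(permutations = n_perm, smallest_pseudo_p = min_p)
```

Under this assumption, moving to 9999 permutations makes it possible to distinguish p-values below 0.001.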

Because rgeoda uses GeoDa’s C++ code, it uses multi-threading by default to accelerate the computation of local Moran. One can also specify how many threads to use:

```R
lisa$SetNumThreads(4)
lisa$Run()
```

```
NULL
NULL
```

Display the p-values after re-running LISA with 4 threads:

```R
pvals <- lisa$GetLocalSignificanceValues()
pvals[1:20]
```
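
Rather than hard-coding the thread count, one option is to derive it from the machine. The sketch below uses `parallel::detectCores()`, which ships with R. Leaving one core free is only a common rule of thumb, not an rgeoda recommendation.

```R
# detectCores() can return NA on some platforms, so fall back to one thread.
cores <- parallel::detectCores()
n_threads <- if (is.na(cores)) 1L else max(1L, cores - 1L)
n_threads

# With a LISA object from the cells above, this value could then be passed on:
# lisa$SetNumThreads(n_threads)
```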

## Clustering

### SKATER

Use the function `SKATER(int k, GeoDaWeight w, Vector column_names, String distance_method='euclidean')` to run spatially constrained clustering on the current dataset. The call below requests 10 clusters based on the columns “HR60” and “PO60”:

```R
skater <- gda$SKATER(10, queen_w, c("HR60", "PO60"))
skater
```
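
The idea behind SKATER can be sketched in a few lines of base R: weight each contiguity edge by the dissimilarity of the two areas it connects, build a minimum spanning tree, then repeatedly remove the tree edge whose removal gives the lowest total within-cluster sum of squared deviations. The toy example below is a simplified reading of the method on made-up data with a single variable. The helper names `components` and `ssd` are hypothetical, and rgeoda's implementation may differ in its details.

```R
# Toy data (not rgeoda): 6 areas on a 2 x 3 grid, one attribute.
# Layout of area ids:   1 2 3
#                       4 5 6
x <- c(1.0, 1.2, 5.0, 0.9, 5.2, 5.1)
n <- length(x)
edges <- rbind(c(1, 2), c(2, 3), c(4, 5), c(5, 6),   # horizontal neighbors
               c(1, 4), c(2, 5), c(3, 6))            # vertical neighbors
cost <- abs(x[edges[, 1]] - x[edges[, 2]])           # Euclidean distance in 1-D

# Hypothetical helper: component label of every node given a set of edges.
components <- function(n, e) {
  comp <- seq_len(n)
  for (r in seq_len(nrow(e))) {
    a <- comp[e[r, 1]]
    b <- comp[e[r, 2]]
    if (a != b) comp[comp == b] <- a
  }
  comp
}

# Hypothetical helper: total within-cluster sum of squared deviations.
ssd <- function(x, labels) {
  sum(tapply(x, labels, function(v) sum((v - mean(v))^2)))
}

# Step 1: minimum spanning tree (Kruskal): add the cheapest edges that do not
# close a cycle.
mst <- matrix(nrow = 0, ncol = 2)
for (e in order(cost)) {
  comp <- components(n, mst)
  if (comp[edges[e, 1]] != comp[edges[e, 2]]) mst <- rbind(mst, edges[e, ])
}

# Step 2: prune the tree until k connected groups remain.
k <- 2
tree <- mst
while (nrow(tree) > n - k) {
  scores <- sapply(seq_len(nrow(tree)), function(r) {
    ssd(x, components(n, tree[-r, , drop = FALSE]))
  })
  tree <- tree[-which.min(scores), , drop = FALSE]
}

clusters <- components(n, tree)
clusters
```

On this toy grid the procedure separates the low-valued areas (1, 2, 4) from the high-valued ones (3, 5, 6). Because the clusters come from cutting a tree built on the contiguity graph, each one is spatially connected.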