<a href="https://colab.research.google.com/github/edwardoughton/GeoAI/blob/main/01_01_GeoAI_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to GGS590 GeoAI

This GGS590 Special Topics class focuses on the use of AI in the geospatial sciences ("GeoAI").

This is the first time GGS has integrated an AI class into the department. The aim is to move it onto the main roster of classes during 2026, so you have a great opportunity to influence the structure and content of this class.



By the end of this session, students should be able to:

*   Define GeoAI and distinguish it from GIS, geospatial data science, and general AI
*   Identify major GeoAI use cases across domains
*   Recognize core methods and model families used in GeoAI
*   Use AI as a coding partner responsibly and effectively
*   Refresh essential Python skills for geospatial analysis
*   Set up and run notebooks in Google Colab


This course will explore the intersection of:

*   Geospatial data (raster, vector, spatiotemporal)
*   Artificial intelligence and machine learning
*   Domain-driven spatial reasoning
*   The use of common geospatial and AI tools
*   Critical thinking



I will place a key emphasis on:

*   Practical modeling with real geospatial data (frequently satellite imagery and OpenStreetMap data)
*   Understanding why particular AI methods work (or fail) in spatial contexts
*   Scientifically reproducible GeoAI workflows
*   Code testing to ensure you have a correct/plausible answer!



Please be aware that this GeoAI course will not be:

*   A general introductory GIS course
*   A tutorial on Graphical User Interface (GUI) GIS software such as Esri ArcGIS Pro or QGIS (formerly Quantum GIS)
*   A general deep learning theory course
*   A “black box” AI tools class
*   A deeply math-based introduction to AI/ML tools



## What is GeoAI?

There are many definitions of GeoAI, but let us run with the following working definition:



> The application of AI/ML to spatially explicit data, accounting for traditional geospatial characteristics such as location and spatial dependency.



By "traditional geospatial characteristics" we mean that GeoAI accounts for spatial structure (e.g., non-independent relationships imposed by geography):



*   Spatial autocorrelation (e.g., Tobler's first law)
*   Scale effects
*   Spatiotemporal dependence
*   Topological relationships
*   Heterogeneous geospatial data sources
*   Uncertainty and bias often tied to geographic space

A GeoAI model is “spatially aware” when these properties are explicitly encoded rather than implicitly hoped for.
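
To make spatial autocorrelation concrete, here is a minimal sketch of global Moran's I on a small raster-like grid. It assumes binary rook (4-neighbour) weights, and the two example grids are made up for illustration. In practice you would likely use a dedicated library rather than this hand-rolled version.

```python
def morans_i(grid):
    """Global Moran's I for a 2D grid, assuming binary rook (4-neighbour) weights."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Denominator: total squared deviation from the mean.
    denom = sum((v - mean) ** 2 for v in values)

    cross = 0.0  # sum of w_ij * (x_i - mean) * (x_j - mean)
    w_sum = 0    # sum of all weights (each neighbour link counts once per direction)
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    w_sum += 1
    return (n / w_sum) * (cross / denom)


# Illustrative data: similar values cluster together (left half high, right half low).
clustered = [[1, 1, 0, 0] for _ in range(4)]
# Illustrative data: every cell differs from all of its neighbours.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(round(morans_i(clustered), 3))     # positive: near things are more alike
print(round(morans_i(checkerboard), 3))  # negative: near things are less alike
```

A value near +1 indicates that neighbouring cells hold similar values (Tobler's first law in action), a value near -1 indicates that neighbours are dissimilar, and a value near 0 suggests no spatial structure.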







### How does GeoAI shape up versus related geospatial fields?


| Field                   | Focus                                             | Examples                                  |
|-------------------------|---------------------------------------------------|-------------------------------------------|
| GIS                     | Data management, visualization, spatial analysis | ArcGIS, QGIS, PostGIS                     |
| Geospatial Data Science | Statistics + ML on spatial data                  | Spatial regression, kriging, GWR          |
| Computer Vision         | Image understanding (often non-spatial)          | CNNs, object detection, image segmentation|
| GeoAI                   | AI models aware of spatial structure             | GNNs, spatial deep learning, remote sensing AI |


## Core GeoAI Use Cases

**Remote sensing and Earth Observation (EO)**:

*   Land cover / land use classification
*   Change detection
*   Object detection (buildings, roads, ships)
*   Environmental monitoring

Some example papers:

Zhu et al. (2017). Deep learning in remote sensing: A comprehensive review.
https://doi.org/10.1109/MGRS.2017.2762307

Ma et al. (2019). Deep learning in remote sensing applications.
https://doi.org/10.1016/j.isprsjprs.2019.04.015

Zhu et al. (2025). GlobalBuildingAtlas: an open global and complete dataset of building polygons, heights and LoD1 3D models.
https://doi.org/10.5194/essd-17-6647-2025

**Urban/rural analytics, smart cities, and transportation planning**:

*   Population estimation
*   Transportation and mobility prediction
*   Urban growth modeling
*   Informal settlement detection

Some example papers:

Batty, M. (2018). Artificial intelligence and smart cities.
https://doi.org/10.1177/2399808317751169

Tatem, A. (2017). WorldPop, open data for spatial demography.
https://doi.org/10.1038/sdata.2017.4

  




**Climate, environment, and natural hazards**:

*   Flood, hurricane and wildfire (NatCat) risk modeling
*   Climate downscaling
*   Ecosystem and biodiversity modeling

Some example papers:

Reichstein et al. (2019). Deep learning and process understanding for data-driven Earth system science.
https://doi.org/10.1038/s41586-019-0912-1

Rolnick et al. (2022). Tackling climate change with machine learning.
https://doi.org/10.1145/3485128

**Human–environment and social applications**:

*   Disease mapping
*   Socioeconomic inference
*   Accessibility and equity analysis

Some example papers:

Jean et al. (2016). Combining satellite imagery and machine learning to predict poverty.
https://doi.org/10.1126/science.aaf7894

Stevens et al. (2015). Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data.
https://doi.org/10.1371/journal.pone.0107042

Oughton and Mather (2021). Predicting cell phone adoption metrics using machine learning and satellite imagery.
https://doi.org/10.1016/j.tele.2021.101622

## Core GeoAI methods

**Traditional ML modeling approaches**:

*  Random forests
*  Gradient boosting
*  Support vector machines

The benefit of these approaches is that they are generally easier to interpret than deep learning methods (e.g., via feature importances). They are also often much more computationally efficient than other options, so do not neglect them simply because they are classic approaches.

**Deep learning approaches**:

*  Convolutional Neural Networks (CNNs)
*  U-Net and encoder–decoder architectures
*  Vision Transformers (ViTs)

These methods have achieved substantial improvements (especially in computer vision) over the past decade. They are particularly strong for raster data and imagery, where they can learn spatial patterns from neighboring pixels.
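
As a rough illustration of why CNNs suit raster data, the sketch below implements the sliding-window operation at the heart of a convolutional layer (strictly a cross-correlation, which deep learning libraries call "convolution"). The tiny raster and the vertical-edge kernel are hand-picked assumptions for illustration. In a real CNN the kernel weights are learned from data rather than set by hand.

```python
def convolve2d(raster, kernel):
    """Valid-mode 2D cross-correlation of a raster with a small kernel."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(raster) - k_rows + 1
    out_cols = len(raster[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the pixel neighbourhood under the kernel.
            total = sum(
                raster[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            row.append(total)
        output.append(row)
    return output


# Illustrative raster: "water" (0) on the left, "land" (1) on the right.
raster = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Hand-picked kernel that responds to left-to-right increases in value.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = convolve2d(raster, vertical_edge)
print(response)  # → [[0, 3, 3, 0]]
```

The response is zero over homogeneous areas and peaks where the window straddles the boundary, which is the sense in which a convolution exploits local spatial structure.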

**Graphs and spatial models**:

*   Graph Neural Networks (GNNs)
*   Spatial autoregressive models
*   Point process models

Graph-based, autoregressive, and point process models each address spatial dependence from a different perspective. Each has distinct advantages and disadvantages, which make them suitable for different data structures and research objectives.
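
One idea shared by spatial autoregressive models and GNNs is that each location is described partly by its neighbours. The sketch below computes a row-standardised spatial lag (the average of neighbouring values) and then a single, heavily simplified "message passing" step. The zone names, adjacency list, values, and `self_weight` parameter are all hypothetical. A real GNN layer would use learned weights and a nonlinearity instead of a fixed blend.

```python
# Hypothetical zones arranged in a line: A - B - C - D (shared borders).
neighbours = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
# Hypothetical attribute per zone (e.g., some density measure).
values = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}


def spatial_lag(values, neighbours):
    """Mean of each zone's neighbours (row-standardised weights)."""
    return {
        zone: sum(values[n] for n in nbrs) / len(nbrs)
        for zone, nbrs in neighbours.items()
    }


def message_pass(values, neighbours, self_weight=0.5):
    """One simplified aggregation step: blend own value with the neighbour mean."""
    lag = spatial_lag(values, neighbours)
    return {
        zone: self_weight * values[zone] + (1 - self_weight) * lag[zone]
        for zone in values
    }


lag = spatial_lag(values, neighbours)
updated = message_pass(values, neighbours)
print(lag)      # → {'A': 20.0, 'B': 20.0, 'C': 30.0, 'D': 30.0}
print(updated)  # → {'A': 15.0, 'B': 20.0, 'C': 30.0, 'D': 35.0}
```

The only spatial information used here is the adjacency structure, which is why the choice of graph (contiguity, distance threshold, k-nearest neighbours) is itself a modeling decision.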

**Space: The final assumption violation**:

Just as spatially dependent data can badly jeopardize regression assumptions, the same is true when AI is applied naively to spatial data:

* Independent and Identically Distributed (IID) assumption violations
* Spatial leakage (e.g., when nearby, highly similar observations are split between training and testing data, so information leaks across the split; an example of non-IID data)
* Edge effects (biases at the edge of the study area)
* Modifiable Areal Unit Problem (MAUP)
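
One common way to reduce spatial leakage is to split data by spatial block rather than by individual observation, so that neighbouring points stay on the same side of the train/test divide. The following is a minimal sketch under stated assumptions: the points are synthetic, and the study area extent (0 to 100), block size, and test fraction are arbitrary illustrative choices.

```python
import random


def make_points(n, seed=0):
    """Synthetic (x, y) points in a 100 x 100 study area (illustrative only)."""
    rng = random.Random(seed)
    return [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]


def block_id(point, block_size):
    """Grid cell (column, row) that a point falls into."""
    x, y = point
    return (int(x // block_size), int(y // block_size))


def spatial_block_split(points, block_size=25.0, test_fraction=0.25, seed=0):
    """Hold out whole blocks for testing instead of individual points."""
    blocks = sorted({block_id(p, block_size) for p in points})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_test = max(1, round(len(blocks) * test_fraction))
    test_blocks = set(blocks[:n_test])
    train = [p for p in points if block_id(p, block_size) not in test_blocks]
    test = [p for p in points if block_id(p, block_size) in test_blocks]
    return train, test, test_blocks


points = make_points(200)
train, test, test_blocks = spatial_block_split(points)
print(len(train), len(test), sorted(test_blocks))
```

Points on either side of a block boundary can still be close together, so this reduces leakage rather than eliminating it. Larger blocks or a buffer between training and testing blocks are possible refinements. Block IDs like these could also be passed as the `groups` argument to a group-aware splitter such as scikit-learn's `GroupKFold`.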

## Using AI as a coding partner

Most of us who write code regularly will probably be working alongside an AI tool. Generally, GenAI tools are good at:

*   Producing boilerplate code
*   Providing library usage examples
*   Debugging syntax errors that inevitably arise when coding
*   Explaining unfamiliar APIs
*   Helping you refactor your code
*   Speeding up the process of profiling your code

However, there are key limitations. For example, AI tools:

*   Have a very limited understanding of data provenance
*   Cannot reliably make modeling decisions that ensure spatial validity (e.g., correct results)
*   Cannot ensure unbiased results
*   Cannot guarantee correctness

Responsible use guidelines:

*   Always inspect and test AI-generated code
*   Never assume spatial correctness
*   Document AI assistance when used
*   Treat AI output as a draft, not an answer
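
As an example of what "inspect and test" might look like, suppose an AI tool drafted the great-circle distance function below. This is a sketch rather than a definitive implementation: it assumes a spherical Earth with a mean radius of 6371 km, and the function name and argument order are illustrative. The assertions underneath are cheap plausibility checks against facts we already know.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Check 1: the distance from a point to itself is zero.
assert haversine_km(-77.3, 38.8, -77.3, 38.8) == 0.0

# Check 2: distance does not depend on direction of travel.
assert math.isclose(
    haversine_km(0, 0, 10, 10), haversine_km(10, 10, 0, 0)
)

# Check 3: one degree of latitude is roughly 111 km.
assert abs(haversine_km(0, 0, 0, 1) - 111.19) < 0.1

# Check 4: one degree of longitude at 60°N is about half that at the equator.
# This catches a classic bug: swapping the longitude and latitude arguments.
ratio = haversine_km(0, 60, 1, 60) / haversine_km(0, 0, 1, 0)
assert abs(ratio - 0.5) < 0.01

print("All plausibility checks passed")
```

None of these checks prove the function is correct, but each one would fail loudly for a common class of error (wrong units, swapped coordinates, degrees passed where radians are expected).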

So in summary, GenAI code is a starting point, not an end point.

A more skeptical view is that we have simply shifted the time previously spent on planning/architecture to testing/validation. For example, you can get 90% of the way there in 20 minutes ("vibecoding"), but it then takes far longer to test and validate this code, especially across all edge cases.



