# Introduction to Geospatial Data
*Lesson Developers: Coline Dony cdony@aag.org and Karen Kemp kakemp@usc.edu*

# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass  # getpass can look up the current login username via getpass.getuser()

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

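# Retrieve the user agent string; it will be passed to the hourofci submit button
agent_js = """
IPython.notebook.kernel.execute("user_agent = " + "'" + navigator.userAgent + "'");
"""
# Only a cell's last expression is shown automatically, so display() is needed
# for this JavaScript to be sent to the browser and run
display(Javascript(agent_js))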

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

## The world is infinitely complex
![View of Queenstown NZ](supplementary/queenstown.jpg)
- How can we code this landscape around Queenstown, New Zealand so that we can compute with it? 
- How do we even decide what to measure and record? 
- And how can we structure data about this complex world into tables?

## A famous GIScientist once said
###     "People manipulate objects (but cultivate fields)" <small><sup>1</sup></small>

This phrase summarizes the most important distinction we make when capturing geospatial data. 
- Is the world made up of *fields* or *objects*?

![raster or vector](supplementary/raster_vector.png)

<small><sup>1</sup>by Helen Couclelis, 1992, <a href="https://www.researchgate.net/publication/221589734_People_Manipulate_Objects_but_Cultivate_Fields_Beyond_the_Raster-Vector_Debate_in_GIS">"People Manipulate Objects (but Cultivate Fields): Beyond the Raster-Vector Debate in GIS"</a> from the book *Theories and Methods of Spatio-Temporal Reasoning in Geographic Space: International Conference GIS — From Space to Territory: Theories and Methods of Spatio-Temporal Reasoning* Pisa, Italy, September 21–23, 1992 (pp.65-77)</small>

Think about the picture of Queenstown we looked at earlier. 
 
The rolling surface of the landscape is continuous. There's land or water, at various elevations, everywhere. That's a field. Elevation is the classic field. There is a value of elevation everywhere. 

Then consider all the human-made structures in the picture: buildings, lightposts, roads. These are objects. In the object view, the world is mostly empty space, with discrete objects scattered around in it. 

So, let's see if you can separate these two perspectives...

(an interactive here)

Sort these things into objects and fields:

- elevation
- soil type
- air temperature at the ground surface

- cars
- mailboxes
- railway tracks


Now let's look at some geospatial data that are coded as either objects or fields.

This is geospatial data stored as a field.

CODE BLOCK to load and display a raster image.
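
The lesson's own code cell is not included here. As a stand-in, here is a minimal standard-library sketch that reads a tiny, made-up elevation grid written in the ESRI ASCII Grid text format and draws it as text. The sample values and the helper names `read_ascii_grid` and `render_raster` are hypothetical; a real notebook would more likely open a GeoTIFF with a library such as rasterio and plot it with matplotlib.

```python
# A minimal sketch: a tiny, made-up elevation raster (metres) in the ESRI ASCII
# Grid text format, i.e. six header lines followed by one line per grid row.
SAMPLE_ASC = """ncols 6
nrows 4
xllcorner 0.0
yllcorner 0.0
cellsize 10.0
NODATA_value -9999
310 320 335 350 360 372
305 315 330 345 358 368
300 308 322 338 352 365
296 302 315 330 346 360
"""


def read_ascii_grid(text):
    """Split an ESRI ASCII Grid string into a header dict and a list of rows.

    Assumes exactly six header lines and skips NODATA handling for brevity.
    """
    lines = text.strip().splitlines()
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    rows = [[float(v) for v in line.split()] for line in lines[6:]]
    return header, rows


def render_raster(rows, shades=" .:-=+*#%@"):
    """Turn each grid row into a string, using denser characters for higher values."""
    values = [v for row in rows for v in row]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero on a flat raster
    return [
        "".join(shades[int((v - low) / span * (len(shades) - 1))] for v in row)
        for row in rows
    ]


header, rows = read_ascii_grid(SAMPLE_ASC)
print(f"{int(header['nrows'])} rows x {int(header['ncols'])} columns, "
      f"{header['cellsize']} m cells")
for line in render_raster(rows):
    print(line)
```

Every cell gets a character, because a field has a value everywhere.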

This is geospatial data stored as objects.

CODE BLOCK to load and display a collection of OSM layers
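
As a stand-in for that cell, here is a standard-library sketch in which each object is a GeoJSON-style feature with a geometry and some properties. The three features, their `layer` and `name` properties, their coordinates, and the helpers `vertex_count` and `describe` are all made up for illustration; real OpenStreetMap layers would be loaded from files, for example with a library such as GeoPandas.

```python
import json

# Made-up objects in GeoJSON form; GeoJSON lists coordinates as [longitude, latitude].
SAMPLE_GEOJSON = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"layer": "cafes", "name": "Cafe A"},
     "geometry": {"type": "Point", "coordinates": [168.660, -45.031]}},
    {"type": "Feature",
     "properties": {"layer": "roads", "name": "Road B"},
     "geometry": {"type": "LineString",
                  "coordinates": [[168.655, -45.030], [168.665, -45.032]]}},
    {"type": "Feature",
     "properties": {"layer": "parks", "name": "Park C"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[168.650, -45.035], [168.654, -45.035],
                                   [168.654, -45.032], [168.650, -45.035]]]}}
  ]
}"""


def vertex_count(geometry):
    """Count the coordinate pairs that define a Point, LineString or Polygon."""
    kind, coords = geometry["type"], geometry["coordinates"]
    if kind == "Point":
        return 1
    if kind == "LineString":
        return len(coords)
    if kind == "Polygon":
        return len(coords[0])  # outer ring only; its last vertex repeats the first
    raise ValueError(f"unsupported geometry type: {kind}")


def describe(feature):
    """One line per object: its name, layer, geometry type and vertex count."""
    props, geom = feature["properties"], feature["geometry"]
    n = vertex_count(geom)
    noun = "vertex" if n == 1 else "vertices"
    return f'{props["name"]} [{props["layer"]}]: {geom["type"]} with {n} {noun}'


collection = json.loads(SAMPLE_GEOJSON)
for feature in collection["features"]:
    print(describe(feature))
```

Unlike the raster, nothing is stored for the empty space between these objects.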

Now let's dig deeper. Let's look at how the field data is actually stored.

CODE BLOCK to provide head of the raster image.
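
Since that cell is not included, here is a standard-library sketch that writes a small, made-up grid to a temporary file and prints its first five lines. It assumes a simplified plain-text layout with one grid row per line and no header; the sample values and the `head` helper are hypothetical.

```python
import itertools
import os
import tempfile

# Made-up field values: one line of text per grid row, one number per cell.
# (Real raster text formats such as ESRI ASCII Grid also start with a short header.)
SAMPLE_ROWS = """12 12 13 15 18 20 21
12 13 14 16 19 21 22
13 14 15 17 20 22 24
14 15 17 19 21 24 25
15 16 18 20 23 25 27
16 17 19 21 24 27 28
"""


def head(path, n=5):
    """Return the first n lines of a text file, like the Unix `head` command."""
    with open(path) as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Write the sample to a temporary file so we can read it back like a real data file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE_ROWS)
    path = tmp.name

first_rows = head(path)
os.remove(path)

for line in first_rows:
    print(line)
```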

What's this all about?

Field data is usually stored as *rasters*.

To store the world as a raster, the surface of the earth is divided into a grid of equal-sized cells, each covering a specific chunk of the earth, say a square that is 10 m by 10 m. 
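
A little arithmetic shows how the cell size drives the size of a raster. This sketch assumes a hypothetical 1 km by 1 km area; `grid_shape` is an illustrative helper, and rounding up with `math.ceil` makes sure a partial cell at the edge is still covered.

```python
import math


def grid_shape(width_m, height_m, cell_size_m):
    """Rows and columns of square cells needed to cover a width x height area."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows, cols


for cell in (10, 5):
    rows, cols = grid_shape(1000, 1000, cell)
    print(f"{cell} m cells: {rows} rows x {cols} columns = {rows * cols} cells")
```

With these numbers, 10 m cells give 100 x 100 = 10,000 cells, and halving the cell size to 5 m quadruples that to 40,000. Finer detail costs a lot more storage.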

![world to raster](supplementary/world_to_raster_sm.png)

Each cell is given a value that represents what has been measured or observed on the earth within that cell. 

In the raster in this graphic, the cells covering the building have been coded with one value (shown in green) and the cells covering the road with another (shown in red). 
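
In practice the cells hold numbers, and green or red is just how those numbers are drawn. The sketch below assumes a hypothetical coding scheme (0 = other land, 1 = building, 2 = road), a tiny made-up grid, and 10 m by 10 m cells. It uses a legend to tally how much of the area each category covers.

```python
# Hypothetical legend: each numeric cell code maps to a label and a display color.
LEGEND = {0: ("other", "white"), 1: ("building", "green"), 2: ("road", "red")}
CELL_AREA_M2 = 10 * 10  # assumes 10 m by 10 m cells

# Made-up categorical raster: a small building next to an L-shaped road.
GRID = [
    [0, 1, 1, 0, 2],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [2, 2, 2, 2, 2],
]


def count_by_class(grid, legend):
    """Count how many cells carry each category label."""
    counts = {label: 0 for label, _color in legend.values()}
    for row in grid:
        for code in row:
            counts[legend[code][0]] += 1
    return counts


counts = count_by_class(GRID, LEGEND)
for label, n in counts.items():
    print(f"{label}: {n} cells, {n * CELL_AREA_M2} m^2")
```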

So, let's look again at the first few lines of that field data.

CODE BLOCK to provide head of the raster image.

Here we see the first 5 lines of the data file. Each line holds the values for one row of the grid, one number per raster cell. 
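
To read the value for a particular spot on the ground, you work out which row and column that spot falls into. This sketch assumes a hypothetical georeference: the grid's upper-left corner sits at (`X0`, `Y0`) in a projected coordinate system measured in metres, cells are 10 m square, and row 0 is the first line of the file (the top of the grid). The values and the helpers `cell_index` and `value_at` are made up for illustration.

```python
# Hypothetical georeference: upper-left corner of the grid in a projected
# coordinate system (metres), and the width/height of one square cell.
X0, Y0, CELL = 500000.0, 4990000.0, 10.0

# Made-up field values; row 0 is the first line of the file (the top of the grid).
ROWS = [
    [12, 12, 13, 15],
    [12, 13, 14, 16],
    [13, 14, 15, 17],
]


def cell_index(x, y, x0=X0, y0=Y0, cell=CELL):
    """Return the (row, col) of the cell containing point (x, y)."""
    col = int((x - x0) // cell)
    row = int((y0 - y) // cell)  # y decreases as we move down the rows
    return row, col


def value_at(x, y):
    """Look up the field value at a ground coordinate."""
    row, col = cell_index(x, y)
    if not (0 <= row < len(ROWS) and 0 <= col < len(ROWS[0])):
        raise ValueError("point falls outside the raster")
    return ROWS[row][col]


print(cell_index(500025.0, 4989985.0))  # (1, 2): 25 m east, 15 m south of the corner
print(value_at(500025.0, 4989985.0))    # 14
```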

Now let's look at how object data is stored. Hint: it's completely different, and WAY more complex! 

We'll start simple. When you ask Google to show you all the nearby restaurants, you get a map with a bunch of pins, some with labels. You can click on them to find out information about those places. Those pins represent restaurant objects.

Here's that map for Queenstown with some points of interest. 
<a href="https://www.google.com/maps/place/Queenstown,+New+Zealand/@-45.0514839,168.6648181,1609a,35y,345.22h,54.63t/data=!3m1!1e3!4m5!3m4!1s0xa9d51df1d7a8de5f:0x500ef868479a600!8m2!3d-45.0301511!4d168.6616206">This link will take you to Google Maps.</a>
![Queenstown POIs](supplementary/queenstown_google_POI.png)

Now, let's see how that data is stored in a file. 

CODE BLOCK to load and show the first 5 rows of an attribute table of a point dataset. 
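
The lesson's version of this cell is not included, so here is a standard-library sketch of what such a table can look like. The points of interest are made up (the coordinates only roughly fall in Queenstown), and the geometry column is encoded as Well-Known Binary (WKB), one common binary encoding for geometries; whether the lesson's own dataset uses WKB is an assumption. `point_wkb` is an illustrative helper.

```python
import struct


def point_wkb(lon, lat):
    """Encode one point as little-endian Well-Known Binary (WKB)."""
    # 1 byte: byte order (1 = little endian); 4 bytes: geometry type (1 = Point);
    # then two 8-byte doubles for x (longitude) and y (latitude)
    return struct.pack("<BIdd", 1, 1, lon, lat)


# Made-up points of interest: (id, name, category, longitude, latitude)
POIS = [
    (1, "Lookout A", "viewpoint", 168.6616, -45.0302),
    (2, "Cafe B", "cafe", 168.6590, -45.0315),
    (3, "Jetty C", "transport", 168.6575, -45.0330),
    (4, "Museum D", "museum", 168.6620, -45.0320),
    (5, "Park E", "park", 168.6555, -45.0345),
    (6, "Hotel F", "lodging", 168.6640, -45.0300),
]

table = [
    {"id": oid, "name": name, "category": cat, "geometry": point_wkb(lon, lat)}
    for oid, name, cat, lon, lat in POIS
]

# Print the first 5 rows; the geometry column is unreadable binary, shown as hex.
print("id | name | category | geometry")
for row in table[:5]:
    print(f'{row["id"]} | {row["name"]} | {row["category"]} | {row["geometry"].hex()}')
```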

Here we have a table, the kind of structure we asked about at the start of this lesson. Each row has an object ID, some data about various attributes of that object (perhaps including an indirect georeference such as a street address), and then a magic column of binary entries that you can't read. That column contains the actual point locations. Fortunately, the computer can read that code and put the associated dot on the map. 

Now let's see how that table can generate the dots on a map!

CODE BLOCK to view the point data over a nice base map. 
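
Putting dots on a base map means decoding each geometry and converting its longitude and latitude into the map's own coordinates. This standard-library sketch decodes a point stored as little-endian Well-Known Binary (WKB), an assumed encoding, and converts it to tile coordinates in the Web Mercator "slippy map" scheme used by OpenStreetMap-style tiled base maps. The sample point, zoom level 14, 256-pixel tiles, and the helper names are arbitrary choices for illustration; in a real notebook a web-mapping library such as folium would normally do this for you.

```python
import math
import struct


def decode_point_wkb(blob):
    """Decode a little-endian WKB Point into (x, y), i.e. (longitude, latitude)."""
    byte_order, geom_type = struct.unpack_from("<BI", blob, 0)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only handles little-endian WKB Points")
    return struct.unpack_from("<dd", blob, 5)


def lonlat_to_tile_xy(lon, lat, zoom):
    """Fractional Web Mercator tile coordinates (x grows east, y grows south)."""
    n = 2 ** zoom  # the world is n x n tiles at this zoom level
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    return x, y


blob = struct.pack("<BIdd", 1, 1, 168.6616, -45.0302)  # a made-up sample point
lon, lat = decode_point_wkb(blob)
tx, ty = lonlat_to_tile_xy(lon, lat, zoom=14)
# Integer parts pick the base-map tile; fractional parts (x 256 px) place the dot in it.
print(f"tile ({int(tx)}, {int(ty)}), pixel ({(tx % 1) * 256:.0f}, {(ty % 1) * 256:.0f})")
```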

Now, remember this? 
![raster or vector](supplementary/raster_vector.png)
These illustrate the two most common *data models* for geospatial data. 
- Fields are stored as grids called *rasters* and there is a value everywhere. 
- Objects, which are scattered around mostly empty space, are stored as *vectors*.

So, tell me more about vectors, you say...

Vectors usually come in three varieties - points, lines and polygons. 

![vectors](supplementary/vectors_sm.png)

Points are good for things like cities on a world map, or lightpoles and signposts on a neighborhood map. 

Lines are for rivers, roads, railways, boundaries - that sort of thing.

Polygons are areas, so they're used for lakes, building footprints, parks. 
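
Under the hood, all three are just coordinates: a point is one coordinate pair, a line is an ordered sequence of pairs, and a polygon is a closed ring whose last pair repeats the first. The sketch below assumes made-up geometries in a projected coordinate system measured in metres, so it can compute a line's length from straight-line distances and a polygon's area with the shoelace formula. With raw latitude and longitude, these planar formulas would not give metres.

```python
import math

# Made-up geometries in a projected coordinate system measured in metres.
point = (10.0, 20.0)                               # one coordinate pair
line = [(0.0, 0.0), (30.0, 40.0), (30.0, 100.0)]   # ordered sequence of pairs
polygon = [(0.0, 0.0), (50.0, 0.0), (50.0, 20.0), (0.0, 20.0), (0.0, 0.0)]  # closed ring


def line_length(coords):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Shoelace formula for a closed ring (first vertex repeated at the end)."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


print("point at", point)
print("line length:", line_length(line), "m")        # 50 + 60 = 110.0 m
print("polygon area:", polygon_area(polygon), "m^2")  # 50 x 20 = 1000.0 m^2
```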

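Vector data has two components: the *geometry* (where the object is and what shape it has) and the *attributes* (information about what the object is).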
![Geometry + Attributes](supplementary/vector_structure.png)
These components can be stored together in a single table by including one or more columns that provide the direct georeference (e.g., latitude and longitude).

*OR*, these components can be stored separately: the attributes, each with an object ID, in one table, and the geometry, labelled with the same IDs, in a separate file. 
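
Linking the two parts back together is a join on the shared ID. In this sketch the attribute rows, the geometries and the `join_by_id` helper are all hypothetical; geospatial libraries and databases provide their own join operations for real data.

```python
# Made-up attribute table: one row per object, each with an object ID.
attributes = [
    {"id": 1, "name": "Lake A", "kind": "lake"},
    {"id": 2, "name": "Park B", "kind": "park"},
]

# Made-up geometry "file": polygon rings keyed by the same object IDs.
geometries = {
    2: [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 0.0)],
    1: [(20.0, 20.0), (30.0, 20.0), (30.0, 30.0), (20.0, 20.0)],
}


def join_by_id(attribute_rows, geometry_by_id):
    """Attach each row's geometry by matching IDs; rows without a geometry are skipped."""
    return [
        {**row, "geometry": geometry_by_id[row["id"]]}
        for row in attribute_rows
        if row["id"] in geometry_by_id
    ]


joined = join_by_id(attributes, geometries)
for feature in joined:
    print(feature["id"], feature["name"], feature["kind"], feature["geometry"])
```

The order of the geometries does not matter here; only the matching IDs do.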

By the way, it's good to know that many common geospatial file formats, such as the shapefile, hold only one geometry type per file, so you can't mix points, lines and polygons in a single dataset. If you want a map that shows points, lines and polygons, then you'll typically need at least three different datasets, one for each type of vector data. Here's an example.

CODE BLOCK load point, line and polygon data (OSM?) and view. 
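
Because that example cell is not included here, this sketch shows a preparation step instead: splitting a mixed list of GeoJSON-style features into one dataset per geometry type, as a single-type format such as the shapefile would require. The features and the `split_by_geometry_type` helper are made up for illustration.

```python
# Made-up features of mixed geometry types (coordinates are [longitude, latitude]).
features = [
    {"name": "Bus stop", "geometry": {"type": "Point", "coordinates": [168.660, -45.031]}},
    {"name": "Main road", "geometry": {"type": "LineString",
                                       "coordinates": [[168.655, -45.030], [168.665, -45.032]]}},
    {"name": "Town park", "geometry": {"type": "Polygon",
                                       "coordinates": [[[168.650, -45.035], [168.654, -45.035],
                                                        [168.654, -45.032], [168.650, -45.035]]]}},
    {"name": "Post box", "geometry": {"type": "Point", "coordinates": [168.662, -45.029]}},
]


def split_by_geometry_type(features):
    """Group features into separate lists keyed by geometry type."""
    datasets = {}
    for feature in features:
        datasets.setdefault(feature["geometry"]["type"], []).append(feature)
    return datasets


for geom_type, items in split_by_geometry_type(features).items():
    print(geom_type, [f["name"] for f in items])
```

Each resulting list could then be saved as its own dataset and drawn as its own map layer.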

OK, now let's practice these concepts. For each of the following kinds of geospatial data, choose the data model (raster or vector) that it's most likely to be stored in. 
- Public transit routes
- Elevation across a national park.
- Points of interest, e.g., must-see tourist places in a city.
- COVID infection rates by state.


Well done! Now you know a little bit about geospatial data. 

If you have worked through this lesson carefully, you should now be able to: 
1. Explain what is special about geospatial data.
2. Describe how location can be measured and recorded in geospatial data.
3. Explain the difference between raster and vector data.
4. Identify several different types of geospatial data.
5. Load and view different kinds of geospatial data in Jupyter Notebooks.

If you want to learn more about geospatial data, you can go on to the intermediate Geospatial Data lesson.

Or you can go back and complete some of the other introductory lessons as they all touch on the use of geospatial data. 