# Data Structures

In [None]:
%matplotlib inline

import pandas
import geopandas
import rasterio

## Contents

* [Core data types](#Core-data-types)
* [Reading (spatial) data](#Reading-(spatial)-data)
* [`(Geo)DataFrames`](#(Geo)DataFrames)
* [`Series`](#Series)
* [The `geometry` column](#The-geometry-column)
    * [CRS](#CRS)
    * [Geometries](#Geometries)
    * [Geometric operations](#Geometric-operations)
* [A note on rasters](#A-note-on-rasters)

## Core data types

Core:

In [None]:
type(1)

In [None]:
type(1.0)

In [None]:
type('a')

In [None]:
type('hello world!')

Extensions:

In [None]:
pandas.to_datetime("2019-11-05 9:00")

In [None]:
pandas.Categorical(["Apples", "Oranges"])

In [None]:
from shapely.geometry import Point

Point(-0.08947918950509948, 51.49441830214852)
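
What the extension types add over plain strings and numbers can be sketched as follows. The third fruit label and the rounded coordinates are illustrative values, not part of the dataset used later.

```python
import pandas
from shapely.geometry import Point

# A timestamp knows about calendar components, unlike a plain string
ts = pandas.to_datetime("2019-11-05 9:00")
print(type(ts).__name__, ts.year, ts.hour)

# A categorical stores each distinct label once, plus an integer code per value
fruit = pandas.Categorical(["Apples", "Oranges", "Apples"])
print(list(fruit.categories), list(fruit.codes))

# A shapely point exposes its coordinates as attributes
pt = Point(-0.0895, 51.4944)
print(pt.geom_type, pt.x, pt.y)
```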

## Reading (spatial) data

For non-spatial data, we use `pandas` and its `read_XXX` functions. Have a peek at what's available by typing `pandas.read_` and pressing `TAB`; auto-completion will list the readers for every supported file format.
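
As a minimal sketch of one of those readers, `pandas.read_csv` can parse a small in-memory table. The column names and values below are made up for illustration and stand in for a CSV file on disk.

```python
import io
import pandas

# Hypothetical two-row table standing in for a CSV file on disk
csv_text = "neighbourhood,listings\nCamden,120\nHackney,95\n"

# read_csv accepts a path or any file-like object
table = pandas.read_csv(io.StringIO(csv_text))
print(table)
print(table.dtypes)
```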

For spatial data, `geopandas` *extends* `pandas` functionality to support vector spatial data. Let's illustrate its main `read_file` function with a dataset of Airbnb aggregate statistics for Inner London:

In [None]:
db = geopandas.read_file('../data/ilm_abb.geojson')

## `(Geo)DataFrames`

When you read a multi-column tabular file, a `DataFrame` is created. If that table contains spatial information and is read with `geopandas`, you get a `GeoDataFrame`:

In [None]:
type(db)
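
To see how the two types relate without depending on the data file, here is a small sketch that builds a toy `GeoDataFrame` by hand. The name `toy` and its contents are assumptions made for illustration only.

```python
import pandas
import geopandas
from shapely.geometry import Point

# Toy table: one attribute column plus a geometry column
toy = geopandas.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)

# A GeoDataFrame is a DataFrame subclass, so pandas methods still apply
print(type(toy))
print(isinstance(toy, pandas.DataFrame))

# Without its geometry column, the table is just a plain DataFrame
plain = pandas.DataFrame(toy.drop(columns="geometry"))
print(type(plain))
```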

Both data structures are very similar and modeled after tables in relational (SQL) databases (and not completely unlike an Excel spreadsheet!). Let's print the top ("head") of the table to inspect its contents:

In [None]:
db.head()

Other quick exploratory methods:

In [None]:
db.info()

In [None]:
db.shape

In [None]:
db.describe()

## `Series`

`DataFrames` are two-dimensional array-like structures (think of a matrix, but with mixed types), and are "made up" of `Series`, which are one-dimensional objects (think of vectors).

In [None]:
db['review_scores_rating'].head()
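
The difference in dimensionality can be checked on a toy table. The column names `rating` and `beds` are hypothetical and unrelated to the London dataset.

```python
import pandas

frame = pandas.DataFrame({"rating": [92.0, 88.5, 97.0], "beds": [1, 2, 3]})

# Selecting with a single name returns a one-dimensional Series
col = frame["rating"]
print(type(col).__name__, col.ndim, col.dtype)

# Selecting with a list of names keeps the two-dimensional DataFrame
sub = frame[["rating"]]
print(type(sub).__name__, sub.ndim)
```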

## The `geometry` column

In [None]:
db['geometry'].head()

Remember:

- (Almost) like a standard `Series` object
- Only one *active* geometry column per `GeoDataFrame`
- Extends `pandas` bringing all sorts of geospatial goodies
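
The points above can be sketched on a toy table (hypothetical names and coordinates). The geometry column is a `GeoSeries`, it gains spatial methods a plain `Series` lacks, and `set_geometry` chooses which column is the active one.

```python
import geopandas
from shapely.geometry import Point

toy = geopandas.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(3, 4)],
)

# The geometry column is a GeoSeries, with spatial attributes and methods
geoms = toy["geometry"]
print(type(geoms).__name__)
print(geoms.x.tolist())
distances = geoms.distance(Point(0, 0)).tolist()
print(distances)

# A table may hold further geometry columns, but only one is active at a time
toy["buffered"] = geoms.buffer(1)
active_before = toy.geometry.name
active_after = toy.set_geometry("buffered").geometry.name
print(active_before, active_after)
```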

### CRS

Small but powerful attribute:

In [None]:
db.crs

**IMPORTANT**: `crs` is an attribute of a `GeoDataFrame`, not of each geometry!

In [None]:
db_wgs84 = db.to_crs(epsg=4326)
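
A small sketch of reprojection on a single toy point, going the other way (from WGS84 to the British National Grid, EPSG:27700). The choice of target CRS and the rounded coordinates are assumptions made for illustration.

```python
import geopandas
from shapely.geometry import Point

# One point in longitude/latitude (WGS84)
toy = geopandas.GeoDataFrame(geometry=[Point(-0.0895, 51.4944)], crs="EPSG:4326")

# to_crs returns a new table; the original is left untouched
bng = toy.to_crs(epsg=27700)
print(toy.crs.to_epsg(), bng.crs.to_epsg())

# Coordinates are now in metres rather than degrees
projected = bng.geometry.iloc[0]
print(projected)

# The individual shapely geometry has no CRS of its own
print(hasattr(projected, "crs"))
```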

### Geometries

In [None]:
poly = db.loc[0, 'geometry']

In [None]:
poly

In [None]:
poly.bounds

In [None]:
poly.area

In [None]:
# poly.crs  # would raise AttributeError: a shapely geometry carries no CRS

### Geometric operations

In [None]:
poly2 = db.loc[27, 'geometry']
poly2

In [None]:
poly.touches(poly2)

In [None]:
poly.intersects(poly2)

And we can "broadcast" this too!

In [None]:
db.touches(poly)

In [None]:
db[db.touches(poly)]
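
The difference between `touches` and `intersects`, and the broadcasting step, can be reproduced with a few hand-made squares. The shapes below are illustrative and are not taken from the London data.

```python
import geopandas
from shapely.geometry import box

a = box(0, 0, 1, 1)
b = box(1, 0, 2, 1)          # shares an edge with a
c = box(0.5, 0.5, 1.5, 1.5)  # overlaps the interior of a
d = box(5, 5, 6, 6)          # far away from a

# touches: boundaries meet but interiors do not; intersects: any shared point
print(a.touches(b), a.intersects(b))
print(a.touches(c), a.intersects(c))

# Broadcasting: test every geometry in a GeoSeries against a single one
squares = geopandas.GeoSeries([a, b, c, d])
mask = squares.touches(a)
print(mask.tolist())

# The boolean mask can then filter the series, as with any pandas object
neighbours = squares[mask]
print(neighbours.index.tolist())
```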

## A note on rasters

Very different approach. Your friend here is `rasterio`.

In [None]:
p = '../data/london_lights.gtiff'
src = rasterio.open(p)

In [None]:
src.count

In [None]:
src.crs

In [None]:
src.bounds