# Building Footprint: Rasters to Vectors

This notebook demonstrates the process of converting an output of the Analytics Feed (building footprint raster) into a vector dataset, using the following steps:

1. Obtaining An Analytics Raster
2. Downloading Quad Raster
3. Visualizing Buildings Image
4. Converting Buildings Raster to Vector Features using the following techniques:
    * GDAL CLI
    * Rasterio (no processing)
    * Rasterio (with simplification)
    * Rasterio (flexible function, filtering and simplification as example)

In [None]:
import json
import os
import requests
from pprint import pprint
from planet import Auth
from planet import Session, DataClient, OrdersClient
import fiona
import matplotlib.pyplot as plt
import rasterio
from rasterio import features as rfeatures
from rasterio.enums import Resampling
from rasterio.plot import show
import shapely
import shapely.affinity  # imported explicitly; affine_transform is used below
from shapely.geometry import shape as sshape

In [None]:
# if your Planet API Key is not set as an environment variable, you can paste it below
API_KEY = os.environ.get('PL_API_KEY', 'PASTE_API_KEY_HERE')

client = Auth.from_key(API_KEY)

# Use our API key as the basic authentication username
apiAuth = (API_KEY, '')

## Working With Analytic Feed Results

**Results** on the Planet Analytics API represent the output or "detections" of our machine learning models. Results are created for each Subscription, and each Subscription is derived from a Feed:

*Feed → Subscription → Results* 

> When new Planet imagery is published that intersects a Subscription's AOI and TOI, Planet’s computer vision models process the imagery and the output is added to a "collection" (OGC API - Features) of Results associated with a Subscription.


#### Feed / Result Types

As we've seen, several types of **Feeds** exist, and Results for Feed Subscriptions can be categorized as one of three types: `Object Detection`, `Segmentation`, and `Change Detection`. This notebook covers the `Object Detection` and `Segmentation` feed types, while the next notebook covers `Change Detection` feeds.


#### Types of Feeds + Result Output Format

| Feed Type | Results Type | Results Format |
| --- | --- | --- |
| Vessel Detection | Object Detection | Detection Features (Polygons) |
| Building Detection | Segmentation (Classification) | Raster Mask / Basemaps |
| Road Detection | Segmentation (Classification) | Raster Mask / Basemaps |
| Building Construction Detection | Change Detection | Raster Mask / Basemaps |
| Road Construction Detection | Change Detection | Raster Mask / Basemaps |

We'll be working with Segmentation Feeds Results in this notebook.


## Obtain Analytics Raster

#### Identify Building Feed Feature for Download

We want to download the most recent feature from the feed for building detection in Sazgin, Turkey.

To do this, we need to:
1. List All Available Feed IDs
2. Identify the Feed ID we will need, corresponding to Building Detection
3. List Subscriptions with our selected Feed ID
4. Identify the Subscription ID we will need, corresponding to Sazgin, Turkey
5. Request the Results Collection corresponding to the Subscription ID we've identified
6. Find the most recent Feature from this Feature Collection


#### Set Up the Request Endpoints

The request should go to the following address: https://api.planet.com/analytics/feeds

In [None]:
# Planet Analytics API base url
PAA_BASE_URL = "https://api.planet.com/analytics/"

In [None]:
# Define our endpoints to point to feeds, subscriptions, and collections
feeds_endpoint = 'feeds/'
subscriptions_endpoint = 'subscriptions/'
collections_endpoint = 'collections/'

# Construct the URL for the HTTP request 
# (Planet Analytics API base URL + desired endpoint)
feeds_request_url = PAA_BASE_URL + feeds_endpoint
subscriptions_request_url = PAA_BASE_URL + subscriptions_endpoint
collections_request_url = PAA_BASE_URL + collections_endpoint
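
String concatenation works here because the base URL ends with a slash and each endpoint does not start with one. As a minimal sketch of a more forgiving alternative, the hypothetical helper `build_url` below (not part of the notebook or of any Planet library) joins path segments with the standard library's `urljoin`. The placeholder `SUBSCRIPTION_ID_HERE` is illustrative only.

```python
from urllib.parse import urljoin

# Same base URL as in the notebook
PAA_BASE_URL = "https://api.planet.com/analytics/"

def build_url(base_url, *parts):
    """Join path segments onto base_url one at a time (hypothetical helper)."""
    url = base_url
    for part in parts:
        # urljoin replaces the last path segment unless the URL ends with '/'
        if not url.endswith("/"):
            url += "/"
        url = urljoin(url, part.lstrip("/"))
    return url

feeds_url = build_url(PAA_BASE_URL, "feeds/")
items_url = build_url(PAA_BASE_URL, "collections", "SUBSCRIPTION_ID_HERE", "items")
print(feeds_url)
print(items_url)
```

The helper keeps whatever trailing slash the last segment carries, so `feeds/` resolves to the same URL as the concatenation above.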

#### List Available Feeds

Since we're making a `GET` request, we'll use Requests' `.get` method. Now, let's create our request by passing our request URL and auth variable. Running the next cell should make a call out to the Planet Analytics API.

If our request call is **successful** we should get back a response with a [`200 OK`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/200) `HTTP status code`! 
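
Printing a `requests` response shows only the numeric code, for example `<Response [200]>`. As a small illustration, the hypothetical helper `summarize_status` below uses the standard library's `http.HTTPStatus` to turn a code into its standard reason phrase. The codes shown are generic HTTP examples, not a statement about which codes the Planet Analytics API returns.

```python
from http import HTTPStatus

def summarize_status(status_code):
    """Return 'code phrase' for an HTTP status code (hypothetical helper)."""
    status = HTTPStatus(status_code)
    return "{} {}".format(status.value, status.phrase)

for code in (200, 401, 404):
    print(summarize_status(code))
```

With a real response object, the code is available as `feeds_response.status_code`, and calling `feeds_response.raise_for_status()` raises an exception for 4xx and 5xx responses instead of letting a failed request pass silently.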

In [None]:
# Make the GET request
# A successful request should return a 200 OK status code
feeds_response = requests.get(feeds_request_url, auth=apiAuth)

print(feeds_response)

In [None]:
# Read the Response Data
# Decode the response JSON body to a python dict
feeds_response_json = feeds_response.json()

In [None]:
# Get Feed IDs
for d in feeds_response_json['data']:
    print('{} ({}):\n\r{}\n\r'.format(d['id'], d['created'], d['description']))

#### Identify Feed ID We'll Use

We will use the following Feed_ID for Monthly Building Detection from the list above.

In [None]:
feed_id = 'b442c53b-fc72-4bee-bab4-0b7aa318ccd9'

#### List Subscriptions Containing Our Chosen Feed ID

In [None]:
# Set query parameters for the request
# Filter subscriptions by feedID
feed_subscriptions_params = {"feedID": feed_id}

# Make the request to the api
feed_subscriptions_response = requests.get(subscriptions_request_url, params=feed_subscriptions_params, auth=apiAuth).json()

# Get the list of subscriptions from the 'data' property of the response
subscriptions = feed_subscriptions_response['data']

# Print the number of subscriptions found for the given feed
print("{} subscriptions found for Feed with id:\n{}\n".format(len(subscriptions), feed_id))

# Print the subscriptions list
print(json.dumps(subscriptions, indent=1))
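
The `params` dictionary passed to `requests.get` is encoded into the URL's query string. The sketch below reproduces that encoding with the standard library's `urlencode`, only to show what the request URL is expected to look like for the feed ID used above. It does not contact the API.

```python
from urllib.parse import urlencode

subscriptions_request_url = "https://api.planet.com/analytics/subscriptions/"
feed_id = "b442c53b-fc72-4bee-bab4-0b7aa318ccd9"

# Encode the query parameters the same way a simple params dict would be encoded
query_string = urlencode({"feedID": feed_id})
full_url = "{}?{}".format(subscriptions_request_url, query_string)
print(full_url)
```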

In [None]:
# Get subscription ids
for d in subscriptions:
    print('{} ({}):\n\r{}\n\r'.format(d['id'], d['created'], d['title']))

#### Identify the Subscription ID we will need, corresponding to Sazgin, Turkey

In [None]:
# building footprints in Sazgin, Turkey
subscription_id = '02c4f912-090f-45aa-a18b-ac4a55e4b9ba'

#### Request the Corresponding Results Collection

In [None]:
# First, we will request a subscription to look at the subscription details

# Construct the URL for the Subscription
subscription_url = PAA_BASE_URL + subscriptions_endpoint + subscription_id

print("Request URL: {}".format(subscription_url))

# Make the GET request for this single subscription
subscription = requests.get(subscription_url, auth=apiAuth).json()

# Get subscription details
print("{} \n{}\nSubscription Id: {}\n".format(subscription['title'], subscription['description'], subscription['id']))

# Print the subscription object
print(json.dumps(subscription, sort_keys=True, indent=4))

In [None]:
# Construct the URL for the subscription's Results Collection
collection_results_url = collections_request_url + subscription['id']

print("Request URL: {}".format(collection_results_url))

# Get subscription results collection
collection_results = requests.get(collection_results_url, auth=apiAuth).json()

# Pretty Print response JSON
print(json.dumps(collection_results, sort_keys=True, indent=4))

In [None]:
# Request Collection Items
# What we got above was the collection itself. We're interested in the items in the collection

# Construct the URL for the items in the subscription's Results Collection
collection_results_url = collections_request_url + subscription['id'] + '/' + 'items'

print("Request URL: {}".format(collection_results_url))

# Get the items in the subscription's results collection
collection_items = requests.get(collection_results_url, auth=apiAuth).json()

# Pretty Print response JSON
print(json.dumps(collection_items, sort_keys=True, indent=4))

In [None]:
# How many features do we have in this collection?

features = collection_items['features']
print('{} features in collection'.format(len(features)))

#### Find the most recent Feature from this Feature Collection

In [None]:
# sort features by acquisition date and take latest feature

features.sort(key=lambda k: k['properties']['first_acquired'])
feature = features[-1]

print(feature)

print(feature['properties']['first_acquired'])
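
Sorting the list in place works when every `first_acquired` value is a string in the same ISO 8601 format, because such strings sort chronologically. As a minimal sketch of an alternative that leaves the list order untouched and compares real timestamps, the hypothetical helpers below use `max` with parsed datetimes. The sample features are made up for illustration, and their timestamp format is an assumption.

```python
from datetime import datetime

def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Older Python versions of fromisoformat do not accept 'Z' directly
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def latest_feature(features):
    """Return the feature with the latest first_acquired (hypothetical helper)."""
    return max(features, key=lambda f: parse_timestamp(f["properties"]["first_acquired"]))

# Made-up features that only mimic the structure used in this notebook
sample_features = [
    {"id": "a", "properties": {"first_acquired": "2019-06-01T00:00:00Z"}},
    {"id": "b", "properties": {"first_acquired": "2019-08-01T00:00:00Z"}},
    {"id": "c", "properties": {"first_acquired": "2019-07-01T00:00:00Z"}},
]

latest = latest_feature(sample_features)
print(latest["id"], latest["properties"]["first_acquired"])
```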

## Download Quad Raster

Now that we've identified the feature in the feature collection that we're interested in, we can get this result using a "GET" request.

#### Downloading a Result Quad

If we want to download the raw quad data, for either the source or output target, we can do so via the Planet Mosaics API. To find the link to the file, we can look at the **Result** item's `links` property. Here are the links for the most recent result from our Subscription Results collection:

In [None]:
feature_links = feature['links']
feature_links

Above, we see the links for both `target-quad` (the result output), and `source-quad` (the source imagery that was used to create the detections). We're interested in downloading the target (result) quad:

In [None]:
# Construct the URL to target quad
target_quad = list(filter(lambda link: link['rel'] == 'target-quad', feature_links))[0]['href']

print("Target (Result) Quad URL:\n{}\n".format(target_quad))

Clicking the link in the above cell will download the COG (.tiff) file!
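
The download can also be scripted. The sketch below is one possible approach under stated assumptions: `find_link` and `save_chunks` are hypothetical helpers, the sample links are made up, and whether the quad URL accepts the same basic authentication as the Analytics API is an assumption that should be checked. The `requests` calls are shown only in a comment so the example runs without network access.

```python
import os
import tempfile

def find_link(links, rel):
    """Return the href of the first link with the given rel, or None if absent."""
    return next((link["href"] for link in links if link.get("rel") == rel), None)

def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; return the bytes written."""
    written = 0
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written

# Made-up links shaped like a Result item's `links` property
sample_links = [
    {"rel": "self", "href": "https://example.com/items/1"},
    {"rel": "source-quad", "href": "https://example.com/quads/source"},
    {"rel": "target-quad", "href": "https://example.com/quads/target"},
]
target_url = find_link(sample_links, "target-quad")
missing_url = find_link(sample_links, "no-such-rel")
print(target_url, missing_url)

# With requests, the chunks could come from a streamed response (assumes the
# URL accepts the same auth tuple as the Analytics API):
#   with requests.get(target_quad, auth=apiAuth, stream=True) as resp:
#       resp.raise_for_status()
#       save_chunks(resp.iter_content(chunk_size=1024 * 1024), "quad.tif")

# Offline demonstration with in-memory chunks
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tif")
n_bytes = save_chunks([b"abc", b"", b"defg"], demo_path)
print(n_bytes)
```

Returning `None` for a missing `rel` avoids the `IndexError` that indexing an empty filtered list with `[0]` would raise.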

## Visualize Buildings Image

The output of the analytics building detection is a binary mask: conceptually, building pixels are True and non-building pixels are False. In the raster file itself, building pixels are stored with a value of 255, which is the value the cells below test against.

In [None]:
# Set the filepath of the target quad (raster) and the folder where the vector files should be written

filename = "TIFF_FILEPATH_HERE"
dest = "VECTOR_DESTINATION_HERE"

In [None]:
def _open(filename, factor=1):
    with rasterio.open(filename) as dataset:
        height = int(dataset.height / factor)
        width = int(dataset.width / factor)
        data = dataset.read(
            out_shape=(dataset.count, height, width)
        )
    return data

def open_bool(filename, factor=1):
    data = _open(filename, factor=factor)
    return data[0,:,:]

def get_figsize(factor):
    return tuple(2 * [int(25/factor)])



factor = 1
figsize = (15, 15)

buildings = open_bool(filename, factor=factor)
fig = plt.figure(figsize=figsize)
# show(buildings, title="footprints", cmap="binary")
show(buildings[2500:3000, 0:500], title="footprints", cmap="binary")

## Convert Buildings Raster to Vector Features

Here, we examine several different ways to convert our buildings raster into vector features:
* GDAL CLI
* Rasterio (basic)
* Rasterio (simplified)
* Rasterio (flexible)


### GDAL Command-Line Interface (CLI)

GDAL provides a Python script, `gdal_polygonize.py`, that can be run via the CLI. It is easy to run and fast.

In [None]:
def get_layer_name(filename):
    # get the default output layer name based on the
    # output filename. I wish there was a way to specify
    # the output layer name but attempts have failed thus far.
    return filename.split('/')[-1].split('.')[0]

gdal_tmp_output_filename = os.path.join(dest, 'test_gdal_all.shp')
gdal_tmp_output_layer_name = get_layer_name(gdal_tmp_output_filename)
gdal_output_filename = os.path.join(dest, 'test_gdal.shp')
gdal_output_layer_name = get_layer_name(gdal_output_filename)
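
Splitting on `/` and `.` assumes forward-slash separators and a single dot in the file name. As a minimal sketch, the hypothetical helper `layer_name_from_path` below derives the name with `pathlib` instead. It assumes the default layer name equals the file name without its final extension, which is worth confirming against the `ogrinfo` output.

```python
from pathlib import Path

def layer_name_from_path(filename):
    """Return the file name without its final extension (hypothetical helper)."""
    return Path(filename).stem

print(layer_name_from_path("out/test_gdal_all.shp"))
print(layer_name_from_path("out/v1.2/test.shp"))
```

The two approaches differ when the base name itself contains a dot: for `buildings.v2.shp`, the split-based version returns `buildings`, while `Path(...).stem` returns `buildings.v2`.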

In [None]:
# convert the binary image into polygons
# creates polygons for building footprints as well as regions between
# and around building footprints
!gdal_polygonize.py $filename $gdal_tmp_output_filename

In [None]:
# get number of features, this includes inside and outside building footprints
!ogrinfo -so  $gdal_tmp_output_filename $gdal_tmp_output_layer_name | grep 'Feature Count'

In [None]:
# get number of building footprint features
# building footprints are associated with image value (DN) of 255
!ogrinfo -so $gdal_tmp_output_filename -sql "SELECT * FROM $gdal_tmp_output_layer_name WHERE DN=255" \
    | grep 'Feature Count'

In [None]:
# create a new shapefile with only building footprints
!ogr2ogr -sql "SELECT * FROM $gdal_tmp_output_layer_name WHERE DN=255" \
    $gdal_output_filename $gdal_tmp_output_filename

In [None]:
# confirm the number of building footprint features
!ogrinfo -so $gdal_output_filename -sql "SELECT * FROM $gdal_output_layer_name WHERE DN=255" \
    | grep 'Feature Count'
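
The `!` shell escapes above only work inside IPython or Jupyter. As a rough sketch for use in a plain Python script, the hypothetical helpers below build the same command lines as argument lists, which could then be passed to `subprocess.run`. The file names are placeholders, and the run call is left commented out because it needs GDAL on the PATH.

```python
import shlex

def polygonize_cmd(raster_path, vector_path):
    """Argument list for polygonizing a raster (hypothetical helper)."""
    return ["gdal_polygonize.py", raster_path, vector_path]

def select_buildings_cmd(src_path, dst_path, layer_name, dn_value=255):
    """Argument list for copying only features with DN == dn_value."""
    sql = "SELECT * FROM {} WHERE DN={}".format(layer_name, dn_value)
    # ogr2ogr takes the destination first, then the source
    return ["ogr2ogr", "-sql", sql, dst_path, src_path]

cmds = [
    polygonize_cmd("quad.tif", "test_gdal_all.shp"),
    select_buildings_cmd("test_gdal_all.shp", "test_gdal.shp", "test_gdal_all"),
]
for cmd in cmds:
    # Print a shell-quoted version for inspection
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # requires GDAL installed
```

Passing a list rather than a single string avoids shell quoting problems with the SQL statement.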

### Rasterio

In this section we use rasterio to convert the binary buildings raster into a vector dataset. The vectors are written to disk as a shapefile. The shapefile can be imported into geospatial programs such as QGIS or ArcGIS for visualization and further processing.

This is a basic conversion to vector shapes. No smoothing to remove pixel edges or conversion to building centerlines is performed here.

In [None]:
def buildings_as_vectors(filename): 
    with rasterio.open(filename) as dataset:
        buildings = dataset.read(1)
        building_mask = buildings == 255 # mask non-building pixels

        # transforms buildings features to image crs
        building_shapes = rfeatures.shapes(buildings, mask=building_mask, transform=dataset.transform)
        building_geometries = (s for s, _ in building_shapes)
        
        crs = dataset.crs
    return (building_geometries, crs)

def save_as_shapefile(output_filename, geometries, crs):
    driver='ESRI Shapefile'
    schema = {'geometry': 'Polygon', 'properties': []}
    with fiona.open(output_filename, mode='w', driver=driver, schema=schema, crs=crs) as c:
        count = 0
        for g in geometries:
            count += 1
            c.write({'geometry': g, 'properties': {}})
        print('wrote {} geometries to {}'.format(count, output_filename))

        
building_geometries, crs = buildings_as_vectors(filename)
output_filename = os.path.join(dest, 'test_rasterio.shp')
save_as_shapefile(output_filename, building_geometries, crs)
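
One detail worth noting: `buildings_as_vectors` returns a generator expression, so the geometries can be iterated only once. After `save_as_shapefile` has consumed them, a second pass yields nothing. The sketch below shows this behavior with made-up `(geometry, value)` pairs standing in for the output of `rfeatures.shapes`.

```python
def as_geometries(shapes):
    """Keep only the geometry from (geometry, value) pairs, lazily."""
    return (geom for geom, _value in shapes)

# Made-up stand-ins for the (geometry, value) pairs yielded during vectorization
sample_shapes = [
    ({"type": "Polygon", "coordinates": []}, 255.0),
    ({"type": "Polygon", "coordinates": []}, 255.0),
]

geometries = as_geometries(sample_shapes)
first_pass = sum(1 for _ in geometries)
second_pass = sum(1 for _ in geometries)  # the generator is already exhausted
print(first_pass, second_pass)

# Materialize with list() when the geometries are needed more than once
reusable = list(as_geometries(sample_shapes))
print(len(reusable), len(reusable))
```

Keeping the generator is the memory-friendly choice for a full quad. Converting to a list trades memory for the ability to reuse the results.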

### Rasterio - Simplifying

In this section, we use `shapely` to simplify the building footprints so we don't have a million pixel edges.

In [None]:
def buildings_as_vectors_with_simplification(filename): 
    with rasterio.open(filename) as dataset:
        buildings = dataset.read(1)
        building_mask = buildings == 255 # mask non-building pixels

        # we skip transform on vectorization so we can perform filtering in pixel space
        building_shapes = rfeatures.shapes(buildings, mask=building_mask)
        building_geometries = (s for s, _ in building_shapes)
        geo_shapes = (sshape(g) for g in building_geometries)
    
        # simplify so we don't have a million pixel edge points
        # value of 1 (in units of pixels) determined by visual comparison to non-simplified
        tolerance = 1
        geo_shapes = (g.simplify(tolerance, preserve_topology=False)
                      for g in geo_shapes)

        # apply image transform    
        # rasterio transform: (a, b, c, d, e, f, 0, 0, 1), c and f are offsets
        # shapely: a b d e c/xoff f/yoff
        d = dataset.transform
        shapely_transform = [d[0], d[1], d[3], d[4], d[2], d[5]]
        proj_shapes = (shapely.affinity.affine_transform(g, shapely_transform)
                       for g in geo_shapes)
        
        building_geometries = (shapely.geometry.mapping(s) for s in proj_shapes)
        
        crs = dataset.crs
    return (building_geometries, crs)

building_geometries_simp, crs = buildings_as_vectors_with_simplification(filename)
output_filename = os.path.join(dest, 'test_rasterio_simp.shp')
save_as_shapefile(output_filename, building_geometries_simp, crs)
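
The coefficient reordering in the function above is easy to get wrong, so here is a small pure-Python check of it. The hypothetical helpers below reorder six coefficients from rasterio's `(a, b, c, d, e, f)` layout to the `[a, b, d, e, xoff, yoff]` layout that `shapely.affinity.affine_transform` expects, and then apply the matrix to a pixel coordinate by hand. The sample transform (5 m pixels, arbitrary origin) is made up for illustration.

```python
def rasterio_to_shapely(coeffs):
    """Reorder (a, b, c, d, e, f) into shapely's [a, b, d, e, xoff, yoff]."""
    a, b, c, d, e, f = coeffs[:6]
    return [a, b, d, e, c, f]

def apply_shapely_matrix(matrix, x, y):
    """Apply x' = a*x + b*y + xoff, y' = d*x + e*y + yoff to one point."""
    a, b, d, e, xoff, yoff = matrix
    return (a * x + b * y + xoff, d * x + e * y + yoff)

# Made-up north-up transform: 5 m pixels, upper-left corner at (500000, 4100000)
sample_transform = (5.0, 0.0, 500000.0, 0.0, -5.0, 4100000.0)
matrix = rasterio_to_shapely(sample_transform)

print(matrix)
print(apply_shapely_matrix(matrix, 0, 0))    # pixel (col=0, row=0)
print(apply_shapely_matrix(matrix, 10, 20))  # pixel (col=10, row=20)
```

Here x is the column index and y is the row index, which is why the negative `e` coefficient moves the point south as the row number grows. Recent versions of the `affine` package may also offer a `to_shapely()` method on the transform for the same reordering, if it is available in your environment.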

### Appendix - Extending the Calculation with Rasterio

In this section we get a little bit fancy and set up the rasterio vectorization function so that it can take any processing function, as long as that function accepts a generator of `shapely` geometries (in pixel coordinates) and returns a generator of `shapely` geometries. We will use this to filter and simplify building footprint shapes.

In [None]:
def buildings_as_vectors_proc(filename, proc_fcn): 
    with rasterio.open(filename) as dataset:
        buildings = dataset.read(1)
        building_mask = buildings == 255 # mask non-building pixels

        # we skip transform on vectorization so we can perform filtering in pixel space
        building_shapes = rfeatures.shapes(buildings, mask=building_mask)
        building_geometries = (s for s, _ in building_shapes)
        geo_shapes = (sshape(g) for g in building_geometries)
        
        # apply arbitrary processing function
        geo_shapes = proc_fcn(geo_shapes)

        # apply image transform    
        # rasterio transform: (a, b, c, d, e, f, 0, 0, 1), c and f are offsets
        # shapely: a b d e c/xoff f/yoff
        d = dataset.transform
        shapely_transform = [d[0], d[1], d[3], d[4], d[2], d[5]]
        proj_shapes = (shapely.affinity.affine_transform(g, shapely_transform)
                       for g in geo_shapes)
        
        building_geometries = (shapely.geometry.mapping(s) for s in proj_shapes)
        
        crs = dataset.crs
    return (building_geometries, crs)

def filter_and_simplify_footprints(footprints):
    # filter to shapes consisting of 6 or more pixels
    min_pixel_size = 6
    geo_shapes = (s for s in footprints if s.area >= min_pixel_size)
        
    # simplify so we don't have a million pixel edge points
    # value of 1 (in units of pixels) determined by visual comparison to non-simplified
    tolerance = 1
    geo_shapes = (s.simplify(tolerance, preserve_topology=False)
                  for s in geo_shapes)
    return geo_shapes

building_geometries_simp, crs = buildings_as_vectors_proc(filename, filter_and_simplify_footprints)
output_filename = os.path.join(dest, 'test_rasterio_proc.shp')
save_as_shapefile(output_filename, building_geometries_simp, crs)
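
Because each processing function takes a generator and returns a generator, several of them can be chained into one function and passed as `proc_fcn`. The sketch below shows one way to do that with a hypothetical `compose_procs` helper. `FakeShape` is a made-up stand-in with an `area` attribute so the example runs without shapely.

```python
from collections import namedtuple
from functools import reduce

def compose_procs(*procs):
    """Chain generator-to-generator functions, applied left to right."""
    def composed(shapes):
        return reduce(lambda acc, proc: proc(acc), procs, shapes)
    return composed

# Made-up stand-in for a shapely geometry; only `area` matters here
FakeShape = namedtuple("FakeShape", ["name", "area"])

def drop_small(shapes, min_area=6):
    """Keep shapes covering at least min_area pixels."""
    return (s for s in shapes if s.area >= min_area)

def tag_kept(shapes):
    """Illustrative second step: upper-case the name of each surviving shape."""
    return (s._replace(name=s.name.upper()) for s in shapes)

pipeline = compose_procs(drop_small, tag_kept)
result = list(pipeline([FakeShape("a", 2), FakeShape("b", 6), FakeShape("c", 40)]))
print([s.name for s in result])
```

Each step stays lazy, so nothing is evaluated until the final consumer, such as `save_as_shapefile`, iterates over the result.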