# The Wealth of Cities
## Predicting the Wealth of a City from Satellite Imagery

Accurate measurements of a population's economic characteristics critically influence research and policy. Such measurements shape how governments allocate resources and provide infrastructure to improve human livelihoods in a wide range of situations. Although economic data is readily available for some developing nations, many regions of the modern world lack key measures of economic development and efficiency and therefore remain unexposed to the benefits of economic analysis. Parts of Africa, for example, conduct few or no economic surveys and have little other means of collecting data on their financial situations. We attempt to address this problem by using publicly available satellite imagery to predict the wealth of a city (or, more generally, a geographic region): we identify fundamental features in these images and run them through a convolutional neural network. This method would apply not only to regions that lack economic data but also to cities with a wealth of economic information at the macro level and a dearth at the micro level. For example, cities in America, despite having plenty of economic data at the state and county levels, could benefit from more granular information to improve policy decisions on infrastructure and public support.

For this approach to work, we need to extract relevant features from the images to train our machine learning model. Our model will not be able to predict the wealth of individual houses (i.e., families), but will work on clusters of houses (i.e., neighborhoods), because wealth measurements are complex and neighborhoods tend to sit at a nearly homogeneous economic level. As a result, we will need to extract "cluster" features to process with our neural network.

Thinking about the kinds of features that would reveal the wealth of a region, we can start to identify what we need to extract from the images. One of the first and most common ideas is to get satellite imagery of the region at night and observe the nightlight intensity: more lights at night tend to correspond to wealthier areas, while fewer lights tend to correspond to poorer areas. Our group has also considered the following features as indicators of wealth:
- Number of cars
- Percentage of greenspace
- Number of high-rises
- Time of day at which traffic occurs
- Housing density
- Aerospace/nautical infrastructure

The number of cars tends to be a good indicator of whether a city has passed a certain threshold of wealth. Some poorer cities will have more cars than wealthier ones, but cities with no cars tend to be the poorest, so if we can extract the number of cars from the image we can establish a baseline level for the wealth of a city.

Percentage of greenspace is perhaps even less reliable than the number of cars, but it can also help establish relative rankings of wealth between multiple cities. Cities with plenty of public funds, and consequently wealth, tend to spend money on maintaining public greenspaces. Granted, some rural regions also have a lot of greenspace in the form of farms or undeveloped land, and in that case greenspace does not correspond to higher wealth. However, if we can ensure that the imagery we are looking at represents an urban area, we could take greenspace into account when predicting the level of wealth.
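
As a rough illustration of how a greenspace percentage might be estimated, the sketch below counts pixels whose green channel clearly dominates red and blue. This is only an assumption about one possible approach, not the feature extractor used later in the notebook; the function name `green_fraction`, the `margin` threshold, and the toy image are all hypothetical.

```python
import numpy as np

def green_fraction(rgb, margin=10):
    """Fraction of pixels whose green value exceeds red and blue by `margin`.

    `rgb` is an array of shape (height, width, 3). The threshold rule is a
    simple heuristic chosen for illustration only.
    """
    rgb = np.asarray(rgb, dtype=np.int32)  # avoid uint8 overflow when adding margin
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    green_mask = (g > r + margin) & (g > b + margin)
    return float(green_mask.mean())

# Toy 2x2 image (made-up values): two green pixels, one gray, one blue
toy = np.array([[[30, 120, 40], [100, 100, 100]],
                [[20, 90, 25], [10, 20, 200]]], dtype=np.uint8)
print(green_fraction(toy))
```

A real image would need a more robust vegetation measure, but the same idea of "mask, then average" carries over.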

Number of high-rises is a critical indicator of a city's wealth. However, extracting this information from satellite imagery is tricky because the images are flat. One way around this is to analyze the shadows that buildings cast at different times of the day. Tall buildings cast long shadows throughout the day, not only briefly in the morning and evening.
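
The relationship between shadow length and building height can be sketched with basic trigonometry: on flat ground, height = shadow length × tan(sun elevation). The snippet below is a minimal sketch under those assumptions; the function name is hypothetical, and the sun elevation angle is assumed to be known (for example, from image metadata).

```python
import math

def building_height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Estimate building height in meters from its shadow length.

    Assumes flat ground and a sun elevation strictly between 0 and 90 degrees.
    """
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))

# With the sun at 45 degrees, the shadow is as long as the building is tall
print(building_height_from_shadow(30.0, 45.0))
```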

Housing density is highly correlated with the "urban-ness" of a region, which in turn is suggestive of the wealth of a city. Rural areas (generally poorer) have a lower housing density, while urban areas (generally wealthier) have a higher housing density. There are exceptions to this trend, but it generally holds, and housing density is one of the easier features to extract from satellite imagery.

We will be getting our images from Planet, a publicly available database of satellite imagery from the last few years that covers most of the world. Unfortunately, our API access is limited to California, so we can only run our model on data from California, but there is no reason this method would not work given input data from elsewhere in the world.

In this notebook, we'll take you through the entire process from setting up the program to download images and extract features to running the data through the machine learning pipeline and getting a predicted wealth score for input data. 

First, we'll import the necessary modules. `json` and `io` are used only to load our Planet API key. You can sign up for a free account at https://www.planet.com/. The approval process takes a few days, but once you have your API key, this entire notebook can be completed in one sitting. We will use the `requests` module to make API requests for the satellite imagery; these requests are authorized using the `requests.auth` module.

In [None]:
import json, io
import requests
from requests.auth import HTTPBasicAuth

In [None]:
# load your PLANET_API_KEY from the config_secret.json file
with io.open("config_secret.json") as cred:
    PLANET_API_KEY = json.load(cred)["PLANET_API_KEY"]

To get an idea of what these satellite images look like, we will show you how to download a single image and then how to work with what Planet calls an Area of Interest (AOI). First, we define a geometry, which is a collection of latitude and longitude points that forms a polygon around the area you would like to get pictures from. Remember that our Planet API access only covers California right now, so if you change the coordinates, make sure they remain within the state. Our example geometry is centered on a reservoir in Redding, CA. Next, we define filters for the Planet API; these include the geometry filter discussed above, date range filters (only images acquired within a specified date range), cloud cover filters (perhaps you only want images taken on a clear day), and many more. We then send this request to the Stats API endpoint to see how many images fit our criteria. In our example, there are 30 images of Redding, CA within the date range that have at most 50% cloud cover.

In [None]:
# the geo json geometry object we got from geojson.io
geo_json_geometry = {
  "type": "Polygon",
  "coordinates": [
    [
      [
        -122.52227783203125,
        40.660847697284815
      ],
      [
        -122.52227783203125,
        40.987154933797335
      ],
      [
        -122.01690673828124,
        40.987154933797335
      ],
      [
        -122.01690673828124,
        40.660847697284815
      ],
      [
        -122.52227783203125,
        40.660847697284815
      ]
    ]
  ]
}

# filter for items that overlap with our chosen geometry
geometry_filter = {
  "type": "GeometryFilter",
  "field_name": "geometry",
  "config": geo_json_geometry
}

# filter images acquired in a certain date range
date_range_filter = {
  "type": "DateRangeFilter",
  "field_name": "acquired",
  "config": {
    "gte": "2016-07-01T00:00:00.000Z",
    "lte": "2016-08-01T00:00:00.000Z"
  }
}

# filter out any images with more than 50% cloud cover
cloud_cover_filter = {
  "type": "RangeFilter",
  "field_name": "cloud_cover",
  "config": {
    "lte": 0.5
  }
}

# create a filter that combines our geometry, date, and cloud cover filters
# could also use an "OrFilter"
redding_reservoir = {
  "type": "AndFilter",
  "config": [geometry_filter, date_range_filter, cloud_cover_filter]
}

# Stats API request object
stats_endpoint_request = {
  "interval": "day",
  "item_types": ["REOrthoTile"],
  "filter": redding_reservoir
}

# fire off the POST request
result = \
  requests.post(
    'https://api.planet.com/data/v1/stats',
    auth=HTTPBasicAuth(PLANET_API_KEY, ''),
    json=stats_endpoint_request)
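
To see how many images match, the per-interval counts in the Stats response can be summed. The helper below is a minimal sketch that assumes the response JSON contains a `buckets` list whose entries each carry a `count` field; the function name and the sample data are hypothetical and used only for illustration.

```python
def total_item_count(stats_response):
    """Sum the per-interval counts in a Stats API response dictionary."""
    buckets = stats_response.get("buckets", [])
    return sum(bucket.get("count", 0) for bucket in buckets)

# Made-up sample shaped like the assumed response format
sample_stats = {
    "interval": "day",
    "buckets": [
        {"start_time": "2016-07-01T00:00:00.000000Z", "count": 3},
        {"start_time": "2016-07-02T00:00:00.000000Z", "count": 2},
    ],
}
print(total_item_count(sample_stats))
```

With the request above, this would be called as `total_item_count(result.json())`.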

In [None]:
# Search API request object
search_endpoint_request = {
  "item_types": ["REOrthoTile"],
  "filter": redding_reservoir
}

result = \
  requests.post(
    'https://api.planet.com/data/v1/quick-search',
    auth=HTTPBasicAuth(PLANET_API_KEY, ''),
    json=search_endpoint_request)
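
The search result can then be used to pick an image to download. The sketch below assumes the quick-search response is a GeoJSON-style feature collection with a `features` list in which every entry has an `id`; the helper name and the second sample ID are hypothetical.

```python
def list_item_ids(search_response):
    """Return the item IDs found in a quick-search response dictionary."""
    return [feature["id"] for feature in search_response.get("features", [])]

# Made-up sample shaped like the assumed response format
sample_search = {
    "type": "FeatureCollection",
    "features": [
        {"id": "20160707_195147_1057916_RapidEye-1"},
        {"id": "hypothetical_item_id"},
    ],
}
print(list_item_ids(sample_search))
```

With the request above, this would be called as `list_item_ids(result.json())`.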

Finally, we can send a request to download the actual photo asset given its ID and type. This requires creating an authorized session and using that session to first activate the image and then download it.

In [None]:
import os
import requests
from osgeo import gdal
from requests.auth import HTTPBasicAuth

In [None]:
item_id = "20160707_195147_1057916_RapidEye-1"
item_type = "REOrthoTile"
asset_type = "visual"

# setup auth
session = requests.Session()
session.auth = (PLANET_API_KEY, '')

# request the assets available for this item
assets_url = "https://api.planet.com/data/v1/item-types/{}/items/{}/assets/".format(item_type, item_id)
item = session.get(assets_url)

# extract the activation url from the item for the desired asset
item_activation_url = item.json()[asset_type]["_links"]["activate"]
print(item_activation_url)

# request activation
response = session.post(item_activation_url)
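
Activation is not instantaneous, so it can help to poll until the asset is ready. The helper below is a sketch under the assumption that each asset entry exposes a `status` field that becomes `"active"` and a `location` field holding the download URL once activation finishes; the function name, polling interval, and retry limit are hypothetical choices.

```python
import time

def wait_until_active(fetch_assets, asset_type, poll_seconds=10, max_polls=30):
    """Poll `fetch_assets()` until the requested asset is active.

    `fetch_assets` is any callable returning the assets dictionary, so the
    helper can be exercised without network access. Returns the asset's
    download location (assumed to be stored under "location").
    """
    for _ in range(max_polls):
        asset = fetch_assets()[asset_type]
        if asset.get("status") == "active":
            return asset["location"]
        time.sleep(poll_seconds)
    raise RuntimeError("asset did not become active in time")

# Example with the session from above (not executed here):
# url = "https://api.planet.com/data/v1/item-types/{}/items/{}/assets/".format(item_type, item_id)
# download_url = wait_until_active(lambda: session.get(url).json(), asset_type)
```

Passing the fetch step in as a callable keeps the polling logic separate from the HTTP details.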

Once the image is activated, we can retrieve it using a curl command like the one below:

curl -L \
"https://api.planet.com/data/v1/download?token=eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJwUDNCNU9aYVFKUnN2WGsydmF3UVpLL2ZWci9DZWk0bG82OGJuT2NRR2laZ01EcFBTUnpsSWdHNGlZM2R5YTZWQ2xHdDROeFBka29Kb295a1BvdktPUT09IiwiaXRlbV90eXBlX2lkIjoiUkVPcnRob1RpbGUiLCJ0b2tlbl90eXBlIjoidHlwZWQtaXRlbSIsImV4cCI6MTQ3Mzc1MDczOCwiaXRlbV9pZCI6IjIwMTYwNzA3XzE5NTE0N18xMDU3OTE2X1JhcGlkRXllLTEiLCJhc3NldF90eXBlIjoidmlzdWFsIn0.lhRgqIggvnRoCgUVX3hgaNYDQIdU09wVaImxv3a_vuGjfzC7_OteYeViboeiZYBH2_eMdWT5ZWDz2BZiAWkXlQ"

![alt text](redding1.png)

Alternatively, if you already have the image ID and type, you can use the code below to get a subarea of the image. Go to geojson.io to create a GeoJSON file, save it as subarea.geojson, and then run the following to get a smaller area of the image.

In [None]:
item_id = "20161109_173041_0e0e"
item_type = "PSScene3Band"
asset_type = "visual"
item_url = 'https://api.planet.com/data/v1/item-types/{}/items/{}/assets'.format(item_type, item_id)

# Request a new download URL
result = requests.get(item_url, auth=HTTPBasicAuth(PLANET_API_KEY, ''))
download_url = result.json()[asset_type]['location']
vsicurl_url = '/vsicurl/' + download_url
output_file = item_id + '_subarea.tif'

# GDAL Warp crops the image to our AOI and saves it
gdal.Warp(output_file, vsicurl_url, dstSRS='EPSG:4326',
          cutlineDSName='subarea.geojson', cropToCutline=True)

![alt text](yolo.png)