# Tutorial for Gathering Construction Sites


This notebook was developed by  

    Nicholas Kashani Motlagh, Aswathnarayan Radhakrishnan, Jim Davis @ Ohio State University  
    {kashanimotlagh.1,radhakrishnan.39,davis.1719}@osu.edu  
    
    and  
    
    Roman Ilin @ AFRL/RYAP, Wright-Patterson AFB  
    rilin325@gmail.com  

Should you have any questions, the principal point-of-contact is Dr. Jim Davis. This package was modified on 2020-08-05.

## Prerequisites
Make sure you have completed the prerequisites in [README.md](./README.md) before continuing this demo. This demo assumes some basic knowledge of JupyterLab and OpenStreetMap. Here is more information on [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) and [OpenStreetMap](https://wiki.openstreetmap.org/wiki/Beginners_Guide_1.3).

### Acquiring History Files

This script relies on OpenStreetMap history files. History files are used to observe construction changes over time. To download a history file, you need an OpenStreetMap account, which is free and quick to create [here](https://www.openstreetmap.org/user/new). Once you have an account, you can access the internal history files. For this demo, we will extract construction sites in Ohio, specifically inside The Ohio State University campus.
* Navigate to the [Geofabrik Ohio download page](http://download.geofabrik.de/north-america/us/ohio.html).
* Scroll to "Other Formats and Auxiliary Files" and click the link "internal server".
* Click "Login with your OpenStreetMap account" and enter your credentials.  

After you are logged in, you will be able to download the previously crossed-out internal.osh.pbf file.

You can find other history files [here](http://download.geofabrik.de). 

### Creating Poly Files

This script needs a poly file that defines the bounds of the search. The polygon in the poly file must lie inside the region covered by the downloaded history file. For more information on poly files, see the [Osmosis polygon filter file format](https://wiki.openstreetmap.org/wiki/Osmosis/Polygon_Filter_File_Format). For convenience, some example poly files can be found in `poly/`. Additionally, you can download the poly file of a region [here](http://download.geofabrik.de) under "Other Formats and Auxiliary Files".
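
As an illustration of the format, here is a minimal sketch that builds the text of a poly file for a rectangular region. The helper `bbox_to_poly` and the coordinates are hypothetical and not part of this package; the layout (a name line, a section line, `longitude latitude` pairs, and two `END` lines) follows the format description linked above.

```python
def bbox_to_poly(name, west, south, east, north):
    """Return the text of a poly file describing a rectangular region."""
    # Poly files list coordinates as "longitude latitude", one vertex per line.
    corners = [
        (west, south),
        (east, south),
        (east, north),
        (west, north),
        (west, south),  # repeat the first vertex to close the ring
    ]
    lines = [name, "1"]  # file name line, then the name of the only section
    lines += [f"   {lon:.6f}   {lat:.6f}" for lon, lat in corners]
    lines += ["END", "END"]  # first END closes the section, second closes the file
    return "\n".join(lines) + "\n"


# Hypothetical bounding box, for illustration only.
print(bbox_to_poly("example_region", -83.04, 39.99, -83.00, 40.01))
```

The printed text can be saved with a `.poly` extension and passed as the `poly` parameter described later in this tutorial.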

## Step 1: Site Extraction
To gather imagery of OpenStreetMap construction sites, we must first create a GeoDataFrame using the following modules. `setup` will be used to set up the necessary directories. Then, `extract_sites` will be used to create a GeoDataFrame of construction sites from an OpenStreetMap history file.

### Setup
The directory structure required for the `extract_sites` module is below.
```
temp/
    snapshots/
    
output/
   collection/
```

Run the following cell to set up the directory structure.

In [None]:
from setup import setup_directory
setup_directory()
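
For readers curious about what such a setup step involves, the following is a rough sketch of creating the same directory tree with `pathlib`. The function `make_extract_dirs` is hypothetical and is not the package's `setup_directory`; it only mirrors the structure listed above, and the example runs in a temporary directory so it does not touch your working tree.

```python
import tempfile
from pathlib import Path


def make_extract_dirs(root="."):
    """Create temp/snapshots and output/collection under root if missing."""
    root = Path(root)
    created = []
    for relative in ("temp/snapshots", "output/collection"):
        target = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# Demonstrate inside a scratch directory that is removed afterwards.
with tempfile.TemporaryDirectory() as scratch:
    for path in make_extract_dirs(scratch):
        print(path.relative_to(scratch))
```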

### Reset Directories
You can clear the directories you just set up at any time by running the following cell.

In [None]:
import reset_extract
reset_extract.reset()

### Obtaining GeoDataFrames

This module records every site that was under construction at any point between the start and end of the window given in the arguments, and saves the result as a GeoDataFrame in `output/collection/collection.shp`. If construction began before the window or ended after it, but the site was under construction inside the window, the site still appears in the GeoDataFrame with its correct start and end dates.  
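
The inclusion rule described above amounts to an interval-overlap test. The sketch below illustrates that idea only; `overlaps_window` is a hypothetical helper, not a function of `extract_sites`, and treating both endpoints as inclusive is an assumption.

```python
from datetime import date


def overlaps_window(site_start, site_end, window_start, window_end):
    """True if the construction interval shares at least one day with the window."""
    # Assumption: both the site interval and the window include their endpoints.
    return site_start <= window_end and site_end >= window_start


window = (date(2018, 4, 15), date(2020, 7, 1))

# Began before the window but was still under construction inside it.
print(overlaps_window(date(2017, 11, 1), date(2018, 6, 1), *window))  # True

# Finished before the window opened.
print(overlaps_window(date(2016, 1, 1), date(2017, 1, 1), *window))  # False
```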

The GeoDataFrame will contain a row for each construction site observed. Each row has a unique "chain_id" identifier followed by a start date, end date, construction type, previous tag, final tag, and polygon coordinates which define a bounding box containing the construction shape.  

We will use the extract_sites module to get a GeoDataFrame of construction sites over time in a specified polygon.

In [None]:
import extract_sites

To gather construction sites, we need to define input parameters to the script. This script accepts start dates on or after 2015-06-23.

Parameters for site extraction:  
`start`: Get construction sites which were under construction on or after this date (YYYY-MM-DD). This date must be on or after 2015-06-23.  
`end`: Get construction sites which were under construction before or on this date (YYYY-MM-DD). This date must be on or before 10 days from today.  
`poly`: The path to the poly file to search for construction sites.  
`region`: The path to the history file containing the polygon from the poly file.  
`keep-temp`: If set to true, the temp directory will not be deleted. (Default: False)  
`restrict-window`: If set to true, only construction sites within the window will be located. In this case, `output/collection/collection.shp` will contain sites that end in the window. (Default: False)  
`save-wip`: If set to true, a work-in-progress GeoDataFrame will be saved as `output/collection/in_progress.shp`. (Default: False)   
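
Before running a long extraction, it can help to check the date parameters up front. The following is a small sketch of such a check; `check_window` is a hypothetical helper and is not part of `extract_sites`. It enforces the earliest start date stated above and additionally assumes that `end` should not precede `start`. It does not check the upper limit on `end`.

```python
from datetime import date

EARLIEST_START = date(2015, 6, 23)  # earliest start date stated in this tutorial


def check_window(start, end):
    """Parse two YYYY-MM-DD strings and validate them as a search window."""
    # date.fromisoformat raises ValueError on malformed input
    start_day = date.fromisoformat(start)
    end_day = date.fromisoformat(end)
    if start_day < EARLIEST_START:
        raise ValueError(f"start must be on or after {EARLIEST_START.isoformat()}")
    # Assumption: a window whose end precedes its start is a mistake.
    if end_day < start_day:
        raise ValueError("end must not be earlier than start")
    return start_day, end_day


print(check_window("2018-04-15", "2020-07-01"))
```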
 

Modify the parameters below:

In [None]:
# start date of the search in isoformat
start_date = '2018-04-15'
# end date of the search in isoformat
end_date = '2020-07-01'
# location of the poly file for the OSU campus, which we will be analyzing
poly_file = "poly/campus.poly"
# location of the history file containing the OSU campus (the downloaded .osh.pbf file)
region_file = ""
# set to false if temp directory should be deleted
keep_temp_files = False
# set to true if you would only like to search within the window
restrict_window = False
# set to true if you would like to save the dataframe containing work-in-progress sites
save_wip = False

We will assign the parameters of the extract sites module to the parameters entered.

In [None]:
params = {"start": start_date, "end":end_date, "poly":poly_file, "region":region_file,
          "keep-temp": keep_temp_files, "restrict-window":restrict_window, "save-wip":save_wip}
extract_sites.set_params(params)

Now that the parameters are assigned, we can extract the polygon from the region file we've provided and locate construction sites. The time this function takes depends on the region size and length of construction timelines. This function will return two compiled data frames of all construction sites: the first contains sites that are still under construction, and the second contains completed sites. This function also saves completed sites to `output/collection/collection.shp` and optionally saves the sites in-progress to `output/collection/in_progress.shp` if `save-wip = True`.

In [None]:
in_progress_gdf, collection_gdf = extract_sites.locate_construction()

In [None]:
collection_gdf

You can also plot the GeoDataFrame.

In [None]:
collection_gdf.plot()

Now that we have a GeoDataFrame containing completed sites, let's create info text files for each site. This will make it easier to construct labeled datasets in the future. Info files are found in `output/{chain_id}/info.txt`.

In [None]:
extract_sites.create_dataset(collection_gdf)

Now let's take a look at the distribution of our final tags in a histogram.

In [None]:
from collections import Counter
import matplotlib.pyplot as plt
# Count how many sites ended with each final tag
histogram = Counter(collection_gdf["final_tag"])
# Rotate the tag labels so they do not overlap
plt.xticks(rotation=90)
plt.bar(list(histogram.keys()), list(histogram.values()))

## Step 2: Image Gathering

Now that we have a GeoDataFrame containing construction sites, we are ready to get imagery. We will use the gather_images module to download imagery over the course of construction.

In [None]:
import gather_images

### Reset
In order to clean all directories that have been made, run the following.

In [None]:
import reset_images
reset_images.reset()

### Setup
First, we will need to create directories for images to be stored. Here, we can set parameters for what source we want to use, what types of images to gather, how many images, and how much padding we want around the images.

#### Using Planet
Planet imagery is gathered in 2 steps: ordering and downloading.
* An order is placed for every site after 2017-02-19 (no images will be gathered for a site that began before that date).
* Planet imagery is sparse in 2017, but improves in more recent history.
* You can download an order some time after it is created.
* Each order can take between 5 and 15 minutes to download, depending on the duration of construction.
* You can set the `email` parameter to true to receive an email every time an order is ready.

#### Using Sentinel
Sentinel imagery is gathered in one step. Imagery is gathered for every site after 2015-06-23 (no images will be gathered for a site that began before that date).  

To access imagery on Sentinel Hub, you must first create a new configuration on the Sentinel Hub dashboard found [here](https://apps.sentinel-hub.com/dashboard/#/configurations).

<details>  
<summary>Sentinel dashboard instructions</summary>
    
+ Click "New configuration"
+ Enter a name
+ Select "Python scripts template" from the dropdown
+ Edit your configuration
+ Increase image quality to 100
+ Click "Add a new layer"
+ Name the new layer "NIR"
+ Select "B08"
+ Select "B08 - reflectance"
+ Click "Copy script to editor"
+ Click "Set custom script"
+ Adjust the cloud coverage to 100
+ Click "Save"

**Creating a new layer whose "ID" is "NIR" is vital here.** Parameters such as cloud coverage and atmospheric correction can be changed; however, we will leave these as is for the sake of this demo.
</details>  
  

Parameters for image gathering:  
`rgb`: Downloads RGB images if set to True. (Default: False)  
`nir`: Downloads NIR Images if set to True. (Default: False)  
`source`: "p" downloads Planet images; "s" downloads Sentinel images.  
`api`: The API key for the source that you are using (Sentinel instance ID or Planet API key).  
`download-planet`: Download previously created Planet orders. (Default: False)  
`num-images`: The number of images to download including the start date and end date (evenly spaced). Set num-images >= 3 or -1 for all available imagery. (Default: 3)  
`padding`: The factor by which the bounding box area is scaled, keeping its center fixed. The bounding box will be scaled by sqrt(padding) in lat and long. (Default: 1)  
`verbose`: Print script updates to console. (Default: False)  
`email`: Receive email notifications for each complete order. (Default: False)  
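
To make the `padding` arithmetic concrete, here is a minimal sketch of scaling a bounding box about its center. `pad_bbox` is a hypothetical helper, not the `gather_images` implementation; it only applies the rule stated above, in which each side is scaled by sqrt(padding) so that the area in degree space grows by `padding`.

```python
import math


def pad_bbox(west, south, east, north, padding=1.0):
    """Scale a bounding box about its center so its area grows by `padding`."""
    scale = math.sqrt(padding)  # each side grows by sqrt(padding)
    center_lon = (west + east) / 2
    center_lat = (south + north) / 2
    half_width = (east - west) / 2 * scale
    half_height = (north - south) / 2 * scale
    return (center_lon - half_width, center_lat - half_height,
            center_lon + half_width, center_lat + half_height)


# padding=4 doubles each side of a 2x2 degree box while keeping the center at (1, 1)
print(pad_bbox(0.0, 0.0, 2.0, 2.0, padding=4))  # (-1.0, -1.0, 3.0, 3.0)
```

Note that this works in degrees of latitude and longitude, so the scaling of the ground area is only approximate.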

Modify the parameters below.

In [None]:
# Set to true to download RGB
rgb = True
# Set to true to download NIR
nir = True 
# Set to p to download Planet, s to download Sentinel
source = "p"
# Set number of images to download
num_images = 3
# Set the padding scale factor
padding = 1
# Set to true to receive status updates
verbose = True
# Set to true to receive email notifications
email = True
# Set to true to download previously created Planet orders
download_planet = False
# Set to source API key (Sentinel instance id or Planet api key).
api = ""

params =  {"rgb": rgb, "nir": nir, "source": source, "num-images": num_images, 
           "padding": padding, "verbose": verbose, "email": email, 
           "download-planet": download_planet, "api": api}
gather_images.set_params(params)
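
The `num-images` parameter asks for images evenly spaced between the start and end of construction, including both endpoints. The sketch below shows one way such target dates could be computed; `evenly_spaced_dates` is a hypothetical helper, not part of `gather_images`, and it does not cover the `-1` (all available imagery) case. The dates of the images actually returned depend on what each source has available.

```python
from datetime import date, timedelta


def evenly_spaced_dates(start, end, num_images=3):
    """Return num_images target dates from start to end inclusive."""
    if num_images < 3:
        raise ValueError("num_images must be at least 3")
    total_days = (end - start).days
    step = total_days / (num_images - 1)  # spacing in days, may be fractional
    # Round each offset to a whole day; the first and last land on start and end
    return [start + timedelta(days=round(i * step)) for i in range(num_images)]


for day in evenly_spaced_dates(date(2020, 1, 1), date(2020, 1, 11), num_images=3):
    print(day.isoformat())
```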

We also have to set up the necessary directories.

In [None]:
# Create the necessary directories
gather_images.setup()

### Get Imagery
Now that our directories are set up, we are ready to gather images.  

In [None]:
# For platform independent paths
from pathlib import Path
# Used to read in the GeoDataFrame
import geopandas as gpd
# Read in the GeoDataFrame
collection_gdf = gpd.read_file(str(Path("output/collection/collection.shp")))
# Start the retrieval process for source
gather_images.gather_from_source(collection_gdf)

If you are using Planet as a source, you will have to download the images once the orders are ready. Reassign the parameters with `download-planet` set to True.

In [None]:
# Change the mode to download
params["download-planet"] = True
# Set the new parameters
gather_images.set_params(params)
# Download planet images
gather_images.gather_from_source(collection_gdf)

After images are downloaded, you can find them in `output/{chain_id}/images/{source}/{rgb_nir}`. Each image is labeled with the date it was captured.
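
To inspect what was downloaded for one site, a short `pathlib` walk over that layout is enough. The helper `list_site_images` below is hypothetical and not part of this package. It assumes only the directory pattern given above and uses wildcards for the source and image-type folders, since their exact names are not spelled out here.

```python
from pathlib import Path


def list_site_images(output_root, chain_id):
    """Return the files under {output_root}/{chain_id}/images/, sorted by path."""
    images_dir = Path(output_root) / str(chain_id) / "images"
    if not images_dir.is_dir():
        return []  # nothing downloaded for this site yet
    # Layout below images/ is {source}/{rgb_nir}/<file>, per the text above
    return sorted(p for p in images_dir.glob("*/*/*") if p.is_file())


# Example: list images for a hypothetical site whose chain_id is 0
for image_path in list_site_images("output", 0):
    print(image_path)
```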