# Open AI Caribbean Challenge

## About our project

Our team has been interested in image processing with deep learning since the beginning of the course. Our first idea was to build a neural network for detecting melanoma/skin cancer, but we later abandoned it.

Later on we found a fascinating competition on DrivenData:
https://www.drivendata.org/competitions/58/disaster-response-roof-type/

The task is also an image processing problem, only with a less common application.

The regions of the **Caribbean**, **Central America**, the **South Pacific** and the **Himalayas** face a considerable number of natural hazards every year, including earthquakes, hurricanes and floods. These can have a devastating effect due to building conditions in the region. In many cities, houses have been built without following sound construction standards. As such, many of these houses will likely be damaged or destroyed during the next natural disaster. What’s more, the majority of these houses are located in poor and often informal settlements that have grown over time to become very large and densely populated neighborhoods.

<br>

![title](https://th.thgim.com/news/international/809cbr/article29367424.ece/alternates/FREE_660/08IN-LT-TROPICALWEATHERBAHAMAS)

<br>
In order to retrofit relevant buildings in those very populated areas to bring them up to better construction standards, it is paramount that buildings that face higher risk of damage be identified quickly and accurately.

The traditional approach to identifying high-risk buildings is by foot: going door to door to visually assess building conditions, construction materials, roof types and other key factors that greatly influence how a building will fare during a natural disaster. This type of visual assessment is particularly time consuming and costly. A visual assessment by engineers typically takes weeks if not months and costs millions of dollars.

As a result, **WeRobotics** is teaming up with the **World Bank Global Program for Resilient Housing** to use drone imagery and AI image recognition to test the following hypothesis: the use of drones can help to quickly identify relevant buildings by creating risk categories that speed up the visual assessment process. Mapping a 10 km² neighborhood with a drone can be done within a matter of days and at a cost of a few thousand dollars at most. The point here is not to replace onsite assessments of building conditions, but rather to narrow down the number of buildings that require onsite inspection before making a retrofit decision.

|  |  |
|---|---|
| <a href="https://werobotics.org/"><img src="https://blog.werobotics.org/wp-content/uploads/2017/06/Screenshot-2017-06-05-12.04.53.png" alt="WeRobotics" style="width: 380px;"/></a> | <a href="https://www.worldbank.org/en/topic/disasterriskmanagement/brief/global-program-for-resilient-housing"><img src="https://www.trzcacak.rs/myfile/detail/401-4016837_world-bank-group-logo.png" alt="World Bank Group" style="width: 400px;"/></a> |

The Global Program for Resilient Housing has assembled unique datasets with the goal of finding machine learning models that are able to most accurately map disaster risk from drone imagery. The objective is faster, cheaper prioritization of building inspections to help target resources for disaster preparation where they will have the most impact.

The goal of this challenge is to identify rooftop construction material. Roof construction material is one of the main building risk factors for earthquakes and hurricanes. Light material is more likely to fly off and leave people unprotected from strong wind, flying objects and rain during hurricanes; heavy material such as concrete is likely to collapse during an earthquake.

### What does the data look like?

The imagery consists of seven large, high-resolution Cloud Optimized GeoTIFFs, one for each of the seven areas, with a spatial resolution of roughly 4 cm. The following example shows Borde Rural:

<img src="./images/regions/borde_rural_ortho-cog-thumbnail.png" width=400>

*(Castries and Gros Islet regions contain labels from an unverified automated process. For this reason, images from Castries and Gros Islet are included only in the training dataset.)*

>*Due to the size of our data (approximately 34 GB), it is quite hard to visualize the images with the given labels as a whole. Regardless, all of the data necessary for the project can be found on the [main site of the competition](https://www.drivendata.org/competitions/58/disaster-response-roof-type/).*
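
To get a feel for what a resolution of roughly 4 cm means in practice, the short sketch below converts ground distances to pixel counts. The helper name `meters_to_pixels` and the 10 m × 8 m roof are hypothetical, chosen only for illustration; the 4 cm figure is the approximate value quoted above.

```python
# Approximate ground sampling distance quoted in the text: about 4 cm per pixel.
GSD_METERS = 0.04


def meters_to_pixels(length_m: float, gsd_m: float = GSD_METERS) -> int:
    """Return roughly how many pixels span `length_m` metres at resolution `gsd_m`."""
    return round(length_m / gsd_m)


# Hypothetical roof of 10 m x 8 m (illustrative numbers, not taken from the data).
roof_width_px = meters_to_pixels(10)
roof_height_px = meters_to_pixels(8)
print(roof_width_px, roof_height_px)
```

Under these assumptions a single roof already spans a few hundred pixels per side, which helps explain why the full orthomosaics are so large.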

### Labels
Every high-resolution image contains many rooftops. A footprint is drawn for every rooftop and then labeled according to its material.

<img src="./images/regions/labeled_region.png" width=600>

Each image corresponds to train and test **GeoJSON**s, where labels are encoded as FeatureCollections. **metadata.csv** links each image with its corresponding GeoJSON. For each area in the train set, the GeoJSON includes the unique **building ID**, **building footprint**, **roof material**, and **verified field** (see note above). For each area in the test set, the GeoJSON contains just the unique building ID and building footprint.

```python
import pandas as pd

data = pd.read_csv("./labels/metadata.csv")
data.head()
```

| Unnamed: 0 | image | train | test |
|---|:---|:---|:---|
| 0 | stac/guatemala/mixco_1_and_ebenezer/mixco_1_an... | stac/guatemala/mixco_1_and_ebenezer/train-mixc... | stac/guatemala/mixco_1_and_ebenezer/test-mixco... |
| 1 | stac/guatemala/mixco_3/mixco_3_ortho-cog.tif | stac/guatemala/mixco_3/train-mixco_3.geojson | stac/guatemala/mixco_3/test-mixco_3.geojson |
| 2 | stac/st_lucia/castries/castries_ortho-cog.tif | stac/st_lucia/castries/train-castries.geojson | |
| 3 | stac/st_lucia/dennery/dennery_ortho-cog.tif | stac/st_lucia/dennery/train-dennery.geojson | stac/st_lucia/dennery/test-dennery.geojson |
| 4 | stac/st_lucia/gros_islet/gros_islet_ortho-cog.tif | stac/st_lucia/gros_islet/train-gros_islet.geojson | |
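
As a rough illustration of how such a FeatureCollection might be read, the sketch below parses a tiny hand-written GeoJSON string with the standard library. The sample feature, its coordinates, and the property names `id`, `roof_material` and `verified` are assumptions made for this example; the real files should be inspected to confirm the actual keys.

```python
import json

# Hypothetical, hand-written FeatureCollection. The property names
# ("id", "roof_material", "verified") are assumptions, not confirmed keys.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"id": "example01", "roof_material": "healthy_metal", "verified": true},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [0.0, 0.0]]]
      }
    }
  ]
}
"""


def footprint_bounds(feature):
    """Return (min_x, min_y, max_x, max_y) of a Polygon feature's outer ring."""
    outer_ring = feature["geometry"]["coordinates"][0]
    xs = [point[0] for point in outer_ring]
    ys = [point[1] for point in outer_ring]
    return min(xs), min(ys), max(xs), max(ys)


collection = json.loads(SAMPLE_GEOJSON)
buildings = [
    {
        "id": feature["properties"]["id"],
        # Test-set files carry no label, so fall back to None when it is absent.
        "roof_material": feature["properties"].get("roof_material"),
        "bounds": footprint_bounds(feature),
    }
    for feature in collection["features"]
]
print(buildings)
```

The bounding box of each footprint is one simple way to decide which part of the large image to crop for a given building; other cropping strategies are equally possible.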


Roof material labels are also provided in **train_labels.csv**, where each row contains a unique building ID and the verified flag, followed by five roof material columns, with a 1.0 indicating that building's roof type and 0.0s in the remaining columns. Each building has only one roof type.

```python
data = pd.read_csv("./labels/train_labels.csv")
data.head()
```

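| Unnamed: 0 | id | verified | concrete_cement | healthy_metal | incomplete | irregular_metal | other |
|---|---|---|---|---|---|---|---|
| 0 | 7a3f2a10 | True | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 7a1f731e | True | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | 7a424ad8 | True | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 3 | 7a3edc5e | True | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| 4 | 7a303a6e | True | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |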
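
Because each building has exactly one roof type, the five one-hot columns can be collapsed into a single label column. The sketch below shows one possible way to do this with pandas, using only the five preview rows shown above; the column name `roof_material` and the variable names are our own choices for illustration.

```python
import pandas as pd

MATERIALS = ["concrete_cement", "healthy_metal", "incomplete", "irregular_metal", "other"]

# The five rows from the train_labels.csv preview, rebuilt by hand.
labels = pd.DataFrame(
    [
        ["7a3f2a10", True, 1.0, 0.0, 0.0, 0.0, 0.0],
        ["7a1f731e", True, 0.0, 0.0, 0.0, 1.0, 0.0],
        ["7a424ad8", True, 0.0, 1.0, 0.0, 0.0, 0.0],
        ["7a3edc5e", True, 0.0, 1.0, 0.0, 0.0, 0.0],
        ["7a303a6e", True, 0.0, 1.0, 0.0, 0.0, 0.0],
    ],
    columns=["id", "verified"] + MATERIALS,
)

# Sanity check: every row should contain exactly one 1.0.
assert (labels[MATERIALS].sum(axis=1) == 1.0).all()

# idxmax along the columns returns the name of the column holding the 1.0.
labels["roof_material"] = labels[MATERIALS].idxmax(axis=1)

# Optionally keep only the rows whose label was verified.
verified_labels = labels[labels["verified"]]
material_counts = verified_labels["roof_material"].value_counts()
print(material_counts)
```

The same lines should work on the full file loaded with `pd.read_csv`, assuming its columns match the preview.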


### Visual label examples
<br>

| label type | example | description | count |
|-----------------|---|:---|---|
| concrete_cement   | <img src="./images/label_types/building-concrete-cement.png" width=150> | Roofs are made of concrete or cement.  | 1518  |
| healthy_metal   | <img src="./images/label_types/building-healthy-metal.png" width=150> | Includes corrugated metal, galvanized sheeting, and other metal materials.  | 14817  |
| incomplete      | <img src="./images/label_types/building-incomplete.png" width=150> | Under construction, extremely haphazard, or damaged.  | 669  |
| irregular_metal | <img src="./images/label_types/building-irregular-metal.png" width=150> | Includes metal roofing with rusting, patching, or some damage. These roofs carry a higher risk.  | 5241  |
| other           | <img src="./images/label_types/building-other.png" width=150> | Includes shingles, tiles, red painted, or other material.  | 308  |
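
The counts in the table are far from balanced. As a quick check, the sketch below computes each label's share of the total; the numbers are copied from the table above, and the variable names are ours.

```python
# Label counts copied from the table above.
counts = {
    "concrete_cement": 1518,
    "healthy_metal": 14817,
    "incomplete": 669,
    "irregular_metal": 5241,
    "other": 308,
}

total = sum(counts.values())
shares = {name: count / total for name, count in counts.items()}

# Print the labels from most to least frequent.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:16s}{share:7.1%}")
```

With these counts, healthy_metal makes up roughly two thirds of the labeled roofs, while incomplete and other together stay below 5%, so some way of handling class imbalance is probably worth considering during training.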