<a href="https://colab.research.google.com/github/fmorumbasi/ml4eo_bootcamp_application/blob/main/ml4eo_bootcamp_application.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Application Exercises

These exercises evaluate your ability to work with geospatial data in Python. You will need the following Python packages to complete them:

* geopandas
* pandas
* rasterio
* fiona
* pycrs

For each cell marked "WRITE YOUR CODE HERE", replace the placeholder code (for example, a `return NotImplemented` statement or a dummy return value) with your own implementation; do not leave the placeholder in the notebook. This notebook contains test cases to evaluate whether your functions work correctly. When you run these cells, they print "Congratulations, all is working just fine !!!" if your function works, or raise an error with the message "Sorry wrong answer" if it does not. Once you finish these exercises, upload the completed notebook to the application page and submit your application.

If you have any questions, contact the organizers at hello@radiant.earth.

### Reading in data using GeoPandas
[GeoPandas](https://geopandas.org/) is a Python library that is well suited to working with geospatial data. We will use several other Python libraries as well.

To test that everything is working fine:
1. Ensure that all of the files from the repository are in the same directory as this notebook.
2. Use GeoPandas to read in the data from the GeoJSON file as shown below.
3. Work through some of the example analyses shown and complete the questions that follow.

In [3]:
pip install geopandas


In [4]:
pip install fiona



In [6]:
pip install PyDrive



In [7]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [10]:
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [7]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [8]:
path = "/content/drive/MyDrive/Colab Notebooks/crops.geojson"

In [9]:
import geopandas
import pandas as pd

data = geopandas.read_file(path)
data.head() # show the first 5 rows of the data.

Unnamed: 0,Field ID,Latitude,Longitude,Accuracy,Survey Date,Water Resource,Planting Date,PlantingDate Method,Estimated Harvest Date,Crop1,Crop2,Crop3,Crop4,Crop5,Crop Density,Variety,CMD Rating,CBSD Rating,CGM Rating,Disease Rating,geometry
0,5.1,0.538299,34.22033,3.216,2019-05-16,rainfed,2018-08-13,Recorded,2019-06-13,Cassava,,,,,40,Tanzania,,,,,"POLYGON ((635776.260 59518.908, 635798.727 595..."
1,5.101,0.538204,34.22009,4.288,2019-05-16,rainfed,2019-03-01,Estimated,2019-12-01,Maize,,,,,50,Null,,,,,"POLYGON ((635766.485 59501.773, 635765.180 595..."
2,6.1,0.539152,34.22034,4.288,2019-05-16,rainfed,2019-04-03,Recorded,2020-01-03,Maize,Groundnut,,,,60,DH 04,,,,,"POLYGON ((635800.201 59601.742, 635774.494 595..."
3,6.101,0.539092,34.22031,3.216,2019-05-16,rainfed,2019-03-01,Estimated,2019-12-01,Maize,,,,,30,Null,,,,,"POLYGON ((635798.934 59597.331, 635819.341 596..."
4,6.2,0.539539,34.22064,4.288,2019-05-16,rainfed,2019-04-03,Recorded,2020-01-03,Maize,Bean,,,,45,Null,,,,,"POLYGON ((635829.837 59634.348, 635812.466 596..."


In [10]:
my_list = data.columns.values.tolist()
my_list

['Field ID',
 'Latitude',
 'Longitude',
 'Accuracy',
 'Survey Date',
 'Water Resource',
 'Planting Date',
 'PlantingDate Method',
 'Estimated Harvest Date',
 'Crop1',
 'Crop2',
 'Crop3',
 'Crop4',
 'Crop5',
 'Crop Density',
 'Variety',
 'CMD Rating',
 'CBSD Rating',
 'CGM Rating',
 'Disease Rating',
 'geometry']

### Dataset Description

This dataset contains field boundaries for crop fields in Kenya, together with the crop types present in each field. A field can contain multiple crops, which are recorded in the "Crop1", "Crop2", "Crop3", "Crop4", and "Crop5" columns. Each record also includes the date the survey was conducted, the date the crops were planted, an estimated harvest date, and whether the planting date was recorded or estimated.
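
Because a crop can appear in any of the five crop columns, finding the fields that contain a given crop means checking all five. The sketch below shows one way to do that with pandas on a small hand-made table. The `toy` frame and the `fields_with_crop` helper are illustrative assumptions, not part of the dataset or of the expected exercise solution.

```python
import pandas as pd

CROP_COLS = ["Crop1", "Crop2", "Crop3", "Crop4", "Crop5"]

# Hypothetical table shaped like the survey data (values are made up for illustration).
toy = pd.DataFrame({
    "Field ID": [5.1, 5.101, 6.1, 6.2],
    "Crop1": ["Cassava", "Maize", "Maize", "Maize"],
    "Crop2": ["", "", "Groundnut", "Bean"],
    "Crop3": ["", "", "", ""],
    "Crop4": ["", "", "", ""],
    "Crop5": ["", "", "", ""],
})


def fields_with_crop(frame, crop):
    """Return the rows where any of the crop columns equals `crop`."""
    has_crop = frame[CROP_COLS].eq(crop).any(axis=1)
    return frame[has_crop]


print(fields_with_crop(toy, "Maize")["Field ID"].tolist())
print(fields_with_crop(toy, "Groundnut")["Field ID"].tolist())
```

The same boolean row mask should work on a GeoDataFrame, since it behaves like a pandas DataFrame for this kind of column selection.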

### Listing all of the different types of crops present in this dataset

In [None]:
list(pd.unique(data[['Crop1', 'Crop2', 'Crop3', 'Crop4', 'Crop5']].values.ravel('K')))

['Cassava',
 'Maize',
 'Sorghum',
 'Bean',
 'Groundnut',
 'Fallowland',
 'Millet',
 'Tomato',
 'Sugarcane',
 'Sweetpotato',
 'Banana',
 '',
 'Cowpea',
 'Soybean']
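
The empty string in the output above appears to stand for unused crop slots. As a minimal sketch on a made-up table (the `toy` frame is an assumption, not the real data), the blank entries can be filtered out before listing the distinct crop names:

```python
import pandas as pd

crop_cols = ["Crop1", "Crop2", "Crop3"]

# Hypothetical data: blank strings mark crop slots that are not used.
toy = pd.DataFrame({
    "Crop1": ["Maize", "Cassava", "Maize"],
    "Crop2": ["Bean", "", "Groundnut"],
    "Crop3": ["", "", "Bean"],
})

# Stack the crop columns into a single Series, drop the blanks,
# then collect the distinct names in alphabetical order.
stacked = toy[crop_cols].stack()
crop_names = sorted(stacked[stacked != ""].unique())
print(crop_names)
```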

In [11]:
col_one_list = data['Crop1'].tolist()
col_one_list

['Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Sorghum',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Bean',
 'Groundnut',
 'Maize',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Fallowland',
 'Cassava',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Fallowland',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Maize',
 'Millet',
 'Maize',
 'Maize',
 'Fallowland',
 'Maize',
 'Fallowland',
 'Maize',
 'Maize',
 'Maize',
 'Maize',
 'Cassava',
 'Cassava',
 'Maize',
 'Cassava',
 'Fallowland',
 'Cassava',
 'Cassava',
 'Maize',
 'Maize',
 'Maize',
 'Groundnut',
 'Maize',
 'Cassava',
 'Maize',
 'Groundnut',
 'Cassava',
 'Maize',
 'Cassava',
 'Maize',
 'Maize',
 'Cassava',
 'Maize',
 'Cassava',
 'Maize',
 'Tomato',
 'Maize'

### For you to do

In [None]:
# Write your code to complete the following functions

def get_crop_area(crop):
    # Calculate the total area in square meters of the fields which contain the selected crop
    
    # WRITE YOUR CODE HERE
    area = 0
    return area

In [None]:
# Run this to validate your function works correctly

assert int(get_crop_area('Maize')) == 192534, "Sorry wrong answer"
assert int(get_crop_area('Sorghum')) == 227, "Sorry wrong answer"
assert int(get_crop_area('Banana')) == 946, "Sorry wrong answer"
print("Congratulations, all is working just fine !!!")
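
The exercise asks for areas in square meters, which only makes sense when the geometry coordinates are in a projected, meter-based coordinate system. The polygon coordinates in the sample rows above look like projected values, but that is an assumption to confirm by inspecting `data.crs`. GeoPandas reports `GeoSeries.area` in the units of the CRS. The sketch below is not a solution to the exercise; it only illustrates, with the shoelace formula and a hypothetical rectangular field, how a planar area follows from such coordinates.

```python
def polygon_area(coords):
    """Planar area of a simple polygon given its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical 20 m x 10 m rectangular field in projected meter coordinates.
field = [
    (635000.0, 59500.0),
    (635020.0, 59500.0),
    (635020.0, 59510.0),
    (635000.0, 59510.0),
]
print(polygon_area(field))  # 200.0 square meters
```

If the coordinates were longitude and latitude in degrees instead, the same formula would return square degrees, which is why the CRS matters here.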

### Bonus questions

In [None]:
import fiona
import rasterio
import rasterio.mask
import pycrs


def masked_raster(input_file, raster_file):
    # Create a masked version of the input raster where pixels falling within one of the fields are set to `1` and pixels outside the fields are set to `0`
    
    # WRITE YOUR CODE HERE

    out_img = rasterio.open(raster_file).read()
    return out_img

def reproject_raster(raster_file, dst_crs):
    # Reproject the input raster to the provided CRS
    
    src = rasterio.open(raster_file)
    
    # WRITE YOUR CODE HERE
    dst = src
    
    return dst

In [None]:
# Run this to validate your function works correctly

assert masked_raster('crops.geojson', 'crops.tif')[0].sum() == 1144636.0, "Sorry wrong answer"
assert str(reproject_raster('crops.tif', 'EPSG:4326').crs) == 'EPSG:4326', "Sorry wrong answer"
print("Congratulations, all is working just fine !!!")
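
To make the masking task concrete, the sketch below builds a 0/1 mask for a tiny grid in pure Python. It is a conceptual illustration under simplifying assumptions, not the expected solution. Polygon vertices are given directly in pixel coordinates, so the raster's affine transform is ignored, and a pixel counts as inside when its center lies within a polygon. The helpers `point_in_polygon` and `binary_mask` are hypothetical names. In practice, rasterio's `rasterio.mask` and `rasterio.features` modules handle the coordinate transform and rasterization for you.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def binary_mask(width, height, polygons):
    """Return rows of 0/1 values; a pixel is 1 when its center is in any polygon."""
    mask = []
    for row in range(height):
        line = []
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5  # pixel center
            hit = any(point_in_polygon(cx, cy, p) for p in polygons)
            line.append(1 if hit else 0)
        mask.append(line)
    return mask


# Hypothetical field outline in pixel coordinates on a 6 x 4 grid.
square = [(1, 1), (4, 1), (4, 3), (1, 3)]
mask = binary_mask(6, 4, [square])
for line in mask:
    print(line)
print(sum(map(sum, mask)))  # number of pixels set to 1
```

Summing the mask counts the pixels that fall inside the fields, which is the quantity the first assertion in the validation cell checks.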