# Deep Learning Locust Prediction in Ethiopia using Remote Sensing Data

### Semester Project SS 2021 @ Advanced Programming for Remote Sensing
#### Julius-Maximilians-Universität Würzburg

This is a semester project for the course Advanced Programming for Remote Sensing. It aims to use a deep learning model and real-time climate variables to predict the emergence of locust swarms, using data from Locust Hub.

***
## Description

From Google Earth Engine (GEE), multiple CSV files are created for all locust presence and absence data points. They include 6 spectral bands from **Landsat 5, 7 and 8**, as well as 19 selected variables from the **NASA FLDAS** (Famine Early Warning Systems Network Land Data Assimilation System) dataset, a data assimilation product designed to support food security assessment by combining remote sensing data such as CHIRPS and MODIS. All time series cover the period from 1985 to 2015. The variables include, for example, total precipitation, soil moisture and soil temperature. To create training data, we need to remove the irrelevant time steps from every row according to the occurrence date of the data point. This post-processing is done in Python.

**Tasks:**
* Extract information from Landsat and FLDAS time series
***

#### Import Libraries

```python
#@title Load Python libraries

! pip install pandas fiona shapely pyproj rtree
! pip install alpha_vantage -q
! pip install regions
! pip install geopandas

# pip install numpy
import numpy as np

# pip install torch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import regions

import pandas as pd
import seaborn as sns
import datetime

# pip install matplotlib
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import matplotlib.animation as animation

# pip install alpha_vantage
from alpha_vantage.timeseries import TimeSeries

from geopandas.tools import sjoin

print("All libraries loaded")
```


#### Set up environment

```python
config = {
    #"alpha_vantage": {
    #    "key": "YOUR_API_KEY", # Claim your free API key here: https://www.alphavantage.co/support/#api-key
    #    "symbol": "IBM",
    #    "outputsize": "full",
    #    "key_adjusted_close": "5. adjusted close",
    #},
    "data": {
        "window_size": 20,
        "train_split_size": 0.80,
    },
    "plots": {
        "show_plots": True,
        "xticks_interval": 90,
        "color_actual": "#001f3f",
        "color_train": "#3D9970",
        "color_val": "#0074D9",
        "color_pred_train": "#3D9970",
        "color_pred_val": "#0074D9",
        "color_pred_test": "#FF4136",
    },
    "model": {
        "input_size": 1, # number of input features per time step
        "num_lstm_layers": 2,
        "lstm_size": 32,
        "dropout": 0.2,
    },
    "training": {
        "device": "cpu", # "cuda" or "cpu"
        "batch_size": 64,
        "num_epoch": 100,
        "learning_rate": 0.01,
        "scheduler_step_size": 40,
    }
}
```

## GEE output Postprocessing

#### Import Data

```python
# Upload "Landsat_06_07.csv" from your local machine (Colab file picker)
import io

import pandas as pd
from google.colab import files

uploaded = files.upload()

# Read the uploaded bytes into a dataframe
df_fldas = pd.read_csv(io.BytesIO(uploaded['Landsat_06_07.csv']))
```



```python
df_fldas.tail()
```

| | system:index | Blue | Green | Month | NIR | PRESENCE | Red | SWIR1 | SWIR2 | X | Y | Year | date | id | month | timestamp | year | .geo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41998 | 23_000000000000000004cb | 628.628968 | 751.815476 | 1 | 1403.797619 | 0 | 874.017857 | 2274.583333 | 1801.555556 | 6.176316 | 35.410463 | 2007 | 2007-12-01 | 26427 | 12 | 1200000000000.0 | 2007 | {"geodesic":false,"type":"Polygon","coordinate... |
| 41999 | 23_000000000000000004c6 | 2068.177866 | 3514.956522 | 1 | 5894.039526 | 0 | 5264.27668 | 7170.209486 | 6913.166008 | -1.33695 | 27.220089 | 2007 | 2007-12-01 | 26422 | 12 | 1200000000000.0 | 2007 | {"geodesic":false,"type":"Polygon","coordinate... |
| 42000 | 23_000000000000000004d2 | 2068.177866 | 3514.956522 | 1 | 5894.039526 | 0 | 5264.27668 | 7170.209486 | 6913.166008 | -1.33695 | 27.220089 | 2007 | 2007-12-01 | 26434 | 12 | 1200000000000.0 | 2007 | {"geodesic":false,"type":"Polygon","coordinate... |
| 42001 | 23_000000000000000004c2 | 2731.777778 | 4059.630952 | 1 | 6106.051587 | 0 | 5375.007937 | 7344.055556 | 6831.269841 | 13.500628 | 15.888393 | 2007 | 2007-12-01 | 26418 | 12 | 1200000000000.0 | 2007 | {"geodesic":false,"type":"Polygon","coordinate... |
| 42002 | 23_000000000000000004f2 | 819.092 | 886.64 | 1 | 1933.664 | 0 | 895.776 | 1672.004 | 992.696 | 22.822672 | 6.554854 | 2007 | 2007-12-01 | 26466 | 12 | 1200000000000.0 | 2007 | {"geodesic":false,"type":"Polygon","coordinate... |


In the CSV file, the time series provides values for every month of every year, stored one month per row, together with two pairs of year and month columns. For every row, we need to keep only the values whose timestamp matches the occurrence date of the locust record, in order to get the climate/weather variables at the time of the recorded locust outbreak.

The two pairs are "year"/"month" and "Year"/"Month". "year" and "month" give the timestamp of the observed variables, whereas "Year" and "Month" give the occurrence date of the locust outbreak at this data point. To extract the "real-time" information, we keep only the rows where "year" matches "Year" and "month" matches "Month".
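The matching rule can be sketched with a single boolean mask in pandas. This is a minimal sketch on a small hypothetical dataframe (the values are made up for illustration and are not taken from the GEE export); it keeps only rows where the observation date equals the occurrence date, without the intermediate `match` column used in the cells below.

```python
import pandas as pd

# Hypothetical toy data: each data point (id) is repeated once per observed month,
# and its locust occurrence date is stored in "Year"/"Month"
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "NIR":   [1400.0, 1500.0, 1600.0, 2000.0, 2100.0],
    "year":  [2006, 2006, 2006, 2006, 2006],  # observation year
    "month": [1, 2, 3, 1, 2],                 # observation month
    "Year":  [2006, 2006, 2006, 2006, 2006],  # occurrence year
    "Month": [2, 2, 2, 1, 1],                 # occurrence month
})

# Keep only rows whose observation date equals the occurrence date
mask = (df["year"] == df["Year"]) & (df["month"] == df["Month"])
matched = df.loc[mask].drop(columns=["year", "month"])
print(matched)
```

Note the parentheses around each comparison: `&` binds more tightly than `==` in Python, so they are required.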

```python
# remove unneeded columns
df_fldas = df_fldas.drop(columns=['system:index', 'timestamp', '.geo'])
```

```python
# set all year and month columns to integer
df_fldas['year'] = df_fldas['year'].astype('int64')
df_fldas['month'] = df_fldas['month'].astype('int64')
df_fldas['Year'] = df_fldas['Year'].astype('int64')
df_fldas['Month'] = df_fldas['Month'].astype('int64')
```

```python
df_fldas.head()
```

| | Blue | Green | Month | NIR | PRESENCE | Red | SWIR1 | SWIR2 | X | Y | Year | date | id | month | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 736.479839 | 1045.46371 | 2 | 1993.467742 | 0 | 1440.008065 | 2577.879032 | 1983.487903 | 34.63865 | 2.970945 | 2006 | 2006-01-01 | 25357 | 1 | 2006 |
| 1 | 736.479839 | 1045.46371 | 2 | 1993.467742 | 0 | 1440.008065 | 2577.879032 | 1983.487903 | 34.63865 | 2.970945 | 2006 | 2006-01-01 | 25368 | 1 | 2006 |
| 2 | 632.75502 | 820.088353 | 2 | 1667.662651 | 0 | 1034.0 | 1969.73494 | 1423.116466 | 37.770269 | 5.198174 | 2006 | 2006-01-01 | 25388 | 1 | 2006 |
| 3 | 737.110672 | 1070.318182 | 2 | 2611.521739 | 0 | 1593.266798 | 3505.636364 | 2236.262846 | 22.762503 | 10.875932 | 2006 | 2006-01-01 | 25336 | 1 | 2006 |
| 4 | 737.110672 | 1070.318182 | 2 | 2611.521739 | 0 | 1593.266798 | 3505.636364 | 2236.262846 | 22.762503 | 10.875932 | 2006 | 2006-01-01 | 25385 | 1 | 2006 |


```python
# set up a condition indicating whether the observation date matches the locust occurrence date
condition = np.logical_and(df_fldas.Year == df_fldas.year, df_fldas.Month == df_fldas.month)
df_fldas = df_fldas.assign(match = np.where(condition, 1, 0))
```

```python
df_fldas.match.describe()
```

```
count    42003.000000
mean         0.043092
std          0.203067
min          0.000000
25%          0.000000
50%          0.000000
75%          0.000000
max          1.000000
Name: match, dtype: float64
```
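Because `match` only takes the values 0 and 1, its mean is the fraction of matching rows: 0.043092 × 42,003 ≈ 1,810 of the 42,003 rows (about 4.3 %) survive the filter. The count itself can be read off directly with `sum()` instead of being inferred from the mean; a minimal sketch on a hypothetical toy column:

```python
import pandas as pd

# Hypothetical binary indicator column (1 = observation date matches occurrence date)
match = pd.Series([0, 0, 1, 0, 1, 0, 0, 0])

fraction = match.mean()       # share of matching rows
n_matched = int(match.sum())  # number of matching rows
print(fraction, n_matched)    # 0.25 2
```

On the real dataframe, `df_fldas.match.sum()` gives the number of rows kept by the next cell.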

```python
# Keep only rows that match
sub_df = df_fldas[df_fldas.match == 1]
```

```python
# Delete unneeded columns, including the helper column "match"
sub_df = sub_df.drop(columns=['year', 'month', 'date', 'match'])
```

```python
sub_df.head()
```

| | Blue | Green | Month | NIR | PRESENCE | Red | SWIR1 | SWIR2 | X | Y | Year | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 712 | 1072.092 | 1707.66 | 1 | 3137.088 | 0 | 2496.456 | 3671.456 | 2863.392 | 36.557216 | 16.060265 | 2006 | 25204 |
| 713 | 385.428571 | 507.928571 | 1 | 1444.559524 | 0 | 560.698413 | 1262.321429 | 780.242063 | 40.222654 | 7.432943 | 2006 | 25236 |
| 714 | 385.428571 | 507.928571 | 1 | 1444.559524 | 0 | 560.698413 | 1262.321429 | 780.242063 | 40.222654 | 7.432943 | 2006 | 25282 |
| 715 | 385.428571 | 507.928571 | 1 | 1444.559524 | 0 | 560.698413 | 1262.321429 | 780.242063 | 40.222654 | 7.432943 | 2006 | 25288 |
| 716 | 598.150235 | 695.169014 | 1 | 1519.568075 | 0 | 745.915493 | 1768.225352 | 1180.455399 | 17.420898 | 6.874051 | 2006 | 25201 |
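Assuming each `id` is one data point with one occurrence date (an assumption, not verified in this notebook), every `id` should appear at most once after filtering, and an `id` with no remaining row had an occurrence date outside the exported time range. A minimal sanity check, sketched on hypothetical toy frames standing in for `df_fldas` (before) and `sub_df` (after):

```python
import pandas as pd

# Hypothetical stand-ins for df_fldas (before filtering) and sub_df (after filtering)
df_before = pd.DataFrame({"id": [1, 1, 2, 2, 3, 3]})
df_after = pd.DataFrame({"id": [1, 2]})

# ids that matched more than one month (expected: none)
n_duplicated = int(df_after["id"].duplicated().sum())

# ids whose occurrence date had no matching month in the time series
missing_ids = sorted(set(df_before["id"].tolist()) - set(df_after["id"].tolist()))

print(n_duplicated, missing_ids)  # 0 [3]
```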


## Data Export

```python
from google.colab import drive
drive.mount('drive')
```



```python
sub_df.to_csv('Landsat_06_07_extracted.csv')
!cp Landsat_06_07_extracted.csv "drive/My Drive/"
```
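By default, `DataFrame.to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again with `pd.read_csv`. If the original row numbers are not needed downstream (an assumption), passing `index=False` avoids this, and the file can be written straight into the mounted Drive folder instead of being copied with `cp`. A minimal sketch, using a temporary directory as a stand-in for `drive/My Drive/` and a hypothetical toy `sub_df`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy version of the extracted dataframe
sub_df = pd.DataFrame({"id": [25204, 25236], "Year": [2006, 2006], "PRESENCE": [0, 0]})

# Stand-in for the mounted Drive folder (e.g. "drive/My Drive/")
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "Landsat_06_07_extracted.csv")

# index=False: do not write the row index as an extra unnamed column
sub_df.to_csv(out_path, index=False)

reloaded = pd.read_csv(out_path)
print(list(reloaded.columns))  # ['id', 'Year', 'PRESENCE']
```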