# Deep Learning Locust Prediction in Ethiopia using Remote Sensing Data

### Semester Project SS 2021 @ Advanced Programming for Remote Sensing
#### Julius-Maximilians-Universität Würzburg

This is a semester project for the course Advanced Programming for Remote Sensing. It aims to use a deep learning model and real-time climate variables to predict the emergence of locust swarms, using data from the Locust Hub.

***
## Description

This notebook post-processes the downloaded WorldClim data. Monthly WorldClim data (the minimum-temperature file used here, `tmin_1980_2018.csv`, covers 1980 to 2018) is first downloaded to the local drive and then uploaded to Colab. Using the spatial and temporal coordinates of the locust records, the WorldClim value for the month of each outbreak is extracted and the rest of that point's time series is discarded. The remaining climate variables therefore match the spatio-temporal position of each data point, so the training data better reflect locust behavior patterns.

**Tasks:**
* Extract WorldClim data from time series
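The extraction described above (keep only the value for the outbreak month, drop the rest of the time series) can also be viewed as a wide-to-long reshape followed by a filter. The sketch below is a hypothetical illustration of that alternative on a toy table; it assumes the same layout as the WorldClim CSV (one `X_YYYY_MM` column per month plus `year` and `month` columns), and the names `record_id` and `tmin` are made up for the example.

```python
import pandas as pd

# Hypothetical wide table: one row per locust record, one column per month
wide = pd.DataFrame({
    "X_1996_03": [13.0, 9.5],
    "X_1996_04": [13.1, 9.8],
    "year": [1996, 1996],
    "month": [4, 3],
})
wide["record_id"] = wide.index  # remember which record each value belongs to

# Wide to long: one row per (record, monthly column)
long = wide.melt(id_vars=["record_id", "year", "month"],
                 var_name="col", value_name="tmin")

# Parse "X_YYYY_MM" into the calendar year and month of each value
parts = long["col"].str.extract(r"X_(\d{4})_(\d{2})").astype(int)
long["col_year"], long["col_month"] = parts[0], parts[1]

# Keep only the value whose date matches the outbreak date of its record
matched = long[(long["col_year"] == long["year"]) &
               (long["col_month"] == long["month"])]
result = matched.set_index("record_id")["tmin"].sort_index()
```

The long format costs memory (one row per record and month), but it makes the matching logic an ordinary filter and makes it easy to keep several months per record if needed.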
***

#### Import Libraries

In [None]:
#@title Load Python libraries

! pip install pandas fiona shapely pyproj rtree
! pip install alpha_vantage -q
! pip install regions
! pip install geopandas

# pip install numpy
import numpy as np


# pip install torch
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import regions

import pandas as pd
import seaborn as sns
import datetime

#pip install matplotlib
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import matplotlib.animation as animation

# pip install alpha_vantage
from alpha_vantage.timeseries import TimeSeries 

from geopandas.tools import sjoin

print("All libraries loaded")


#### Set up environment

In [None]:
config = {
    "data": {
        "window_size": 20,
        "train_split_size": 0.80,
    }, 
    "plots": {
        "show_plots": True,
        "xticks_interval": 90,
        "color_actual": "#001f3f",
        "color_train": "#3D9970",
        "color_val": "#0074D9",
        "color_pred_train": "#3D9970",
        "color_pred_val": "#0074D9",
        "color_pred_test": "#FF4136",
    },
    "model": {
        "input_size": 1, # a single input feature
        "num_lstm_layers": 2,
        "lstm_size": 32,
        "dropout": 0.2,
    },
    "training": {
        "device": "cpu", # "cuda" or "cpu"
        "batch_size": 64,
        "num_epoch": 100,
        "learning_rate": 0.01,
        "scheduler_step_size": 40,
    }
}

#### Import Data

In [None]:
# select the WorldClim CSV (here "tmin_1980_2018.csv") from your local project drive
from google.colab import files
uploaded = files.upload()

# read the uploaded bytes into a DataFrame
import io
import pandas as pd

df_climvar = pd.read_csv(io.BytesIO(uploaded['tmin_1980_2018.csv']))

Saving tmin_1980_2018.csv to tmin_1980_2018 (1).csv


In [None]:
# Inspect the data frame
df_climvar.head()

Unnamed: 0.1,Unnamed: 0,X_1980_01,X_1980_02,X_1980_03,X_1980_04,X_1980_05,X_1980_06,X_1980_07,X_1980_08,X_1980_09,X_1980_10,X_1980_11,X_1980_12,X_1981_01,X_1981_02,X_1981_03,X_1981_04,X_1981_05,X_1981_06,X_1981_07,X_1981_08,X_1981_09,X_1981_10,X_1981_11,X_1981_12,X_1982_01,X_1982_02,X_1982_03,X_1982_04,X_1982_05,X_1982_06,X_1982_07,X_1982_08,X_1982_09,X_1982_10,X_1982_11,X_1982_12,X_1983_01,X_1983_02,X_1983_03,...,X_2015_09,X_2015_10,X_2015_11,X_2015_12,X_2016_01,X_2016_02,X_2016_03,X_2016_04,X_2016_05,X_2016_06,X_2016_07,X_2016_08,X_2016_09,X_2016_10,X_2016_11,X_2016_12,X_2017_01,X_2017_02,X_2017_03,X_2017_04,X_2017_05,X_2017_06,X_2017_07,X_2017_08,X_2017_09,X_2017_10,X_2017_11,X_2017_12,X_2018_01,X_2018_02,X_2018_03,X_2018_04,X_2018_05,X_2018_06,X_2018_07,X_2018_08,X_2018_09,X_2018_10,X_2018_11,X_2018_12
0,83,14.006891,16.32407,17.907249,19.321846,18.378069,17.419968,15.489695,15.310701,15.746993,15.755427,14.548142,13.328924,13.885016,14.435006,16.918186,17.337471,17.058798,16.535593,14.638653,14.66643,14.085013,13.455427,12.706997,12.059133,14.151162,14.837611,17.118187,17.781742,16.922861,17.509031,16.2048,15.877368,15.761055,14.537719,13.633039,12.377882,12.936579,15.110006,16.869749,...,16.203243,16.171051,15.463768,13.466945,14.377726,16.672506,19.752041,19.470284,18.46661,17.70903,16.278757,16.313305,16.378242,15.444489,14.118977,14.384653,14.902205,16.327715,17.678083,18.99268,17.348383,18.071531,16.512091,15.810181,15.491263,15.284594,13.959601,13.950799,13.066267,17.56834,18.018707,18.23695,18.107237,16.764761,16.407925,15.277369,15.006888,15.34449,13.737205,13.455486
1,87,6.813125,8.901896,9.734489,12.125438,10.666479,10.144969,10.371103,9.586135,8.649292,7.461031,6.219031,5.224833,6.664687,7.034708,8.754802,10.117625,9.458667,9.254345,9.442979,8.928322,6.949291,5.251657,4.494031,3.832646,6.86,7.526896,8.918865,10.567625,9.261792,10.202782,10.891417,10.136135,8.732104,6.317282,5.331532,4.234208,5.806875,7.675333,8.698552,...,9.132105,7.881344,7.098719,5.277958,7.105312,9.176896,11.543865,12.250437,10.802417,10.387156,11.100792,10.648636,9.232104,7.231344,5.719031,6.159208,7.592813,8.859709,9.421989,11.684813,9.685229,10.81997,11.313291,10.193948,8.432104,7.151657,5.694031,5.840458,5.838125,10.242521,9.90949,10.955125,10.446167,9.524657,11.313291,9.578322,7.869604,7.134469,5.401844,5.288896
2,192,9.30148,11.761827,12.674699,13.448668,14.135401,13.117723,12.815877,12.752622,12.54394,12.674875,11.237862,10.029117,9.173529,10.026758,12.195533,11.802315,12.894949,12.289772,11.814662,12.053837,10.899843,10.410639,9.279528,9.063665,9.365716,10.026064,11.844317,11.957696,13.216998,13.227794,13.44678,12.960955,12.280746,11.41359,10.074667,9.516617,7.787244,10.677279,11.577651,...,12.971891,12.918972,12.436646,10.879637,10.062765,12.746376,14.483032,13.909432,14.45988,13.513556,13.642612,13.556788,13.130746,12.218972,10.840813,10.734498,9.11398,12.086827,13.00265,13.413599,13.689047,13.520675,13.714662,13.352621,12.598628,12.520187,10.767028,10.645783,8.231862,12.357661,12.840151,12.709433,13.234879,12.217724,13.142613,12.525886,11.738558,11.691021,10.56998,10.357068
3,193,9.63453,11.542589,12.629224,13.983024,13.69928,13.749825,13.684882,13.122119,12.46958,10.931791,9.636326,9.201748,9.43453,9.548666,11.573494,12.012537,12.320113,13.043401,12.849638,12.442952,10.763504,8.613562,7.674172,7.795672,9.649287,10.240333,11.787905,12.495523,12.3531,13.849825,14.284882,13.666042,12.348746,9.554881,8.921915,8.641158,8.619773,10.163423,11.550057,...,12.848746,11.098805,10.538582,9.666505,10.069773,11.783909,14.329224,14.091357,13.605356,14.064235,14.393214,13.966043,12.93399,10.390125,9.236325,9.906262,10.115953,11.504742,12.414814,13.683023,12.670113,14.335068,14.593215,13.52212,12.21958,10.740125,8.752992,9.545324,8.37203,11.871757,12.325404,13.018614,12.92237,12.853297,14.160229,12.763438,11.495275,10.157139,8.851082,9.345672
4,249,13.079427,15.544892,16.597277,17.810417,16.842098,16.452288,15.188854,14.689982,14.875163,14.772107,13.426065,12.551795,12.907552,13.549059,15.513424,15.822917,15.445744,15.81635,14.388854,13.990503,13.17933,12.555961,11.529711,11.192421,13.111197,14.099058,15.785298,16.226562,15.354597,16.552288,15.808645,15.286337,14.782454,13.497107,12.613045,11.747628,12.078906,14.186037,15.488945,...,15.27933,15.088773,14.4344,12.955441,13.491406,15.761559,18.313946,18.006771,16.758764,16.836143,15.988853,15.605608,15.39235,14.376273,13.038045,13.564295,13.799218,15.461038,16.313423,17.510416,15.742618,17.043955,16.192501,15.102483,14.590788,14.631482,12.739087,13.07992,12.008073,16.034996,16.502487,16.785938,16.230639,15.549162,15.945104,14.410815,14.043392,14.240336,12.638045,12.748149


The CSV file contains one column per month for every year, plus two columns, `year` and `month`, that give the date of each locust record. For every row, we need to extract the value from the column whose date matches these two columns, so that the climate/weather variables correspond to the actual time of the recorded locust outbreak.
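The core idea can be shown on a toy table before applying it to the full dataset. This is a minimal sketch with made-up values: it builds the target column name `X_YYYY_MM` from each row's `year` and `month` (zero-padding the month) and reads the value from that column for that row.

```python
import pandas as pd

# Hypothetical toy data in the same wide layout as the WorldClim CSV:
# one "X_YYYY_MM" column per month plus the outbreak "year" and "month"
df = pd.DataFrame({
    "X_1996_03": [13.0, 9.5],
    "X_1996_04": [13.1, 9.8],
    "year": [1996, 1996],
    "month": [4, 3],
})

# Zero-pad the month so that 4 becomes "04", matching the column names
target_cols = [f"X_{int(y)}_{int(m):02d}" for y, m in zip(df["year"], df["month"])]

# For each row, read the value stored in its own target column
df["value_at_outbreak"] = [df.at[i, c] for i, c in zip(df.index, target_cols)]
```

Here the first record (April 1996) yields 13.1 and the second (March 1996) yields 9.5.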

In [None]:
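# Optional: generate random placeholder outbreak dates for testing
# (only needed if the CSV does not yet contain "year" and "month" columns)

#month = np.arange(1, 13)
#year = np.arange(1985, 2019)
#df_climvar['month'] = np.random.choice(month, size=len(df_climvar))
#df_climvar['year'] = np.random.choice(year, size=len(df_climvar))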

In [None]:
# Names of the columns

names = list(df_climvar.columns)
names

Index(['Unnamed: 0', 'X_1980_01', 'X_1980_02', 'X_1980_03', 'X_1980_04',
       'X_1980_05', 'X_1980_06', 'X_1980_07', 'X_1980_08', 'X_1980_09',
       ...
       'X_2018_05', 'X_2018_06', 'X_2018_07', 'X_2018_08', 'X_2018_09',
       'X_2018_10', 'X_2018_11', 'X_2018_12', 'month', 'year'],
      dtype='object', length=404)

In [None]:
# Extract, row by row, the climate value for the date given in the "year" and "month" columns.
# The positions of these two columns can be passed as year_ind and month_ind;
# otherwise they are looked up by name in `names`.
# The shift argument is the offset in months relative to the outbreak month,
# e.g. shift=-1 returns the value for the month before the outbreak.

## Note: data for 1985 to 1990 is missing, which produces NaN values in the extracted column

def extract_time(row, names, shift=0, year_ind=None, month_ind=None):
    """
    Extract the WorldClim value at the month of the outbreak from a row's time series.

    Parameters
    ----------
    row: pandas.Series
        A single row of the DataFrame; intended to be used with DataFrame.apply(axis=1)
    names: list
        A list of the column names of the DataFrame
    shift: int
        Offset in months relative to the outbreak month;
        positive values look forward, negative values look back; default 0
    year_ind: int, optional
        Position of the "year" column (looked up by name if omitted)
    month_ind: int, optional
        Position of the "month" column (looked up by name if omitted)

    Returns
    -------
    value: float
        The climate value for the (shifted) month and year, or NaN if it cannot be found
    """
    # Locate the year/month columns by name unless their positions were given
    col_year = names.index("year") if year_ind is None else year_ind
    col_month = names.index("month") if month_ind is None else month_ind

    # Read the outbreak date; missing or non-numeric entries give NaN
    try:
        occur_year = int(row.iloc[col_year])
        occur_month = int(row.iloc[col_month])
    except (TypeError, ValueError):
        return np.nan

    # Apply the month shift with calendar arithmetic so that, e.g.,
    # January shifted by -1 becomes December of the previous year
    year, month0 = divmod(occur_year * 12 + (occur_month - 1) + shift, 12)
    search_str = f"X_{year}_{month0 + 1:02d}"  # e.g. "X_1996_04"

    # Return NaN if the requested month lies outside the time series
    if search_str not in names:
        return np.nan
    return row.iloc[names.index(search_str)]



## Apply Function: Extracting Information from WorldClim Time Series

Using the function above, we extract the WorldClim value from each time series at the year and month of the locust outbreak. This lets us train the deep learning model on climate variables matched to the time of each outbreak rather than on long-term averages, which is important for studying the behavior of locust swarms under in-situ conditions.
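`DataFrame.apply(axis=1)` calls a Python function once per row, which can become slow for many locust records. The following is a minimal sketch of a vectorized alternative, assuming numeric `year`/`month` columns and climate columns named `X_YYYY_MM`; `extract_vectorized` is a hypothetical helper, not part of the original notebook. It shifts dates with calendar arithmetic and returns NaN when the date is missing or the requested month is not in the table.

```python
import numpy as np
import pandas as pd

def extract_vectorized(df, shift=0):
    """Return one climate value per row for the (shifted) outbreak month, NaN if unavailable."""
    years = pd.to_numeric(df["year"], errors="coerce").to_numpy(dtype=float)
    months = pd.to_numeric(df["month"], errors="coerce").to_numpy(dtype=float)
    valid = ~(np.isnan(years) | np.isnan(months))

    # Running month count, shifted, then split back into year and 0-based month
    total = years[valid].astype(int) * 12 + (months[valid].astype(int) - 1) + shift
    target_year, target_month0 = np.divmod(total, 12)
    target_cols = [f"X_{int(y)}_{int(m) + 1:02d}" for y, m in zip(target_year, target_month0)]

    # Column positions; get_indexer returns -1 for names not in the table
    col_pos = df.columns.get_indexer(target_cols)
    found = col_pos >= 0

    out = np.full(len(df), np.nan)
    rows = np.flatnonzero(valid)[found]
    values = df.to_numpy()
    out[rows] = values[rows, col_pos[found]].astype(float)
    return out

# Hypothetical usage on a toy table; the second record has no year
demo = pd.DataFrame({
    "X_1995_12": [10.0, 11.0],
    "X_1996_01": [12.0, 13.0],
    "year": [1996, np.nan],
    "month": [1, 1],
})
at_outbreak = extract_vectorized(demo)            # value for January 1996
month_before = extract_vectorized(demo, shift=-1)  # wraps to December 1995
```

For the first record, `at_outbreak` holds 12.0 and `month_before` holds 10.0; the second record gives NaN in both because its year is missing.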

In [None]:
# new column for min temp extracted during the outbreak
df_climvar["min_temp_ob"] = df_climvar.apply(lambda x: extract_time(x,names), axis=1)

In [None]:
# new column for min temp extracted one month before the outbreak
df_climvar["min_temp_ob_last_month"] = df_climvar.apply(lambda x: extract_time(x,names,shift=-1), axis=1)

In [None]:
new_df = df_climvar[["year","month","min_temp_ob","min_temp_ob_last_month"]]
new_df.head()

Unnamed: 0,year,month,min_temp_ob,min_temp_ob_last_month
0,1996,4,13.050799,13.011164
1,1987,2,,
2,2000,11,11.871892,12.688906
3,1995,9,13.747034,12.664234
4,2007,9,13.838045,13.376273


## File Export

In [None]:
new_df.to_csv('climvar.csv')

In [None]:
from google.colab import files
files.download("climvar.csv")