# Crop Yield Forecasting with Landsat Imagery

## A general spatial approach to predicting crop yield for broadacre cropping with cloud processing of remote sensing imagery

#### Student: Mike Petrut

## Background

Dryland winter cropping refers to the cultivation of crops such as wheat, barley, canola, lupins and pulses, which are not irrigated and depend on rainfall from late autumn through winter. This project examines what predictive value remote sensing data can add to crop yield forecasting, using more than 30 years of imagery to track the relationship between vegetation growth and crop performance from 1989 to 2020. There is a growing body of academic research correlating these variables at the paddock, local and small regional level (such as this article: *Predicting Wheat Yield at the Field Scale by Combining High-Resolution Sentinel-2 Satellite Imagery and Crop Modelling*), but not a lot that attempts to predict yield for large areas of land cover, such as the larger regional or state level.

With open source programming and cloud computing technologies becoming more accessible and powerful, this model tests a regression analysis approach that uses cloud processing of remote sensing data to predict crop yield at the broad geographical level.

To access the data in this blog, and for instructions on how to set up the environment, clone the repository and run the model, visit [the project GitHub page](https://github.com/mike-petrut/dryland-crop-performance-modelling-project).

### Area of Interest




#### South Australian Government Land Use Shapefile for Cereals Cropping
![](plot00_sa_folium.jpg)

## Data Collection & Processing

### Google Earth Engine

The remote sensing data used in this model is sourced from Google Earth Engine (GEE). I chose its Python API for its speed and its ability to process a large volume of imagery in the cloud over the 30-year study period. The code creates two collections: Landsat 5 data from 1989 to 1999 and Landsat 7 data from 1999 to 2020.

In the repo, the custom functions and workflow can be found in `earth_engine.py`.
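As a rough sketch of the two-collection setup described above (not the code in `earth_engine.py`), the snippet below plans one August to October window per year for each sensor and shows how such a window could be passed to the Earth Engine Python API. The Collection 2 Level-2 dataset IDs, the helper names `season_windows` and `build_collection`, and the choice to filter to the growing-season months at this stage are all assumptions made for illustration.

```python
# Assumed dataset IDs (Landsat Collection 2, Level-2); the repo may use others.
SENSOR_PERIODS = {
    "LANDSAT/LT05/C02/T1_L2": (1989, 1999),  # Landsat 5
    "LANDSAT/LE07/C02/T1_L2": (1999, 2020),  # Landsat 7
}


def season_windows(first_year, last_year, start_month=8, end_month=10):
    """Return one (start, end) ISO date pair per year; end is exclusive."""
    windows = []
    for year in range(first_year, last_year + 1):
        start = f"{year}-{start_month:02d}-01"
        # First day of the month after end_month (end_month < 12 assumed).
        end = f"{year}-{end_month + 1:02d}-01"
        windows.append((start, end))
    return windows


def build_collection(collection_id, start, end, region):
    """Hypothetical helper: filter one Landsat collection by date and area.

    Needs the earthengine-api package and an authenticated, initialised
    session (ee.Initialize()); region is expected to be an ee.Geometry.
    """
    import ee  # imported lazily so the planning code runs without GEE access

    return (
        ee.ImageCollection(collection_id)
        .filterDate(start, end)
        .filterBounds(region)
    )


# Plan the yearly windows per sensor without contacting Earth Engine.
plan = {cid: season_windows(*years) for cid, years in SENSOR_PERIODS.items()}
print(plan["LANDSAT/LT05/C02/T1_L2"][0])
```

Keeping the date planning separate from the Earth Engine call means the windows can be checked locally before any cloud processing is requested.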

### ABARES 

The Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES) publishes annual production and area planted data for all states and crops. I use this data as the historical actuals to calculate yield (production divided by hectares planted).

In the repo, the workflow for downloading, wrangling and formatting the ABARES data can be found in `model_data_setup.py`.

Once the feature extraction and formatting of the raw Excel data is complete, we can visualize the historical data using Python plotting libraries.
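The yield calculation itself is a simple ratio. The sketch below is not taken from `model_data_setup.py`; the function name, the field names and the units (kilotonnes and thousands of hectares) are assumptions, and the numbers are placeholders rather than ABARES figures.

```python
def yield_t_per_ha(production_kt, area_kha):
    """Yield in t/ha; kilotonnes divided by thousands of hectares gives t/ha."""
    if area_kha <= 0:
        raise ValueError("area planted must be positive")
    return production_kt / area_kha


# Placeholder values for illustration only, not ABARES figures.
records = [
    {"season": "example A", "production_kt": 3000.0, "area_kha": 2000.0},
    {"season": "example B", "production_kt": 4500.0, "area_kha": 2500.0},
]

for row in records:
    row["yield_t_ha"] = yield_t_per_ha(row["production_kt"], row["area_kha"])
    print(row["season"], round(row["yield_t_ha"], 2))
```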


#### South Australian Cereal Cropping Production, Area Planted and Average Yield, 1989 - 2020
![](plot01_sa_historical_1.JPG)

![](plot01_sa_historical_2.JPG)

## Methodology

This model tests the relationship between EVI and NDVI values over the key months of August, September and October and the final harvest crop yield. It is a linear regression model that uses the vegetation index as the exogenous variable (X) to predict crop yield (Y).
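For readers unfamiliar with the two indices, the sketch below applies their commonly used definitions to surface reflectance values scaled between 0 and 1. The function names and the sample reflectances are illustrative assumptions; how the repository derives the indices from the Landsat bands is not shown here.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Illustrative reflectances for a green crop canopy (not measured values).
nir, red, blue = 0.5, 0.1, 0.05
print(round(ndvi(nir, red), 3), round(evi(nir, red, blue), 3))
```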

The model is expressed as a standard linear regression equation:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
  <mi>y</mi>
  <mo>=</mo>
  <mi>a</mi>
  <mo>+</mo>
  <mi>b</mi>
  <mi>x</mi>
  <mo>+</mo>
  <mtext>error</mtext>
</math>

where: <br>
y is crop yield, <br>
x is the vegetation index, <br>
a is the intercept, and <br>
b is the slope
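To make the equation concrete, here is a minimal ordinary least squares sketch in plain Python that estimates a, b and R² from paired observations. It is not the repository's implementation (which may well use a statistics library), the function name `fit_line` is hypothetical, and the input values are made up so that the fit is exact.

```python
def fit_line(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R squared is the share of the variance in y explained by the line.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up points lying exactly on y = 1 + 2x, for illustration only.
a, b, r2 = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b, r2)
```

In the setting of this post, `x` would hold one vegetation index value per year and `y` the matching yield.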

#### EVI vs. Cereal Cropping Yield in South Australia, 1989 - 2020

![](plot03_evi_vs_ield.jpg)

### Findings

The regression shows that EVI correlates with yield at an R² of 0.434 and a p-value of 0.04E-05, so we reject the null hypothesis and accept the alternative hypothesis that there is a relationship between EVI and cereal yield.

#### Regression Plot of EVI vs Cereal Crop Yield 1989-2020 for South Australia
![](reg_plot.jpg)

The correlation over the 30 years for SA is, however, not strong enough for a robust predictive model. The key takeaways are:

*	There is a relationship between NDVI/EVI and crop yield that can be examined further at different geographical scales to test for greater predictive value
*	There is a case to explore more modeling methodologies to test this hypothesis using more data inputs and different programming techniques
*	This modeling technique can give a high-level indication of the range yield is likely to fall in, but its predictive value is not high enough to be a basis for prescriptive actions.

## Future Work

*	Source more local and regional time-series data from government and industry groups to test the model hypothesis across multiple regions incorporating soil data, elevation and other geographic variables.
*	Experiment with random forest models to further evaluate the impact each month throughout the year has on the final harvest yield. 


