# Introduction


Crop yield prediction is an essential predictive analytics technique in the agriculture industry. It helps farmers and farming businesses estimate the yield of a crop in a particular season and decide when to plant and when to harvest for a better yield. More broadly, predictive analytics is a powerful tool for improving decision-making in agriculture: it can be used for crop yield prediction, risk mitigation, reducing the cost of fertilizers, and more. In this project, we build a crop yield prediction model with machine learning, analyzing factors such as weather conditions, bee density, fruit set, and fruit mass, and then deploy it with Flask.

# Learning Objectives


- We will briefly go through an end-to-end project that predicts crop yield using data from pollination simulation modeling.
- We will follow each step of the data science project lifecycle, including data exploration, pre-processing, modeling, evaluation, and deployment.
- Finally, we will deploy the model as a Flask API on a cloud service platform called Render.

So let’s get started with this exciting real-world problem statement.

# Table of contents

1. Introduction
2. Project Description of Crop Yield Prediction
3. What is the Pollination Simulation Model?
4. Problem Statement
5. Pre-requisites
6. Data Description
7. Loading Dataset
8. Exploratory Data Analysis
9. Data Pre-processing and Data Preparation
10. Modeling and Evaluation
11. Deployment of the Model Using FlaskAPI
12. Conclusion
13. Frequently Asked Questions

# Project Description of Crop Yield Prediction


The dataset used for this project was generated using a spatially explicit simulation model to study how the following factors, in isolation and in combination, affect pollination efficiency and yield of wild blueberry in the agricultural ecosystem:

- Plant spatial arrangement
- Outcrossing and self-pollination
- Bee species compositions
- Weather conditions

The simulation model has been validated against field observations and experimental data collected in Maine, USA, and the Canadian Maritimes over the last 30 years, and it is now a useful tool for hypothesis testing and for estimating wild blueberry yield. The simulated data is useful to researchers who have field data of their own and want to compare it with simulation output in crop yield experiments, and it gives developers and data scientists data for building real-world machine learning models for crop yield prediction.

# What is the Pollination Simulation Model?

Pollination simulation modeling uses computer models to simulate the process of pollination. It has various use cases, such as:

- Studying the effects of different factors on pollination, such as climate change, habitat loss, and pesticides
- Designing pollination-friendly landscapes
- Predicting the impact of pollination on crop yields

Pollination simulation models can be used to study the movement of pollen grains between flowers, the timing of pollination events, and the effectiveness of different pollination strategies. This information can be used to improve pollination rates and crop yields, which in turn helps farmers produce crops efficiently with optimal yield.

Pollination simulation models are still under development, but they have the potential to play an important role in the future of agriculture. By understanding how pollination works, we can better protect and manage this essential process.

In our project, we will use a dataset with various features like ‘clonesize’, ‘honeybee’, ‘RainingDays’, ‘AverageRainingDays’, etc., which were created using a pollination simulation process to estimate crop yield.

# Problem Statement

In this project, our task is to predict the yield variable (the target) from the other 16 features, working through the project step by step. Because the target is continuous, this is a regression problem, and the evaluation metric is root mean squared error (RMSE). We will deploy the model using Python’s Flask framework on a cloud-based platform.
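
To make the metric concrete: RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the yield. The following is a minimal sketch with NumPy; the function name `rmse` is our own and the numbers are made up for illustration, not taken from the dataset.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only, not taken from the blueberry dataset.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
print(rmse(actual, predicted))
```

A lower RMSE means the predictions sit closer to the actual yields, and large errors are penalized more heavily than small ones because the differences are squared before averaging.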

# Pre-requisites

This project is well-suited for intermediate learners of data science and machine learning who want to build portfolio projects. Beginners in the field can take up this project if they are familiar with the skills below:

- Knowledge of the Python programming language and of machine learning algorithms using the scikit-learn library
- Basic understanding of website development using Python’s Flask framework
- Understanding of regression evaluation metrics

# Data Description

In this section, we will look at each variable in the dataset for our project.

| Feature | Unit | Description |
|---|---|---|
| Clonesize | m² | The average blueberry clone size in the field |
| Honeybee | bees/m²/min | Honeybee density in the field |
| Bumbles | bees/m²/min | Bumblebee density in the field |
| Andrena | bees/m²/min | Andrena bee density in the field |
| Osmia | bees/m²/min | Osmia bee density in the field |
| MaxOfUpperTRange | ℃ | The highest record of the upper band daily air temperature during the bloom season |
| MinOfUpperTRange | ℃ | The lowest record of the upper band daily air temperature |
| AverageOfUpperTRange | ℃ | The average of the upper band daily air temperature |
| MaxOfLowerTRange | ℃ | The highest record of the lower band daily air temperature |
| MinOfLowerTRange | ℃ | The lowest record of the lower band daily air temperature |
| AverageOfLowerTRange | ℃ | The average of the lower band daily air temperature |
| RainingDays | Day | The total number of days during the bloom season with precipitation greater than zero |
| AverageRainingDays | Day | The average of rainy days in the entire bloom season |
| Fruitset | | Transitioning time of fruit set |
| Fruitmass | | Mass of the fruit set |
| Seeds | | Number of seeds in fruitset |
| Yield | | Crop yield (the target variable) |
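
One way to keep these variables organized in code is to group the column names and separate the predictors from the target. The sketch below is an assumption-laden illustration: the exact column spellings (for example lowercase `clonesize`, `honeybee`, and `yield`) are assumed and should be checked against `df.columns` in your copy of the data, the helper `split_features_target` is hypothetical, and the tiny DataFrame holds placeholder zeros rather than real observations.

```python
import pandas as pd

# Column groups; the exact spellings are assumed and should be
# checked against df.columns in the actual dataset.
BEE_COLS = ["honeybee", "bumbles", "andrena", "osmia"]
TEMP_COLS = [
    "MaxOfUpperTRange", "MinOfUpperTRange", "AverageOfUpperTRange",
    "MaxOfLowerTRange", "MinOfLowerTRange", "AverageOfLowerTRange",
]
RAIN_COLS = ["RainingDays", "AverageRainingDays"]
FRUIT_COLS = ["fruitset", "fruitmass", "seeds"]
FEATURE_COLS = ["clonesize"] + BEE_COLS + TEMP_COLS + RAIN_COLS + FRUIT_COLS
TARGET_COL = "yield"


def split_features_target(df):
    """Return (X, y) using the column groups defined above."""
    missing = [c for c in FEATURE_COLS + [TARGET_COL] if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[FEATURE_COLS], df[TARGET_COL]


# Placeholder frame with two rows of zeros, only to show the shapes involved.
demo = pd.DataFrame(0.0, index=[0, 1], columns=FEATURE_COLS + [TARGET_COL])
X, y = split_features_target(demo)
print(X.shape, y.shape)
```

Raising an error on missing columns early makes a spelling mismatch visible before any modeling code runs.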

# What is the value of this data for crop prediction use-case?

- This dataset provides practical information on wild blueberry plant spatial traits, bee species, and weather conditions. It therefore enables researchers and developers to build machine learning models for early prediction of blueberry yield.
- This dataset can be valuable for researchers who have field observation data and want to test and evaluate different machine learning algorithms by comparing real data against simulation-generated data as input for crop yield prediction.
- Educators at different levels can use the dataset to teach machine learning classification or regression problems in the agricultural industry.

# Loading Dataset

In this section, we will load the dataset in whichever environment you are working in. You can either use the dataset directly in the Kaggle environment or download it to your local machine and work in a local environment.

Dataset source: Click Here

Let’s look at the code to import the libraries and load the dataset for the project.

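```python
import numpy as np  # linear algebra
import pandas as pd  # data loading and manipulation
import matplotlib.pyplot as plt  # plotting
import seaborn as sns  # statistical visualization

# The file path is left empty here; supply the location of the dataset CSV.
# 'Row#' is used as the index column.
df = pd.read_csv("", index_col='Row#')
```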
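
After loading, a few quick checks can confirm that the file was read as expected. The sketch below wraps the same `read_csv` call in a hypothetical helper, `load_blueberry_data`, and runs it on a two-row in-memory CSV so that it works without the real file. The column names `clonesize` and `yield` and the values in the sample text are placeholders, not data from the actual dataset.

```python
import io

import pandas as pd


def load_blueberry_data(path_or_buffer):
    """Read the CSV with 'Row#' as the index and print basic facts about it."""
    df = pd.read_csv(path_or_buffer, index_col="Row#")
    print("shape:", df.shape)
    print("missing values per column:")
    print(df.isna().sum())
    return df


# Tiny placeholder CSV so the helper can be run without the real file.
csv_text = "Row#,clonesize,yield\n0,37.5,4000.0\n1,25.0,5000.0\n"
demo_df = load_blueberry_data(io.StringIO(csv_text))
```

With the real dataset, you would pass the path to the downloaded CSV instead of the `io.StringIO` object.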