Wheat_Prediction

Problem Statement:

Given two years of winter wheat data, predict the wheat yield for several counties in the United States.

The data can be obtained from

--> 2013: https://aerialintel.blob.core.windows.net/recruiting/datasets/wheat-2013-supervised.csv

--> 2014: https://aerialintel.blob.core.windows.net/recruiting/datasets/wheat-2014-supervised.csv

The code for this analysis is written in Python using an IPython notebook.

Approach:

First, all location-based fields (county, state, latitude, longitude) are removed, and the data for both years are combined into a single training dataset.
All rows with 'NA' values are removed (they are few: 615 of 360,042 rows, about 0.17%).
I split the data into training and test sets (test split = 0.3).
I implemented four regression models for prediction:

a.) RandomForest Regression b.) Gradient Boosting Regression c.) Feed-Forward Neural Network regression d.) K Nearest Neighbour regression
The evaluation metric is mean squared error (MSE), measured on the test set.
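
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn, the column names in `LOCATION_COLS` and `TARGET` are guesses that should be checked against the CSV headers, and `fake_year` is a hypothetical stand-in for loading the two yearly files (for example with `pd.read_csv`) so that the sketch runs offline. The neural network is left out because the source does not say which library was used for it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumed column names; verify against the real CSV headers before use.
LOCATION_COLS = ["CountyName", "State", "Latitude", "Longitude"]
TARGET = "Yield"


def prepare(frames, location_cols=LOCATION_COLS, target=TARGET):
    """Stack the yearly frames, drop location fields and rows with missing values."""
    data = pd.concat(frames, ignore_index=True)
    data = data.drop(columns=location_cols, errors="ignore").dropna()
    return data.drop(columns=[target]), data[target]


def compare_models(X, y, seed=0):
    """Fit each regressor on a 70/30 split and return its test-set MSE."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    models = {
        "RandomForest": RandomForestRegressor(n_estimators=50, random_state=seed),
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
        "KNN": KNeighborsRegressor(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_squared_error(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for one year of data (hypothetical helper).
rng = np.random.default_rng(0)


def fake_year(n):
    df = pd.DataFrame({
        "CountyName": rng.choice(["A", "B"], n),
        "State": "KS",
        "Latitude": rng.uniform(35, 40, n),
        "Longitude": rng.uniform(-100, -95, n),
        "NDVI": rng.uniform(0, 1, n),
        "windSpeed": rng.uniform(0, 20, n),
    })
    df["Yield"] = 30 + 20 * df["NDVI"] + rng.normal(0, 1, n)
    df.loc[0, "NDVI"] = np.nan  # one missing value per year, removed by dropna
    return df


X, y = prepare([fake_year(200), fake_year(200)])
scores = compare_models(X, y)
for name, mse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MSE = {mse:.2f}")
```

Fixing `random_state` keeps the split and the tree-based models reproducible, which makes the MSE values comparable from run to run.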

Results

The best of these approaches was RandomForest Regression, with an MSE of 32.54.

The MSE values for all the models are listed in the table below:

S.No	Models	MSE
1.	RandomForest Regression	32.54
2.	Gradient Boosting Regression	46.57
3.	Feed-Forward Neural Network Regression	129
4.	K Nearest Neighbour Regression	41.16
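
Because MSE is expressed in squared yield units, it can help to take the square root (RMSE) to read the error in the same units as the yield itself. The short sketch below applies that conversion to the values in the table; the dictionary is just a transcription of the table, not output from the notebook.

```python
import math

# MSE values transcribed from the results table.
mse_by_model = {
    "RandomForest Regression": 32.54,
    "Gradient Boosting Regression": 46.57,
    "Feed-Forward Neural Network Regression": 129.0,
    "K Nearest Neighbour Regression": 41.16,
}

# RMSE is the square root of MSE and shares the units of the target.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}

for name, rmse in sorted(rmse_by_model.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```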

Technical Choices

Technical choices are annotated in the IPython notebook itself.

Key Findings and Insights

In each model, features such as precipTypeIsOther, precipTypeIsSnow, precipTypeIsRain, precipAccumulation and precipProbability have low importance for predicting wheat yield. These features can therefore be ignored in further analysis.
In contrast, features such as windSpeed, windBearing, pressure, NDVI and DayInSeason play a vital role in predicting wheat yield.
Out of 150 counties, only 19 have more than one value of Yield for a particular winter cycle. This further suggests that location-based fields are not a very good indicator and that it is wise to ignore them.
The yield for a particular combination of longitude and latitude always remains the same within a particular winter cycle.
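
Findings of this kind can be checked with two small helpers, sketched below under stated assumptions. `rank_features` uses the impurity-based importances of a scikit-learn random forest, which may differ from how the notebook measured importance, and `distinct_yields` counts the distinct Yield values per group. Both function names, the `Latitude` and `Longitude` column names, and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_features(X, y, seed=0):
    """Return impurity-based feature importances, highest first."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed).fit(X, y)
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)


def distinct_yields(df, keys, target="Yield"):
    """Count the distinct target values within each group defined by keys."""
    return df.groupby(keys)[target].nunique()


# Toy data: three locations, each with a single yield value for the season.
rng = np.random.default_rng(0)
n = 300
toy = pd.DataFrame({
    "Latitude": np.repeat([37.0, 38.5, 39.2], n // 3),
    "Longitude": np.repeat([-98.0, -97.1, -99.4], n // 3),
    "NDVI": rng.uniform(0, 1, n),
    "precipProbability": rng.uniform(0, 1, n),
    "Yield": np.repeat([40.0, 35.5, 52.1], n // 3),
})

# A count of 1 for every group means the yield is constant per location.
per_location = distinct_yields(toy, ["Latitude", "Longitude"])
print(per_location)

# Separate toy target that depends on NDVI only, to show the ranking.
toy_target = 10 * toy["NDVI"] + rng.normal(0, 0.1, n)
importances = rank_features(toy[["NDVI", "precipProbability"]], toy_target)
print(importances)
```

On the real data, the same `distinct_yields` call with a county column (and a season column, if the two years are stacked) would give the per-county counts.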

Improvements

Using US census data on agriculture, various county-level features could be added:

County-wise census data on agriculture (2007, adjusted) --> https://www.census.gov/support/USACdataDownloads.html

Total number of farms in each county (assumption: the more farms, the higher the yield). The data above comes from the 2007 census (adjusted).
Average age of farm operators (hypothesis: yield should be higher where the average age is between 30 and 40, since this age group combines youthful energy with experience and would work harder and smarter; a higher average age suggests that young people are losing interest, which would have a detrimental effect on farming).
Average size of farms (hypothesis: as the average size increases, the yield should increase).
The Feed-Forward Neural Network shows the least predictive power, but with more time it could be optimized to perform better (XGBoost also did not perform well in the first iteration, but its performance improved significantly after its hyperparameters were optimized).
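
As an illustration of the kind of hyperparameter optimization mentioned above, here is a small grid-search sketch. It is an assumption-laden example: it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost, the grid values are arbitrary, and the data is synthetic, so it shows the mechanics rather than the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the wheat features and yield.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 30 + 20 * X[:, 0] - 5 * X[:, 1] ** 2 + rng.normal(0, 1, 300)

# Arbitrary illustrative grid; real ranges depend on the dataset.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# scikit-learn maximizes scores, so MSE is exposed as its negative.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

best_mse = -search.best_score_
print(search.best_params_, f"cross-validated MSE = {best_mse:.2f}")
```

The search should be run on the training split only, so that the held-out test set still gives an unbiased MSE for the tuned model.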

