An ML- and DL-based website that recommends the best crop to grow and the fertilizers to use, and identifies the diseases affecting your crops.
The agriculture sector is undergoing a transformation driven by new technologies, which promise to move this primary sector to the next level of farm productivity and profitability. Precision Agriculture, which consists of applying inputs (what is needed) when and where they are needed, has become the third wave of the modern agricultural revolution (the first was mechanization and the second the green revolution with its genetic modification), and it is now being enhanced by richer farm knowledge systems built on the availability of ever larger amounts of data. The Ministry of Agriculture and Farmers Welfare (formerly the Ministry of Agriculture) reported as early as October 2019 that Precision Agriculture technologies increased net returns and operating profits. New technologies are also increasingly being applied on farms to maintain the sustainability of farm production. However, adopting these technologies involves uncertainty and trade-offs. According to a market analysis, the factors that would facilitate the adoption of sustainable farming technologies include better education and training of farmers, sharing of information, easy availability of financial resources, and increasing consumer demand for organic food. When applying these new technologies, the challenge in retrieving data from crops is to turn it into something coherent and valuable, because raw data by themselves are not useful, just numbers or images. Farms that decide to become technology-driven show valuable advantages, such as saving money and labor, increasing production or reducing costs with minimal effort, and producing quality food with more environmentally friendly practices. However, bringing these advantages to the farm will depend on more than the willingness of producers. Automation in agriculture is thus a major concern and an emerging subject across the world.
The population is increasing tremendously, and with it the demand for food and employment. The traditional methods used by farmers were not sufficient to fulfill these requirements, so new automated methods were introduced. These new methods satisfied the food requirements and also provided employment opportunities to billions of people. Machine learning has brought a revolution in agriculture: it helps protect crop yields against factors such as climate change, population growth, and disease, guides the choice of the best fertilizer for a crop, and addresses employment and food security problems.
Chapter 2
Review of Literature
2.1 Survey of Existing Systems
In the modern context of the industrial revolution, where we have a limited amount of resources and their proper utilization is a subject of great concern, whether it is the utilization of water or of minerals from ores, all of this indirectly affects our lives. With the limited availability of resources and increased consumption, their prices have been rising, so their sustainable utilization is necessary. Similarly, in farming, where a large number of consumers must be fed, a loss at any stage is a huge loss to the economy and to the user as well. Moreover, there is a lack of research data in this field. The main motive is to bring AI- and machine-learning-applied farming to India, and to scale up the technical application of AI and machine learning among farmers, researchers, and government.

Soil fertilizer detection and crop-to-grow systems: The type and nutrition of the soil play an important role in which crop is grown and in the quality of the crop. Due to increasing deforestation, soil quality is degrading, and it is hard to determine. A German tech start-up, PEAT, has developed an AI-based application called Plantix that can identify nutrient deficiencies in soil as well as plant pests and diseases, from which farmers can also get an idea of which fertilizer to use to improve harvest quality. The app uses image-recognition technology: the farmer captures images of plants using a smartphone. Soil restoration techniques, tips, and other solutions are also available through short videos in the application. Similarly, Trace Genomics is another machine-learning-based company that helps farmers perform soil analysis. Such apps help farmers monitor soil and crop health and produce healthy crops with a higher level of productivity.

Precision farming and predictive analytics: AI applications in agriculture have produced tools that help farmers carry out accurate and controlled farming by providing proper guidance about water management, crop rotation, timely harvesting, the type of crop to be grown, optimum planting, pest attacks, and nutrition management. Using machine learning algorithms in connection with images captured by satellites and drones, AI-enabled technologies predict weather conditions, analyze crop sustainability, and evaluate farms for the presence of diseases, pests, and poor plant nutrition, using data such as temperature, precipitation, wind speed, and solar radiation.
2.2 Limitations of the Existing System / Research Gap
Although artificial intelligence and machine learning improve the agriculture industry in many ways, there are concerns about the impact of ML on employment and the agricultural workforce. Agriculture is a $3 trillion industry that employs over 1.5 billion people, roughly 20% of the world's population, and there are predictions of millions of unemployed field workers in the coming decades, primarily due to the impact of AI and ML on the industry. Monotonous field tasks can be easily automated, which may gradually make certain roles obsolete; human labor could be replaced by smart robots that safely navigate the space, find and move agricultural products, and perform both simple and complex field operations. The idea of trusting data and algorithms more than our own judgment has its pros and cons. Obviously we benefit from these algorithms, otherwise we would not be using them in the first place: they allow us to automate processes by making informed judgments from available data. Sometimes, however, this means replacing someone's job with an algorithm, which comes with ethical ramifications. Additionally, who do we blame if something goes wrong? The most commonly discussed case at present is self-driving cars: how do we choose how the vehicle should react in the event of a fatal collision? Will we one day have to select which ethical framework our self-driving car follows when purchasing the vehicle? The same questions arise for farm automation. Machine learning is also powerful for sensing: when sensors measuring environmental variables such as temperature, pressure, and humidity are connected, the correlations between their signals can be used to develop self-calibration and correction procedures, which is a hot research topic in fields such as atmospheric chemistry.
The accuracy of such a system is therefore limited. Things get more interesting when it comes to computational modeling. Running computer models that simulate global weather, emissions from the planet, and the transport of those emissions is very computationally expensive; a research-level simulation can take weeks even on a supercomputer. Machine learning is stochastic, not deterministic.

Lack of data: Many machine learning algorithms require large amounts of data before they begin to give useful results. A good example is a neural network. Neural networks are data-eating machines that require copious amounts of training data; the larger the architecture, the more data is needed to produce viable results. Reusing data is a bad idea, and data augmentation is useful only to some extent; having more data is always the preferred solution.

Lack of good data: Despite appearances, this is not the same as the point above. Imagine you try to cheat by generating ten thousand fake data points and feeding them to your neural network. It will train, but when you test it on an unseen data set, it will not perform well: you had the data, but its quality was not up to scratch. Just as a lack of good features can cause an algorithm to perform poorly, a lack of good ground-truth data can limit the capabilities of a model. No company is going to deploy a machine learning model that performs worse than human-level error.
2.3 Problem Statement and Objective

In the future, AI will help farmers evolve into agricultural technologists, using data to optimize yields down to individual rows of plants. AI and ML companies are developing algorithms that can perform multiple tasks in farming fields. These algorithms are trained to control diseases and harvest crops at a faster pace and higher volume than humans, to check soil quality, and to detect weeds while picking and packing crops at the same time. They are also capable of addressing the challenges faced by the agricultural labor force, for example through AI-enabled systems that detect pests, one of the worst enemies of farmers and a major cause of crop damage.

The motive of the project is to detect and determine the nature and quality of soil in a particular area, considering the toxicity level at the present instant of time, and to predict its future value using an AI model. The main objectives of the project are:

- Proper utilization of our limited resources, which indirectly affects our lives.
- Addressing the lack of research data in this field by producing a large collection of data for farmers: data collected by smart agriculture sensors, e.g. weather conditions, soil quality, and crop growth progress, which can also be used to track the state of the business in general, as well as staff performance, equipment efficiency, etc.
- Better control over internal processes and, as a result, lower production risks: the ability to foresee the output of production allows better planning of product distribution, and spotting anomalies in crop growth early mitigates the risk of losing the yield.
- Increased business efficiency through process automation.
By using our app, you can automate multiple processes across your production cycle, e.g. deciding which crop is suitable to grow, fertilizing, or disease control, with enhanced product quality and volumes. Achieve better control over the production process and maintain higher standards of crop quality and growth capacity through automation.
2.4 Scope
The Green Revolution during the 1950s and 1960s remarkably drove up global food production, saving a billion people from starvation. The revolution led to the adoption of new technologies such as high-yielding varieties (HYVs) of cereals, chemical fertilizers and agro-chemicals, better irrigation, and mechanization of cultivation methods. India followed suit and adopted hybrid seeds, machines, fertilizers, and pesticides. While these practices solved the food shortage problem, they also created problems of their own: excessive use of fertilizers and pesticides, depletion of groundwater, soil degradation, etc. These problems were exacerbated by a lack of training in modern technology and of awareness about the correct usage of chemicals. According to the UN Food and Agriculture Organization, the global population will increase by 2 billion by 2050. With limited arable land available and exponentially more mouths to feed, we are now in need of a second Green Revolution.

Predictive and recommendation analytics in our project: AI and machine learning can help farmers by recommending sowing dates for different crops based on weather conditions. ML models can also suggest tweaks in cropping patterns to boost yields. Using historic production data, weather forecasts, seed information, and demand and supply information, ML can forecast the amount of seed that should be grown to fulfill growing needs. ML and deep learning applications are used to identify potential defects and nutrient deficiencies in the soil: the algorithms analyze soil samples and correlate particular foliage patterns with certain soil defects, plant pests, and diseases.

Identifying plant diseases: Crop images are analyzed using computer vision and segmented into areas such as background, healthy part, and diseased part. The diseased part is then captured and sent to remote labs for further diagnosis.
Similarly, pre-processing of leaf images helps in the early detection of pest infestations and allows farmers to act quickly and minimize losses.
Chapter 3
Proposed System
3.1 Data-set Collection

The dataset contains soil-specific attributes collected from Polytest Laboratories soil testing lab, Pune, Maharashtra, India. In addition, similar sources of general crop data from Marathwada University were also used. The crops considered in our model include groundnut, pulses, cotton, vegetables, banana, paddy, sorghum, sugarcane, and coriander; the number of examples of each crop available in the training dataset is shown. The attributes considered were depth, texture, pH, soil color, permeability, drainage, water holding capacity, and erosion. These soil parameters play a major role in a crop's ability to extract water and nutrients from the soil. For crops to grow to their full potential, the soil must provide an acceptable environment: soil is the anchor of the roots, and the water holding capacity determines the crop's ability to absorb water and the nutrients that are converted into ions, the form the plant can use. Texture determines how porous the soil is and the ease of air and water movement, which is essential to prevent the plants from becoming waterlogged. The level of acidity or alkalinity (pH) is a master variable that affects the availability of soil nutrients; the activity of microorganisms present in the soil, as well as the level of exchangeable aluminum, is also affected by pH. Water holding and drainage determine the infiltration of roots. For these reasons, the above parameters are considered when choosing a crop.
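As a sketch of how such a dataset might be loaded and prepared with Pandas, consider the following. The column names and rows here are illustrative stand-ins for the lab's actual schema, which is not reproduced in this report.

```python
import pandas as pd

# Hypothetical sample rows mirroring the soil attributes described above;
# the values and column names are illustrative, not the lab's real data.
rows = [
    {"Depth": 90, "Texture": "clay",  "pH": 6.8, "Drainage": "good", "Crop": "paddy"},
    {"Depth": 45, "Texture": "sandy", "pH": 7.4, "Drainage": "poor", "Crop": "groundnut"},
    {"Depth": 60, "Texture": "loam",  "pH": 6.2, "Drainage": "good", "Crop": "sugarcane"},
]
df = pd.DataFrame(rows)

# Encode categorical soil attributes as integer codes for the learners.
for col in ["Texture", "Drainage"]:
    df[col] = df[col].astype("category").cat.codes

# Count the examples available per crop, as described in the text.
print(df["Crop"].value_counts().to_dict())
```

In the real pipeline the DataFrame would be read from the lab's CSV file with `pd.read_csv` instead of being constructed inline.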
3.2 Crop Prediction using Ensembling technique
An ensemble is a data mining model, also known as a model combiner, that combines the power of two or more models to attain better prediction and efficiency than any of its constituent models could achieve alone. In our system, we use one of the most familiar ensembling techniques, majority voting. In voting, any number of base learners can be used, but there should be at least two. The learners are chosen so that they are comparable to each other yet also complementary: the more competent the learners, the higher the chance of a correct prediction, while complementarity ensures that when one or a few members make an error, the probability that the remaining members correct it is high. Each learner is trained into a model using the provided training dataset. When a new sample has to be classified, each model predicts a class on its own, and the class predicted by the majority of the learners is voted to be the class label of the new sample.
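The majority-voting scheme described above can be sketched with scikit-learn's `VotingClassifier`, here on synthetic data standing in for the soil dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the soil dataset.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Three complementary base learners; "hard" voting returns the class
# predicted by the majority of the fitted models.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
vote.fit(X, y)
print(round(vote.score(X, y), 3))  # accuracy of the combined model
```

With `voting="soft"`, the classifier would instead average the base learners' predicted probabilities, which is another common variant of the same idea.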
3.3 Algorithms (Learning models)

Different machine learning algorithms are used in order to make comparisons. The algorithms are as follows:

Logistic Regression: Logistic regression models the probability of a discrete outcome given an input variable. The most common form models a binary outcome, something that can take two values such as true/false or yes/no; multinomial logistic regression can model scenarios with more than two possible discrete outcomes. Logistic regression is a useful analysis method for classification problems, where you are trying to determine which category a new sample fits best.

Decision Tree: A decision tree is a non-parametric supervised learning technique. The dataset is broken down recursively to build a tree-like structure with both decision nodes and leaf nodes: decision nodes have two or more branches, while leaf nodes are the final nodes representing the classification or regression result. The topmost node is the root, chosen as the attribute with the highest information gain (or best Gini index) value. Decision trees can classify both categorical and numerical data.

Naive Bayes: This classifier assumes its features are statistically independent of one another. Most other classifiers model some amount of correlation between features, but Naive Bayes treats the features as independent given the class. This places a restriction on the data, but in practice naive Bayes enjoys some theoretical support and performs well. Naive Bayes classifiers can handle high-dimensional features with very little training data and are highly scalable.
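A comparison of these three learners can be sketched as below. The Iris dataset is used here purely as a convenient stand-in for the soil dataset, so the accuracies are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for the soil dataset purely to illustrate the comparison.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
}
# Fit each model on the same split and report held-out accuracy.
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(model.score(X_te, y_te), 3))
```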
XGBoost: XGBoost is one of the most popular machine learning algorithms today. Regardless of the type of prediction task at hand, regression or classification, XGBoost is well known for providing better solutions than many other machine learning algorithms; since its inception it has become a state-of-the-art algorithm for structured data. Speed and performance: originally written in C++, it is comparatively faster than other ensemble classifiers. Parallelizable core algorithm: because the core XGBoost algorithm is parallelizable, it can harness the power of multi-core computers; it can also run on GPUs and across networks of computers, making it feasible to train on very large datasets. Consistently strong results: it has shown better performance on a variety of machine learning benchmark datasets. Wide variety of tuning parameters: XGBoost internally has parameters for cross-validation, regularization, user-defined objective functions, missing values, tree parameters, a scikit-learn-compatible API, etc.
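Gradient boosting, the technique underlying XGBoost, can be sketched with scikit-learn's `GradientBoostingClassifier` so the example carries no extra dependency; `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface with additional regularization and parallelism:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the soil dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Boosting: each new tree is fitted to correct the errors of the
# ensemble built so far, weighted by the learning rate.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=1)
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 3))
```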
Support Vector Machine (SVM): A Support Vector Machine is a supervised machine learning model with associated learning algorithms that analyze data for both classification and regression problems. Given a set of training samples, each marked as belonging to one of two categories, the SVM algorithm builds a model that assigns new samples to one of the categories. An SVM represents the examples as points in space, mapped so that the examples of the different categories are separated by as wide a gap as possible.
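A minimal SVM sketch with scikit-learn's `SVC`, using two synthetic clusters as the two categories:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters stand in for the two categories.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A linear kernel fits the maximum-margin separating hyperplane.
clf = SVC(kernel="linear")
clf.fit(X, y)
print(round(clf.score(X, y), 3))
```

For data that is not linearly separable, a non-linear kernel such as `kernel="rbf"` maps the points into a higher-dimensional space where a separating hyperplane can be found.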
Random Forest: A random forest is a supervised machine learning algorithm constructed from decision trees. It is used to solve both regression and classification problems, and it utilizes ensemble learning, a technique that combines many classifiers to provide solutions to complex problems. A random forest consists of many decision trees; the "forest" is trained through bagging, or bootstrap aggregating, an ensemble meta-algorithm that improves the accuracy of machine learning algorithms. The random forest establishes its outcome from the predictions of the individual trees, by majority vote for classification or by averaging their outputs for regression, and increasing the number of trees generally improves the precision of the outcome. A random forest mitigates the limitations of a single decision tree: it reduces overfitting and increases precision, and it generates predictions without requiring much configuration in packages like scikit-learn.
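The bagging scheme just described can be sketched with scikit-learn's `RandomForestClassifier`, again on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the soil dataset.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=7)

# Bagging: each of the 200 trees is trained on a bootstrap sample of the
# data with a random subset of features considered at each split;
# the final class is the majority vote across trees.
forest = RandomForestClassifier(n_estimators=200, random_state=7)
forest.fit(X, y)
print(round(forest.score(X, y), 3))
print(len(forest.feature_importances_))  # one importance score per feature
```

The `feature_importances_` attribute is one reason random forests are convenient here: it indicates which soil attributes contribute most to the prediction.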
3.4 Software and Hardware Requirements

Software Requirements

Python: Python is an interpreted, high-level, general-purpose programming language. Its design philosophy emphasizes code readability with its use of significant indentation. Its language constructs, as well as its object-oriented approach, aim to help programmers write clear, logical code for small and large-scale projects.
IDE: Visual Studio Code is a source-code editor made by Microsoft for Windows, Linux, and macOS. Features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git.

Anaconda: Anaconda is a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. The distribution includes data-science packages suitable for Windows, Linux, and macOS.

Jupyter Notebook: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Hardware Requirements

Minimum hardware requirements depend heavily on the particular software being developed by a given Enthought Python / Canopy / VS Code user. Applications that need to store large arrays or objects in memory will require more RAM, whereas applications that need to perform numerous calculations or tasks quickly will require a faster processor.
We find that the following list represents the minimum requirements needed to install Enthought Python 3.9 and associated applications:

- Modern operating system: Windows 7 or 10; Mac OS X 10.11 or higher, 64-bit; Linux: RHEL 6/7, 64-bit (almost all libraries also work in Ubuntu)
- x86 64-bit CPU (Intel / AMD architecture)
- 4 GB RAM
- 5 GB free disk space
3.5 Frameworks Required
1. Pandas: a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
2. NumPy: a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
3. Matplotlib: a plotting library for the Python programming language and its numerical mathematics extension NumPy. Matplotlib is a library for making 2D plots of arrays in Python. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. The Matplotlib code is conceptually divided into three parts: the pylab interface is the set of functions provided by Matplotlib; the Matplotlib frontend or Matplotlib API is the set of classes that do the heavy lifting, creating and managing figures, text, lines, plots, and so on (an abstract interface that knows nothing about output); and the backends are device-dependent drawing devices, a.k.a. renderers, that transform the frontend representation to hardcopy or a display device.
4. Seaborn: an open-source Python library built on top of Matplotlib. It is used for data visualization and exploratory data analysis. Seaborn works easily with data frames and the Pandas library, and the graphs created can be customized easily.
5. Scikit-learn: the most useful and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering, and dimensionality reduction, via a consistent interface in Python.
6. Flask: a micro web framework written in Python. It is classified as a micro-framework because it does not require particular tools or libraries: it has no database abstraction layer, form validation, or other components where pre-existing third-party libraries provide common functions. Flask supports extensions that can add application features as if they were implemented in Flask itself; extensions exist for object-relational mappers, form validation, upload handling, various open authentication technologies, and several common framework-related tools.
7. Bootstrap: a free and open-source CSS framework directed at responsive, mobile-first front-end web development. It contains HTML, CSS, and (optionally) JavaScript-based design templates for typography, forms, buttons, navigation, and other interface components.
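To show how Flask ties the trained model to the web front end, here is a minimal sketch of a prediction route. The route name, payload keys, and the hard-coded response are illustrative assumptions, not the project's actual API; in the real app a trained classifier would be loaded and invoked inside the handler.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Read the JSON payload of soil values posted by the front end.
    data = request.get_json()
    # Placeholder: a real handler would feed `data` to the trained
    # ensemble model and return its predicted crop.
    return jsonify({"crop": "paddy", "inputs": data})

# Exercise the route with Flask's built-in test client (no server needed).
client = app.test_client()
resp = client.post("/predict", json={"N": 90, "P": 42, "K": 43})
print(resp.get_json())
```

In deployment the app would instead be started with `app.run()` locally or served by a WSGI server such as gunicorn on Heroku.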
3.6 Hosting Service

Cloud Platform (Heroku): Heroku is a cloud platform as a service (PaaS) supporting several programming languages. One of the first cloud platforms, Heroku has been in development since June 2007, when it supported only the Ruby programming language; it now supports Java, Node.js, Scala, Clojure, Python, PHP, and Go. For this reason Heroku is said to be a polyglot platform, with features that let a developer build, run, and scale applications in a similar manner across most languages. Applications that run on Heroku typically have a unique domain used to route HTTP requests to the correct application container, or dyno. The dynos are spread across a "dyno grid" consisting of several servers. Heroku's Git server handles application repository pushes from permitted users. All Heroku services are hosted on Amazon's EC2 cloud-computing platform.

Running apps on Heroku:
1. Start your app locally. To locally start all of the process types defined in your Procfile, run heroku local (or heroku local web). You can now test the app locally; press Ctrl+C to shut it down when you are done. Some command-line options: to use a different Procfile, use the -f flag (heroku local -f Procfile.test); to use a different environment file, use the -e flag (heroku local -e .env.test); to use a different port, use the -p flag (heroku local -p 7000). If you don't specify a port, 5000 is used.
2. Set up your local environment variables. When running your app, you will typically use a set of config vars to capture its configuration. For example, say your app uses S3 for image storage: you would store the S3 credentials as config vars, and when running locally you would typically use a different S3 bucket than in production. The .env file lets you capture all the config vars needed to run your app locally. When you start your app using any of the heroku local commands, the .env file is read and each name/value pair is inserted into the environment, mimicking the action of config vars.
3. Copy Heroku config vars to your local .env file. Sometimes you may want to use the same config var in both the local and Heroku environments. For each config var that you want to add to your .env file, use the following command: heroku config:get CONFIG-VAR-NAME -s >> .env
4. Run your app locally using Foreman. As an alternative to Heroku Local, you can still use Foreman, a command-line tool for running Procfile-backed apps: foreman start. It is not officially supported, but you can get more information from the Foreman GitHub repository. If your Procfile has both web and worker process types, Foreman starts one of each, with the output interleaved on your terminal. Your web process loads on port 5000 because this is what Foreman provides as a default in the $PORT env var.
- Crop Recommendation System ==> Enter the corresponding nutrient values of your soil, state, and city. Note that the N-P-K (Nitrogen-Phosphorus-Potassium) values to be entered should be the ratio between them. Refer to this website for more information. Note: when you enter the city name, make sure to use common city names; remote cities/towns may not be available in the Weather API from which humidity and temperature data are fetched.
- Fertilizer Suggestion System ==> Enter the nutrient contents of your soil and the crop you want to grow. The algorithm will tell which nutrients the soil has in excess or lacks and, accordingly, give suggestions for buying fertilizers.
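The excess/deficiency check at the heart of the fertilizer suggestion can be sketched as a simple rule over ideal N-P-K values. The threshold table and tolerance below are made-up illustrative numbers, not the project's actual fertilizer dataset:

```python
# Illustrative ideal N-P-K values per crop; the real app would read
# these from its fertilizer dataset, not from this made-up table.
IDEAL_NPK = {"rice": (80, 40, 40), "maize": (80, 40, 20)}

def fertilizer_advice(crop, n, p, k, tolerance=10):
    """Compare measured N-P-K values with the crop's ideal values."""
    ideal_n, ideal_p, ideal_k = IDEAL_NPK[crop]
    advice = {}
    for name, measured, ideal in [("N", n, ideal_n), ("P", p, ideal_p), ("K", k, ideal_k)]:
        if measured < ideal - tolerance:
            advice[name] = "low"      # soil lacks this nutrient
        elif measured > ideal + tolerance:
            advice[name] = "high"     # soil has an excess
        else:
            advice[name] = "ok"
    return advice

print(fertilizer_advice("rice", 50, 45, 60))  # → {'N': 'low', 'P': 'ok', 'K': 'high'}
```

Each "low"/"high" label would then be mapped to a concrete fertilizer-buying suggestion for the farmer.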
- Disease Detection System ==> Upload an image of a leaf of your plant. The algorithm will tell the crop type and whether it is diseased or healthy. If it is diseased, it will tell you the cause of the disease and suggest how to prevent/cure it. Note that, for now, it only supports the following crops.
For more information, visit https://drive.google.com/drive/folders/13barmjZebtTzhCcVbZIPWXTPsYHSwtI3?usp=sharing (it has full information).
The app has now been hosted at: https://easy-farm.herokuapp.com/