# Title: Mapping of boro rice area in Bangladesh using supervised machine learning algorithm
## Group # 4
### Group Members:
Hasan Md. Hamidur Rahman, BARC

Md. Manik Sarker, DAE

Marina Afroze, BARI

Mst. Shamsun Naher, BARI


## 1. Introduction
Rice is the staple food of Bangladesh. The country has three rice seasons: aus, aman and boro. Boro is the most widely cultivated, and the lion's share of the country's rice is grown in the boro season, which runs from mid-November to May. Mapping the rice production area is an important task in Bangladesh because no real-time rice area mapping tool is currently in place. Our objective is therefore to map the boro rice area of the whole country. We have survey data on boro rice fields collected from different locations across the country between December 2020 and May 2021. Using high-resolution Sentinel-2 images from Google Earth Engine (GEE), we estimate the boro cultivation area for the whole country, with machine learning algorithms classifying rice and non-rice areas. Area estimation leads to yield estimation, which plays an important role in policy planning, price fixing, export-import planning and crop management. The outcome should be visible in improved livelihoods for farmers and in the growth of the country's overall economy.

## 2. Method
Geolocations (latitude, longitude) were captured with a handheld GPS, and field photos were taken with a digital camera at the same time. A total of 305 samples were collected, of which 218 are boro rice areas and 87 are non-rice areas such as settlements, water bodies, salt pans and crops other than rice. Based on the field photos, polygons of rice and non-rice fields were drawn around each geolocation in Google Earth Pro. The resulting polygon shapefile is our signature data, from which we obtain the spectral bands of rice and non-rice objects. The feature vector consists of the spectral bands and acquisition date of the images. We used the R, G, B and NIR spectral bands (Sentinel-2 bands B4, B3, B2 and B8).
Sentinel-2 imagery of the whole of Bangladesh at 10 m resolution, acquired between 01/12/2020 and 31/03/2021, was taken from Google Earth Engine (GEE) as the reference image for classifying rice and non-rice areas. The classification was done with supervised machine learning algorithms, namely Classification and Regression Trees (CART) and Random Forest. We initially proposed an Artificial Neural Network (ANN), but could not implement it because of time constraints, so we used CART and Random Forest instead. The process we followed is as follows:

### 2.1 Data preparation
1. The GPS point data (KML format) captured between 01/12/2020 and 31/03/2021 is converted to a shapefile
2. From the point data, the polygon data is prepared using Google Earth Pro
3. The polygon shapefile is uploaded to GEE
4. The Sentinel-2 raster image for Bangladesh (ROI) between 01/12/2020 and 31/03/2021 is imported from GEE as reference data
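
As an illustration of step 1, the sketch below reads point coordinates out of a KML document using only the Python standard library. It is a minimal sketch, not the conversion tool actually used: the embedded `SAMPLE_KML` text and the helper name `read_kml_points` are hypothetical, and it assumes each GPS point is stored as a `Placemark` with a `Point` whose `coordinates` are written as `longitude,latitude[,altitude]`, as the KML 2.2 format specifies.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical two-point sample; the real survey file is not part of this report
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>300</name>
      <Point><coordinates>89.9287860076,24.0730552112,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>301</name>
      <Point><coordinates>90.4125,23.8103</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_kml_points(kml_text):
    """Return a list of (name, longitude, latitude) for every point Placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        )
        if not coords.strip():
            continue  # skip placemarks that are not points
        # KML stores longitude first, then latitude, then optional altitude
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        points.append((name, lon, lat))
    return points


points = read_kml_points(SAMPLE_KML)
print(points)
```

The resulting list of coordinates could then be written to a shapefile with whichever GIS tool is at hand; that step is left out here because it depends on third-party libraries.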

### 2.2 Apply CART and Random Forest Algorithm
1. On the GEE platform, apply the CART and Random Forest algorithms separately to the raster image using the polygon shapefile
2. Calculate the accuracy of the individual models

To complete the work, we first wrote the code in GEE and then converted it to a Python notebook.

Because the target variable is a rice/non-rice category, a supervised classification method was chosen. 70% of the field (signature) data is used to train each model and 30% is used to test it. The accuracy of each model is calculated after it runs.
The accuracy measures calculated are Overall Accuracy (OA), Producer's Accuracy (PA) and the Kappa coefficient.
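
The 70/30 split can be pictured with a small pure-Python sketch. It mirrors the idea behind the `randomColumn()` call and the two threshold filters used in the notebook cells: each sample gets a uniform random number, and samples below 0.7 go to the training set. The function name `split_samples`, the seed and the synthetic samples are assumptions made for illustration only; this is not the Earth Engine implementation.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Assign each sample a uniform random number and split on a threshold."""
    rng = random.Random(seed)
    train, test = [], []
    for sample in samples:
        # Below the threshold: training set; otherwise: test set
        if rng.random() < train_fraction:
            train.append(sample)
        else:
            test.append(sample)
    return train, test


# Hypothetical samples: (blue, green, red, nir, label), label 1 = rice, 0 = non-rice
samples = [(i, i + 1, i + 2, i + 3, i % 2) for i in range(1000)]
train, test = split_samples(samples)
print(len(train), len(test))
```

Because the assignment is random per sample, the split is only approximately 70/30, and fixing the seed keeps it reproducible between runs.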




## 3. Results
### 3.1 Results of running the CART and Random Forest algorithms


In [37]:
# Importing GEE libraries (sys is needed for sys.exc_info() below)
import sys

import ee
import geemap

try:
    ee.Initialize()
    print('Google Earth Engine has initialized successfully!')
except ee.EEException as e:
    print('Google Earth Engine has failed to initialize!')
except Exception:
    print("Unexpected error:", sys.exc_info()[0])
    raise


Google Earth Engine has initialized successfully!


## Classification and Regression Trees (CART)

In [38]:
Map_cart = geemap.Map()

countries_cart= ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')
roi_cart= countries_cart.filter(ee.Filter.eq('country_na','Bangladesh'))
Map_cart.addLayer(roi_cart,{},'Bangladesh')

image_cart= ee.ImageCollection("COPERNICUS/S2_SR") \
  .filterDate('2020-12-01','2021-03-31') \
  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE',20)) \
  .filterBounds(roi_cart) \
  .median()

vizparams_cart= {'bands':['B4', 'B3', 'B2'], 'min':0, 'max':2500, 'gamma':1.1}
Map_cart.addLayer(image_cart.clip(roi_cart),vizparams_cart,'Sentinel-2')
Map_cart.centerObject(roi_cart,7)


fc_cart=ee.FeatureCollection('users/hrahman/FieldData_Rice_NonRice')


training_cart=fc_cart
# training= Rice.merge(NonRice)
# print(fc_cart.getInfo())
# print (training_cart.toDictionary().getInfo())

first = training_cart.first()
props = first.propertyNames().getInfo()
print(props)
values = training_cart.first().toDictionary(props).getInfo()
print(values)


# for i in range(10): 
#     print(values.items())


# Class label property (0 = non-rice, 1 = rice) and predictor bands (blue, green, red, NIR)
label_cart = "id"
bands_cart = ['B2', 'B3', 'B4', 'B8']
input_cart = image_cart.select(bands_cart)

# Sample the band values under each field polygon (scale is in metres)
trainingImage_cart = input_cart.sampleRegions(**{
    'collection': training_cart,
    'properties': [label_cart],
    'scale': 30
})

# Add a uniform random column and split roughly 70/30 into training and test sets
trainingData_cart = trainingImage_cart.randomColumn()
trainingSet_cart = trainingData_cart.filter(ee.Filter.lessThan('random', 0.7))
testSet_cart = trainingData_cart.filter(ee.Filter.greaterThanOrEquals('random', 0.7))

# Train CART on the training set and classify the composite image
classifier_cart = ee.Classifier.smileCart().train(trainingSet_cart, label_cart, bands_cart)
classified_cart = input_cart.classify(classifier_cart)

# Palette order follows the class values: first colour for 0, second for 1
landcoverPallete_cart = [
  '0000FF',  # non-rice (0): blue
  '008000'   # rice (1): green
]

Map_cart.addLayer(classified_cart.clip(roi_cart), {'palette':landcoverPallete_cart, 'min':0, 'max':1}, 'Classified Layer')

confusionMatrix_cart= ee.ConfusionMatrix(testSet_cart.classify(classifier_cart)
  .errorMatrix(**{
    'actual':'id',
    'predicted':'classification'
  }))
print('Confusion Matrix:',confusionMatrix_cart.getInfo())
print('Overall Accuracy:',confusionMatrix_cart.accuracy().getInfo())
print('Producer Accuracy:',confusionMatrix_cart.producersAccuracy().getInfo())
print('Kappa Accuracy:',confusionMatrix_cart.kappa().getInfo())

Map_cart


['OBJECTID', 'OID_', 'UniqueID', 'Crops', 'SurveyDate', 'Latitude', 'id', 'Longitude', 'Feature', 'system:index']
{'Crops': 'Non-Rice', 'Feature': 'Orchard', 'Latitude': 24.0730552112, 'Longitude': 89.9287860076, 'OBJECTID': 1, 'OID_': 1, 'UniqueID': 300, 'id': 0, 'system:index': '00000000000000000000'}
Confusion Matrix: [[2525, 178], [170, 722]]
Overall Accuracy: 0.9031988873435327
Producer Accuracy: [[0.9341472438031817], [0.8094170403587444]]
Kappa Accuracy: 0.7413369611962024



## Random Forest

In [39]:
# import geemap
# geemap.js_snippet_to_py(js_snippet, add_new_cell=True,import_geemap=True,show_map=True)

In [40]:
Map_rf = geemap.Map()
countries_rf= ee.FeatureCollection('USDOS/LSIB_SIMPLE/2017')
roi_rf= countries_rf.filter(ee.Filter.eq('country_na','Bangladesh'))
Map_rf.addLayer(roi_rf,{},'Bangladesh')

image_rf= ee.ImageCollection("COPERNICUS/S2_SR") \
  .filterDate('2020-12-01','2021-03-31') \
  .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE',20)) \
  .filterBounds(roi_rf) \
  .median()

vizparams_rf= {'bands':['B4', 'B3', 'B2'], 'min':0, 'max':2500, 'gamma':1.1}
Map_rf.addLayer(image_rf.clip(roi_rf),vizparams_rf,'Sentinel-2')
Map_rf.centerObject(roi_rf,7)


fc_rf=ee.FeatureCollection('users/hrahman/FieldData_Rice_NonRice')
training_rf=fc_rf

# Class label property (0 = non-rice, 1 = rice) and predictor bands (blue, green, red, NIR)
label_rf = "id"
bands_rf = ['B2', 'B3', 'B4', 'B8']

input_rf = image_rf.select(bands_rf)

# Sample the band values under each field polygon (scale is in metres)
trainingImage_rf = input_rf.sampleRegions(**{
    'collection': training_rf,
    'properties': [label_rf],
    'scale': 30
})

# Add a uniform random column and split roughly 70/30 into training and test sets
trainingData_rf = trainingImage_rf.randomColumn()
trainingSet_rf = trainingData_rf.filter(ee.Filter.lessThan('random', 0.7))
testSet_rf = trainingData_rf.filter(ee.Filter.greaterThanOrEquals('random', 0.7))

# Signature: ee.Classifier.smileRandomForest(numberOfTrees, variablesPerSplit, minLeafPopulation, bagFraction, maxNodes, seed)

# Make a Random Forest classifier with 10 trees and train it.
# Note: 'features' is the full sample (trainingImage_rf), not only trainingSet_rf.
classifier_rf = ee.Classifier.smileRandomForest(10).train(**{
    'features': trainingImage_rf,
    'classProperty': label_rf,
    'inputProperties': bands_rf
})
classified_rf = input_rf.classify(classifier_rf)

# Palette order follows the class values: first colour for 0, second for 1
landcoverPallete_rf = [
  '0000FF',  # non-rice (0): blue
  '008000'   # rice (1): green
]

Map_rf.addLayer(classified_rf.clip(roi_rf), { 'min':0, 'max':1, 'palette':landcoverPallete_rf}, 'Classified Layer')

confusionMatrix_rf= ee.ConfusionMatrix(testSet_rf.classify(classifier_rf) \
  .errorMatrix(**{
    'actual':'id',
    'predicted':'classification'
  }))
print('Confusion Matrix',confusionMatrix_rf.getInfo())
print('Overall Accuracy:',confusionMatrix_rf.accuracy().getInfo())
print('Producer Accuracy:',confusionMatrix_rf.producersAccuracy().getInfo())
print('Kappa Accuracy:',confusionMatrix_rf.kappa().getInfo())
Map_rf


Confusion Matrix [[2685, 18], [35, 857]]
Overall Accuracy: 0.9852573018080668
Producer Accuracy: [[0.9933407325194229], [0.9607623318385651]]
Kappa Accuracy: 0.9602336703632474



### 3.2 Discussion and interpretation of the results
The CART algorithm gave 90.3% overall accuracy, whereas Random Forest gave 98.5%. The output maps reflect the same pattern: Random Forest delineates the boro rice area more accurately. Both models were evaluated on a 30% test set drawn from a 70/30 split of the data. We also ran the models with an 80/20 split and observed slightly higher accuracy for both algorithms. The other measures, producer's accuracy and kappa, also indicate that Random Forest delineates the boro rice area more accurately. Note, however, that the Random Forest cell passes the full sample set (`trainingImage_rf`) to `train()` rather than the 70% training subset, so its test pixels were also seen during training and its accuracy figures are likely optimistic. If more non-rice samples could be used, model accuracy would probably be higher than at present. The measurements of the model outputs are as follows:

#### Accuracy measurement of CART
Confusion Matrix (rows: actual, columns: predicted; class order: non-rice, rice): [[2525, 178], [170, 722]]

Overall Accuracy: 0.9031988873435327

Producer's Accuracy (non-rice, rice): [0.9341472438031817, 0.8094170403587444]

Kappa: 0.7413369611962024

#### Accuracy measurement of Random Forest
Confusion Matrix (rows: actual, columns: predicted; class order: non-rice, rice): [[2685, 18], [35, 857]]

Overall Accuracy: 0.9852573018080668

Producer's Accuracy (non-rice, rice): [0.9933407325194229, 0.9607623318385651]

Kappa: 0.9602336703632474
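
The three measures can be recomputed by hand from the confusion matrices reported above, which is a useful check on how they are defined. The sketch below is plain Python rather than an Earth Engine call; the function name `accuracy_metrics` is an assumption, and it takes rows as actual classes and columns as predicted classes. Overall accuracy is the diagonal sum over the total, producer's accuracy is each diagonal entry over its row total, and kappa compares the observed agreement with the agreement expected by chance.

```python
def accuracy_metrics(matrix):
    """Compute overall accuracy, producer's accuracy per class and Cohen's kappa.

    Rows are actual classes and columns are predicted classes.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(n_classes)) for c in range(n_classes)]
    correct = sum(matrix[i][i] for i in range(n_classes))

    # Observed agreement: share of test pixels on the diagonal
    overall = correct / total
    # Producer's accuracy: correctly classified pixels per actual class
    producers = [matrix[i][i] / row_totals[i] for i in range(n_classes)]
    # Agreement expected by chance, from the row and column totals
    expected = sum(row_totals[i] * col_totals[i] for i in range(n_classes)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, kappa


cart = [[2525, 178], [170, 722]]
rf = [[2685, 18], [35, 857]]

for name, matrix in (("CART", cart), ("Random Forest", rf)):
    overall, producers, kappa = accuracy_metrics(matrix)
    print(name, round(overall, 4), [round(p, 4) for p in producers], round(kappa, 4))
```

Run on the two matrices, this reproduces the overall accuracy, producer's accuracy and kappa values listed above.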

## 4. Conclusion
From the above discussion, Random Forest is the better of the two algorithms for boro rice area mapping, reaching about 98.5% overall accuracy in our experiment, and it can be used for boro rice area mapping in Bangladesh. We used Sentinel-2 imagery at 10 m resolution, which also plays an important role in producing good output. There is scope to work with other models such as SVM and ANN and to compare their outputs. There is also future scope to use Sentinel-2 data and Random Forest for crop type mapping in Bangladesh. This methodology can likewise be tested for the other rice seasons, aus and aman.