<a href="https://colab.research.google.com/github/myspace-img/implementation-of-RSI-CB256-using-ml-algorithms/blob/main/SAT_Classification_Using_ML_Algo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Satellite image classification using Machine learning algorithms***

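We apply SVM, KNN and decision tree classifiers, each with specific parameter settings, to the given dataset. Ensemble methods (gradient boosting, histogram-based gradient boosting, random forest, bagging and AdaBoost) are evaluated on the same features.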

- ***DATASET*** - Satellite Image Classification Dataset (RSI-CB256). Its 4 classes combine images from sensors and Google Maps snapshots.
- The four classes are "cloudy", "desert", "green_area" and "water", with 1500, 1131, 1500 and 1500 images respectively.
- To classify these images with machine learning algorithms, we need a set of features for each image. We therefore apply first-order statistical feature extraction, using the **mean** and **standard deviation** of the pixel values as the features.
- The accuracies obtained range from about 54% (AdaBoost) to about 89% (KNN and histogram-based gradient boosting); the exact values are printed when the cells are executed.
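
As a minimal sketch of the first-order features described above, the snippet below computes the mean and standard deviation of a small synthetic image. The helper name `first_order_features` and the toy image are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def first_order_features(image):
    """Return (mean, std) over all pixels and channels of an image array."""
    pixels = np.asarray(image, dtype=np.float64)
    return float(pixels.mean()), float(pixels.std())

# Hypothetical 4x4 RGB image: left half black, right half white.
toy_image = np.zeros((4, 4, 3), dtype=np.uint8)
toy_image[:, 2:, :] = 255

mean_value, std_value = first_order_features(toy_image)
print(mean_value, std_value)  # 127.5 127.5
```

Each image is thus reduced to a single two-number feature vector, whatever its resolution.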





***NOTE*** - The data is available from the following Google Drive and Kaggle links:
- Google Drive Link - https://drive.google.com/drive/folders/1UOGMtSbmvVZM8SJNiUpqh3IJR4pCZyO1?usp=sharing
- Kaggle Link - https://www.kaggle.com/datasets/mahmoudreda55/satellite-image-classification


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
#Import Libraries
import os
from os import listdir
from skimage import io
import numpy as np
import pandas as pd
from sklearn.svm import SVC, LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, Normalizer
import matplotlib.pyplot as plt
import cv2 as cv

In [3]:
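#Statistical feature extraction
def stats(folder_dir):
    """Return per-image means, per-image standard deviations and class labels."""
    mean = []
    std = []
    y = []
    for folder in folder_dir:
        for images in os.listdir(folder):
            if images.endswith('.jpeg') or images.endswith('.jpg'):
                image = io.imread(folder+images)
                # First-order statistics over all pixels and channels
                mean.append(np.mean(image))
                std.append(np.std(image))
                # Label from the last letter of the folder name (each path ends with '/'):
                # cloud(y) -> 0, deser(t) -> 1, green_are(a) -> 2, anything else (water) -> 3
                if folder[-2] == 'y':
                    y.append(0)
                elif folder[-2] == 't':
                    y.append(1)
                elif folder[-2] == 'a':
                    y.append(2)
                else:
                    y.append(3)
    return mean, std, y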
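
The labels above depend on the second-to-last character of each folder path, which only works while every path ends with a slash. A less fragile alternative is to look up the folder name in a dictionary, as sketched below. `CLASS_LABELS` and `label_for_folder` are hypothetical names, and the mapping is assumed to follow the same label order as `stats`.

```python
import os

# Assumed mapping, in the same label order that stats() uses.
CLASS_LABELS = {"cloudy": 0, "desert": 1, "green_area": 2, "water": 3}

def label_for_folder(folder):
    """Look up the class label from the last component of a folder path."""
    class_name = os.path.basename(os.path.normpath(folder))
    return CLASS_LABELS[class_name]

print(label_for_folder("/content/drive/MyDrive/satellite images/green_area/"))  # 2
```

With this lookup an unexpected folder name raises a `KeyError` instead of silently falling into the last class.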


In [4]:
folder_dir = ["/content/drive/MyDrive/satellite images/cloudy/", "/content/drive/MyDrive/satellite images/desert/", '/content/drive/MyDrive/satellite images/green_area/', '/content/drive/MyDrive/satellite images/water/']
for i in folder_dir:
  print(i)
mean, std, y = stats(folder_dir)
#1 Merge mean and std obtained from the previous function
data = pd.DataFrame()
data['mean'] = mean
data['std'] = std

/content/drive/MyDrive/satellite images/cloudy/
/content/drive/MyDrive/satellite images/desert/
/content/drive/MyDrive/satellite images/green_area/
/content/drive/MyDrive/satellite images/water/


# For Test Split 80-20 %

In [5]:
# Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.2, random_state=10)
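
As a quick sanity check on the splits used in this notebook, the sketch below works out how many images land in the test set for each `test_size`. It assumes the class sizes listed in the introduction and that a fractional `test_size` is rounded up to a whole number of samples; the variable names are illustrative only.

```python
import math

# Class sizes listed in the introduction: cloudy, desert, green_area, water.
n_images = 1500 + 1131 + 1500 + 1500

# Assumption: a fractional test_size is rounded up to whole samples.
split_sizes = {}
for test_size in (0.2, 0.3, 0.4):
    n_test = math.ceil(test_size * n_images)
    split_sizes[test_size] = n_test
    print(test_size, n_test, n_images - n_test)
```

The resulting test-set sizes (1127, 1690 and 2253) agree with the `support` totals shown in the classification reports for the three splits.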

## SVM Classification

### Kernel = **RBF**

In [6]:
#SVM Classifier, returns the accuracy
def svm_pred_rbf(X_train, y_train, X_test, y_test):
    np.random.seed(2)
    svm_model = SVC(kernel='rbf')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [7]:
print("ACCURACY:",svm_pred_rbf(X_train, y_train, X_test, y_test), "%")


              precision    recall  f1-score   support

           0       0.97      0.74      0.84       314
           1       0.77      0.98      0.86       224
           2       0.77      0.95      0.85       300
           3       0.86      0.71      0.78       289

    accuracy                           0.83      1127
   macro avg       0.84      0.84      0.83      1127
weighted avg       0.85      0.83      0.83      1127

ACCURACY: 83.40727595385981 %
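
The overall accuracy in the report above is the support-weighted average of the per-class recalls. The sketch below recomputes it from the printed values. Because the report rounds recall to two decimals, the result is only approximate. The variable names are illustrative.

```python
# Per-class (recall, support) copied from the RBF report above.
report = {0: (0.74, 314), 1: (0.98, 224), 2: (0.95, 300), 3: (0.71, 289)}

total = sum(support for _, support in report.values())
# recall * support approximates the number of correctly classified samples per class.
correct = sum(recall * support for recall, support in report.values())
approx_accuracy = correct / total
print(total, approx_accuracy)
```

The value lands within a fraction of a percentage point of the printed 83.4 %, the gap coming from the rounded recalls.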


### Kernel = **Linear**

In [8]:
def svm_pred_linear(X_train, y_train, X_test, y_test):
    np.random.seed(3)
    svm_model = SVC(kernel='linear')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [9]:
print("ACCURACY:",svm_pred_linear(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.71      0.73      0.72       314
           1       0.70      0.75      0.72       224
           2       0.80      0.82      0.81       300
           3       0.75      0.67      0.70       289

    accuracy                           0.74      1127
   macro avg       0.74      0.74      0.74      1127
weighted avg       0.74      0.74      0.74      1127

ACCURACY: 74.0905057675244 %


### Kernel = **Poly**

In [10]:
def svm_pred_poly(X_train, y_train, X_test, y_test):
    np.random.seed(4)
    svm_model = SVC(kernel='poly')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    print(X_test.shape)
    return svm_acc * 100

In [11]:
print("ACCURACY:", svm_pred_poly(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.83      0.68      0.74       314
           1       0.69      0.82      0.75       224
           2       0.62      0.94      0.75       300
           3       0.77      0.39      0.52       289

    accuracy                           0.70      1127
   macro avg       0.73      0.71      0.69      1127
weighted avg       0.73      0.70      0.69      1127

(1127, 2)
ACCURACY: 70.1863354037267 %


## KNN Classification

In [12]:
for i in range(3,9):
  np.random.seed(5)
  knn = KNeighborsClassifier(n_neighbors=i)
  knn.fit(X_train, y_train)
  y_pred = knn.predict(X_test)
  print("Accuracy for {} neighbours:".format(i),accuracy_score(y_test, y_pred) * 100, "%")

Accuracy for 3 neighbours: 89.44099378881988 %
Accuracy for 4 neighbours: 88.55368234250221 %
Accuracy for 5 neighbours: 89.26353149955635 %
Accuracy for 6 neighbours: 88.99733806566104 %
Accuracy for 7 neighbours: 88.99733806566104 %
Accuracy for 8 neighbours: 89.26353149955635 %
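
To make the role of `n_neighbors` concrete, here is a minimal NumPy sketch of k-nearest-neighbour voting. It assumes Euclidean distance and unweighted votes (the scikit-learn defaults). The function `knn_predict` and the toy feature values are hypothetical and are not taken from the dataset.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train).
    distances = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for every query point.
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote per row; ties resolve to the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])

# Hypothetical (mean, std) features for two classes.
X_toy = [[200, 10], [210, 12], [205, 11], [60, 30], [65, 28], [62, 33]]
y_toy = [0, 0, 0, 3, 3, 3]
predictions = knn_predict(X_toy, y_toy, [[207, 11], [63, 31]], k=3)
print(predictions)  # [0 3]
```

Because the vote depends only on distances in the (mean, std) plane, small changes in `k` tend to move the accuracy only slightly, as the loop output above shows.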


## Decision Tree Classification

In [13]:
np.random.seed(6)
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print("Accuracy:",accuracy_score(y_test, y_pred)*100, "%")

Accuracy: 86.06921029281278 %


# Ensemble Learning for 80-20 Split

## Ensemble - Gradient Boosting Classifier

In [14]:
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 85.18189884649512 %
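
For a scikit-learn classifier, `score` is the mean accuracy of that classifier's own predictions. `accuracy_score` therefore agrees with it only when it is given predictions from the same fitted model. The sketch below checks this on synthetic two-feature data. The data and the names `X_demo`, `y_demo` and `demo_clf` are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic two-feature data standing in for the (mean, std) features.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_demo = np.array([0] * 100 + [1] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=10)

demo_clf = GradientBoostingClassifier(
    n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0
).fit(X_tr, y_tr)

score = demo_clf.score(X_te, y_te)
# Predictions must come from the same fitted model for the two numbers to match.
accuracy = accuracy_score(y_te, demo_clf.predict(X_te))
print(score == accuracy)  # True
```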


## Ensemble - Histogram based gradient boosting classifier

In [15]:
from sklearn.ensemble import HistGradientBoostingClassifier
clf = HistGradientBoostingClassifier(max_iter=100).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 89.17480035492457 %


## Ensemble - Random Forest

In [16]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 78.34960070984916 %


## Ensemble - Bagging Classifier

In [17]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
# The base model is passed positionally: its keyword was renamed from
# base_estimator to estimator in newer scikit-learn releases
clf = BaggingClassifier(SVC(),
                        n_estimators=10, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 83.6734693877551 %


# For Test Split 70-30 %

In [18]:
# Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.3,random_state=11)

## SVM Classification

### Kernel = **RBF**

In [19]:
#SVM Classifier, returns the accuracy
def svm_pred_rbf(X_train, y_train, X_test, y_test):
    np.random.seed(20)
    svm_model = SVC(kernel='rbf')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100
    

In [20]:
print("ACCURACY:",svm_pred_rbf(X_train, y_train, X_test, y_test), "%")



              precision    recall  f1-score   support

           0       0.95      0.75      0.84       474
           1       0.79      0.96      0.86       342
           2       0.70      0.98      0.82       404
           3       0.88      0.63      0.73       470

    accuracy                           0.81      1690
   macro avg       0.83      0.83      0.81      1690
weighted avg       0.84      0.81      0.81      1690

ACCURACY: 81.18343195266272 %


### Kernel = **Linear**

In [21]:
def svm_pred_linear(X_train, y_train, X_test, y_test):
    np.random.seed(21)
    svm_model = SVC(kernel='linear')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [22]:
print("ACCURACY:",svm_pred_linear(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.70      0.79      0.74       474
           1       0.74      0.68      0.71       342
           2       0.73      0.84      0.78       404
           3       0.78      0.62      0.69       470

    accuracy                           0.73      1690
   macro avg       0.74      0.73      0.73      1690
weighted avg       0.74      0.73      0.73      1690

ACCURACY: 73.31360946745562 %


### Kernel = **Poly**

In [23]:
def svm_pred_poly(X_train, y_train, X_test, y_test):
    np.random.seed(22)
    svm_model = SVC(kernel='poly')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [24]:
print("ACCURACY:", svm_pred_poly(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.82      0.71      0.76       474
           1       0.72      0.79      0.75       342
           2       0.55      0.98      0.70       404
           3       0.79      0.29      0.43       470

    accuracy                           0.68      1690
   macro avg       0.72      0.70      0.66      1690
weighted avg       0.73      0.68      0.65      1690

ACCURACY: 67.6923076923077 %


## KNN Classification

In [25]:
for i in range(3,9):
  np.random.seed(23)
  knn = KNeighborsClassifier(n_neighbors=i)
  knn.fit(X_train, y_train)
  y_pred = knn.predict(X_test)
  print("Accuracy for {} neighbours:".format(i),accuracy_score(y_test, y_pred) * 100, "%")

Accuracy for 3 neighbours: 88.34319526627219 %
Accuracy for 4 neighbours: 88.04733727810651 %
Accuracy for 5 neighbours: 89.05325443786982 %
Accuracy for 6 neighbours: 88.52071005917159 %
Accuracy for 7 neighbours: 88.99408284023669 %
Accuracy for 8 neighbours: 88.63905325443787 %


## Decision Tree Classification

In [26]:
np.random.seed(24)
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print("Accuracy:",accuracy_score(y_test, y_pred)*100, "%")

Accuracy: 85.73964497041419 %


## Ensemble - AdaBoost

In [27]:
from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 53.66863905325444 %


# Ensemble Learning for 70-30 split

## Ensemble - Gradient Boosting Classifier

In [28]:
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 85.62130177514793 %


## Ensemble - Histogram based gradient boosting classifier

In [29]:
from sklearn.ensemble import HistGradientBoostingClassifier
clf = HistGradientBoostingClassifier(max_iter=100).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 87.6923076923077 %


## Ensemble - Random Forest

In [30]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 77.27810650887574 %


## Ensemble - Bagging Classifier

In [31]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
# The base model is passed positionally: its keyword was renamed from
# base_estimator to estimator in newer scikit-learn releases
clf = BaggingClassifier(SVC(),
                        n_estimators=10, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 81.77514792899409 %


# For Test Split 60-40 %

In [33]:
# Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.4, random_state=12)

## SVM Classification

### Kernel = **RBF**

In [34]:
#SVM Classifier, returns the accuracy
def svm_pred_rbf(X_train, y_train, X_test, y_test):
    np.random.seed(30)
    svm_model = SVC(kernel='rbf')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [35]:
print("ACCURACY:",svm_pred_rbf(X_train, y_train, X_test, y_test), "%")


              precision    recall  f1-score   support

           0       0.93      0.71      0.81       585
           1       0.76      0.95      0.84       455
           2       0.72      0.98      0.83       592
           3       0.91      0.63      0.75       621

    accuracy                           0.81      2253
   macro avg       0.83      0.82      0.81      2253
weighted avg       0.84      0.81      0.81      2253

ACCURACY: 81.04749223257879 %


### Kernel = **Linear**

In [36]:
def svm_pred_linear(X_train, y_train, X_test, y_test):
    np.random.seed(31)
    svm_model = SVC(kernel='linear')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [37]:
print("ACCURACY:",svm_pred_linear(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.68      0.71      0.69       585
           1       0.68      0.71      0.69       455
           2       0.73      0.81      0.77       592
           3       0.75      0.62      0.68       621

    accuracy                           0.71      2253
   macro avg       0.71      0.71      0.71      2253
weighted avg       0.71      0.71      0.71      2253

ACCURACY: 70.97203728362183 %


### Kernel = **Poly**

In [38]:
def svm_pred_poly(X_train, y_train, X_test, y_test):
    np.random.seed(32)
    svm_model = SVC(kernel='poly')
    svm_model.fit(X_train, y_train)
    svm_pred = svm_model.predict(X_test)
    svm_acc=accuracy_score(y_test, svm_pred)
    print(classification_report(y_test,svm_pred))
    return svm_acc * 100

In [39]:
print("ACCURACY:", svm_pred_poly(X_train, y_train, X_test, y_test), "%")

              precision    recall  f1-score   support

           0       0.77      0.66      0.71       585
           1       0.68      0.76      0.72       455
           2       0.59      0.97      0.73       592
           3       0.84      0.34      0.49       621

    accuracy                           0.68      2253
   macro avg       0.72      0.69      0.66      2253
weighted avg       0.72      0.68      0.66      2253

ACCURACY: 67.64314247669773 %


## KNN Classification

In [40]:
for i in range(3,9):
  np.random.seed(33)
  knn = KNeighborsClassifier(n_neighbors=i)
  knn.fit(X_train, y_train)
  y_pred = knn.predict(X_test)
  print("Accuracy for {} neighbours:".format(i),accuracy_score(y_test, y_pred) * 100, "%")

Accuracy for 3 neighbours: 86.64003550821128 %
Accuracy for 4 neighbours: 87.12827341322681 %
Accuracy for 5 neighbours: 87.0838881491345 %
Accuracy for 6 neighbours: 86.86196182867289 %
Accuracy for 7 neighbours: 86.9063470927652 %
Accuracy for 8 neighbours: 86.99511762094984 %


## Decision Tree Classification

In [41]:
np.random.seed(34)
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
print("Accuracy:",accuracy_score(y_test, y_pred)*100, "%")

Accuracy: 84.77585441633377 %


# Ensemble Learning for 60-40 split

## Ensemble - Gradient Boosting Classifier


In [42]:
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 83.3555259653795 %


## Ensemble - Histogram based gradient boosting classifier

In [43]:
from sklearn.ensemble import HistGradientBoostingClassifier
clf = HistGradientBoostingClassifier(max_iter=100).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 86.28495339547271 %


## Ensemble - Bagging Classifier

In [44]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
# The base model is passed positionally: its keyword was renamed from
# base_estimator to estimator in newer scikit-learn releases
clf = BaggingClassifier(SVC(),
                        n_estimators=10, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 81.0918774966711 %


## Ensemble - Random Forests


In [45]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 75.14425210830005 %


## Ensemble - AdaBoost


In [46]:
from sklearn.ensemble import AdaBoostClassifier
clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
# clf.score returns the mean accuracy of clf on the test set
print("Score:",clf.score(X_test, y_test)*100,"%")

Score: 64.0479360852197 %
