# Satellite Image Classification Project
## Project Overview
This script build and validate a classification model using satellite image data.
## Introduction
There are three different classification models applied:
1. Random Forest
2. Extra Tree
3. Bagging

Performance of each  model will be assesed visually by confusion matrix. Further preformance report is printed followed by Kohen Kappa Score values.

The analysis were carried out in the following steps:
1. Data from file „labels.csv” was read
2. Labels was numbered from 0 to 9 to create vector for classification
3. Columns: *x*, *y*, *band1*, *band2*, *band3*, *band4*, *band5*, *band6* was used as samples to train and test each of the models; total size of samples and split to train / test data is reported in appendices
4. Confusion matrices are calculated and presented in plots (see Figure 1÷3)
5. Train samples was fit to each classification model and metrics such as precision, recall for ech category/label and average accuracy as well as macro and weighted average of all categories are reported in appendices
6. Cohen kappa score was calculated from prediction and test data to asses accuracy for selection the best fitted classification model
7. File „satellite_image.csv” including satellite image (coordinates and eight bands) was read and all columns were applied to prediction models
8. Satellite image was plotted in new coordinate system with color brightness read form band4 (red), band3 (green) and band2 (blue) - origin of original coordinate system was set to 0,0 (see Figure 4)
9. On the top of plotted satellite image the categories/labels as points was overlaid; color scheme was selected the same like for predicted land cover maps
10. Each category was designated by color and predicted categories for each pixel were plotted as a land cover map (see Figures 5÷7)


In [1]:
%matplotlib inline

## Python modules in use
To build and validate model performance some libraries were necessary. Version of libriaries working with this scirpt are included in *requirements.txt* [file](https://github.com/pciuh/satellite-image-classification/blob/main/requirements.txt)

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.interpolate as sci
import seaborn as sns

from matplotlib import cm
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier, BaggingClassifier
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix, cohen_kappa_score

## Reading files for classification

File *labels.csv* is read to pandas DataFrame object. File include *band* features numbered from *1* to *6*.

In [3]:
SEED = 30082024

iDir = 'input/'

fnam = 'labels.csv'
df = pd.read_csv(iDir + fnam,sep=',')
lbl = df.label.drop_duplicates().values
print('Labels:',lbl)

Labels: ['road1' 'road2' 'grass' 'water' 'roof1' 'roof2' 'roof3' 'tree' 'shadow'
 'soil']


In [4]:
_,num = np.unique(df.label,return_counts=True) ### Count number of labels
variables = df.drop(['label','band7','band8'],axis=1).columns.values
print('Features:',variables)

Features: ['x' 'y' 'band1' 'band2' 'band3' 'band4' 'band5' 'band6']


All above variables are to be features of classification model and assigned to the matix *X*.

In [5]:
X = df[variables].values

The output vector *y* will be built on *lbl* variable from imported database. Below part of code assign integer numbers for each label.

In [6]:
y = []
for i,l in enumerate(lbl):
    for ni in range(num[i]):
        y=np.append(y,i)

In [7]:
TIT = {'RF' : 'Random Forest', 'ET' : 'Extra Tree', 'BA' : 'Bagging'}

mvec = [ExtraTreesClassifier(),
        RandomForestClassifier(),
        BaggingClassifier()]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state = SEED)

#print('Total Size:',X.shape,y.shape)
#print('Train Size:',X_train.shape, y_train.shape)
#print(' Test Size:',X_test.shape, y_test.shape)

In [8]:
pvec = []
for v in mvec:
    v.fit(X_train, y_train)
    pvec.append(v.predict(X_test))

In [13]:
ofnam = 'class_report-%.8d.txt'%SEED
of = open(ofnam,'w')
of.write('            %10s%10s%10s\n'%('Samples','Category','Outcome'))
of.write('Total Size: %10.0f%10.0f%10.0f\n'%(X.shape[0],X.shape[1],y.shape[0]))
of.write('Train Size: %10.0f%10.0f%10.0f\n'%(X_train.shape[0],X_train.shape[1],y_train.shape[0]))
of.write(' Test Size: %10.0f%10.0f%10.0f\n'%(X_test.shape[0],X_test.shape[1],y_test.shape[0]))

p_crf,p_cet,p_cba = pvec
mNam = ['RF','ET','BA']

of.write('\n%36s\n'%'Model Performace')
print('\nModel Performance')
cfm = []
for i,v in enumerate(pvec):
    print('\n%18s:'%TIT[mNam[i]])
    print(classification_report(y_test, v, target_names=lbl))
    print('Kappa Score:',cohen_kappa_score(y_test, v))

    of.write('\n%14s:\n'%TIT[mNam[i]])
    of.write(classification_report(y_test, v, target_names=lbl))
    of.write('\nKappa Score:%12.3f\n'%(cohen_kappa_score(y_test,v))) 

    cfm.append(confusion_matrix(y_test, v))
of.close()


Model Performance

     Random Forest:
              precision    recall  f1-score   support

       road1       1.00      0.50      0.67         2
       road2       0.50      1.00      0.67         3
       grass       1.00      0.75      0.86         4
       water       1.00      1.00      1.00         1
       roof1       1.00      0.50      0.67         2
       roof2       0.50      0.50      0.50         2
       roof3       1.00      1.00      1.00         2
        tree       0.86      1.00      0.92         6
      shadow       0.80      0.57      0.67         7
        soil       0.25      0.33      0.29         3

    accuracy                           0.72        32
   macro avg       0.79      0.72      0.72        32
weighted avg       0.78      0.72      0.72        32

Kappa Score: 0.6771300448430493

        Extra Tree:
              precision    recall  f1-score   support

       road1       1.00      1.00      1.00         2
       road2       1.00      1.00      

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


## Data visualization