Objective: Create an AI system that predicts the latitude and longitude of a satellite image, trained on satellite images labelled with their coordinates

Category: Supervised Learning, Regression Problem, Image Recognition and Processing, Convolutional Neural Networks

First, we shall import the relevant modules.

numpy: linear algebra  
pandas: data processing  
PIL, cv2: image manipulation  
re (regex): natural sorting  
matplotlib: data visualization  
math: calculate distance  
random: randomization

We will be using the Keras library for our convolutional neural network.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np
import pandas as pd
import PIL.Image
import cv2
import re
import matplotlib.pyplot as plt
import random
import math

from keras.optimizers import Adam
from keras.models import Sequential
# import all layers from keras.layers (the older submodule paths such as
# keras.layers.core and keras.layers.convolutional are not available in newer Keras releases)
from keras.layers import BatchNormalization, Conv2D, MaxPool2D, Flatten, Dropout, Dense
from keras.callbacks import EarlyStopping

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

Next we will be defining functions to help with the code.

In [None]:
#this function orders the images using natural sort
#normal sort order: img01, img135, img296, img37
#natural sorted order: img01, img37, img135, img296
def natural_key(astr):
    """See http://www.codinghorror.com/blog/archives/001018.html"""
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', astr)]

#helper function to get an ndarray from image files
def load_image_array(directory, dimension=(200, 200)):
    images = []
    filelist = sorted(os.listdir(directory), key=natural_key)
    for filename in filelist:
        path = os.path.join(directory, filename)
        #open colour image; convert('RGB') drops the transparency layer (alpha) if present
        image = PIL.Image.open(path).convert('RGB')
        image = np.array(image)
        #resize images to the specified size in pixels
        rimage = cv2.resize(image, dimension, interpolation=cv2.INTER_AREA)
        images.append(rimage)

    return np.array(images) #convert list to ndarray

# get loss for training set and validation set
# plot for each epoch
def plot_loss(num, history):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model ' + str(num) + ' loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'val'], loc='upper left')
    plt.show()

#haversine function to calculate the great-circle distance in metres
#(shortest distance along the earth's surface)
def haversine(n1, w1, n2, w2):
    earthrad = 6371000 #mean radius of the earth in metres
    phi_1 = math.radians(n1)
    phi_2 = math.radians(n2)
    phi_change = math.radians(n2 - n1)
    lambda_change = math.radians(w1 - w2) #only the squared sine of this is used, so the sign does not matter
    
    a = math.sin(phi_change/2) ** 2 + math.cos(phi_1) * math.cos(phi_2) * math.sin(lambda_change/2) ** 2
    c = 2 * math.atan2(a ** 0.5, (1 - a) ** 0.5)
    d = earthrad * c
    return d

#distance between top left and bottom right coordinates of area covered (maximum distance)
print(haversine(40.57770678, 73.74077701, 40.90365244, 74.03151652))
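
The natural sort helper defined above can be checked in isolation. The sketch below repeats `natural_key` so that it runs on its own; the filenames are hypothetical and chosen only to show how the two orderings differ.

```python
import re

def natural_key(astr):
    # Split into digit and non-digit runs; digit runs are compared as integers.
    return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', astr)]

# Hypothetical filenames, not taken from the dataset.
names = ['img135.PNG', 'img01.PNG', 'img296.PNG', 'img37.PNG']

plain_sorted = sorted(names)                     # character-by-character comparison
natural_sorted = sorted(names, key=natural_key)  # numeric comparison of digit runs

print(plain_sorted)
print(natural_sorted)
```

Natural ordering matters here because the images are matched to the rows of the coordinate file by position.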

In [None]:
#start actual code
trainDirectory = '../input/train-images/trainimg'
testDirectory = '../input/test-images/imagestest'
trainX = load_image_array(trainDirectory)
testX = load_image_array(testDirectory)
trainY = pd.read_csv('../input/train-coordinates/traincoord.txt', sep=" ", header=None)
trainY.columns = ['North', 'West']
testY = pd.read_csv('../input/test-coordinates/testcoord.txt', sep=" ", header=None)
testY.columns = ['North', 'West']

maxNorth = trainY['North'].max()
minNorth = trainY['North'].min()
maxWest = trainY['West'].max()
minWest = trainY['West'].min()
print(maxNorth)
print(minNorth)
print(maxWest)
print(minWest)

Our dataset consists of satellite images and their respective coordinates. The images are taken from Google Earth and cover part of New York. Each image is square and covers a side length of approximately 1700 metres. The coordinate recorded is that of the center of the satellite image.

Train Images: 400 images (200px by 200px)  
Test Images: 100 images (200px by 200px)

Since the model has to predict the coordinates of an image, it needs to see other images of the surrounding area in order to learn the features of that area. Therefore, the train images cover a predefined area of New York, and the test images are chosen from within that same area, so that the model can be expected to be accurate.
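
The requirement that test images lie within the training area can be expressed as a simple bounding-box check. This is a minimal sketch, not part of the original notebook: `inside_bounds` is a hypothetical helper, the bounds are the corner coordinates passed to `haversine` earlier, and the two sample points are made up for illustration.

```python
def inside_bounds(point, bounds):
    """Return True if a (north, west) point lies inside the bounding box."""
    north, west = point
    min_north, max_north, min_west, max_west = bounds
    return min_north <= north <= max_north and min_west <= west <= max_west

# Corner coordinates of the area, as used in the haversine check.
bounds = (40.57770678, 40.90365244, 73.74077701, 74.03151652)

# Hypothetical test points: the first is inside the area, the second is too far north.
test_points = [(40.75, 73.95), (41.10, 73.95)]
flags = [inside_bounds(p, bounds) for p in test_points]
print(flags)
```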

Here is a sample training instance (image and its respective coordinates).

In [None]:
randnum = random.randrange(400)
#file names are zero-padded to at least two digits (img00 ... img09, img10, ...)
sample_image_path = trainDirectory + '/img' + str(randnum).zfill(2) + '.PNG'
sample_image = PIL.Image.open(sample_image_path)
plt.figure()
plt.imshow(sample_image) 
plt.show()  # display it
print(trainY.iloc[randnum])

#note that we have used the original image to show here, so the pixel size is different from (200,200)

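We are using a Convolutional Neural Network to train the model. Since this is a regression problem, the network ends in a fully connected layer with two outputs, one per coordinate. The activation on this layer is ReLU (rectified linear unit), which is not a linear activation: it outputs zero for negative inputs and passes positive inputs through unchanged. The coordinates are scaled to the range zero to one before training, so non-negative outputs are sufficient.  
We will compare 3 possible models and find the best one to train.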

Model 1: Conv2D (7x7) -> Conv2D (3x3) -> BatchNormalization -> MaxPool2D (2x2)

In [None]:
model1 = Sequential()
filters = [16, 32, 64]

for f in filters:
    model1.add(Conv2D(f, kernel_size=7, activation='relu', input_shape=(200,200,3), padding='same'))
    model1.add(Conv2D(f, kernel_size=3, activation='relu', input_shape=(200,200,3), padding='same'))
    model1.add(BatchNormalization())
    model1.add(MaxPool2D((2,2)))

model1.add(Flatten())
model1.add(Dense(3, activation='relu'))
model1.add(BatchNormalization())
model1.add(Dropout(0.25))
model1.add(Dense(2, activation='relu'))

Model 2: Conv2D (3x3) -> BatchNormalization -> MaxPool2D (2x2)

In [None]:
model2 = Sequential()
filters = [16, 32, 64]

for f in filters:
    model2.add(Conv2D(f, kernel_size=3, activation='relu', input_shape=(200,200,3), padding='same'))
    model2.add(BatchNormalization())
    model2.add(MaxPool2D((2,2)))

model2.add(Flatten())
model2.add(Dense(3, activation='relu'))
model2.add(BatchNormalization())
model2.add(Dropout(0.25))
model2.add(Dense(2, activation='relu'))

Model 3: Conv2D (3x3) -> BatchNormalization -> Conv2D (3x3, strides=(2,2))

In [None]:
model3 = Sequential()
filters = [16, 32, 64]

for f in filters:
    model3.add(Conv2D(f, kernel_size=3, activation='relu', input_shape=(200,200,3), padding='same'))
    model3.add(BatchNormalization())
    model3.add(Conv2D(f, kernel_size=3, strides=(2,2), activation='relu', input_shape=(200,200,3), padding='same'))

model3.add(Flatten())
model3.add(Dense(3, activation='relu'))
model3.add(BatchNormalization())
model3.add(Dropout(0.25))
model3.add(Dense(2, activation='relu'))
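
All three models reduce the 200 by 200 input by a factor of two in each of their three blocks, so they reach `Flatten` with the same number of features. The arithmetic below is a sketch under two assumptions: `MaxPool2D((2,2))` uses its default stride and `'valid'` padding (output size is the floor of half the input), and a stride-2 convolution with `padding='same'` produces the ceiling of half the input. The helper names are hypothetical.

```python
import math

def pooled_size(size, stages):
    """Side length after `stages` rounds of 2x2 max pooling (Models 1 and 2)."""
    for _ in range(stages):
        size = size // 2
    return size

def strided_size(size, stages):
    """Side length after `stages` stride-2 convolutions with 'same' padding (Model 3)."""
    for _ in range(stages):
        size = math.ceil(size / 2)
    return size

side_pool = pooled_size(200, 3)
side_stride = strided_size(200, 3)

# The last block of every model has 64 filters.
flat_features = side_pool * side_pool * 64

# Dense(3) has one weight per input feature per unit, plus one bias per unit.
dense3_params = flat_features * 3 + 3

print(side_pool, side_stride, flat_features, dense3_params)
```

The three models therefore differ only in their convolutional blocks; the fully connected part after `Flatten` has the same size in each.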

In [None]:
#scaling coordinates from zero to one for better performance
maxNorth = trainY['North'].max()
minNorth = trainY['North'].min()
diffNorth = maxNorth - minNorth
maxWest = trainY['West'].max()
minWest = trainY['West'].min()
diffWest = maxWest - minWest

trainY['North'] = (trainY['North'] - minNorth) / diffNorth
trainY['West'] = (trainY['West'] - minWest) / diffWest
testY['North'] = (testY['North'] - minNorth) / diffNorth
testY['West'] = (testY['West'] - minWest) / diffWest

es1 = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
es2 = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)
es3 = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10)

model1.compile(loss='mean_squared_error', optimizer='adam')
model2.compile(loss='mean_squared_error', optimizer='adam')
model3.compile(loss='mean_squared_error', optimizer='adam')
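
The scaling step above is min-max scaling, and it has to be inverted after prediction to get degrees back. The sketch below shows the round trip for a single value. The helper names `scale` and `unscale` are hypothetical, and the minimum and maximum are the corner latitudes from the haversine check, used as stand-ins for the values computed from `trainY`.

```python
def scale(value, lo, hi):
    """Map a value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Map a value from [0, 1] back to [lo, hi]."""
    return scaled * (hi - lo) + lo

# Stand-in bounds (assumption): corner latitudes of the area covered.
min_north, max_north = 40.57770678, 40.90365244

north = 40.75  # hypothetical latitude inside the area
scaled = scale(north, min_north, max_north)
restored = unscale(scaled, min_north, max_north)

print(scaled, restored)
```

Note that the test coordinates are scaled with the training minimum and range, so a test point outside the training bounds would fall outside the zero to one interval.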

In [None]:
history1 = model1.fit(trainX, trainY, validation_data=(testX, testY), epochs=100, batch_size=10, callbacks=[es1])

In [None]:
history2 = model2.fit(trainX, trainY, validation_data=(testX, testY), epochs=100, batch_size=10, callbacks=[es2])

In [None]:
history3 = model3.fit(trainX, trainY, validation_data=(testX, testY), epochs=100, batch_size=10, callbacks=[es3])
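
Each call to `fit` above may stop before 100 epochs because of the `EarlyStopping` callback with `patience=10`. The function below is a simplified sketch of the patience rule only, not Keras's actual implementation: it stops once the validation loss has failed to improve on its best value for `patience` consecutive epochs. The loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None if it runs to the end."""
    best = float('inf')
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value is reached at epoch 2.
losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.43]

stop_short = early_stopping_epoch(losses, patience=3)
stop_long = early_stopping_epoch(losses, patience=10)
print(stop_short, stop_long)
```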

In [None]:
# get loss for training set and validation set
# plot for each epoch    
plot_loss(1, history1)

In [None]:
plot_loss(2, history2)

In [None]:
plot_loss(3, history3)

In [None]:
pred1 = model1.predict(testX)
pred2 = model2.predict(testX)
pred3 = model3.predict(testX)

In [None]:
pred1[:,0] = pred1[:,0] * diffNorth + minNorth
pred1[:,1] = pred1[:,1] * diffWest + minWest
pred2[:,0] = pred2[:,0] * diffNorth + minNorth
pred2[:,1] = pred2[:,1] * diffWest + minWest
pred3[:,0] = pred3[:,0] * diffNorth + minNorth
pred3[:,1] = pred3[:,1] * diffWest + minWest
testY['North'] = testY['North'] * diffNorth + minNorth
testY['West'] = testY['West'] * diffWest + minWest

In [None]:
#use haversine formula to calculate distance
n_test = len(testY) #number of test images (100 in this dataset)
dist1 = np.empty(n_test)
dist2 = np.empty(n_test)
dist3 = np.empty(n_test)
for count in range(n_test):
    dist1[count] = haversine(pred1[count,0], pred1[count,1], testY['North'].iloc[count], testY['West'].iloc[count])
    dist2[count] = haversine(pred2[count,0], pred2[count,1], testY['North'].iloc[count], testY['West'].iloc[count])
    dist3[count] = haversine(pred3[count,0], pred3[count,1], testY['North'].iloc[count], testY['West'].iloc[count])

print(dist1[:5])
print(dist2[:5])
print(dist3[:5])

avg1 = dist1.mean()
avg2 = dist2.mean()
avg3 = dist3.mean()

std1 = dist1.std()
std2 = dist2.std()
std3 = dist3.std()

print(str(avg1) + ' ' + str(std1))
print(str(avg2) + ' ' + str(std2))
print(str(avg3) + ' ' + str(std3))

From the above, we found that Model 1 has an average runtime of 38 seconds per epoch.

The average distance between the predicted point and the actual coordinates is about 8000 metres. For comparison, the side length of an image is 1700 metres, the maximum possible distance within the area is about 40000 metres, and the average distance between two random points within the area is about 10 to 15 kilometres.  
Although the error is quite large, this is likely because the model was trained on only 400 images, and we expect its accuracy to improve with more data. We consider it a success, as its average error is smaller than that of purely random guessing.
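
The random-guessing baseline mentioned above can be estimated by simulation. The sketch below draws pairs of points uniformly from the bounding box given by the corner coordinates used in the haversine check and averages their haversine distance. The uniform distribution, the sample count, the seed, and the helper name `mean_random_pair_distance` are all assumptions made for illustration; the notebook does not state how its own figure was obtained.

```python
import math
import random

def haversine(n1, w1, n2, w2):
    """Great-circle distance in metres, as defined earlier in the notebook."""
    earthrad = 6371000
    phi_1 = math.radians(n1)
    phi_2 = math.radians(n2)
    phi_change = math.radians(n2 - n1)
    lambda_change = math.radians(w1 - w2)
    a = (math.sin(phi_change / 2) ** 2
         + math.cos(phi_1) * math.cos(phi_2) * math.sin(lambda_change / 2) ** 2)
    return earthrad * 2 * math.atan2(a ** 0.5, (1 - a) ** 0.5)

def mean_random_pair_distance(bounds, samples, seed=0):
    """Average distance between two points drawn uniformly from the bounding box."""
    rng = random.Random(seed)
    min_n, max_n, min_w, max_w = bounds
    total = 0.0
    for _ in range(samples):
        n1, w1 = rng.uniform(min_n, max_n), rng.uniform(min_w, max_w)
        n2, w2 = rng.uniform(min_n, max_n), rng.uniform(min_w, max_w)
        total += haversine(n1, w1, n2, w2)
    return total / samples

bounds = (40.57770678, 40.90365244, 73.74077701, 74.03151652)
baseline_m = mean_random_pair_distance(bounds, samples=5000)
print(baseline_m)
```

Comparing the model's mean error against a baseline computed this way gives a concrete sense of how much the model has learned.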

Limitations: No dataset of images labelled with their coordinates was available, so we had to extract the data manually from Google Earth using Snipping Tool. Moreover, the images all had slightly different pixel dimensions, so resizing them to a common size may have altered their content slightly, affecting our results.  
The area covered was quite small (about 25km * 35km), so the largest difference between coordinates was about 0.3 degrees for both latitude and longitude, going by the corner coordinates of the area. As a result, even small errors in the predicted coordinates are large relative to the area, which reduced the accuracy of the model.
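
The size of the area in kilometres can be estimated from its span in degrees. The sketch below assumes a spherical Earth with the same radius as the `haversine` helper and uses the corner coordinates from the haversine check. The function names are hypothetical. One degree of longitude is shorter than one degree of latitude by a factor of the cosine of the latitude.

```python
import math

EARTH_RADIUS_M = 6371000  # same radius as in the haversine helper

def metres_per_degree_latitude():
    """Arc length of one degree along a meridian."""
    return math.radians(1) * EARTH_RADIUS_M

def metres_per_degree_longitude(lat_deg):
    """Arc length of one degree along the parallel at the given latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))

# Corner coordinates of the area covered.
south, north = 40.57770678, 40.90365244
east, west = 73.74077701, 74.03151652

lat_span = north - south
lon_span = west - east
mid_lat = (south + north) / 2

height_m = lat_span * metres_per_degree_latitude()
width_m = lon_span * metres_per_degree_longitude(mid_lat)

print(lat_span, lon_span)
print(height_m, width_m)
```

The same conversion factors show how an error in predicted degrees translates into metres on the ground.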

Conclusions and Recommendations: The model was reasonably accurate for the amount of data that we had, and we expect it to perform more accurately when given more data.

Future Work: This project can be extended by using photos taken with smartphones as the input images. This would make the model more useful, as it would apply to the billions of photos that exist in the real world. One real-time application is finding where a person is by asking them to take a picture of their surroundings. In this case, the model could have an additional output giving the landmark nearest to the predicted coordinates, so that people can easily find the landmark while also having the coordinates.