![](http://i67.tinypic.com/2jcbwcw.png)

# Project Ocean Trash

## Neural Network of Marine Debris data

**Author:** Jan Xu

**Date:** Dec 1 2018

### Import modules and visualization packages

In [1]:
# Suppress TensorFlow and Keras warnings for cleaner output
import warnings
warnings.simplefilter("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import keras

from sklearn import datasets
from sklearn.model_selection import train_test_split

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Dropout, Conv2D, MaxPooling2D, Flatten

%matplotlib inline

Using TensorFlow backend.


In [2]:
dataclass = pd.read_csv("Datasets/dataclass.csv")
dataclass.head()

Unnamed: 0,Date,X,Y,Debris,Trash1,NoTrash2
0,07/07/2012,-124.566667,48.383333,0,0,0
1,07/07/2012,-124.016667,48.283333,0,0,0
2,07/07/2012,-124.033333,48.316667,0,0,0
3,07/07/2012,-124.35,48.3,1,0,0
4,07/07/2012,-126.183333,44.9,1,0,0


In [3]:
# Keep only the rows that have satellite imagery (Trash1 == 1 or NoTrash2 == 1)
newdata = pd.concat([dataclass[dataclass["Trash1"] == 1], dataclass[dataclass["NoTrash2"] == 1]])
# Relabel the two remaining rows (original index labels 479 and 618) as 0 and 1
newdata.rename({479:0, 618:1}, inplace=True)
newdata.head()

Unnamed: 0,Date,X,Y,Debris,Trash1,NoTrash2
0,06/08/2012,-154.566667,27.333333,1,1,0
1,08/01/2011,-155.5,25.5333,0,0,1


In [5]:
# Import satellite imagery and rescale it (integer floor division by 18)

T_band6 = pd.read_csv("Satellite Data ASTER/Trash1/band6.csv").values // 18
T_band7 = pd.read_csv("Satellite Data ASTER/Trash1/band7.csv").values // 18
NT_band6 = pd.read_csv("Satellite Data ASTER/NoTrash2/band6.csv").values // 18
NT_band7 = pd.read_csv("Satellite Data ASTER/NoTrash2/band7.csv").values // 18
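
Note that `//` is floor division, so the rescaled pixel values are rounded down to whole numbers instead of keeping a fractional part. A minimal sketch of the difference on a made-up array (the pixel values below are hypothetical and are not taken from the ASTER files):

```python
import numpy as np

# Hypothetical pixel values, for illustration only
pixels = np.array([[0, 17, 18], [100, 200, 255]])

floor_scaled = pixels // 18   # floor division: results are rounded down to whole numbers
true_scaled = pixels / 18     # true division: results keep their fractional part

print(floor_scaled)
print(true_scaled.round(2))
```

If a continuous range is wanted, true division (`/`) would be the alternative; the notebook itself uses floor division.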

In [6]:
T_band6.shape

(2496, 2815)

### For the DNN, flatten each image to a 1-D vector and concatenate band 6, band 7 and the coordinates

For training, the flattened band 6 and band 7 images are concatenated with the sample's coordinates (X, Y) to form a single input feature vector for our NN model.

In [7]:
image1 = np.concatenate([[newdata["X"][0], newdata["Y"][0]], T_band6.reshape(2496*2815,), T_band7.reshape(2496*2815,)])
image2 = np.concatenate([[newdata["X"][1], newdata["Y"][1]], NT_band6.reshape(2496*2815,), NT_band7.reshape(2496*2815,)])
image1.shape

(14052482,)
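
The length of this vector follows directly from the layout: 2 coordinates plus 2 bands of 2496 × 2815 pixels each, which gives 14,052,482 entries. The same construction on a tiny example may make the ordering easier to see (the arrays and coordinates below are made up for illustration):

```python
import numpy as np

# Hypothetical 2x3 "bands" and coordinates, for illustration only
band_a = np.arange(6).reshape(2, 3)
band_b = np.arange(6, 12).reshape(2, 3)
coords = [-154.5, 27.3]

# ravel() flattens each image row by row; the result is [x, y, band_a..., band_b...]
features = np.concatenate([coords, band_a.ravel(), band_b.ravel()])

print(features.shape)  # 2 coordinates + 2 bands * (2 * 3) pixels
```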

In [8]:
# Create input vector
x = np.stack([image1, image2])
y = pd.get_dummies(newdata['Debris'].values).values
print(x.shape)
print(y.shape)

(2, 14052482)
(2, 2)
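
`pd.get_dummies` turns the two `Debris` labels into one-hot rows, one column per class. A small sketch using the two label values shown in the table above (passing `dtype=int` is an addition here, so that the output is 0/1 regardless of pandas version):

```python
import pandas as pd

labels = [1, 0]   # Debris values of the two retained rows

# One column per distinct label (0 and 1); each row has a single 1
one_hot = pd.get_dummies(labels, dtype=int).values
print(one_hot)
```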


In [9]:
# Normally we would do a train/test split here, but with only two data points a split is not meaningful

## Sequential DNN model

In [33]:
# model initialization
model = Sequential() # instantiate empty Sequential model

# model construction (build the architecture / computational graph)
model.add(Dense(units=50, activation='sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(units=20, activation='sigmoid'))
model.add(Dropout(0.3))
model.add(Dense(units=2, activation='sigmoid'))

# model compilation
model.compile(loss = 'categorical_crossentropy',
             optimizer = 'sgd',
             metrics = ['accuracy'])
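
One point worth noting about the output layer: with `categorical_crossentropy`, a `softmax` output is the more common pairing, because it makes the class scores sum to 1, whereas independent sigmoids do not. A small NumPy sketch with made-up logits (these are not values from this model):

```python
import numpy as np

logits = np.array([2.0, -1.0])   # hypothetical pre-activation outputs for the two classes

sigmoid = 1.0 / (1.0 + np.exp(-logits))           # each class is squashed independently
softmax = np.exp(logits) / np.exp(logits).sum()   # classes compete; outputs sum to 1

print(sigmoid, sigmoid.sum())
print(softmax, softmax.sum())
```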

In [23]:
# Fit the model by iterating over the training data in batches

history = model.fit(x, y, epochs = 10, batch_size= 2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


A DNN is clearly very inefficient for large images, because flattening them produces extremely long input vectors (14,052,482 features per sample here).
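
To put a number on this, the size of the first `Dense` layer can be worked out from the input length and the 50 units used above. This is plain arithmetic, not output from Keras:

```python
n_features = 2 + 2 * 2496 * 2815   # coordinates + two flattened bands
units = 50                          # size of the first Dense layer

# A Dense layer has one weight per (input, unit) pair plus one bias per unit
first_layer_params = n_features * units + units
print(first_layer_params)
```

That is roughly 700 million parameters in the first layer alone, compared with about 2 million for the whole CNN further down.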

In [28]:
score = model.evaluate(x, y, verbose=0)
print('Loss:', score[0])
print('Accuracy:', score[1])

Loss: 0.4487477242946625
Accuracy: 1.0


Again, all of this is only a proof of concept, not a serious attempt at a neural network analysis with just two input samples. The model is evaluated on the same two samples it was trained on, so the accuracy of 1.0 says nothing about generalization.

## Now try a CNN, where image shapes are retained

For this model, the coordinates are not used as an input feature.

In [10]:
# Create input vector (based only on band 6 data)
xCNN = np.stack([T_band6, NT_band6]).reshape(2, 2496, 2815, 1)
xCNN.shape

(2, 2496, 2815, 1)

In [20]:
# Almost LeNet architecture
model = Sequential()
model.add(Conv2D(16, kernel_size=(20, 20),
                 strides=(8,5),
                 activation='sigmoid',
                 input_shape=(2496, 2815, 1)))

model.add(Conv2D(32, (20, 20), activation='sigmoid'))
model.add(MaxPooling2D(pool_size=(8, 5)))
model.add(Dropout(0.25))

model.add(Flatten())

model.add(Dense(16, activation='sigmoid'))
model.add(Dropout(0.5))

model.add(Dense(2, activation='sigmoid'))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_7 (Conv2D)            (None, 310, 560, 16)      6416      
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 291, 541, 32)      204832    
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 36, 108, 32)       0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 36, 108, 32)       0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 124416)            0         
_________________________________________________________________
dense_7 (Dense)              (None, 16)                1990672   
_________________________________________________________________
dropout_8 (Dropout)          (None, 16)                0         
__________
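
The output shapes in the summary can be reproduced by hand. The sketch below assumes the Keras defaults that the model relies on (no padding, i.e. `padding='valid'`, and a pooling stride equal to the pool size); `conv_out` is a helper introduced here for illustration only:

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for an unpadded convolution or pooling."""
    return (size - kernel) // stride + 1

h, w = 2496, 2815
h, w = conv_out(h, 20, 8), conv_out(w, 20, 5)   # first Conv2D, strides (8, 5)
conv1 = (h, w)
h, w = conv_out(h, 20), conv_out(w, 20)         # second Conv2D, stride 1
conv2 = (h, w)
h, w = conv_out(h, 8, 8), conv_out(w, 5, 5)     # MaxPooling2D, pool size (8, 5)
pooled = (h, w)

flat = h * w * 32                # Flatten over the 32 channels
dense_params = flat * 16 + 16    # weights + biases of Dense(16)
print(conv1, conv2, pooled, flat, dense_params)
```

The large strides and pooling window are what bring the 2496 × 2815 image down to a 36 × 108 feature map before the dense layers.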

In [21]:
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer='sgd',
              metrics=['accuracy'])

In [22]:
# Fit the model by iterating over the training data in batches

history = model.fit(xCNN, y, epochs=10, batch_size=2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [23]:
# Model evaluation (on the same two training samples; there is no held-out test set)
score = model.evaluate(xCNN, y, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.6979650855064392
Test accuracy: 0.5
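
A loss of about 0.698 is very close to ln 2 ≈ 0.693, the categorical cross-entropy of a two-class model that assigns probability 0.5 to each class. Together with the accuracy of 0.5, this suggests that the CNN has not learned to separate the two images. A quick check of that reference value (the 50/50 predictions are a hypothetical baseline, not the model's actual outputs):

```python
import numpy as np

y_true = np.array([[0, 1], [1, 0]])   # one-hot labels, as in y above
y_pred = np.full((2, 2), 0.5)         # a model that always predicts 50/50

# Categorical cross-entropy averaged over the samples
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(loss)
```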
