# Description

* East coast of Canada: drifting icebergs threaten navigation and offshore activities
* Monitoring by aerial reconnaissance with shore-based support is difficult (harsh weather, often not feasible)
* A computer-vision-based surveillance system has been built (C-CORE, working with Statoil)
* Competition: the challenge is to build an algorithm that automatically identifies whether a target is a ship or an iceberg

# Evaluation

* Evaluation: based on the log loss between the predicted values and the ground truth
=> http://wiki.fast.ai/index.php/Log_Loss 
* Submission file: for each id, predict the probability (between 0 and 1, e.g. 0.5) that the image is an iceberg

```
id, is_iceberg
3aa99a38,0.5
etc.
```
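
To make the metric concrete, here is a minimal sketch of binary log loss in plain Python. The labels and predictions are made up for illustration, and the `eps` clipping is an assumption (a common convention to avoid `log(0)`), not something the competition page specifies.

```python
import math

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss; predictions are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # assumption: clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

# Hypothetical labels (1 = iceberg, 0 = ship) and predictions
labels = [1, 0, 1, 0]
uninformed = [0.5, 0.5, 0.5, 0.5]
confident_right = [0.9, 0.1, 0.9, 0.1]
confident_wrong = [0.1, 0.9, 0.1, 0.9]

print(round(log_loss(labels, uninformed), 4))       # 0.6931
print(round(log_loss(labels, confident_right), 4))  # 0.1054
print(round(log_loss(labels, confident_wrong), 4))  # 2.3026
```

Predicting 0.5 everywhere scores ln 2 (about 0.693), so a useful model has to score below that; confident wrong answers are penalized much more heavily than confident right answers are rewarded.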

# Background

* Satellite

```
- the radar bounces a signal off an object and records the echo (called backscatter)
- the recorded data is translated into an image
- Sentinel-1 satellite: a side-looking radar
```

![Side Looking radar](https://www.radartutorial.eu/20.airborne/pic/SLAR-resolution.jpg)

```
- Sentinel-1: transmits and receives in the horizontal and vertical planes => dual polarization
- High winds generate a brighter background; low winds generate a darker background.
```

![ICEBERGS](https://storage.googleapis.com/kaggle-media/competitions/statoil/NM5Eg0Q.png) 

* Data with two channels

![HV](https://storage.googleapis.com/kaggle-media/competitions/statoil/lhYaHT0.png)
```
HH (transmit and receive horizontally)
HV (transmit horizontally and receive vertically)
```


---
* Can be visually classified

![EasyDetect1](https://storage.googleapis.com/kaggle-media/competitions/statoil/8ZkRcp4.png)
![EasyDetect2](https://storage.googleapis.com/kaggle-media/competitions/statoil/M8OP2F2.png)

---
* Challenging objects to classify

![hardDetect](https://storage.googleapis.com/kaggle-media/competitions/statoil/AR4NDrK.png)
![hardDetect](https://storage.googleapis.com/kaggle-media/competitions/statoil/nXK6Vdl.png)




## Data fields

### train.json, test.json
* json format
* id : the id of the image
* band_1, band_2 : the flattened image data; each band holds 75x75 floating-point pixel values
* inc_angle : the incidence angle at which the image was taken
* is_iceberg : the target variable, 1: iceberg, 0: ship ( only in train.json )
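
A minimal sketch of how one record with these fields can be turned into arrays. The two-row frame below is synthetic (random values, made-up ids), and the `"na"` marker for a missing incidence angle is an assumption about how gaps are encoded, so check the real file before relying on it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-record frame mimicking the fields listed above
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "id": ["aaaa0001", "aaaa0002"],
    "band_1": [rng.normal(size=75 * 75).tolist() for _ in range(2)],
    "band_2": [rng.normal(size=75 * 75).tolist() for _ in range(2)],
    "inc_angle": [43.9, "na"],  # assumption: missing angles appear as "na"
    "is_iceberg": [0, 1],
})

def to_image(flat_band):
    """Reshape one flattened band (5,625 values) into a 75x75 float32 array."""
    return np.asarray(flat_band, dtype=np.float32).reshape(75, 75)

images = np.stack([to_image(band) for band in demo["band_1"]])
angles = pd.to_numeric(demo["inc_angle"], errors="coerce")  # non-numeric -> NaN

print(images.shape)            # (2, 75, 75)
print(angles.isna().tolist())  # [False, True]
```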

### ML approach
* Keras Model for Beginners (0.210 on LB)+EDA+R&D
* Transfer Learning with VGG-16 CNN+AUG LB 0.1712
* Submarineering.EVEN BETTER PUBLIC SCORE until now.
* Keras+TF LB 0.18

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "/kaggle/input/" directory.
# Running this cell (click Run or press Shift+Enter) lists all files under the working directory walked below

import os
for dirname, _, filenames in os.walk('/kaggle/working'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
!7za e /kaggle/input/statoil-iceberg-classifier-challenge/test.json.7z

# Keras Model for Beginners (0.210 on LB)+EDA+R&D

In [None]:
from sklearn.model_selection import train_test_split
from os.path import join as opj
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pylab
plt.rcParams['figure.figsize'] = 10, 10
%matplotlib inline

In [None]:
# Load the data
train = pd.read_json("/kaggle/input/iceberg/train.json")

In [None]:
test = pd.read_json("/kaggle/working/test.json")

## Intro about the Data

For this demo, both bands are extracted and their average is used as a third channel, giving a 3-channel, RGB-like input.

Channels: band_1 (HH), band_2 (HV), and the average of both

In [None]:
# Generate the training data
# Create 3 bands having HH, HV and avg. of both
# 5,625 = 75 x 75
X_band_1 = np.array( [np.array(band).astype(np.float32).reshape(75,75) for band in train["band_1"]] )
X_band_2 = np.array( [np.array(band).astype(np.float32).reshape(75,75) for band in train["band_2"]] )

X_train = np.concatenate( [X_band_1[:, :, :, np.newaxis], X_band_2[:, :, :, np.newaxis], ((X_band_1+X_band_2)/2)[:, :, :, np.newaxis]], axis=-1 )

In [None]:
X_train[0]

[Plotly](https://plot.ly/python/): a modern interactive charting library

In [None]:
# Take a look at an iceberg
import plotly.offline as py
import plotly.graph_objs as go
py.init_notebook_mode(connected=True)

def plotmy3d(c, name):
    data = [
        go.Surface(
            z=c
        )
    ]
    layout = go.Layout(
        title=name,
        autosize=False,
        width=700,
        height=700,
        margin=dict(
            l=65,
            r=50,
            b=65,
            t=90
        )
    )
    fig = go.Figure(data=data, layout=layout)
    py.iplot(fig)
plotmy3d(X_band_1[12, :, :], 'iceberg')

* In radar data, an iceberg looks similar to a mountain
* This is not an actual image but backscatter from the radar
* A CNN may be able to exploit these differences

In [None]:
plotmy3d(X_band_1[14, :, :], 'Ship')

* The images do not have enough resolution to show the shape of the ship
* A CNN can help here ( http://elib.dlr.de/99079/2/2016_BENTES_Frost_Velotto_Tings_EUSAR_FP.pdf )


### Building a CNN using Keras

ref : https://www.slideshare.net/EricAhn/tensorflow-and-python-fault-detection-system-pycon-taiwan-2017

https://www.youtube.com/watch?v=FmpDIaiMIeA

In [None]:
from matplotlib import pyplot
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Input, Flatten, Activation
from keras.layers import GlobalMaxPooling2D
from keras.layers import BatchNormalization, Concatenate
from keras.models import Model
from keras import initializers
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, Callback, EarlyStopping

In [None]:
# Define our model
def getModel():
    #Building the model
    gmodel = Sequential()
    
    #Conv Layer 1
    gmodel.add(Conv2D(64, kernel_size=(3,3), activation='relu', input_shape=(75,75,3)))
    gmodel.add(MaxPooling2D(pool_size=(3,3), strides=(2,2)))
    gmodel.add(Dropout(0.2))
    
    #Conv Layer 2
    gmodel.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    gmodel.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    gmodel.add(Dropout(0.2))
    
    #Conv Layer 3
    gmodel.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
    gmodel.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    gmodel.add(Dropout(0.2))
    
    #Conv Layer 4
    gmodel.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
    gmodel.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
    gmodel.add(Dropout(0.2))
    
    #Flatten the data for upcoming dense layers
    gmodel.add(Flatten())
    
    #Dense Layers
    gmodel.add(Dense(512))
    gmodel.add(Activation('relu'))
    gmodel.add(Dropout(0.2))
    
    #Dense Layer 2
    gmodel.add(Dense(256))
    gmodel.add(Activation('relu'))
    gmodel.add(Dropout(0.2))
    
    #Sigmoid Layer
    gmodel.add(Dense(1))
    gmodel.add(Activation('sigmoid'))
    
    # `lr` and `decay` are deprecated in recent Keras; decay defaults to 0
    mypotim=Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
    gmodel.compile(loss='binary_crossentropy',
                  optimizer=mypotim,
                  metrics=['accuracy'])
    gmodel.summary()
    return gmodel

def get_callbacks(filepath, patience=2):
    es = EarlyStopping('val_loss', patience=patience, mode="min")
    msave = ModelCheckpoint(filepath, save_best_only=True)
    return [es, msave]

file_path = ".model_wights.hdf5"
callbacks = get_callbacks(filepath = file_path, patience=5)
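
As a sanity check on the architecture in `getModel()`, the sketch below traces the spatial size of a 75x75 input through the four Conv2D/MaxPooling2D pairs. It assumes Keras' default `'valid'` padding (the model does not set `padding`), and `conv_out` is a hypothetical helper, not a Keras function.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution or pooling window."""
    return (size - kernel) // stride + 1

# (kernel, stride) pairs copied from getModel(), in order
layers = [
    (3, 1), (3, 2),  # Conv Layer 1, then 3x3 pooling with stride 2
    (3, 1), (2, 2),  # Conv Layer 2, then 2x2 pooling with stride 2
    (3, 1), (2, 2),  # Conv Layer 3, then 2x2 pooling with stride 2
    (3, 1), (2, 2),  # Conv Layer 4, then 2x2 pooling with stride 2
]

size = 75
sizes = [size]
for kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    sizes.append(size)

flattened = size * size * 64  # the last Conv2D has 64 filters

print(sizes)      # [75, 73, 36, 34, 17, 15, 7, 5, 2]
print(flattened)  # 256
```

Under these assumptions the `Flatten()` layer feeds 256 values into `Dense(512)`, which should match the shapes printed by `gmodel.summary()`.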

In [None]:
target_train=train['is_iceberg']
X_train_cv, X_valid, y_train_cv, y_valid = train_test_split(X_train, target_train, random_state=1, train_size=0.75)

In [None]:
# Without denoising, core features,
import os
gmodel = getModel()
gmodel.fit(X_train_cv, y_train_cv,
          batch_size=24,
          epochs=50,
          verbose=1,
          validation_data=(X_valid, y_valid),
          callbacks=callbacks)

In [None]:
gmodel.load_weights(".model_wights.hdf5")
score = gmodel.evaluate(X_valid, y_valid, verbose=1)
# The score is computed on the held-out validation split, not the test set
print('Validation loss:', score[0])
print('Validation accuracy:', score[1])

In [None]:
X_band_test_1 = np.array([np.array(band).astype(np.float32).reshape(75, 75) for band in test["band_1"]])
X_band_test_2 = np.array([np.array(band).astype(np.float32).reshape(75, 75) for band in test["band_2"]])
X_test = np.concatenate([X_band_test_1[:, :, :, np.newaxis]
                        , X_band_test_2[:, :, :, np.newaxis]
                        , ((X_band_test_1+X_band_test_2)/2)
                        [:, :, :, np.newaxis]], axis=-1)
# predict_proba was removed from recent Keras; with a sigmoid output, predict returns probabilities
predicted_test = gmodel.predict(X_test)

In [None]:
submission = pd.DataFrame()
submission['id'] = test['id']
submission['is_iceberg'] = predicted_test.reshape((predicted_test.shape[0]))
submission.to_csv('sub.csv', index=False)