## Problem: 

Seismic data is collected using reflection seismology, also called seismic reflection. The method requires a controlled source of seismic energy, such as compressed air or a seismic vibrator, and sensors that record the reflections from rock interfaces within the subsurface. The recorded data is then processed to create a 3D view of the Earth's interior. Reflection seismology is similar in principle to X-ray imaging, sonar, and echolocation.

A seismic image is produced by imaging the reflections coming from rock boundaries, so it shows the boundaries between different rock types. In theory, the strength of a reflection grows with the contrast in physical properties (density and seismic velocity, whose product is the acoustic impedance) on either side of the interface. While seismic images show rock boundaries, they say little about the rocks themselves; some rocks are easy to identify, while others are difficult.

Several areas of the world have vast quantities of salt in the subsurface. One of the challenges of seismic imaging is identifying which parts of the subsurface are salt. Salt has characteristics that make it both simple and hard to identify. Its density is usually about 2.14 g/cc, which is lower than that of most surrounding rocks, while its seismic velocity of about 4.5 km/s is usually faster than that of the surrounding rocks. This contrast creates a sharp reflection at the salt-sediment interface. Salt is usually an amorphous rock without much internal structure, so there is typically little reflectivity inside it unless sediments are trapped within it. The unusually high seismic velocity of salt can, however, create problems for seismic imaging.
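The "sharp reflection" at the top of salt can be estimated with the standard normal-incidence reflection coefficient, R = (Z2 - Z1) / (Z2 + Z1), where Z = density x velocity is the acoustic impedance of each layer. The sketch below plugs in the salt values quoted above; the sediment density and velocity are hypothetical round numbers chosen only for illustration, not values from the dataset.

```python
def acoustic_impedance(density_g_cc: float, velocity_km_s: float) -> float:
    """Acoustic impedance Z = density * velocity (units here: g/cc * km/s)."""
    return density_g_cc * velocity_km_s


def reflection_coefficient(z_upper: float, z_lower: float) -> float:
    """Normal-incidence reflection coefficient for a wave going from the upper layer into the lower one."""
    return (z_lower - z_upper) / (z_lower + z_upper)


# Salt values from the text above
z_salt = acoustic_impedance(2.14, 4.5)
# Hypothetical overlying sediment (illustrative values only)
z_sediment = acoustic_impedance(2.3, 2.5)

r_top_salt = reflection_coefficient(z_sediment, z_salt)
print(f"Z_salt = {z_salt:.2f}, Z_sediment = {z_sediment:.2f}, R = {r_top_salt:.3f}")
```

With these assumed sediment values R is positive (about 0.25): impedance increases going down into the salt even though salt is less dense, because its velocity is so much higher.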

### Data
The data is a set of images taken at randomly chosen locations in the subsurface. The images are 101 x 101 pixels, and each pixel is classified as either salt or sediment. In addition to the seismic images, the depth of the imaged location is provided for each image. The goal of the competition is to segment the regions that contain salt.
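Because every pixel is labelled salt or sediment, each label is simply a 101 x 101 binary mask. A minimal sketch, using a synthetic mask rather than real competition data, of how the fraction of salt pixels (the "coverage") can be read off such a mask; `salt_coverage` is a hypothetical helper name:

```python
import numpy as np

IMG_SIZE = 101  # images and masks in this dataset are 101 x 101 pixels


def salt_coverage(mask: np.ndarray) -> float:
    """Fraction of pixels labelled salt (non-zero) in a binary 0/1 mask."""
    return float(np.count_nonzero(mask)) / mask.size


# Synthetic example: salt occupies the bottom 40 rows of the image
mask = np.zeros((IMG_SIZE, IMG_SIZE), dtype=np.uint8)
mask[-40:, :] = 1

print(f"salt coverage: {salt_coverage(mask):.3f}")
```

Looking at the distribution of coverage over the training masks is a quick way to check class balance; if many masks are empty, plain pixel accuracy can look high even for a model that rarely predicts salt.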

#### Source: 
https://www.kaggle.com/c/tgs-salt-identification-challenge


### Note: 
Accept the competition terms and download the data from the link above.

### Aim: 

Implement a U-Net neural network architecture in Keras to solve this problem.


In this task, you are asked to segment salt deposits beneath the Earth's surface. Given a set of seismic images of 101 x 101 pixels each, we need to classify every pixel as either salt or sediment, which segments the regions that contain salt.

### Steps:

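1. Download the dataset
2. Upload it to Google Drive
3. Mount Drive in Colab and read the data from it
4. Load the images and masks
5. Build the U-Net model
6. Train the model
7. Report training and validation accuracy, then predict masks for the test set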
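Steps 2 and 3 can be sketched as follows, assuming the notebook runs in Google Colab. `google.colab.drive.mount` only exists inside Colab, so the import is guarded; `check_data_dirs` and the `base` path are hypothetical names to adjust to wherever the archives were unzipped in Drive. Checking the directories up front turns a missing or mistyped path into a readable list instead of a `FileNotFoundError` inside a later cell.

```python
import os


def mount_drive_if_colab(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; return False anywhere else."""
    try:
        from google.colab import drive  # only importable inside a Colab runtime
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def check_data_dirs(*dirs: str) -> list:
    """Return the directories that do not exist, so path typos fail early and visibly."""
    return [d for d in dirs if not os.path.isdir(d)]


if __name__ == "__main__":
    in_colab = mount_drive_if_colab()
    # Hypothetical layout: adjust to where the Kaggle archives were unzipped
    base = "/content/drive/MyDrive/tgs/"
    missing = check_data_dirs(base + "train/images/", base + "train/masks/", base + "test/images/")
    print("running in Colab:", in_colab, "| missing directories:", missing)
```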

In [1]:
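import os

import cv2
import numpy as np
from tqdm import tqdm  # progress bar

# Colab mounts Google Drive under /content/drive; the folder name is case-sensitive
# ("MyDrive" in current Colab, "My Drive" in older versions)
TRAIN_IMAGE_DIR = '/content/drive/MyDrive/colab notebook/train/images/'  # seismic images (model input x)
TRAIN_MASK_DIR = '/content/drive/MyDrive/colab notebook/train/masks/'    # salt masks (target y), same file names as the images
TEST_IMAGE_DIR = '/content/drive/MyDrive/colab notebook/test/images/'    # test images (no masks provided)

train_d = os.listdir(TRAIN_IMAGE_DIR)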


In [0]:
x = [np.array(cv2.imread(TRAIN_IMAGE_DIR + p, cv2.IMREAD_GRAYSCALE), dtype=np.uint8) for p in tqdm(train_d)] #cv2.imread=openCV image read
x = np.array(x)/255

y = [np.array(cv2.imread(TRAIN_MASK_DIR + p, cv2.IMREAD_GRAYSCALE), dtype=np.uint8) for p in tqdm(train_d)]
y = np.array(y)/255
print(x.shape,y.shape)

In [0]:
# Add a trailing channel axis: (N, 101, 101) -> (N, 101, 101, 1), the layout Conv2D expects
x = np.expand_dims(x, axis=3)
y = np.expand_dims(y, axis=3)
print(x.shape, y.shape)
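A classic U-Net halves the spatial size at each 2x2 pooling step and doubles it on the way back up, so 101 x 101 inputs do not line up (101 pools to 50, which upsamples to 100). One common workaround, sketched here with NumPy under the assumption of a four-level encoder, is to reflect-pad the (N, 101, 101, 1) arrays to 128 x 128 before training and crop predictions back afterwards. `pad_to_target`, `crop_to_orig` and the target size of 128 are illustrative choices, not part of the original notebook.

```python
import numpy as np

ORIG = 101    # dataset image size
TARGET = 128  # assumed padded size: divisible by 2**4, so four 2x2 poolings stay whole


def pad_to_target(batch: np.ndarray, target: int = TARGET) -> np.ndarray:
    """Reflect-pad an (N, H, W, C) batch roughly symmetrically to (N, target, target, C)."""
    _, h, w, _ = batch.shape
    top, left = (target - h) // 2, (target - w) // 2
    bottom, right = target - h - top, target - w - left
    return np.pad(batch, ((0, 0), (top, bottom), (left, right), (0, 0)), mode="reflect")


def crop_to_orig(batch: np.ndarray, orig: int = ORIG) -> np.ndarray:
    """Undo pad_to_target by keeping the same central orig x orig window."""
    _, h, w, _ = batch.shape
    top, left = (h - orig) // 2, (w - orig) // 2
    return batch[:, top:top + orig, left:left + orig, :]


x_demo = np.random.rand(4, ORIG, ORIG, 1).astype(np.float32)
x_pad = pad_to_target(x_demo)
print(x_pad.shape)  # (4, 128, 128, 1)
```

The same padding would be applied to the masks, and `crop_to_orig` to the model's predictions before run-length encoding.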

In [0]:
import keras
from keras.layers import Concatenate, Conv2D, Input
from keras.models import Model
from keras.optimizers import Adam

# Note: this is a multi-branch fully convolutional network (parallel conv stacks with
# different kernel sizes); it has no pooling, encoder/decoder or skip connections,
# so it is not a U-Net in the strict sense.
def conv_block(num_layers, inp, units, kernel_size):
    """Stack `num_layers` same-padded ReLU convolutions on top of `inp`."""
    x = inp
    for _ in range(num_layers):
        x = Conv2D(units, kernel_size=kernel_size, padding='same', activation='relu')(x)
    return x

inputs = Input(shape=(101, 101, 1))
# Five branches of 5 layers each; larger kernels see more spatial context
cnn1 = conv_block(5, inputs, 32, 3)
cnn2 = conv_block(5, inputs, 24, 5)
cnn3 = conv_block(5, inputs, 16, 7)
cnn4 = conv_block(5, inputs, 8, 9)
cnn5 = conv_block(5, inputs, 4, 11)
concat = Concatenate()([cnn1, cnn2, cnn3, cnn4, cnn5])

d1 = Conv2D(16, 1, activation='relu')(concat)   # 1x1 conv mixes the branch features at each pixel
out = Conv2D(1, 1, activation='sigmoid')(d1)    # per-pixel salt probability

model = Model(inputs=[inputs], outputs=[out])
adam = Adam(learning_rate=0.001)
model.compile(loss="binary_crossentropy", optimizer=adam, metrics=["accuracy"])
model.summary()  # input (101, 101, 1) -> output (101, 101, 1)
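Each branch of the model above stacks five stride-1, undilated convolutions, so its receptive field grows by (k - 1) pixels per layer, giving 1 + 5(k - 1) pixels for kernel size k. The short calculation below (a hypothetical helper, assuming exactly that setup) shows why the branches complement each other: the 3 x 3 branch sees an 11 x 11 neighbourhood around each output pixel, while the 11 x 11 branch sees 51 x 51, about half of a 101 x 101 image. The final 1 x 1 convolutions do not enlarge it further.

```python
def receptive_field(num_layers: int, kernel_size: int) -> int:
    """Receptive field (pixels per side) of stacked stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)


# filters -> kernel size, matching the conv_block calls in the model cell
branches = {32: 3, 24: 5, 16: 7, 8: 9, 4: 11}
for filters, k in branches.items():
    rf = receptive_field(5, k)
    print(f"{filters:>2} filters, {k}x{k} kernels -> {rf}x{rf} receptive field")
```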

In [0]:
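# The callback only takes effect when passed to fit(); patience=5 is a tunable choice
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5, mode='auto', restore_best_weights=True)
history = model.fit(x, y, epochs=50, batch_size=128, validation_split=0.2, callbacks=[early_stop], verbose=1)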
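Pixel accuracy is dominated by the majority class, so it can look high even when salt boundaries are poor. The competition is scored with a mean average precision over IoU thresholds from 0.5 to 0.95 in steps of 0.05. The NumPy sketch below is a simplified approximation, assuming each mask is treated as a single object (an image scores a hit at a threshold when its IoU exceeds it) and that an empty prediction on an empty mask is a perfect match; the function names and the synthetic masks are illustrative.

```python
import numpy as np

THRESHOLDS = np.linspace(0.5, 0.95, 10)  # 0.50, 0.55, ..., 0.95


def iou(pred: np.ndarray, true: np.ndarray) -> float:
    """Intersection over union of two binary masks; two empty masks count as IoU 1."""
    pred, true = pred.astype(bool), true.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, true).sum()) / union


def image_score(pred: np.ndarray, true: np.ndarray) -> float:
    """Fraction of IoU thresholds exceeded, treating each mask as a single object."""
    return float(np.mean(iou(pred, true) > THRESHOLDS))


def mean_score(preds: np.ndarray, trues: np.ndarray) -> float:
    """Average image_score over a batch of (N, H, W) binary masks."""
    return float(np.mean([image_score(p, t) for p, t in zip(preds, trues)]))


# Synthetic check: the prediction covers 41 of the 51 true salt rows
true = np.zeros((101, 101), dtype=np.uint8)
true[50:, :] = 1
pred = np.zeros_like(true)
pred[60:, :] = 1
print(f"IoU = {iou(pred, true):.3f}, score = {image_score(pred, true):.2f}")
```

In the notebook, `preds` would be `np.round(...)` of the model's output on a held-out set with the channel axis dropped; such a held-out array is hypothetical here because the training cell uses `validation_split` instead of an explicit split.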

In [0]:
# Load the test images exactly like the training images
test_d=os.listdir(TEST_IMAGE_DIR)

x_test = [np.array(cv2.imread(TEST_IMAGE_DIR + p, cv2.IMREAD_GRAYSCALE), dtype=np.uint8) for p in tqdm(test_d)]
x_test = np.array(x_test)/255
print(x_test.shape)
x_test = np.expand_dims(x_test,axis=3)
print(x_test.shape)

In [0]:
predict=model.predict(x_test,verbose=True)

In [0]:
def RLenc(img, order='F', format=True):
    """
    Run-length encode a binary mask for the Kaggle submission.

    img: binary mask image, shape (r, c)
    order: 'F' (Fortran, column-major) scans down each column, then moves right
    format: if True, return the runs as a space-separated string, as required by the
            submission rules; otherwise return a list of (start, length) tuples
    """
    pixels = img.reshape(img.shape[0] * img.shape[1], order=order)
    runs = []  # list of (start, length) pairs
    r = 0      # length of the current run of 1s
    pos = 1    # pixel positions are 1-indexed in the submission format
    for c in pixels:
        if c == 0:
            if r != 0:
                runs.append((pos, r))
                pos += r
                r = 0
            pos += 1
        else:
            r += 1

    # save the last run if the data ends with 1
    if r != 0:
        runs.append((pos, r))

    if format:
        return ' '.join('{} {}'.format(start, length) for start, length in runs)
    return runs

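# Threshold probabilities at 0.5 (np.round) and key each RLE string by the file name without ".png"
pred_dict = {fn[:-4]: RLenc(np.round(predict[i, :, :, 0])) for i, fn in tqdm(enumerate(test_d))}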
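To sanity-check the encoding and produce the submission file, a decoder can reverse `RLenc`: each pair `start length` marks `length` salt pixels beginning at the 1-indexed position `start` in column-major (Fortran) order. The sketch below re-implements a compact NumPy encoder so that it is self-contained, checks the round trip on a synthetic mask, and writes rows with an `id,rle_mask` header as in the competition's sample submission. `rle_encode`, `rle_decode` and `write_submission` are hypothetical helper names.

```python
import csv
import io

import numpy as np


def rle_encode(mask: np.ndarray) -> str:
    """Column-major, 1-indexed run-length encoding of a binary mask ('' for an empty mask)."""
    pixels = np.concatenate([[0], mask.flatten(order="F").astype(np.uint8), [0]])
    changes = np.flatnonzero(pixels[1:] != pixels[:-1]) + 1  # 1-indexed positions where the value flips
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))


def rle_decode(rle: str, shape=(101, 101)) -> np.ndarray:
    """Inverse of rle_encode: rebuild the binary mask from 'start length' pairs."""
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape, order="F")


def write_submission(rows: dict, handle) -> None:
    """Write {image_id: rle_string} as CSV with an 'id,rle_mask' header."""
    writer = csv.writer(handle)
    writer.writerow(["id", "rle_mask"])
    for image_id, rle in rows.items():
        writer.writerow([image_id, rle])


# Round-trip check on a synthetic 10 x 10 block of salt
mask = np.zeros((101, 101), dtype=np.uint8)
mask[10:20, 30:40] = 1
encoded = rle_encode(mask)
print("round trip ok:", np.array_equal(rle_decode(encoded), mask))

buf = io.StringIO()
write_submission({"example_id": encoded}, buf)
print(buf.getvalue().splitlines()[0])  # id,rle_mask
```

With the notebook's variables, calling `write_submission(pred_dict, f)` on a file opened with `open('submission.csv', 'w', newline='')` would produce a file in that layout.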