# SIIM: Step-by-Step Image Detection for Beginners
## Part 4(mini). Multi-Output Regression

👉 Part 1. [EDA to Preprocessing](https://www.kaggle.com/songseungwon/siim-covid-19-detection-10-step-tutorial-1)

👉 Part 2. [Basic Modeling - Simplest Image Classification Models using Keras](https://www.kaggle.com/songseungwon/siim-covid-19-detection-10-step-tutorial-2)

👉 Part 3(mini). [Preprocessing for Multi-Output Regression that Detect Opacities](https://www.kaggle.com/songseungwon/siim-covid-19-detection-mini-part-preprocess)

> index
```
Step 1. Load Train Data Table
     1-a. extract data with only one opacity
     1-b. extract image paths
Step 2. Load Image Dataset
     2-a. Data Preprocessing
Step 3. Modeling
     3-a. Train-valid split
     3-b. Modeling
     3-c. Training
     3-d. Evaluation
 ```


This model is trained only on images that contain exactly one opacity.

The input X is the image, and the target Y is a vector of 4 values (x, y, width, height) that describe the bounding box around the opacity.

This simple setup serves as a first look at multi-output regression.

## Step 1. Load Train Data Table

In [1]:
import pandas as pd

In [2]:
train_df = pd.read_csv('../input/siim-covid19-preprocessed-datasettrain/train_full_info.csv')
train_df.head()

Unnamed: 0.1,Unnamed: 0,id,boxes,label,StudyInstanceUID,OpacityCount,Negative for Pneumonia,Typical Appearance,Indeterminate Appearance,Atypical Appearance,Opacity,origin_img_height,origin_img_width,height_ratio,width_ratio,path,resized_box_x,resized_box_y,resized_box_width,resized_box_height
0,0,000a312787f2,"[{'x': 789.28836, 'y': 582.43035, 'width': 102...",opacity 1 789.28836 582.43035 1815.94498 2499....,5776db0cec75,2,0,1,0,0,1,3488,4256,0.073108,0.059915,/kaggle/input/siim-covid19-resized-to-256px-jp...,[ 47.29053849 134.56475103],[42.58020047 43.22171628],[114.87599732 105.54396316],[75.05660496 80.02830077]
1,1,000c3a3f293f,,none 1 0 0 1 1,ff0879eb20ed,0,1,0,0,0,0,2320,2832,0.109914,0.090042,/kaggle/input/siim-covid19-resized-to-256px-jp...,0,0,0,0
2,2,0012ff7358bc,"[{'x': 677.42216, 'y': 197.97662, 'width': 867...",opacity 1 677.42216 197.97662 1545.21983 1197....,9d514ce429a7,2,0,1,0,0,1,2544,3056,0.100236,0.083442,/kaggle/input/siim-covid19-resized-to-256px-jp...,[ 56.52573652 149.58642448],[19.8443546 40.35019163],[ 83.42422961 100.49453207],[86.98443626 61.84825932]
3,3,001398f4ff4f,"[{'x': 2729, 'y': 2181.33331, 'width': 948.000...",opacity 1 2729 2181.33331 3677.00012 2785.33331,28dddc8559b2,1,0,0,0,1,1,3520,4280,0.072443,0.059579,/kaggle/input/siim-covid19-resized-to-256px-jp...,[162.59228972],[158.02272558],[35.98598131],[68.67614506]
4,4,001bd15d1891,"[{'x': 623.23328, 'y': 1050, 'width': 714, 'he...",opacity 1 623.23328 1050 1337.23328 2156 opaci...,dfd9fdd85a3e,2,0,1,0,0,1,2800,3408,0.091071,0.074824,/kaggle/input/siim-covid19-resized-to-256px-jp...,[ 46.63277183 192.93852276],[95.625 90.9500003],[82.75528169 83.8028169 ],[65.025 60.3500003]


In [3]:
train_df.drop(columns=['Unnamed: 0'], inplace=True)

### 1-a. extract data with only one opacity

In [4]:
train_df[train_df.OpacityCount == 1]

Unnamed: 0,id,boxes,label,StudyInstanceUID,OpacityCount,Negative for Pneumonia,Typical Appearance,Indeterminate Appearance,Atypical Appearance,Opacity,origin_img_height,origin_img_width,height_ratio,width_ratio,path,resized_box_x,resized_box_y,resized_box_width,resized_box_height
3,001398f4ff4f,"[{'x': 2729, 'y': 2181.33331, 'width': 948.000...",opacity 1 2729 2181.33331 3677.00012 2785.33331,28dddc8559b2,1,0,0,0,1,1,3520,4280,0.072443,0.059579,/kaggle/input/siim-covid19-resized-to-256px-jp...,[162.59228972],[158.02272558],[35.98598131],[68.67614506]
5,0022227f5adf,"[{'x': 1857.2065, 'y': 508.30565, 'width': 376...",opacity 1 1857.2065 508.30565 2233.23384 907.8...,84543edc24c2,1,0,0,1,0,1,2539,3050,0.100433,0.083607,/kaggle/input/siim-covid19-resized-to-256px-jp...,[155.2746418],[51.05078407],[33.40325346],[37.76564462]
16,008ca392cff3,"[{'x': 2284.17508, 'y': 1342.64878, 'width': 1...",opacity 1 2284.17508 1342.64878 3307.2952 2708...,39a80a14bfda,1,0,0,0,1,1,3480,4240,0.073276,0.060142,/kaggle/input/siim-covid19-resized-to-256px-jp...,[137.37373712],[98.38374681],[82.13804639],[74.97000879]
18,00a129830f4e,"[{'x': 496.23799, 'y': 1175.83357, 'width': 61...",opacity 1 496.23799 1175.83357 1113.61823 1840...,3a3c198051f0,1,0,0,1,0,1,2336,2836,0.109161,0.089915,/kaggle/input/siim-covid19-resized-to-256px-jp...,[44.61942435],[128.35512001],[59.80413845],[67.39381901]
40,012f57190f1d,"[{'x': 1440.88577, 'y': 1319.0304, 'width': 39...",opacity 1 1440.88577 1319.0304 1838.2981200000...,20eb74deaf29,1,0,0,1,0,1,2428,2428,0.105025,0.105025,/kaggle/input/siim-covid19-resized-to-256px-jp...,[151.32861258],[138.53078748],[44.12314782],[41.73811748]
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6298,fe7fd0793fb3,"[{'x': 723.8129, 'y': 918.02781, 'width': 481....",opacity 1 723.8129 918.02781 1204.824250000000...,07a2358b4e59,1,0,0,1,0,1,2320,2832,0.109914,0.090042,/kaggle/input/siim-covid19-resized-to-256px-jp...,[65.17383104],[100.90391877],[50.97012754],[52.869782]
6299,fe829429edc4,"[{'x': 1968.89074, 'y': 1397.96122, 'width': 7...",opacity 1 1968.89074 1397.96122 2762.55224 191...,aa4b16d18061,1,0,0,1,0,1,2539,3050,0.100433,0.083607,/kaggle/input/siim-covid19-resized-to-256px-jp...,[164.61217662],[140.40177672],[43.59883987],[79.70999705]
6300,fe94f73e3072,"[{'x': 2887.40002, 'y': 1557.86664, 'width': 1...",opacity 1 2887.40002 1557.86664 3987.40002 231...,9945a45c802e,1,0,0,0,1,1,4020,4891,0.063433,0.052137,/kaggle/input/siim-covid19-resized-to-256px-jp...,[150.53915459],[98.81989881],[39.62379881],[69.7761194]
6309,fee6b3f57081,"[{'x': 764.45953, 'y': 775.33251, 'width': 562...",opacity 1 764.45953 775.33251 1326.49096 1642....,8dd03d75fbbc,1,0,0,1,0,1,2800,3408,0.091071,0.074824,/kaggle/input/siim-covid19-resized-to-256px-jp...,[57.1998768],[70.6106393],[64.85345264],[51.18500523]
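
The filter `train_df[train_df.OpacityCount == 1]` is repeated in several later cells. Here is a minimal sketch of computing it once and reusing the result. The toy table and the name `one_opacity_df` are hypothetical and only mimic the relevant columns of `train_df`.

```python
import pandas as pd

# Hypothetical toy table that mimics the relevant columns of train_df.
toy_df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "OpacityCount": [2, 0, 1, 1],
    "path": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
})

# Build the boolean mask once and keep an independent copy of the result.
one_opacity_df = toy_df[toy_df["OpacityCount"] == 1].copy()

print(one_opacity_df["path"].values)   # paths of the single-box rows
print(one_opacity_df.index.tolist())   # original row labels are preserved
```

Note that the original row labels survive the filter, which is why the outputs in this notebook show indices such as 3, 5, 16 instead of 0, 1, 2.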


### 1-b. extract image paths

In [5]:
train_df[train_df.OpacityCount == 1]['path']

3       /kaggle/input/siim-covid19-resized-to-256px-jp...
5       /kaggle/input/siim-covid19-resized-to-256px-jp...
16      /kaggle/input/siim-covid19-resized-to-256px-jp...
18      /kaggle/input/siim-covid19-resized-to-256px-jp...
40      /kaggle/input/siim-covid19-resized-to-256px-jp...
                              ...                        
6298    /kaggle/input/siim-covid19-resized-to-256px-jp...
6299    /kaggle/input/siim-covid19-resized-to-256px-jp...
6300    /kaggle/input/siim-covid19-resized-to-256px-jp...
6309    /kaggle/input/siim-covid19-resized-to-256px-jp...
6318    /kaggle/input/siim-covid19-resized-to-256px-jp...
Name: path, Length: 973, dtype: object

In [6]:
img_path_array = train_df[train_df.OpacityCount == 1]['path'].values
img_path_array[:5]

array(['/kaggle/input/siim-covid19-resized-to-256px-jpg/train/001398f4ff4f.jpg',
       '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/0022227f5adf.jpg',
       '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/008ca392cff3.jpg',
       '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/00a129830f4e.jpg',
       '/kaggle/input/siim-covid19-resized-to-256px-jpg/train/012f57190f1d.jpg'],
      dtype=object)

## Step 2. Load Image Dataset

In [7]:
import matplotlib.pyplot as plt
import numpy as np

In [8]:
plt.imread(img_path_array[1]).shape

(256, 256)

In [9]:
np.empty((256,256),dtype=int)

array([[ 5064878326892452095,    72057598349690441,  4828100688804315392,
        ..., -7585050098837799473,  2117698324267671300,
         -466172358711421750],
       [-8432477558940240071, -5344019190829194997,  6738364404397233814,
        ...,  4796904901838199415, -8546273337030787947,
         7828918341277627062],
       [-3280189988986474304,  6381936258677507742, -7983436003324606221,
        ...,  6679623710432926638,  3813522763527512933,
        -2986825688515149075],
       ...,
       [                   0,                    0,                    0,
        ...,                    0,                    0,
                           0],
       [                   0,                    0,                    0,
        ...,                    0,                    0,
                           0],
       [                   0,                    0,                    0,
        ...,                    0,                    0,
                           0]])
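
`np.empty` allocates an array without initializing it, so the values printed above are simply whatever was in memory. One place where it can be useful is preallocating the image array instead of appending to a list. The sketch below assumes a hypothetical `load_image` helper that stands in for `plt.imread(path)` and returns synthetic data, so it runs without the dataset. The paths are made up as well.

```python
import numpy as np

def load_image(path):
    # Hypothetical stand-in for plt.imread(path): returns a constant
    # 256x256 grayscale image so the sketch runs without the dataset.
    return np.full((256, 256), len(path) % 256, dtype=np.uint8)

paths = ["img_0.jpg", "img_01.jpg", "img_012.jpg"]  # hypothetical paths

# Preallocate once; every slot is overwritten in the loop below,
# so the uninitialized values are never read.
images = np.empty((len(paths), 256, 256), dtype=np.uint8)
for i, path in enumerate(paths):
    images[i] = load_image(path)

print(images.shape)
```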

In [10]:
len(img_path_array)

973

In [11]:
imgs = []
n_images = len(img_path_array)
for i, path in enumerate(img_path_array, start=1):
    imgs.append(plt.imread(path))
    # Report progress every 100 images and once more at the end.
    if i % 100 == 0:
        print('{}/{}'.format(i, n_images))
    elif i == n_images:
        print('{}/{} - done!'.format(i, n_images))

100/973
200/973
300/973
400/973
500/973
600/973
700/973
800/973
900/973
973/973 - done!


In [12]:
X_train = np.array(imgs)
X_train.shape

(973, 256, 256)

In [13]:
X_train = X_train[:,:,:,np.newaxis]
X_train.shape

(973, 256, 256, 1)
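
Keras `Conv2D` layers expect a trailing channel axis, which is why the grayscale stack is reshaped from `(973, 256, 256)` to `(973, 256, 256, 1)`. A small sketch on a dummy array (not the real `X_train`) shows three equivalent ways to add that axis:

```python
import numpy as np

batch = np.zeros((5, 256, 256), dtype=np.uint8)  # dummy stand-in for X_train

with_newaxis = batch[:, :, :, np.newaxis]      # the form used in this notebook
with_ellipsis = batch[..., np.newaxis]         # same result, shorter indexing
with_expand = np.expand_dims(batch, axis=-1)   # same result, explicit function

print(with_newaxis.shape, with_ellipsis.shape, with_expand.shape)
```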

### 2-a. Data Preprocessing

In [14]:
train_df[train_df.OpacityCount==1].iloc[:,-4:].apply(lambda x : x.str.strip('[]'))

Unnamed: 0,resized_box_x,resized_box_y,resized_box_width,resized_box_height
3,162.59228972,158.02272558,35.98598131,68.67614506
5,155.2746418,51.05078407,33.40325346,37.76564462
16,137.37373712,98.38374681,82.13804639,74.97000879
18,44.61942435,128.35512001,59.80413845,67.39381901
40,151.32861258,138.53078748,44.12314782,41.73811748
...,...,...,...,...
6298,65.17383104,100.90391877,50.97012754,52.869782
6299,164.61217662,140.40177672,43.59883987,79.70999705
6300,150.53915459,98.81989881,39.62379881,69.7761194
6309,57.1998768,70.6106393,64.85345264,51.18500523


In [15]:
# Keep the four box columns, strip the brackets, and cast the strings to float.
remove_brk_y = train_df[train_df.OpacityCount==1].iloc[:,-4:].apply(lambda x : x.str.strip('[]'))
remove_brk_y = remove_brk_y.astype('float')

In [16]:
remove_brk_y.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 973 entries, 3 to 6318
Data columns (total 4 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   resized_box_x       973 non-null    float64
 1   resized_box_y       973 non-null    float64
 2   resized_box_width   973 non-null    float64
 3   resized_box_height  973 non-null    float64
dtypes: float64(4)
memory usage: 38.0 KB
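
The box columns are stored as strings such as `[162.59228972]`, so they have to be stripped and cast before they can serve as regression targets. The sketch below repeats the idea on a two-row toy frame, using values copied from the table above. It only works because each row holds a single box: a string with two numbers (as in the rows with `OpacityCount == 2`) cannot be cast to one float.

```python
import pandas as pd

# Two rows copied from the single-opacity table above, as raw strings.
raw = pd.DataFrame({
    "resized_box_x": ["[162.59228972]", "[155.2746418]"],
    "resized_box_y": ["[158.02272558]", "[51.05078407]"],
})

# Strip the surrounding brackets column by column, then cast everything at once.
clean = raw.apply(lambda col: col.str.strip("[]")).astype(float)

print(clean.dtypes)
print(clean)
```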


In [17]:
Y_train = np.array(remove_brk_y)
Y_train.shape

(973, 4)
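
In this notebook both the pixel values and the box coordinates go into the model unscaled. One common option, which this notebook does not use, is to rescale both to the range [0, 1] before training and convert the predictions back to pixels afterwards. This is a sketch under that assumption. The arrays are dummies, `IMG_SIZE` is a name introduced here, and the 8-bit pixel range is assumed.

```python
import numpy as np

IMG_SIZE = 256  # the images in this dataset are 256x256

# Dummy stand-ins for X_train and Y_train (one box row copied from the table above).
rng = np.random.default_rng(0)
X_dummy = rng.integers(0, 256, size=(4, IMG_SIZE, IMG_SIZE, 1), dtype=np.uint8)
Y_dummy = np.array([[162.59228972, 158.02272558, 35.98598131, 68.67614506]])

# Scale 8-bit pixels to [0, 1] and box values to fractions of the image size.
X_scaled = X_dummy.astype("float32") / 255.0
Y_scaled = Y_dummy / IMG_SIZE

# Predictions made in the scaled space map back to pixels by multiplying.
Y_restored = Y_scaled * IMG_SIZE

print(X_scaled.min(), X_scaled.max())
print(Y_scaled)
```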

## Step 3. Modeling

### 3-a. Train-valid split

In [18]:
from sklearn.model_selection import train_test_split
X_train, X_valid, Y_train, Y_valid = train_test_split(
    X_train, Y_train, test_size=0.3, random_state=42)

In [19]:
print('Shape of X_train : ', X_train.shape)
print('Shape of Y_train : ', Y_train.shape)
print('Shape of X_valid : ', X_valid.shape)
print('Shape of Y_valid : ', Y_valid.shape)


Shape of X_train :  (681, 256, 256, 1)
Shape of Y_train :  (681, 4)
Shape of X_valid :  (292, 256, 256, 1)
Shape of Y_valid :  (292, 4)
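
The split sizes above can be checked by hand. As far as I understand, when `test_size` is a float and `train_size` is not given, `train_test_split` rounds the validation count up and assigns the remaining samples to training. A quick arithmetic sketch:

```python
import math

n_samples = 973   # number of single-opacity images
test_size = 0.3   # fraction passed to train_test_split

n_valid = math.ceil(test_size * n_samples)  # 291.9 rounds up
n_train = n_samples - n_valid               # the rest goes to training

print(n_train, n_valid)
```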


### 3-b. Modeling

In [20]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential([
    # Four Conv2D + MaxPooling2D stages extract features from the 256x256 grayscale image.
    Conv2D(16,(3,3),activation='relu',input_shape=(256,256,1)),
    MaxPooling2D(2,2),
    Conv2D(32,(3,3),activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(64,(3,3),activation='relu'),
    MaxPooling2D(2,2),
    Conv2D(128,(3,3),activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dropout(0.5),
    Dense(128,activation='relu'),
    Dense(32,activation='relu'),
    # Linear output layer: one unit per box value (x, y, width, height).
    Dense(4,activation='linear')
])

In [21]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 254, 254, 16)      160       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 127, 127, 16)      0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 125, 125, 32)      4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 62, 62, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 60, 60, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 30, 30, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 28, 28, 128)       7
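
The output shapes and parameter counts in the summary can be reproduced by hand. This sketch assumes what the model code implies: 3x3 kernels, no padding, stride 1 and 2x2 pooling. A convolution has `kernel * kernel * in_channels * out_channels` weights plus one bias per filter. The helper names are introduced here for illustration.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # Weights for every (kernel x kernel x in_ch) filter, plus one bias per filter.
    return kernel * kernel * in_ch * out_ch + out_ch

def conv_out(size, kernel=3):
    # 'valid' padding with stride 1 trims kernel - 1 pixels per dimension.
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling halves the size, rounding down.
    return size // pool

size, channels = 256, 1
conv_rows = []  # (spatial size after conv, filters, parameter count)
for filters in (16, 32, 64, 128):
    params = conv2d_params(3, channels, filters)
    size = conv_out(size)
    conv_rows.append((size, filters, params))
    print(f"conv -> ({size}, {size}, {filters}), params = {params}")
    size = pool_out(size)
    channels = filters

print(f"after last pooling -> ({size}, {size}, {channels})")
```

The first three rows match the values printed in the summary (160, 4640 and 18496 parameters).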

In [22]:
model.compile(optimizer='adam',loss='mse',metrics=['mae'])

In [23]:
from tensorflow.keras.callbacks import ModelCheckpoint

In [24]:
filepath = 'my_checkpoint.ckpt'
cp = ModelCheckpoint(
    filepath=filepath,
    save_weights_only=True,
    save_best_only=True,
    monitor='val_loss',
    verbose=1
)
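
With `save_best_only=True` and `monitor='val_loss'`, the checkpoint is only overwritten when the validation loss is lower than the best value seen so far. The pure-Python sketch below imitates that rule. It is not the Keras implementation, and the loss values are hypothetical numbers chosen for illustration.

```python
def best_only_saves(val_losses):
    # Return the epochs at which a "save" would happen, and the best loss.
    best = float("inf")
    saved_epochs = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:          # strictly better than anything seen so far
            best = loss
            saved_epochs.append(epoch)
    return saved_epochs, best

# Hypothetical validation losses, not taken from the training log.
hypothetical_val_losses = [1900.0, 1850.0, 1875.0, 1800.0, 1810.0]
saved, best = best_only_saves(hypothetical_val_losses)

print(saved, best)
```

Because only the best weights are kept, loading the checkpoint after training restores the epoch with the lowest validation loss, not the last epoch.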

### 3-c. Training

In [25]:
model.fit(
    X_train, Y_train,
    validation_data=(X_valid,Y_valid),
    epochs=12,
    callbacks=[cp]
)

Epoch 1/12

Epoch 00001: val_loss improved from inf to 1944.37805, saving model to my_checkpoint.ckpt
Epoch 2/12

Epoch 00002: val_loss improved from 1944.37805 to 1919.19800, saving model to my_checkpoint.ckpt
Epoch 3/12

Epoch 00003: val_loss improved from 1919.19800 to 1842.76343, saving model to my_checkpoint.ckpt
Epoch 4/12

Epoch 00004: val_loss improved from 1842.76343 to 1795.74377, saving model to my_checkpoint.ckpt
Epoch 5/12

Epoch 00005: val_loss did not improve from 1795.74377
Epoch 6/12

Epoch 00006: val_loss improved from 1795.74377 to 1761.31030, saving model to my_checkpoint.ckpt
Epoch 7/12

Epoch 00007: val_loss did not improve from 1761.31030
Epoch 8/12

Epoch 00008: val_loss did not improve from 1761.31030
Epoch 9/12

Epoch 00009: val_loss improved from 1761.31030 to 1739.53076, saving model to my_checkpoint.ckpt
Epoch 10/12

Epoch 00010: val_loss did not improve from 1739.53076
Epoch 11/12

Epoch 00011: val_loss did not improve from 1739.53076
Epoch 12/12

Epoch 00

<tensorflow.python.keras.callbacks.History at 0x7f7c1e406210>

### 3-d. Evaluation

In [26]:
model.load_weights(filepath)
model.evaluate(X_valid, Y_valid)



[1739.53076171875, 32.65669250488281]
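
The two numbers are the validation loss (MSE) and the MAE, so the predicted box values are off by about 32.7 pixels on average on a 256-pixel image. For bounding boxes, Intersection over Union (IoU) is often easier to interpret than MSE. It is not used in this notebook, so the sketch below is only an illustration. It assumes that `(x, y)` is the top-left corner of the box, and the two boxes are made-up values.

```python
def iou_xywh(box_a, box_b):
    # Boxes are (x, y, width, height), with (x, y) assumed to be the top-left corner.
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

true_box = (100.0, 100.0, 50.0, 50.0)   # hypothetical ground truth
pred_box = (125.0, 100.0, 50.0, 50.0)   # hypothetical prediction, shifted 25 px right

print(iou_xywh(true_box, pred_box))
```

An IoU of 1 means a perfect match and 0 means no overlap at all. To use it here, each row of `model.predict(X_valid)` could be compared with the matching row of `Y_valid`.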

---
If there are any mistakes, please feel free to give feedback! Thank you!