# Create 10-slice volumes as the input of Task 2
### Task 2 (new):
**input:** a 10-slice volume<br>
**output:** 3-category severity

In this notebook, the input for Task 2 is generated by running the trained model (`newmodel3.h5`) on every patient's CT slices. For each patient, we pick the 10 slices with the highest positive score as a representation of that patient.
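
The selection rule can be illustrated on a toy score matrix. This is a minimal sketch, assuming (as the indexing used later in this notebook suggests) that column 2 of the model output holds the positive score; the helper name `top_k_slices` and the numbers are made up for illustration.

```python
import numpy as np

def top_k_slices(scores, k=10, positive_class=2):
    """Return the indices of the k slices with the highest positive-class score."""
    positive = scores[:, positive_class]
    # argsort sorts ascending, so negate the scores to rank from largest to smallest
    return np.argsort(-positive)[:k]

# Toy softmax-like scores for 5 slices and 3 classes (made-up numbers)
scores = np.array([
    [0.1, 0.6, 0.3],
    [0.2, 0.1, 0.7],
    [0.3, 0.5, 0.2],
    [0.1, 0.0, 0.9],
    [0.4, 0.2, 0.4],
])

top = top_k_slices(scores, k=3)
print(top)  # [3 1 4]
```

With `k=10` this gives the same indices as `np.argsort(-Y_pred, axis=0)[:10, 2]`, provided the scores contain no ties.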

In [2]:
import keras

from keras import backend as K
from keras.models import Sequential,Model
from keras.layers import Dense, Dropout,Input
from keras.layers import Conv2D, MaxPooling2D, Flatten,GlobalAveragePooling2D, BatchNormalization
from keras_preprocessing.image import ImageDataGenerator
from keras.callbacks import EarlyStopping

from keras.applications.vgg16 import VGG16

import os
import cv2
import time
import json
import random
from PIL import Image

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
%matplotlib inline

## Load newmodel3 weights

In [3]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(256,256,3)))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(Conv2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu'))
model.add(Dense(3, activation='softmax'))


model.load_weights('../input/models/newmodel3.h5')

In [4]:
csv_file = pd.read_csv('../input/task-2/morbidity.csv')
patient = np.array(csv_file['Patient'])
morbidity = np.array(csv_file['Morbidity'])

Since all patients cannot be processed in a single run, the full training set was created by running this notebook 4 times.<br>
There are 999 training samples and 342 test samples.
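
The multi-run workflow could look roughly like the sketch below. The chunk boundaries actually used are not stated in this notebook, so the equal split, the helper name `split_into_runs`, and the tiny placeholder arrays are all assumptions made for illustration.

```python
import numpy as np

def split_into_runs(ids, n_runs=4):
    """Split a sequence of patient IDs into n_runs contiguous chunks."""
    return np.array_split(np.asarray(ids), n_runs)

# Stand-in for the 999 training patient IDs
ids = np.arange(999)
runs = split_into_runs(ids, n_runs=4)
print([len(r) for r in runs])  # [250, 250, 250, 249]

# Each run would save its own array; small zero arrays stand in for them here.
parts = [np.zeros((len(r), 4, 4, 10), dtype='float32') for r in runs]

# Joining the per-run arrays along the patient axis restores the full training set.
full = np.concatenate(parts, axis=0)
print(full.shape)  # (999, 4, 4, 10)
```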

In [5]:
print(patient[298])
print(patient[590])
print(patient[890])

399
739
1060


In [6]:
train_x = patient[:999]
test_x = patient[999:]
train_y = morbidity[:999]
test_y = morbidity[999:]
print(len(train_x))
print(len(train_y))
print(len(test_x))
print(len(test_y))

999
999
342
342


In [7]:
print(patient[9])

11


## Case Study: Patient 11 (label: regular)
The output below shows the predictions for patient 11. Although the model can classify this patient's slices with high accuracy, the values show that its predictions are not always confident: several slices receive nearly equal scores for two classes. The model is therefore imperfect and needs to be improved.
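
One way to make this lack of confidence measurable is the margin between the two largest class scores of each slice. The sketch below uses the first three rows of patient 11's prediction output, rounded to four decimals; the margin metric itself is an illustrative choice, not something this notebook computes.

```python
import numpy as np

# First three prediction rows for patient 11, rounded to 4 decimals
y_pred = np.array([
    [0.0034, 0.4709, 0.5257],
    [0.0024, 0.5339, 0.4637],
    [0.0017, 0.5420, 0.4563],
])

# Predicted class and its score for every slice
predicted_class = y_pred.argmax(axis=1)
confidence = y_pred.max(axis=1)

# Margin between the best and the second-best class score
sorted_probs = np.sort(y_pred, axis=1)
margin = sorted_probs[:, -1] - sorted_probs[:, -2]

print(predicted_class)  # [2 1 1]
print(margin.round(4))  # [0.0548 0.0702 0.0857]
```

A margin below 0.1 means that a small change in the scores would flip the predicted class for that slice.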

## Generate training data

In [8]:
train_data = []
start_time = time.time()
    
for Id in train_x[9:10]:
    folder_path = '../input/task-2/Covid-19 CT/Covid-19 CT/train/Patient ' + str(Id) + '/CT'
    
    if Id==991:
        # this patient's folder uses a different name
        folder_path = '../input/task-2/Covid-19 CT/Covid-19 CT/train/Patient ' + str(Id) + '/2020_1_22'
    all_imgs = list(sorted(os.listdir(folder_path)))
    processed = np.stack([np.array(Image.open(folder_path + '/' + file).resize((256,256)))/255 for file in all_imgs])
    Y_pred = model.predict(processed,batch_size=64)
    print(Y_pred)
    # rank slices by their class-2 (positive) score, largest first, and keep the top 10
    Volume_rep_idx = np.argsort(-Y_pred,axis=0)[:10,2]
    print('{:0}: {}'.format(Id,Volume_rep_idx))
    
    # resize, change into grey-scale image, stack together
    processed = np.stack([(np.array(Image.open(folder_path + '/' + file).convert('L').resize((256,256)))/255).astype('float32') for file in all_imgs])
    selected = processed[Volume_rep_idx]
    train_data.append(selected.transpose(1,2,0))
    
length = time.time() - start_time
print('{:.0f}m {:.0f}s'.format(length // 60, length % 60))

[[3.35667143e-03 4.70917344e-01 5.25725901e-01]
 [2.39633722e-03 5.33867955e-01 4.63735759e-01]
 [1.68426405e-03 5.41989625e-01 4.56326067e-01]
 [1.36043190e-03 5.42730391e-01 4.55909193e-01]
 [1.11459941e-03 5.73114753e-01 4.25770730e-01]
 [8.78235791e-04 6.53131068e-01 3.45990598e-01]
 [7.10243359e-04 7.44112909e-01 2.55176842e-01]
 [5.57586842e-04 8.25757265e-01 1.73685178e-01]
 [4.65892692e-04 8.93536091e-01 1.05997972e-01]
 [3.95751820e-04 9.44745421e-01 5.48588559e-02]
 [4.46722232e-04 9.69325542e-01 3.02277990e-02]
 [6.19920844e-04 9.82335508e-01 1.70445740e-02]
 [8.81545478e-04 9.88232315e-01 1.08861430e-02]
 [9.88348504e-04 9.93032336e-01 5.97926648e-03]
 [1.13451644e-03 9.94617403e-01 4.24808171e-03]
 [1.45440898e-03 9.95644331e-01 2.90131127e-03]
 [2.57395674e-03 9.95226145e-01 2.19993852e-03]
 [3.52216815e-03 9.94962335e-01 1.51541911e-03]
 [3.99228884e-03 9.95166779e-01 8.41042842e-04]
 [5.35958307e-03 9.94072258e-01 5.68136689e-04]
 [6.72714598e-03 9.92859483e-01 4.134171

1m 27s


In [None]:
print(len(train_data))
np.save('train_10ct4.npy',np.array(train_data))

## Generate test data
Running this cell takes around half an hour on Kaggle.

In [None]:
test_data = []
start_time = time.time()
    
for Id in test_x:
    folder_path = '../input/task-2/Covid-19 CT/Covid-19 CT/test/Patient ' + str(Id) + '/CT'
    all_imgs = list(sorted(os.listdir(folder_path)))
    processed = np.stack([np.array(Image.open(folder_path + '/' + file).resize((256,256)))/255 for file in all_imgs])
    Y_pred = model.predict(processed,batch_size=64)
    Volume_rep_idx = np.argsort(-Y_pred,axis=0)[:10,2]
    print('{:0}: {}'.format(Id,Volume_rep_idx))
    
    processed = np.stack([(np.array(Image.open(folder_path + '/' + file).convert('L').resize((256,256)))/255).astype('float32') for file in all_imgs])
    selected = processed[Volume_rep_idx]
    test_data.append(selected.transpose(1,2,0))
    
    
length = time.time() - start_time
print('{:.0f}m {:.0f}s'.format(length // 60, length % 60))

In [None]:
print(len(test_data))
np.save('test_10ct.npy',np.array(test_data))
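
The shape of each saved volume follows from the stack, select, and transpose steps used above. This is a minimal sketch with zero-filled placeholders instead of real CT slices; it assumes every patient has at least 10 slices, since otherwise the volumes would differ in depth and could not be combined into one regular array.

```python
import numpy as np

n_slices, height, width = 25, 256, 256

# Stand-in for the stacked grey-scale slices of one patient
processed = np.zeros((n_slices, height, width), dtype='float32')

# Stand-in for the 10 selected slice indices
idx = np.arange(10)

selected = processed[idx]             # (10, 256, 256): slices first
volume = selected.transpose(1, 2, 0)  # (256, 256, 10): slices become channels
print(volume.shape)

# One volume per patient gives an array of shape (patients, 256, 256, 10)
dataset = np.array([volume, volume])
print(dataset.shape)
```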