<a href="https://colab.research.google.com/github/kytk/MagiciansCorner/blob/master/MedNISTClassify.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Classifying Radiological Images with the MedNIST Dataset

### Bradley J Erickson, MD PhD
*Copyright 2019*

### This notebook accompanies the following article in Radiology: Artificial Intelligence
https://pubs.rsna.org/doi/10.1148/ryai.2019190072


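This tutorial covers the following three steps:

1) Download and unzip six classes of images (head CT, chest CT, abdominal CT, brain MR, breast MR, chest X-ray)

2) Classify the images into three classes using a convolutional neural network (CNN) pretrained on photographs, with the ResNet-34 architecture

3) Evaluate the system's performance, examine the worst errors, and consider how performance could be improved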


In [3]:
# Cell 1
# First install the fastai library, then import the modules we need
!pip3 install fastai
from fastai.vision.all import *  # fastai v2 exposes its vision API through fastai.vision.all

Defaulting to user installation because normal site-packages is not writeable


In [4]:
# Cell 2
# Remove any data left over from a previous run, in case this cell is executed again
!rm -rf MagiciansCorner
!rm -rf images

!wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=1mqgBKTB0MtGf8Fhc8HaedJyiD8yMoXOh' -O ./MedNIST.zip

!mkdir images
!cd images; unzip -q "../MedNIST.zip" 
!rm -rf MagiciansCorner
# Remove unneeded files generated by macOS
!rm -rf ./images/__MACOSX
!ls images


--2021-02-11 23:14:09--  https://docs.google.com/uc?export=download&id=1mqgBKTB0MtGf8Fhc8HaedJyiD8yMoXOh
Resolving docs.google.com (docs.google.com)... 2404:6800:4004:81f::200e, 172.217.175.110
Connecting to docs.google.com (docs.google.com)|2404:6800:4004:81f::200e|:443... failed: Connection refused.
Connecting to docs.google.com (docs.google.com)|172.217.175.110|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://doc-0s-60-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/98r88rsveqqso10d08rnbt001goug5ej/1613052825000/16160187475894979440/*/1mqgBKTB0MtGf8Fhc8HaedJyiD8yMoXOh?e=download [following]
Warning: wildcards not supported in HTTP.
--2021-02-11 23:14:21--  https://doc-0s-60-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/98r88rsveqqso10d08rnbt001goug5ej/1613052825000/16160187475894979440/*/1mqgBKTB0MtGf8Fhc8HaedJyiD8yMoXOh?e=download
Resolving doc-0s-60-docs.googleusercontent.com (doc-0s-60-docs.googleusercontent.com)... 2404:6

In [6]:
# Cell 3
import os  # added so the notebook also runs locally
classes_dir = "./images"
flist = os.listdir(classes_dir)  # one sub-folder per image class
print(flist)

['CXR', 'MRBrain', 'CTChest', 'CTHead', 'CTAbd', 'MRBreast']
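
Each folder name listed above serves as the class label for the images inside it. As a quick sanity check, the sketch below counts the images in each class folder using only the standard library. `count_images_per_class` is a hypothetical helper written for this illustration, not part of fastai, and the list of file suffixes is an assumption.

```python
from pathlib import Path

def count_images_per_class(root, suffixes=(".jpg", ".jpeg", ".png")):
    """Return {class_name: image_count} for each sub-folder of `root`.

    Hypothetical helper: the folder name is treated as the class label.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files sitting next to the class folders
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in suffixes
        )
    return counts
```

Calling `count_images_per_class("./images")` after Cell 2 has run should list the six folders shown above. Strongly unequal counts would be worth knowing about before training.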


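The DICOM images were converted in advance to grayscale JPEG images and resized to 64x64. (If the chest X-ray images had been left larger than 64x64, the convolutional neural network would fail to run.) All images must be the same size. The next article in the series explains this in a little more detail.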
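
The preprocessing described above was done ahead of time, and its code is not part of this notebook. As a rough illustration only, the sketch below shows the two ideas involved: rescaling raw intensities to 8-bit gray values and shrinking a square image by averaging blocks of pixels. Both function names are hypothetical, images are plain nested lists, and min-max rescaling with block averaging is an assumption, not necessarily the method used to build MedNIST.

```python
def rescale_to_uint8(pixels):
    """Linearly map raw intensities (e.g. 12-bit values) to the range 0..255."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero on a constant image
    return [[round(255 * (v - lo) / span) for v in row] for row in pixels]

def block_average(pixels, out_size):
    """Downsample a square image by averaging non-overlapping k x k blocks."""
    n = len(pixels)
    if n % out_size != 0:
        raise ValueError("image side must be a multiple of out_size")
    k = n // out_size
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [pixels[r * k + i][c * k + j] for i in range(k) for j in range(k)]
            row.append(round(sum(block) / len(block)))
        out.append(row)
    return out

# Toy 4x4 "scan" with made-up 12-bit intensities, reduced to 2x2
raw = [[0, 0, 4095, 4095],
       [0, 0, 4095, 4095],
       [1000, 1000, 2000, 2000],
       [1000, 1000, 2000, 2000]]
small = block_average(rescale_to_uint8(raw), 2)
print(small)  # [[0, 255], [62, 125]]
```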

## View the data

In [9]:
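# Cell 4
import numpy as np  # added so the notebook also runs locally
from fastai.vision.all import *  # fastai v2 API: ImageDataLoaders, aug_transforms, Normalize, ...
np.random.seed(42)
# fastai v1 original, kept for reference:
#data = ImageDataBunch.from_folder(classes_dir, train=".", valid_pct=0.2,
#        ds_tfms=get_transforms(), size=64, num_workers=4).normalize(imagenet_stats)
# fastai v2: resizing is an item transform; augmentation and normalization are batch transforms.
# seed=42 makes the random 80/20 train/validation split reproducible.
data = ImageDataLoaders.from_folder(classes_dir, train=".", valid_pct=0.2, seed=42,
        item_tfms=Resize(64),
        batch_tfms=[*aug_transforms(), Normalize.from_stats(*imagenet_stats)],
        num_workers=4)

# class names, number of classes, training-set size, validation-set size
data.vocab, data.c, len(data.train_ds), len(data.valid_ds)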
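
The `valid_pct=0.2` argument in the cell above holds out a random 20% of the images for validation. The sketch below illustrates the idea with the standard library only. `split_train_valid` is a hypothetical helper, and fastai's own splitter shuffles differently, so the exact images chosen will not match.

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle `items` reproducibly and hold out `valid_pct` of them."""
    rng = random.Random(seed)  # private RNG, so the same seed gives the same split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]

# 100 stand-in image indices: 80 go to training, 20 to validation
train, valid = split_train_valid(range(100))
print(len(train), len(valid))  # 80 20
```

Fixing the seed matters when comparing runs: if the validation set changes between runs, differences in error rate may come from the split and not from the model.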

Good! Let's take a look at some of our pictures then.

In [None]:
# Cell 5
data.show_batch(nrows=3, figsize=(7,8))

# Augmentations to preview (in fastai v2, aug_transforms returns one list of batch transforms)
tfms = aug_transforms(flip_vert=False,                # horizontal flips only, no vertical flips
                      max_rotate=20.0,                # rotation between -20° and 20°
                      max_zoom=1.2)                   # zoom between 1 and 1.2
# Uncomment the line below to turn off augmentation (an empty list applies no transformations).
# Note that you will still see many images, but they are all the same
# tfms = []

# Uncomment these 3 lines to show examples of artificial/augmented images from 1 starting image.
# unique=True repeats a single training image, so all displayed images are variants of that 1 image
#aug_data = ImageDataLoaders.from_folder(classes_dir, train=".", valid_pct=0.2, seed=42,
#        item_tfms=Resize(64), batch_tfms=tfms, num_workers=4)
#aug_data.train.show_batch(max_n=9, nrows=3, unique=True, figsize=(7,8))
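
To make the augmentation settings in the cell above more concrete, here is a minimal sketch of how one random set of augmentation parameters might be drawn, plus a horizontal flip on a tiny image. `sample_augmentation` and `hflip` are hypothetical helpers, and the uniform sampling with a 50% flip chance is an assumption for illustration, not fastai's actual sampling scheme.

```python
import random

def sample_augmentation(rng, max_rotate=20.0, max_zoom=1.2, flip_vert=False):
    """Draw one random set of augmentation parameters (illustrative only)."""
    return {
        "flip_horizontal": rng.random() < 0.5,
        "flip_vertical": flip_vert and rng.random() < 0.5,
        "rotate_deg": rng.uniform(-max_rotate, max_rotate),
        "zoom": rng.uniform(1.0, max_zoom),
    }

def hflip(pixels):
    """Mirror an image (a list of rows) left to right."""
    return [row[::-1] for row in pixels]

rng = random.Random(0)
params = sample_augmentation(rng)  # a different draw for every training image
flipped = hflip([[1, 2], [3, 4]])
print(flipped)  # [[2, 1], [4, 3]]
```

Because a fresh set of parameters is drawn each time an image is used, the network rarely sees exactly the same pixels twice, which is the point of augmentation.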

## Train model

In [None]:
# Cell 6
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
#learn = cnn_learner(data, models.resnet50, metrics=error_rate)


In [None]:
# Cell 7
learn.fit_one_cycle(3)
learn.save("MedNIST-34-1")

# Evaluation
* During the training process, the data is split into 3 parts: training, validation and testing. The training data is used to adjust the weights. The GPU usually does not have enough RAM to store the entire training set of images, so it is split into 'batches'. When all of the images have been used once for training, an 'epoch' has passed. After each epoch, the system evaluates how well it has learned using the validation set. The train_loss is the loss on the training set, the valid_loss is the loss on the validation set, and the error_rate is the fraction of validation cases that were classified incorrectly.
* It is common practice, once 'acceptable' performance is achieved on the validation set, to test the system on the 'test' data; that result is considered the 'real' performance.
* Note that some use 'test' for what is called validation here, and vice versa.

* But sometimes the overall error rate doesn't tell the whole story. We might care more about false positives than false negatives, or vice versa. Looking at early results can provide valuable insight into the training process and into how to improve results.
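
The quantities in the list above can be sketched in plain Python. The labels below are made-up toy values, not results from this notebook, and the three helper functions are hypothetical. The sketch shows how one overall error rate can hide which classes are being confused, and how false positives and false negatives for one class are read off a confusion matrix.

```python
def error_rate(y_true, y_pred):
    """Fraction of cases whose predicted class differs from the actual class."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def confusion_matrix(y_true, y_pred, classes):
    """Rows are actual classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[idx[t]][idx[p]] += 1
    return matrix

def false_pos_neg(matrix, k):
    """False positives and false negatives for the class at index k."""
    false_neg = sum(matrix[k]) - matrix[k][k]                 # actual k, predicted other
    false_pos = sum(row[k] for row in matrix) - matrix[k][k]  # predicted k, actual other
    return false_pos, false_neg

# Toy labels (hypothetical): two abdomen/chest mix-ups, heads all correct
classes = ["CTAbd", "CTChest", "CTHead"]
y_true = ["CTAbd", "CTAbd", "CTChest", "CTChest", "CTHead", "CTHead"]
y_pred = ["CTAbd", "CTChest", "CTChest", "CTAbd", "CTHead", "CTHead"]

rate = error_rate(y_true, y_pred)                   # 2 wrong out of 6
matrix = confusion_matrix(y_true, y_pred, classes)  # [[1, 1, 0], [1, 1, 0], [0, 0, 2]]
chest_fp, chest_fn = false_pos_neg(matrix, classes.index("CTChest"))  # 1 and 1
```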

In [None]:
# Cell 8
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

# Looking closer
* The confusion matrix shows that there is more confusion between chests and abdomens than with heads. Does that surprise you?
* Let's look a little closer at those. FastAI has a nice function that can show you the cases it did the worst on. Think about that: there are 'errors', but what are the worst errors?
* Well, the class assigned to an image is the class that gets the highest score. So the 'worst' would be those where the score for the correct class was lowest. The function 'plot_top_losses' will show the predicted class, the real class, and the score, as well as the image, for the N (in our case, 9) worst-scored cases.
* The second line of code in the cell shows another nice feature of FastAI: to get documentation on any function, just type 'doc(function)' and it will print the documentation for that function. It also provides a link you can click to see the actual source code that implements that function.
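
The idea of a 'worst' error can be sketched as follows, assuming the loss is the usual cross-entropy (the negative log of the score given to the correct class). `top_losses` is a hypothetical helper and the probabilities are made-up toy values; this is not fastai's implementation of `plot_top_losses`.

```python
import math

def top_losses(probs, targets, k):
    """Return the k (loss, index) pairs with the highest cross-entropy loss."""
    losses = [
        (-math.log(p[t]), i)  # low score for the true class gives a high loss
        for i, (p, t) in enumerate(zip(probs, targets))
    ]
    return sorted(losses, reverse=True)[:k]

# Toy predicted probabilities over 3 classes, and the true class index of each case
probs = [[0.90, 0.05, 0.05],
         [0.20, 0.70, 0.10],   # true class 0 scored only 0.20: the worst case
         [0.60, 0.30, 0.10]]   # true class 1 scored 0.30: second worst
targets = [0, 0, 1]

worst = top_losses(probs, targets, 2)
worst_indices = [i for _, i in worst]  # [1, 2]
```

Note that case 0 is not in the list even though it would be the first one you look at: it was classified correctly and confidently, so its loss is small.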

In [None]:
# Cell 9
interp.plot_top_losses(9, figsize=(10,10))
doc(interp.plot_top_losses)

# What do we see?
* Most of the misclassified images are slices that contain BOTH lung and abdomen.
* This is an important point: data preparation and curation are critical to getting good results.
* We can argue about how to handle these cases. The correct answer probably depends on your use case. The point is that without seeing these error cases, you might never know what was going wrong...


In [None]:
# Cell 10
learn.unfreeze()
learn.lr_find()  # in fastai v2 this also draws the loss-vs-learning-rate plot


# Extra credit:
* We 'cheated' by starting with a network that was already trained on more than 1,000,000 images. That means the system really only had to learn the specific features of these body parts, but the lower level features like edges and lines were already 'known' to be important to the network.
* On the other hand, the 'pretrained' network was trained on photographic images, which are color, not gray scale, and had a matrix size other than 64x64. 
* While we could start from scratch, a better option might be to use the pre-trained values, but allow any of the weights and kernels in the network to be changed, and that is what 'unfreeze' does. 
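
The mismatch between color photographs and grayscale scans mentioned above can be made concrete. The sketch below assumes the single gray value is copied into all three color channels and then normalized with the commonly published ImageNet per-channel mean and standard deviation. `gray_to_normalized_rgb` is a hypothetical helper for illustration only.

```python
# Widely used ImageNet per-channel statistics (R, G, B) for inputs scaled to 0..1
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def gray_to_normalized_rgb(gray_value):
    """Map one 8-bit gray value to three normalized channel values."""
    x = gray_value / 255.0  # scale to 0..1
    # The same gray level is used for every channel, but each channel has its
    # own mean and standard deviation, so the three outputs differ slightly.
    return tuple((x - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))

black = gray_to_normalized_rgb(0)
white = gray_to_normalized_rgb(255)
```

The pretrained filters therefore still receive three-channel input in the range they were trained on, even though the three channels carry the same information.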

In [None]:
# Cell 11
learn.fit_one_cycle(5, lr_max=slice(3e-6,3e-5))  # fastai v2 names this argument lr_max (v1: max_lr)
learn.save("Unfreeze-34-1")
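
The `slice(3e-6,3e-5)` argument in the cell above asks for different learning rates in different parts of the network: the smallest rate for the earliest layers, whose pretrained edge and line detectors should change little, and the largest rate for the final layers. A rough sketch of how such a range might be spread out is given below. The number of layer groups (three) and the geometric spacing are assumptions made for this illustration, and `spread_learning_rates` is a hypothetical helper.

```python
def spread_learning_rates(start, stop, n_groups):
    """Geometrically space learning rates from `start` to `stop`."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))  # constant factor between groups
    return [start * ratio ** i for i in range(n_groups)]

# Assumed: 3 layer groups, earliest layers first
lrs = spread_learning_rates(3e-6, 3e-5, 3)
print(lrs)  # roughly 3e-6, 9.5e-6, 3e-5
```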