 > **<h1> Metastatic Cancer cells in lymph node samples **
 
 **<h2> Contents : **
   **1. Understanding the problem **
   
   **2. Strategy we're gonna take **
 
    
   **3. Create the model**

**<h1>Understanding the problem**
          
**<h2> Describe the problem**
       
  The problem requires us to detect whether or not lymph
  node sections contain metastatic cancer tissue
  
**<h2>What are lymph nodes ?**
  
   To understand what lymph nodes are, we need to understand what
   the lymph system is.

 **<h3>What is the lymph system ?**
    
  ![](https://www.cancer.org/cancer/cancer-basics/lymph-nodes-and-cancer/_jcr_content/par/textimage/image.img.gif/149877$) 

  Our bodies have a network of lymph vessels and lymph nodes. (Lymph is pronounced limf.)
  This network is a part of the body's immune system.
  It collects fluid, waste material, and other things (like viruses and bacteria) that are in the body tissues, outside the bloodstream.

  Lymph vessels are a lot like the veins that collect and carry blood through the body.
  But instead of carrying blood, these vessels carry the clear watery fluid called lymph.

  Lymph fluid flows out from capillary walls to bathe the body's tissue cells.
  It carries oxygen and other nutrients to the cells,
  and carries away waste products like carbon dioxide (CO2) that flow out of the cells.
  Lymph fluid also contains white blood cells, which help fight infections.
 
**<h3>What are lymph nodes and what do they do ?**
  
  Lymph vessels route lymph fluid through nodes throughout the body.
  Lymph nodes are small structures that work as filters for harmful substances.
  They contain immune cells that can help fight infection by attacking and destroying germs that are carried in through the 
  lymph fluid.

**<h2> What's Metastatic Cancer ?**
![](https://www.cancer.gov/PublishedContent/Images/images/cancer-types/metastasis-article.__v30025841.jpg)

 In metastasis, cancer cells break away from where they first formed (primary cancer),
 travel through the blood or lymph system, and form new tumors (metastatic tumors) in other parts of the body.
 The metastatic tumor is the same type of cancer as the primary tumor.

 To learn more about metastatic cancer, refer to [this link](https://www.cancer.gov/types/metastatic-cancer).
 
 **<h2>How is cancer in lymph nodes found ?**
 
 Normal lymph nodes are tiny and can be hard to find, but when there's infection, inflammation, or cancer, the nodes can get larger.
 Those near the body's surface often get big enough to feel with your fingers, and some can even be seen.
 But if there are only a few cancer cells in a lymph node, it may look and feel normal.
 In that case, the doctor must check for cancer by removing all or part of the lymph node.
 
 Doctors take samples of one or more nodes using needles. Usually, this is done on lymph nodes that are enlarged.
 This is called a needle biopsy. The tissue that's removed is examined under the microscope by a pathologist (a doctor who diagnoses illness using tissue samples)
 to find out whether it contains cancer cells. This is where we come in: we build models that classify the tissue images and could help the pathologist with this job.

Please refer to the references below to learn more about this topic.

References : https://www.cancer.gov/types/metastatic-cancer
                       <br>   https://www.cancer.org/cancer/cancer-basics/lymph-nodes-and-cancer.html


 **<h1> Strategy we're gonna take**

1. Try ResNet-50 with 48px images (the code below uses `resnet50`)

2. Create a DataBunch and look at the data

3. Train and fine-tune

4. Resize to the original size (96px) and create another DataBunch

5. Use the model trained at 48px to train and fine-tune again

**<h1>Create the model**

In [None]:
# Import libraries
import os
from pathlib import Path  # used below to build the input path

import numpy as np
import pandas as pd
from fastai import *
from fastai.vision import *
from torchvision.models import *

1. Declare the specification of our model (arch, bs, size, ...)
2. Create a dataframe and look at how your labels are formatted
3. Set your transforms
4. Create a DataBunch (with the declared size; we will increase it later and tune the model further)
5. Look at your data and understand it
6. Create a learner
7. Run lr_find, then train for a while
8. Unfreeze and retrain; once you are happy with the model, export it
9. Increase the image size
10. Repeat steps 4, 7 and 8

In [None]:
# print(fastai.__version__)
imgs = os.listdir('../input/train')[:5]
arch = resnet50 
bs = 64
sz = 48 # start with 48x48 px images, then move up to the original size of 96x96 px
path = Path('../input')
open_image('../input/train/'+imgs[1])

In [None]:
# look at the csv
df = pd.read_csv(path/'train_labels.csv')
df.head()

In [None]:
# Use the data_block API to define the shared spec for every DataBunch we create.
tfms = get_transforms()
np.random.seed(50)  # seed before the random split so the validation set is reproducible
src = (ImageItemList.from_csv(path, 'train_labels.csv', folder = 'train', suffix = '.tif')
      .random_split_by_pct()
      .label_from_df()
      .add_test_folder('test'))

In [None]:
np.random.seed(50)
data = (src.transform(tfms, size = sz)
       .databunch().normalize(imagenet_stats)) 
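`normalize(imagenet_stats)` standardises each colour channel with the ImageNet mean and standard deviation, which is what the pretrained weights were trained on. Here is a minimal pure-Python sketch of that per-channel arithmetic. `normalize_pixel` is a hypothetical helper for illustration, not a fastai function, and the constants are the commonly published ImageNet values, assumed here to match `imagenet_stats`:

```python
# Commonly published ImageNet channel statistics (RGB, pixel values scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardise one RGB pixel channel by channel (hypothetical helper, not part of fastai)."""
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))

# A pixel equal to the channel means maps to zero in every channel.
print(normalize_pixel(IMAGENET_MEAN))
# A pure white pixel ends up a couple of standard deviations above zero.
print(normalize_pixel((1.0, 1.0, 1.0)))
```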

In [None]:
data.show_batch(rows = 3, figsize=(11,12))

In [None]:
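# Thanks to Khoi Nguyen's kernel: https://www.kaggle.com/suicaokhoailang/wip-densenet121-baseline-with-fastai
from sklearn.metrics import roc_auc_score

def auc_score(y_pred,y_true,tens=True):
    # Score on column 1, the output for label 1.
    # The sigmoid is monotonic, so it does not change the ranking that AUC measures.
    score = roc_auc_score(y_true,torch.sigmoid(y_pred)[:,1])
    if tens:
        # fastai metrics are expected to return tensors
        score = tensor(score)
    return score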
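ROC AUC can be read as the probability that a randomly chosen positive sample gets a higher score than a randomly chosen negative one. That is why it depends only on how the scores are ranked. Here is a small pure-Python sketch of that definition. `pairwise_auc` is a hypothetical helper for illustration (quadratic in the number of samples), not the scikit-learn implementation:

```python
def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for small examples only.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 3 of the 4 positive/negative pairs are ordered correctly -> 0.75
```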

In [None]:
learn = create_cnn(data, arch, metrics = [accuracy, auc_score], model_dir = '/tmp/models/')

In [None]:
learn.lr_find()
learn.recorder.plot()

In [None]:
lr = 1e-02 
learn.fit_one_cycle(5, slice(lr))

In [None]:
learn.save('model-1-rn50')

In [None]:
learn.unfreeze()
learn.lr_find() 
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(5, slice(1e-05, lr/5))
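Passing `slice(1e-05, lr/5)` asks for a small learning rate on the earliest layers and a larger one on the head, so the pretrained features change slowly. Here is a rough sketch of how such a range could be spread over layer groups. It assumes geometric spacing and three layer groups, both illustrative assumptions. `spread_lrs` is a hypothetical helper, not the fastai implementation:

```python
def spread_lrs(start, stop, n_groups):
    """Geometrically space n_groups learning rates from start to stop (hypothetical helper)."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

lr = 1e-02
for group, rate in enumerate(spread_lrs(1e-05, lr / 5, 3)):
    # The earliest group gets the smallest rate, the head gets the largest.
    print(f"layer group {group}: lr = {rate:.2e}")
```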

In [None]:
learn.save('model-2-rn50')

In [None]:
# Increase the image size, swap in the new data, and keep training the same model
data = (src.transform(tfms, size = 96)
       .databunch().normalize(imagenet_stats))
learn.data = data 
data.train_ds[0][0].shape

In [None]:
learn.freeze()
learn.lr_find()
learn.recorder.plot()

In [None]:
lr = 1e-03 
learn.fit_one_cycle(5, slice(lr))

In [None]:
learn.save('model-big-1-rn50')

In [None]:
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()

In [None]:
learn.fit_one_cycle(5, slice(3e-06, lr/5))

In [None]:
learn.save('model-big-2-rn50')

In [None]:
interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()

In [None]:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
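The confusion matrix splits the validation predictions into true/false positives and negatives, which shows more than accuracy alone. Here is a minimal sketch on toy labels, assuming label 1 marks a patch that contains tumour tissue. `confusion_counts` is a hypothetical helper, not part of fastai:

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs for a binary problem (hypothetical helper)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tn": counts[(0, 0)], "fp": counts[(0, 1)],
        "fn": counts[(1, 0)], "tp": counts[(1, 1)],
    }

# Toy labels (assumption: 1 = tumour tissue present, 0 = absent).
actual    = [0, 0, 1, 1, 1, 0]
predicted = [0, 1, 1, 0, 1, 0]
cm = confusion_counts(actual, predicted)
# Sensitivity: the share of actual positives the model caught.
sensitivity = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, sensitivity)
```

For a screening aid, the false-negative count is usually the number to watch, since it represents missed cancer.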

In [None]:
preds,y = learn.TTA()
acc = accuracy(preds, y).item()  # convert the 0-dim tensor to a Python float
print('The validation accuracy is {:.2f} %.'.format(acc * 100))
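Test-time augmentation predicts on several augmented copies of each image and combines the results, which tends to smooth out orientation-specific mistakes. Here is a toy sketch of the idea using a plain average over flips. The weighting fastai's `TTA` applies may differ, and `toy_model` and `tta_predict` are hypothetical stand-ins:

```python
def hflip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]

def vflip(image):
    """Mirror a 2-D image top to bottom."""
    return image[::-1]

def toy_model(image):
    """Stand-in for a trained network: the 'probability' is just the top-left pixel."""
    return image[0][0]

def tta_predict(model, image, augmentations):
    """Average the model output over the original image and each augmented copy."""
    views = [image] + [aug(image) for aug in augmentations]
    return sum(model(view) for view in views) / len(views)

patch = [[0.9, 0.1],
         [0.3, 0.5]]
# The single-view prediction is 0.9; averaging over flips pulls it towards the other corners.
print(toy_model(patch), tta_predict(toy_model, patch, [hflip, vflip]))
```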

In [None]:
pred_score = auc_score(preds,y).item()
print('The validation AUC is {}.'.format(pred_score))

In [None]:
df1 = pd.read_csv(path/'sample_submission.csv')
id_list = list(df1.id)


In [None]:
preds,y = learn.TTA(ds_type=DatasetType.Test)
pred_list = list(preds[:,1])

In [None]:
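# Map each test file path to its predicted probability, then reorder to match sample_submission.csv
pred_dict = dict((key, value.item()) for (key, value) in zip(learn.data.test_ds.items,pred_list))
pred_ordered = [pred_dict[Path('../input/test/' + img_id + '.tif')] for img_id in id_list]  # img_id avoids shadowing the builtin id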
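The test files are not guaranteed to come back in the same order as the rows of `sample_submission.csv`, so the predictions are looked up by image id before writing the submission. Here is a self-contained sketch of that alignment on made-up paths and scores. `order_predictions` is a hypothetical helper that keys on the file stem rather than the full path:

```python
from pathlib import Path

def order_predictions(test_paths, scores, submission_ids):
    """Return scores reordered to follow submission_ids (hypothetical helper)."""
    # Key each score by the file name without its extension, i.e. the image id.
    by_id = {Path(p).stem: s for p, s in zip(test_paths, scores)}
    missing = [i for i in submission_ids if i not in by_id]
    if missing:
        raise KeyError(f"no prediction for ids: {missing[:5]}")
    return [by_id[i] for i in submission_ids]

# Made-up example: the files were read in the order b, a but the submission wants a, b.
paths = ["../input/test/b.tif", "../input/test/a.tif"]
print(order_predictions(paths, [0.2, 0.9], ["a", "b"]))
```

Failing loudly on a missing id makes it easier to spot a mismatch between the test folder and the sample submission.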

In [None]:
submissions = pd.DataFrame({'id':id_list,'label':pred_ordered})
submissions.to_csv("submission_{}.csv".format(pred_score),index = False)
