# Blood cells


# Tasks

To pass, you need to turn in a runnable notebook (.ipynb) and a PDF version of that notebook. Start the filename and your title with your group number, then the assignment, then your names, e.g. 42_3a_EbbaBergman_AkshaiFILLIN_DavidFILLIN_JonathanFILLIN.ipynb

Note: Previously we've run 5-10 epochs. I'd recommend starting with at least 30 epochs in this assignment. If it takes a long time to run, contact a teacher.

Complete the following tasks

1. Motivate why you are using the architecture you are using and hand in  
    a) One network not using the masks provided  
    b) One network using the masks provided  
    c) compare the two and discuss the impact the masks did or did not have (1 paragraph)  

2. Improve your network  
    a) Choose a network that you think is better to proceed with, motivate why
    b) Try 3 different improvements of your network, motivate your choices
    c) Combine your 3 improvements, discuss the results

3. A badly performing network run at least once. Motivate why you chose this network (1 paragraph).

The first, good neural network is the focus of this assignment.   

For the bad network, I wouldn't recommend spending more than 30 minutes unless you really, really want to. I do not think everyone will need 30 minutes.  

4. Evaluate your best network using your test set and at least 3 metrics, and discuss their differences/similarities in relation to your results. (1-3 paragraphs)
Hint: https://www.tensorflow.org/api_docs/python/tf/keras/metrics might be useful to look at for your fit method
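To see why several metrics are asked for, here is a minimal sketch in plain Python (not the Keras API) showing how accuracy, precision, recall and F1 can disagree on imbalanced data. The class names "A" and "B", the toy labels and the helper names `accuracy` and `precision_recall_f1` are all hypothetical and only serve as an illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Fall back to 0.0 when a ratio is undefined (no predictions / no samples).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical, imbalanced toy labels: 8 samples of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["A"] * 10

print("accuracy:", accuracy(y_true, y_pred))
print("class A:", precision_recall_f1(y_true, y_pred, "A"))
print("class B:", precision_recall_f1(y_true, y_pred, "B"))
```

In this toy case the accuracy looks reasonable, while precision, recall and F1 for class "B" are all zero, which is the kind of difference worth discussing in your evaluation.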


# Data info

Xinyi Dai wrote her master's thesis in 2022, and you can find it here: https://uu.diva-portal.org/smash/get/diva2:1681915/FULLTEXT01.pdf

You are not going to do the exact same thing that she did, and you are not allowed to use her code or her exact techniques and architectures.

It might be interesting for you to look at the flow of data below. Here we will only focus on the deep learning part, but as you can see, there are several steps between the first images from the microscope and their application.

Note that in this dataset I have removed the control class for easier modelling.

![XinYi_Process_2.1.png](attachment:be35181a-5a86-4c9e-916f-468a400d4e99.png)

# Imports

In [None]:
import numpy as np
import tensorflow as tf
import pandas as pd
import IPython
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from datetime import datetime
import cv2
import os 

from IPython.display import display, HTML

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from skimage import io, img_as_uint

In [None]:
from importlib.machinery import SourceFileLoader
base_path = '/home/jovyan/Teaching/big-data_deeplearning/'


cnn_helper = SourceFileLoader("cnn_helper", base_path + "Labs/"+"cnn_helper.py").load_module()
plot_helper = SourceFileLoader("plot_helper", base_path + "Labs/" +"plot_helper.py").load_module()

In [None]:
# Configure GPUs
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
      print("Done setting memory_growth")
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except Exception as e:
    # Memory growth must be set before GPUs have been initialized
    print("If you get this, it has probably been done already")
    # Catch the exception and display a custom HTML message with red text
    message = """There were some problems setting up the GPU,
                 it is probably best to restart the kernel and clear
                 all outputs before starting over
              """
    display(HTML(f"<div style='color: red;'><strong>Warning:</strong>{message}</div>"))
    print(e)

# Paths

In [None]:
lab_data_path = base_path + "LabData/LeukemiaCells/"
images_path = lab_data_path + "cell_images/"  # 16bit bf images
masks_path = lab_data_path + "masks/" # masks filtered based on csv from cp/cpa

cell_labels_path = images_path +"cell_labels.csv"
mask_labels_path= masks_path + "mask_labels.csv"


# Unpack the data for the leukemia cells just as you did in the labs 

No need to show the code or commands you use for this.

# Load Data

In [None]:
df_cell_labels = pd.read_csv(cell_labels_path)
df_mask_labels = pd.read_csv(mask_labels_path)

## Look at a sample of the images

Let's look at the images - always a good start to the project.\
Random images are displayed here; run this several times to see different images.

The plotting functions are in plot_helper.py in this directory if you want to look at them.

In [None]:
df_cell_labels.head()

In [None]:
df_mask_labels.head(2)

In [None]:
file_column = "filename" ## Enter the name of the file names column
class_column = "class"
print(df_cell_labels[class_column].unique())
plot_helper.show_random_images(images_path, df_cell_labels, file_column, class_column)

In [None]:
# Code straight from XinYi's notebooks, slightly modified by Ebba, probably based on Ebba Bergman's and Phil Harrison's code
# New code to show images here, because the tif files for the masks aren't as easy to show using our usual function

import numpy as np
from skimage import io
import matplotlib.pyplot as plt

print(df_mask_labels[class_column].unique())

figure, ax = plt.subplots(2, 3, figsize=(14, 10))
figure.suptitle("Examples of mask images", fontsize=20)
axes = ax.ravel()

df_images_to_show = df_mask_labels.sample(6)


for i in range(len(axes)):
    row = df_images_to_show.iloc[[i]]
    random_image = io.imread(masks_path+ row[file_column].values[0])
    imgnp = np.array(random_image)
    print(imgnp.max(), imgnp.min())
    axes[i].set_title(row[class_column].values[0], fontsize=14) 
    axes[i].imshow(random_image, cmap='gray')
    axes[i].set_axis_off()
    
plt.subplots_adjust(wspace=0.05, hspace=0.05)
plt.show()
plt.close()

# Code snippets you might find useful

In [None]:
image_shape = (40,40)
input_shape = (40,40,1)

### Code below is from XinYi's masters project
#### Multiply the bf images with the masks to get rid of the background (all-black background, leaving only the bf cell)
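As a small illustration of the masking idea, the sketch below applies a mask to a toy 2x2 array instead of a real cell image. It is not XinYi's code: the helper name `apply_mask` is hypothetical, it assumes that any non-zero mask pixel marks the cell, and it rescales with plain NumPy rounding rather than `img_as_uint`.

```python
import numpy as np


def apply_mask(image, mask):
    """Zero the background and rescale the result to the full uint16 range."""
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    # Assumption: any non-zero mask pixel belongs to the cell.
    masked = image.astype(np.float64) * (mask > 0)
    peak = masked.max()
    if peak == 0:
        # Empty mask (or all-black image): nothing to rescale.
        return np.zeros(image.shape, dtype=np.uint16)
    return np.round(masked / peak * 65535).astype(np.uint16)


# Toy stand-ins for a 16-bit brightfield image and its mask.
image = np.array([[100, 200], [300, 400]], dtype=np.uint16)
mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)

masked = apply_mask(image, mask)
print(masked)
```

Converting to float before multiplying avoids integer overflow when the mask uses values larger than 1, and the explicit check for an empty mask avoids a division by zero.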

In [None]:
image_files = [f for f in sorted(os.listdir(images_path)) if f.endswith('.tif')]
imgs = [io.imread(images_path+ str(f)) for f in image_files]

In [None]:
mask_files = [f for f in sorted(os.listdir(masks_path)) if f.endswith('.tif')]
masks = [io.imread(masks_path + str(f)) for f in mask_files]

## From here on you will need to add code

In [None]:
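# From XinYi's masters project, slightly modified by Ebba

# Assumption: output_path was not defined in the original cell. It is set here to the
# "masked_cells/" folder that the following cells read from; change it if needed.
output_path = lab_data_path + "masked_cells/"
os.makedirs(output_path, exist_ok=True)  # create the output folder once, before the loop

for i in range(len(image_files)):
    img_name = image_files[i]
    filename = str(img_name).split('.')
    imgnp = np.array(imgs[i]) #imgnp = imgnp/imgnp.max()
    masknp= np.array(masks[i]) # masknp= masknp/masknp.max()
    if imgnp.shape != masknp.shape:
        print("could not merge following file due to shape issues: " + str(filename))
        continue
    # Multiply image and mask to black out the background, then rescale to 16 bit
    masked_np = imgnp*masknp
    masked_np = masked_np/masked_np.max()
    masked_np = img_as_uint(masked_np)

    # Only save images that use the full 16-bit range after rescaling
    if masked_np.max() == 65535:
        outfile = output_path + filename[0] + '.tif'
        io.imsave(outfile, masked_np)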

In [None]:
folder_path = lab_data_path + "masked_cells/"  # Replace with your directory path
output_csv = 'masked_cell_labels.csv'               # Replace with your desired output CSV file path
column_names = ['filename', 'type', 'class', 'unknown1', 'unknown2', 'unknown3', 'original_image_name']  # Adjust column names as needed

masked_cells_labels = cnn_helper.filenames_to_csv(folder_path, output_csv, column_names)

In [None]:
def start_time():
    print("Starting run at: " + str(datetime.now()))

def end_time():
    print("Run finished at: " + str(datetime.now()))
    
def get_image_data_flat_from_file(data_directory, image_paths):
    file_names = image_paths.values.flatten() # Assumes image_paths come in df[image_path_column_name] structure due to lab
    image_data = np.array([np.array(cv2.imread(data_directory + file_name)) for file_name in file_names])
    flattened_image_data = image_data.reshape(image_data.shape[0], -1)
    return flattened_image_data

# Merge the cell images with the masks and save in a new folder

In [None]:
folder_path = lab_data_path + "masked_cells/"  # Replace with your directory path
output_csv = 'masked_cell_labels.csv'               # Replace with your desired output CSV file path
column_names = ['filename', 'type', 'class', 'unknown1', 'unknown2', 'unknown3', 'original_image_name']  # Adjust column names as needed

cnn_helper.filenames_to_csv(folder_path, output_csv, column_names)

# Look at the merged images and labels

# Part 1

## Make a neural network for the unmasked images and run

## Make a neural network for the masked images and run

## Compare the networks

# Part 2

## a) Choose a network that you think is better to proceed with, motivate why
    

## b) Try 3 different improvements of your network, motivate your choices


### Attempt 1

### Attempt 2

### Attempt 3

## c) Combine your 3 improvements, discuss the results

### Attempt 4

# Part 3
A badly performing network run at least once. Motivate why you chose this network (1 paragraph).

# Part 4

Evaluate your best network using at least 3 metrics and discuss their differences/similarities in relation to your results. (1-3 paragraphs)