# **Data Visualization**

## Objectives

* Answering business requirement 1:
    - The client is interested in conducting a study to visually differentiate a healthy cherry leaf from one that contains powdery mildew.

## Inputs

* inputs/cherry_leaves_dataset/cherry-leaves/train
* inputs/cherry_leaves_dataset/cherry-leaves/test
* inputs/cherry_leaves_dataset/cherry-leaves/validation

## Outputs

* Image shape embeddings pickle file
* Mean and variability of images per label plot
* Plot to investigate contrast between healthy leaves and leaves with powdery mildew
* Code that answers business requirement 1 and can be used to build an image montage on the Streamlit dashboard


---

# Set directories

We need to change the working directory from its current folder to its parent folder
* We access the current directory with os.getcwd()

In [1]:
import os
current_dir = os.getcwd()
current_dir

'/workspace/Mildew-detection-in-cherry-leaves/jupyter_notebooks'

We want to make the parent of the current directory the new current directory
* os.path.dirname() gets the parent directory
* os.chdir() defines the new current directory

In [2]:
os.chdir(os.path.dirname(current_dir))
print("You set a new current directory")

You set a new current directory


Confirm the new current directory

In [3]:
current_dir = os.getcwd()
current_dir

'/workspace/Mildew-detection-in-cherry-leaves'
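
Re-running the `os.chdir` cell above moves the working directory up one more level each time it runs. A minimal sketch of a guard against that, assuming the notebooks live in a folder named `jupyter_notebooks` (as in the output above); the helper name `move_to_project_root` is hypothetical and not part of the original notebook:

```python
import os


def move_to_project_root(notebook_folder="jupyter_notebooks"):
    """Move to the parent folder only while still inside the notebook folder."""
    cwd = os.getcwd()
    if os.path.basename(cwd) == notebook_folder:
        # Still inside the notebook folder: step up exactly once
        os.chdir(os.path.dirname(cwd))
    return os.getcwd()
```

Calling `move_to_project_root()` a second time leaves the working directory unchanged, so the cell can be re-run safely.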

### Set input directories

In [4]:
my_data_dir = 'inputs/cherry_leaves_dataset/cherry-leaves'
train_path = my_data_dir + '/train'
test_path = my_data_dir + '/test'
validation_path = my_data_dir + '/validation'
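
As an alternative to string concatenation, the same paths can be built with `os.path.join` and checked up front. This is only a sketch: the `split_paths` dictionary and the `missing` list are illustrative names, not part of the original notebook.

```python
import os

my_data_dir = os.path.join("inputs", "cherry_leaves_dataset", "cherry-leaves")

# Map each split name to its folder
split_paths = {
    split: os.path.join(my_data_dir, split)
    for split in ("train", "test", "validation")
}

# Report missing split folders now instead of failing later in the notebook
missing = [path for path in split_paths.values() if not os.path.isdir(path)]
if missing:
    print(f"Missing input folders: {missing}")
```

`split_paths["train"]` then plays the role of `train_path` above.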

### Set output directory

In [6]:
version = 'v1'
file_path = f'outputs/{version}'

if 'outputs' in os.listdir(current_dir) and version in os.listdir(current_dir + '/outputs'):
    print('Old version is already available, create a new version.')
else:
    os.makedirs(name=file_path)

Old version is already available, create a new version.
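
The check above can also be expressed with `os.makedirs(..., exist_ok=True)`, which does not raise if the folder is already there. A minimal sketch, assuming the same `outputs/<version>` layout; the helper name `prepare_output_dir` is hypothetical:

```python
import os


def prepare_output_dir(version, base_dir="outputs"):
    """Create base_dir/version if needed.

    Returns the folder path and a flag telling whether it already existed.
    """
    file_path = os.path.join(base_dir, version)
    already_there = os.path.isdir(file_path)
    os.makedirs(file_path, exist_ok=True)
    return file_path, already_there
```

A call such as `file_path, existed = prepare_output_dir('v1')` lets the notebook print the "old version" message only when `existed` is true.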


### Set label names

In [7]:
labels = os.listdir(train_path)
print(f'Labels for the images are: {labels}')

Labels for the images are: ['healthy', 'powdery_mildew']
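
`os.listdir` returns entries in arbitrary order and also lists stray files (for example `.DS_Store`), so the label list can vary between machines. A minimal sketch that keeps only sub-folders and sorts them; the helper name `list_labels` is hypothetical:

```python
import os


def list_labels(split_path):
    """Return the sorted sub-folder names of split_path, ignoring stray files."""
    return sorted(
        name
        for name in os.listdir(split_path)
        if os.path.isdir(os.path.join(split_path, name))
    )
```

With the dataset layout used here, `list_labels(train_path)` would be expected to return the two class folders in alphabetical order.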


---

# Visualisation of image data

### Image shape

Calculate the average image size on the train set and set image_shape

In [28]:
from matplotlib.pyplot import imread
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
# Get Image size
dim1, dim2 = [], []
for label in labels:
    for image_filename in os.listdir(train_path + '/' + label):
        img = imread(train_path + '/' + label + '/' + image_filename)
        d1, d2, colors = img.shape
        dim1.append(d1) # Height
        dim2.append(d2) # Width
# Check if the images have different sizes
if len(set(dim1)) == 1 and len(set(dim2)) == 1:
    print('All images have the same size')
    print(f'Image height: {dim1[0]}, image width: {dim2[0]}')
    image_shape = (dim1[0], dim2[0], 3)
else:
    # Plot height against width and mark the mean values
    print('The images have different sizes')
    fig, axes = plt.subplots()
    sns.scatterplot(x=dim2, y=dim1, alpha=0.5)
    axes.set_ylabel('Height in pixels')
    axes.set_xlabel('Width in pixels')
    dim1_mean = int(np.array(dim1).mean())
    dim2_mean = int(np.array(dim2).mean())
    # Width is on the x-axis and height is on the y-axis
    axes.axvline(x=dim2_mean, color='g', linestyle='-')
    axes.axhline(y=dim1_mean, color='r', linestyle='-')
    plt.show()
    print(f'Average width: {dim2_mean} \nAverage height: {dim1_mean}')
    image_shape = (dim1_mean, dim2_mean, 3)
image_shape

All images have the same size
Image height: 256, image width: 256


(256, 256, 3)
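
The size logic above can be separated from file reading so it is easy to check on its own. The following is a minimal sketch, not the notebook's method: `summarise_shapes` is a hypothetical helper that takes the `img.shape` tuples directly. As in the cell above, the channel count is fixed at 3 as an assumption. Only the first two entries of each shape are read, so a 2-D (grayscale) shape does not break the unpacking.

```python
def summarise_shapes(shapes):
    """Return (height, width, 3) as the integer mean of the given image shapes.

    When every image has the same size, the mean equals that common size.
    """
    shapes = list(shapes)
    if not shapes:
        raise ValueError("no image shapes were provided")
    heights = [shape[0] for shape in shapes]
    widths = [shape[1] for shape in shapes]
    mean_height = int(sum(heights) / len(heights))
    mean_width = int(sum(widths) / len(widths))
    return (mean_height, mean_width, 3)
```

In the notebook, the list of shapes could be collected inside the existing loop, for example by appending `img.shape` for each image read.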

Save the image shape embeddings

In [33]:
import joblib

joblib.dump(value=image_shape, filename=f'{file_path}/image_shape.pk1')

['outputs/v1/image_shape.pk1']
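
A later notebook or the dashboard can read the saved shape back with `joblib.load`. The round trip below is a minimal sketch that writes to a temporary folder, so it does not touch `outputs/v1`; the value `(256, 256, 3)` is the shape computed above.

```python
import os
import tempfile

import joblib

image_shape = (256, 256, 3)  # shape computed in the cell above

# Write to a temporary folder so this sketch leaves outputs/v1 untouched
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "image_shape.pk1")
    saved_files = joblib.dump(value=image_shape, filename=target)
    restored = joblib.load(target)

print(restored)
```

In the project itself, the path passed to `joblib.load` would be the one shown in the output above.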

---

# Push files to Repo

* If you do not need to push files to the repo, you may replace this section with "Conclusions and Next Steps" and state your conclusions and next steps.

In [None]:
import os
try:
  # create here your folder
  # os.makedirs(name='')
  pass  # placeholder so the try block is valid until a folder is created
except Exception as e:
  print(e)
