## Data Preparation
What is the data? See the dataset page on Kaggle:
https://www.kaggle.com/datasets/jerzydziewierz/bee-vs-wasp

In [1]:
!wget https://github.com/SVizor42/ML_Zoomcamp/releases/download/bee-wasp-data/data.zip
!unzip -q data.zip


## Question 1
Since we have a binary classification problem, what is the best loss function for us?

* `mean squared error`
* `binary crossentropy`
* `categorical crossentropy`
* `cosine similarity`

> **Note:** since the output layer applies an activation (`sigmoid`), the model outputs probabilities rather than raw logits, so we don't need to set `from_logits=True`.

Answer: `binary crossentropy`
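For reference, binary cross-entropy for a label y in {0, 1} and predicted probability p is `-(y * log(p) + (1 - y) * log(1 - p))`, averaged over the batch. The pure-Python sketch below uses our own helper names and is not Keras's implementation (which differs in details such as clipping). It shows that a model ending in a sigmoid already hands the loss a probability, while raw logits would first need a sigmoid, which is what `from_logits=True` asks Keras to handle internally.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean BCE over examples, given probabilities (the from_logits=False case)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def binary_crossentropy_from_logits(y_true, logits):
    """Same loss when the model outputs raw scores (the from_logits=True case)."""
    return binary_crossentropy(y_true, [sigmoid(z) for z in logits])


labels = [1, 0, 1]
logits = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in logits]  # what a sigmoid output layer would emit

print(binary_crossentropy(labels, probs))
print(binary_crossentropy_from_logits(labels, logits))
```

Both calls print the same value: the only difference is where the sigmoid is applied.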

## Build the Model

In [64]:
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.002, momentum=0.8),
              loss='binary_crossentropy',
              metrics=['accuracy'])
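`optimizers.SGD(learning_rate=0.002, momentum=0.8)` uses plain (non-Nesterov) momentum, which the Keras documentation describes as `velocity = momentum * velocity - learning_rate * g` followed by `w = w + velocity`. A minimal sketch of that rule on a hypothetical 1-D objective, chosen only because its minimum (at 3) is known, shows how momentum speeds up convergence at this small learning rate:

```python
def sgd_momentum(grad_fn, w0, learning_rate=0.002, momentum=0.8, steps=1000):
    """Minimize a 1-D function with SGD and plain (non-Nesterov) momentum.

    Update rule:
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    """
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad_fn(w)
        velocity = momentum * velocity - learning_rate * g
        w = w + velocity
    return w


def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


w_momentum = sgd_momentum(grad, w0=0.0)
w_plain = sgd_momentum(grad, w0=0.0, momentum=0.0)
print(f"with momentum: {w_momentum:.6f}, without: {w_plain:.6f}")
```

With `momentum=0.8` the effective step is roughly `learning_rate / (1 - momentum)`, about five times larger, which is why the momentum run ends much closer to 3 after the same number of steps.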

## Question 2

What's the number of parameters in the convolutional layer of our model? You can use the `summary` method for that.

* 1
* 65
* 896
* 11214912

In [65]:
model.summary()

Model: "sequential_16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_17 (Conv2D)          (None, 148, 148, 32)      896       
                                                                 
 max_pooling2d_16 (MaxPooli  (None, 74, 74, 32)        0         
 ng2D)                                                           
                                                                 
 flatten_16 (Flatten)        (None, 175232)            0         
                                                                 
 dense_32 (Dense)            (None, 64)                11214912  
                                                                 
 dense_33 (Dense)            (None, 1)                 65        
                                                                 
Total params: 11215873 (42.79 MB)
Trainable params: 11215873 (42.79 MB)
Non-trainable params: 0 (0.00 Byte)
___________
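The numbers in the summary follow from simple arithmetic. A `Conv2D` layer with `'valid'` padding and stride 1 (the defaults used here) has `(kernel_h * kernel_w * in_channels + 1) * filters` parameters, the `+ 1` being each filter's bias, so the convolutional layer has `(3 * 3 * 3 + 1) * 32 = 896` parameters. A small sketch that reproduces every row of the summary (helper names are ours):

```python
def conv2d_output_and_params(h, w, in_channels, filters, kernel=(3, 3)):
    """Output shape and parameter count for a 'valid', stride-1 Conv2D."""
    kh, kw = kernel
    out_shape = (h - kh + 1, w - kw + 1, filters)
    params = (kh * kw * in_channels + 1) * filters  # +1 for each filter's bias
    return out_shape, params


def dense_params(inputs, units):
    return inputs * units + units  # weight matrix + bias vector


conv_shape, conv_params = conv2d_output_and_params(150, 150, 3, 32)
pooled = (conv_shape[0] // 2, conv_shape[1] // 2, conv_shape[2])  # MaxPooling2D((2, 2))
flat = pooled[0] * pooled[1] * pooled[2]                           # Flatten
dense1 = dense_params(flat, 64)
dense2 = dense_params(64, 1)

print(conv_shape, conv_params)        # (148, 148, 32) 896
print(pooled, flat)                   # (74, 74, 32) 175232
print(dense1, dense2)                 # 11214912 65
print(conv_params + dense1 + dense2)  # 11215873
```

Almost all of the parameters sit in the first `Dense` layer, because it connects every one of the 175,232 flattened features to 64 units.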

## Data Generators

In [66]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary',
    shuffle=True)

test_generator = test_datagen.flow_from_directory(
    'data/test',
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary',
    shuffle=True)


Found 3677 images belonging to 2 classes.
Found 918 images belonging to 2 classes.
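`flow_from_directory` infers the classes from the subdirectory names and, per the Keras documentation, assigns label indices in alphanumeric order; with `class_mode='binary'` the sigmoid output is then the probability of the class with index 1. The sketch below mimics that mapping with the standard library. The subdirectory names `bee` and `wasp` are an assumption about the archive's layout; on the real generator the mapping can be checked with `train_generator.class_indices`.

```python
import os
import tempfile


def infer_class_indices(directory):
    """Map each subdirectory name to an integer label, in sorted (alphanumeric) order."""
    classes = sorted(
        name for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Hypothetical layout mirroring data/train/<class_name>/*.jpg
with tempfile.TemporaryDirectory() as root:
    for name in ("wasp", "bee"):  # creation order does not matter
        os.makedirs(os.path.join(root, name))
    class_indices = infer_class_indices(root)

print(class_indices)  # {'bee': 0, 'wasp': 1}
```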


## Train the Model

In [67]:
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
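Each epoch makes one pass over the generator, so the number of batches per epoch is a ceiling division of the image counts reported above by `batch_size=20`, with a smaller final batch. A quick sketch of that arithmetic (the helper name is ours):

```python
import math


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once; the last batch may be partial."""
    return math.ceil(num_images / batch_size)


train_steps = steps_per_epoch(3677, 20)
val_steps = steps_per_epoch(918, 20)
last_train_batch = 3677 - (train_steps - 1) * 20

print(train_steps, val_steps, last_train_batch)  # 184 46 17
```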


## Question 3

What is the median of training accuracy for all the epochs for this model?

* 0.20
* 0.40
* 0.60
* 0.80

In [68]:
import numpy as np

# `history` is the History object returned by model.fit above
training_accuracies = history.history['accuracy']
median_accuracy = np.median(training_accuracies)
print(f"Median Training Accuracy: {median_accuracy}")

Median Training Accuracy: 0.7880065441131592
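With 10 epochs there is no single middle value, so `np.median` returns the mean of the 5th and 6th sorted values. A pure-Python sketch of the same rule, checked against `statistics.median`; the accuracies are made-up placeholders, not the values from this run:

```python
import statistics


def median(values):
    """Median: the middle value for odd lengths, the mean of the two middle values for even lengths."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical per-epoch accuracies (NOT the values from the run above).
accuracies = [0.61, 0.70, 0.74, 0.76, 0.78, 0.80, 0.82, 0.83, 0.85, 0.86]
print(median(accuracies), statistics.median(accuracies))
```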


## Question 4

What is the standard deviation of training loss for all the epochs for this model?

* 0.031
* 0.061
* 0.091
* 0.131

In [69]:
training_losses = history.history['loss']
std_loss = np.std(training_losses)
print(f"Standard Deviation of Training Loss: {std_loss}")

Standard Deviation of Training Loss: 0.08603451877824
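Note that `np.std` defaults to `ddof=0`, the population standard deviation (divide by n). The sample estimate (divide by n - 1) is larger by a factor of sqrt(n / (n - 1)), about 1.054 for 10 epochs. The sketch below compares both against the standard library; the losses are hypothetical placeholders:

```python
import math
import statistics


def std(values, ddof=0):
    """Standard deviation; ddof=0 matches np.std's default, ddof=1 is the sample estimate."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_dev / (n - ddof))


# Hypothetical per-epoch training losses (NOT the values from the run above).
losses = [0.69, 0.62, 0.57, 0.53, 0.50, 0.47, 0.45, 0.42, 0.40, 0.38]
print(std(losses), statistics.pstdev(losses))         # population (np.std default)
print(std(losses, ddof=1), statistics.stdev(losses))  # sample
```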


## Data Augmentation

In [70]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=50,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Re-define the train generator with the augmented datagen
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary',
    shuffle=True)

# Continue training for 10 more epochs
augmented_history = model.fit(
    train_generator,
    epochs=10,
    validation_data=test_generator
)

Found 3677 images belonging to 2 classes.
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
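To make the augmentation arguments concrete for 150×150 inputs: `rotation_range=50` means a random rotation in [-50, 50] degrees, float shift ranges below 1 are fractions of the image size (up to ±15 pixels here), `zoom_range=0.1` means a zoom factor in [0.9, 1.1], and `horizontal_flip=True` flips about half of the images. The sketch below only samples such parameters. It is a simplified illustration of the documented ranges, not Keras's code: for instance, Keras draws separate x and y zoom factors, and this sketch neither applies the transform nor handles `fill_mode`.

```python
import random


def sample_transform(rng, height=150, width=150, rotation_range=50,
                     width_shift_range=0.1, height_shift_range=0.1,
                     zoom_range=0.1, horizontal_flip=True):
    """Draw one set of random augmentation parameters for a single image (sketch only)."""
    return {
        "rotation_deg": rng.uniform(-rotation_range, rotation_range),
        # Float shift ranges below 1 are fractions of the image size, converted to pixels here.
        "horizontal_shift_px": rng.uniform(-width_shift_range, width_shift_range) * width,
        "vertical_shift_px": rng.uniform(-height_shift_range, height_shift_range) * height,
        # Simplification: one zoom factor for both axes.
        "zoom": rng.uniform(1 - zoom_range, 1 + zoom_range),
        "flip": horizontal_flip and rng.random() < 0.5,
    }


rng = random.Random(42)
for _ in range(3):
    print(sample_transform(rng))
```

Because a fresh set of parameters is drawn for every image in every batch, the model rarely sees exactly the same training image twice.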


## Question 5

Let's train our model for 10 more epochs using the same code as previously.
> **Note:** make sure you don't re-create the model; we want to continue training the model we already started training.

What is the mean of test loss for all the epochs for the model trained with augmentations?

* 0.18
* 0.48
* 0.78
* 0.108

In [72]:
test_losses = augmented_history.history['val_loss']
mean_test_loss = np.mean(test_losses)
print(f"Mean Test Loss with Augmentations: {mean_test_loss}")

Mean Test Loss with Augmentations: 0.49167258143424986


## Question 6

What's the average of test accuracy for the last 5 epochs (from 6 to 10)
for the model trained with augmentations?

* 0.38
* 0.58
* 0.78
* 0.98

In [80]:
test_accuracies = augmented_history.history['val_accuracy']
average_accuracy_last_5_epochs = np.mean(test_accuracies[-5:])
print(f"Average Test Accuracy for Last 5 Epochs: {average_accuracy_last_5_epochs}")

Average Test Accuracy for Last 5 Epochs: 0.7819172143936157
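`History` stores one value per epoch, so epoch k (counted from 1) sits at index k - 1, and `[-5:]` selects epochs 6 to 10 only because this `fit` call ran exactly 10 epochs (each call to `fit` returns a fresh `History`). A small sketch with a hypothetical helper that slices by epoch number instead; the accuracies are placeholders, not results from this run:

```python
def epochs_slice(values, first_epoch, last_epoch):
    """Select per-epoch values for 1-based, inclusive epoch numbers."""
    return values[first_epoch - 1:last_epoch]


# Hypothetical val_accuracy history for 10 epochs (placeholders, not real results).
val_accuracy = [0.70, 0.72, 0.73, 0.75, 0.76, 0.77, 0.78, 0.78, 0.79, 0.79]

last_five = epochs_slice(val_accuracy, 6, 10)
print(last_five == val_accuracy[-5:], sum(last_five) / len(last_five))
```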


## Submit the results

- Submit your results here: https://forms.gle/5sjtM3kzY9TmLmU17
- If your answer doesn't match options exactly, select the closest one
- You can submit your solution multiple times. In this case, only the last submission will be used
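To apply the "closest option" rule consistently, a tiny helper (our own, measuring closeness by absolute difference) can map each computed value to an answer. The values below are the ones printed in this notebook; a rerun will give somewhat different numbers because of random initialization and shuffling.

```python
def closest_option(value, options):
    """Return the answer option nearest to a computed value (by absolute difference)."""
    return min(options, key=lambda option: abs(option - value))


# Values printed in the cells above, paired with each question's options.
answers = {
    "Q3 median train accuracy": (0.7880065441131592, [0.20, 0.40, 0.60, 0.80]),
    "Q4 std of train loss": (0.08603451877824, [0.031, 0.061, 0.091, 0.131]),
    "Q5 mean test loss": (0.49167258143424986, [0.18, 0.48, 0.78, 0.108]),
    "Q6 mean test accuracy, epochs 6-10": (0.7819172143936157, [0.38, 0.58, 0.78, 0.98]),
}
for question, (value, options) in answers.items():
    print(question, "->", closest_option(value, options))
```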


## Deadline

The deadline for submitting is November 20 (Monday), 23:00 CEST. After that the form will be closed.