### Gender Classification using Logistic Regression Model

The main objective of this notebook is to see how logistic regression can be applied to a gender classification dataset containing about 58,000 images of male and female subjects.

### Requirements from Rubric (don't delete for now)

An overview or description of the data is provided, including how it was collected, and its implications on the types of conclusions that could be made from the data. A description of the variables, observations, and/or structure of the data is provided. The target task is well introduced and clearly defined.

The data is sufficiently explored to get a grasp of the distribution and the content of the data. Appropriate summaries and visualizations are presented. Insights into how the EDA can help the model training is mentioned.

The necessary steps for preprocessing and cleaning are performed, including explanations for every step. If no preprocessing or cleaning is done, there is a justification on why it was not needed.

The appropriate models are used to accomplish the machine learning task. Justification of choosing the models is shown.

Appropriate data-driven error analysis is made, and changes to the model selection and hyperparameters are performed to improve model performance. The study exhausts the improvements that can be made to the model.

The study is concluded by effectively summarizing the efforts of the authors. Recommendations on how the model could be further improved are provided.

### About the Dataset

The Gender Classification Dataset 

In [88]:
from torchvision import transforms
import tensorflow as tf

# Image transformations
image_transforms = {
    # Train uses data augmentation
    'train':
    transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),  # ImageNet standards
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # ImageNet standards
    ]),
    # Validation does not use augmentation
    'valid':
    transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

from torchvision import datasets
from torch.utils.data import DataLoader
import os

# Datasets from folders

# tm_path = "C:\Users\uyjus\Documents\GitHub\STINTSY-FinalProject-Gender-Classification\train\female"

# data = {
#     'train_male':
#     datasets.ImageFolder(root='train/train_female', transform=image_transforms['train']),
#     'train_female':
#     datasets.ImageFolder(root='train/train_male', transform=image_transforms['train']),
#     'validation_male':
#     datasets.ImageFolder(root='valid/valid_male', transform=image_transforms['valid']),
#     'validation_female':
#     datasets.ImageFolder(root='valid/valid_female', transform=image_transforms['valid']),
# }

data = {
    'training':
    datasets.ImageFolder(root='train/train_data', transform=image_transforms['train']),
    'validation':
    datasets.ImageFolder(root='valid/valid_data', transform=image_transforms['valid']),
}

# tm_batch_size=len(data["train_male"])
# print (tm_batch_size)
# tf_batch_size=len(data["train_female"])
# print (tf_batch_size)
# vm_batch_size=len(data["validation_male"])
# print (vm_batch_size)
# vf_batch_size=len(data["validation_female"])
# print (vf_batch_size)

training_size = len(data["training"])
print(training_size)
validation_size = len(data["validation"])
print(validation_size)

# # Dataloader iterators, make sure to shuffle
# dataloaders = {
#     'train_male': DataLoader(data['train_male'], batch_size=tm_batch_size, shuffle=True),
#     'train_female': DataLoader(data['train_female'], batch_size=tf_batch_size, shuffle=True),
#     'validation_male': DataLoader(data['validation_male'], batch_size=vm_batch_size, shuffle=True),
#     'validation_female': DataLoader(data['validation_female'], batch_size=vf_batch_size, shuffle=True)
# }

# Each loader yields its whole split as a single batch; lower batch_size if memory is limited
dataloaders = {
    'training': DataLoader(data['training'], batch_size=training_size, shuffle=True),
    'validation': DataLoader(data['validation'], batch_size=validation_size, shuffle=True),
}

# Keras generators: augmentation for training, rescaling only for validation
train_datagen1 = tf.keras.preprocessing.image.ImageDataGenerator(
    horizontal_flip=True,
    width_shift_range=0.4,
    height_shift_range=0.4,
    zoom_range=0.3,
    rotation_range=20,
    rescale=1./255)

# flow_from_directory treats each immediate subfolder of the given path as one class
train_gen1 = train_datagen1.flow_from_directory('train/train_data/train',
                                                target_size=(249, 249),
                                                batch_size=32,
                                                class_mode='binary')

test_datagen1 = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

test_gen1 = test_datagen1.flow_from_directory('valid/valid_data/valid',
                                              target_size=(249, 249),
                                              batch_size=32,
                                              class_mode='binary')

47009
11649
Found 0 images belonging to 0 classes.
Found 0 images belonging to 0 classes.
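
The message `Found 0 images belonging to 0 classes.` suggests that `flow_from_directory` saw no class subfolders under the path it was given, since it treats each immediate subdirectory as one class. A minimal sketch for checking the folder layout before building the generators is shown here. `count_images_per_class` is a hypothetical helper (not part of Keras or torchvision), and the list of image extensions is an assumption.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_class(root):
    """Return {class_folder_name: image_count} for each immediate subfolder of root."""
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count image files anywhere below the class folder.
        counts[class_dir.name] = sum(
            1 for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    for path in ("train/train_data/train", "valid/valid_data/valid"):
        print(path, count_images_per_class(path))
```

An empty dictionary means the path has no class subfolders (or does not exist), which would be consistent with the generator output above.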


### Data Visualization

In [47]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import torch 

### Pre-processing

In [46]:
training_40k = torch.utils.data.Subset(data['training'], list(range(0,40000)))
validation_10k =  torch.utils.data.Subset(data['validation'], list(range(0,10000)))
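
`ImageFolder` orders its samples class by class, so taking the first 40,000 indices can over-represent whichever class folder sorts first. A minimal sketch of drawing a seeded random set of indices instead is given here. `random_subset_indices` is a hypothetical helper and the seeds are arbitrary; the resulting lists could be passed to `torch.utils.data.Subset` in place of `list(range(0, 40000))`.

```python
import random


def random_subset_indices(dataset_size, subset_size, seed=0):
    """Draw subset_size distinct indices from range(dataset_size), reproducibly."""
    if subset_size > dataset_size:
        raise ValueError("subset_size cannot exceed dataset_size")
    rng = random.Random(seed)
    return rng.sample(range(dataset_size), subset_size)


# Sizes taken from the notebook output: 47009 training and 11649 validation images.
train_indices = random_subset_indices(47009, 40000, seed=0)
valid_indices = random_subset_indices(11649, 10000, seed=1)
```

Fixing the seed keeps the subset identical across runs, which makes later comparisons between models easier to interpret.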


In [54]:
import numpy as np
# Import everything from tensorflow.keras so layers and models come from one Keras implementation
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.applications import Xception

target_size = (249,249,3)
#base
model = Sequential()
model.add(Xception(include_top=False, pooling='avg', weights='imagenet', input_shape=target_size))
model.add(Flatten())
model.add(BatchNormalization())
#head
model.add(Dense(2048, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1024, activation='relu'))
model.add(BatchNormalization())

model.add(Dense(1, activation='sigmoid'))

model.layers[0].trainable = False

model.summary()
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 xception (Functional)       (None, 2048)              20861480  
                                                                 
 flatten_1 (Flatten)         (None, 2048)              0         
                                                                 
 batch_normalization_8 (Batc  (None, 2048)             8192      
 hNormalization)                                                 
                                                                 
 dense (Dense)               (None, 2048)              4196352   
                                                                 
 batch_normalization_9 (Batc  (None, 2048)             8192      
 hNormalization)                                                 
                                                                 
 dense_1 (Dense)             (None, 1024)             
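
The parameter counts in the summary can be checked by hand: a `Dense` layer has one weight per input-output pair plus one bias per output unit, and `BatchNormalization` keeps four values per feature (gamma, beta, moving mean, moving variance). A small sketch of that arithmetic follows; the helper names are hypothetical.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


def batchnorm_params(n_features):
    """gamma, beta, moving mean and moving variance for each feature."""
    return 4 * n_features


print(batchnorm_params(2048))    # 8192, as listed for batch_normalization_8
print(dense_params(2048, 2048))  # 4196352, as listed for dense
print(dense_params(2048, 1024))  # 2098176, computed for a 2048 -> 1024 Dense layer
```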

In [89]:
import tensorflow as tf

checkpoint = tf.keras.callbacks.ModelCheckpoint('xception_v1_{epoch:02d}_{val_accuracy:.3f}.h5',
                                             save_best_only = True,
                                             monitor= 'val_accuracy',
                                             mode = 'max')


epochs = 12
# len(generator) is already the number of batches per epoch (batch_size=32 is set on the generators)
history1 = model.fit(train_gen1, epochs=epochs, validation_data=test_gen1,
                     steps_per_epoch=len(train_gen1),
                     validation_steps=len(test_gen1),
                     callbacks=[checkpoint])



ValueError: Asked to retrieve element 0, but the Sequence has length 0
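
The `ValueError` follows from the generators being empty (`Found 0 images`), so `model.fit` has no batch to fetch. Separately, once the generators do find images, `len(generator)` is the number of batches per epoch, so dividing it again by a batch size would shrink each epoch to a handful of steps. A worked sketch of the arithmetic, assuming the image counts printed earlier (47009 and 11649) and the generator batch size of 32; `batches_per_epoch` is a hypothetical helper.

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(n_images / batch_size)


train_steps = batches_per_epoch(47009, 32)
valid_steps = batches_per_epoch(11649, 32)
print(train_steps, valid_steps)                # 1470 365
print(train_steps // 256, valid_steps // 256)  # 5 1
```

With these numbers, a full pass over the training images takes 1470 steps, while dividing by 256 would stop each epoch after 5.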