## Yoga Pose Image Classifiers using Flux.jl

This exercise is intended to gain experience in convolutional neural networks for image recognition using the Flux library in Julia.  Below I will build a small zoo of various convolutional neural network architectures and at the end compare their performance at the task of recognizing yoga poses in images.   

In [1]:
using Images
using Flux: onehotbatch, onecold
using Base.Iterators: partition
using Statistics
using Flux, Flux.Optimise
using Flux: crossentropy, Momentum, ADAM
using DataFrames
using Augmentor
using MLDataPattern
using BSON

In [2]:
ENV["COLUMNS"] = 1000
ENV["ROWS"] = 1000

1000

### Data Loading & Augmentation

I did a cursory search online for others who might have built similar projects and found Anastasia Marchenkova's [blog post](https://www.amarchenkova.com/2018/03/25/convolutional-neural-network-yoga-poses/).  She constructed a dataset of approximately 700 images pulled from searching Google and Flickr for 10 yoga poses. I started out using her dataset, but soon wanted to add more poses and ended up building a new dataset from scratch. Given we both scraped Google Images several of the same images appear in my dataset.

The images in this dataset were collected by scraping Google Images (See the Web Scraper Notebook in this project folder).  I attempted to collect images of a single person performing the specified pose from a variety of positions and include a diversity of gender, age, body-type, race, and settings. While there is some diversity in the data it is likely not equal among all poses, and in all cases there is bias toward fit body-types and females. In some photos there are multiple people and some images showing computer-rendered humans are in the dataset. For each pose there is also different orientations represented (straight on, from side, etc), however the bias is towards one or a few common angles. The raw images come in a a variety of sizes and resolutions. I manually cleaned up the data to balance out the number of images per pose, remove low quality images, and separate into training and test sets. Some images appeared multiple times, and in some cases I deleted them and in others I introduced agumentations such as horizontal flips, cropping, and adding color filters. Overall the dataset is not great quality, but should suffice for this exercise.

I am new to both Julia and Deep Learning Frameworks, and really programming in general, so I had a lot of catching up to do in the course of this project.  The first stumbling block was loading the data into a structure that I could feed to a neural network.  Keras/TensorFlow and PyTorch have more developed ecosystems and include easy to use functions for getting image data from folders into a structure ready for feeding to a neural network.  Flux is still lacking a standard package for this specific task. Luckily it was fairly easy to learn how to roll my own using base Julia. My dataset is small enough to fit into memory so I did not worry about implementing lazy loading or multithreading.

In [3]:
TRAIN_DIR = "../Images/training_set"
TEST_DIR = "../Images/test_set"
CLASSES = 24
IMG_SIZE = (224, 224)
COL_CHANNELS = 3
CLASS_LABELS = readdir(TRAIN_DIR)

24-element Array{String,1}:
 "bridge"
 "cat"
 "chair"
 "childs"
 "cobraupdog"
 "cow"
 "cranecrow"
 "downwarddog"
 "halfmoon"
 "headstand"
 "mountain"
 "plank"
 "plough"
 "seatedforwardfold"
 "seatedmudra"
 "sideangle"
 "sideplank"
 "standingbend"
 "tree"
 "triangle"
 "warriorone"
 "warriortwo"
 "wheel"
 "yogicsquat"

In [4]:
""" This function converts an image into a valid array structure for feeding to a Flux neural network. """

getarray(X) = Float32.(permutedims(channelview(X), (2, 3, 1)))

getarray (generic function with 1 method)

In [5]:
function FindBadImages(dir, color_chans)

""" The training data might include some images not matching proper color channels, this loop will find them. """

    badims_arr =[] 
    
    for (root, dirs, files) in walkdir(dir)
        
        for file in files
            try
                img = getarray(load(joinpath(root, file)))
            catch
                push!(badims_arr, string(basename(root),"-",file))
                continue
            end
            img = getarray(load(joinpath(root, file)))    
            if size(img)[3] != color_chans
                push!(badims_arr, string(basename(root),"-",file))
            end 
        end
    end
    
    return badims_arr
end

FindBadImages (generic function with 1 method)

In [6]:
FindBadImages(TRAIN_DIR, 3)

Any[]

In [7]:
FindBadImages(TEST_DIR, 3)

Any[]

In [6]:
function LoadImagesFromDir(dir, img_size)
    
    """ Loads images and class labels from directory where folders are class names. 
        Outputs arrays with images and class names to be used for random tests. """
    
    img_arr = []
    class_arr =[] 
    
    for (root, dirs, files) in walkdir(dir)
        
        for file in files
            img = load(joinpath(root, file))
            img_rs = imresize(img, img_size)
            push!(img_arr, img_rs)
            push!(class_arr, basename(root))
        end
    end
    
    return img_arr, class_arr
end

LoadImagesFromDir (generic function with 1 method)

In [7]:
test_imgs, test_class = LoadImagesFromDir(TEST_DIR, IMG_SIZE);

In [9]:
function ImageDataGenFromDir(dir, img_size; batch_size=1, val_pct=0, augims=false, aug_pl=NoOp)
    
    """ Loads image data from directory where folders are class names. 
        Applys augmentation and splits into train and validation sets, optionally.
        Outputs dataset ready to be fed into Flux neural network."""
    
    img_arr = []
    class_arr =[] 
    
    for (root, dirs, files) in walkdir(dir)
        
        for file in files
            img = load(joinpath(root, file))
            img_rs = imresize(img, img_size)
            push!(img_arr, img_rs)
            push!(class_arr, basename(root))
            if augims 
                img_aug = imresize(augment(img_rs, aug_pl), img_size)
                push!(img_arr, img_aug)
                push!(class_arr, basename(root))
            end
        end
    end
    
    imgs = [getarray(img_arr[i]) for i in 1:length(img_arr)]
    classes=unique(class_arr)
    labels = onehotbatch([i for i in class_arr], classes)
    imgs, labels = shuffleobs((imgs, labels))
    
    if val_pct > 0
        (trainX, trainY), (valX, valY) = splitobs((imgs, labels), at = 1-val_pct)
        trainset = ([(cat(trainX[i]..., dims = 4), trainY[:,i]) for i in partition(1:length(trainX), batch_size)])
        valX = cat(valX[1:end]..., dims = 4)
        return trainset, valX, valY
    end
    
    dataset = ([(cat(imgs[i]..., dims = 4), labels[:,i]) for i in partition(1:length(imgs), batch_size)])
    return dataset
end

ImageDataGenFromDir (generic function with 1 method)

In [10]:
testset = ImageDataGenFromDir(TEST_DIR, IMG_SIZE, batch_size=128);

In [11]:
#Image augmentation operations:
aug_ops = ShearX(-3:3) |> 
        Zoom([0.9, 1, 1.1]) |> 
        FlipX() * NoOp()

trainset, valX, valY = ImageDataGenFromDir(TRAIN_DIR, IMG_SIZE, batch_size=128, val_pct=0.1, augims=true, aug_pl=aug_ops);

### Accuracy & Loss Functions

In [12]:
loss(m, x, y) = sum(crossentropy(m(x), y))
accuracy(m, x, y) = mean(onecold(m(x), 1:24) .== onecold(y, 1:24))    

accuracy (generic function with 1 method)

In [13]:
function accuracy_by_class(m, x, class_labels)
    """ Accuracy by class """
    class_guess = zeros(24)
    class_correct = zeros(24)
    class_total = zeros(24)
    for i in 1:size(x)[1]
        tpreds = m(x[i][1])
        tlab = x[i][2]
        for j = 1:size(tpreds)[2]
                pred_class = findmax(tpreds[:, j])[2]
                actual_class = findmax(tlab[:, j])[2]
                class_guess[pred_class] += 1
                if pred_class == actual_class
                    class_correct[pred_class] += 1
                end
                class_total[actual_class] += 1
        end
    end
    
    class_acc = class_correct ./ class_total
    return DataFrame([(class=class_labels[i], accuracy=class_acc[i]*100, correct_count=class_correct[i], guess_count=class_guess[i], act_count=class_total[i]) for i in 1:24])
end

accuracy_by_class (generic function with 1 method)

In [14]:
function random_test(m, imgs, classes, size, labels)
    """ Test model on a random sample from the given test images. """
    
    ids = rand(1:length(imgs), size)
    rand_test = getarray.(imgs[ids])
    rand_test = cat(rand_test..., dims = 4)
    rand_truth = classes[ids]
    preds_test = m(rand_test)
    col_headers = ["Pose"]
    for i in 1:size
        push!(col_headers, string("test","_",i))
    end
    return DataFrame([labels round.(preds_test.*100, digits=2)], col_headers), ids
end

random_test (generic function with 1 method)

In [146]:
function top_n_loss(m, imgs, classes, N, labels)
    """ Identify images resulting in worst loss scores. """
    
    loss_scores = []
    for i in 1:length(imgs)
        xi = cat(getarray(imgs[i]), dims=4)
        yi = Flux.onehot(classes[i], labels)
        lossi = sum(crossentropy(m(xi), yi))
        push!(loss_scores, (idx=i, loss=lossi))
    end
    
    loss_df = DataFrame(loss_scores)
    top_n_ids = sort(loss_df, :loss, rev=true)[1:N, :idx]
    return top_n_ids
end

top_n_loss (generic function with 2 methods)

In [152]:
function predict_pose(m, imgs, labels)
    
    predicted_poses = []
    for i in 1:length(imgs)
        xi = cat(getarray(imgs[i]), dims=4)
        predi = m(xi)
        classi = findmax(predi)[2]
        push!(predicted_poses, labels[classi])
    end
    
    return predicted_poses
end

predict_pose (generic function with 2 methods)

In [162]:
yan_top5 = top_n_loss(YogAlexNet, test_imgs, test_class, 10, CLASS_LABELS)
[i for i in test_imgs[yan_top5]]

In [163]:
test_class[yan_top5]

10-element Array{Any,1}:
 "cranecrow"
 "downwarddog"
 "seatedmudra"
 "standingbend"
 "warriorone"
 "sideangle"
 "yogicsquat"
 "cobraupdog"
 "wheel"
 "yogicsquat"

In [165]:
predict_pose(YogAlexNet, test_imgs[yan_top5], CLASS_LABELS)

10-element Array{Any,1}:
 "downwarddog"
 "sideangle"
 "yogicsquat"
 "halfmoon"
 "cow"
 "triangle"
 "wheel"
 "chair"
 "seatedforwardfold"
 "plough"

In [167]:
predict_pose(Yoga_ResNet18, test_imgs[yan_top5], CLASS_LABELS)

10-element Array{Any,1}:
 "cranecrow"
 "triangle"
 "yogicsquat"
 "halfmoon"
 "warriortwo"
 "triangle"
 "seatedforwardfold"
 "cobraupdog"
 "downwarddog"
 "seatedforwardfold"

In [161]:
yrn_top5 = top_n_loss(Yoga_ResNet18, test_imgs, test_class, 10, CLASS_LABELS)
[i for i in test_imgs[yrn_top5]]

In [164]:
test_class[yrn_top5]

10-element Array{Any,1}:
 "chair"
 "seatedmudra"
 "wheel"
 "chair"
 "triangle"
 "sideplank"
 "seatedmudra"
 "downwarddog"
 "downwarddog"
 "plank"

In [166]:
predict_pose(YogAlexNet, test_imgs[yrn_top5], CLASS_LABELS)

10-element Array{Any,1}:
 "mountain"
 "seatedmudra"
 "downwarddog"
 "mountain"
 "sideangle"
 "sideplank"
 "childs"
 "sideangle"
 "cranecrow"
 "sideplank"

In [168]:
predict_pose(Yoga_ResNet18, test_imgs[yrn_top5], CLASS_LABELS)

10-element Array{Any,1}:
 "mountain"
 "headstand"
 "seatedforwardfold"
 "mountain"
 "sideangle"
 "cranecrow"
 "plough"
 "triangle"
 "cranecrow"
 "halfmoon"

### YogAlexNet

[AlexNet](http://vision.stanford.edu/teaching/cs231b_spring1415/slides/alexnet_tugce_kyunghee.pdf) was one of the early breakthroughs in Convolutional Neural Network (CNN) image classifiers, winning the ImageNet challenge in 2012.  It's authors concluded that the depth of the network was very important for the model's performance, and that GPU's were making it tractable to compute those deep networks. My implementation here includes 5 convolutional layers with filter sizes 11x11, 5x5 and 3x3 leading into 3 fully connected layers outputting pose predictions.

In [16]:

YogAlexNet = Chain(                                                 # Weighted Layers:
    Conv((11,11), 3=>96, relu, stride=4, pad=SamePad()),            # Conv1
    MaxPool((3,3), stride=2),
    
    Conv((5,5), 96=>256, relu, stride=1, pad=SamePad()),            # Conv2
    MaxPool((3,3), stride=2),
    
    Conv((3,3), 256=>384, relu, stride=1, pad=SamePad()),           # Conv3
    MaxPool((3,3), stride=2),
    Conv((3,3), 384=>384, relu, stride=1, pad=SamePad()),           # Conv4
    Conv((3,3), 384=>256, relu, stride=1, pad=SamePad()),           # Conv5
    MaxPool((3,3), stride=2),
    
    x -> reshape(x, :, size(x, 4)),
    Dense(1024, 4096),                                              # FC1
    Dense(4096, 4096),                                              # FC2
    Dense(4096, 24),                                                # FC3
    softmax
)

yan_opt = Momentum()

Momentum(0.01, 0.9, IdDict{Any,Any}())

### Training the YogAlexNet

In [18]:
epochs = 90

for epoch = 1:epochs
    for d in trainset
       
        gs = gradient(params(YogAlexNet)) do
            l = loss(YogAlexNet, d...)
        end
        update!(yan_opt, params(YogAlexNet), gs)
    end
    @show accuracy(YogAlexNet, valX, valY), epoch
end

(accuracy(YogAlexNet, valX, valY), epoch) = (0.668488160291439, 1)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.819672131147541, 2)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.7978142076502732, 3)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8652094717668488, 4)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8269581056466302, 5)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8351548269581056, 6)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8724954462659381, 7)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8852459016393442, 8)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.8943533697632058, 9)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.9171220400728597, 10)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.9143897996357013, 11)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.9089253187613844, 12)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.9116575591985429, 13)
(accuracy(YogAlexNet, valX, valY), epoch) = (0.9107468123861566, 14)
(accuracy(YogAlexNet, valX, valY), epoch) = (

In [19]:
using BSON: @save

@save "YogAlexNet.bson" YogAlexNet

In [54]:
using BSON: @load

@load "YogAlexNet.bson" YogAlexNet

### Testing the YogAlexNet

In [48]:
scores, rand_ids = random_test(YogAlexNet, test_imgs, test_class, 15, CLASS_LABELS)
scores

Unnamed: 0_level_0,Pose,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10,test_11,test_12,test_13,test_14,test_15
Unnamed: 0_level_1,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any
1,bridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,cat,0.0,0.0,99.95,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,chair,0.0,0.0,0.0,100.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,childs,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,cobraupdog,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,cow,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,cranecrow,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,downwarddog,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
9,halfmoon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,headstand,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,100.0,0.0


In [49]:
[test_imgs[i] for i in rand_ids]

In [50]:
accuracy(YogAlexNet, testset[1]...)*100

93.75

In [20]:
accuracy_by_class(YogAlexNet, testset, CLASS_LABELS)

Unnamed: 0_level_0,class,accuracy,correct_count,guess_count,act_count
Unnamed: 0_level_1,String,Float64,Float64,Float64,Float64
1,bridge,92.0,46.0,51.0,50.0
2,cat,92.0,46.0,48.0,50.0
3,chair,92.0,46.0,50.0,50.0
4,childs,92.0,46.0,55.0,50.0
5,cobraupdog,88.0,44.0,50.0,50.0
6,cow,98.0,49.0,53.0,50.0
7,cranecrow,96.0,48.0,56.0,50.0
8,downwarddog,90.0,45.0,48.0,50.0
9,halfmoon,96.0,48.0,52.0,50.0
10,headstand,90.0,45.0,51.0,50.0


94% accuracy is better than I was expecting! That is a good result for the first network.  It took about 24 hours to train for 90 epochs on a signle CPU thread. I will need to find another way to train the next networks because I could be waiting a long time. 

### Yoga VGG16 Architecture

AlexNet proved deeper networks are more powerful, so how much deeper can we go? A group of researchers from the Visual Geometry Group at the University of Oxford pushed the boundaries a bit further in their [paper](https://arxiv.org/abs/1409.1556) presenting a general architecture in varied levels of depth.  This architecture makes use of multiple layers of 3x3 filters to replace the larger 5x5 and 11x11 filters seen in AlexNet.  This produces a receptive field with similar size to the larger kernel filters but with far fewer trainable parameters, improving training time and enabling deeper networks.

The paper proposed many different configurations, and the one I chose to implement was labeled D in the paper, consisting of 16 weight layers and using 3x3 filters in every convolution layer.

In [119]:

Yoga_VGG16 = Chain(
    Conv((3,3), 3=>64, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(64),
    Conv((3, 3), 64=>64, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(64),
    MaxPool((2,2)),
    
    Conv((3,3), 64=>128, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(128),
    Conv((3, 3), 128=>128, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(128),
    MaxPool((2,2)),
    
    Conv((3,3), 128=>256, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(256),
    Conv((3, 3), 256=>256, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(256),
    MaxPool((2,2)),
    
    Conv((3,3), 256=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    MaxPool((2,2)),
    
    Conv((3,3), 512=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    Conv((3, 3), 512=>512, relu, pad=(1, 1), stride=(1, 1)),
    BatchNorm(512),
    MaxPool((2,2)),
    
    x -> reshape(x, :, size(x, 4)),
    Dense(25088, 4096),
    Dropout(0.5),
    Dense(4096, 4096),
    Dropout(0.5),
    Dense(4096, 24),
    softmax)

vgg_opt = Momentum(0.001)

Momentum(0.001, 0.9, IdDict{Any,Any}())

In [28]:
using BSON: @load

@load "Yoga_VGG16.bson" Yoga_VGG16

vgg_opt = Momentum(0.001)

vgg_accuracy (generic function with 1 method)

### Training the Yoga VGG16

In [None]:
epochs = 5

for epoch = 1:epochs
    for d in trainset
       
        gs = gradient(params(Yoga_VGG16)) do
            l = loss(Yoga_VGG16, d...)
        end
        update!(vgg_opt, params(Yoga_VGG16), gs)
    end
    @show accuracy(valX, valY), epoch
end

In [34]:
using BSON: @save

@save "Yoga_VGG16.bson" Yoga_VGG16

### Testing the Yoga VGG16

In [59]:
scores, rand_ids = random_test(Yoga_VGG16, test_imgs, test_class, 15, CLASS_LABELS)

In [None]:
[test_imgs[i] for i in rand_ids]

In [113]:
accuracy(Yoga_VGG16, testset[1]...)*100

68.0

In [115]:
accuracy_by_class(Yoga_VGG16, testset, CLASS_LABELS)

Unnamed: 0_level_0,class,accuracy,correct_count,guess_count,act_count
Unnamed: 0_level_1,String,Float64,Float64,Float64,Float64
1,bridge,53.3333,8.0,13.0,15.0
2,childs,66.6667,10.0,16.0,15.0
3,downwarddog,93.3333,14.0,15.0,15.0
4,mountain,40.0,6.0,6.0,15.0
5,plank,86.6667,13.0,16.0,15.0
6,seatedforwardbend,73.3333,11.0,15.0,15.0
7,tree,93.3333,14.0,23.0,15.0
8,trianglepose,60.0,9.0,10.0,15.0
9,warrior1,73.3333,11.0,21.0,15.0
10,warrior2,73.3333,11.0,15.0,15.0


### Yoga ResNet Architecture

Next on the list is ResNet.  ResNet introduced a new technique to address an issue seen in experiments where adding more layers to a model resulted in less accuracy.  As networks went deeper they had to deal with vanishing gradients, that is during back propagation the value of the partial derivatives can become insignificant when propagaded through many layers.

ResNet's innovation came in the form of implementing Identity Shortcut and Projection Shortcut connections. Beyond that the other building blocks resemble VGG architecture.

I referenced Julia Discourse user OTapio's [post](https://discourse.julialang.org/t/a-implementation-of-resnet-18-uses-lot-of-gpu-memory/36389) to get a grasp on how to implement the ResNet in Flux.  To make things more interesting and test my understanding I extended the number of layers from 18 to 34.

![resnet_block](../reference_material/resnet_block.png)

![resnet_block2](../reference_material/resnet-blockw1x1conv.svg)

In [27]:
""" Identity Layer takes input through 2 3x3 conv layers, maintaining n number of filters"""
identity_layer(n) = Chain(
    Conv((3,3), n=>n, pad=(1,1), stride=(1,1)),
    BatchNorm(n, relu),
    Conv((3,3), n=>n, pad=(1,1), stride=(1,1)),
    BatchNorm(n, relu)
    )

""" Identity Layer takes input through 2 3x3 conv layers, outputting 2*n number of filters"""
convol_layer(n) = Chain(
    Conv((3,3), n=>2*n, pad=(1,1), stride=(2,2)),
    BatchNorm(2*n, relu),
    Conv((3,3), 2*n=>2*n, pad=(1,1), stride=(1,1)),
    BatchNorm(2*n, relu)
    )
""" This is used for downsampling when making shortcut connections between layers with equal
    feature map depth."""
simple_convol(n) = Chain(
    Conv((1,1), n=>n, pad=(1,1), stride=(2,2)),
    BatchNorm(n, relu)
    )
""" This is used for downsampling when making shortcut connections between layers with different
    feature map depth. """
m_filter(n) = Chain(
    Conv((3,3), n=>2*n, pad=(1,1), stride=(2,2)),
    BatchNorm(2*n, relu)
    )

""" Container for filter to adjust inputs with different dimensions. """
struct DifDimShortcut
    filter::Chain
end

""" Sets the number of channels for combination of two blocks. """
DifDimShortcut(n) = DifDimShortcut(m_filter(n))

""" Defining function for passing two inputs to DifDimShortcut to Flux.SkipConnection. """
function (sc::DifDimShortcut)(x,y)
    z = sc.filter(y)
    return x + z
end

DifDimShortcut

In [28]:
Yoga_ResNet18 = Chain(
    
    Conv((7,7), 3=>64, pad=(3, 3), stride=(2, 2)),
    BatchNorm(64, relu),
    MaxPool((3,3), pad = (1,1), stride=(2,2)),
    
    SkipConnection(identity_layer(64), +),
    SkipConnection(identity_layer(64), +),
    
    SkipConnection(convol_layer(64), DifDimShortcut(64)),
    
    SkipConnection(identity_layer(128), +),
    
    SkipConnection(convol_layer(128), DifDimShortcut(128)),
    
    SkipConnection(identity_layer(256), +),
    
    SkipConnection(convol_layer(256), DifDimShortcut(256)),
    
    SkipConnection(identity_layer(512), +),
    MeanPool((7,7)),
    
    x -> reshape(x, :, size(x, 4)),
    Dense(512, 24),
    softmax
    )
    


yrn_18_opt = Momentum(0.01)


Momentum(0.01, 0.9, IdDict{Any,Any}())

In [None]:
using BSON: @load

@load "Yoga_ResNet18.bson" Yoga_ResNet18


yrn_18_opt = Momentum(0.01)


In [29]:
Yoga_ResNet34 = Chain(
    
    Conv((7,7), 3=>64, pad=(3, 3), stride=(2, 2)),
    BatchNorm(64, relu),
    MaxPool((3,3), pad = (1,1), stride=(2,2)),
    
    SkipConnection(identity_layer(64), +),
    SkipConnection(identity_layer(64), +),
    SkipConnection(identity_layer(64), +),
    SkipConnection(convol_layer(64), DifDimShortcut(64)),
    
    SkipConnection(identity_layer(128), +),
    SkipConnection(identity_layer(128), +),
    SkipConnection(identity_layer(128), +),
    SkipConnection(convol_layer(128), DifDimShortcut(128)),
    
    SkipConnection(identity_layer(256), +),
    SkipConnection(identity_layer(256), +),
    SkipConnection(identity_layer(256), +),
    SkipConnection(identity_layer(256), +),
    SkipConnection(identity_layer(256), +),
    SkipConnection(convol_layer(256), DifDimShortcut(256)),
    
    SkipConnection(identity_layer(512), +),
    SkipConnection(identity_layer(512), +),
    MeanPool((7,7)),
    
    x -> reshape(x, :, size(x, 4)),
    Dense(512, 24),
    softmax
    
    )
    


yrn_34_opt = Momentum(0.01)


LoadError: UndefVarError: Combinator not defined

In [28]:
using BSON: @load

@load "Yoga_ResNet34.bson" Yoga_ResNet34


yrn_34_opt = Momentum()


vgg_accuracy (generic function with 1 method)

### Training the Yoga ResNet

In [30]:
epochs = 25

for epoch = 1:epochs
    for d in trainset
       
        gs = gradient(params(Yoga_ResNet18)) do
            l = loss(Yoga_ResNet18, d...)
        end
        update!(yrn_18_opt, params(Yoga_ResNet18), gs)
    end
    @show accuracy(Yoga_ResNet18, valX, valY), epoch
end

(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.25045537340619306, 1)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.3561020036429873, 2)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.29143897996357016, 3)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.6275045537340619, 4)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.6193078324225865, 5)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.7987249544626593, 6)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.7632058287795993, 7)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.9134790528233151, 8)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.9462659380692168, 9)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.953551912568306, 10)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.9526411657559198, 11)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.9544626593806922, 12)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.9544626593806922, 13)
(accuracy(Yoga_ResNet18, valX, valY), epoch) = (0.953551912568306, 14)
(

In [31]:
using BSON: @save

@save "Yoga_ResNet18.bson" Yoga_ResNet18


In [101]:
epochs = 5

for epoch = 1:epochs
    for d in train
       
        gs = gradient(params(Yoga_ResNet34)) do
            l = yrn_34_loss(d...)
        end
        update!(yrn_34_opt, params(Yoga_ResNet34), gs)
    end
    @show yrn_34_accuracy(valX, valY), epoch
end

(yrn_34_accuracy(valX, valY), epoch) = (0.86, 1)
(yrn_34_accuracy(valX, valY), epoch) = (0.84, 2)
(yrn_34_accuracy(valX, valY), epoch) = (0.83, 3)
(yrn_34_accuracy(valX, valY), epoch) = (0.85, 4)
(yrn_34_accuracy(valX, valY), epoch) = (0.84, 5)


In [96]:
using BSON: @save

@save "Yoga_ResNet34.bson" Yoga_ResNet34

### Testing the Yoga ResNet18

In [32]:
scores, rand_ids = random_test(Yoga_ResNet18, test_imgs, test_class, 15, CLASS_LABELS)

(24×16 DataFrame
│ Row │ Pose              │ test_1 │ test_2 │ test_3 │ test_4 │ test_5 │ test_6 │ test_7 │ test_8 │ test_9 │ test_10 │ test_11 │ test_12 │ test_13 │ test_14 │ test_15 │
│     │ [90mAny[39m               │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │
├─────┼───────────────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ bridge            │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0     │ 0.0     │ 0.0     │ 0.0     │ 0.0     │ 0.0     │
│ 2   │ cat               │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.0    │ 0.01   │ 0.0    │ 0.0    │ 0.02    │ 0.0     │ 0.0     │ 0.0     

In [33]:
scores

Unnamed: 0_level_0,Pose,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10,test_11,test_12,test_13,test_14,test_15
Unnamed: 0_level_1,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any
1,bridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,cat,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0
3,chair,0.01,0.13,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01
4,childs,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,cobraupdog,0.0,0.02,0.0,99.99,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
6,cow,0.0,0.02,0.0,0.0,0.02,0.0,99.97,0.0,0.0,99.97,0.0,0.0,0.0,0.0,0.0
7,cranecrow,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,downwarddog,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0
9,halfmoon,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,headstand,0.0,89.05,0.0,0.0,0.0,0.0,0.0,99.89,5.9,0.0,0.0,0.0,0.0,0.0,0.0


In [34]:
[test_imgs[i] for i in rand_ids]

In [35]:
accuracy(Yoga_ResNet18, testset[1]...)*100

89.0625

In [36]:
accuracy_by_class(Yoga_ResNet18, testset, CLASS_LABELS)

Unnamed: 0_level_0,class,accuracy,correct_count,guess_count,act_count
Unnamed: 0_level_1,String,Float64,Float64,Float64,Float64
1,bridge,90.0,45.0,52.0,50.0
2,cat,92.0,46.0,47.0,50.0
3,chair,86.0,43.0,47.0,50.0
4,childs,88.0,44.0,51.0,50.0
5,cobraupdog,84.0,42.0,48.0,50.0
6,cow,96.0,48.0,52.0,50.0
7,cranecrow,98.0,49.0,56.0,50.0
8,downwarddog,96.0,48.0,50.0,50.0
9,halfmoon,94.0,47.0,54.0,50.0
10,headstand,90.0,45.0,47.0,50.0


### Testing the Yoga ResNet34

In [111]:
ids = rand(1:length(test_imgs), RAND_TEST_SIZE)
[test_imgs[i] for i in ids]

In [112]:
rand_test = getarray.(test_imgs[ids])
rand_test = cat(rand_test..., dims = 4)
rand_truth = test_class[ids]
preds_test = Yoga_ResNet34(rand_test)
DataFrame([class_labels round.(preds_test)], ["Pose", "Test1", "Test2", "Test3", "Test4", "Test5", "Test6", "Test7", "Test8", "Test9", "Test10"])

Unnamed: 0_level_0,Pose,Test1,Test2,Test3,Test4,Test5,Test6,Test7,Test8,Test9,Test10
Unnamed: 0_level_1,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any,Any
1,bridge,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
2,childs,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,downwarddog,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,mountain,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
5,plank,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
6,seatedforwardbend,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
7,tree,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,trianglepose,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
9,warrior1,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
10,warrior2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [109]:
#Accuracy on Test Set:
yrn_34_accuracy(test[1]...)*100

50.0

In [110]:
#Accuracy by Class
class_correct = zeros(10)
class_total = zeros(10)
for i in 1:size(test)[1]
    tpreds = Yoga_ResNet34(test[i][1])
    tlab = test[i][2]
    for j = 1:size(tpreds)[2]
            pred_class = findmax(tpreds[:, j])[2]
            actual_class = findmax(tlab[:, j])[2]
            if pred_class == actual_class
                class_correct[pred_class] += 1
            end
            class_total[actual_class] += 1
    end
end

class_acc = class_correct ./ class_total
DataFrame([(class=class_labels[i], accuracy=class_acc[i]*100, count=class_total[i]) for i in 1:10])

Unnamed: 0_level_0,class,accuracy,count
Unnamed: 0_level_1,String,Float64,Float64
1,bridge,46.6667,15.0
2,childs,53.3333,15.0
3,downwarddog,53.3333,15.0
4,mountain,53.3333,15.0
5,plank,80.0,15.0
6,seatedforwardbend,20.0,15.0
7,tree,86.6667,15.0
8,trianglepose,73.3333,15.0
9,warrior1,60.0,15.0
10,warrior2,80.0,15.0


### Yoga Inception Architecture

To conclude this tour through famous CNN architectures I will implement a GoogLeNet style Inception network.  Incpetion blocks are a group of different network layers that try several different filter sizes on the same input then concatenates all the outputs from the individual layers into a tensor that gets passed to the next layer.

I built upon the approach taken in the ResNet network above where function constructors are used to build differnt layer chains. I needed additional help from the Julia Discourse, found [here](https://discourse.julialang.org/t/flux-params-does-not-recognize-parameters-with-x-layer-x-syntax/46832/4), to get the concatenation operation I use below. Additionally I referenced this [tutorial](https://www.youtube.com/watch?v=uQc4Fs7yx5I) on how to implement the inception block in PyTorch and adapted what I felt made sense to Julia here.

![inception_block](../reference_material/inception_block.png)
![inception_block](../reference_material/inception_blockv1.png)

In [187]:
struct InceptionBlock
    path1::Chain
    path2::Chain
    path3::Chain
    path4::Chain
end

Flux.@functor InceptionBlock

function InceptionBlock(chn_in, out_1, red_3, out_3, red_5, out_5, pool_out) 
    path1 = Chain(
                    Conv((1,1), chn_in=>out_1),
                    BatchNorm(out_1, relu)
                )
    
    path2 = Chain(
                    Conv((1,1), chn_in=>red_3),
                    BatchNorm(red_3, relu),
                    Conv((3,3), red_3=>out_3, pad=(1,1)),
                    BatchNorm(out_3, relu)
                )
    
    path3 = Chain(
                    Conv((1,1), chn_in=>red_5),
                    BatchNorm(red_5, relu),
                    Conv((5,5), red_5=>out_5, pad=(2,2)),
                    BatchNorm(out_5, relu)
                )
    
    path4 = Chain(
                    MaxPool((3,3), stride=(1,1), pad=(1,1)),
                    Conv((1,1), chn_in=>pool_out, pad=SamePad()),
                    BatchNorm(pool_out, relu)
                )
    
    InceptionBlock(path1, path2, path3, path4)
end

function (m::InceptionBlock)(x)
    cat(m.path1(x), m.path2(x), m.path3(x), m.path4(x), dims = 3)
end

In [188]:
YogLeNetv1 = Chain(
    #Input size = 3x224x224
    
    # Stage 1:
    Conv((7,7), 3=>64, pad=(3, 3), stride=(2, 2)),
    BatchNorm(64, relu),
    MaxPool((3,3), pad = (1,1), stride=(2,2)),
    
    # Stage 2:
    Conv((1,1), 64=>64, relu),
    Conv((3,3), 64=>192, relu),
    MaxPool((3,3), pad=(2,2), stride=(2,2)),
    
    #Output size = 192x28x28
    
    # Stage 3:
    InceptionBlock(192, 64, 96, 128, 16, 32, 32),
    InceptionBlock(256, 128, 128, 192, 32, 96, 64),
    MaxPool((3,3)),
    
    #Output size = 480x14x14
    
    # Stage 4:
    InceptionBlock(480, 192, 96, 208, 16, 48, 64),
    InceptionBlock(512, 160, 112, 224, 24, 64, 64),
    InceptionBlock(512, 128, 128, 256, 24, 64, 64),
    InceptionBlock(512, 112, 144, 288, 32, 64, 64),
    InceptionBlock(528, 256, 160, 320, 32, 128, 128),
    MaxPool((3,3), pad=(2,2), stride=(2,2)),
    
    #output size = 832x7x7
    
    # Stage 5:
    InceptionBlock(832, 256, 160, 320, 32, 128, 128),
    InceptionBlock(832, 384, 192, 384, 48, 128, 128),
    GlobalMeanPool(),
    x -> reshape(x, :, size(x, 4)),
    
    #Output size = 1024x1x1
    Dropout(0.4),
    Dense(1024,24),
    softmax
    )
    

ygn_v1_opt = Momentum()


Momentum(0.01, 0.9, IdDict{Any,Any}())

In [186]:
v = Chain(
                    MaxPool((3,3), stride=(1,1), pad=(1,1)),
                    Conv((1,1), 192=>64, pad=SamePad()),
                    #BatchNorm(pool_out, relu)
                )

Flux.outdims(v, (28, 28))

(28, 28)

### Training the YogLeNet

In [189]:
epochs = 30

for epoch = 1:epochs
    for d in trainset
       
        gs = gradient(params(YogLeNetv1)) do
            l = loss(YogLeNetv1, d...)
        end
        update!(ygn_v1_opt, params(YogLeNetv1), gs)
    end
    @show accuracy(YogLeNetv1, valX, valY), epoch
end

(accuracy(YogLeNetv1, valX, valY), epoch) = (0.1703096539162113, 1)


LoadError: InterruptException:

In [23]:
using BSON: @save

@save "YogaLeNetv1.bson" YogLeNetv1

### Testing the YogLeNet

In [49]:
scores, rand_ids = random_test(YogLeNetv1, test_imgs, test_class, 15, CLASS_LABELS)

(10×16 DataFrame
│ Row │ Pose              │ test_1 │ test_2 │ test_3 │ test_4 │ test_5 │ test_6 │ test_7 │ test_8 │ test_9 │ test_10 │ test_11 │ test_12 │ test_13 │ test_14 │ test_15 │
│     │ [90mAny[39m               │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m    │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │ [90mAny[39m     │
├─────┼───────────────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ 1   │ bridge            │ 0.01   │ 0.09   │ 0.16   │ 0.63   │ 0.88   │ 15.52  │ 10.41  │ 0.11   │ 0.0    │ 0.02    │ 0.06    │ 0.1     │ 0.0     │ 0.11    │ 14.19   │
│ 2   │ childs            │ 0.0    │ 99.07  │ 0.0    │ 2.95   │ 0.04   │ 0.0    │ 20.79  │ 0.01   │ 0.0    │ 0.0     │ 0.0     │ 0.0     │ 0.0     

In [50]:
[test_imgs[i] for i in rand_ids]

In [51]:
accuracy(YogLeNetv1, testset[1]...)*100

62.5

In [52]:
accuracy_by_class(YogLeNetv1, testset, CLASS_LABELS)

Unnamed: 0_level_0,class,accuracy,correct_count,guess_count,act_count
Unnamed: 0_level_1,String,Float64,Float64,Float64,Float64
1,bridge,53.8462,7.0,9.0,13.0
2,childs,40.0,6.0,10.0,15.0
3,downwarddog,46.6667,7.0,10.0,15.0
4,mountain,53.3333,8.0,9.0,15.0
5,plank,73.3333,11.0,16.0,15.0
6,seatedforwardbend,46.6667,7.0,33.0,15.0
7,tree,66.6667,10.0,14.0,15.0
8,trianglepose,73.3333,11.0,14.0,15.0
9,warrior1,40.0,6.0,15.0,15.0
10,warrior2,60.0,9.0,18.0,15.0


### Next Steps:

* Add narrative, notes, and comments
* Create helper functions for plotting training, plotting predictions
* Model Comparison summary
* Deploy best image classifer in a web app
* Add more poses/classes to train and test sets
* Add more images to training set
* Pose Estimation

## Old code Snipetts

In [8]:
train_imgs = []
train_class = []

#Image augmentation operations:
aug_pl = ShearX(-5:5) * ShearY(-5:5) |> 
        Zoom([0.85, 0.9, 1, 1.1, 1.2, 1.3]) |> 
        Rotate(-10:10) |> FlipX() * NoOp()

for (root, dirs, files) in walkdir(TRAIN_DIR)
    
    for file in files
        img = load(joinpath(root, file))
        img_rs = imresize(img, IMG_SIZE)
        img_aug = imresize(augment(img_rs, aug_pl), IMG_SIZE)
        push!(train_imgs, img_rs)
        push!(train_class, basename(root))
        push!(train_imgs, img_aug)
        push!(train_class, basename(root))
    end
    
end

imgs = [getarray(train_imgs[i]) for i in 1:length(train_imgs)]
train_labels = onehotbatch([i for i in train_class],unique(train_class));

In [10]:
train =  ([(cat(img_shuffled[i]..., dims = 4), labels_shuffled[:,i]) for i in partition(1:900, 100)])
valset = 901:1000
valX = cat(img_shuffled[valset]..., dims = 4)
valY = labels_shuffled[:, valset];