## DCGAN with TensorFlow 2.0 on SageMaker
### There are many TensorFlow tutorials, and there are many guides on how to use SageMaker.
### However, there are very few guides on how to implement TensorFlow 2.0 in SageMaker!
This notebook is meant to help those trying to run their TensorFlow 2.0 neural networks in SageMaker.

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.ipynb

The link above from AWS gives a thorough guide to deploying a custom algorithm. Its code served as a rough guideline for this project, although many changes were made.

First, I used a DCGAN built with TensorFlow 2.0. SageMaker uses Docker for training and deploying neural networks. For convenience, AWS provides official Docker images for TensorFlow algorithms; however, at the time of writing, the most recent release was compatible with TensorFlow 1.14.0. This meant that I had to find another image and install all the relevant dependencies.

The image used is tensorflow/tensorflow:nightly-custom-op-gpu-ubuntu16-cuda10.0.
Perhaps there is a better one out there, but this one seemed to work best. 

For the DCGAN itself, TensorFlow provides a very good starting point:

https://www.tensorflow.org/tutorials/generative/dcgan


However, because I wanted to train my network on the CelebA dataset rather than MNIST, I needed a larger neural network. I used the network structure from the following tutorial:

https://github.com/skywall34/BabyGan/blob/master/DCGAN_CelebA.ipynb


The image set has about 200,000 images.
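
To see why CelebA calls for a larger generator than the MNIST tutorial, it helps to count upsampling layers. The sketch below assumes `Conv2DTranspose` layers with `padding='same'`, where each layer multiplies the spatial size by its stride. The MNIST layout (7, 7, 14, 28) follows the TensorFlow tutorial; the 64x64 layout starting from 4x4 is a common DCGAN choice and an assumption here, not necessarily the exact structure of the linked notebook. The helper name `upsample_sizes` is hypothetical.

```python
def upsample_sizes(start, strides):
    """Return the spatial size after each transposed-convolution layer.

    Assumes padding='same', so each layer multiplies the size by its stride.
    """
    sizes = [start]
    for stride in strides:
        sizes.append(sizes[-1] * stride)
    return sizes


# MNIST tutorial generator: 7x7 feature map, three layers, 28x28 output.
print(upsample_sizes(7, [1, 2, 2]))     # [7, 7, 14, 28]

# Assumed CelebA layout: 4x4 feature map, four stride-2 layers, 64x64 output.
print(upsample_sizes(4, [2, 2, 2, 2]))  # [4, 8, 16, 32, 64]
```

Under these assumptions, the 64x64 target needs at least one more upsampling layer than MNIST, and each layer typically carries more filters to handle three color channels.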

In [None]:
!pip install -q imageio

In [None]:
import boto3
import imageio
import glob
import matplotlib.pyplot as plt
import numpy as np
import os
from PIL import Image
import sagemaker as sage
import sys
import tarfile
import time
import tensorflow as tf


from data_pipeline import download_unzip_upload
from IPython import display
from sagemaker.estimator import Estimator 
from sagemaker import get_execution_role
role = get_execution_role()
sess = sage.Session()




## Data and Preprocessing

### celebA image set
This is one of the more commonly used image sets. You can also use other datasets, such as CIFAR, or your own collection. The code at the link below is used to automatically download the CelebA dataset:

https://gist.github.com/charlesreid1/4f3d676b33b95fce83af08e4ec261822

In addition, we unzip the images, resize them to 64x64, combine them into a single TFRecord file, and upload that file to S3.

### Important:
The TFRecord conversion requires TensorFlow 2.0 or higher. I haven't found a simple way to upgrade SageMaker notebook instances themselves to TensorFlow 2.0, so the data processing should be run either outside the notebook instance (i.e. on your local machine) or in an isolated environment (i.e. a Docker container).
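
Before running the data pipeline, a quick guard like the one below can catch an environment that is still on TensorFlow 1.x. This is only a sketch: the helper name `supports_tf2` is hypothetical, and it inspects just the major version of a string such as `tf.__version__`.

```python
def supports_tf2(version):
    """Return True if a TensorFlow version string is 2.0 or newer."""
    major = int(version.split(".")[0])
    return major >= 2


# Example version strings; in the notebook you would pass tf.__version__.
for candidate in ["1.14.0", "2.0.0", "2.1.0-dev20191015"]:
    print(candidate, supports_tf2(candidate))
```

In the notebook this could be used as `assert supports_tf2(tf.__version__)` right before the data preparation cells.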

In [None]:
# Name of your AWS S3 bucket
bucket_name = "[BUCKET_NAME]"
# ID of the CelebA zip file, as expected by the download script linked above
file_id = '0B7EVK8r0v71pZjFTYXZWM3FlRnM'
# Local filename for the downloaded zip (any name you prefer)
destination = "celebA.zip"
# Output filename, written in TFRecord format
output_file = "train.tfrecords"

In [None]:
prepare_records = download_unzip_upload(file_id, destination, output_file, bucket_name)

In [None]:
prepare_records.run_all()

## Building and pushing the Docker image

### Below we read from our Dockerfile.

For more information please refer to:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/tensorflow_bring_your_own/tensorflow_bring_your_own.ipynb

Or:

https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-containers.html


In [None]:
!cat container/Dockerfile

### The following code builds our Docker image and pushes it to ECR, where SageMaker can pull it when we actually run our algorithm.

In [None]:
%%sh

# The name of our algorithm
algorithm_name=dcgan-dogs

cd container

chmod +x train

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

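# Get the login command from ECR and execute it directly.
# Note: `aws ecr get-login` exists only in AWS CLI v1. With AWS CLI v2, use
# `aws ecr get-login-password` piped into `docker login --password-stdin` instead.
$(aws ecr get-login --region ${region} --no-include-email)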

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build  -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

## Running our neural network

### Here we define the last variables and parameters needed before we can finally run our DCGAN

In [None]:
#Locations where we get our data, and where we output our model and images
data_key1 = 'work_folder/output'
data_key2 = 'work_folder/train.tfrecords'
output_location = 's3://{}/{}'.format(bucket_name, data_key1)
data_location = 's3://{}/{}'.format(bucket_name, data_key2)

In [None]:
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/dcgan-dogs:latest'.format(account, region)

In [None]:
hyperparameters = {'epochs': 1}
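
Inside the training container, SageMaker makes these hyperparameters available as a JSON file at `/opt/ml/input/config/hyperparameters.json`, and values passed through a generic `Estimator` typically arrive as strings. A minimal sketch of how a `train` script might read `epochs` is shown below; the function name `load_epochs` and the default value are assumptions, not the code actually used in this project's container.

```python
import json

# Default location of the hyperparameters file inside a SageMaker training container.
HYPERPARAMETERS_PATH = "/opt/ml/input/config/hyperparameters.json"


def load_epochs(path=HYPERPARAMETERS_PATH, default=1):
    """Read the 'epochs' hyperparameter and convert it to an int.

    Values may arrive as strings (e.g. "1"), so an explicit cast is needed.
    """
    with open(path) as f:
        params = json.load(f)
    return int(params.get("epochs", default))
```

The explicit `int(...)` cast matters: using the raw string in `range(epochs)` would raise a `TypeError`.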

### Here we choose an ml.p2.xlarge instance. At the time of writing, this was the cheapest GPU instance available for training on SageMaker.

Training on this instance will incur charges.

In [None]:


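# These argument names follow SageMaker Python SDK v1. In SDK v2 they were
# renamed to image_uri, instance_count, and instance_type.
# To test in local mode instead, set train_instance_type='local'.
estimator = Estimator(image_name=image,
                      hyperparameters=hyperparameters,
                      role=role,
                      output_path=output_location,
                      train_instance_count=1,
                      train_instance_type='ml.p2.xlarge')

estimator.fit(data_location)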
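
When `fit` is given a single S3 URI, SageMaker uses a default channel named `training` and, in File mode, downloads the data to `/opt/ml/input/data/training` inside the container. Anything the script writes to `/opt/ml/model` is packaged as `model.tar.gz` under `output_path`. The sketch below shows how a training script might resolve these paths; the helper names are hypothetical and the `prefix` parameter exists only so the logic can be exercised outside a container.

```python
from pathlib import Path


def container_paths(prefix="/opt/ml", channel="training"):
    """Return the input and model directories used by a SageMaker training container."""
    root = Path(prefix)
    return {
        "input": root / "input" / "data" / channel,  # data downloaded from S3
        "model": root / "model",                      # contents end up in model.tar.gz
    }


def find_tfrecords(input_dir):
    """List the .tfrecords files in the input channel directory, sorted by name."""
    return sorted(Path(input_dir).glob("*.tfrecords"))


paths = container_paths()
print(paths["input"].as_posix())  # /opt/ml/input/data/training
print(paths["model"].as_posix())  # /opt/ml/model
```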

## Let's see the results!

In [None]:
#First, let's look in our output_location

In [None]:

s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

In [None]:
for obj in bucket.objects.filter(Prefix=data_key1):
    key = obj.key
    print(key)

In [None]:
# Copy the key of the most recently added file and paste it below to download and untar it.
# Note that the /tmp folder is cleared on restart.
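
Instead of copying the key by hand, the newest archive can be picked programmatically. The sketch below relies only on the `key` and `last_modified` attributes that boto3 object summaries expose; the helper name `latest_model_key` is hypothetical, and the stand-in objects and keys in the example are purely illustrative.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def latest_model_key(objects, suffix="model.tar.gz"):
    """Return the key of the most recently modified object ending in `suffix`, or None."""
    candidates = [obj for obj in objects if obj.key.endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda obj: obj.last_modified).key


# Stand-ins for boto3 object summaries (illustrative keys and timestamps).
listing = [
    SimpleNamespace(key="work_folder/output/job-a/output/model.tar.gz",
                    last_modified=datetime(2020, 1, 1, tzinfo=timezone.utc)),
    SimpleNamespace(key="work_folder/output/job-b/output/model.tar.gz",
                    last_modified=datetime(2020, 1, 2, tzinfo=timezone.utc)),
]
print(latest_model_key(listing))  # work_folder/output/job-b/output/model.tar.gz
```

In the notebook, the same function could be called as `latest_model_key(bucket.objects.filter(Prefix=data_key1))` and its result passed to `download_file`.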

In [None]:
s3a = boto3.client('s3')
s3a.download_file(bucket_name, 'path/to/model/output/model.tar.gz', '/tmp/results.tar.gz')

In [None]:
tar = tarfile.open("/tmp/results.tar.gz")

In [None]:
tar.extractall(path="output")

In [None]:
tar.close()
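
The open, extract, and close steps above can also be written as a single function with a context manager, which closes the archive even if extraction fails. This is a sketch with a hypothetical helper name; it assumes the archive is the trusted `model.tar.gz` produced by our own training job.

```python
import tarfile


def extract_archive(archive_path, dest="output"):
    """Extract a tar archive into `dest` and return the names of its members."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(path=dest)
        return tar.getnames()
```

For example, `extract_archive("/tmp/results.tar.gz")` would unpack the results into the `output` folder and return the list of extracted paths.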

In [None]:
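anim_file = 'dcgan.gif'

with imageio.get_writer(anim_file, mode='I') as writer:
    filenames = sorted(glob.glob('output/images/000image*.png'))
    last = -1
    for i, filename in enumerate(filenames):
        # Keep progressively fewer frames: the frame counter grows as 2*sqrt(i)
        frame = 2 * (i ** 0.5)
        if round(frame) > round(last):
            last = frame
        else:
            continue
        # Use a separate name so the ECR `image` variable is not overwritten
        frame_image = imageio.imread(filename)
        writer.append_data(frame_image)
    # Append the last generated image once more so the GIF ends on it
    if filenames:
        writer.append_data(imageio.imread(filenames[-1]))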
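
The frame-skipping rule in the loop above is easier to follow in isolation. The sketch below reproduces only the index selection (no image I/O), using a hypothetical helper `select_frames`: an image at index `i` is kept whenever the rounded value of `2 * sqrt(i)` exceeds the rounded value for the last kept image, so early images are sampled densely and later ones more sparsely.

```python
def select_frames(n):
    """Return the indices of the images that the GIF loop keeps, out of n images."""
    kept = []
    last = -1
    for i in range(n):
        frame = 2 * (i ** 0.5)
        if round(frame) > round(last):
            last = frame
            kept.append(i)
    return kept


print(select_frames(10))  # [0, 1, 2, 4, 6, 8]
```

The GIF cell additionally appends the final image once after the loop, so the animation always ends on the most recent output.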


In [None]:
from IPython import display
display.Image(filename=anim_file)

## Final Thoughts:

### Training actually seemed quicker when run on Google Colab
With this network, I'd recommend just running it on Google Colab. Most of the code would be the same, and you would not incur any extra costs.

### So why bother on Sagemaker?
Google Colab or a local machine is a perfectly fine option for training models. However, when datasets become too large or we want results much faster, the cloud is where we go. AWS is great because of its scalability: S3 offers a large datastore, and we can scale up GPU power by using more powerful instances. However, this does require modifying the provided code to accommodate multi-GPU utilization.

### Final Takeaway:
This code is great for learning about one of the most exciting neural networks around, the DCGAN, while also becoming more familiar with deploying custom algorithms in the cloud. The next iteration of this project will run this same code on multiple GPUs, as it doesn't make sense to pay for a single GPU when you can already use one locally or on Colab and train in a comparable amount of time.