
# Compression with Constraints: Steganography

Natural images are compressible: a full-size image of several megabytes can usually be reduced to the kilobyte range without losing much important information. This property is widely used in techniques such as denoising and deblurring.

[``Steganography``](https://en.wikipedia.org/wiki/Steganography) is a topic closely related to cryptography. It is the practice of concealing a file, message, image, or video within another file, message, image, or video.

The advantage of steganography over cryptography alone is that the intended secret message ***does not attract attention to itself as an object of scrutiny***. Plainly visible encrypted messages, no matter how unbreakable they are, arouse interest and may in themselves be incriminating in countries in which encryption is illegal. 

Whereas cryptography is the practice of protecting the contents of a message alone, steganography is concerned both with concealing the fact that a secret message is being sent and its contents.

Steganography includes the concealment of information within computer files. In digital steganography, electronic communications may include steganographic coding inside of a transport layer, such as a document file, image file, program or protocol. Media files are ideal for steganographic transmission because of their large size. For example, a sender might start with an innocuous image file and adjust the color of every hundredth pixel to correspond to a letter in the alphabet. The change is so subtle that someone who is not specifically looking for it is unlikely to notice the change. 

In this project, we deal with a special case: steganography with images only. Unlike many practical scenarios, such as hiding text or documents inside images, this task may not allow perfect recovery of the hidden information.

## Purpose of the project

The project is not meant to create a powerful technique for concealing information; its goal is to get you used to image processing libraries and optimization techniques. It also poses a challenge: how to detect ``steganographic`` images without access to the original images.

## Mathematical aspects

Steganography has two important components: encryption and decryption.

Suppose you have an original image and a secret image.


1.   Encryption: As the sender, your task is to merge the original image and the secret image into one image. The purpose is twofold. First, the outcome must be an image, and if it is far from the original image it will attract other people's attention, which counts as a failure. Second, the outcome image must also convey the information of the secret image. This is not simple, since blending the two images changes both of them. The question is how much change we can afford.
2.   Decryption: After encryption, the outcome image carries information from both the original image and the secret image. As the receiver, your task is to invert the encryption process and recover the secret image as well as possible (caution: the receiver does not care about the original image).

To represent this process mathematically, let $x$ be the original image and $y$ the secret image. Then $z = E(x, y)$ is the encrypted image, where $E$ is the encryption function. You will try to minimize
$$\|z - x\|$$
The norm is left unspecified here; it is discussed later (see the Metric section). This minimization is subject to another constraint: the decryption function $D$ must recover sufficient information about the secret image. That is,
$$\|D(z) - y\|$$
should be as small as possible.

It is possible to construct a unified objective function:
$$\min_{E, D} \|x - E(x,y)\| + \gamma \|y - D(z)\|$$
where $E$ and $D$ are the unknowns to find, $z = E(x, y)$, and $\gamma$ is a weight of your choice.

Of course, there are other constraints coming from the images themselves. An image consists of pixels; each pixel contains 3 channels (R, G, B), and each channel is an 8-bit integer ranging from 0 to 255. If the image has 4 channels (RGBA), it provides more information. Therefore the optimization problem above also has the constraint that $E(x,y)$ and $D(z)$ must be images.
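
Putting the pieces together, the whole problem can be written as a single constrained program. This is only a restatement of the formulas above, under two assumptions that the text does not fix: the images are $N\times N$ with 3 channels, and the same norm is used in both terms.

$$\min_{E, D}\ \|x - E(x,y)\| + \gamma\,\|y - D(E(x,y))\| \quad \text{subject to} \quad E(x,y),\ D(E(x,y)) \in \{0, 1, \dots, 255\}^{N\times N\times 3}$$

A larger $\gamma$ favors recovery of the secret image, while a smaller $\gamma$ favors keeping $z = E(x,y)$ close to $x$.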



## Algorithm 101, LSB

LSB stands for least significant bits: you replace the least significant bits of the original image with the most significant bits of the secret image. This method discards some information from both images, but it performs acceptably in general cases.

Here are a few references on this simple algorithm: 

0.   https://towardsdatascience.com/steganography-hiding-an-image-inside-another-77ca66b2acb1, the code is [here](https://github.com/kelvins/steganography)
1.   https://github.com/RobinDavid/LSB-Steganography 
2.   https://pdfs.semanticscholar.org/3dce/b6307cee042b687b7f377ec1d5de91ce20b0.pdf
3.   https://hackernoon.com/simple-image-steganography-in-python-18c7b534854f

The basic idea is as follows (suppose you have code that turns an 8-bit integer into a binary string). Inside each channel, say R, suppose the original image's pixel is ``1001,0011`` and the secret image's corresponding pixel is ``1110,1101``. Replace the last 4 bits of the original pixel with the first 4 bits of the secret pixel; the resulting number is ``1001,1110``. In this way, the change in the original image is small (at most 15 out of 255 per channel). There are other ways to alter the LSB, such as treating the secret image as a binary string and distributing it evenly over the pixels. ***In our case, for simplicity, we only consider images of the same size.***
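
The worked example above can be checked with plain bit operations instead of binary strings. The following is a minimal sketch; the helper names `merge_byte` and `unmerge_byte` and the `n_bits` parameter are illustrative assumptions, not part of any referenced implementation.

```python
def merge_byte(cover: int, secret: int, n_bits: int = 4) -> int:
    """Keep the top (8 - n_bits) bits of cover and store the top n_bits of secret below them."""
    keep_mask = (0xFF << n_bits) & 0xFF      # 0b11110000 for n_bits = 4
    return (cover & keep_mask) | (secret >> (8 - n_bits))


def unmerge_byte(merged: int, n_bits: int = 4) -> int:
    """Recover the stored top n_bits of the secret; the lost low bits are set to zero."""
    low_mask = (1 << n_bits) - 1             # 0b00001111 for n_bits = 4
    return (merged & low_mask) << (8 - n_bits)


cover, secret = 0b10010011, 0b11101101
merged = merge_byte(cover, secret)
print(f"{merged:08b}")                 # 10011110
print(f"{unmerge_byte(merged):08b}")   # 11100000
```

The receiver gets ``1110,0000`` rather than ``1110,1101``: the low 4 bits of the secret pixel are lost, which is why recovery is not perfect.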

## Shortcomings 

The main shortcoming of these methods is detectability: it is easy to tell that the outcome image is not quite right.

In practice, if you look at an altered image, say one produced by LSB, you will not notice anything. But LSB has an obvious drawback: it alters the last bits, which may distort their statistics. In theory, the last bits of a natural image should follow a certain empirical distribution of 0s and 1s, and embedding changes it.
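
To make the statistical argument concrete, here is a rough sketch that measures the fraction of 1s in each bit plane before and after a 4-bit LSB embedding. It is a crude illustration, not a real detector: the function name `bit_plane_ones_fraction` is made up for this example, the cover is random noise standing in for a natural image, and the all-white secret is a deliberately extreme case.

```python
import numpy as np


def bit_plane_ones_fraction(img: np.ndarray) -> np.ndarray:
    """For bit planes 0 (LSB) to 7 (MSB), return the fraction of values whose bit is 1."""
    img = np.asarray(img, dtype=np.uint8)
    return np.array([((img >> k) & 1).mean() for k in range(8)])


rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)   # stand-in for a cover image
secret = np.full((64, 64, 3), 255, dtype=np.uint8)               # all-white secret image
stego = (cover & 0xF0) | (secret >> 4)                           # 4-bit LSB embedding

print("cover, low 4 planes:", bit_plane_ones_fraction(cover)[:4])
print("stego, low 4 planes:", bit_plane_ones_fraction(stego)[:4])
```

In this extreme case the low four bit planes of the stego image consist entirely of 1s, while those of the cover are close to one half. Real detectors rely on more refined statistics than this.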

The detection code is here: https://github.com/b3dk7/StegExpose

The paper at https://dl.acm.org/citation.cfm?id=1929317 introduces a method that preserves the statistics.

The paper at https://pdfs.semanticscholar.org/80a5/fcbeda7697d9641bc80460593c2f8f305a65.pdf introduces a method for detecting LSB embedding.

In http://futuremedia.szu.edu.cn/assets/files/CF_What%20makes%20the%20stego%20imageundetectable.pdf, the authors consider choosing the best original image in which to hide a given secret image.

Again, we are not expected to go this far for now, but it is possible future work if you find it interesting.

## Other ways

A review paper (possibly dated) can be found here: https://www.sciencedirect.com/science/article/pii/B9780123855107000023

(Other reviews can be found [here](https://pdfs.semanticscholar.org/57a1/d15dcbf946f093a59db55f8828699fef7826.pdf) and [here](https://www.cscjournals.org/manuscript/Journals/IJCSS/Volume6/Issue3/IJCSS-670.pdf).)


1.   https://arxiv.org/pdf/1606.05294.pdf. This paper introduces a method that uses a neural network (NN) to replace (learn) the LSB process.
2.   https://papers.nips.cc/paper/6802-hiding-images-in-plain-sight-deep-steganography.pdf, which introduces an NN to approximate $D$ and $E$.
3. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.7157&rep=rep1&type=pdf, which uses the DCT (discrete cosine transform) and LSB.
4. https://arxiv.org/pdf/1806.06357.pdf and  [code](https://github.com/adamcavendish/Deep-Image-Steganography)
5. https://ieeexplore.ieee.org/document/8403208/all-figures
6. https://eccv2018.org/openaccess/content_ECCV_2018/papers/Jiren_Zhu_HiDDeN_Hiding_Data_ECCV_2018_paper.pdf
7. https://arxiv.org/pdf/1904.01444.pdf
8. https://link.springer.com/article/10.1007/s00521-014-1702-1

## First task
Implement LSB, where the images (original and secret) are of the same size. If you do not want to implement it yourself, at least go through the code [here](https://github.com/kelvins/steganography).
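
For same-size images, LSB can also be written in a few vectorized lines. The sketch below assumes the images are already loaded as `uint8` NumPy arrays of identical shape; the names `lsb_merge` and `lsb_unmerge` are illustrative and do not come from the linked repository.

```python
import numpy as np


def lsb_merge(cover: np.ndarray, secret: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Hide the top n_bits of `secret` in the low n_bits of `cover` (uint8, same shape)."""
    if cover.shape != secret.shape:
        raise ValueError("cover and secret must have the same shape")
    keep_mask = np.uint8((0xFF << n_bits) & 0xFF)
    return (cover & keep_mask) | (secret >> np.uint8(8 - n_bits))


def lsb_unmerge(stego: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Move the low n_bits of `stego` back to the most significant positions."""
    low_mask = np.uint8((1 << n_bits) - 1)
    return (stego & low_mask) << np.uint8(8 - n_bits)


rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
secret = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
stego = lsb_merge(cover, secret)
recovered = lsb_unmerge(stego)

# Both errors are bounded by 2**n_bits - 1 = 15 gray levels per channel.
print(np.abs(stego.astype(int) - cover).max())
print(np.abs(recovered.astype(int) - secret).max())
```

Choosing a smaller `n_bits` keeps the stego image closer to the cover at the cost of a coarser recovered secret.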

In [2]:
! mkdir ~/.kaggle

In [3]:
! cp kaggle.json ~/.kaggle/

In [4]:
! chmod 600 ~/.kaggle/kaggle.json

In [5]:
! kaggle datasets download gaz3ll3/optimization-ii-project-3

Downloading optimization-ii-project-3.zip to /content
 96% 262M/274M [00:01<00:00, 164MB/s]
100% 274M/274M [00:01<00:00, 153MB/s]


In [None]:
! unzip 'optimization-ii-project-3.zip'

Streaming output truncated to the last 5000 lines.
  inflating: Opencountry/open48.jpg  
  inflating: Opencountry/open52.jpg  
  inflating: Opencountry/open53.jpg  
  inflating: Opencountry/open55.jpg  
  inflating: Opencountry/open61.jpg  
  inflating: Opencountry/open7.jpg   
  inflating: Opencountry/osun12.jpg  
  inflating: Opencountry/sclos10.jpg  
  inflating: Opencountry/sclos18.jpg  
  inflating: Opencountry/sclos30.jpg  
  inflating: Opencountry/sopen10.jpg  
  inflating: Opencountry/sopen11.jpg  
  inflating: Opencountry/sopen15.jpg  
  inflating: Opencountry/sopen61.jpg  
  inflating: Opencountry/sopen9.jpg  
  inflating: Opencountry/tell56.jpg  
  inflating: Opencountry/tell59.jpg  
  inflating: Opencountry/tell67.jpg  
  inflating: Opencountry/urb969.jpg  
  inflating: coast/Thumbs.db         
  inflating: coast/arnat59.jpg       
  inflating: coast/art1130.jpg       
  inflating: coast/art294.jpg        
  inflating: coast/art487.jpg        
  inflating: coa

In [None]:
import os
os.listdir()

['.config',
 'highway',
 'optimization-ii-project-3.zip',
 'Opencountry',
 'mountain',
 'forest',
 'street',
 'data',
 'coast',
 'kaggle.json',
 'tallbuilding',
 'inside_city',
 'sample_data']

In [None]:
from google.colab import files
from io import BytesIO
from IPython.display import Image, display

In [None]:
#Upload images
uploaded = files.upload()

In [None]:
import os
#Check current directory
os.listdir()

In [None]:
# Import the submodule explicitly: a bare `import PIL` does not guarantee PIL.Image is loaded
import PIL.Image

class Steganography:

    @staticmethod
    def __int_to_bin(rgb):
        """Convert an integer tuple to a binary (string) tuple.
        :param rgb: An integer tuple (e.g. (220, 110, 96))
        :return: A string tuple (e.g. ("11011100", "01101110", "01100000"))
        """
        r, g, b = rgb
        return (f'{r:08b}',
                f'{g:08b}',
                f'{b:08b}')

    @staticmethod
    def __bin_to_int(rgb):
        """Convert a binary (string) tuple to an integer tuple.
        :param rgb: A string tuple (e.g. ("00101010", "11101011", "00010110"))
        :return: An integer tuple (e.g. (42, 235, 22))
        """
        r, g, b = rgb
        return (int(r, 2),
                int(g, 2),
                int(b, 2))

    @staticmethod
    def __merge_rgb(rgb1, rgb2):
        """Merge two RGB tuples.
        :param rgb1: A string tuple (e.g. ("00101010", "11101011", "00010110"))
        :param rgb2: Another string tuple
        (e.g. ("00101010", "11101011", "00010110"))
        :return: A string tuple with the two RGB values merged.
        """
        r1, g1, b1 = rgb1
        r2, g2, b2 = rgb2
        rgb = (r1[:4] + r2[:4],
               g1[:4] + g2[:4],
               b1[:4] + b2[:4])
        return rgb

    @staticmethod
    def merge(img1, img2):
        """Merge two images. The second one will be merged into the first one.
        :param img1: First image
        :param img2: Second image
        :return: A new merged image.
        """

        # Check the images dimensions
        if img2.size[0] > img1.size[0] or img2.size[1] > img1.size[1]:
            raise ValueError('Image 2 should not be larger than Image 1!')

        # Get the pixel map of the two images
        pixel_map1 = img1.load()
        pixel_map2 = img2.load()

        # Create a new image that will be outputted
        new_image = PIL.Image.new(img1.mode, img1.size)
        pixels_new = new_image.load()

        for i in range(img1.size[0]):
            for j in range(img1.size[1]):
                rgb1 = Steganography.__int_to_bin(pixel_map1[i, j])

                # Use a black pixel as default
                rgb2 = Steganography.__int_to_bin((0, 0, 0))

                # Check if the pixel map position is valid for the second image
                if i < img2.size[0] and j < img2.size[1]:
                    rgb2 = Steganography.__int_to_bin(pixel_map2[i, j])

                # Merge the two pixels and convert it to a integer tuple
                rgb = Steganography.__merge_rgb(rgb1, rgb2)

                pixels_new[i, j] = Steganography.__bin_to_int(rgb)

        return new_image

    @staticmethod
    def unmerge(img):
        """Unmerge an image.
        :param img: The input image.
        :return: The unmerged/extracted image.
        """

        # Load the pixel map
        pixel_map = img.load()

        # Create the new image and load the pixel map
        new_image = PIL.Image.new(img.mode, img.size)
        pixels_new = new_image.load()

        # Tuple used to store the image original size
        original_size = img.size

        for i in range(img.size[0]):
            for j in range(img.size[1]):
                # Get the RGB (as a string tuple) from the current pixel
                r, g, b = Steganography.__int_to_bin(pixel_map[i, j])

                # Extract the last 4 bits (corresponding to the hidden image)
                # Concatenate 4 zero bits because we are working with 8 bit
                rgb = (r[4:] + '0000',
                       g[4:] + '0000',
                       b[4:] + '0000')

                # Convert it to an integer tuple
                pixels_new[i, j] = Steganography.__bin_to_int(rgb)

                # If this is a 'valid' position, store it
                # as the last valid position
                if pixels_new[i, j] != (0, 0, 0):
                    original_size = (i + 1, j + 1)

        # Crop the image based on the 'valid' pixels
        new_image = new_image.crop((0, 0, original_size[0], original_size[1]))

        return new_image

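def merge(img1, img2, output):
    # Convert to RGB so that every pixel is a 3-tuple, as the class assumes
    cover = PIL.Image.open(img1).convert('RGB')
    secret = PIL.Image.open(img2).convert('RGB')
    merged_image = Steganography.merge(cover, secret)
    # Save in a lossless format such as PNG; JPEG compression would destroy the hidden bits
    merged_image.save(output)

def unmerge(img, output):
    unmerged_image = Steganography.unmerge(PIL.Image.open(img).convert('RGB'))
    unmerged_image.save(output)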

In [None]:
#Call the function directly instead of using the command line
merge('img1.jpg', 'img2.jpg', 'output_merge.png')
unmerge('output_merge.png', 'output_unmerge.png')

In [None]:
#Check the current directory list now, the outputs are in the directory
os.listdir()

In [None]:
#Visualization
display(Image('output_merge.png'))

In [None]:
#Visualization
display(Image('output_unmerge.png'))

## Second task
Try to use a neural network to approximate $D$ and $E$; the parameters and the structure are up to you. This paper provides good insight: https://papers.nips.cc/paper/6802-hiding-images-in-plain-sight-deep-steganography.pdf. Implementations can be found at https://github.com/fpingham/DeepSteg/blob/master/DeepSteganography.ipynb, https://github.com/Ankit-Dhankhar/deep-steg/blob/master/steg%20net.py, https://github.com/mr3coi/deepsteg, https://github.com/alexandremuzio/deep-steg, and https://github.com/harveyslash/Deep-Steganography, among others. There is also a blog post at https://buzzrobot.com/hiding-images-using-ai-deep-steganography-b7726bd58b06

For the network structure, you can borrow the idea of an autoencoder for the $E$ part. In that paper, the authors state that the $E$ part uses 5 layers of convolutional neural networks with 3x3, 4x4, and 5x5 patches. The goal is only to approximate the mappings $D$ and $E$, so a fully connected network should also work, but the convolutional type is cheaper.

Another good option is to apply the DCT (discrete cosine transform) to the secret images first to reduce their information (bypassing the prep network in the paper). References are easy to find by searching Google with DCT keywords.
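
As a rough illustration of the DCT idea, the sketch below builds an orthonormal DCT-II matrix with NumPy and keeps only the lowest-frequency coefficients of one square image channel. The names `dct_matrix` and `dct_compress` and the `keep` parameter are assumptions made for this example; how the truncated coefficients would then be fed to the hiding network is left open.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ v is the DCT of v, and C.T @ C is the identity."""
    k = np.arange(n)[:, None]        # frequency index (rows)
    i = np.arange(n)[None, :]        # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # the constant (DC) row has a different scale
    return c


def dct_compress(channel: np.ndarray, keep: int) -> np.ndarray:
    """Keep only the keep x keep lowest-frequency DCT coefficients of a square channel."""
    n = channel.shape[0]
    c = dct_matrix(n)
    coeffs = c @ channel @ c.T       # 2-D DCT: transform along both axes
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep] = 1.0
    return c.T @ (coeffs * mask) @ c  # inverse 2-D DCT of the truncated coefficients


# A smooth 16x16 test channel: most of its energy sits in the low frequencies.
grid = np.linspace(0.0, 1.0, 16)
channel = np.outer(grid, grid)
approx = dct_compress(channel, keep=4)
print(np.sqrt(np.mean((approx - channel) ** 2)))   # reconstruction RMSE
```

Here 16 of the 256 coefficients are kept; for natural images the trade-off between `keep` and reconstruction error has to be checked experimentally.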

In [None]:
# Import the necessary libraries and modules
%matplotlib inline
import os
import pickle
import random
from collections import OrderedDict
from itertools import islice
from random import shuffle

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable
# `utils` refers to torchvision.utils and `Image` to IPython.display.Image;
# the earlier imports of torch.utils and PIL.Image under the same names were shadowed
from torchvision import datasets, utils
from torchvision.transforms import ToPILImage
from IPython.display import Image
from google.colab import drive

In [None]:
os.listdir('data/highway')

['a866041.jpg',
 'gre40.jpg',
 'art608.jpg',
 'gre458.jpg',
 'bost301.jpg',
 'gre467.jpg',
 'bost166.jpg',
 'urb962.jpg',
 'gre279.jpg',
 'bost387.jpg',
 'urb743.jpg',
 'art820.jpg',
 'gre239.jpg',
 'a866042.jpg',
 'bost173.jpg',
 'art596.jpg',
 'gre644.jpg',
 'gre409.jpg',
 'par23.jpg',
 'gre662.jpg',
 'bost180.jpg',
 'bost403.jpg',
 'art237.jpg',
 'n480023.jpg',
 'gre504.jpg',
 'bost181.jpg',
 'gre36.jpg',
 'n480045.jpg',
 'gre30.jpg',
 'gre470.jpg',
 'bost172.jpg',
 'bost389.jpg',
 'nat533.jpg',
 'bost302.jpg',
 'bost171.jpg',
 'bost168.jpg',
 'gre48.jpg',
 'gre678.jpg',
 'bost314.jpg',
 'bost295.jpg',
 'gre462.jpg',
 'gre658.jpg',
 'art558.jpg',
 'gre413.jpg',
 'art1204.jpg',
 'bost331.jpg',
 'n480036.jpg',
 'land449.jpg',
 'art1696.jpg',
 'gre404.jpg',
 'urb471.jpg',
 'bost307.jpg',
 'bost161.jpg',
 'gre609.jpg',
 'gre493.jpg',
 'gre147.jpg',
 'urb714.jpg',
 'gre473.jpg',
 'art563.jpg',
 'bost160.jpg',
 'urb713.jpg',
 'land464.jpg',
 'urb720.jpg',
 'bost179.jpg',
 'gre275.jpg',
 '

In [None]:
def load_data():
  '''
  Load every image in the folder as a float tensor scaled to [0, 1] and collect the tensors in a list.
  '''
  data_list = []
  folder_dir = 'data/highway'
  for file_name in os.listdir(folder_dir):
    extension = os.path.splitext(file_name)[1].lower()
    if extension in ('.jpg', '.png', '.jpeg'):
      file_full_path = os.path.join(folder_dir, file_name)
      # read_image returns a uint8 tensor of shape (C, H, W) with values in 0..255
      image_tensor = torchvision.io.read_image(file_full_path)
      # Divide by 255 (not 256) so that white maps exactly to 1.0
      data_list.append(image_tensor.div(255))
  return data_list


In [None]:
def create_train_test_set():
  '''
  Create the train set and test set: 80% of the data are randomly chosen as the train set and the rest form the test set.
  '''
  data_list = load_data()
  random.shuffle(data_list)
  split_index = int(0.8 * len(data_list))
  train_set = data_list[:split_index]
  test_set = data_list[split_index:]
  return train_set, test_set

In [None]:
def separate_cover_secret(train_set, test_set):
  '''
  Separate cover and secret sets from the train set and test set. The first half is the secret set, the second half is the cover set.
  '''
  train_set_cover = train_set[int(len(train_set) / 2):]
  train_set_secret = train_set[:int(len(train_set) / 2)]

  test_set_cover = test_set[int(len(test_set) / 2):]
  test_set_secret = test_set[:int(len(test_set) / 2)]
  return train_set_cover, train_set_secret, test_set_cover, test_set_secret

In [None]:
#Create Training Set and Test set, cover_set and secret set
train_set_, test_set_ = create_train_test_set()
train_set_cover, train_set_secret, test_set_cover, test_set_secret = separate_cover_secret(train_set_, test_set_)

In [None]:
class PrepNetwork(nn.Module):
    def __init__(self):
        super(PrepNetwork, self).__init__()
        self.initialP3 = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,50,kernel_size = 3, padding=1)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(50,50,kernel_size = 3, padding=1)),
          ('relu2', nn.ReLU()),
          ('conv3', nn.Conv2d(50,50, kernel_size=3, padding=1)),
          ('relu3', nn.ReLU()),
          ('conv4', nn.Conv2d(50,50, kernel_size=3, padding=1)),
          ('relu4', nn.ReLU())
        ]))
        self.initialP4 = nn.Sequential(OrderedDict([
          ('conv5', nn.Conv2d(1,50,kernel_size = 4, padding=1)),
          ('relu5', nn.ReLU()),
          ('conv6', nn.Conv2d(50,50,kernel_size = 4, padding=2)),
          ('relu6', nn.ReLU()),
          ('conv7', nn.Conv2d(50,50, kernel_size=4, padding=1)),
          ('relu7', nn.ReLU()),
          ('conv8', nn.Conv2d(50,50, kernel_size=4, padding=2)),
          ('relu8', nn.ReLU())
        ]))
        self.initialP5 = nn.Sequential(OrderedDict([
          ('conv9', nn.Conv2d(1,50,kernel_size = 5, padding=2)),
          ('relu9', nn.ReLU()),
          ('conv10', nn.Conv2d(50,50,kernel_size = 5, padding=2)),
          ('relu10', nn.ReLU()),
          ('conv11', nn.Conv2d(50,50, kernel_size= 5, padding=2)),
          ('relu11', nn.ReLU()),
          ('conv12', nn.Conv2d(50,50, kernel_size=5, padding=2)),
          ('relu12', nn.ReLU())
        ]))
        self.finalP3 = nn.Sequential(OrderedDict([
          ('conv13', nn.Conv2d(150,50,kernel_size = 3, padding=1)),
          ('relu13', nn.ReLU())
        ]))
        self.finalP4 = nn.Sequential(OrderedDict([
          ('conv14', nn.Conv2d(150,50,kernel_size = 4, padding=1)),
          ('relu14', nn.ReLU()),
          ('conv15', nn.Conv2d(50, 50, kernel_size=4, padding=2)),
          ('relu15', nn.ReLU())
        ]))
        self.finalP5 = nn.Sequential(OrderedDict([
          ('conv16', nn.Conv2d(150,50, kernel_size = 5, padding=2)),
          ('relu16', nn.ReLU())
        ]))
        

    def forward(self, p):
        p1 = self.initialP3(p)
        p2 = self.initialP4(p)
        p3 = self.initialP5(p)
        mid = torch.cat((p1, p2, p3), 1)
        p4 = self.finalP3(mid)
        p5 = self.finalP4(mid)
        p6 = self.finalP5(mid)
        out = torch.cat((p4, p5, p6), 1)
        return out

# Hiding Network (5 conv layers)
class HidingNetwork(nn.Module):
    def __init__(self):
        super(HidingNetwork, self).__init__()
        self.initialH3 = nn.Sequential(
            nn.Conv2d(151, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU())
        self.initialH4 = nn.Sequential(
            nn.Conv2d(151, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU())
        self.initialH5 = nn.Sequential(
            nn.Conv2d(151, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU())
        self.finalH3 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=3, padding=1),
            nn.ReLU())
        self.finalH4 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU())
        self.finalH5 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=5, padding=2),
            nn.ReLU())
        self.finalH = nn.Sequential(
            nn.Conv2d(150, 3, kernel_size=1, padding=0))
        
    def forward(self, h):
        h1 = self.initialH3(h)
        h2 = self.initialH4(h)
        h3 = self.initialH5(h)
        mid = torch.cat((h1, h2, h3), 1)
        h4 = self.finalH3(mid)
        h5 = self.finalH4(mid)
        h6 = self.finalH5(mid)
        mid2 = torch.cat((h4, h5, h6), 1)
        out = self.finalH(mid2)
        out_noise = gaussian(out.data, 0, 0.1)
        return out, out_noise

# Reveal Network (same multi-branch structure as the hiding network, applied to the 3-channel container image)
class RevealNetwork(nn.Module):
    def __init__(self):
        super(RevealNetwork, self).__init__()
        self.initialR3 = nn.Sequential(
            nn.Conv2d(3, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=3, padding=1),
            nn.ReLU())
        self.initialR4 = nn.Sequential(
            nn.Conv2d(3, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU())
        self.initialR5 = nn.Sequential(
            nn.Conv2d(3, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),
            nn.ReLU())
        self.finalR3 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=3, padding=1),
            nn.ReLU())
        self.finalR4 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=4, padding=1),
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),
            nn.ReLU())
        self.finalR5 = nn.Sequential(
            nn.Conv2d(150, 50, kernel_size=5, padding=2),
            nn.ReLU())
        self.finalR = nn.Sequential(
            nn.Conv2d(150, 3, kernel_size=1, padding=0))

    def forward(self, r):
        r1 = self.initialR3(r)
        r2 = self.initialR4(r)
        r3 = self.initialR5(r)
        mid = torch.cat((r1, r2, r3), 1)
        r4 = self.finalR3(mid)
        r5 = self.finalR4(mid)
        r6 = self.finalR5(mid)
        mid2 = torch.cat((r4, r5, r6), 1)
        out = self.finalR(mid2)
        return out

# Join three networks in one module
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.m1 = PrepNetwork()
        self.m2 = HidingNetwork()
        self.m3 = RevealNetwork()

    def forward(self, secret, cover):
        x_1 = self.m1(secret)
        mid = torch.cat((x_1, cover), 1)
        x_2, x_2_noise = self.m2(mid)
        x_3 = self.m3(x_2_noise)
        return x_2, x_3


# Example of building a small model with nn.Sequential and named layers; Net does not use it
model = nn.Sequential(OrderedDict([
          ('conv1', nn.Conv2d(1,20,5)),
          ('relu1', nn.ReLU()),
          ('conv2', nn.Conv2d(20,64,5)),
          ('relu2', nn.ReLU())
        ]))


In [None]:
# Creates net object
net = Net()

In [None]:
def customized_loss(S_prime, C_prime, S, C, B):
    ''' Calculates loss specified on the paper.'''
    loss_cover = torch.nn.functional.mse_loss(C_prime, C)
    loss_secret = torch.nn.functional.mse_loss(S_prime, S)
    loss_all = loss_cover + B * loss_secret
    return loss_all, loss_cover, loss_secret


def denormalize(image, std, mean):
    ''' Denormalizes a tensor of images.'''
    for t in range(3):
        image[t, :, :] = (image[t, :, :] * std[t]) + mean[t]
    return image


def gaussian(tensor, mean=0, stddev=0.1):
    '''Adds Gaussian noise with the given mean and standard deviation to a tensor.'''
    # randn_like matches the shape, dtype and device of the input tensor
    noise = torch.randn_like(tensor) * stddev + mean
    return tensor + noise

In [None]:
net, mean_train_loss, loss_history = train_model(train_set_, train_set_secret, train_set_cover, beta, learning_rate, num_epochs)



Training: Batch 1/208. Loss of 0.7123, cover loss of 0.3277, secret loss of 0.3846
Training: Batch 2/208. Loss of 0.6351, cover loss of 0.3398, secret loss of 0.2953
Training: Batch 3/208. Loss of 0.6836, cover loss of 0.3579, secret loss of 0.3257
Training: Batch 4/208. Loss of 0.4556, cover loss of 0.3286, secret loss of 0.1270
Training: Batch 5/208. Loss of 0.3882, cover loss of 0.2037, secret loss of 0.1845
Training: Batch 6/208. Loss of 0.4520, cover loss of 0.1192, secret loss of 0.3328
Training: Batch 7/208. Loss of 0.5510, cover loss of 0.2785, secret loss of 0.2724
Training: Batch 8/208. Loss of 0.4189, cover loss of 0.2639, secret loss of 0.1551
Training: Batch 9/208. Loss of 0.3339, cover loss of 0.1051, secret loss of 0.2288
Training: Batch 10/208. Loss of 0.5040, cover loss of 0.2553, secret loss of 0.2488
Training: Batch 11/208. Loss of 0.5016, cover loss of 0.2538, secret loss of 0.2478
Training: Batch 12/208. Loss of 0.3480, cover loss of 0.1019, secret loss of 0.2461
T

## Optional Task

What if you have two secret images to encrypt? What if there are more? Does LSB still work? Does the NN still work?
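
One way to start exploring the LSB question is to split the low bits of the cover between the secrets. The sketch below is only one possible allocation, chosen here as an assumption: the cover keeps its top 4 bits and each of two secrets contributes its top 2 bits. The function names are illustrative.

```python
import numpy as np


def merge_two(cover: np.ndarray, secret_a: np.ndarray, secret_b: np.ndarray) -> np.ndarray:
    """Cover keeps its top 4 bits; bits 3-2 hold secret_a's top 2 bits, bits 1-0 secret_b's."""
    return (cover & 0xF0) | ((secret_a >> 6) << 2) | (secret_b >> 6)


def unmerge_two(stego: np.ndarray):
    """Recover the two 2-bit secrets, shifted back to the most significant positions."""
    secret_a = ((stego >> 2) & 0x03) << 6
    secret_b = (stego & 0x03) << 6
    return secret_a, secret_b


rng = np.random.default_rng(0)
cover, secret_a, secret_b = (
    rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(3)
)
stego = merge_two(cover, secret_a, secret_b)
rec_a, rec_b = unmerge_two(stego)
print(np.abs(rec_a.astype(int) - secret_a).max())   # error is bounded by 63 gray levels
```

With this allocation each recovered secret has only 4 gray levels per channel, so the quality of each secret drops quickly as more images share the same bit budget.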

## Data set 

https://www.kaggle.com/gaz3ll3/optimization-ii-project-3

For efficiency, we only consider small pictures of size 256x256. If you have trouble handling 256x256 images, you can resize them to 128x128 or 64x64. If you feel there are too many images, you can also sample a subset of them.
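
If you want to shrink images without relying on an imaging library, block averaging is one simple option. The sketch below assumes `uint8` arrays of shape (H, W, C) whose sides are divisible by the factor; the function name `downsample` is made up for this example, and a library resize routine would normally be the more typical choice.

```python
import numpy as np


def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Shrink an (H, W, C) uint8 image by averaging factor x factor blocks."""
    h, w, c = img.shape
    if h % factor or w % factor:
        raise ValueError("image sides must be divisible by factor")
    # Split each spatial axis into (block index, offset inside the block)
    blocks = img.reshape(h // factor, factor, w // factor, factor, c).astype(np.float64)
    return np.rint(blocks.mean(axis=(1, 3))).astype(np.uint8)


img = np.zeros((256, 256, 3), dtype=np.uint8)
print(downsample(img, 2).shape)   # (128, 128, 3)
print(downsample(img, 4).shape)   # (64, 64, 3)
```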

Training and validation sets are chosen at random (say, 80% and 20%). Each input sample consists of two images from the training set.

If you are more comfortable with other data sets, that is up to you. For example, you can use https://tiny-imagenet.herokuapp.com/ for small 64x64 images.

## Metric

In your training process for $D$ and $E$, the norm used to compare images is the RMSE (root mean squared error). The images are of dimension $N\times N\times 3$.
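
For two images $a$ and $b$ of dimension $N\times N\times 3$, the RMSE is $\sqrt{\frac{1}{3N^2}\sum_{i,j,k}(a_{ijk}-b_{ijk})^2}$. A minimal NumPy sketch of this metric follows; the function name `rmse` is an illustrative choice, and the inputs are cast to floating point first because subtracting `uint8` arrays directly would wrap around.

```python
import numpy as np


def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean squared error over all pixels and channels."""
    diff = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


a = np.zeros((4, 4, 3), dtype=np.uint8)
b = np.full((4, 4, 3), 3, dtype=np.uint8)
print(rmse(a, b))   # 3.0
```

Whether the error is reported on the 0 to 255 scale or on images scaled to [0, 1] should be stated in the writeup, since the two differ by a factor of 255.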

## Your final result


1.   A writeup of your work, including performance, the approach you took, the issues you ran into, how you solved them, etc.
2.   Test your code (LSB and NN) against the data set http://r0k.us/graphics/kodak/. Each image should be downsized to 256x256, or to 128x128 or 64x64 if you trained the NN on smaller images. Report your results in your writeup.
3. Code, again hosted on GitHub. The submission will be a link.
4. If you also tried the optional task, please report that in your writeup as well.




In [None]:
print('Good luck!')