# <strong> Computational Photography </strong>

## Week 1

### <strong>Some of the Professor's Research:</strong>

- The professor created an algorithm to turn an image into a simple 3-D scene. 
- Other research focused on turning images into 3-D scenes/ models. 
- Inserting objects into images, and making those pasted objects look like they were in the original photograph. 
- Generating videos based off of text input (comic strips).
- Creating 3D, AR models of construction sites (Reconstruct)

### <strong> Some Context on Computational Photography </strong>

- Depictions of people and scenes have changed a lot over the last few thousand years (from abstract iconography to realism)
    - shift to perspective, real-life scenes
<img src="./Arnolfini_Portrait.png" alt="Arnolfini's Portrait and Mirror">
- Then there was the camera: first used to help artists draw realistic scenes (the Lens Based Camera Obscura)
    - hard to learn to draw things as they are, vs how we perceive them
<img src="./Daguerre.png" alt="Daguerre">
- But are photos really realistic?
    - photos can be staged (like the Iraqi photo, and touched up photos in magazines.)
    - this is where computer graphics come in. 
        - build a 3D model of the scene
        - use a physics engine to render from any viewpoint
        - hard to do well
        - sometimes looks too shiny, too real
        - people are hard to do, pores, wrinkles, glow
- The realism spectrum
    - Computer Graphics:
        - easy to create new worlds
        - easy to manipulate objects/ viewpoints
        - very hard to make look realistic
    - Photography
        - instantly realistic
        - easy to acquire
        - very hard to manipulate objects/ viewpoints
- Computational photography = the best of both worlds
    - How can I use computational techniques to capture light in new ways?
    - How can I use computational techniques to breathe new light into the photograph?
    - How can I use computational techniques to synthesize and organize photo collections?


### <strong>Course Objectives</strong>

1. You will have new abilities for visual creation
2. You will get a foundation for computer vision
3. You will better appreciate your own visual ability 

<img src="./Thinking.png" alt="Thinking">

4. You will have fun doing cool stuff

### <strong>Course Projects</strong>


1. Hybrid images
    - Creating an image that has signal from 2 different images (different interpretation depending on the size of the image)
2. Image quilting for texture synthesis and transfer
    - being able to create texture in images (face on toast)
3. Poisson editing
    - Picture of swimming pool + picture of a bear = blended in bear into a swimming pool
4. Image-based lighting
    - capturing light with mirrored ball
5. Video alignment, stitching, editing
    - panoramic video insertion and deletion
6. Do something cool
    - should be about the same scale as the previous projects

### <strong>Pixel and Image Filters</strong>

- Image formation
    - a digital camera records light on a CCD (which converts photons of light into electrons)
    - each cell measures the total number of photons that reach it
    - the original signal could be nice and continuous (curved), but is converted to discrete and blocky for the camera
    - this is why elements of images can be pixelated and noisy
    - Raster image
        - matrix representation of an image
        - one value per pixel of the image
    - Perception of intensity
        - humans can be tricked by our own visual system (checkerboard shadow example)
    - Digital Color Images
        - using color filters, CCDs can record color as separate intensity measurements
        - every color image is just three channels with different intensities (RGB)

In [None]:
# in Python:

import cv2

filename = './Thinking.png'               # example path (assumed); any image file works
im = cv2.imread(filename)                 # OpenCV loads channels in BGR order
im = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # orders channels as RGB
im = im / 255                             # values range from 0 to 1

# RGB image im is an H x W x 3 matrix (numpy.ndarray)
H, W = im.shape[:2]
y, x, c = 0, 0, 1                         # example indices (assumed), 0-based

im[0, 0, 0]      # top-left pixel value in the R channel
im[y, x, c]      # y + 1 pixels down, x + 1 pixels to the right, in channel c
im[H-1, W-1, 2]  # bottom-right pixel in the B channel

Image Filtering

- computes a function of the local neighborhood at each position
- Really important
    - enhance images
        - denoise, resize, increase contrast, etc
    - extract information from images
        - texture, edges, distinctive points, etc
    - detect patterns 
        - template matching
- box filter
    - looks like a box in 2D plot
    - applying a 3x3 filter means taking each 3x3 window of the image and computing its dot product with the filter
- what does a box filter do?
    - it sort of blurs out the image
    - smooths
    - reduces contrast
    - convolution (?)
    - answer: takes the average of each window
- To filter, take the dot product of the filter and the window of the image you are looking at. The window matches the filter in size.
- The resulting image is the same size as the original image, since each step of the filtering operation only writes the pixel in the middle of the window. The edges of the photo are handled differently. 
- This process is called the filtering operation
- What different filters do
    - zero matrix with a 1 in the center?
        - does nothing, as every pixel gets replaced by itself. 
        - called the identity filter
    - A zero matrix (3x3) with a single one at row 2, column 3?
        - shifts the image one pixel to the left
    - Doubling the image (zero matrix with 2 in the center), and then subtracting a box filter?
        - this is a sharpening filter
        - subtracting a blurred image from the original leaves only the detail; adding that detail back to the original gives a sharper image
        - making the differences in pixel intensities STRONGER
    - Edge filter
        - [1, 0, -1
           2, 0, -2
           1, 0, -1]
        - this is a Sobel filter that responds to vertical edges (the response is viewed as an absolute value)
        - sums the pixels on the left and subtracts the pixels on the right
        - rotating this filter by 90 degrees gives the Sobel filter for horizontal edges
- How can we synthesize motion blur?
    - shift the image by multiple positions and then average it out
    - How is this done with a filter?

In [None]:
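import cv2
import numpy as np
import matplotlib.pyplot as plt

im_fn = './Thinking.png'

im = cv2.imread(im_fn)
im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)/255 # convert to grayscale for now

theta = 0          # rotation of the blur line, in degrees
size = 15          # filter size (named size so it does not shadow the builtin len)
mid = (size-1)/2

fil = np.zeros((size,size))
print(fil)

fil[:, int(mid)] = 1/size                     # a line of equal weights: averages shifted copies of the image
R = cv2.getRotationMatrix2D((mid, mid),theta,1)
fil = cv2.warpAffine(fil, R, (size,size))     # rotate the line to the blur direction

im_fil = cv2.filter2D(im, -1, fil)

%matplotlib inline
fig, axes = plt.subplots(3,1,figsize=(50,50))
axes[0].imshow(im,cmap='gray')      # original image
axes[1].imshow(fil,cmap='gray')     # motion blur filter
axes[2].imshow(im_fil,cmap='gray')  # filtered (blurred) image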
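
The identity, shift, and sharpening filters described above can be checked numerically. This is a minimal sketch that assumes NumPy only; `correlate2d` is a hypothetical helper written for these notes (not an OpenCV function), and it handles the border by copying edge pixels.

```python
import numpy as np

def correlate2d(im, kernel):
    """Correlate a 2-D image with an odd-sized kernel; output has the image's size."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(im, ((ph, ph), (pw, pw)), mode="edge")  # copy edge pixels
    out = np.zeros(im.shape, dtype=float)
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + im.shape[0], j:j + im.shape[1]]
    return out

rng = np.random.default_rng(0)
im = rng.random((6, 8))                    # stand-in for a grayscale image

identity = np.zeros((3, 3))
identity[1, 1] = 1                         # 1 in the center
shift = np.zeros((3, 3))
shift[1, 2] = 1                            # 1 just right of the center
box = np.ones((3, 3)) / 9                  # averages each 3x3 window
sharpen = 2 * identity - box               # doubled image minus box filter

unchanged = correlate2d(im, identity)
shifted = correlate2d(im, shift)
blurred = correlate2d(im, box)
sharpened = correlate2d(im, sharpen)

print(np.allclose(unchanged, im))                      # identity leaves the image alone
print(np.allclose(shifted[:, :-1], im[:, 1:]))         # content moved one pixel left
print(np.allclose(sharpened, im + (im - blurred)))     # original plus its detail
```

The last check spells out why the filter sharpens: `im - blurred` is the detail that blurring removes, and the filter adds that detail back on top of the original.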

Correlation vs Convolution

- different terms for filtering. 
- sometimes used interchangeably
- strong relationship between them though
- correlation
    - slide a window over the image, multiply corresponding elements of the window and the kernel (filter matrix), and sum the products
- convolution
    - same as correlation, but you rotate the kernel first by 180 degrees
    - can be calculated using fast Fourier transforms
- if you can do correlation, you can also do convolution
- if you have a symmetric kernel, the output will be the same
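
The relationship between the two can be seen on a small 1-D signal (in 1-D, rotating the kernel by 180 degrees is just reversing it). This is a sketch using NumPy's `np.correlate` and `np.convolve`; the signal and kernel values are made up for illustration.

```python
import numpy as np

signal = np.array([0., 1., 4., 2., 7., 3., 0.])
asym = np.array([1., 2., 3.])        # asymmetric kernel
sym = np.array([1., 2., 1.]) / 4     # symmetric kernel

corr = np.correlate(signal, asym, mode="same")
conv = np.convolve(signal, asym, mode="same")

# Convolution is correlation with the kernel flipped first.
conv_via_corr = np.correlate(signal, asym[::-1], mode="same")

print(np.allclose(conv, conv_via_corr))   # flipping the kernel turns one into the other
print(np.allclose(corr, conv))            # they differ for an asymmetric kernel
print(np.allclose(np.correlate(signal, sym, mode="same"),
                  np.convolve(signal, sym, mode="same")))  # identical for a symmetric kernel
```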

Key properties of linear filters

- linearity
    - if you filter the sum of two signals, that is the same as filtering each separately and adding those responses together
- shift invariance
    - same behavior regardless of pixel location
    - filter(shift(f)) = shift(filter(f))
    - any linear shift invariant operator can also be represented as a convolution
- commutative
    - a * b = b * a
    - conceptually no difference between filter and signal (image)
    - I could also filter my blur kernel with my image, and I get the same result. This is unlike matrix multiplication in linear algebra, which is not commutative in general
- associative 
    - a * (b * c) = (a * b) * c
    - often apply several filters one after another
    - this is equivalent to applying one filter
- distributes over addition 
    - a * (b + c) = (a * b) + (a * c)
- scalars factor out
    - ka * b = a * kb = k(a * b)
- identity filter is a filter with a 1 in the center
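
These properties are easy to verify numerically. The sketch below assumes 1-D signals and full convolution via `np.convolve` (so no border handling gets in the way); the random signals and filters are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
f, g = rng.random(32), rng.random(32)                    # two signals
a, b, c = rng.random(5), rng.random(5), rng.random(7)    # three filters

def delay(x, k=2):
    """Shift a signal right by k samples (prepend zeros)."""
    return np.concatenate((np.zeros(k), x))

checks = {
    "linearity": np.allclose(np.convolve(f + g, a),
                             np.convolve(f, a) + np.convolve(g, a)),
    "shift invariance": np.allclose(np.convolve(delay(f), a),
                                    delay(np.convolve(f, a))),
    "commutative": np.allclose(np.convolve(f, a), np.convolve(a, f)),
    "associative": np.allclose(np.convolve(np.convolve(f, a), c),
                               np.convolve(f, np.convolve(a, c))),
    "distributive": np.allclose(np.convolve(f, a + b),
                                np.convolve(f, a) + np.convolve(f, b)),
    "scalars factor out": np.allclose(np.convolve(3 * f, a),
                                      3 * np.convolve(f, a)),
}

for name, ok in checks.items():
    print(name, ok)
```

The associative check is the practical one: `np.convolve(a, c)` is the single filter that is equivalent to applying `a` and then `c`.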

Important filter: Gaussian

- 4 representations as depicted in the lecture
- effective smoother without edgy artifacts (compared to box filter)
- remove "high frequency" components from the image (low pass filter)
    - images become more smooth
- if you convolve a Gaussian with another Gaussian you get another Gaussian. (Ring a bell from stats? The sum of two independent normal variables is also normal)
    - convolving twice with a Gaussian kernel of width sigma is the same as convolving once with a kernel of width sigma * sqrt(2)
- separable
    - you can divide it into a product of two 1-D Gaussians
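
The sigma * sqrt(2) rule can be checked on sampled 1-D Gaussians. This is a rough sketch: `gaussian_1d` is an illustrative helper (not from the lecture), and because the kernels are sampled and truncated, the match is approximate rather than exact.

```python
import numpy as np

def gaussian_1d(sigma, half_width):
    """Sampled Gaussian on the integers -half_width..half_width, normalized to sum to 1."""
    x = np.arange(-half_width, half_width + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

sigma = 3.0
g = gaussian_1d(sigma, half_width=15)                 # 5 sigma on each side

# Filtering twice with g is the same as filtering once with g convolved with g.
twice = np.convolve(g, g)                             # support is now -30..30
once = gaussian_1d(sigma * np.sqrt(2), half_width=30)

max_diff = np.abs(twice - once).max()
print(max_diff)   # tiny compared to the kernel's peak value
```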

Separability 

To summarize what the professor said here, separability just describes the fact that we can split our filter into smaller filters. The example given was splitting a 3x3 filter into a 3x1 filter and a 1x3 filter whose product (an outer product) reproduces it. Filtering the image with these two one after the other, instead of with the full 2-D filter, gives the same result and is much faster for larger filters. 
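
As a sketch of that example, the 3x3 edge filter from earlier in these notes splits into a 3x1 smoothing filter and a 1x3 difference filter. The code assumes NumPy 1.20 or later (for `sliding_window_view`) and only computes the response where the filter fits entirely inside the image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

col = np.array([1., 2., 1.])         # 3x1 filter (smooths vertically)
row = np.array([1., 0., -1.])        # 1x3 filter (differences horizontally)
kernel = np.outer(col, row)          # 3x3 filter: [[1,0,-1],[2,0,-2],[1,0,-1]]

rng = np.random.default_rng(0)
im = rng.random((20, 30))            # stand-in for a grayscale image

# One 2-D pass: dot product of the 3x3 kernel with every 3x3 window.
windows = sliding_window_view(im, kernel.shape)
full_2d = np.einsum("ijkl,kl->ij", windows, kernel)

# Two 1-D passes: filter every row with `row`, then every column with `col`.
rows_done = np.apply_along_axis(np.correlate, 1, im, row, mode="valid")
two_1d = np.apply_along_axis(np.correlate, 0, rows_done, col, mode="valid")

print(np.allclose(full_2d, two_1d))  # same response either way
```

For a k x k filter the 2-D pass costs k^2 multiplications per pixel and the two 1-D passes cost 2k, which is where the speedup for larger filters comes from.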

Some practical matters

- How big should a filter be?
    - values at edges should be near zero
    - rule of thumb for Gaussian: set kernel half-width to >= 3*sigma (the Gaussian never reaches exactly zero, so it has to be cut off somewhere)
    - this just says that if the standard deviation (sigma) of the Gaussian is 1, then we want the size of the filter to be 7 by 7 (3 sigma is 3, so we want 3 on one side, 3 on the other, 1 value in the middle)
    - too small a size cuts the Gaussian off before it falls to zero, which results in what is essentially the box filter
- What about near the edge?
    - the filter window falls off the edge of the image
    - need to extrapolate - aka making the image larger such that our filter can fit.
    - methods (all can be done in Python)
        - clipping (black border around the whole image)
        - wrap around 
        - copy edge
        - reflect across edge (DEFAULT)
    - What is the size of the output?
        - full (response is the size of original image plus what we extrapolated)
        - same (response is the size of original image) (DEFAULT)
        - valid (original image does not get padded at all, just record response where filter fits)


Application: Representing Texture

- regular or stochastic patterns caused by bumps, grooves, and/ or markings
- How can we represent texture?
    - compute responses of blobs and edges at various orientations and scales
    - filter bank = set of filters
    - we can apply multiple filters to an image and measure the responses of each one to see how much of an impact that filter made on the image
    - the result is a vector to describe the image
        - this tells us something about the texture of the image
        - for example, what would it mean if we saw there was a high response to vertical filters in an image, low responses to horizontal filters, and low responses to blob filters (blob meaning what it sounds like, organic circular looking patterns)?
            - probably means we have an image with a vertical looking texture
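
A tiny filter bank makes the idea concrete. This sketch assumes a bank of only two edge filters (a real bank would also include blob filters and several scales) and a synthetic striped image; `mean_abs_response` is an illustrative helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_abs_response(im, kernel):
    """Average absolute filter response over all positions where the filter fits."""
    windows = sliding_window_view(im, kernel.shape)
    response = np.einsum("ijkl,kl->ij", windows, kernel)
    return np.abs(response).mean()

vertical_edges = np.array([[1., 0., -1.],
                           [2., 0., -2.],
                           [1., 0., -1.]])
filter_bank = {
    "vertical edges": vertical_edges,
    "horizontal edges": vertical_edges.T,
}

# Synthetic texture: vertical stripes, 4 pixels wide.
x = np.arange(32)
stripes = np.tile(((x // 4) % 2).astype(float), (32, 1))

# The vector of responses is the texture descriptor.
descriptor = {name: mean_abs_response(stripes, k) for name, k in filter_bank.items()}
print(descriptor)   # strong vertical-edge response, zero horizontal-edge response
```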

Hybrid Images (Project 1)

- a way of combining two images so that you get a different perception of the image depending on your distance to the image
- Gaussian filtered image (smooth image) + Laplacian filtered image (detail image) = hybrid image
- far away and small = blurred image
- close and large = detail image
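
A rough sketch of the recipe, under stated assumptions: the detail image is approximated here as an image minus its own Gaussian blur (standing in for the Laplacian filtered image), the blur is done with two 1-D passes and reflected borders, and `gaussian_blur`, `hybrid_image`, and both sigma parameters are names invented for this example. It is not the project's reference solution.

```python
import numpy as np

def gaussian_blur(im, sigma):
    """Separable Gaussian blur of a 2-D grayscale image (borders are reflected)."""
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    padded = np.pad(im, half, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, g, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")

def hybrid_image(im_far, im_near, sigma_low, sigma_high):
    """Smooth part of im_far (seen from far away) plus detail of im_near (seen up close)."""
    low = gaussian_blur(im_far, sigma_low)
    high = im_near - gaussian_blur(im_near, sigma_high)
    return low + high

rng = np.random.default_rng(0)
im1, im2 = rng.random((40, 40)), rng.random((40, 40))   # stand-ins for two aligned images
hybrid = hybrid_image(im1, im2, sigma_low=4.0, sigma_high=2.0)
print(hybrid.shape)   # (40, 40)
```

The two sigmas control which frequencies each image contributes, and in practice they need tuning per image pair.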

Summary

- images are a matrix of numbers
- linear filtering is the dot product at each window position of the image with the filter (kernel)
- be aware of details (size of filter, extrapolation, cropping)

## Week 2

- Fourier transform and frequency domain
    - another way to look at images
    - frequency view of filtering
    - another look at hybrid images
    - sampling
- Why does the Gaussian give a nice smooth image, but the box filter gives edgy artifacts?
    - hard to understand from the spatial domain, easier with frequency domain
- Why do we get different distance-dependent interpretations of hybrid images?
    - also answered in the frequency domain
- Why does a lower resolution image still make sense to us? What do we lose?

### <strong> Jean Baptiste Fourier </strong>

- Crazy idea
    - any univariate function can be rewritten as a weighted sum of sines and cosines of different frequencies
    - Laplace, Lagrange, Poisson were the judges
        - not impressed
    - his work was not even translated to English until 70 years later
- Idea: you can compose a signal out of sines and cosines
    - Amplitude * sin((frequency * x) + phase)
    - adding enough sines of higher and higher frequency converges on the square wave
- We often think of frequencies in terms of music
    - pitches
- images are usually looked at in the spatial domain, but we can also look at them with frequencies
- in two dimensions:
    - Fourier images (of real-valued images) are always symmetric about the origin 
    - dots close to center = low frequency, slow change
    - dots farther away = higher frequency, faster change
    - signals can be composed and added together just like in the spatial domain
- Fourier transform
    - stores the  magnitude and phase at each frequency
        - magnitude encodes how much signal there is at a particular frequency
        - phase encodes spatial information (indirectly) - how sine and cosines are shifted
- can be written a few ways, including with complex exponentials via Euler's formula (e^(ix) = cos x + i sin x)
- can compute the transform as an integral (continuous) or as a sum (discrete)
- Fast Fourier transformation is what we use
- The Convolution Theorem
    - why it works
    - the Fourier transform of the convolution of two functions is the product of their Fourier transforms
        - F[g * h] = F[g]F[h]
    - the inverse Fourier transform of the product of two Fourier transforms is the convolution of the two inverse Fourier transforms
        - F inverse [gh] = F inverse [g] * F inverse [h]
    - what does this mean?
        - <strong>Convolution in the spatial domain is equivalent to multiplication in the frequency domain </strong>
- Properties of Fourier Transforms
    - linearity
    - the Fourier transform of a real signal is symmetric about the origin
    - The energy of the signal is the same as the energy of its Fourier transform (Parseval's theorem)
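
The convolution theorem and the last two properties can be checked on a 1-D signal. A minimal sketch, assuming NumPy's FFT conventions (the forward transform is unnormalized, so the energy check needs a 1/N factor); the random signal and filter are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.random(50)    # signal
h = rng.random(9)     # filter

# Convolution theorem: F[g * h] = F[g] F[h].
# Zero-pad both to the full output length so the FFT's wrap-around does not mix samples.
n = len(g) + len(h) - 1
direct = np.convolve(g, h)
via_fft = np.real(np.fft.ifft(np.fft.fft(g, n) * np.fft.fft(h, n)))
print(np.allclose(direct, via_fft))

# Symmetry: for a real signal, the value at frequency -k is the conjugate of the value at k.
G = np.fft.fft(g)
print(np.allclose(G[1:], np.conj(G[1:][::-1])))

# Energy is preserved (up to NumPy's 1/N normalization).
energy_space = np.sum(g**2)
energy_freq = np.sum(np.abs(G)**2) / len(g)
print(np.isclose(energy_space, energy_freq))
```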

### <strong> Filtering with FFT </strong>

<img src="./Filtering_FFT.png" alt="FFT Filtering">


In [None]:
# Filtering with FFT in Python

import matplotlib.pyplot as plt
import numpy as np

def filter_image(im, fil):
    # im: H x W floating point numpy ndarray representing image in grayscale
    # fil: M x M floating point numpy ndarray representing 2D filter

    H,W = im.shape
    hs = fil.shape[0] // 2                          # half of filter size
    fftsize = 1024                                  # should be a power of 2 (for speed) and large enough for image plus padding
    im_fft = np.fft.fft2(im, (fftsize,fftsize))     # 1) fft im with padding
    fil_fft = np.fft.fft2(fil, (fftsize, fftsize))  # 2) fft fil, pad to same size as image
    im_fil_fft = im_fft * fil_fft                   # 3) multiply fft images
    im_fil = np.fft.ifft2(im_fil_fft)               # 4) inverse fft2
    im_fil = im_fil[hs:hs + H, hs:hs + W]           # 5) remove padding
    im_fil = np.real(im_fil)                        # 6) extract out real part
    return im_fil

In [None]:
# Displaying with fft

import matplotlib.pyplot as plt
import numpy as np
def display_frequency_image(frequency_image):
    '''
    frequency_image: H x W floating point numpy ndarray representing a
    grayscale image after FFT
    '''
    shifted_image = np.fft.fftshift(frequency_image)
    amplitude_image = np.abs(shifted_image)
    log_amplitude_image = np.log(amplitude_image + 1e-8)  # small offset avoids log(0)
    fig = plt.figure()
    plt.imshow(log_amplitude_image, cmap='gray')
    plt.show()

<strong> Which has more information, the phase or the magnitude? </strong>

- magnitude = amount of power in frequencies
- phase = how they are shifted

In [None]:
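# Compute FFT and decompose to magnitude and phase
# (im1 and im2 are grayscale floating point images of the same size)
import numpy as np

im1_fft = np.fft.fft2(im1)
im1_fft_mag = np.abs(im1_fft)
im1_fft_phase = np.angle(im1_fft)
im2_fft = np.fft.fft2(im2)
im2_fft_mag = np.abs(im2_fft)
im2_fft_phase = np.angle(im2_fft)
# Combine mag and phase from different images and compute inverse FFT
# (np.real drops the imaginary part, which is only numerical noise here)
mag1_phase2 = np.real(np.fft.ifft2(im1_fft_mag * np.cos(im2_fft_phase) + 1j * im1_fft_mag * np.sin(im2_fft_phase)))
phase1_mag2 = np.real(np.fft.ifft2(im2_fft_mag * np.cos(im1_fft_phase) + 1j * im2_fft_mag * np.sin(im1_fft_phase)))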

- phase appears to contain more information

### <strong> Answering some questions </strong>


- so why does the Gaussian give smooth images and the box gives edgy artifacts?
    - The Gaussian preserves information only in the low frequencies
    - The box does the same, but also in some isolated spots of higher frequency (the side lobes of its Fourier transform, a sinc function). These spots cause the artifacts
- Why does lower resolution still make sense to us, and what do we lose?
    - in all the natural image frequency plots we saw, power is always really concentrated in the lower frequencies (a red dot in the center, meaning there is a lot of power in the low freqs, while blue out towards the rest of the plot indicating not much power)
        - AKA: this means there is not a lot of change as you go from a pixel to its neighbors 
            - images are mostly smooth
        - what you lose is the high frequency information, but there is not a lot to begin with, so therefore we keep a lot of the useful information 
- How do you shrink an image?
    - Naively, you might think you could just throw away every other row and column (to reduce by a factor of two)
        - This causes an aliasing problem
            - can be dangerous and cause artifacts
                - wagon wheels rolling in the wrong way in movies
                - checkerboards disintegrate in ray tracing
                - striped shirts look funny on color tv
    - Nyquist-Shannon Sampling Theorem
        - when sampling a signal at discrete intervals, the sampling frequency must be >= 2 x f_max
        - f_max = max frequency of the input signal
        - This will allow you to reconstruct the original perfectly from the sampled version
    - anti-aliasing
        - sample more often (doesn't achieve downsampling goal really)  
        - OR, get rid of all frequencies that are greater than half the new sampling frequency
            - will lose information
            - but it is better than aliasing
            - apply a smoothing filter
        - Algorithm
            - start with image
            - apply low pass filter (Gaussian)
            - sample every other pixel
- Why does a lower resolution image still make sense to us? What do we lose?
    - because it preserves low frequency (make sure you apply a low pass filter)
- Why do we get different, distance-dependent interpretations of hybrid images?
    - Early processing in humans filters for various orientations and scales of frequency
    - perceptual cues in the mid frequencies dominate perception
    - when we see an image from far away, we are basically subsampling it
        - thus, we don't have access to high frequencies (or even mid frequencies)
    - Hybrid image = low passed image + high passed image
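
The downsampling algorithm above (low-pass, then sample every other pixel) can be demonstrated on a 1-D signal. In this sketch the test signal, its frequency of 0.45 cycles per sample, and the Gaussian sigma of 2 are all illustrative choices. After keeping every other sample the highest representable frequency is 0.25 cycles per original sample, so this signal cannot survive the downsampling honestly.

```python
import numpy as np

n = np.arange(200)
signal = np.sin(2 * np.pi * 0.45 * n)     # high-frequency sine

# Naive: throw away every other sample. The sine aliases to a bogus slow wave.
naive = signal[::2]

# Anti-aliased: apply a Gaussian low-pass filter first, then sample every other value.
sigma = 2.0
half = int(np.ceil(3 * sigma))
x = np.arange(-half, half + 1)
g = np.exp(-x**2 / (2 * sigma**2))
g /= g.sum()
smoothed = np.convolve(signal, g, mode="valid")   # "valid" avoids border effects
antialiased = smoothed[::2]

print(np.abs(naive).max())         # still close to 1: a strong, wrong signal
print(np.abs(antialiased).max())   # close to 0: the unrepresentable frequency is gone
```

The information at that frequency is lost either way, but the filtered version loses it cleanly instead of turning it into an artifact.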

### <strong> Summary </strong>

- sometimes it makes sense to think of images and filtering in the frequency domain
    - Fourier analysis
- can be faster to filter using FFT for large images (N log N vs N^2 for auto-correlation)
- Images are mostly smooth
    - basis for compression
- remember to low-pass before you down sample