import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt

## Scratch
A worthwhile goal is to use the results of one-shot denoising or inpainting to
quickly evaluate different network architectures. This would speed up network
architecture search.

To use such one-shot evaluations, it must be known what qualities of a network
they measure. 

Some possible ideas:
* good denoising results suggest that, when a learning algorithm is applied, the
  learning process descends to optimal parameter sets which have one or both of
  the following properties:
    1. earlier layers contribute most to the output.
    2. a large number of parameters contribute to the output.


Some questions:
* why haven't I had any success training with SGD, but success with Adam?
* why are the results so sensitive to learning rate?
* is the effectiveness a better indicator of an appropriate learning rate than
  of a good network structure?

# Investigate results from "Deep Image Prior" paper
Try to investigate some of the following claims and questions.

## Learning differences between layers explains one-shot denoising results.
Proposition 1: back-propagation with gradient descent favors the stabilization
of earlier layers before later layers. 

In more detail: the gradients of parameters in earlier layers must become small
enough that updates to parameters in later layers reduce the loss in
expectation. As the gradients of the earlier parameters shrink from large to
small, the probability of a later parameter making a constructive contribution
(reducing the loss) increases from 50% towards 100%.

Proposition 2: without a contribution from the final layer, it is not possible
to recreate single-pixel noise. Without a contribution from the second-to-last
layer, it is not possible to recreate noise blocks of size 2x2.
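
A minimal sketch of the intuition behind Proposition 2. It assumes something
the notes above do not state: that everything produced before the final layer
is upsampled by 2x nearest-neighbour, so it is constant on 2x2 blocks. Under
that assumption, the best such image can do for a single noisy pixel is spread
it over its block, which leaves a residual. The helper name
`project_to_blocks` is hypothetical.

```python
import numpy as np

def project_to_blocks(img, block=2):
    """Best L2 approximation of img by an image constant on block x block tiles."""
    h, w = img.shape
    # Average each tile, then repeat the average back to full resolution.
    tiles = img.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

# A single noisy pixel on an otherwise flat 4x4 image.
impulse = np.zeros((4, 4))
impulse[1, 2] = 1.0

approx = project_to_blocks(impulse)   # what a block-constant output can reach
residual = impulse - approx           # what only a finer layer could supply
residual_energy = float((residual ** 2).sum())
```

Here the impulse is smeared to 0.25 over its 2x2 block and the residual energy
is 0.75, so three quarters of the single-pixel noise energy is out of reach
without a finer-grained contribution.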

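Argument:
1. Proposition 1: the later layers are the last to stabilize.
2. Proposition 2: fine-grained noise can only be recreated by the later layers.
3. Therefore, a network's ability to model noise improves as training
   stabilizes.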

There is no notion of "naturalness" covered by this argument.

### Experiment
Train a network to output a noisy image from input noise.
Measure:
* the update distance for all layers. 
* the contribution distribution for the last layer.
* the output accuracy.
* the output accuracy for the noisy pixels.
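
Two of these measurements can be sketched with plain NumPy. The definitions
are assumptions, not something fixed by these notes: "update distance" is
taken as the relative L2 distance between two snapshots of a layer's
parameters, and "accuracy" is replaced by mean squared error (lower is
better), optionally restricted to the noisy pixels by a mask. The function
names are hypothetical, and the contribution distribution is left out because
it is not yet defined.

```python
import numpy as np

def update_distance(before, after, eps=1e-12):
    """Relative L2 distance between two snapshots of one layer's parameters."""
    before = np.asarray(before, dtype=float)
    after = np.asarray(after, dtype=float)
    return float(np.linalg.norm(after - before) / (np.linalg.norm(before) + eps))

def masked_mse(output, target, mask):
    """Mean squared error restricted to the pixels where mask is True."""
    mask = np.asarray(mask, dtype=bool)
    diff = np.asarray(output, dtype=float)[mask] - np.asarray(target, dtype=float)[mask]
    return float(np.mean(diff ** 2))

# Toy target: a flat image with one noisy pixel.
clean = np.zeros((4, 4))
noise_mask = np.zeros((4, 4), dtype=bool)
noise_mask[1, 2] = True
noisy = clean.copy()
noisy[noise_mask] = 1.0

# Pretend the network currently reproduces only the clean image.
output = clean.copy()
overall_error = masked_mse(output, noisy, np.ones_like(noise_mask))
noisy_pixel_error = masked_mse(output, noisy, noise_mask)

# Update distance for a toy parameter vector where one weight moved.
layer_step = update_distance(np.ones(4), np.array([1.0, 1.0, 1.0, 2.0]))
```

In this toy case the overall error looks small (1/16) while the error on the
noisy pixel is 1.0, which is why the two are worth tracking separately.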

#### Evidence for proposition 1
Approximate layer stability with the update distance for each layer. This might
need some normalization. Fix some stability threshold. A strong relationship
between layer number and the time taken to reach the stability threshold is
evidence for proposition 1.
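
A possible way to turn this into a number, sketched under assumptions: "time
until stable" is taken as the first step after which a layer's update distance
stays below the threshold, and the "relationship" is measured with a Spearman
rank correlation. The array `history` holds made-up placeholder values that
only exercise the functions; it is not an experimental result. All names are
hypothetical.

```python
import numpy as np

def steps_until_stable(distances, threshold):
    """First step after which the update distance stays below threshold.

    Returns len(distances) if the layer is still unstable at the last step.
    """
    distances = np.asarray(distances, dtype=float)
    above = np.nonzero(distances >= threshold)[0]
    return 0 if above.size == 0 else int(above[-1]) + 1

def spearman(x, y):
    """Spearman rank correlation (simple version, assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Placeholder update distances, one row per layer, one column per step.
history = np.array([
    [0.9, 0.2, 0.05, 0.01, 0.01, 0.01],  # layer 0
    [0.9, 0.6, 0.30, 0.05, 0.01, 0.01],  # layer 1
    [0.9, 0.8, 0.60, 0.40, 0.05, 0.01],  # layer 2
])
threshold = 0.1

stable_steps = [steps_until_stable(row, threshold) for row in history]
layer_index = list(range(len(history)))
rho = spearman(layer_index, stable_steps)
```

A `rho` close to 1 on real measurements would support proposition 1; a value
near 0 or negative would count against it. With several layers stabilizing at
the same step, a tie-aware rank correlation would be the safer choice.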

#### Evidence for proposition 2 
If there is a strong positive relationship between the accuracy for the noisy
pixels and the contribution distribution for the last layer, this is evidence
for proposition 2.