# Scratch Notebook (Ignore)

In [None]:
%run -m ipy_startup
%matplotlib inline

import plotly as plty
import plotly.graph_objs as go
import cufflinks as cf
cf.set_config_file(offline=True, theme='white', offline_link_text=None, offline_show_link=False)

import logging
console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(asctime)s:%(levelname)s:%(name)s: %(message)s'))
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logger.addHandler(console)

Notes

- [Single Layer Approximation](http://stats.stackexchange.com/questions/200330/does-number-of-layers-in-neural-network-corresponds-to-degree-of-the-approximati)
    - From Cybenko's proof for universal approximations of single layer network:
        - "that networks with one internal layer and an arbitrary continuous sigmoidal function can approximate continuous functions with arbitrary precision providing that no constraints are placed on the number of nodes or the size of the weights"
        - "requires the activation function be nonconstant, bounded, and monotonically-increasing continuous function"
- XOR can be done in a single layer network with logit activation, but not with a hard threshold or linear activation
- [Showing approximation as steps](http://neuralnetworksanddeeplearning.com/chap4.html)
- [On exponential number of response regions possible with deeper networks](http://stats.stackexchange.com/questions/196585/how-to-understand-the-geometric-intuition-of-the-inner-workings-of-neural-networ/197032#197032)
- [Deep Learning Conspiracy](http://people.idsia.ch/~juergen/deep-learning-conspiracy.html)
    - Dates and people associated with development of main ideas in deep learning
- Vapnik created original SVM in 1963
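
The Cybenko quote and the "approximation as steps" link above can be illustrated with a small hand-built network. This is only a sketch under stated assumptions, not the construction from either source: the helper names (`build_step_network`, `predict`, `max_error`), the target function `sin`, and the `steepness` value are all made up for illustration. Each hidden sigmoid unit is made steep enough to act like a step, and the output weights are the jumps between neighbouring plateau heights.

```python
import math

def sigmoid(z):
    # Logistic function written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_step_network(f, lo, hi, n_steps, steepness=1000.0):
    """Hand-pick weights for a one-hidden-layer net approximating f on [lo, hi].

    Returns (bias, hidden) where hidden is a list of (w, b, a) and each
    hidden unit contributes a * sigmoid(w * x + b) to the output.
    """
    width = (hi - lo) / n_steps
    mids = [lo + (i + 0.5) * width for i in range(n_steps)]
    bias = f(mids[0])  # height of the first plateau
    hidden = []
    for i in range(1, n_steps):
        edge = lo + i * width                 # where the step switches on
        jump = f(mids[i]) - f(mids[i - 1])    # change in plateau height
        hidden.append((steepness, -steepness * edge, jump))
    return bias, hidden

def predict(net, x):
    bias, hidden = net
    return bias + sum(a * sigmoid(w * x + b) for w, b, a in hidden)

def max_error(f, net, lo, hi, n_points=1000):
    xs = [lo + (hi - lo) * i / n_points for i in range(n_points + 1)]
    return max(abs(f(x) - predict(net, x)) for x in xs)

coarse = build_step_network(math.sin, 0.0, math.pi, n_steps=10)
fine = build_step_network(math.sin, 0.0, math.pi, n_steps=100)
print(max_error(math.sin, coarse, 0.0, math.pi))
print(max_error(math.sin, fine, 0.0, math.pi))
```

Adding hidden units shrinks the plateaus and therefore the worst-case error, which is the "no constraints on the number of nodes or the size of the weights" part of the quote in action.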

papers:
- [On the Number of Linear Regions of Deep Neural Networks](https://arxiv.org/pdf/1402.1869.pdf)
- [On the number of response regions of deep feed forward networks with piece-wise linear activations](http://arxiv.org/abs/1312.6098)
    - Number of response regions (splits of the input space) can grow exponentially with the number of layers
    - Main results:
        - lower bound on maximal number of response regions per parameter for multi layer network:<br>
            $ \Omega(\lfloor\frac{k}{d}\rfloor^{(l - 1)} \frac{k^{(d-2)}}{l})$<br>
            $l$ = number of layers<br>
            $d$ = number of inputs<br>
            $k$ = number of neurons ($\geq d$)
        - upper bound on maximal number of response regions per parameter for shallow network:<br>
            $ O(l^{(d - 1)}k^{(d - 1)}) $
        - on the requirement that the number of neurons exceed the number of inputs: one advantage of deep networks is that the first layer can provide activation in the direction of the data manifold, which effectively brings the number of dimensions back well below the number of neurons
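
To get a feel for the two bounds above, the expressions inside $\Omega(\cdot)$ and $O(\cdot)$ can be evaluated directly. This is a rough sketch: the function names are made up here, asymptotic notation hides constant factors, and the values $d = 2$, $k = 8$ are arbitrary, so only the growth trend in $l$ is meaningful.

```python
def deep_lower_bound(k, d, l):
    """Expression inside Omega(...): floor(k/d)^(l-1) * k^(d-2) / l."""
    return (k // d) ** (l - 1) * k ** (d - 2) / l

def shallow_upper_bound(k, d, l):
    """Expression inside O(...): l^(d-1) * k^(d-1)."""
    return l ** (d - 1) * k ** (d - 1)

d, k = 2, 8  # illustrative values only; the bound assumes k >= d
for l in (2, 4, 8, 16):
    print(l, deep_lower_bound(k, d, l), shallow_upper_bound(k, d, l))
```

The deep expression grows exponentially in $l$ while the shallow one grows polynomially, so the deep one overtakes it after a few layers for these values.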

resources:

Breakfast Reading:
- [Neural Networks, Manifolds, and Topology](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
- [A visual proof that neural nets can compute any function](http://neuralnetworksanddeeplearning.com/chap4.html)

Best Book (free too):
- [Deep Learning (2015 - Bengio)](https://github.com/HFTrader/DeepLearningBook/raw/master/DeepLearningBook.pdf)
    - "It has been proven in many different settings that organizing computation through the composition of many nonlinearities and a hierarchy of reused features can give an exponential boost to statistical efficiency, on top of the exponential boost given by using a distributed representation"
    - Most relevant sections to understanding why deep learning works: 
        - 15.4 Distributed Representation
        - 15.5 Exponential Gains from Depth
        - Together there are less than 10 pages here and they're a little dense but well worth the read
        - number of regions created by $n$ hyperplanes in $R^d$ is at most $\sum_{j=0}^{d}{{{n}\choose{j}}} = O(n^d)$
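
The hyperplane region count above is easy to check numerically. A minimal sketch, assuming the hyperplanes are in general position (the case where the sum is attained); the function name `max_regions` is made up for illustration and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def max_regions(n, d):
    """Regions cut out of R^d by n hyperplanes in general position."""
    return sum(comb(n, j) for j in range(d + 1))

# Lines in the plane, then planes in 3-space, for a growing number of cuts.
for n in (1, 2, 3, 4, 10):
    print(n, max_regions(n, 2), max_regions(n, 3))
```

For fixed $d$ the count is a polynomial of degree $d$ in $n$, which is the $O(n^d)$ growth that depth is meant to beat.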

Best Survey Paper (also free):
- [Deep Learning (2015 - Bengio, LeCun, Hinton)](http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf)

Papers often cited as "classic" Deep Learning resources:
- [A Fast Learning Algorithm for Deep Belief Nets (2006 - Hinton et al)](http://nuyoo.utm.mx/~jjf/rna/A8%20A%20fast%20learning%20algorithm%20for%20deep%20belief%20nets.pdf)
- [Gradient-Based Learning Applied to Document Recognition (1998 - LeCun, Bengio et al)](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
- [ImageNet Classification with Deep Convolutional Neural Networks (2012 - Hinton et al)](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)


Contents

- [History](#history)
- [Tensorflow Installation](#install)
- [Hello World](#hello-world)


History

- First "deep learning" networks published in 1965
- McCulloch and Pitts credited with first ANN model in 1943
- Frank Rosenblatt (1958) created the (much overly hyped) Perceptron algorithm 
- Vladimir Arnold (Kolmogorov student) showed that single-layer NN can be used to solve Hilbert's 13th problem
- Minsky and Papert (1969) showed that Perceptron could not solve XOR problem
- Neural network research mostly abandoned until the mid 80s
- In 1989, universal approximation of NNs shown (Cybenko, Funahashi, Hornik, Stinchcombe, White)
- 1962 - Backpropagation

Notes

Word2Vec Explanation:
- http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

In [None]:
from IPython.display import Image
Image(url='https://i.stack.imgur.com/ddJFC.png', width=800)

In [None]:
import jupyter_core
custom_css = jupyter_core.paths.jupyter_config_dir() + '/custom/custom.css'
"File: {}".format(custom_css)

In [None]:
Image(url='http://ww2.tnstate.edu/ganter/BIO-311-Ch12-Eq5a.gif', width=500)

In [None]:
from IPython.display import Image
Image(url='https://i.stack.imgur.com/bmg5Z.png')

In [None]:
from IPython.display import Image
#Image(url='https://www.cs.toronto.edu/~frossard/post/linear_regression/sgd.gif')   
Image(url='https://alykhantejani.github.io/images/gradient_descent_line_graph.gif')

In [None]:
np.expand_dims([1, 2, 3], 1)

In [2]:

import tensorflow as tf

#activation = lambda x: tf.reduce_max(x, axis=0)
activation = lambda x: tf.abs(x)

def get_one_layer_network(b, w1, w2):
    X = tf.placeholder(tf.float64, shape=(None, 2))
    
    nw = len(w1[0])
    b = tf.constant(b, dtype=tf.float64)
    w1 = tf.constant(w1, shape=[2, nw], dtype=tf.float64)
    z = activation(b + tf.matmul(X, w1))
    
    w2 = tf.constant(w2, shape=[nw], dtype=tf.float64)
    y = tf.reduce_sum(tf.multiply(z, w2), axis=1)
    return X, y

def tf_print(t, transform=None):
    def log_value(x):
        logger.info('{} - {}'.format(t.name, x if transform is None else transform(x)))
        return x
    log_op = tf.py_func(log_value, [t], [t.dtype], name=t.name.split(':')[0])[0]
    with tf.control_dependencies([log_op]):
        r = tf.identity(t)
    return r

def get_two_layer_network(b1, b2, w1, w2, w3):
    X = tf.placeholder(tf.float64, shape=(None, 2))
    
    b1 = tf.constant(b1, dtype=tf.float64)
    w1 = tf.constant(w1, shape=[2, 2], dtype=tf.float64)
    z1 = activation(b1 + tf.matmul(X, w1))
    
    b2 = tf.constant(b2, dtype=tf.float64)
    w2 = tf.constant(w2, shape=[2, 1], dtype=tf.float64)
    z2 = activation(b2 + tf.matmul(z1, w2))

    y = tf.reduce_sum(tf.multiply(z2, w3), axis=1)
    return X, y


sess_config = tf.ConfigProto(device_count = {'GPU': 0}, log_device_placement=True)
tf.logging.set_verbosity(tf.logging.DEBUG)


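def get_network_response_surface(X, network_fn):
    # Build each network in a fresh graph so repeated calls do not pile ops
    # onto the default graph. tf.device only affects ops created inside it,
    # so it has to wrap graph construction rather than sess.run.
    with tf.Graph().as_default(), tf.device('/cpu:0'):
        X_, y = network_fn()
        with tf.Session(config=sess_config) as sess:
            return sess.run(y, feed_dict={X_: X})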
    
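def plot_network_response_surface(v, y):
    # y is flattened in row-major order over the (v, v) meshgrid, so reshape
    # it back into a len(v) x len(v) grid for the surface plot.
    trace = go.Surface(x=v, y=v, z=y.reshape((len(v), -1), order='C'))
    layout = go.Layout()
    fig = go.Figure(data=[trace], layout=layout)
    plty.offline.iplot(fig)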

In [4]:
v = np.linspace(-15, 15, num=21)
X = np.hstack([np.expand_dims(x.ravel(), 1) for x in np.meshgrid(v, v)])
X = X.astype(np.float64)

In [7]:
b = [0, 0]
w1 = [[1, 0], [0, 1]]
w2 = [.1, .1]
network_fn = lambda: get_one_layer_network(b, w1, w2)
y = get_network_response_surface(X, network_fn)
plot_network_response_surface(v, y)

In [8]:
b = [-5, -5, -5]
w1 = [[1, -1, 0], [1, 1, -3]]
w2 = [.1, .1, .1]
network_fn = lambda: get_one_layer_network(b, w1, w2)
y = get_network_response_surface(X, network_fn)

In [9]:
plot_network_response_surface(v, y)

In [316]:
b1 = [0, 0]
b2 = [-15]
w1 = [[1, 0], [0, 1]]
w2 = [1, 1]
w3 = [1]
network_fn = lambda: get_two_layer_network(b1, b2, w1, w2, w3)
y = get_network_response_surface(X, network_fn)

In [317]:
plot_network_response_surface(v, y)

In [169]:
y[:21]

array([  0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,   0. ,
         0. ,   0. ,   1.5,   3. ,   4.5,   6. ,   7.5,   9. ,  10.5,
        12. ,  13.5,  15. ])

# Alzheimers

In [None]:
d = pd.read_csv('~/Downloads/Alzheimers.csv')
d.head()

In [None]:
d['gender'].value_counts()

In [None]:
d['gender'] = d['gender'].str.upper().str[0]
d['gender'].value_counts()

In [None]:
d.groupby(['gender', 'Genotype']).size()

In [None]:
d['Genotype'].value_counts()

### Encoding

In [None]:
d['response'].value_counts()

In [None]:
df = d.copy()

df = pd.get_dummies(df, prefix='gender', prefix_sep=':', columns=['gender'])
df = pd.get_dummies(df, prefix='genotype', prefix_sep=':', columns=['Genotype'])
df['response'] = df['response'].map({'NotImpaired': 0, 'Impaired': 1})
df.filter(regex='age|gender|genotype|response').sample(n=6)

In [None]:
len(d)

In [None]:
cy = 'response'
X, y = df[[c for c in df if c != cy]].astype(np.float64), df[cy]

In [None]:
pd.set_option('display.max_info_columns', 1000)
X.info()

In [None]:
tf.real

In [None]:
tf.contrib.learn.DNNClassifier

In [None]:
# from sklearn.datasets import load_iris
# load_iris().data

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow.contrib.learn.python.learn.estimators.estimator import SKCompat
from tensorflow.contrib.learn.python.learn.estimators import run_config

#feature_columns = [tf.contrib.layers.real_valued_column(c) for c in X]
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X.values)
clf = SKCompat(tf.contrib.learn.DNNClassifier(
    [10], feature_columns,
    config=run_config.RunConfig(save_checkpoints_steps=10, save_checkpoints_secs=None, save_summary_steps=10)
))
clf = Pipeline([
    ('scale', StandardScaler()),
    ('clf', clf)
])
clf.fit(X, y, clf__max_steps=100)

In [None]:
from sklearn.base import clone
clf = SKCompat(tf.contrib.learn.DNNClassifier(
    [10], feature_columns,
    config=run_config.RunConfig(save_checkpoints_steps=10, save_checkpoints_secs=None, save_summary_steps=10)
))
clone(clf)

In [None]:
clf.named_steps['clf']

In [None]:
tf.contrib.learn.DNNClassifier()

In [None]:
d['gender'].value_counts()