Merge pull request #7 from neuroinfo-os/AddingImplementationTishby
Add Tishbys initial implementation
jrebstadt committed Apr 18, 2018
2 parents 7e2094f + 3bc7718 commit 3a3c2ae
Showing 33 changed files with 2,988 additions and 0 deletions.
14 changes: 14 additions & 0 deletions IDNNs/LICENSE
@@ -0,0 +1,14 @@
LICENSE CONDITIONS

Copyright (2016) Ravid Shwartz-Ziv
All rights reserved.

For details, see the paper:
Ravid Shwartz-Ziv, Naftali Tishby,
Opening the Black Box of Deep Neural Networks via Information
Arxiv, 2017
Permission to use, copy, modify, and distribute this software and its documentation for educational, research, and non-commercial purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice and this paragraph appear in all copies, modifications, and distributions.

Any commercial use or any redistribution of this software requires a license. For further details, contact Ravid Shwartz-Ziv (ravidziv@gmail.com).

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
49 changes: 49 additions & 0 deletions IDNNs/README.md
@@ -0,0 +1,49 @@
# IDNNs
## Description
IDNNs is a Python library, built on TensorFlow, for training deep neural networks and calculating the information they carry
[\[Shwartz-Ziv & Tishby, 2017\]](#IDNNs). The library lets you investigate how networks look in the
information plane and how that picture changes during learning.
<img src="https://github.com/ravidziv/IDNNs/blob/master/compare_percent_mnist_5_AND_85_PERCENT_old.JPG" width="1000px"/>

## Prerequisites
- tensorflow r1.0 or higher
- numpy 1.11.0
- matplotlib 2.0.2
- multiprocessing
- joblib

## Usage
All the code is under the `idnns/` directory.
To train a network and calculate its mutual information (MI) and gradients, run the example in [main.py](main.py).
Of course, you can also call only specific methods, e.g. to run just the training procedure or just the MI calculation.
The file takes the following command-line arguments:
- `start_samples` - Index of the first sample used for calculating the information
- `batch_size` - The batch size
- `learning_rate` - The learning rate of the network
- `num_repeat` - The number of times to run the network
- `num_epochs` - The maximum number of training epochs
- `net_arch` - The architecture of the networks
- `per_data` - The percentage of the training data to use
- `name` - The name under which the results are saved
- `data_name` - The dataset name
- `num_samples` - The maximum number of indices used for calculating the information
- `save_ws` - True to save the outputs of the network
- `calc_information` - 1 to calculate the MI of the network
- `save_grads` - True to save the gradients of the network
- `run_in_parallel` - True to run all the networks in parallel
- `num_of_bins` - The number of bins into which the neurons' outputs are divided
- `activation_function` - The activation function of the model (0 for tanh, 1 for ReLU)
- `interval_accuracy_display` - How often the accuracy is displayed
- `interval_information_display` - How often the information calculation is displayed
- `cov_net` - True to use a convolutional network
- `rand_labels` - True to train with random labels
- `data_dir` - The directory containing the data
The results are saved under the `jobs` folder. Each run creates a directory whose name encodes the run's properties. That directory contains a `data.pickle` file with the run's data and a Python file that is a copy of the script that created the run; a minimal loading sketch is shown below.
The datasets themselves are under the `data` directory.
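
A minimal sketch of inspecting a saved run (the run directory name is hypothetical, and the exact structure of the object stored in `data.pickle` depends on the run settings):

```python
import pickle

# Hypothetical run directory; each run creates its own folder under jobs/
run_dir = 'jobs/my_run'

with open(run_dir + '/data.pickle', 'rb') as f:
    run_data = pickle.load(f)

# Inspect what was stored before handing it to the plotting code
print(type(run_data))
```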

For plotting the results, use [plot_figures.py](idnns/plot/plot_figures.py).
This file contains methods for plotting different aspects of the data (the information plane, the gradients, the norms, etc.).

## References

1. <a name="IDNNs"></a> Ravid Shwartz-Ziv and Naftali Tishby, [Opening the Black Box of Deep Neural Networks via Information](https://arxiv.org/abs/1703.00810), arXiv, 2017.
Binary file added IDNNs/data/MNIST_data/t10k-images-idx3-ubyte.gz
Binary file not shown.
Binary file added IDNNs/data/MNIST_data/t10k-labels-idx1-ubyte.gz
Binary file not shown.
Binary file added IDNNs/data/MNIST_data/train-images-idx3-ubyte.gz
Binary file not shown.
Binary file added IDNNs/data/MNIST_data/train-labels-idx1-ubyte.gz
Binary file not shown.
Binary file added IDNNs/data/g1.mat
Binary file not shown.
Binary file added IDNNs/data/g2.mat
Binary file not shown.
Binary file added IDNNs/data/var_u.mat
Binary file not shown.
Empty file added IDNNs/idnns/__init__.py
Empty file.
Empty file.
283 changes: 283 additions & 0 deletions IDNNs/idnns/information/entropy_estimators.py
@@ -0,0 +1,283 @@
#!/usr/bin/env python
# Written by Greg Ver Steeg
# See readme.pdf for documentation
# Or go to http://www.isi.edu/~gregv/npeet.html

import scipy.spatial as ss
from scipy.special import digamma
from math import log
import numpy.random as nr
import numpy as np
import random

# CONTINUOUS ESTIMATORS

def entropy(x, k=3, base=2):
    """ The classic K-L k-nearest neighbor continuous entropy estimator
        x should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]]
        if x is a one-dimensional scalar and we have four samples
    """
    assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
    d = len(x[0])
    N = len(x)
    intens = 1e-10  # small noise to break degeneracy, see doc.
    x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
    tree = ss.cKDTree(x)
    nn = [tree.query(point, k + 1, p=float('inf'))[0][k] for point in x]
    const = digamma(N) - digamma(k) + d * log(2)
    return (const + d * np.mean(list(map(log, nn)))) / log(base)
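
# Example usage sketch (illustrative only; x_demo is not defined elsewhere in
# this module): for ~1000 samples from a standard normal, the estimate should
# land near the true differential entropy 0.5 * log2(2 * pi * e) ~= 2.05 bits.
#   x_demo = [[v] for v in nr.normal(0.0, 1.0, 1000)]
#   print(entropy(x_demo, k=3, base=2))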

def centropy(x, y, k=3, base=2):
    """ The classic K-L k-nearest neighbor continuous entropy estimator for the
        entropy of X conditioned on Y.
    """
    hxy = entropy([xi + yi for (xi, yi) in zip(x, y)], k, base)
    hy = entropy(y, k, base)
    return hxy - hy

def column(xs, i):
    return [[x[i]] for x in xs]

def tc(xs, k=3, base=2):
    xis = [entropy(column(xs, i), k, base) for i in range(0, len(xs[0]))]
    return np.sum(xis) - entropy(xs, k, base)

def ctc(xs, y, k=3, base=2):
    xis = [centropy(column(xs, i), y, k, base) for i in range(0, len(xs[0]))]
    return np.sum(xis) - centropy(xs, y, k, base)

def corex(xs, ys, k=3, base=2):
    cxis = [mi(column(xs, i), ys, k, base) for i in range(0, len(xs[0]))]
    return np.sum(cxis) - mi(xs, ys, k, base)

def mi(x, y, k=3, base=2):
    """ Mutual information of x and y
        x, y should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]]
        if x is a one-dimensional scalar and we have four samples
    """
    assert len(x) == len(y), "Lists should have same length"
    assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
    intens = 1e-10  # small noise to break degeneracy, see doc.
    x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
    y = [list(p + intens * nr.rand(len(y[0]))) for p in y]
    points = zip2(x, y)
    # Find nearest neighbors in joint space, p=inf means max-norm
    tree = ss.cKDTree(points)
    dvec = [tree.query(point, k + 1, p=float('inf'))[0][k] for point in points]
    a, b, c, d = avgdigamma(x, dvec), avgdigamma(y, dvec), digamma(k), digamma(len(x))
    return (-a - b + c + d) / log(base)
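
# Example usage sketch (a_demo / b_demo are illustrative names): two independent
# standard-normal samples should give an MI estimate near 0 bits, while strongly
# correlated samples give a clearly positive value.
#   a_demo = [[v] for v in nr.normal(0.0, 1.0, 1000)]
#   b_demo = [[v] for v in nr.normal(0.0, 1.0, 1000)]
#   print(mi(a_demo, b_demo, k=3, base=2))  # ~0 for independent samples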


def cmi(x, y, z, k=3, base=2):
    """ Mutual information of x and y, conditioned on z
        x, y, z should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]]
        if x is a one-dimensional scalar and we have four samples
    """
    assert len(x) == len(y), "Lists should have same length"
    assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
    intens = 1e-10  # small noise to break degeneracy, see doc.
    x = [list(p + intens * nr.rand(len(x[0]))) for p in x]
    y = [list(p + intens * nr.rand(len(y[0]))) for p in y]
    z = [list(p + intens * nr.rand(len(z[0]))) for p in z]
    points = zip2(x, y, z)
    # Find nearest neighbors in joint space, p=inf means max-norm
    tree = ss.cKDTree(points)
    dvec = [tree.query(point, k + 1, p=float('inf'))[0][k] for point in points]
    a, b, c, d = avgdigamma(zip2(x, z), dvec), avgdigamma(zip2(y, z), dvec), avgdigamma(z, dvec), digamma(k)
    return (-a - b + c + d) / log(base)


def kldiv(x, xp, k=3, base=2):
    """ KL Divergence between p and q for x~p(x), xp~q(x)
        x, xp should be a list of vectors, e.g. x = [[1.3], [3.7], [5.1], [2.4]]
        if x is a one-dimensional scalar and we have four samples
    """
    assert k <= len(x) - 1, "Set k smaller than num. samples - 1"
    assert k <= len(xp) - 1, "Set k smaller than num. samples - 1"
    assert len(x[0]) == len(xp[0]), "Two distributions must have same dim."
    d = len(x[0])
    n = len(x)
    m = len(xp)
    const = log(m) - log(n - 1)
    tree = ss.cKDTree(x)
    treep = ss.cKDTree(xp)
    nn = [tree.query(point, k + 1, p=float('inf'))[0][k] for point in x]
    nnp = [treep.query(point, k, p=float('inf'))[0][k - 1] for point in x]
    return (const + d * np.mean(list(map(log, nnp))) - d * np.mean(list(map(log, nn)))) / log(base)
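
# Example usage sketch (p_smp / q_smp are illustrative names): two samples drawn
# from the same distribution should give a divergence estimate near 0 bits.
#   p_smp = [[v] for v in nr.normal(0.0, 1.0, 500)]
#   q_smp = [[v] for v in nr.normal(0.0, 1.0, 500)]
#   print(kldiv(p_smp, q_smp, k=3, base=2))  # ~0 when p and q match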


# DISCRETE ESTIMATORS
def entropyd(sx, base=2):
    """ Discrete entropy estimator
        Given a list of samples which can be any hashable object
    """
    return entropyfromprobs(hist(sx), base=base)


def midd(x, y, base=2):
    """ Discrete mutual information estimator
        Given a list of samples which can be any hashable object
    """
    return -entropyd(zip(x, y), base) + entropyd(x, base) + entropyd(y, base)
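
# Example usage sketch: with hashable (e.g. integer) samples, identical lists
# give I(X;X) = H(X), so for a balanced binary list the result is 1 bit.
#   labels_demo = [0, 1, 0, 1, 1, 0, 1, 0]
#   print(midd(labels_demo, labels_demo, base=2))  # == entropyd(labels_demo) = 1.0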

def cmidd(x, y, z):
    """ Discrete mutual information estimator of x and y, conditioned on z
        Given lists of samples which can be any hashable object
    """
    return entropyd(zip(y, z)) + entropyd(zip(x, z)) - entropyd(zip(x, y, z)) - entropyd(z)

def centropyd(x, y, base=2):
    """ Discrete entropy estimator for the entropy of X conditioned on Y
    """
    return entropyd(zip(x, y), base) - entropyd(y, base)

def tcd(xs, base=2):
    xis = [entropyd(column(xs, i), base) for i in range(0, len(xs[0]))]
    hx = entropyd(xs, base)
    return np.sum(xis) - hx

def ctcd(xs, y, base=2):
    xis = [centropyd(column(xs, i), y, base) for i in range(0, len(xs[0]))]
    return np.sum(xis) - centropyd(xs, y, base)

def corexd(xs, ys, base=2):
    cxis = [midd(column(xs, i), ys, base) for i in range(0, len(xs[0]))]
    return np.sum(cxis) - midd(xs, ys, base)

def hist(sx):
    sx = discretize(sx)
    # Histogram from list of samples
    d = dict()
    for s in sx:
        if type(s) == list:
            s = tuple(s)
        d[s] = d.get(s, 0) + 1
    return map(lambda z: float(z) / len(sx), d.values())


def entropyfromprobs(probs, base=2):
    # Turn a normalized list of probabilities of discrete outcomes into entropy (base 2)
    return -sum(map(elog, probs)) / log(base)


def elog(x):
    # for entropy, 0 log 0 = 0. but we get an error for putting log 0
    if x <= 0. or x >= 1.:
        return 0
    else:
        return x * log(x)


# MIXED ESTIMATORS
def micd(x, y, k=3, base=2, warning=True):
    """ If x is continuous and y is discrete, compute mutual information
    """
    overallentropy = entropy(x, k, base)

    n = len(y)
    word_dict = dict()
    for i in range(len(y)):
        if type(y[i]) == list:
            y[i] = tuple(y[i])
    for sample in y:
        word_dict[sample] = word_dict.get(sample, 0) + 1. / n
    yvals = list(set(word_dict.keys()))

    mi = overallentropy
    for yval in yvals:
        xgiveny = [x[i] for i in range(n) if y[i] == yval]
        if k <= len(xgiveny) - 1:
            mi -= word_dict[yval] * entropy(xgiveny, k, base)
        else:
            if warning:
                print("Warning, after conditioning, on y=", yval, " insufficient data. Assuming maximal entropy in this case.")
            mi -= word_dict[yval] * overallentropy
    return np.abs(mi)  # units already applied
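
# Example usage sketch (feats_demo / labels_demo are illustrative names): for a
# standard-normal feature and its sign as a discrete label, the true MI is
# H(label) = 1 bit, and the estimate should come out close to that.
#   feats_demo = [[v] for v in nr.normal(0.0, 1.0, 200)]
#   labels_demo = [int(f[0] > 0) for f in feats_demo]
#   print(micd(feats_demo, labels_demo, k=3, base=2))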

def midc(x, y, k=3, base=2, warning=True):
    return micd(y, x, k, base, warning)

def centropydc(x, y, k=3, base=2, warning=True):
    return entropyd(x, base) - midc(x, y, k, base, warning)

def centropycd(x, y, k=3, base=2, warning=True):
    return entropy(x, k, base) - micd(x, y, k, base, warning)

def ctcdc(xs, y, k=3, base=2, warning=True):
    xis = [centropydc(column(xs, i), y, k, base, warning) for i in range(0, len(xs[0]))]
    return np.sum(xis) - centropydc(xs, y, k, base, warning)

def ctccd(xs, y, k=3, base=2, warning=True):
    xis = [centropycd(column(xs, i), y, k, base, warning) for i in range(0, len(xs[0]))]
    return np.sum(xis) - centropycd(xs, y, k, base, warning)

def corexcd(xs, ys, k=3, base=2, warning=True):
    cxis = [micd(column(xs, i), ys, k, base, warning) for i in range(0, len(xs[0]))]
    return np.sum(cxis) - micd(xs, ys, k, base, warning)

def corexdc(xs, ys, k=3, base=2, warning=True):
    # cxis = [midc(column(xs, i), ys, k, base, warning) for i in range(0, len(xs[0]))]
    # joint = midc(xs, ys, k, base, warning)
    # return np.sum(cxis) - joint
    return tcd(xs, base) - ctcdc(xs, ys, k, base, warning)

# UTILITY FUNCTIONS
def vectorize(scalarlist):
    """ Turn a list of scalars into a list of one-d vectors
    """
    return [[x] for x in scalarlist]


def shuffle_test(measure, x, y, z=False, ns=200, ci=0.95, **kwargs):
    """ Shuffle test
        Repeatedly shuffle the x-values and then estimate measure(x, y, [z]).
        Returns the mean and conf. interval ('ci=0.95' default) over 'ns' runs.
        'measure' could be mi or cmi, for example. Keyword arguments can be passed.
        Mutual information and CMI should have a mean near zero.
    """
    xp = x[:]  # A copy that we can shuffle
    outputs = []
    for i in range(ns):
        random.shuffle(xp)
        if z:
            outputs.append(measure(xp, y, z, **kwargs))
        else:
            outputs.append(measure(xp, y, **kwargs))
    outputs.sort()
    return np.mean(outputs), (outputs[int((1. - ci) / 2 * ns)], outputs[int((1. + ci) / 2 * ns)])
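
# Example usage sketch (a_demo / b_demo as in the sketch after mi above): build a
# null distribution for mi by shuffling. The shuffled mean should sit near 0, and
# the unshuffled MI estimate indicates real dependence only if it falls outside
# the returned confidence interval.
#   null_mean, (lo, hi) = shuffle_test(mi, a_demo, b_demo, ns=200, ci=0.95, k=3, base=2)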


# INTERNAL FUNCTIONS

def avgdigamma(points, dvec):
    # This part finds the number of neighbors within some radius in the marginal space
    # and returns the expectation value of <psi(nx)>
    N = len(points)
    tree = ss.cKDTree(points)
    avg = 0.
    for i in range(N):
        dist = dvec[i]
        # subtlety: we don't include the boundary point,
        # but we are implicitly adding 1 to the Kraskov definition because the center point is included
        num_points = len(tree.query_ball_point(points[i], dist - 1e-15, p=float('inf')))
        avg += digamma(num_points) / N
    return avg


def zip2(*args):
    # zip2(x, y) takes the lists of vectors and makes it a list of vectors in a joint space
    # E.g. zip2([[1], [2], [3]], [[4], [5], [6]]) = [[1, 4], [2, 5], [3, 6]]
    return [sum(sublist, []) for sublist in zip(*args)]

def discretize(xs):
    def discretize_one(x):
        if len(x) > 1:
            return tuple(x)
        else:
            return x[0]
    # discretize(xs) takes a list of vectors and makes it a list of tuples or scalars
    return [discretize_one(x) for x in xs]

if __name__ == "__main__":
    print("NPEET: Non-parametric entropy estimation toolbox. See readme.pdf for details on usage.")
