In [1]:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
print(tf.__version__)

2.0.0


# Shape Error:

In [2]:
def some_method(data):
  a = data[:,0:2]
  c = data[:,1]
  s = (a + c)
  return tf.sqrt(tf.matmul(s, tf.transpose(s)))

with tf.compat.v1.Session() as sess:
  fake_data = tf.constant([
      [5.0, 3.0, 7.1],
      [2.3, 4.1, 4.8],
      [2.8, 4.2, 5.6],
      [2.9, 8.3, 7.3]
    ])
  print(sess.run(some_method(fake_data)))

ValueError: Dimensions must be equal, but are 2 and 4 for 'add' (op: 'AddV2') with input shapes: [4,2], [4].

Element-wise tensor addition only works when both tensors have the same shape (or shapes that can be broadcast together). From the error message above, we can see that the shapes of a and c are different, so let's debug our code.

##### Debug Shape Error:

Let's first print the shapes of a and c.

In [3]:
def some_method(data):
  a = data[:,0:2]
  print('a.get_shape() = {}'.format(a.get_shape()))
  c = data[:,1]
  print('c.get_shape() = {}'.format(c.get_shape()))
  s = (a + c)
  return tf.sqrt(tf.matmul(s, tf.transpose(s)))

with tf.compat.v1.Session() as sess:
  fake_data = tf.constant([
      [5.0, 3.0, 7.1],
      [2.3, 4.1, 4.8],
      [2.8, 4.2, 5.6],
      [2.9, 8.3, 7.3]
    ])
  print(sess.run(some_method(fake_data)))

a.get_shape() = (4, 2)
c.get_shape() = (4,)


ValueError: Dimensions must be equal, but are 2 and 4 for 'add_1' (op: 'AddV2') with input shapes: [4,2], [4].

##### Fix Shape Error:

In [4]:
def some_method(data):
  a = data[:,0:2]
  print('a.get_shape() = {}'.format(a.get_shape()))
  c = data[:,1:3] 
  print('c.get_shape() = {}'.format(c.get_shape()))
  assert len(c.get_shape()) == 2  # c must be rank 2, like a
  s = (a + c)
  return tf.sqrt(tf.matmul(s, tf.transpose(s)))

with tf.compat.v1.Session() as sess:
  fake_data = tf.constant([
      [5.0, 3.0, 7.1],
      [2.3, 4.1, 4.8],
      [2.8, 4.2, 5.6],
      [2.9, 8.3, 7.3]
    ])
  print(sess.run(some_method(fake_data)))

a.get_shape() = (4, 2)
c.get_shape() = (4, 2)
[[12.884487 11.878131 12.449096 15.721323]
 [11.878131 10.962208 11.489996 14.509306]
 [12.449096 11.489996 12.043255 15.207892]
 [15.721323 14.509306 15.207892 19.204166]]
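Why does the fix work? Indexing a column with a single integer (`data[:,1]`) drops that dimension, while a range (`data[:,1:3]`) keeps it. The sketch below assumes TensorFlow 2.x and builds its own `tf.Graph` so it runs whether or not eager execution has been disabled. It checks the static shapes and shows a different, hypothetical fix: slicing one column as `1:2` keeps shape (4, 1), which broadcasts against a's (4, 2) and adds column 1 to both columns of a. Note that this computes something different from `data[:,1:3]`.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    data = tf.constant([
        [5.0, 3.0, 7.1],
        [2.3, 4.1, 4.8],
        [2.8, 4.2, 5.6],
        [2.9, 8.3, 7.3]
    ])
    a = data[:, 0:2]           # shape (4, 2)
    col_dropped = data[:, 1]   # integer index: dimension removed, shape (4,)
    col_kept = data[:, 1:2]    # range slice: dimension kept, shape (4, 1)
    s = a + col_kept           # (4, 2) + (4, 1) broadcasts to (4, 2)

print(col_dropped.shape.as_list(), col_kept.shape.as_list(), s.shape.as_list())

with tf.compat.v1.Session(graph=graph) as sess:
    s_value = sess.run(s)
print(s_value[0])  # first row: [5.0 + 3.0, 3.0 + 3.0]
```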


# Shape Error Because Of Batch Size:

Shape problems can also happen because of batch size. Remember that when I was talking about tensor shapes, I said that one or more of the dimensions might be variable length. One common reason for variable-length tensors is a program that deals with batches. Batches are usually all the same size, for example 64 examples each time, except at the end of the input file: there may not be 64 examples left to fill the batch, so you might create a tensor that contains only, say, 42 examples. That is why the batch dimension is usually left as None, as in the placeholder below.

In [5]:
X = tf.compat.v1.placeholder(dtype=tf.float32, shape=(None,64,64,3), name='X') # None = variable batch size; 64x64 color images (3 RGB channels)
print('X = {}'.format(X))

X = Tensor("X:0", shape=(None, 64, 64, 3), dtype=float32)
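The same graph accepts a full batch of 64 and a final, partial batch of 42. A minimal sketch, assuming TensorFlow 2.x in `tf.compat.v1` graph mode and using NumPy zeros as stand-ins for real images, that contrasts the static shape (where the batch dimension is None) with the dynamic shape from `tf.shape()`, which is only known once data is fed:

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    # None = batch dimension whose size is only known when data is fed
    X = tf.compat.v1.placeholder(dtype=tf.float32, shape=(None, 64, 64, 3), name='X')
    batch_size = tf.shape(X)[0]  # dynamic shape, evaluated at run time

print(X.shape.as_list())  # static shape: [None, 64, 64, 3]

with tf.compat.v1.Session(graph=graph) as sess:
    full_batch = sess.run(batch_size, feed_dict={X: np.zeros((64, 64, 64, 3), np.float32)})
    last_batch = sess.run(batch_size, feed_dict={X: np.zeros((42, 64, 64, 3), np.float32)})
print(full_batch, last_batch)  # 64 42
```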


# Solutions To Shape Problems:

#### Solution 1: tf.reshape()

In [6]:
X = tf.constant(
    value=
    [[1,2,3],
    [4,5,6]],
    dtype=tf.int32,
    shape=(2,3),
    name='X')
print('X = {}'.format(X))

X = Tensor("X_1:0", shape=(2, 3), dtype=int32)


In [7]:
reshaped_X = tf.reshape(X, shape=(3,2),name='reshaped_X')
print('reshaped_X = {}'.format(reshaped_X))

reshaped_X = Tensor("reshaped_X:0", shape=(3, 2), dtype=int32)


In [8]:
with tf.compat.v1.Session() as sess:
    print('X = {}'.format(X.eval()))
    print('reshaped_X = {}'.format(reshaped_X.eval()))

X = [[1 2 3]
 [4 5 6]]
reshaped_X = [[1 2]
 [3 4]
 [5 6]]
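`tf.reshape` also accepts -1 for one dimension, meaning "infer this size from the total number of elements". That is handy when the batch dimension is variable, as in the previous section. A sketch assuming a hypothetical placeholder of shape (None, 2, 3) that is flattened to (batch, 6), in the same self-contained graph-mode style:

```python
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    X = tf.compat.v1.placeholder(tf.int32, shape=(None, 2, 3), name='X')
    # -1 lets TensorFlow infer the batch size; the other dimension is fixed at 6
    flat_X = tf.reshape(X, shape=(-1, 6), name='flat_X')

print(flat_X.shape.as_list())  # [None, 6]

with tf.compat.v1.Session(graph=graph) as sess:
    batch = np.arange(24, dtype=np.int32).reshape(4, 2, 3)  # 4 examples of shape (2, 3)
    flat_value = sess.run(flat_X, feed_dict={X: batch})
print(flat_value.shape)  # (4, 6)
print(flat_value[1])     # [ 6  7  8  9 10 11]
```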


#### Solution 2: tf.expand_dims()

tf.expand_dims changes the shape of a tensor by inserting a dimension of size 1. Say we have a 3 by 2 matrix X. When we call tf.expand_dims on X, we specify the axis at which to insert the new dimension. Axes are zero-based, so axis=1 means the second position (axis=0 would be the first), and the shape changes from (3, 2) to (3, 1, 2).

In [9]:
X = tf.constant([
    [3,2],
    [4,5],
    [6,7]
])
print('X = {}'.format(X))

X = Tensor("Const_3:0", shape=(3, 2), dtype=int32)


In [10]:
expanded_dims_1 = tf.expand_dims(X,axis=0)
print('expanded_dims_1 = {}'.format(expanded_dims_1))

expanded_dims_1 = Tensor("ExpandDims:0", shape=(1, 3, 2), dtype=int32)


In [11]:
expanded_dims_2 = tf.expand_dims(X,axis=1)
print('expanded_dims_2 = {}'.format(expanded_dims_2))

expanded_dims_2 = Tensor("ExpandDims_1:0", shape=(3, 1, 2), dtype=int32)


In [12]:
expanded_dims_3 = tf.expand_dims(X,axis=2)
print('expanded_dims_3 = {}'.format(expanded_dims_3))

expanded_dims_3 = Tensor("ExpandDims_2:0", shape=(3, 2, 1), dtype=int32)


In [13]:
with tf.compat.v1.Session() as sess:
    print('X = {}'.format(X.eval()))
    print('expanded_dims_1 = {}'.format(expanded_dims_1.eval()))
    print('expanded_dims_2 = {}'.format(expanded_dims_2.eval()))
    print('expanded_dims_3 = {}'.format(expanded_dims_3.eval()))

X = [[3 2]
 [4 5]
 [6 7]]
expanded_dims_1 = [[[3 2]
  [4 5]
  [6 7]]]
expanded_dims_2 = [[[3 2]]

 [[4 5]]

 [[6 7]]]
expanded_dims_3 = [[[3]
  [2]]

 [[4]
  [5]]

 [[6]
  [7]]]
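A common reason to add a dimension is to make broadcasting line up. A minimal sketch, assuming TensorFlow 2.x graph mode and a made-up vector `v`, that also uses a negative axis (counted from the end, so `axis=-1` appends the new dimension last):

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    v = tf.constant([1, 2, 3])
    as_column = tf.expand_dims(v, axis=-1)  # shape (3, 1): new last axis
    as_row = tf.expand_dims(v, axis=0)      # shape (1, 3): new first axis
    table = as_column + as_row              # (3, 1) + (1, 3) broadcasts to (3, 3)

with tf.compat.v1.Session(graph=graph) as sess:
    table_value = sess.run(table)
print(table_value)
# [[2 3 4]
#  [3 4 5]
#  [4 5 6]]
```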


#### Solution 3: tf.slice()

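tf.slice(input_, begin, size) extracts a slice from a tensor: `begin` gives the zero-based starting index in each dimension, and `size` gives how many elements to take in each dimension.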

###### Example 1:

In [14]:
X = tf.constant([
    [3,2],
    [4,5],
    [6,7]
])
print('X = {}'.format(X))

X = Tensor("Const_4:0", shape=(3, 2), dtype=int32)


In [15]:
slice_1 = tf.slice(input_=X,begin=[0,1],size=[2,1])
print('slice_1 = {}'.format(slice_1))

slice_1 = Tensor("Slice:0", shape=(2, 1), dtype=int32)


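`begin=[0,1]` means start at row 0, column 1 (indices are zero-based, so this is the element 2), and `size=[2,1]` means take 2 rows and 1 column from that position. The same values can be obtained with Python slicing syntax, as in the next cell; note, however, that indexing the column with a single integer (`X[0:2, 1]`) drops that dimension, giving shape (2,) instead of (2, 1).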

In [16]:
alt_slice_1 = X[0:2, 1]
print('alt_slice_1 = {}'.format(alt_slice_1))

alt_slice_1 = Tensor("strided_slice_6:0", shape=(2,), dtype=int32)


In [17]:
slice_2 = tf.slice(input_=X,begin=[0,1],size=[3,1])
print('slice_2 = {}'.format(slice_2))

slice_2 = Tensor("Slice_1:0", shape=(3, 1), dtype=int32)


In [18]:
with tf.compat.v1.Session() as sess:
    print('X = {}'.format(X.eval()))
    print('slice_1 = {}'.format(slice_1.eval()))
    print('alt_slice_1 = {}'.format(alt_slice_1.eval()))
    print('slice_2 = {}'.format(slice_2.eval()))

X = [[3 2]
 [4 5]
 [6 7]]
slice_1 = [[2]
 [5]]
alt_slice_1 = [2 5]
slice_2 = [[2]
 [5]
 [7]]
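In `size`, a value of -1 means "all remaining elements along this dimension", so you do not need to know how many rows are left. A sketch assuming TensorFlow 2.x graph mode and reusing the 3 by 2 matrix from Example 1; it reproduces slice_2 without hard-coding the 3:

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    X = tf.constant([
        [3, 2],
        [4, 5],
        [6, 7]
    ])
    # begin at row 0, column 1; take every remaining row (-1) and 1 column
    rest_of_column = tf.slice(X, begin=[0, 1], size=[-1, 1])
    # equivalent Python-style slice; 1:2 (not 1) keeps the column dimension
    alt_rest_of_column = X[0:, 1:2]

with tf.compat.v1.Session(graph=graph) as sess:
    sliced, alt_sliced = sess.run([rest_of_column, alt_rest_of_column])
print(sliced)  # column 1 of X, shape (3, 1)
```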


###### Example 2:

In [19]:
X = tf.constant([
    [3,2,5,6],
    [4,5,9,8],
    [6,7,1,2]
])
print('X = {}'.format(X))

X = Tensor("Const_5:0", shape=(3, 4), dtype=int32)


In [20]:
slice_1 = tf.slice(input_=X,begin=[0,1],size=[2,2])
print('slice_1 = {}'.format(slice_1))

slice_1 = Tensor("Slice_2:0", shape=(2, 2), dtype=int32)


In [21]:
slice_2 = tf.slice(input_=X,begin=[0,1],size=[3,3])
print('slice_2 = {}'.format(slice_2))

slice_2 = Tensor("Slice_3:0", shape=(3, 3), dtype=int32)


In [22]:
with tf.compat.v1.Session() as sess:
    print('X = {}'.format(X.eval()))
    print('slice_1 = {}'.format(slice_1.eval()))
    print('slice_2 = {}'.format(slice_2.eval()))

X = [[3 2 5 6]
 [4 5 9 8]
 [6 7 1 2]]
slice_1 = [[2 5]
 [5 9]]
slice_2 = [[2 5 6]
 [5 9 8]
 [7 1 2]]


#### Solution 4: tf.squeeze()

Squeeze is the inverse operation to expand dims. Expand dims lets you insert a dimension of size one anywhere within the tensor. Squeeze lets you remove dimensions of size one from the shape of a tensor.

In [23]:
X = tf.constant([
    [[1],[2],[3],[4]],
    [[5],[6],[7],[8]]
])
print('X = {}'.format(X))

X = Tensor("Const_6:0", shape=(2, 4, 1), dtype=int32)


In [24]:
squeezed_1 = tf.squeeze(input=X,axis=2)
print('squeezed_1 = {}'.format(squeezed_1))

squeezed_1 = Tensor("Squeeze:0", shape=(2, 4), dtype=int32)


In [25]:
squeezed_2 = tf.squeeze(input=X) # no axis: removes every dimension of size 1
print('squeezed_2 = {}'.format(squeezed_2))

squeezed_2 = Tensor("Squeeze_1:0", shape=(2, 4), dtype=int32)


In [26]:
with tf.compat.v1.Session() as sess:
    print('X = {}'.format(X.eval()))
    print('squeezed_1 = {}'.format(squeezed_1.eval()))

X = [[[1]
  [2]
  [3]
  [4]]

 [[5]
  [6]
  [7]
  [8]]]
squeezed_1 = [[1 2 3 4]
 [5 6 7 8]]
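Because tf.squeeze without an axis removes every size-1 dimension, it can also remove the batch dimension when a batch happens to contain a single example, which ties back to the batch-size problem above. A sketch assuming TensorFlow 2.x graph mode and a made-up batch of one example with shape (1, 3, 1):

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    one_example = tf.constant([[[1], [2], [3]]])      # shape (1, 3, 1): batch of 1
    squeeze_all = tf.squeeze(one_example)             # every size-1 axis removed
    squeeze_last = tf.squeeze(one_example, axis=-1)   # only the last axis removed

print(squeeze_all.shape.as_list())   # [3]: the batch dimension is gone too
print(squeeze_last.shape.as_list())  # [1, 3]: batch dimension preserved
```

Passing `axis` explicitly is the safer choice when the batch size can vary.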


# Data Type Problem:

In [27]:
def add_tensor(a,b):
    return tf.add(a,b)

In [28]:
with tf.compat.v1.Session() as sess:
    fake_data_a = tf.constant([
        [5.0, 3.0, 7.1],
        [2.3, 4.1, 4.8]
    ])
    fake_data_b = tf.constant([
        [2, 4, 5],
        [2, 8, 7]
    ])
    result = add_tensor(fake_data_a, fake_data_b)
    print('Result = {}'.format(result))

TypeError: Input 'y' of 'Add' Op has type int32 that does not match type float32 of argument 'x'.

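###### Fix The Problem:

TensorFlow does not convert between data types implicitly. fake_data_a is float32 (inferred from the float literals) while fake_data_b is int32 (inferred from the integer literals), so tf.add rejects them. Cast both inputs to a common type with tf.cast() before adding them: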

In [29]:
def add_tensor_solved(a,b):
    a = tf.cast(a, tf.float32)
    b = tf.cast(b, tf.float32)
    return tf.add(a,b)

In [30]:
with tf.compat.v1.Session() as sess:
    fake_data_a = tf.constant([
        [5.0, 3.0, 7.1],
        [2.3, 4.1, 4.8]
    ])
    fake_data_b = tf.constant([
        [2, 4, 5],
        [2, 8, 7]
    ])
    result = add_tensor_solved(fake_data_a, fake_data_b)
    print('Result = {}'.format(result.eval()))

Result = [[ 7.   7.  12.1]
 [ 4.3 12.1 11.8]]
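Two related points, shown in a sketch that assumes TensorFlow 2.x graph mode: you can avoid the mismatch by giving tf.constant a dtype when the data is created, and tf.cast from a float to an integer type truncates toward zero rather than rounding, so casting in that direction can silently lose information.

```python
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant([[5.0, 3.0, 7.1], [2.3, 4.1, 4.8]])
    # declare the dtype up front instead of casting later
    b = tf.constant([[2, 4, 5], [2, 8, 7]], dtype=tf.float32)
    total = tf.add(a, b)
    # float -> int cast drops the fractional part (truncates toward zero)
    truncated = tf.cast(tf.constant([1.9, -1.9]), tf.int32)

with tf.compat.v1.Session(graph=graph) as sess:
    total_value, truncated_value = sess.run([total, truncated])
print(truncated_value)  # [ 1 -1]
```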


# Debugging A Full-Blown TensorFlow Program:

In the previous few lessons, we talked about how you can debug a TensorFlow program by looking at the error message, isolating the method in question, feeding it fake data, and then fixing the error once we understand what's going on. Sometimes, though, the problems are more subtle: they only occur under specific conditions. You may not be able to tell why things work for five, six, seven batches, then all of a sudden you get an error, and then things go back to normal. In other words, the errors are associated with some specific input value or condition of the execution system. At that point, you need to debug the full-blown program, and there are four methods to do this.

### Method 1: tf.Print()

tf.Print() logs the values of tensors whenever it is evaluated, so combined with a condition it can print them only when specific conditions are met. In TensorFlow 2.x it is available as `tf.compat.v1.Print()`; the newer `tf.print()` returns a print operation (or prints immediately in eager mode) rather than a tensor that passes its input through.

Perhaps you're dividing a by b and getting NaN (not a number) in the output, and you want to figure out the values of a and b that cause the problem. If you print a, you only get the debug output of the tensor, not its value: with lazy execution, remember, you have to evaluate a tensor to get its value. And you don't want to print the value of a every single time. The idea here is that `print_ab` is a tensor that wraps s and prints out a and b when it is evaluated. I then replace s in the graph by print_ab only for those batches where s contains bad values. Note that dividing a nonzero number by zero actually gives inf; only 0/0 gives NaN. Example:

In [36]:
%%writefile debugger.py
import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # the script runs in its own process, so enable graph mode here too

def divide_tensor(a, b):
    a = tf.cast(a, tf.float32)
    b = tf.cast(b, tf.float32)
    s = tf.divide(a, b)  # OOPS: x / 0 gives inf (and 0 / 0 gives NaN)
    has_bad_values = tf.reduce_any(tf.logical_not(tf.math.is_finite(s)))
    # tf.compat.v1.Print returns s unchanged and logs a and b (to stderr) when evaluated.
    # tf.cond only runs the branch it takes, so a and b are logged only for bad batches.
    return tf.cond(has_bad_values,
                   lambda: tf.compat.v1.Print(s, [a, b], message='a, b = '),
                   lambda: s)

with tf.compat.v1.Session() as sess:
    fake_a = tf.constant([
        [5.0, 3.0, 7.1],
        [2.3, 4.1, 4.8],
    ])
    fake_b = tf.constant([
        [2, 0, 5],
        [2, 8, 7],
    ])
    result = divide_tensor(fake_a, fake_b)
    print('Result = {}'.format(sess.run(result)))  # evaluate the tensor to get its value

Overwriting debugger.py


In [37]:
%%bash
python debugger.py

This has to be done in a standalone program, because tf.Print writes to the standard error of the process that runs the graph, and in a Jupyter notebook that output goes to the notebook server's console rather than into the notebook. Hence my workaround of writing the code to a file (debugger.py) and then running it. In practice, you use tf.Print in running TensorFlow programs to diagnose rare errors, and you make sure to capture its output in the logs.

### Method 2: tfdbg (tf_debug)

tfdbg, imported as `from tensorflow.python import debug as tf_debug`, is TensorFlow's interactive debugger. You run it from a terminal and attach it to a local or remote TensorFlow session.

You run the TensorFlow program from a terminal as a standalone program and wrap its session in a debug wrapper session. A common convention is for the program to accept a `--debug` flag (e.g. `python debugger.py --debug`) and to wrap the session only when that flag is present; TensorFlow does not parse the flag for you. This is also helpful for debugging remotely running TensorFlow programs, and there are special debug hooks for Experiment and Estimator programs. Once the program starts, you can use the debugger to step through the execution, set breakpoints, inspect intermediate tensors, etc. If you've ever used an interactive debugger for another language or environment, the terminology (steps, breakpoints, etc.) should be familiar.

In [41]:
import tensorflow as tf
from tensorflow.python import debug as tf_debug

def some_method(a, b):
  b = tf.cast(b, tf.float32)
  s = (a / b)
  s2 = tf.matmul(s, tf.transpose(s))
  return tf.sqrt(s2)

with tf.compat.v1.Session() as sess:
  fake_a = [
      [5.0, 3.0, 7.1],
      [2.3, 4.1, 4.8],
    ]
  fake_b = [
      [2, 0, 5],
      [2, 8, 7]
    ]
  a = tf.compat.v1.placeholder(tf.float32, shape=[2, 3])
  b = tf.compat.v1.placeholder(tf.int32, shape=[2, 3])
  k = some_method(a, b)
  
  # Note: this won't work without the ui_type="readline" argument, because a notebook
  # (Datalab/Jupyter) is not an interactive terminal and doesn't support the default "curses" ui_type.
  # If you are running this as a standalone program, omit the ui_type parameter and
  # wrap the session only when the program is invoked with a --debug flag
  # (e.g. python debugger.py --debug).
  sess = tf_debug.LocalCLIDebugWrapperSession(sess, ui_type="readline")
  sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
  print(sess.run(k, feed_dict = {a: fake_a, b: fake_b}))

run-start: run #1: 1 fetch (Sqrt_1:0); 2 feeds

TTTTTT FFFF DDD  BBBB   GGG 
  TT   F    D  D B   B G    
  TT   FFF  D  D BBBB  G  GG
  TT   F    D  D B   B G   G
  TT   F    DDD  BBBB   GGG 

TensorFlow version: 2.0.0

Session.run() call #1:

Fetch(es):
  Sqrt_1:0

Feed dict:
  Placeholder:0
  Placeholder_1:0

Select one of the following commands to proceed ---->
  run:
    Execute the run() call with debug tensor-watching
  run -n:
    Execute the run() call without debug tensor-watching
  run -t <T>:
    Execute run() calls (T - 1) times without debugging, then execute run() once more with debugging and drop back to the CLI
  run -f <filter_name>:
    Keep executing run() calls until a dumped tensor passes a given, registered filter (conditional breakpoint mode)
    Registered filter(s):
        * has_inf_or_nan

For more details, see help..


tfdbg> run
run-end: run #1: 1 fetch (Sqrt_1:0); 2 feeds
6 dumped tensor(s):

t (ms)    Size (B) Op type    Tensor name
[0.000]   232      _A
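The `--debug` flag mentioned above is not handled by TensorFlow itself: the program has to check for it and wrap its session. A minimal sketch of that pattern, assuming argparse for flag handling; `maybe_wrap_with_debugger` is a hypothetical helper name, and the filter registration mirrors the cell above:

```python
import argparse
import tensorflow as tf
from tensorflow.python import debug as tf_debug


def maybe_wrap_with_debugger(sess, argv=None):
    """Return sess wrapped in the tfdbg CLI if --debug is present, else sess itself."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--debug', action='store_true',
                        help='run Session.run() calls under the tfdbg CLI')
    args, _ = parser.parse_known_args(argv)  # ignore flags meant for other code
    if not args.debug:
        return sess
    debug_sess = tf_debug.LocalCLIDebugWrapperSession(sess)
    # conditional breakpoint: `run -f has_inf_or_nan` stops at the first inf/NaN tensor
    debug_sess.add_tensor_filter('has_inf_or_nan', tf_debug.has_inf_or_nan)
    return debug_sess


graph = tf.Graph()
with graph.as_default():
    x = tf.constant([1.0, 0.0])
    y = 1.0 / x  # the second element becomes inf

sess = tf.compat.v1.Session(graph=graph)
# argv=[] simulates running without --debug; pass argv=None to read sys.argv instead
sess = maybe_wrap_with_debugger(sess, argv=[])
result = sess.run(y)
print(result)
sess.close()
```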

### Method 3: TensorBoard

TensorBoard is a visual monitoring tool. We talked about it as a way to look at the DAG (Directed Acyclic Graph), but there are more kinds of troubleshooting you can do with TensorBoard: you can look at evaluation metrics, look for overfitting, find layers that are dead, etc. In other words, higher-level debugging of neural networks. We look at TensorBoard in a future chapter of this course, but for now I just wanted to drop in a placeholder so you keep in mind that TensorBoard is a powerful troubleshooting tool.
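As a small preview, a sketch of getting a graph into TensorBoard, assuming TensorFlow 2.x in graph mode: `tf.compat.v1.summary.FileWriter` writes an event file that `tensorboard --logdir <dir>` can display. The temporary directory and the tiny graph are illustrative only.

```python
import os
import tempfile
import tensorflow as tf

logdir = tempfile.mkdtemp()  # illustrative location; use a real log directory in practice

graph = tf.Graph()
with graph.as_default():
    a = tf.constant([[5.0, 3.0], [2.3, 4.1]], name='a')
    b = tf.constant([[2.0, 1.0], [2.0, 8.0]], name='b')
    s = tf.divide(a, b, name='s')
    # FileWriter refuses to run eagerly, so create it inside the graph context
    writer = tf.compat.v1.summary.FileWriter(logdir, graph=graph)
    writer.close()

print(os.listdir(logdir))  # one events.out.tfevents.* file
# Then, from a terminal: tensorboard --logdir <logdir>
```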

### Method 4: Change The Log Level To Info

The default logging level for TensorFlow programs is WARN, so they run fairly quietly. Change the log level to INFO to see many more log messages as TensorFlow trains. You can change the log level by using tf.logging (tf.compat.v1.logging in TensorFlow 2.x) and setting the verbosity level. The levels are DEBUG, INFO, WARN, ERROR, and FATAL, in that order: DEBUG is the most verbose and FATAL is the quietest. INFO is what I tend to use in development, and WARN is what I tend to use in production. Of course, you can set up a command-line parameter to switch from one to the other.

In [None]:
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.INFO)  # tf.logging was removed in TF 2.x
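The command-line switch mentioned above could look like the following sketch. It assumes argparse, and the `--log-level` flag and `configure_tf_logging` helper are hypothetical names. It only affects TensorFlow's Python-side logger; messages from the C++ runtime are filtered separately through the `TF_CPP_MIN_LOG_LEVEL` environment variable.

```python
import argparse
import tensorflow as tf

# map hypothetical --log-level values onto the tf.compat.v1.logging constants
LOG_LEVELS = {
    'debug': tf.compat.v1.logging.DEBUG,
    'info': tf.compat.v1.logging.INFO,
    'warn': tf.compat.v1.logging.WARN,
    'error': tf.compat.v1.logging.ERROR,
    'fatal': tf.compat.v1.logging.FATAL,
}


def configure_tf_logging(argv=None):
    """Set TensorFlow's Python log verbosity from --log-level (default: warn)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--log-level', choices=sorted(LOG_LEVELS), default='warn')
    args, _ = parser.parse_known_args(argv)  # ignore unrelated flags
    tf.compat.v1.logging.set_verbosity(LOG_LEVELS[args.log_level])
    return tf.compat.v1.logging.get_verbosity()


# development: verbose; in production, omit the flag to get the WARN default
level = configure_tf_logging(['--log-level', 'info'])
print(level == tf.compat.v1.logging.INFO)  # True
```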