## Tensors

* constant tensor `tf.constant()`
    * Value of tensor never changes, hence *constant*.
    * `tf.constant(1234)` is a 0-dimensional int32 tensor
    * `tf.constant([1,2,3,4])` is a 1-dimensional int32 tensor with 4 elements (shape `(4,)`)
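
The rank of a tensor is its number of dimensions, which is distinct from how many elements it holds. A minimal pure-Python sketch of that distinction, assuming nested lists stand in for tensors; `shape_of` is a hypothetical helper, not a TensorFlow function:

```python
def shape_of(value):
    """Return the shape of a scalar or a regularly nested, non-empty list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


scalar_shape = shape_of(1234)               # no dimensions: rank 0
vector_shape = shape_of([1, 2, 3, 4])       # one dimension holding four elements
matrix_shape = shape_of([[1, 2], [3, 4]])   # two dimensions

for shape in (scalar_shape, vector_shape, matrix_shape):
    print("shape:", shape, "rank:", len(shape))
```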

Sample code:

In [1]:
import tensorflow as tf

# Create TensorFlow object called tensor
hello_constant = tf.constant('Hello World!')

## Session
* An environment for running a graph. In charge of allocating the operations to GPU(s) and/or CPU(s).

Continuing our example:

In [3]:
with tf.Session() as sess:
    output = sess.run(hello_constant)
    print(output)

b'Hello World!'


## Input

* `tf.placeholder()`: returns a tensor that gets its value from data passed to the `tf.Session.run()` function, allowing you to set the input right before the session runs.
* `feed_dict`: use the `feed_dict` parameter of `tf.Session.run()` to set the value of each placeholder tensor.

Example: 


In [4]:
a = tf.placeholder(tf.string)
b = tf.placeholder(tf.int32)
c = tf.placeholder(tf.float32)
with tf.Session() as sess:
    output = sess.run(a, feed_dict={a: 'hi', b: 23, c: 32.0})
    print(output)

hi


It also works if you feed only `{a: 'hi'}`: only the placeholders that the requested tensor depends on need a value.


## Maths

In [9]:
# Add, subtract, multiply and divide operations
add = tf.add(5, 2) # 7
sub = tf.sub(10, 4) # 6
mul = tf.mul(2, 5)  # 10
div = tf.div(10, 5) # 2

with tf.Session() as sess:
    output = [sess.run(add), sess.run(sub), sess.run(mul), 
              sess.run(div)]
    print(output)

[7, 6, 10, 2]


[TF Math documentation](https://www.tensorflow.org/versions/r0.11/api_docs/python/math_ops.html)

## Variables

* The `tf.Variable()` function creates a tensor with an initial value that can be modified later, much like a normal Python variable. This tensor stores its state in the session, so you must use the `tf.initialize_all_variables()` function to initialize the state of the tensor.


In [14]:
# Initialisation

def variables():
    output = None
    
    x = tf.Variable([1, 2, 3, 4])
    
    # Initialise all variables
    init = tf.initialize_all_variables()
    
    with tf.Session() as sess:
        sess.run(init)
        output = sess.run(x)
    
    return output

variables()

array([1, 2, 3, 4], dtype=int32)

In [13]:
# Logistic Regression

def logits():
    output = None
    x_data = [[1.0, 2.0], [2.5, 6.3]]
    test_weights = [[-0.3545495, -0.17928936], [-0.63093454, 0.74906588]]
    class_size = 2
    
    
    x = tf.placeholder(tf.float32)
    weights = tf.Variable(test_weights)
    biases = tf.Variable(tf.zeros([class_size]))
    
    # Implement Wx + b in TensorFlow (biases are zeros here, so the output is unchanged)
    logits = tf.add(tf.matmul(weights, x), biases)
    
    init = tf.initialize_all_variables()
    with tf.Session() as sess:
        sess.run(init)
        output = sess.run(logits, feed_dict={x: x_data})
        
    return output

logits()

array([[-0.80277288, -1.83862185],
       [ 1.24173021,  3.4572463 ]], dtype=float32)
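
The output above can be checked by hand. The following is a pure-Python sketch of the same `Wx + b` computation, assuming the zero biases from the cell; `matmul` here is a small hypothetical helper, not the TensorFlow op:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


x_data = [[1.0, 2.0], [2.5, 6.3]]
test_weights = [[-0.3545495, -0.17928936], [-0.63093454, 0.74906588]]
biases = [0.0, 0.0]  # stands in for tf.zeros([class_size]) with class_size = 2

# Weights times x, then add the bias to every row (broadcast over the last axis)
product = matmul(test_weights, x_data)
logits = [[value + b for value, b in zip(row, biases)] for row in product]

for row in logits:
    print(row)
```

The printed rows should agree with the TensorFlow output up to float32 rounding.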

## Softmax

Turns logits into probabilities that sum to 1.
* `tf.nn.softmax()`.

Example of how it works:

```
# logits is a one-dimensional array with 3 elements
logits = [1.0, 2.0, 3.0]
# softmax will return a one-dimensional array with 3 elements
print(softmax(logits))

[ 0.09003057  0.24472847  0.66524096]

# logits is a two-dimensional array
logits = np.array([
    [1, 2, 3, 6],
    [2, 4, 5, 6],
    [3, 8, 7, 6]])
# softmax will return a two-dimensional array with the same shape
print(softmax(logits))


[
    [ 0.09003057  0.00242826  0.01587624  0.33333333]
    [ 0.24472847  0.01794253  0.11731043  0.33333333]
    [ 0.66524096  0.97962921  0.86681333  0.33333333]
]
```

In [None]:
# Softmax function in raw Python

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    # TODO: Compute and return softmax(x)
    # S(y_i) = (e**(y_i) / sum_over_j(e**y_j))
    return np.exp(x) / np.sum(np.exp(x), axis=0)

That's some elegant Numpy code.

In [None]:
# Softmax with TF

import tensorflow as tf


def run():
    output = None
    logit_data = [2.0, 1.0, 0.1]
    logits = tf.placeholder(tf.float32)
    
    # ToDo: Calculate the softmax of the logits
    softmax = tf.nn.softmax(logits)    
    
    with tf.Session() as sess:
        # ToDo: Feed in the logits data
        output = sess.run(softmax, feed_dict={logits: logit_data})

    return output

Scaling and Softmax
* When you divide all the logits by e.g. 10, the probabilities get closer to the uniform distribution.
* When you multiply all the logits by e.g. 10, the probabilities get closer to 0.0 or 1.0.
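
The two scaling effects can be seen numerically. This is a minimal sketch using only the standard library and the logits `[2.0, 1.0, 0.1]` from the TensorFlow cell above; subtracting the maximum before exponentiating is a common numerical-stability step added here as an assumption, and it does not change the result:

```python
import math


def softmax(scores):
    """Softmax of a flat list of scores, shifted by the max for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]

original = softmax(logits)
divided = softmax([s / 10 for s in logits])     # flatter, closer to uniform
multiplied = softmax([s * 10 for s in logits])  # peaked, closer to 0.0 or 1.0

for name, probs in [("original", original),
                    ("divided by 10", divided),
                    ("multiplied by 10", multiplied)]:
    print(name, [round(p, 4) for p in probs])
```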

## One-Hot Encodings
* Vectors with one 1.0 and 0.0 everywhere else.
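
A short sketch of building such vectors from integer class labels; `one_hot` is a hypothetical helper written for illustration:

```python
def one_hot(label, n_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    if not 0 <= label < n_classes:
        raise ValueError("label must be in [0, n_classes)")
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


labels = [0, 2, 1]
encoded = [one_hot(label, n_classes=3) for label in labels]

for label, vector in zip(labels, encoded):
    print(label, vector)
```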


## ReLUs: f(x) = max(0,x)
*Adding nonlinearities*

A rectified linear unit (ReLU) is a type of **activation function** that is defined as `f(x) = max(0, x)`. The function returns 0 if `x` is negative, otherwise it returns `x`. TensorFlow provides the ReLU function as `tf.nn.relu()`, as shown below.

![](images/relu.png)

In [None]:
# Hidden Layer with ReLU activation function
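hidden_layer = tf.add(tf.matmul(features, hidden_weights), hidden_biases)
hidden_layer = tf.nn.relu(hidden_layer)

# Output layer (uses its own weights and biases, separate from the hidden layer)
output = tf.add(tf.matmul(hidden_layer, output_weights), output_biases)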

The above code applies the `tf.nn.relu()` function to the `hidden_layer`, setting any negative values to zero so that each unit acts like an on/off switch. Adding further layers, such as the output layer, after an activation function turns the model into a nonlinear function. This nonlinearity allows the network to solve more complex problems.
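
To see why the activation matters, here is a small sketch with scalar "layers" and hypothetical weights chosen for illustration. Without an activation, two stacked layers are still a single linear function; with a ReLU in between, they are not:

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


# Hypothetical scalar weights for two stacked layers
w1, w2 = 2.0, 3.0


def linear_stack(x):
    # Without an activation the layers collapse: w2 * (w1 * x) == (w2 * w1) * x
    return w2 * (w1 * x)


def relu_stack(x):
    # With ReLU in between, negative hidden values are zeroed before layer two
    return w2 * relu(w1 * x)


for x in (-1.0, 1.0):
    print(x, linear_stack(x), relu_stack(x))
```

A linear function satisfies `f(-x) == -f(x)`. The stacked linear layers do, and the ReLU version does not.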



It's interesting how you just add `hidden_layer=tf.nn.relu(hidden_layer)`.

In [None]:
def run():
    output = None
    hidden_layer_weights = [
        [0.1, 0.2, 0.4],
        [0.4, 0.6, 0.6],
        [0.5, 0.9, 0.1],
        [0.8, 0.2, 0.8]]
    out_weights = [
        [0.1, 0.6],
        [0.2, 0.1],
        [0.7, 0.9]]

    # Weights and biases
    weights = [
        tf.Variable(hidden_layer_weights),
        tf.Variable(out_weights)]
    biases = [
        tf.Variable(tf.zeros(3)),
        tf.Variable(tf.zeros(2))]

    # Input
    features = tf.Variable([[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0], [11.0, 12.0, 13.0, 14.0]])

    # Model
    hidden_layer = tf.matmul(features, weights[0]) + biases[0]
    # ToDo: Apply activation using a single Relu
    hidden_layer = tf.nn.relu(hidden_layer)
    logits = tf.matmul(hidden_layer, weights[1]) + biases[1]

    # Calculate logits
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        output = sess.run(logits)

    return output


## DNN in Tensorflow

In [1]:
import tensorflow as tf

### Import data

In [14]:
help(input_data.read_data_sets)

Help on function read_data_sets in module tensorflow.contrib.learn.python.learn.datasets.mnist:

read_data_sets(train_dir, fake_data=False, one_hot=False, dtype=tf.float32)



In [17]:
mnist

Datasets(train=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x11a43deb8>, validation=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x11a43df98>, test=<tensorflow.contrib.learn.python.learn.datasets.mnist.DataSet object at 0x11a43def0>)

In [22]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)
# Udacity version included reshape=False but this got the 
# 'unexpected keyword' error

Extracting ./train-images-idx3-ubyte.gz
Extracting ./train-labels-idx1-ubyte.gz
Extracting ./t10k-images-idx3-ubyte.gz
Extracting ./t10k-labels-idx1-ubyte.gz


### Learning parameters

In [4]:
# Learning Parameters
## Usually we have to find these.
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

### Hidden layer width

In [5]:
n_hidden_layer = 256 # layer number of features (width of a layer)

### Weights and biases

In [6]:
# Store layers weight & bias
weights = {
    'hidden_layer': tf.Variable(tf.random_normal([n_input, n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_hidden_layer, n_classes]))
}
biases = {
    'hidden_layer': tf.Variable(tf.random_normal([n_hidden_layer])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}
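
As a sanity check on the shapes above, the number of trainable parameters follows directly from the layer sizes. A small arithmetic sketch, assuming the single hidden layer defined in this section:

```python
n_input = 784         # 28 * 28 pixels
n_hidden_layer = 256  # width of the hidden layer
n_classes = 10        # digits 0-9

# Each fully connected layer has (inputs * outputs) weights plus one bias per output
hidden_params = n_input * n_hidden_layer + n_hidden_layer
out_params = n_hidden_layer * n_classes + n_classes
total_params = hidden_params + out_params

print(hidden_params, out_params, total_params)
```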

### Input

In [20]:
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
x_flat = x

In [27]:
tf.shape(x)

<tf.Tensor 'Shape_1:0' shape=(2,) dtype=int32>

In [28]:
x_flat2 = tf.reshape(x, [-1, n_input])

In [29]:
tf.shape(x_flat2)

<tf.Tensor 'Shape_2:0' shape=(2,) dtype=int32>
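
Note that `tf.shape()` returns a tensor holding the shape, so the `shape=(2,)` printed above describes the shape vector itself (two entries), not the data. The `-1` passed to `tf.reshape()` asks for that dimension to be inferred from the total number of elements. A pure-Python sketch of that inference, where `infer_shape` and the batch size of 100 are hypothetical:

```python
def infer_shape(total_elements, shape):
    """Resolve a single -1 entry in `shape` so the element count matches."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if shape.count(-1) > 1 or total_elements % known != 0:
        raise ValueError("shape cannot be inferred")
    return [total_elements // known if dim == -1 else dim for dim in shape]


n_input = 784
batch = 100  # hypothetical batch size

# 100 images of 28 x 28 x 1 values, flattened to rows of 784 values
print(infer_shape(batch * 28 * 28 * 1, [-1, n_input]))
```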

### Multilayer Perceptron

In [8]:
# Hidden layer with RELU activation
layer_1 = tf.add(tf.matmul(x_flat, weights['hidden_layer']), biases['hidden_layer'])
layer_1 = tf.nn.relu(layer_1)
# Output layer with linear activation
logits = tf.matmul(layer_1, weights['out']) + biases['out']

*So we're putting ReLUs between layers that have weights and biases, to allow for more complexity. Here we have two layers sandwiching a ReLU.*

In [10]:
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits, y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
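
The loss in the cell above is the cross-entropy between the softmax of the logits and the one-hot labels, averaged over the batch. A standard-library sketch of that calculation, using hypothetical logits and labels rather than MNIST data:

```python
import math


def softmax_cross_entropy(logits, one_hot_label):
    """Cross-entropy between softmax(logits) and a one-hot label, one example."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(t * lp for t, lp in zip(one_hot_label, log_probs))


# Hypothetical logits for a batch of two examples with three classes
batch_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
batch_labels = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

losses = [softmax_cross_entropy(z, t)
          for z, t in zip(batch_logits, batch_labels)]
cost = sum(losses) / len(losses)  # plays the role of tf.reduce_mean

print(losses, cost)
```

The first example puts the largest logit on the correct class and gets a small loss. The second puts the smallest logit on the correct class and gets a much larger one.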

In [21]:
# Initializing the variables
init = tf.initialize_all_variables()


# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})

InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
	 [[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'Placeholder', defined at:
  File "/Users/jessica/anaconda/lib/python3.5/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/jessica/anaconda/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/__main__.py", line 3, in <module>
    app.launch_new_instance()
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/traitlets/config/application.py", line 596, in launch_instance
    app.start()
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 442, in start
    ioloop.IOLoop.instance().start()
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 162, in start
    super(ZMQIOLoop, self).start()
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tornado/ioloop.py", line 883, in start
    handler_func(fd_obj, events)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
    handler(stream, idents, msg)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 391, in execute_request
    user_expressions, allow_stdin)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 199, in do_execute
    shell.run_cell(code, store_history=store_history, silent=silent)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2723, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2825, in run_ast_nodes
    if self.run_code(code, result):
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2885, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-faf4d67d540e>", line 2, in <module>
    x = tf.placeholder("float", [None, 28, 28, 1])
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 895, in placeholder
    name=name)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1238, in _placeholder
    name=name)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/op_def_library.py", line 704, in apply_op
    op_def=op_def)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2260, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/jessica/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1230, in __init__
    self._traceback = _extract_stack()

The traceback points to a placeholder created in an earlier cell, `x = tf.placeholder("float", [None, 28, 28, 1])`. The layers were presumably built from that earlier `x`, while `feed_dict` feeds the `x` that was redefined later, so the old placeholder never receives a value. Rebuilding the model after redefining `x` should avoid this.


In [23]:
'''
A Multilayer Perceptron implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
'''

from __future__ import print_function

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf

# Parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100
display_step = 1

# Network Parameters
n_hidden_1 = 256 # 1st layer number of features
n_hidden_2 = 256 # 2nd layer number of features
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


# Create model
def multilayer_perceptron(x, weights, biases):
    # Hidden layer with RELU activation
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
    # Hidden layer with RELU activation
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
    # Output layer with linear activation
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer

# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = multilayer_perceptron(x, weights, biases)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", \
                "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))


Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
Epoch: 0001 cost= 168.469008747
Epoch: 0002 cost= 43.013559784
Epoch: 0003 cost= 27.156405791
Epoch: 0004 cost= 18.927629952
Epoch: 0005 cost= 13.789107725
Epoch: 0006 cost= 10.074024224
Epoch: 0007 cost= 7.566845441
Epoch: 0008 cost= 5.788042164
Epoch: 0009 cost= 4.283909895
Epoch: 0010 cost= 3.292516946
Epoch: 0011 cost= 2.470400692
Epoch: 0012 cost= 1.912726223
Epoch: 0013 cost= 1.389478198
Epoch: 0014 cost= 1.238731926
Epoch: 0015 cost= 0.942797896
Optimization Finished!
Accuracy: 0.9479
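
The accuracy step compares the index of the largest output with the index of the 1 in the one-hot label, then averages the matches. A pure-Python sketch with hypothetical outputs and labels:

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical network outputs and one-hot labels for four examples
predictions = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9], [0.8, 0.7, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

correct = [argmax(p) == argmax(t) for p, t in zip(predictions, labels)]
accuracy = sum(correct) / len(correct)  # mean of the True/False matches

print(correct, accuracy)
```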


## Dropout

Dropout is a regularization technique for reducing overfitting. The technique temporarily drops units (artificial neurons) from the network, along with all their incoming and outgoing connections, as shown in Figure 1.
*Note: the units are dropped at random on each training step, not selected by whether dropping them improves results. All units are kept at test time.*

`tf.nn.dropout()`


In [None]:
keep_prob = tf.placeholder(tf.float32) # probability to keep units

hidden_layer = tf.add(tf.matmul(features, weights[0]), biases[0])
hidden_layer = tf.nn.relu(hidden_layer)
hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)

logits = tf.add(tf.matmul(hidden_layer, weights[1]), biases[1])

The first parameter, `hidden_layer`, is the tensor that is regularized using dropout.

The second parameter, `keep_prob`, is the probability of keeping (i.e. not dropping) any given unit.

`keep_prob` allows you to adjust the number of units to drop. In order to compensate for dropped units, `tf.nn.dropout()` multiplies all units that are kept (i.e. not dropped) by `1/keep_prob`.

During training, a good starting value for keep_prob is 0.5.

During testing, use a keep_prob value of 1.0 to keep all units and maximize the power of the model.
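
The scaling by `1/keep_prob` keeps the expected sum of the activations unchanged. The following is a standard-library sketch of that behaviour, not the TensorFlow implementation; the `dropout` helper and the fixed seed are assumptions made for illustration:

```python
import random


def dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob and scale the rest."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000

train_out = dropout(activations, keep_prob=0.5, rng=rng)  # training setting
test_out = dropout(activations, keep_prob=1.0, rng=rng)   # testing setting

kept = [v for v in train_out if v != 0.0]
print("fraction kept:", len(kept) / len(train_out))
print("mean after dropout:", sum(train_out) / len(train_out))
print("unchanged at keep_prob=1.0:", test_out == activations)
```

Roughly half of the units survive with `keep_prob=0.5`, each scaled to 2.0, so the mean stays close to the original 1.0.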

Added to the ReLU example:
* `hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)`
* the `feed_dict` argument in `output = sess.run(logits, feed_dict={keep_prob: 0.5})`

In [None]:
import tensorflow as tf


def run():
    output = None
    hidden_layer_weights = [
        [0.1, 0.2, 0.4],
        [0.4, 0.6, 0.6],
        [0.5, 0.9, 0.1],
        [0.8, 0.2, 0.8]]
    out_weights = [
        [0.1, 0.6],
        [0.2, 0.1],
        [0.7, 0.9]]

    # Weights and biases
    weights = [
        tf.Variable(hidden_layer_weights),
        tf.Variable(out_weights)]
    biases = [
        tf.Variable(tf.zeros(3)),
        tf.Variable(tf.zeros(2))]
        
    keep_prob = tf.placeholder(tf.float32)

    # Input
    features = tf.Variable([[0.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4], [11.0, 12.0, 13.0, 14.0]])

    # Model
    hidden_layer = tf.matmul(features, weights[0]) + biases[0]
    hidden_layer = tf.nn.relu(hidden_layer)
    # TODO: Add dropout
    hidden_layer = tf.nn.dropout(hidden_layer, keep_prob)
    
    logits = tf.matmul(hidden_layer, weights[1]) + biases[1]

    # Calculate logits
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        output = sess.run(logits, feed_dict={keep_prob: 0.5})

    return output


## Convolution layer
* `tf.nn.conv2d()`
    * Computes a 2-D convolution. TensorFlow uses a stride for each dimension of the input tensor: `[batch, input_height, input_width, input_channels]`.
* `tf.nn.bias_add()`
    * Adds a 1-d bias to the last dimension of a tensor.
    


You'll generally change only the `input_height` and `input_width` strides, leaving the `batch` and `input_channels` strides at 1. The height and width strides control how far the filter moves over the input at each step. The example code uses a stride of 2 with a 5x5 filter.



In [None]:
# Output depth
k_output = 64

# Image Properties
image_width = 10
image_height = 10
color_channels = 3

# Convolution filter
filter_size_width = 5
filter_size_height = 5

# Input/Image: [batch, height, width, channels]
input = tf.placeholder(
    tf.float32,
    shape=[None, image_height, image_width, color_channels])

# Weight and bias: the filter is [height, width, in_channels, out_channels]
weight = tf.Variable(tf.truncated_normal(
    [filter_size_height, filter_size_width, color_channels, k_output]))
bias = tf.Variable(tf.zeros(k_output))

# Apply Convolution
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
# Add bias
conv_layer = tf.nn.bias_add(conv_layer, bias)
# Apply activation function
conv_layer = tf.nn.relu(conv_layer)
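
The spatial size of the convolution output depends on the stride and the padding mode. This sketch applies the output-size rules TensorFlow documents for `'SAME'` and `'VALID'` padding to the values used above; `conv_output_size` is a hypothetical helper:

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Output length along one spatial dimension for a given padding mode."""
    if padding == 'SAME':
        return math.ceil(input_size / stride)
    if padding == 'VALID':
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


image_height, image_width = 10, 10
filter_size, stride, k_output = 5, 2, 64

same = [conv_output_size(s, filter_size, stride, 'SAME')
        for s in (image_height, image_width)]
valid = [conv_output_size(s, filter_size, stride, 'VALID')
         for s in (image_height, image_width)]

# Output shape per image is [height, width, k_output]
print("SAME: ", same + [k_output])
print("VALID:", valid + [k_output])
```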

## Max Pooling

`tf.nn.max_pool()`

The original figure illustrates max pooling with a 2x2 filter and a stride of 2: each of the four colored 2x2 regions is one application of the filter, which outputs the maximum value in that region.

* **Benefits of max pooling**: reduces the size of the input and allows the neural network to focus on only the most important elements.
* **Method**: Max pooling does this by only retaining the maximum value for each filtered area, and removing the remaining values.



In [None]:
conv_layer = tf.nn.conv2d(input, weight, strides=[1, 2, 2, 1], padding='SAME')
conv_layer = tf.nn.bias_add(conv_layer, bias)
conv_layer = tf.nn.relu(conv_layer)
# Apply Max Pooling
conv_layer = tf.nn.max_pool(
    conv_layer,
    ksize=[1, 2, 2, 1],
    strides=[1, 2, 2, 1],
    padding='SAME')

The tf.nn.max_pool() function performs max pooling with the ksize parameter as the size of the filter and the strides parameter as the length of the stride. 2x2 filters with a stride of 2x2 are common in practice.

The ksize and strides parameters are structured as 4-element lists, with each element corresponding to a dimension of the input tensor ([batch, height, width, channels]). For both ksize and strides, the batch and channel dimensions are typically set to 1.
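
A pure-Python sketch of 2x2 max pooling with a stride of 2 on a single 4x4 feature map. It ignores the batch and channel dimensions and applies no padding, which gives the same result as `'SAME'` when the input size is a multiple of the stride; `max_pool_2d` and the input values are hypothetical:

```python
def max_pool_2d(matrix, ksize=2, stride=2):
    """Max pooling over a 2-D list of numbers, without padding."""
    height, width = len(matrix), len(matrix[0])
    pooled = []
    for top in range(0, height - ksize + 1, stride):
        row = []
        for left in range(0, width - ksize + 1, stride):
            window = [matrix[top + i][left + j]
                      for i in range(ksize) for j in range(ksize)]
            row.append(max(window))  # keep only the largest value per window
        pooled.append(row)
    return pooled


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4]]

print(max_pool_2d(feature_map))
```

The 4x4 input becomes a 2x2 output, halving both spatial dimensions.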