<a href="https://colab.research.google.com/github/kilos11/Tensoflow-by-Ricardo-Solinzki/blob/main/Introducing_Neural_Networks_and_Deep_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
##From Neurons to Perceptrons##
'''
Neurons:
» A neuron receives one or more incoming signals and produces one outgoing signal.
» A neuron’s output can serve as the input of another neuron.
» Every neuron has a threshold, and the neuron won’t produce output until its
electricity exceeds the threshold.

Perceptron:
» A perceptron is a simple numerical model of a neuron.
» Like a neuron, a perceptron receives multiple inputs and produces one output. But
a perceptron’s inputs are provided as numeric values instead of electrical pulses.
Similarly, the perceptron’s threshold value is represented by a number. If the sum
of the inputs exceeds the threshold, the perceptron’s output will be 1. If the sum
of the signals falls below the threshold, the output will be 0.
For example, suppose that x0 is set to 0.5, x1 is set to 1.5, x2 is set to 2.5, and x3 is
set to -1.0. The sum of the signals is 3.5. If the perceptron’s threshold value is 3.0,
the perceptron will produce an output of 1. If the threshold value is 4.0, the
perceptron will produce an output of 0.'''
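The threshold example above can be checked with a few lines of plain Python. This is only a minimal sketch of the description in these notes; the function name perceptron and the variable signals are illustrative choices, not part of any library.

```python
def perceptron(inputs, threshold):
    """Return 1 if the sum of the inputs exceeds the threshold, else 0."""
    total = sum(inputs)
    return 1 if total > threshold else 0

# Input values from the example in the notes: x0, x1, x2, x3
signals = [0.5, 1.5, 2.5, -1.0]

print(sum(signals))              # 3.5
print(perceptron(signals, 3.0))  # 1, because 3.5 exceeds 3.0
print(perceptron(signals, 4.0))  # 0, because 3.5 falls below 4.0
```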

##Improving the Model##
'''
» Each incoming signal is assigned a weight that indicates its influence.
» Instead of a threshold value, a constant called a bias is added to the incoming
signals.
» The sum of weighted inputs is passed to an activation function that determines
the output.
These changes make neural networks suitable for machine learning. Modern
developers refer to the elements of these networks as nodes instead of perceptrons.

Weights:
To determine the total effect of the inputs, a node multiplies each input by its
weight and adds the products together. Then it compares the sum to its threshold.
If the sum is greater than the threshold, the node produces an output value equal
to 1. If not, the output value is 0.
Mathematically, weights are denoted as w_i, where w_i represents the weight of the
i-th input.
Weights play a vital role in machine learning because they enable an application
to update the neural network’s behavior. As an application performs training, it
updates the weights to improve the model.

Bias:
In machine learning, bias is a constant that is added to the weighted sum of the
inputs. The bias takes over the role of the threshold: it shifts the point at which
the node’s output switches on.
The bias receives a weight just like every other input. For this reason, it makes sense
to set the bias’s value to 1, which is why diagrams of the network show the bias as an
extra input node labeled +1.

Activation functions:
The unit step function is simple to understand, but it’s not practical for machine
learning. Computer scientists have devised many more suitable functions for producing a
perceptron’s output, and they’re called activation functions.
A node’s activation function accepts the weighted sum of the node’s inputs and
produces a single output value. In TensorFlow, an activation function accepts a
tensor of values and returns a tensor containing output values.
I like to divide these functions into two categories: rectifiers and classifiers. The
distinction is simple: If a node’s output identifies a category, set its activation
function to a classifier. Otherwise, set the node’s activation function to a rectifier.

Activation Functions:
Activation Function               Description
tf.nn.relu(input, name=None)      Returns the input value if positive, returns 0 otherwise
tf.nn.relu6(input, name=None)     Returns the input value if positive, up to a maximum of 6; returns 0 otherwise
tf.nn.crelu(input, name=None)     Returns a concatenated tensor that separates the positive and negative portions of the input
tf.nn.elu(input, name=None)       Returns the input value if positive, returns the exponential of the input minus one otherwise
tf.nn.sigmoid(input, name=None)   Returns 1/(1 + exp(-x))
tf.nn.tanh(input, name=None)      Returns tanh(x)
tf.nn.softsign(input, name=None)  Returns x/(abs(x) + 1)

Rectifier functions:
In an electrical circuit, a rectifier accepts an input signal and transmits an equal
output signal if the input is positive. If the input signal is negative, the rectifier
transmits an output of zero.
The rectified linear unit function, or ReLU, performs a similar operation. It returns
the input if it’s positive and returns 0 otherwise. Put another way, the ReLU
function returns the maximum of the input and 0.
In TensorFlow, applications can perform ReLU operations by calling tf.nn.relu.
The ELU in tf.nn.elu stands for Exponential Linear Unit. This activation function
returns the input value if it’s greater than zero. If the input is zero or less,
tf.nn.elu returns the exponential of the input minus one.
'''
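To make the improved model concrete, here is a small plain-Python sketch of a single node that multiplies inputs by weights, adds a bias, and passes the result to an activation function. The helper functions mirror the formulas listed in the notes for ReLU, ReLU6, ELU, sigmoid, and softsign; they are illustrative stand-ins, not the TensorFlow implementations, and the weights and bias are made-up values.

```python
import math

def relu(x):
    return max(x, 0.0)                       # input if positive, else 0

def relu6(x):
    return min(max(x, 0.0), 6.0)             # ReLU capped at 6

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0  # exp(x) - 1 for x <= 0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softsign(x):
    return x / (abs(x) + 1.0)

def node_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

inputs = [0.5, 1.5, 2.5, -1.0]
weights = [1.0, -2.0, 0.5, 1.0]   # hypothetical weights
bias = 0.25                       # hypothetical bias

# The weighted sum plus bias is -2.0 for these values
for fn in (relu, relu6, elu, sigmoid, softsign):
    print(fn.__name__, node_output(inputs, weights, bias, fn))
```

With a negative weighted sum, the rectifiers ReLU and ReLU6 return 0, while ELU, sigmoid, and softsign still return distinct nonzero values.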

##Layers and Deep Learning##
'''
Layers:
The columns of a neural network are referred to as layers, and for this reason,
neural networks are frequently called multilayer perceptrons (MLPs). Every
neural net has at least two layers.
The layers of a neural network have specific names.
The first layer, which provides input values, is called the input layer. The last layer, which provides output
values, is called the output layer. The layers between the input layer and output
layer are called hidden layers. Layers are numbered from left to right, starting
with 0.
A layer is considered dense or fully connected if each of its nodes is connected to
each node in the next layer.
Each node in a hidden layer is denoted h_x^y, where x identifies the number of the
layer and y identifies the index of the node in the layer, both counted from 0.
For example, h_2^1 identifies the second node in the third layer.
You can determine the output of each hidden node using the same methods
discussed in the “Improving the Model” section.

Deep learning:
As you add more hidden layers to a network, it becomes capable of more sophisticated
detection and classification. When an application uses a network with
multiple hidden layers, it’s making use of deep learning.
Deep learning has proven effective in many applications. Two famous examples
include Google’s AlphaGo program, which uses deep learning to beat professional
Go players, and Google’s 2012 demonstration of an application that recognized cat
videos on YouTube.
Adding hidden layers to a network has two drawbacks. First, each hidden layer
increases the amount of time needed to train the network. Second,
each new hidden layer increases the chances of overfitting, which I discuss in the “Tuning the
Neural Network” section.'''
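As a rough illustration of how values flow through dense layers, the sketch below pushes one input vector through a hidden layer and an output layer in plain Python. The network shape (2 inputs, 3 hidden nodes, 1 output), every weight and bias, and the helper name dense_layer are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
def relu(x):
    return max(x, 0.0)

def dense_layer(inputs, weights, biases, activation=None):
    """Compute a fully connected layer; weights[j] holds the incoming weights of node j."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node
x = [1.0, 2.0]
hidden_w = [[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]]
hidden_b = [0.0, -1.0, 0.5]
out_w = [[1.0, 2.0, -1.0]]
out_b = [0.5]

hidden = dense_layer(x, hidden_w, hidden_b, relu)   # hidden layer with ReLU
output = dense_layer(hidden, out_w, out_b)          # output layer, no activation

print(hidden)  # [0.0, 2.0, 0.5]
print(output)  # [4.0]
```

Adding another hidden layer would simply mean calling dense_layer once more between the two calls shown here.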

##Training with Backpropagation##
'''
The goal of training is to find the weights that bring y(x_i) as close as possible to
the observed data. Put another way, the goal is to minimize the difference between
y(x_i) and the observed data.
As discussed in Chapters 5 and 6, this difference is
called the loss, and one popular method of computing the loss is called the mean
squared error (MSE).
If you set y(x_i) equal to a simple line or polynomial, you can easily compute the loss
and pass its operation to an optimizer, such as the GradientDescentOptimizer.
Chapter 5 covers the different optimization algorithms and their corresponding
TensorFlow classes.
A neural network’s model is more complicated, so the loss isn’t as easy to compute.
But in 1974, Paul Werbos was the first person to optimize the weights of a
neural network using a method called backpropagation. Researchers have devised
other algorithms for training neural networks since then,
but because of its simplicity and speed, backpropagation remains the most popular method.
In essence, backpropagation extends the optimization algorithms from Chapter 5
to apply to neural networks. The general process involves six steps:
1. Initialize the network’s weights.
2. For the set of inputs x_i, compute y(x_i). This computation is called forward propagation.
3. For the set of inputs x_i, determine the loss.
4. For each weight, compute the partial derivative of the loss with respect to the weight.
5. Using the partial derivatives computed in Step 4, update each weight in the network.
6. Return to Step 2 and continue until the partial derivatives of the loss approach zero.
Backpropagation applies the chain rule to the partial derivatives of the loss, including
derivatives that involve sums of functions. In this manner, the algorithm determines the partial
derivative of the loss with respect to each weight in the network.
Thankfully, you don’t need to worry about partial derivatives or the chain rule
because TensorFlow performs backpropagation automatically. But you do need to
create the optimizer that backpropagation will employ to update the network’s
weights.'''
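The six steps can be traced by hand on a deliberately tiny network. The sketch below assumes a network with one input, one tanh hidden node, and one linear output, y = w2 * tanh(w1 * x), trained with plain gradient descent on the mean squared error. The training data, starting weights, learning rate, and stopping tolerance are all hypothetical, and a real application would let TensorFlow compute the derivatives.

```python
import math

# Hypothetical training data generated from y = 1.5 * tanh(0.8 * x)
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
targets = [1.5 * math.tanh(0.8 * x) for x in xs]

def forward(w1, w2, x):
    h = math.tanh(w1 * x)    # hidden node output
    return h, w2 * h         # hidden value and network output

def loss_and_gradients(w1, w2):
    n = len(xs)
    loss = grad_w1 = grad_w2 = 0.0
    for x, t in zip(xs, targets):
        h, y = forward(w1, w2, x)                  # Step 2: forward propagation
        error = y - t
        loss += error ** 2 / n                     # Step 3: mean squared error
        d_loss_d_y = 2.0 * error / n
        grad_w2 += d_loss_d_y * h                  # Step 4: dL/dw2
        grad_w1 += d_loss_d_y * w2 * (1.0 - h ** 2) * x  # Step 4: dL/dw1 via the chain rule
    return loss, grad_w1, grad_w2

w1, w2 = 0.1, 0.1                                  # Step 1: initialize the weights
learning_rate = 0.1
initial_loss, _, _ = loss_and_gradients(w1, w2)

for step in range(5000):
    loss, g1, g2 = loss_and_gradients(w1, w2)
    if abs(g1) < 1e-6 and abs(g2) < 1e-6:          # Step 6: stop when derivatives approach zero
        break
    w1 -= learning_rate * g1                       # Step 5: update each weight
    w2 -= learning_rate * g2

final_loss, _, _ = loss_and_gradients(w1, w2)
print('initial loss:', initial_loss)
print('final loss:  ', final_loss)
```

The line that computes grad_w1 is where the chain rule appears: the derivative of the loss with respect to the output is multiplied by the derivative of the output with respect to the hidden node, and then by the derivative of the hidden node with respect to w1.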

##Tuning the Neural Network##
'''
» Input standardization: Preprocesses input data to statistically resemble
training data
» Weight initialization: Obtains suitable values for initial weights
» Batch normalization: Processes data before the activation function to reduce
the likelihood of saturation
» Regularization: Reduces the likelihood of overfitting
Most developers agree that neural networks require some measure of tuning, but
few agree on the best procedure. Rather than take sides, I focus on explaining how
you can perform operations in TensorFlow applications.

Input standardization:
A machine learning application should be able to analyze data it has never seen.
But even if incoming data is completely new, it should have the same mean and
standard deviation as the application’s training data. This consistency ensures
that the application won’t be confused from one data set to the next.
For this reason, developers frequently transform input data to set the mean equal
to 0 and the standard deviation equal to 1. This operation is called standardization,
and TensorFlow’s tf.nn package provides two functions that assist with standardization: moments and batch_normalization.
moments returns a tuple containing the mean and variance of the elements in a
tensor’s axis. Its signature is given as follows:'''

# tf.nn.moments(x, axes, shift=None, name=None, keep_dims=False)

'''To compute the mean and variance, set x to the tensor to be analyzed and axes to an
array of integers that identify the tensor’s axes. If you set keep_dims to True, the
returned mean and variance will have the same dimensionality as the input tensor.
batch_normalization accepts a tensor’s mean and variance and standardizes the
tensor’s elements. Its signature is given as follows:'''

# tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)

'''The offset parameter adds a constant to each value in the tensor, and scale
multiplies each value by a constant. variance_epsilon identifies a value to be
added to the denominator to ensure that TensorFlow doesn’t divide by zero.
Applications frequently set offset to 0.0, scale to 1.0, and variance_epsilon to
0.0001.
For example, an application can call moments to obtain the mean and variance of
a tensor and then call batch_normalization to obtain a new tensor with standardized data.'''
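These notes do not include a TensorFlow listing for this step, so as a stand-in here is a plain-Python sketch of the arithmetic that moments and batch_normalization are described as performing. The two functions below are local imitations that reuse the TensorFlow names for readability; they operate on flat lists rather than tensors, and the sample data is hypothetical.

```python
import math

def moments(values):
    """Return the mean and variance (mean squared deviation) of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

def batch_normalization(values, mean, variance, offset=0.0, scale=1.0,
                        variance_epsilon=0.0001):
    """Standardize values: scale * (v - mean) / sqrt(variance + epsilon) + offset."""
    denom = math.sqrt(variance + variance_epsilon)
    return [scale * (v - mean) / denom + offset for v in values]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical input data
mean, variance = moments(data)
standardized = batch_normalization(data, mean, variance)
new_mean, new_variance = moments(standardized)

print(mean, variance)          # 5.0 4.0
print(new_mean, new_variance)  # close to 0 and close to 1
```

The new variance lands slightly below 1 because variance_epsilon is added to the denominator.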

#Regularization
'''
One of the most difficult tasks in machine learning involves finding the right
structure for a neural network. If you add too few nodes, your network will be too
simple to classify data accurately. This is called underfitting.
If you add too many nodes, your network will tailor itself specifically for your
training set and will be unsuitable for analyzing general data. This problem is
called overfitting, and it’s a serious issue in machine learning.
The process of updating a neural network (or other machine learning algorithm)
to analyze general data is called regularization. Researchers have devised many
methods for regularizing networks, and this section focuses on two:
» Dropout: Randomly removes nodes from the network
» L1/L2 regularization: Reduces weights by increasing the loss'''

#Dropout
'''
The dropout process randomly removes one or more nodes from a network. For
each node removed, dropout removes the node’s incoming and outgoing connections and their weights.
In TensorFlow, you can configure dropout for a neural network by adding a dropout layer.
Adding this layer involves calling the tf.nn.dropout function:'''

# tf.nn.dropout(x, keep_prob, noise_shape=None, seed=None, name=None)

'''In this function, x is the tensor containing values from the preceding layer, and
keep_prob is a scalar with the same type as x. The function returns a tensor with
the same size as x.
dropout sets each of its output values to 0 or 1/keep_prob times the corresponding input value.
More precisely, dropout sets an output value to 0 with a probability of 1-keep_prob and sets
the output value to 1/keep_prob times the input value with a probability of keep_prob.'''
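The rule described above (zero with probability 1-keep_prob, otherwise the input scaled by 1/keep_prob) is easy to imitate in plain Python. The function below is a local sketch that borrows the name dropout for readability; it works on a flat list, uses Python's random module rather than TensorFlow's random ops, and the sample values are hypothetical.

```python
import random

def dropout(values, keep_prob, seed=None):
    """Keep each value with probability keep_prob (scaled by 1/keep_prob), else output 0."""
    rng = random.Random(seed)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

layer_output = [1.0, 2.0, 3.0, 4.0]   # hypothetical values from a preceding layer
dropped = dropout(layer_output, keep_prob=0.5, seed=0)

# Every element is either 0.0 or twice the original value
print(dropped)
```

Scaling the surviving values by 1/keep_prob keeps the expected value of each output equal to its input, so later layers see signals of a similar overall magnitude.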

#L1/L2 regularization
'''
L1 and L2 regularization prevent overfitting by reducing the network’s weights.
Both methods increase the loss by a value that depends on two factors: the network’s weights and a constant denoted λ.
L1 regularization increases the loss by λ multiplied by the absolute value of the
weight to be updated. Therefore, when the algorithm updates the weight w0
through backpropagation, it adds a value to the loss equal to λ|w0|.
L2 regularization increases the loss by λ/2 multiplied by the square of the weight to
be updated. Therefore, when the algorithm updates w0, it adds λ|w0|^2/2 to the loss.
In both cases, the loss increases when the weights increase and decreases when
the weights decrease. Therefore, the regularization process tends to reduce nonessential
weights to zero, thereby simplifying the model and (hopefully) avoiding
overfitting.'''
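The two penalty terms can be computed directly from the formulas above. This plain-Python sketch sums the per-weight penalties over a small list of weights, which is an assumption about how the individual terms combine; the weights, λ, and the unregularized loss value are all hypothetical.

```python
def l1_penalty(weights, lam):
    """Sum of lam * |w| over all weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Sum of (lam / 2) * w^2 over all weights."""
    return (lam / 2.0) * sum(w ** 2 for w in weights)

weights = [0.5, -2.0, 0.0, 1.5]   # hypothetical weights
lam = 0.1                         # hypothetical regularization constant
data_loss = 1.0                   # hypothetical loss before regularization

l1_loss = data_loss + l1_penalty(weights, lam)   # adds about 0.4
l2_loss = data_loss + l2_penalty(weights, lam)   # adds about 0.325

print(l1_loss, l2_loss)
```

Note that the L2 penalty is dominated by the largest weight (-2.0 contributes 0.2 of the 0.325), while the L1 penalty grows only in proportion to each weight's magnitude.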
'''To perform L1/L2 regularization in TensorFlow, you can call tf.contrib.layers.l1_regularizer
or tf.contrib.layers.l2_regularizer:
» l1_regularizer(scale, scope=None): Returns a function that performs L1 regularization
» l2_regularizer(scale, scope=None): Returns a function that performs L2 regularization
In both functions, scale corresponds to the constant λ.'''
'''These functions return special functions called regularizers. After you’ve obtained
a regularizer, you can regularize a set of weights by calling
tf.contrib.layers.apply_regularization:
apply_regularization(regularizer, weights_list=None)
Many TensorFlow functions accept regularizers as arguments. One important
function is tf.contrib.layers.fully_connected, which I discuss in the
“Improving the Deep Learning Process” section.'''






# *Implementing Deep Learning*

In [None]:
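# Import the TensorFlow library (this listing uses the TensorFlow 1.x API)
import tensorflow as tf
# The MNIST loader lives in the contrib.learn package
import tensorflow.contrib.learn as learn

# Read MNIST data
dataset = learn.datasets.mnist.read_data_sets('MNIST-data', one_hot=True)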

# Create placeholders for MNIST images and labels
img_holder = tf.placeholder(tf.float32, [None, 784])
lbl_holder = tf.placeholder(tf.float32, [None, 10])

# Set the number of hidden nodes and output nodes
hid_nodes = 200
out_nodes = 10

# Define weights and biases for each layer
w0 = tf.Variable(tf.random_normal([784, hid_nodes]))
w1 = tf.Variable(tf.random_normal([hid_nodes, hid_nodes]))
w2 = tf.Variable(tf.random_normal([hid_nodes, hid_nodes]))
w3 = tf.Variable(tf.random_normal([hid_nodes, out_nodes]))

b0 = tf.Variable(tf.random_normal([hid_nodes]))
b1 = tf.Variable(tf.random_normal([hid_nodes]))
b2 = tf.Variable(tf.random_normal([hid_nodes]))
b3 = tf.Variable(tf.random_normal([out_nodes]))

# Create layers using the defined weights and biases
layer_1 = tf.add(tf.matmul(img_holder, w0), b0)
layer_1 = tf.nn.relu(layer_1)
layer_2 = tf.add(tf.matmul(layer_1, w1), b1)
layer_2 = tf.nn.relu(layer_2)
layer_3 = tf.add(tf.matmul(layer_2, w2), b2)
layer_3 = tf.nn.relu(layer_3)
out_layer = tf.matmul(layer_3, w3) + b3

# Compute the loss using softmax cross-entropy
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=out_layer, labels=lbl_holder))

# Set the training hyperparameters
learning_rate = 0.01
num_epochs = 15
batch_size = 100
num_batches = dataset.train.num_examples // batch_size

# Create an optimizer (Adam optimizer) to minimize the loss
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

# Initialize variables
init = tf.global_variables_initializer()

# Launch a TensorFlow session
with tf.Session() as sess:
    sess.run(init)

    # Loop over epochs
    for epoch in range(num_epochs):

        # Loop over batches
        for batch in range(num_batches):
            img_batch, lbl_batch = dataset.train.next_batch(batch_size)
            sess.run(optimizer, feed_dict={img_holder: img_batch, lbl_holder: lbl_batch})

    # Determine the success rate
    prediction = tf.equal(tf.argmax(out_layer, 1), tf.argmax(lbl_holder, 1))
    success = tf.reduce_mean(tf.cast(prediction, tf.float32))
    print('Success rate: ', sess.run(success, feed_dict={img_holder: dataset.test.images, lbl_holder: dataset.test.labels}))


'''In this code:

The TensorFlow library is imported.
The MNIST data is read and stored in the dataset variable.
Placeholders are created for the MNIST images (img_holder) and labels (lbl_holder).
The number of hidden nodes (hid_nodes) and output nodes (out_nodes) is specified.
Weights and biases are defined for each layer.
Layers are created using the defined weights and biases, with ReLU activation applied to each layer except the output layer.
The loss is computed using softmax cross-entropy.
An Adam optimizer is created with a specified learning rate.
Variables are initialized.
A TensorFlow session is launched.
The model is trained for a specified number of epochs and batches.
The success rate is determined by comparing the predicted labels with the true labels, and the result is printed.
'''