### What is Deep Learning?

Deep learning is a subset of machine learning that uses artificial neural networks to automatically learn patterns from large amounts of data. It is particularly useful for detecting complex relationships between inputs and outputs because it can model hierarchical and non-linear dependencies through multiple layers of neurons.
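To make "non-linear dependencies through multiple layers" concrete, here is a minimal sketch in plain Python. The weights are hand-picked for illustration (they are not learned, and the function names are hypothetical). Two hidden neurons with a ReLU activation feed one output neuron, and together they compute the absolute value of x, a shape that a single linear neuron cannot represent.

```python
def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)

def two_layer_network(x):
    # Hidden layer: two neurons with hand-picked (hypothetical) weights.
    h1 = relu(1.0 * x + 0.0)   # active when x is positive
    h2 = relu(-1.0 * x + 0.0)  # active when x is negative
    # Output layer: one neuron that adds the two hidden activations.
    return 1.0 * h1 + 1.0 * h2

outputs = [two_layer_network(x) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
print(outputs)  # [2.0, 1.0, 0.0, 1.0, 2.0]
```

In a real network these weights would be found by training rather than written by hand.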

#### TensorFlow

An open-source deep learning framework developed by Google that provides tools for building, training, and deploying neural networks efficiently, often using GPUs or TPUs for acceleration.

#### Keras

A high-level API running on top of TensorFlow that simplifies building and training deep learning models with an intuitive and modular design.

### Build a simple Neural Network

In [1]:
import tensorflow as tf   # Importing the tensorflow library
import numpy as np  # Importing numpy for numerical operations

Let's create our own data to better understand how a neural network works -

In [2]:
xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

Now we will build the simplest possible model: a single layer with a single neuron. We will use the Keras SEQUENTIAL class, which lets us define a network as a sequence of LAYERS. A single DENSE LAYER is enough to build this network.

In [3]:
model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(1,)),   # This defines the input shape of the incoming data. This is not a layer
    tf.keras.layers.Dense(units=1) # This is the first layer and has got 1 neuron defined by units=1
]
)
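A Dense layer with one unit and no activation computes a linear function of its input, y = w*x + b. The sketch below mimics that computation in plain Python. The function name and the starting values of w and b are assumptions for illustration; Keras chooses its own initial weight.

```python
def dense_one_unit(x, w, b):
    """Output of a single neuron with no activation: w * x + b."""
    return w * x + b

# Hypothetical starting values; Keras picks its own initial weight at random.
w, b = 0.5, 0.0
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
predictions = [dense_one_unit(x, w, b) for x in xs]
print(predictions)
```

Training changes only w and b; the form of the computation stays the same.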

Your model architecture is now built, with 1 layer and 1 neuron. Next you have to compile the network, and for this you need to specify two things - a loss function and an optimizer. For the loss we will use mean squared error, and for the optimizer we will use stochastic gradient descent.

In [4]:
model.compile(optimizer='sgd', loss='mean_squared_error')
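To see what the loss measures, here is a minimal plain-Python sketch of mean squared error, evaluated on the training data for two candidate (w, b) pairs. The helper names are hypothetical and this is not how Keras implements the loss internally. The optimizer's job is to move w and b toward values that make this number smaller.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def predict_all(xs, w, b):
    """Predictions of a single linear neuron for every x."""
    return [w * x + b for x in xs]

xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

loss_poor = mean_squared_error(ys, predict_all(xs, w=0.0, b=0.0))
loss_good = mean_squared_error(ys, predict_all(xs, w=2.0, b=-1.0))
print(loss_poor, loss_good)
```

The second pair fits every point exactly, so its loss is 0.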

In [5]:
model.summary()

The summary above describes the final model that you will train. Let's see what the three columns mean - <br/>
<ol>
<li>Layer (type): indicates that the layer is a Dense (fully connected) layer. A Dense layer means that every neuron in the layer is connected to every neuron in the previous layer.</li>
<li>Output Shape: (None, 1) - <ul> <li>
None: represents the batch size, which is unspecified at this point. The batch size will be determined when you input data into the model.</li>
    <li>1: indicates that the layer has a single output unit (neuron). </li></ul></li>
<li>Param #: 2 - This is the total number of trainable parameters in the layer. In a Dense layer, the parameters are the weights and the biases.<ul><li>
Weights: There is one weight for each connection between an input to the layer and a neuron in the layer. Here the input has one feature and the layer has one neuron, so there is 1 weight.</li>
    <li>Bias: Each neuron in the Dense layer has one bias parameter.</li>
<li>In this case, the layer has:
1 weight (from the single input feature to the single neuron in this layer) and
    1 bias. So 1+1=2 parameters.</li></ul></li>
    </ol>
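The parameter count above follows a simple rule: inputs times units for the weights, plus one bias per unit. A small sketch (the function name is hypothetical):

```python
def dense_param_count(n_inputs, units, use_bias=True):
    """Trainable parameters in a fully connected layer."""
    weights = n_inputs * units          # one weight per input-to-neuron connection
    biases = units if use_bias else 0   # one bias per neuron
    return weights + biases

print(dense_param_count(n_inputs=1, units=1))  # 2, matching the summary
print(dense_param_count(n_inputs=3, units=4))  # 16
```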

Now it is time to train the model, so that it learns the relationship between x and y. This is done with the fit() method of the model object.

In [6]:
model.fit(xs, ys, epochs=500)
# epochs=500 means the model makes 500 full passes over the training data

Epoch 1/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - loss: 12.5066
Epoch 2/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 57ms/step - loss: 10.0657
Epoch 3/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step - loss: 8.1406
Epoch 4/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step - loss: 6.6215
Epoch 5/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step - loss: 5.4219
Epoch 6/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 41ms/step - loss: 4.4738
Epoch 7/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step - loss: 3.7235
Epoch 8/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - loss: 3.1291
Epoch 9/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step - loss: 2.6573
Epoch 10/500
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step - loss: 2.2821
Epoch 11/

<keras.src.callbacks.history.History at 0x1b32f8f1c30>
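To get a feel for what happens during those epochs, here is a minimal plain-Python sketch of gradient descent on mean squared error for a single linear neuron. It is an illustration under stated assumptions, not the Keras implementation: the function name is hypothetical, the starting point w = 0, b = 0 is chosen for reproducibility (Keras initialises the weight randomly), and the learning rate of 0.01 is assumed to match the default of the 'sgd' optimizer. With only 6 samples, each epoch here is one update on the whole dataset, in line with the 1/1 step count in the log above.

```python
def train_linear_neuron(xs, ys, epochs=500, learning_rate=0.01):
    """Fit y = w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # assumed starting point; Keras draws the initial weight at random
    n = len(xs)
    history = []
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)  # loss before this update
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, history

xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

w, b, history = train_linear_neuron(xs, ys)
print("learned w, b:", w, b)
print("first and last loss:", history[0], history[-1])
print("prediction for x = 12:", w * 12.0 + b)
```

The loss shrinks from epoch to epoch, and w and b approach 2 and -1 without reaching them exactly in 500 steps.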

You now have a model that has learned the relationship between xs and ys. We can use the predict() method to predict y for x values that the model has not seen.

In [7]:
new_x = np.array([12.0])
new_y = model.predict(new_x, verbose=0)
new_y

array([[22.977022]], dtype=float32)

In [8]:
new_y.item()

22.977022171020508

The numbers in xs and ys follow the relationship y=2x-1, so the exact answer for x=12 is 23. However, we got a little below 23 and not 23 exactly. This is because the model is never told the rule. It starts from an initial weight and bias and adjusts them step by step to reduce the loss on the 6 training points. After 500 epochs the learned weight and bias are very close to 2 and -1 but not exactly equal to them, so for 12 we got a value very close to 23 but not necessarily 23. More generally, neural networks are data-hungry models, and with only 6 data points we should be cautious about predictions far outside the training range.
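As a cross-check, the 6 training points lie exactly on a straight line, so an ordinary least-squares fit recovers the slope and intercept directly. The sketch below uses the standard closed-form formulas in plain Python (the function name is hypothetical). It suggests that the small gap from 23 comes from the iterative training stopping short of the exact solution.

```python
def least_squares_line(xs, ys):
    """Closed-form slope and intercept of the best-fit line y = w * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b

xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-3.0, -1.0, 1.0, 3.0, 5.0, 7.0]

w, b = least_squares_line(xs, ys)
print(w, b)            # 2.0 -1.0
print(w * 12.0 + b)    # 23.0
```

A closed form like this exists only for simple models such as a single linear neuron; larger networks have to rely on iterative training.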

### So what did you do here?
<ol>
    <li> You generated your own data </li>
    <li> You created the model architecture with an Input for the incoming data and 1 Dense layer with 1 neuron </li>
    <li> Compiled the model with loss and optimizer </li>
    <li> Fitted the model with training data </li>
    <li> Predicted on the trained model </li>
    </ol>