<a href="https://colab.research.google.com/github/naagarjunsa/data-science-portfolio/blob/main/deep-learning/tensors_understanding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Representation for Neural Networks

A tensor is a container for data, usually numbers. In the context of tensors, a dimension is often called an axis.
 
 Types of tensors

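*   Scalars - 0-dimensional (0D) tensors
      - contain only a single number
      - the number of axes can be checked with .ndim
*   Vectors - 1-dimensional (1D) tensors
      - have exactly one axis
      - the number of axes of a tensor is also called its rank
      - "dimension" can also mean the number of entries along an axis, so a vector with 5 entries is a 5-dimensional vector but still a 1D tensor
*   Matrices - 2-dimensional (2D) tensors
      - have two axes, rows and columns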
      




# Attributes of a Tensor

* Number of axes (rank) - ndim

* Shape - a tuple of integers giving the number of entries along each axis.

* Data type - dtype - the type of the values contained in the tensor

In [1]:
#Scalar 0D Tensors
import numpy as np

scalar = np.array(10)
print("Tensor : ", scalar, "\n Axes : ", scalar.ndim)
print("Shape : ", scalar.shape)
print("Data Type : ", scalar.dtype)


Tensor :  10 
 Axes :  0
Shape :  ()
Data Type :  int64


In [2]:
vector = np.array([1,2,3])

print("Tensor : ", vector, "\n Axes : ", vector.ndim)
print("Shape : ", vector.shape)
print("Data Type : ", vector.dtype)

Tensor :  [1 2 3] 
 Axes :  1
Shape :  (3,)
Data Type :  int64


In [3]:
mat = np.array([[1,2,3],
                [4,5,6],
                [6,7,8]])
print("Tensor : ", mat, "\n Axes : ", mat.ndim)
print("Shape : ", mat.shape)
print("Data Type : ", mat.dtype)

Tensor :  [[1 2 3]
 [4 5 6]
 [6 7 8]] 
 Axes :  2
Shape :  (3, 3)
Data Type :  int64
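
The same attributes extend to tensors with more axes. As a small sketch (the name `cube` and its values are illustrative and not part of the cells above), stacking two matrices gives a 3D tensor:

```python
import numpy as np

# Two 3x3 matrices stacked along a new leading axis form a 3D tensor.
cube = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]],
                 [[10, 11, 12],
                  [13, 14, 15],
                  [16, 17, 18]]])

print("Axes : ", cube.ndim)    # 3
print("Shape : ", cube.shape)  # (2, 3, 3)
```

Reading the shape left to right: 2 matrices, each with 3 rows and 3 columns.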


## Manipulating tensors in NumPy

* Tensor slicing - you can select the entries between any two indices along each tensor axis
* Samples axis - axis 0 usually indexes the samples; the remaining axes describe each individual sample
* Batching the training data is done along axis 0, so a batch is a slice of the first axis.

In [4]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [5]:
train_images.shape

(60000, 28, 28)

In [6]:
train_images[:,:,:].shape

(60000, 28, 28)

In [7]:
train_images[:256,:,:].shape

(256, 28, 28)
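
Batching along axis 0 can be sketched as follows. To keep the example runnable without a download, it uses a zero-filled stand-in array instead of `train_images`; the helper `get_batch` and the batch size of 128 are assumptions made for illustration.

```python
import numpy as np

# Stand-in for train_images: same (samples, height, width) layout,
# but fewer samples and all zeros so no download is needed.
images = np.zeros((1000, 28, 28), dtype=np.uint8)

batch_size = 128  # assumed value, chosen only for illustration


def get_batch(data, n, batch_size):
    """Return the n-th batch (0-indexed) by slicing along axis 0."""
    start = n * batch_size
    return data[start:start + batch_size]


first = get_batch(images, 0, batch_size)
last = get_batch(images, 7, batch_size)  # final batch is smaller: 1000 - 7 * 128 = 104

print(first.shape)  # (128, 28, 28)
print(last.shape)   # (104, 28, 28)

# Slicing works on the other axes too, e.g. the bottom-right 14x14 corner
# of every image.
crop = images[:, 14:, 14:]
print(crop.shape)   # (1000, 14, 14)
```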

# Types of Tensor Data

- Vector data: 2D tensors of shape (samples, features).
  Each data point can be encoded as a vector.
- Timeseries or sequence data: 3D tensors of shape (samples, timesteps, features).
  Each sample records the features of one data point across many timesteps, so each data point can be encoded as a matrix.
- Images: 4D tensors of shape (samples, height, width, channels).
  Each data point can be encoded as a 3D tensor.
- Video: 5D tensors of shape (samples, frames, height, width, channels).
  Each data point can be encoded as a 4D tensor.
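
To make these layouts concrete, here is a minimal sketch that builds one zero-filled tensor per data type. All sizes (1000 samples, 20 features, and so on) are made-up numbers for illustration only.

```python
import numpy as np

# All sizes below are illustrative assumptions, not real datasets.
vector_data = np.zeros((1000, 20))          # 1000 samples, 20 features
timeseries = np.zeros((1000, 50, 20))       # 50 timesteps of 20 features each
images = np.zeros((64, 28, 28, 1))          # 64 grayscale 28x28 images
video = np.zeros((4, 16, 32, 32, 3))        # 4 clips, 16 RGB frames of 32x32

for name, t in [("vector", vector_data), ("timeseries", timeseries),
                ("images", images), ("video", video)]:
    # Indexing axis 0 drops the samples axis and leaves a single sample.
    print(name, "ndim:", t.ndim, "one sample:", t[0].shape)
```

In every case, a single sample has one axis fewer than the full dataset tensor.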

In [9]:
# Element-wise operations are applied to each entry independently; NumPy runs them in optimized, vectorized code, which makes them fast
x = np.array([1,2,3])
y = np.array([3,4,5])

z = x + y
z

array([4, 6, 8])

In [10]:
np.maximum(z, 0)

array([4, 6, 8])
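
`np.maximum(z, 0)` is the element-wise operation behind the relu activation; the output above is unchanged only because every entry of `z` is already positive. As a rough sketch of what happens per entry, the hypothetical helper `naive_relu` below spells the operation out with explicit loops and compares it with the vectorized call.

```python
import numpy as np


def naive_relu(x):
    """Element-wise max(x, 0) for a 2D tensor, written with explicit loops."""
    assert x.ndim == 2
    out = x.copy()  # work on a copy so the input is not overwritten
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = max(out[i, j], 0)
    return out


a = np.array([[-1.0, 2.0],
              [3.0, -4.0]])

print(naive_relu(a))     # negative entries become 0
print(np.maximum(a, 0))  # same result from the vectorized call
```

The loop version is only for understanding; on large tensors the vectorized call is the one to use.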

In [13]:
# Broadcasting: when the shapes differ, the smaller tensor gets extra axes to match
# the larger operand and is repeated along those new axes

x = np.random.random((3,3,3))
y = np.random.random((3,3))

print(x, "\n\n", y)

[[[0.49168733 0.53155657 0.11228104]
  [0.26030245 0.0723359  0.64364427]
  [0.96973348 0.16385194 0.9976055 ]]

 [[0.53339357 0.24990948 0.16998596]
  [0.8718506  0.2837201  0.31292204]
  [0.72612729 0.67640523 0.51877987]]

 [[0.90737556 0.87130895 0.0829014 ]
  [0.96451091 0.79508907 0.26653846]
  [0.6450805  0.32243447 0.67352638]]] 

 [[0.46095753 0.16859737 0.62930225]
 [0.77753709 0.51950907 0.84503217]
 [0.80048838 0.45346599 0.53850086]]


In [14]:
z = x + y
z

array([[[0.95264486, 0.70015394, 0.74158329],
        [1.03783955, 0.59184497, 1.48867645],
        [1.77022186, 0.61731794, 1.53610636]],

       [[0.99435109, 0.41850685, 0.79928821],
        [1.64938769, 0.80322917, 1.15795421],
        [1.52661567, 1.12987122, 1.05728073]],

       [[1.36833308, 1.03990632, 0.71220364],
        [1.742048  , 1.31459813, 1.11157063],
        [1.44556888, 0.77590046, 1.21202724]]])

In [15]:
print(x.shape, y.shape, z.shape)

(3, 3, 3) (3, 3) (3, 3, 3)
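
The two broadcasting steps can be written out by hand. This is a sketch for understanding only: NumPy does not actually materialize the repeated copy when it broadcasts. The names `y_expanded` and `y_repeated` are illustrative.

```python
import numpy as np

x = np.random.random((3, 3, 3))
y = np.random.random((3, 3))

# Step 1: add a leading axis so y has as many axes as x.
y_expanded = y[np.newaxis, :, :]                   # shape (1, 3, 3)

# Step 2: repeat y along the new axis to match the shape of x.
y_repeated = np.broadcast_to(y_expanded, x.shape)  # shape (3, 3, 3)

print(y_expanded.shape, y_repeated.shape)

# The explicit version matches what NumPy does implicitly for x + y.
print(np.array_equal(x + y, x + y_repeated))       # True
```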


In [22]:
# The dot product computes the weighted sum of a node's inputs before the activation is applied
x = np.random.random((1,2))
y = np.random.random((2,3))
z = np.dot(x, y)

# This works because the last dimension of x (2) equals the first dimension of y (2).
 
print(x.shape, y.shape, z.shape)

(1, 2) (2, 3) (1, 3)
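
Putting the pieces together, a dense layer's forward pass is a dot product, a bias addition, and an element-wise activation. The sketch below assumes a hypothetical layer with 2 inputs and 3 units; the names `W` and `b` and the zero bias are illustrative choices, not taken from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 2))  # one sample with 2 input features
W = rng.random((2, 3))  # weights of a hypothetical layer: 2 inputs -> 3 units
b = np.zeros(3)         # one bias per unit

# Weighted sum per unit, then the element-wise activation.
pre_activation = np.dot(x, W) + b       # shape (1, 3); b is broadcast over axis 0
output = np.maximum(pre_activation, 0)

print(pre_activation.shape, output.shape)

# If the inner dimensions do not match, np.dot raises a ValueError.
try:
    np.dot(x, rng.random((3, 3)))
except ValueError as err:
    print("shape mismatch:", err)
```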


Reshaping a tensor means rearranging its rows and columns to match a target shape.
Naturally, the reshaped tensor has the same total number of coefficients (elements) as the initial
tensor.

In [24]:
# Reshaping is also used a lot to restructure the input data before feeding it to a network

x = np.random.random((3,2))
x.shape

(3, 2)

In [25]:
x.reshape((6,1))

array([[0.39239632],
       [0.31907384],
       [0.3629386 ],
       [0.13648932],
       [0.09392549],
       [0.98249448]])
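
A common use of reshaping is flattening images into vectors before a dense layer. The sketch below uses a zero-filled stand-in batch of ten 28x28 images (an assumption so that no dataset is needed) and also contrasts reshaping with transposing.

```python
import numpy as np

# Stand-in for a small batch of 28x28 images (zeros, not real pixel data).
images = np.zeros((10, 28, 28))

# Flatten each image into a vector of 28 * 28 = 784 values.
flat = images.reshape((10, 28 * 28))
print(flat.shape)       # (10, 784)

# Passing -1 asks NumPy to infer that axis from the total number of elements.
flat_auto = images.reshape((images.shape[0], -1))
print(flat_auto.shape)  # (10, 784)

# reshape keeps the elements in their original (row-major) order,
# whereas transposing swaps rows and columns.
m = np.arange(6).reshape((3, 2))
print(m.reshape((2, 3)))  # rows are [0 1 2] and [3 4 5]
print(m.T)                # rows are [0 2 4] and [1 3 5]
```

Both results have shape (2, 3), but the entries are arranged differently, so reshaping is not a substitute for transposing.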