### Stacking Transformations

In [None]:
'''
Stacking Transformations

Q1. What does stacking transformations mean in deep learning?
Ans. Stacking transformations means applying many small transformations one after another, so each layer slightly changes the output of the layer before it.
'''
# Example
# Input → Layer 1 (small change) → Layer 2 (small change) → Layer 3 → Output


'''
Q2. Why is stacking many small transformations more powerful than one big transformation?
Ans. Because many small changes, each followed by a non-linearity, can gradually reshape space into complex forms that a single linear transformation cannot achieve.
'''
# Example
# One stretch → small effect
# Stretch + rotate + shear + stretch → very different final shape


'''
Q3. What is the visual intuition behind stacking transformations?
Ans. Space behaves like a soft rubber sheet that gets bent, stretched, and twisted repeatedly by each layer.
'''
# Example
# Layer 1: stretch space
# Layer 2: rotate stretched space
# Layer 3: shear rotated space


'''
Q4. How are stacked transformations represented conceptually using matrices?
Ans. Each layer has a matrix, and stacking layers corresponds to multiplying these matrices in sequence.
'''
# Example
# v → A1 → A2 → A3
# Equivalent to: A3 × A2 × A1 × v


'''
Q5. Why does depth help neural networks learn complex patterns?
Ans. Because deeper networks can create rich, hierarchical representations by repeatedly transforming feature space.
'''
# Example
# Early layers: edges
# Middle layers: shapes
# Deep layers: objects or meaning


'''
Q6. What happens if we stack only linear transformations?
Ans. The product of the stacked matrices is itself one matrix, so all the linear transformations collapse into a single linear transformation, limiting expressiveness.
'''
# Example
# A × B × C = D (still linear)


q='''
Q7. Why are non-linearities necessary when stacking transformations?
Ans. Non-linearities prevent collapse into a single transformation and allow networks to model complex, non-linear relationships.
'''
# Example
# Linear → Linear → Linear = Linear
# Linear → ReLU → Linear = Non-linear power
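
The rubber-sheet picture from Q3 can be sketched with NumPy. This is a minimal illustration, not part of the original notes: the three matrices (`stretch`, `rotate`, `shear`) are hypothetical layers picked by hand so that the effect of each step on the unit square is easy to follow.

```python
import numpy as np

# Corners of the unit square, one point per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

# Three hypothetical "layers", chosen only for illustration.
stretch = np.array([[2.0, 0.0],
                    [0.0, 1.0]])   # double the x-coordinate
rotate = np.array([[0.0, -1.0],
                   [1.0,  0.0]])   # rotate 90 degrees counter-clockwise
shear = np.array([[1.0, 0.5],
                  [0.0, 1.0]])     # slide x by half of y

# Each layer acts on the output of the previous one.
after_stretch = stretch @ square
after_rotate = rotate @ after_stretch
after_shear = shear @ after_rotate

print(after_stretch)
print(after_rotate)
print(after_shear)
```

Printing the corners after every step shows the square becoming a rectangle, then a rotated rectangle, then a slanted parallelogram.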


### Stacking Transformations

In [1]:
# Applying many small transformations one after another.

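import numpy as np

# Input column vector.
x = np.array([[1],
              [1]])
# Layer 1: stretch the x-axis by 10%.
W1 = np.array([[1.1, 0.0],
               [0.0, 1.0]])
# Layer 2: rotate 90 degrees counter-clockwise.
W2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])
# Apply W1 first, then W2.
y = W2 @ (W1 @ x)
print(y)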

# 📌 Each layer slightly changes the representation.

[[-1. ]
 [ 1.1]]


### Why many small > one big? (Transformation)

In [None]:
# Gradual reshaping allows complex geometry.
# One matrix → limited shape change
# Many matrices (with non-linearities between them) → progressive bending of space
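
A rough way to see how small steps accumulate: the sketch below applies the same small rotation nine times. The helper `rotation` and the 10-degree step are assumptions made for illustration. Since every step here is linear, the nine steps are still equivalent to one matrix (a 90-degree rotation); real bending needs a non-linearity between the steps.

```python
import numpy as np

def rotation(degrees):
    """Return the 2x2 matrix that rotates the plane counter-clockwise."""
    t = np.deg2rad(degrees)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

small_step = rotation(10)      # hypothetical "small" layer
point = np.array([1.0, 0.0])

# Record the point after each small step.
path = [point]
for _ in range(9):
    path.append(small_step @ path[-1])

print(np.round(path[1], 3))    # barely moved after one step
print(np.round(path[-1], 3))   # a quarter turn after nine steps
```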


### Stacking Layers and Matrix Multiplication

In [2]:
# Stacking layers = multiplying matrices in sequence.

# Apply A1, then A2, then A3
# y = A3 @ A2 @ A1 @ x

# 📌 Read right to left.
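
The right-to-left rule can be checked numerically. In this sketch `A1`, `A2`, `A3`, and `x` are hypothetical values chosen by hand; the point is only that applying the layers one at a time matches multiplying the matrices first, and that reversing the order changes the result.

```python
import numpy as np

x = np.array([1.0, 1.0])

# Hypothetical layer matrices, chosen only for illustration.
A1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # stretch x
A2 = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotate 90 degrees
A3 = np.array([[1.0, 0.5], [0.0, 1.0]])   # shear

# Apply the layers one at a time: A1 first, A3 last.
step_by_step = A3 @ (A2 @ (A1 @ x))

# Multiply the matrices first, then apply the product once.
combined = A3 @ A2 @ A1
all_at_once = combined @ x

# Reversing the order gives a different transformation.
reversed_order = A1 @ A2 @ A3 @ x

print(step_by_step, all_at_once, reversed_order)
```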

### Why are NON-LINEARITIES important in Neural Networks?

In [None]:
'''

1. Linear layers:
   • stretch, rotate, and shift data (the shift comes from the bias term)
   • always keep space flat

2. Stacking linear layers:
   Linear → Linear → Linear
   • collapses into ONE linear transformation
   • adds no extra learning power

3. Problem:
   • real-world data is curved and complex
   • linear models can only make straight-line decisions

4. Non-linearity (e.g., ReLU):
   • bends and folds the space
   • breaks linear collapse

5. With non-linearity:
   Linear → ReLU → Linear
   • becomes a truly non-linear function
   • allows bent (piecewise-linear) decision boundaries that can approximate curves

6. Result:
   • networks gain expressive power
   • deep learning becomes possible

Final rule:
Linear layers move space.
Non-linearities reshape space.
'''
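
The collapse, and how ReLU breaks it, can be tested with a tiny example. The weights `W1` and `W2` below are hypothetical and picked by hand. The check relies on one fact: a linear map always satisfies f(a + b) = f(a) + f(b), so a function that fails this test cannot be a single linear transformation.

```python
import numpy as np

# Hypothetical weights, chosen by hand for illustration.
W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])
W2 = np.array([[1.0, 2.0]])

def relu(z):
    return np.maximum(z, 0.0)

def linear_stack(x):
    # Linear -> Linear
    return W2 @ (W1 @ x)

def relu_stack(x):
    # Linear -> ReLU -> Linear
    return W2 @ relu(W1 @ x)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# Two linear layers equal one matrix, W2 @ W1.
collapsed = W2 @ W1
print(np.allclose(linear_stack(a), collapsed @ a))

# Additivity holds for the linear stack but fails with ReLU in between.
print(np.allclose(linear_stack(a + b), linear_stack(a) + linear_stack(b)))
print(np.allclose(relu_stack(a + b), relu_stack(a) + relu_stack(b)))
```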


### Bending Space = Complex Learning

In [None]:
bending_space = '''
Bending space = complex learning (visual intuition)

1. Matrix (linear layer):
   • stretches, rotates, shears points (shifting needs a bias term)
   • keeps space flat
   • straight lines stay straight

2. Stacking matrices:
   • collapses into one linear transform
   • no added learning power

3. ReLU (non-linearity):
   • cuts space at 0 (bends here)
   • flattens one side
   • introduces bends and corners

4. Effect of ReLU:
   • the mapping is no longer smooth (it has a kink at 0)
   • creates folds in data
   • enables bent decision boundaries that can approximate curves

5. Matrix + ReLU together:
   • matrix positions points
   • ReLU bends space
   • repetition increases complexity

Final intuition:
Matrices move space.
ReLU bends space.
Bending space = learning complex patterns.
'''
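
One way to see a fold in action is a toy 1-D problem, sketched below under made-up data: points near zero belong to class 1, points far from zero to class 0. No single cut on the line separates them, but two ReLU units (hypothetical weights +1 and -1) fold the line at 0, after which one threshold is enough.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# Toy 1-D data: label 1 for points near zero, 0 for points far away.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
labels = (np.abs(x) < 1).astype(int)

# No single threshold on the raw line reproduces the labels.
no_single_cut = all(
    not np.array_equal((x < t).astype(int), labels)
    and not np.array_equal((x > t).astype(int), labels)
    for t in np.linspace(-4, 4, 81)
)

# Hidden layer: weights +1 and -1, then ReLU. Summing the two units
# gives relu(x) + relu(-x), which equals |x|: the line is folded at 0.
folded = relu(x) + relu(-x)

# After the fold, one threshold separates the two classes.
predictions = (folded < 1).astype(int)

print(no_single_cut)
print(folded)
print(predictions)
```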


### Glance at ReLU

In [None]:
relu = '''
ReLU (explained very simply)

1. What ReLU does:
   • looks at a number
   • if number < 0 → make it 0
   • if number ≥ 0 → keep it

2. Simple rule:
   ReLU = "no negatives allowed"

3. Visual idea:
   • left side (negative) → flattened to zero
   • right side (positive) → stays as it is

4. Why this matters:
   • the function is no longer smooth
   • a sharp corner appears at 0
   • this is called a "bend"

5. What this helps with:
   • straight lines can turn
   • simple shapes become complex
   • model can learn harder patterns

Final intuition:
Matrix = move points
ReLU = block negatives
Blocking negatives = bending space
'''
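
The rule above fits in one line of NumPy. The sketch below is a minimal version, with a rough slope estimate on each side of 0 to show the corner; the step size `h` and the sample `values` are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    """Replace negative entries with 0 and keep the rest unchanged."""
    return np.maximum(z, 0.0)

values = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(values))

# Slope on each side of 0, estimated with a small step h.
h = 1e-3
left_slope = (relu(0.0) - relu(-h)) / h    # flat side
right_slope = (relu(h) - relu(0.0)) / h    # unchanged side
print(left_slope, right_slope)
```

The slope jumps from 0 on the left to 1 on the right, which is the "bend" at 0.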